open calais

Semantic Web Patterns: A guide to Semantic Technology

Posted on March 25, 2008. Filed under: adaptiveblue, blueorganizer, freebase, open calais, reuters, semantic web, smartlinks, snap, twine | Tags: |

In this article, we’ll analyze the trends and technologies that power the Semantic Web. We’ll identify patterns that are beginning to emerge, classify the different trends, and peak into what the future holds.

In a recent interview Tim Berners-Lee pointed out that the infrastructure to power the Semantic Web is already here. ReadWriteWeb’s founder, Richard MacManus, even picked it to be the number one trend in 2008. And rightly so. Not only are the bits of infrastructure now in place, but we are also seeing startups and larger corporations working hard to deliver end user value on top of this sophisticated set of technologies.

The Semantic Web means many things to different people, because there are a lot of pieces to it. To some, the Semantic Web is the web of data, where information is represented in RDF and OWL. Some people replace RDF with Microformats. Others think that the Semantic Web is about web services, while for many it is about artificial intelligence – computer programs solving complex optimization problems that are out of our reach. And business people always redefine the problem in terms of end user value, saying that whatever it is, it needs to have simple and tangible applications for consumers and enterprises.

The disagreement is not accidental, because the technology and concepts are broad. Much is possible and much is to be imagined.

1. Bottom-Up and Top-Down

We have written a lot about the different approaches to the Semantic Web – the classic bottom-up approach and the new top-down one. The bottom-up approach is focused on annotating information in pages, using RDF, so that it is machine readable. The top-down approach is focused on leveraging information in existing web pages, as-is, to derive meaning automatically. Both approaches are making good progress.

A big win for the bottom-up approach was recent announcement from Yahoo! that their search engine is going to support RDF and microformats. This is a win-win-win for publishers, for Yahoo!, and for customers – publishers now have an incentive to annotate information because Yahoo! Search will be taking advantage of it, and users will then see better, more precise results.

Another recent win for the bottom-up approach was the announcement of the Semantify web service from Dapper (previous coverage). This offering will enable publishers to add semantic annotations to existing web pages. The more tools like Semantify that pop up, the easier it will be for publishers to annotate pages. Automatic annotation tools combined with the incentive to annotate the pages is going to make the bottom-up approach more compelling.

But even if the tools and incentive exists, to make the bottom-up approach widespread is difficult. Today, the magic of Google is that it can understand information as is, without asking people to fully comply with W3C standards of SEO optimization techniques. Similarly, top-down semantic tools are focused on dealing with imperfections in existing information. Among them are the natural language processing tools that do entity extraction – such as the Calais and TextWise APIs that recognize people, companies, places, etc. in documents; vertical search engines, like ZoomInfo and Spock, which mine the web for people; technologies like Dapper and BlueOrganizer, which recognize objects in web pages; and Yahoo! Shortcuts, Snap and SmartLinks, which recognize objects in text and links.

[Disclosure: Alex Iskold is founder and CEO of AdaptiveBlue, which makes BlueOrganizer and SmartLinks.]

Top-down technologies are racing forward despite imperfect information. And, of course, they benefit from the bottom-up annotations as well. The more annotations there are, the more precise top-down technologies will get – because they will be able to take advantage of structured information as well.

2. Annotation Technologies: RDF, Microformats, and Meta Headers

Within the bottom-up approach to annotation of data, there are several choices for annotation. They are not equally powerful, and in fact each approach is a tradeoff between simplicity and completeness. The most comprehensive approach is RDF – a powerful, graph-based language for declaring things, and attributes and relationships between things. In a simplistic way, one can think of RDF as the language that allows expressing truths like: Alex IS human (type expression), Alex HAS a brain (attribute expression), and Alex IS the father of Alice, Lilly, and Sofia (relationship expression). RDF is powerful, but because it is highly recursive, precise, and mathematically sound, it is also complex.

(more…)

Read Full Post | Make a Comment ( 2 so far )

Liked it here?
Why not try sites on the blogroll...

Follow

Get every new post delivered to your Inbox.