Does Microsoft + Powerset Beat Google?

Posted on July 3, 2008. Filed under: alexiskold

What is the plan behind Microsoft’s purchase of hot startup Powerset? The 3-year-old company, founded by Dr Barney Pell, recently launched a semantic search experience for Wikipedia.

It is doubtful that Microsoft bought the company just to enhance Live Search. Possibly the plan is to replicate the Wikipedia solution, then incorporate Powerset into Internet Explorer. In this post we look at what the thinking behind the acquisition might be.

Most initial reviews found the Powerset product release underwhelming. Critics appreciated the innovative semantic UI and recognized its potential, but believed it didn’t vastly improve Wikipedia. So in view of the lukewarm reviews, the acquisition by Microsoft was unexpected. The $100M price tag is around 5x the $12M Series A + $8M investment put into the company. Microsoft execs must believe Powerset can be a weapon in its battle with Google.

What Powerset is today

Given a set of unstructured information, Powerset applies Natural Language Processing techniques to extract the key semantic concepts from the text. It then builds a semantic index (similar to Google’s) as well as a conceptual graph of relationships between entities. This graph is typically expressed in RDF triples.
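
To make the idea of a graph of triples concrete, here is a minimal sketch in Python. It is not Powerset's implementation; the facts, the predicate names and the helper functions are all made up for illustration. It only shows the shape of the data: each extracted fact is a (subject, predicate, object) triple, and the graph is those triples indexed by entity.

```python
from collections import defaultdict

# Hypothetical facts, as an NLP pipeline might extract them from text.
# Each fact is a (subject, predicate, object) triple, the same shape RDF uses.
triples = [
    ("Barney_Pell", "founded", "Powerset"),
    ("Microsoft", "acquired", "Powerset"),
    ("Powerset", "indexes", "Wikipedia"),
]


def build_graph(facts):
    """Index triples by subject so relationships can be followed per entity."""
    graph = defaultdict(list)
    for subject, predicate, obj in facts:
        graph[subject].append((predicate, obj))
    return graph


def subjects_of(facts, predicate, obj):
    """Return every subject linked to obj by predicate, in input order."""
    return [s for s, p, o in facts if p == predicate and o == obj]


graph = build_graph(triples)
print(graph["Powerset"])
print(subjects_of(triples, "acquired", "Powerset"))
```

A real system would store millions of such triples and query them with something like SPARQL, but the lookup idea is the same.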

One of the Powerset innovations is surfacing of semantics to the user interface. The contextual gadget is overlaid to help navigate the unstructured information.

Many thought Powerset to be a generic semantic search engine, but its first product is limited to Wikipedia. It is not trivial to scale the technology to the entire web.

Why Powerset is Powerful

When semantic technologies emerged a few years ago, people started talking about how the semantic web and/or semantic search might be a Google killer. The talk was supported by the logic that semantic search can deliver more relevant results because it “knows” the content.

The industry has since realized that this isn’t the case. Semantic search has no huge advantage over the statistical approach used by Google. We discussed this in the post Semantic Search – Myth and Reality.

What is powerful about Powerset? Precisely that it doesn’t try to search the web as a whole. Right now, the solution works on Wikipedia, but the infrastructure is generic, so any other site could also be enhanced. The contextual outline developed can be used to navigate any content.

Instead of dealing with the whole web, the idea may be firstly to build solutions for specific sites.

Head-on with Google?

Powerset as it is today is no Google killer. At this point only something with huge traction and momentum would stand a chance.

In the search market, Google has a strong hold – potentially stronger if the Yahoo deal goes through. People are conditioned to Google: it’s simple and, yes, imperfect, but it’s good enough and the results are still better than Live Search.

If Microsoft bought Powerset with the goal of incorporating it into Live Search, then it’s likely to be another acquisition that makes little impact on the bottom line. In fact, the announcement on the Live Search blog states just that. The number one reason is acquiring talent; the second is the belief that NLP and semantic algorithms will be able to patch holes in today’s search.

Today Powerset brings only interesting technology; it doesn’t bring traction. So what were they thinking up in Redmond? There may be a more subtle play, leveraging the fact that Powerset works well on knowledge sets like Wikipedia.

Possibly Microsoft plans to deploy Powerset across its own sites, then perhaps incorporate Powerset into Internet Explorer.

Imagine going to Wikipedia and having a semantic overlay on each page. Now imagine scaling this experience across major information sources around the web.

Providing a contextual, semantic experience allows Microsoft to retain eyes longer, shaving off the time people spend searching Google.

This is an important point because Google doesn’t make money on search – it makes money on advertising.

Can Microsoft ever beat Google in Advertising?

The real problem Microsoft is seeking to solve is advertising. Until now the web has figured out two fundamentals for advertising – portals and search.

Portals show ads on each page; the more people browse the content, the more ads are shown and the more money is made. The search model emerged as an alternative, now more successful, path to advertising dollars.

With Powerset and other semantic technologies, there’s another model: contextual information exploration overlaid on existing content.

If Microsoft can figure out how to keep eyes off Google’s home page, the game will shift dramatically. The browser is one of Microsoft’s most powerful tools – and the default search box is Live Search.

If Microsoft wants to win over advertisers, it might just do more with the browser. Incorporating aspects of Powerset’s semantic navigator into the browser by default could be a game changer. This is not a straightforward play. A large company with bureaucracy and execution problems is unlikely to be able to merge semantics into the browser quickly and elegantly.

Conclusion

The Powerset acquisition is an interesting move by Microsoft. This hot semantic startup was on everyone’s radar.

What can the plan be? It is doubtful that Microsoft bought the company just to enhance Live Search. Possibly the plan is to replicate the Wikipedia solution, then incorporate Powerset into Internet Explorer.

That is a bold play requiring exact execution – not the kind Redmond has shown lately.

What do you think Microsoft is going to do with Powerset? What are the other applications of this technology that you can think of?

Semantic Search: The Myth and Reality

Posted on May 30, 2008. Filed under: alexiskold

For a few years now people have been talking about semantic search. Any technology that stands a chance to dethrone Google is of great interest to all of us, particularly one that takes advantage of long-awaited and much-hyped semantic technologies. But no matter how much progress has been made, most of us are still underwhelmed by the results. In head-to-head comparisons with Google, the results have not come out much different. What are we doing wrong?

For example, when asked “What is the capital of France?” both approaches come back with the correct answer – Paris. Also, a lot of queries that we are used to typing into Google in abbreviated form come back with similar results if we type them using natural language. Clearly something is off. We all know that semantic technologies are powerful, but how and why? In this post we will show that the problem is that we are asking the wrong questions.

The mistake is that semantic search engines present us with a Google-like search box and allow us to enter free-form queries. So we type the things that we are used to asking – primitive queries. It never occurs to us to type in “What actor starred in both Pulp Fiction and Saturday Night Fever?” or “What two US Senators received donations from a foreign entity?” We type simple questions, but this is not where the power of semantic search lies. Let’s look at the spectrum of semantic technologies, from Google to SearchMonkey to Powerset and Freebase, to understand what is going on.

What Problem Are We Trying to Solve?

The first confusion in the space comes from the fact that semantic search is being positioned as the answer to all possible problems – from modern search, currently dominated by Google, to problems that are computationally impossible. The situation is made more difficult by the fact that right now there is only a narrow range of problems where semantic search can clearly do better. This range is complex queries involving inferencing and reasoning over a complex data set.

As shown in the diagram above, basic queries are easily handled by Google. Sadly, natural language processing gives little advantage when it comes to this category of problems. Google correctly answers the question about Leonardo Da Vinci’s birthday, leaving no opportunity to improve the search by understanding the nouns and the verbs that the user typed in.

Before looking at the problems that are perfect for semantic search, let’s look at the hardest problems. These are computationally challenging problems that really have nothing to do with understanding semantics. The misconception has been perpetuated since the early days of the Semantic Web that somehow, because we will annotate the web, we will be able to solve these super complex problems. This is simply not true. There are fundamental limits to what we can compute, and a class of problems that have an exponential number of possible solutions is not going to be magically solved because we represent data as RDF.

The good news is that there is a set of problems that are great for semantic search. These are the problems we have been solving so wonderfully with relational databases. Way too often we forget that semantic technologies are here to help us represent relational data spread over the entire web – so it should be no surprise to us that it is relational queries that semantic search engines would excel at.
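
To see why these are relational queries, here is a rough sketch of the Pulp Fiction / Saturday Night Fever question from earlier in this post expressed as a self-join. It uses Python's built-in sqlite3 module; the table name starred_in, the helper actors_in_both and the handful of rows are assumptions made up for the example, not data from any of the products discussed here.

```python
import sqlite3

# Tiny hand-entered sample; a real engine would extract this from the web.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE starred_in (actor TEXT, film TEXT)")
conn.executemany(
    "INSERT INTO starred_in VALUES (?, ?)",
    [
        ("John Travolta", "Pulp Fiction"),
        ("Samuel L. Jackson", "Pulp Fiction"),
        ("Uma Thurman", "Pulp Fiction"),
        ("John Travolta", "Saturday Night Fever"),
        ("Karen Lynn Gorney", "Saturday Night Fever"),
    ],
)


def actors_in_both(connection, film_a, film_b):
    """Self-join the table to find actors credited in both films."""
    rows = connection.execute(
        """
        SELECT a.actor
        FROM starred_in AS a
        JOIN starred_in AS b ON a.actor = b.actor
        WHERE a.film = ? AND b.film = ?
        ORDER BY a.actor
        """,
        (film_a, film_b),
    ).fetchall()
    return [row[0] for row in rows]


print(actors_in_both(conn, "Pulp Fiction", "Saturday Night Fever"))
```

The query itself is trivial once the data is in rows. The hard part, and the part semantic technologies are meant to address, is getting those rows out of unstructured pages in the first place.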

The Spectrum of Semantic Search Players

But semantic search is not just about the questions that we are asking. Because the web is just a bunch of unstructured HTML pages, semantic search is also about the underlying data. At its most structured extreme we find Freebase – the semantic database of everything. Freebase is accessible via free text search, but more importantly via MQL (Metaweb Query Language). MQL is essentially JSON with wildcards. Using it you can construct any query against Freebase and the result will be the same query with answers filled in.
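
The "same query with answers filled in" idea is easier to see in code. The sketch below is a toy imitation in Python, not real MQL and not the Metaweb API: the RECORDS list, the field names and the fill_in function are all invented for illustration, and None plays the role of the wildcard.

```python
# Toy in-memory "database": a hand-entered sample, not Freebase data.
RECORDS = [
    {"type": "film", "name": "Pulp Fiction", "year": 1994,
     "director": "Quentin Tarantino"},
    {"type": "film", "name": "Saturday Night Fever", "year": 1977,
     "director": "John Badham"},
]


def matches(query, record):
    """A record matches if it has every queried field and agrees on all
    fields whose query value is not the None wildcard."""
    for key, value in query.items():
        if key not in record:
            return False
        if value is not None and record[key] != value:
            return False
    return True


def fill_in(query, records):
    """Return the query with wildcards replaced by the first matching
    record's values, or None if no record matches."""
    for record in records:
        if matches(query, record):
            return {key: record[key] for key in query}
    return None


query = {"type": "film", "name": "Pulp Fiction", "director": None, "year": None}
print(fill_in(query, RECORDS))
```

The answer comes back in exactly the shape of the question, which is what makes this style of query language unambiguous compared to free text.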

Freebase, in a way, is just a relational database. It operates against certain, structured information. On the other end of the spectrum is Google, which is all about statistical frequencies and very little semantics. The recently launched SearchMonkey from Yahoo! is an interesting twist. It does not add anything to the result set, but instead uses semantic annotations to present a richer, more interactive and useful user interface.

Companies like Hakia and Powerset are probably working the hardest. These companies are trying to simultaneously build Freebase-like structures on the fly and then do natural language queries on top of them. The difference is that Hakia is using (likely similar) technology to query over the entire web, while Powerset has (probably shrewdly) chosen to restrict the search to Wikipedia.

Are Hakia, Powerset and Freebase All That Different?

This analysis brings up a question – which of these technologies are different and which are essentially the same? Let’s get the easy one down first. Yahoo!’s SearchMonkey is no different from Google or any other search, as far as the core search technology is concerned. The difference is simply in the presentation layer. SearchMonkey is smart about creating a better user experience by letting publishers present the search results to the users in the best possible way.

But when it comes to Hakia, Powerset and Freebase the situation is much more complicated. On the surface all these products are different – Hakia lets you search the whole web, Powerset is restricted to Wikipedia (and Freebase!) and Freebase itself has two search interfaces – the search box and query language. Here is the problem – the natural language interface has nothing to do with the underlying data representation.

The fact is that all of these semantic search technologies allow people to type in arbitrarily complex questions and then interpret these queries and execute them against their databases. Fundamentally, Hakia, Powerset, and Freebase are databases. Fundamentally, all of them have some kind of Natural Language Processing that translates the question into a canonical query over the database.
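
As a very rough sketch of what "translating the question into a canonical query" could mean, here is a toy translator in Python that handles exactly one question shape. Real systems use full natural language parsing; the regular expression, the to_canonical function and the output dictionary format are all assumptions invented for this example.

```python
import re

# Handles one hypothetical question shape:
#   "What <kind> starred in both <A> and <B>?"
PATTERN = re.compile(
    r"^(?:what|which)\s+(?P<kind>\w+)\s+starred in both\s+"
    r"(?P<a>.+?)\s+and\s+(?P<b>.+?)\??$",
    re.IGNORECASE,
)


def to_canonical(question):
    """Turn a matching question into a structured query, else return None."""
    match = PATTERN.match(question.strip())
    if match is None:
        return None
    return {
        "select": match.group("kind").lower(),
        "relation": "starred_in",
        "all_of": [match.group("a"), match.group("b")],
    }


print(to_canonical("What actor starred in both Pulp Fiction and Saturday Night Fever?"))
print(to_canonical("What is the capital of France?"))
```

Even this toy shows the limits of pattern matching: a film title containing the word "and" would be split in the wrong place, which is one reason the real natural language layer is hard to build.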

To gain insight into all of this, think about Freebase and its query language MQL. Unlike natural language, which allows all sorts of constructs, MQL is unambiguous. This JSON-like language allows users to construct precise statements against Freebase. The fact that Powerset accepts natural language queries does not mean that there is no database inside Powerset. For sure there is one, of a similar kind to the one beneath the Freebase search box. What is really different about Freebase and Powerset is the data gathering approach and user experience.

Back to the Future: It’s All About UI

Probably the most striking revelation about the semantic search space is the user interface. First, to go off on a tangent, Powerset got it right by realizing that semantics needs to be surfaced in the UI. After a user searches Powerset, a contextual gadget, aware of the semantics of the results, helps the user complete the search experience.

Yet the biggest mistake that I think Powerset is making is also in the UI. The search box that everyone is familiar with via traditional web search engines needs to go. Having a simplistic search interface hurts Powerset and Hakia, and to a lesser extent Freebase, which is not positioning itself as generic search.

Think about the recent launch of Powerset. The company released a vastly better way to interact with one of the most important sources of information on the web – Wikipedia. But what did the critics say? “Let’s see if this is a Google killer.” And the answer to that is “no.”

But what if Powerset restricted what can be searched? What if instead of a search box there was another interface, or what if they told users not to look up things that they can find easily on Google? Why is it that new companies are expected to improve on the algorithm that has ruled the web for over a decade? Instead, the expectation should really be to solve the problems that cannot be solved by Google today.

Conclusion

Semantic search is an emerging technology for which expectations have been set way too high. We have all been misled into thinking that these technologies are here to dethrone Google by delivering better search results. Neither of those things is true. What is true, however, is that semantic search is going to be big and it is going to help us answer questions that we simply cannot answer today – complex, inferencing queries asked over the entire web as if it were a database.

In order for these semantic search technologies to make a dent in the market, they need to clean up their messaging and, most importantly, their user interface. Presenting a search box is both misleading and detrimental, as people associate it with the simplistic questions that Google solves without any problems. To really showcase semantic search, these companies need to come up with innovative UIs that will help users understand the power that is being put at their fingertips.

As always, please tell us what you think. What should semantic search companies do to gain their place in the marketplace?
