For common searches that repeat millions of times, like “Britney Spears” or “Hybrid Cars,” returning the most appropriate results or advertisements is not difficult. But what about queries that are exceptionally rare and may never occur more than once? These queries are far harder for the search engine to understand.
Andrei Broder, Yahoo! Research Fellow and Vice President of Search Technology and Computational Advertising, and a team of Yahoo! researchers set out to tackle this problem. Their work is outlined in the paper “Robust Classification of Rare Queries Using Web Knowledge,” which appeared at SIGIR 2007.
To address the problem, the Yahoo! team proposed a methodology for using search results, as well as information available on the Web, as a source of external knowledge. To this end, they sent rare queries to a search engine and assumed that a majority of the highest-ranking search results were relevant to the query. Categorizing these results allowed the team to classify the original query with high accuracy.
The results confirmed that using the Web as a repository of world knowledge contributes valuable information about the query and aids in its correct classification. “We discovered the best source of information to understand what these rare queries are about is to look at the search results,” Broder explains. “If you look at each returned page as a vote on what the query is about, you find that the majority tends to be correct even though many individual pages are wrong.”
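The voting idea Broder describes can be illustrated with a minimal Python sketch. This is not the team's actual classifier, only a simplified picture of the approach: the function names `classify_query_by_vote`, `toy_search`, and `toy_classify`, the `top_k` parameter, and the five sample pages are all invented for illustration, and a real system would plug in an actual search engine and a trained page classifier.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def classify_query_by_vote(
    query: str,
    search: Callable[[str, int], Iterable[str]],
    classify_page: Callable[[str], str],
    top_k: int = 10,
) -> Optional[str]:
    """Label a query with the most common category among its top results."""
    # Each top-ranked result casts one vote for its own category.
    votes = Counter(classify_page(page) for page in search(query, top_k))
    if not votes:
        return None  # no results came back, so there is nothing to vote on
    # most_common(1) returns a one-item list: [(category, vote_count)].
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for a search engine and a page classifier.
TOY_PAGES = [
    "hybrid sedan fuel economy review",
    "hybrid car battery warranty",
    "hybrid tomato seeds for the garden",
    "best hybrid cars for commuters",
    "hybrid bike buying guide",
]


def toy_search(query: str, top_k: int) -> list:
    # Ignores the query text; a real engine would rank pages against it.
    return TOY_PAGES[:top_k]


def toy_classify(page: str) -> str:
    # Crude keyword rules standing in for a trained classifier.
    if "car" in page or "sedan" in page:
        return "autos"
    if "seeds" in page:
        return "gardening"
    return "cycling"


label = classify_query_by_vote("some rare hybrid query", toy_search, toy_classify, top_k=5)
print(label)
```

In this toy run, two of the five pages are off topic, yet the three automotive pages still carry the vote. That is the point of the quote: no single result has to be right as long as the majority is.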