Tuesday, February 16, 2010

Review: Anatomy of a Large-Scale Social Search Engine By Damon, Lion Tamer | Published: February 2, 2010

Recently, I read a paper about Aardvark, a social search engine that Google just bought. The paradigm shift is this: users ask questions, and instead of searching documents for the answers, the engine searches the user's extended network (friends and friends-of-friends) for the people most likely to answer the question, then poses the question to them. Aardvark imitates the age-old human habit of asking friends who we think know something about an activity for their opinion before we do it.

The primary steps in Aardvark are:
- Labelling and classifying users: determining which users like, and know about, which topics
- Query processing: understanding what the user wants
- Ranking function: selecting the best resources (people) to provide the information

Main ideas:
When a query arrives, the system estimates how likely each user is to answer it. The probability that user $u_i$ answers query $q$ is the sum, over all topics $t$, of $P(u_i \mid t)\,P(t \mid q)$:

$$P(u_i \mid q) = \sum_{t} P(u_i \mid t)\, P(t \mid q)$$
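The topic-mixture formula is just a weighted sum, which a few lines of Python make concrete. This is a minimal sketch, assuming both distributions are stored as plain dicts keyed by topic name; the function name and the numbers are hypothetical.

```python
def p_user_given_query(p_user_given_topic, p_topic_given_query):
    """Return P(u_i | q) = sum over t of P(u_i | t) * P(t | q).

    p_user_given_topic:  dict topic -> P(u_i | t) for one user u_i
    p_topic_given_query: dict topic -> P(t | q) for the current query
    Topics missing from the user's table contribute zero.
    """
    return sum(p_t_q * p_user_given_topic.get(t, 0.0)
               for t, p_t_q in p_topic_given_query.items())


# Hypothetical numbers for one user and one query.
p_u_t = {"cooking": 0.02, "travel": 0.005}
p_t_q = {"cooking": 0.7, "travel": 0.3}
print(p_user_given_query(p_u_t, p_t_q))
```

Here the result is approximately 0.0155 (0.7 * 0.02 + 0.3 * 0.005): a user counts as a likely answerer when their expertise sits in the topics the query is about.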

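The second metric is the scoring function, which multiplies the connectedness of candidate $u_i$ to the asker $u_j$ by the candidate's probability of answering the query:

$$s(u_i, u_j, q) = P(u_i \mid u_j)\, P(u_i \mid q)$$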

When a query arrives, the only quantity computed on the fly for the score $s(u_i, u_j, q)$ is $P(t \mid q)$; everything else is precomputed and stored.
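The split between precomputed and per-query work can be illustrated by ranking candidates with two stored tables and a single per-query input. This is a sketch under assumptions: the table layout, the function name `rank_candidates`, the rule of skipping the asker, and all numbers are hypothetical, not Aardvark's actual data structures.

```python
from typing import Dict, List, Tuple

Table = Dict[str, Dict[str, float]]


def rank_candidates(asker: str,
                    p_topic_given_query: Dict[str, float],
                    expertise: Table,
                    connectedness: Table) -> List[Tuple[str, float]]:
    """Rank candidate answerers u_i for a question q asked by `asker` (u_j).

    expertise[u][t]:         precomputed P(u | t)
    connectedness[asker][u]: precomputed P(u | asker)
    p_topic_given_query:     P(t | q), the only per-query input
    """
    ties = connectedness.get(asker, {})
    ranked = []
    for user, p_user_given_topic in expertise.items():
        if user == asker:
            continue  # never route a question back to the person who asked it
        p_user_given_query = sum(p_t_q * p_user_given_topic.get(t, 0.0)
                                 for t, p_t_q in p_topic_given_query.items())
        # s(u_i, u_j, q) = P(u_i | u_j) * P(u_i | q)
        ranked.append((user, ties.get(user, 0.0) * p_user_given_query))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical precomputed tables.
expertise = {"alice": {"cooking": 0.03},
             "bob": {"cooking": 0.01, "travel": 0.04},
             "carol": {"travel": 0.02}}
connectedness = {"dave": {"alice": 0.2, "bob": 0.5, "carol": 0.3}}
ranking = rank_candidates("dave", {"cooking": 0.9, "travel": 0.1},
                          expertise, connectedness)
print([user for user, _ in ranking])  # prints ['bob', 'alice', 'carol']
```

In this toy example alice knows the most about cooking, but bob ranks first because he is much better connected to dave, which is exactly the trade-off the product in $s(u_i, u_j, q)$ encodes.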

How do we get $P(t \mid u_i)$?
- An SVM classifier identifies the general subject area of the text.
- Named-entity resolution and tf-idf identify more specific topics.
- A topic resolution algorithm runs periodically to revise the topics.
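To show how per-user topic estimates can feed the ranking formula, here is a deliberately simplified stand-in. It is not the SVM and named-entity pipeline described above: it only uses tf-idf weights over each user's text plus a hypothetical hand-written topic-to-keyword map, normalizes them into $P(t \mid u_i)$, and then applies Bayes' rule under an assumed uniform prior $P(u_i)$ to get $P(u_i \mid t)$. All names and data are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_weights(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Per-user tf-idf weights over that user's tokens (smoothed idf)."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: users mentioning each term
    weights = {}
    for user, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens) or 1
        weights[user] = {
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        }
    return weights


def topic_distribution(weights: Dict[str, float],
                       topic_keywords: Dict[str, List[str]]) -> Dict[str, float]:
    """P(t | u_i): sum keyword weights per topic, then normalize to 1."""
    raw = {t: sum(weights.get(k, 0.0) for k in kws)
           for t, kws in topic_keywords.items()}
    total = sum(raw.values())
    if total == 0.0:
        return {t: 0.0 for t in raw}  # no evidence for any topic
    return {t: v / total for t, v in raw.items()}


def users_given_topic(p_topic_given_user: Dict[str, Dict[str, float]],
                      topic: str) -> Dict[str, float]:
    """P(u_i | t) by Bayes' rule, assuming a uniform prior P(u_i)."""
    z = sum(dist.get(topic, 0.0) for dist in p_topic_given_user.values())
    return {u: (dist.get(topic, 0.0) / z if z else 0.0)
            for u, dist in p_topic_given_user.items()}


# Hypothetical user text and topic keywords.
docs = {
    "alice": "i love baking bread and cooking pasta".split(),
    "bob": "hiking trips and travel in japan".split(),
}
topics = {"cooking": ["baking", "bread", "cooking", "pasta"],
          "travel": ["hiking", "japan", "travel", "trips"]}
p_t_u = {u: topic_distribution(w, topics) for u, w in tfidf_weights(docs).items()}
print(p_t_u["alice"])                      # {'cooking': 1.0, 'travel': 0.0}
print(users_given_topic(p_t_u, "travel"))  # {'alice': 0.0, 'bob': 1.0}
```

A real system would replace the keyword map with trained classifiers and would not assume a uniform prior over users; the sketch only shows how the pieces connect.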

Connectedness, $P(u_i \mid u_j)$, can be measured by:
- The number of links between the two users
- Profile similarity
- Demographic similarity
- Politeness match
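One way to turn these four signals into a connectedness distribution is to score each feature in [0, 1], take a weighted sum, and normalize over all candidates. This is a sketch under stated assumptions: the `Profile` fields, the feature formulas (shared friends as a stand-in for the number of links, Jaccard overlap of interests, same location and similar age, closeness of a politeness score) and the `WEIGHTS` values are all hypothetical and are not the paper's model.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Profile:
    friends: Set[str] = field(default_factory=set)
    interests: Set[str] = field(default_factory=set)
    location: str = ""
    age: int = 0
    politeness: float = 0.5  # hypothetical score in [0, 1]


# Hypothetical feature weights, chosen for illustration only.
WEIGHTS = {"links": 0.4, "profile": 0.3, "demographic": 0.2, "politeness": 0.1}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def affinity(asker: Profile, cand: Profile) -> float:
    """Unnormalized connectedness of a candidate to the asker."""
    shared = len(asker.friends & cand.friends)
    features = {
        "links": shared / (1 + shared),  # more shared friends -> closer to 1
        "profile": jaccard(asker.interests, cand.interests),
        "demographic": 0.5 * (asker.location == cand.location)
                       + 0.5 * (abs(asker.age - cand.age) <= 5),
        "politeness": 1.0 - abs(asker.politeness - cand.politeness),
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())


def connectedness(asker_id: str, profiles: Dict[str, Profile]) -> Dict[str, float]:
    """P(u_i | u_j): affinities to every other user, normalized to sum to 1."""
    asker = profiles[asker_id]
    raw = {uid: affinity(asker, p) for uid, p in profiles.items() if uid != asker_id}
    total = sum(raw.values())
    return {uid: (v / total if total else 0.0) for uid, v in raw.items()}


profiles = {
    "dave": Profile({"alice", "bob", "eve"}, {"cooking", "jazz"}, "SF", 30, 0.8),
    "alice": Profile({"dave", "bob", "eve"}, {"cooking", "jazz"}, "SF", 32, 0.7),
    "carol": Profile({"zoe"}, {"golf"}, "NYC", 55, 0.2),
}
p = connectedness("dave", profiles)
print(p["alice"] > p["carol"])  # True
```

Normalizing over candidates makes the output usable as the $P(u_i \mid u_j)$ factor in the scoring function; in practice the weights would be learned rather than fixed by hand.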
