Date |
June 30, 2008 |
Speaker |
Dr. Vo Ngoc Anh, The University of Melbourne, Australia |
Title |
Impact-Based Document Ranking
|
Abstract |
Given a large collection of text documents and a natural
language query q, the principal task of document ranking is to identify
the documents most likely to answer the query by listing them in
decreasing order of their estimated relevance to q. Ranking
mechanisms are used in many practical systems, such
as Web search engines and search services over large repositories of
scientific articles.
A number of document ranking models have been developed. Amongst them,
the vector space model offers an efficient and effective way to do the
task, although in the last few years other mechanisms such as BM25 and
language modelling seem to perform better in terms of retrieval
effectiveness.
In this talk we describe impact-based retrieval, an approach to
document ranking that combines a simple document-centric view of text
with fast evaluation strategies that have been developed in connection
with the vector space model. The new method defines the importance of a
term within a document qualitatively rather than quantitatively, and in
doing so eliminates the need for tuning parameters. In addition, the
method supports very fast query processing, with most of the computation
carried out on small integers and with dynamic pruning available as an
effective option.
Experiments on a wide range of TREC data show that the new method is
highly competitive in terms of both retrieval effectiveness and efficiency.
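
The idea of scoring with small integer impacts can be illustrated with a minimal sketch. The abstract does not specify how impacts are assigned, so everything below is an assumption made for illustration only: the function names (document_impacts, build_index, rank), the choice of 8 impact levels, the equal-width bucketing of terms by their frequency rank within a document, and the alphabetical tie-break. It is not the speaker's implementation, and dynamic pruning is not shown.

```python
from collections import Counter, defaultdict

K = 8  # number of integer impact levels (hypothetical choice)


def document_impacts(text, levels=K):
    """Give each distinct term an integer impact in 1..levels.

    The impact depends only on the term's frequency rank inside this
    document (an assumed, simplified document-centric rule).
    """
    counts = Counter(text.lower().split())
    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    n = len(ranked)
    impacts = {}
    for position, term in enumerate(ranked):
        # Equal-width buckets over ranks: the top rank gets `levels`,
        # lower ranks get progressively smaller integers, never below 1.
        impacts[term] = levels - (position * levels) // n
    return impacts


def build_index(docs, levels=K):
    """Build an inverted index: term -> list of (doc_id, impact)."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term, impact in document_impacts(text, levels).items():
            index[term].append((doc_id, impact))
    return index


def rank(index, query):
    """Score documents by summing integer impacts of the query terms."""
    scores = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id, impact in index.get(term, []):
            scores[doc_id] += impact  # integer addition only
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    "d1": "ranking ranking ranking documents query",
    "d2": "query processing with small integers",
}
index = build_index(docs)
print(rank(index, "ranking query"))
```

In this sketch no collection-wide statistics or tunable weighting constants enter the score, and query evaluation reduces to adding small integers taken from the index.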
|
|