This is an implementation of the
positional language model for ad hoc information retrieval. Please refer to the
following paper for more details of the algorithm:
[1] Yuanhua Lv and ChengXiang Zhai. "Positional
Language Models for Information Retrieval". In Proceedings of the 32nd
Annual International ACM SIGIR Conference on Research and Development in
Information Retrieval (SIGIR'09), pages 299-306, 2009.
The main source code file, "PLMRetEval.cpp", is
implemented in C++ and works with the
Lemur toolkit (the Indri search engine is not currently supported). Most of the
code was used in the experiments for our SIGIR'09 paper [1]. The algorithm has
only been tested on Lemur 4.10 in a Linux environment (it may also work with
other versions, but we have not tested them), with an index of
type "key" built using the BuildIndex application provided by Lemur.
The current version of the algorithm can only re-rank result documents
retrieved by another retrieval model, e.g., a language model with Dirichlet
prior smoothing (the default). Note that we did not change any internal
implementation of Lemur. Because this is an experimental system, we have not
yet put much effort into improving efficiency, which could be done easily by
using an index with term position information.
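The re-ranking step can be pictured with the following minimal sketch. It is
not taken from PLMRetEval.cpp: the names ScoredDoc, rerankTop and plmScore are
hypothetical, and the linear interpolation between the PLM score and the
original score is an assumption about how the two scores might be combined.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One entry of a ranked list: document id and retrieval score.
struct ScoredDoc {
    std::string docId;
    double score;
};

// Re-score the top `rerankCount` entries of `initial` (assumed to be sorted
// by descending score) and reorder only that prefix.
// `plmScore` is a hypothetical callback returning the PLM score of a document.
// Assumed combination: final = coeff * plm + (1 - coeff) * original.
std::vector<ScoredDoc> rerankTop(
        std::vector<ScoredDoc> initial,
        std::size_t rerankCount,
        double coeff,
        const std::function<double(const std::string&)>& plmScore) {
    const std::size_t n = std::min(rerankCount, initial.size());
    for (std::size_t i = 0; i < n; ++i) {
        initial[i].score = coeff * plmScore(initial[i].docId)
                         + (1.0 - coeff) * initial[i].score;
    }
    // Documents below the cutoff keep their original order and scores.
    std::stable_sort(initial.begin(),
                     initial.begin() + static_cast<std::ptrdiff_t>(n),
                     [](const ScoredDoc& a, const ScoredDoc& b) {
                         return a.score > b.score;
                     });
    return initial;
}
```

With coeff set to 1.0, the re-scored prefix is ordered by the PLM score alone;
smaller values keep part of the initial ranking's evidence.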
The PLMRetEval.param file provides some recommended parameter settings. The
PLM-specific parameters include:
<!-- Number of documents to be ranked using PLM -->
<RerankedDocCount>2000</RerankedDocCount>
<!-- Size of the "soft" passage (sigma) -->
<PLMSigma>175</PLMSigma>
<!-- Propagation function: -1: Passage; 0: Gaussian; 1: Cosine; 2: Triangle;
3: Arc; 4: Circle -->
<PropFunction>0</PropFunction>
<!-- Smoothing method: 0: Jelinek-Mercer smoothing; 1: Dirichlet prior smoothing -->
<PLMSmoothMethod>1</PLMSmoothMethod>
<PLMJMLambda>0.5</PLMJMLambda>
<PLMDirPrior>500</PLMDirPrior>
<!-- The weight of PLM if we interpolate PLM with the original relevance score
-->
<PLMCoefficient>1.0</PLMCoefficient>
<!-- 1: do not use PLM for single-term query; 0: otherwise -->
<IgnoreSingleTermQuery>0</IgnoreSingleTermQuery>
Other parameters in the PLMRetEval.param file are
standard Lemur parameters. For example, you can also run pseudo-relevance
feedback after re-ranking documents using PLM.
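To make the PropFunction, PLMSigma, PLMJMLambda and PLMDirPrior settings more
concrete, here is a minimal sketch of the propagation kernels and the two
smoothing formulas as they are defined in [1]. The function and type names are
hypothetical, and we have not checked that PLMRetEval.cpp computes exactly
these forms. The "Arc" option is left out because its formula is not given
here.

```cpp
#include <cmath>
#include <cstdlib>

// Enumerator values mirror the PropFunction setting (Arc = 3 is omitted).
enum class PropFunction {
    Passage = -1, Gaussian = 0, Cosine = 1, Triangle = 2, Circle = 4
};

// Weight propagated from a term occurrence at position j to position i.
// sigma plays the role of PLMSigma and must be positive.
double propagate(PropFunction f, int i, int j, double sigma) {
    const double d = std::abs(i - j);
    if (f == PropFunction::Gaussian) {
        return std::exp(-(d * d) / (2.0 * sigma * sigma));
    }
    if (d > sigma) {
        return 0.0;  // the remaining kernels are zero beyond distance sigma
    }
    const double pi = std::acos(-1.0);
    switch (f) {
        case PropFunction::Passage:
            return 1.0;
        case PropFunction::Triangle:
            return 1.0 - d / sigma;
        case PropFunction::Cosine:
            return 0.5 * (1.0 + std::cos(d * pi / sigma));
        case PropFunction::Circle:
            return std::sqrt(1.0 - (d / sigma) * (d / sigma));
        default:
            return 0.0;
    }
}

// propagatedCount: propagated count of the word at a position, c'(w, i).
// positionLength:  sum of propagated counts of all words at that position.
// collectionProb:  collection language model probability p(w | C).

// Dirichlet prior smoothing; mu plays the role of PLMDirPrior.
double dirichletSmooth(double propagatedCount, double positionLength,
                       double collectionProb, double mu) {
    return (propagatedCount + mu * collectionProb) / (positionLength + mu);
}

// Jelinek-Mercer smoothing; lambda plays the role of PLMJMLambda.
double jelinekMercerSmooth(double propagatedCount, double positionLength,
                           double collectionProb, double lambda) {
    const double ml =
        positionLength > 0.0 ? propagatedCount / positionLength : 0.0;
    return (1.0 - lambda) * ml + lambda * collectionProb;
}
```

For example, propagate(PropFunction::Triangle, 0, 50, 100.0) gives a weight of
0.5, since the occurrence is halfway to the edge of the kernel.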
We also support the standard
Lemur query format, as shown below:
<DOC 301>
intern
organ
crime
</DOC>
<DOC 302>
poliomyel
post
polio
</DOC>
<DOC 303>
hubbl
telescop
achiev
</DOC>
where 301-303 are the query topic ids.
(Note that stemming and stopword removal have already been applied to the
query topics above.)
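For readers who want to generate or check query files in this format, the
following is a minimal sketch of a reader for it. It is not part of Lemur or of
PLMRetEval.cpp; the names Query and parseQueries are hypothetical, and the
sketch assumes that the id directly follows "<DOC" and that terms are
separated by whitespace.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One query topic: its id and its (already stemmed) terms.
struct Query {
    std::string id;
    std::vector<std::string> terms;
};

// Read all <DOC id> ... </DOC> blocks from a stream.
std::vector<Query> parseQueries(std::istream& in) {
    std::vector<Query> queries;
    Query current;
    bool inside = false;
    std::string token;
    while (in >> token) {
        if (token == "<DOC") {
            // The next token is the id followed by '>', e.g. "301>".
            if (in >> token) {
                if (!token.empty() && token.back() == '>') {
                    token.pop_back();
                }
                current = Query{token, {}};
                inside = true;
            }
        } else if (token == "</DOC>") {
            if (inside) {
                queries.push_back(current);
            }
            inside = false;
        } else if (inside) {
            current.terms.push_back(token);
        }
    }
    return queries;
}
```

Applied to the three topics above, this yields three Query objects with ids
"301", "302" and "303", each holding three terms.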
To run our algorithm, you first need to install the
Lemur toolkit. See the wiki page "Compiling and Installing on Linux
and Mac OS X" under http://sourceforge.net/apps/trac/lemur/wiki/ for
details on compiling and installing the Lemur toolkit on Linux and/or
Mac OS X. After that, change the "prefix" value in the Makefile to your
Lemur installation path.
Finally, you can compile our algorithm
and run it like this:
    PLMRetEval PLMRetEval.param
If you have further questions, please email me (Yuanhua
Lv, ylv2@uiuc.edu).