Positional Language Models (PLM)

(Source code download)

This is an implementation of the positional language model for ad hoc information retrieval. Please refer to the following paper for more details of the algorithm:

[1] Yuanhua Lv and ChengXiang Zhai. "Positional Language Models for Information Retrieval". In Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'09), pages 299-306, 2009.

The main source file, "PLMRetEval.cpp", is written in C++ and works with the Lemur toolkit (the Indri search engine is not currently supported). Most of the code was used in the experiments for our SIGIR'09 paper [1]. The algorithm has only been tested with Lemur 4.10 in a Linux environment, using an index of type "key" built with the BuildIndex application provided by Lemur. It will probably work with other Lemur versions, but we have not tested them.

The current version of the algorithm can only re-rank documents that were retrieved by another retrieval model, e.g., language models with Dirichlet prior smoothing (the default). Note that we did not change any internal implementation of Lemur. Because this is an experimental system, we have not yet put much effort into efficiency, which could be improved easily by using an index with term position information.
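The re-ranking idea can be sketched as follows. This is only an illustration and not the code in PLMRetEval.cpp: the names ScoredDoc, rerankTop and interpolate are hypothetical, the scoring callback stands in for a PLM scoring function, and the linear interpolation formula is an assumption about how the PLM weight parameter listed later in this document might combine the two scores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record for one retrieved document.
struct ScoredDoc {
    std::string id;
    double score;
};

// Assumed linear interpolation between a PLM score and the original
// relevance score; w is the weight given to PLM (0 <= w <= 1).
double interpolate(double plmScore, double origScore, double w) {
    assert(w >= 0.0 && w <= 1.0);
    return w * plmScore + (1.0 - w) * origScore;
}

// Re-score only the first topN entries of an initial ranking (assumed to be
// sorted by descending score) and re-sort that prefix. Documents below the
// cut-off keep their original order and scores.
// newScore is a stand-in for a PLM-based scoring function.
std::vector<ScoredDoc> rerankTop(
        std::vector<ScoredDoc> ranking,
        std::size_t topN,
        const std::function<double(const ScoredDoc&)>& newScore) {
    const std::size_t n = std::min(topN, ranking.size());
    for (std::size_t i = 0; i < n; ++i) {
        ranking[i].score = newScore(ranking[i]);
    }
    std::stable_sort(ranking.begin(), ranking.begin() + n,
                     [](const ScoredDoc& a, const ScoredDoc& b) {
                         return a.score > b.score;
                     });
    return ranking;
}
```

Only the re-ranked prefix is sorted, so the new scores there are not directly comparable with the untouched scores in the tail.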

The PLMRetEval.param file provides some recommended parameter settings. The PLM-specific parameters include:

<!-- Number of documents to be ranked using PLM -->

<!-- Size of the "soft" passage (sigma) -->

<!-- Propagation function: -1: Passage; 0: Gaussian; 1: Cosine; 2: Triangle; 3: Arc; 4: Circle -->

<!-- Smoothing method: 0: Jelinek-Mercer smoothing; 1: Dirichlet prior smoothing -->

<!-- The weight of PLM if we interpolate PLM with the original relevance score -->

<!-- 1: do not use PLM for single-term query; 0: otherwise -->
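To make the propagation and smoothing options above more concrete, here is a minimal sketch of the kernel functions and the two smoothing formulas as we understand them from [1]. It is not the code in PLMRetEval.cpp. The enum values mirror the codes in the propagation-function comment, but all function and variable names are hypothetical, and the Arc kernel (code 3) is left out because its formula is not given in this document.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Kernel codes as listed in the "Propagation function" comment.
// Arc (3) is intentionally omitted from this sketch.
enum Kernel { PASSAGE = -1, GAUSSIAN = 0, COSINE = 1, TRIANGLE = 2, CIRCLE = 4 };

// Weight k(i, j) that a term occurrence at position j propagates to
// position i, with sigma controlling the size of the "soft" passage.
double propagate(Kernel k, int i, int j, double sigma) {
    assert(sigma > 0.0);
    const double pi = std::acos(-1.0);
    const double d = std::abs(i - j);
    if (k == GAUSSIAN) {
        return std::exp(-(d * d) / (2.0 * sigma * sigma));
    }
    if (d > sigma) {
        return 0.0;  // all other kernels are zero beyond distance sigma
    }
    const double r = d / sigma;
    switch (k) {
        case PASSAGE:  return 1.0;
        case COSINE:   return 0.5 * (1.0 + std::cos(r * pi));
        case TRIANGLE: return 1.0 - r;
        case CIRCLE:   return std::sqrt(1.0 - r * r);
        default:       return 0.0;
    }
}

// Propagated count c'(w, i) = sum over j of k(i, j), where positions holds
// every position j at which term w occurs in the document.
double propagatedCount(const std::vector<int>& positions, int i,
                       Kernel k, double sigma) {
    double total = 0.0;
    for (int j : positions) {
        total += propagate(k, i, j, sigma);
    }
    return total;
}

// Dirichlet prior smoothing: (c'(w,i) + mu * p(w|C)) / (Z_i + mu),
// where Z_i is the sum of propagated counts of all terms at position i.
double dirichletSmooth(double cwi, double zi, double pwc, double mu) {
    return (cwi + mu * pwc) / (zi + mu);
}

// Jelinek-Mercer smoothing: (1 - lambda) * c'(w,i) / Z_i + lambda * p(w|C).
double jelinekMercerSmooth(double cwi, double zi, double pwc, double lambda) {
    const double ml = (zi > 0.0) ? cwi / zi : 0.0;
    return (1.0 - lambda) * ml + lambda * pwc;
}
```

For example, with the Triangle kernel and sigma = 4, a term occurring at positions 3 and 7 contributes 0.5 + 0.5 = 1.0 to the propagated count at position 5.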

The other parameters in the PLMRetEval.param file are standard Lemur parameters. For example, you can also run pseudo-relevance feedback after re-ranking documents with PLM.

We also support the standard Lemur query format, as shown below:

<DOC 301>
<DOC 302>
<DOC 303>

where 301-303 are query topic ids. (Please note that stemming and stopword removal have already been applied to the above query topics.)

To run our algorithm, you first need to install the Lemur toolkit. See http://sourceforge.net/apps/trac/lemur/wiki/Compiling and Installing on Linux and Mac OS X for details on compiling and installing the Lemur toolkit on Linux and/or Mac OS X. After that, change the "prefix" value in the Makefile to your Lemur installation path.

Finally, you can compile our algorithm and run it like this: PLMRetEval PLMRetEval.param

If you have further questions, please email me (Yuanhua Lv, ylv2@uiuc.edu).