Language Model Ranking

Written by: Zhaozhen Liang(zhaozhen)

Goal

The goal of this project is to build a mini search engine over a folder of movie reviews (2000 files/documents) using a language model.

"Instead of overtly modeling the probability P(R=1|q,d) of relevance of a document d to query q, as in the traditional probabilistic approach to IR, the basic language modeling approach instead builds a probabilistic language model Md from each document d, and ranks documents based on the probability of the model generating the query: P(q|Md)." [p. 237, Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze, © 2008 Cambridge University Press.]

Intuition
Good queries contain words that are likely to appear in a relevant document.
Key Idea
The language modeling approach to IR directly models that idea: a document is a good match for a query if the document model is likely to generate the query, which in turn happens when the document contains the query words often. The basic language modeling approach builds a probabilistic language model Md from each document d, and ranks documents by the probability of the model generating the query: P(q|Md).
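To make the key idea concrete, here is a minimal sketch of query-likelihood scoring. It assumes a unigram model with unsmoothed maximum-likelihood estimates, and the function name `query_likelihood` and the toy documents are hypothetical; they are not taken from this project's code.

```python
from collections import Counter


def query_likelihood(query_terms, doc_terms):
    """Return P(q | Md) under an unsmoothed unigram model of the document."""
    counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    if doc_len == 0:
        return 0.0
    prob = 1.0
    for term in query_terms:
        # Maximum-likelihood estimate: term frequency / document length.
        prob *= counts[term] / doc_len
    return prob


# Two toy "documents", already tokenized (hypothetical data).
docs = {
    "d1": "a great movie with a great cast".split(),
    "d2": "a dull movie".split(),
}
query = "great movie".split()

scores = {name: query_likelihood(query, terms) for name, terms in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Note that a single query word missing from a document drives its score to zero here, which is why practical language model rankers usually smooth the document model with collection statistics.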

Reference
Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.

Libraries

All required libraries are listed in requirements.txt.
Please run the following commands to install all the libraries that are needed.

First, make the bash script executable:

> chmod +x download.sh

Then run the script:

> ./download.sh

Parts

There are two parts in this project.
The first part is create_index, which takes an input source directory (containing a set of files/documents) and collects the statistics needed for the later ranking computations.
The second part is lm_query (language model query), which uses the index statistics (the language model) collected in part 1 (create_index) to perform the language model ranking.
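The two-part split can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the function names `build_index` and `rank`, the in-memory index layout, and the use of linear interpolation smoothing with weight `lam` are all assumptions, and the real scripts may store different statistics and tokenize the files differently.

```python
import math
from collections import Counter


def build_index(docs):
    """Part 1 (sketch): collect per-document and collection-wide statistics."""
    doc_tf = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}
    doc_len = {doc_id: len(tokens) for doc_id, tokens in docs.items()}
    coll_tf = Counter()
    for tf in doc_tf.values():
        coll_tf.update(tf)
    return {
        "doc_tf": doc_tf,
        "doc_len": doc_len,
        "coll_tf": coll_tf,
        "coll_len": sum(doc_len.values()),
    }


def rank(query_tokens, index, lam=0.5):
    """Part 2 (sketch): rank documents by smoothed log P(q | Md)."""
    scores = {}
    coll_len = index["coll_len"]
    for doc_id, tf in index["doc_tf"].items():
        length = index["doc_len"][doc_id]
        score = 0.0
        for term in query_tokens:
            p_doc = tf[term] / length if length else 0.0
            p_coll = index["coll_tf"][term] / coll_len if coll_len else 0.0
            # Assumed smoothing: mix document and collection estimates.
            p = lam * p_doc + (1 - lam) * p_coll
            if p == 0.0:
                # Term occurs nowhere in the collection, so it cannot
                # distinguish between documents; skip it.
                continue
            score += math.log(p)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, pre-tokenized toy collection.
docs = {
    "d1": ["great", "movie", "great", "cast"],
    "d2": ["dull", "movie"],
    "d3": ["great", "plot"],
}
index = build_index(docs)
results = rank(["great", "movie"], index)
for doc_id, score in results:
    print(doc_id, round(score, 4))
```

Keeping the statistics in a separate index means they are computed once over the whole collection, and each query only needs lookups and a sum of log probabilities.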

