First, it brings the thinking, theory, and practical knowledge of research in related fields to bear on the retrieval problem. A common approach is to generate a maximum-likelihood model for the entire collection and linearly interpolate it with a maximum-likelihood model for each document, which smooths the document model. One advantage of this new approach is its statistical foundations. Language modeling is the third major paradigm that we will cover in information retrieval. Language models for information retrieval: a common suggestion to users for coming up with good queries is to think of words that would likely appear in a relevant document, and to use those words as the query. Dependence language model for information retrieval. With this book, he makes two major contributions to the field of information retrieval. We integrate the proximity factor into the unigram language modeling approach in a more systematic and internal way that is more e. Language Modeling Approach to Information Retrieval, ChengXiang Zhai, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 152. Abstract: The language modeling approach to retrieval has been shown to perform well empirically. However, a distinction should be made between generative models, which can in principle be used to. A language modeling approach to information retrieval, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
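The interpolation of a document model with a collection model can be illustrated with a minimal sketch. The function names, the toy documents, and the mixing weight `lam` are assumptions made for illustration; none of them come from the works quoted here.

```python
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model: relative frequency of each term."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def interpolated_prob(term, doc_model, collection_model, lam=0.5):
    """Linear interpolation of the document and collection models.

    lam is a hypothetical mixing weight, chosen only for illustration.
    """
    return lam * doc_model.get(term, 0.0) + (1.0 - lam) * collection_model.get(term, 0.0)


# Toy collection of two tokenized documents (illustrative data).
docs = [["language", "model", "retrieval"], ["query", "retrieval", "system"]]
collection_model = mle([t for d in docs for t in d])
doc_model = mle(docs[0])

# "language" occurs in the first document; "query" does not.
p_seen = interpolated_prob("language", doc_model, collection_model)
p_unseen = interpolated_prob("query", doc_model, collection_model)
```

Because the collection model contributes to every estimate, a term that is absent from the document but present elsewhere in the collection still receives a nonzero probability.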
We argue that there are two principal contributions of the language modeling approach. In Proceedings of the 21st ACM SIGIR Conference on Research and Development in Information Retrieval, pages 275-281. For advanced models, however, the book only provides a high-level discussion, thus readers will still. Unigram language models are the most widely used for ad hoc information retrieval work. A language modeling approach to TREC, University of. Language Modeling for Information Retrieval, The Information Retrieval Series, 2003 edition, by W. In the language modeling approach to information retrieval, a multinomial model over terms is estimated for each document d in the collection C to be searched. We investigate the effectiveness of three retrieval models that Lemur supports, especially the language modeling approach to information retrieval, combined with. Contributions of language modeling to the theory and. A generative theory of relevance, The Information Retrieval. A study of smoothing methods for language models. The language modeling approach provides a natural and intuitive means of encoding the context associated with a document. The remainder of the paper further details the synthesis of the inference network and language modeling approaches into a single retrieval model, and shows that this model produces results that are more effective than either the language modeling approach or the inference network approach on its own.
A study of smoothing methods for language models applied to. The InfoCrystal is both a visual query language and a tool for visualizing retrieval results. A language modeling approach to information retrieval: guide. A language modeling approach to TREC, University of Twente. References in textual criticism as language modeling.
The importance of a query term. Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM Press 2002, pp. Incorporating context within the language modeling. Then documents are ranked by the probability that a query q = (q_1, ..., q_m) would be observed as a sample from the respective document model, i.e. Four TREC subtasks (ad hoc, entry page, adaptive filtering, and cross-language) are used to illustrate the application of language models to different information retrieval problems. A great deal of recent work has shown that statistical language models not only lead to superior empirical performance, but also facilitate parameter tuning and open up possibilities for modeling non-traditional retrieval problems.
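The ranking rule for a query of m terms can be written out explicitly. The following is a sketch under stated assumptions: the symbol M_d for the document model is introduced here for illustration, and the product form assumes a unigram model in which query terms are generated independently.

```latex
P(q \mid M_d) = \prod_{i=1}^{m} P(q_i \mid M_d),
\qquad
\log P(q \mid M_d) = \sum_{i=1}^{m} \log P(q_i \mid M_d)
```

In practice the log form is the one usually computed, since a product of many small probabilities underflows quickly; both forms produce the same document ranking.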
A combination of multiple information retrieval approaches is proposed for the purpose of book recommendation. Language modeling: an overview (ScienceDirect Topics). Introduction: The study of information retrieval models has a long history. Advances in Information Retrieval: 32nd European Conference on IR Research, ECIR 2010, Milton Keynes, UK, March 28-31, 2010. A proximity language model for information retrieval.
Semantic smoothing for the language modeling approach to information retrieval is significant and effective for improving retrieval performance. John Lafferty. This book contains the first collection of papers addressing recent developments in the design of information retrieval systems using language modeling techniques. Incorporating context within the language modeling approach. While NLP is used implicitly in stemming and in generating stopword lists for IR, its use in identifying phrases in documents and/or queries is of interest. A great diversity of approaches and methodology has been developed, rather than a single uni. The approach extends the basic unigram language modeling approach by relaxing the independence assumption. Cluster-based retrieval using language models: a statistical language model is a probability distribution over all possible sentences or other linguistic units in a language [15]. Lafferty, Information retrieval as statistical translation, in Proceedings of the 1999 ACM SIGIR Conference on Research and Development in Information Retrieval, pages 222-229, 1999. An abductive, linguistic approach to model retrieval. Results are promising for monolingual retrieval applied on. Wikipedia-based semantic smoothing for the language modeling. Introduction: The language modeling approach to text retrieval was first introduced by Ponte and Croft in [11] and later explored in [8, 5, 1, 15]. Language modeling (LM) has become a widely used approach in information retrieval (Ponte and Croft 1998).
In information retrieval contexts, unigram language models are often smoothed to avoid instances where P(term) = 0. A language modeling approach to information retrieval, Jay M. Language modeling is the task of assigning a probability to sentences in a language. Combining the language model and inference network. Ponte and Croft (1998), A language modeling approach to information retrieval; Zhai and Lafferty (2001), A study of smoothing methods for language models applied to ad hoc information retrieval. Contributions of language modeling to the theory and practice. The basic approach for using language models for IR is to model the query generation process [14]. The idea of the language modeling approach to information retrieval is to estimate the language model for a document and then to compute the likelihood that the query would have been generated from the estimated model. In this paper we present the language modeling approach to information retrieval as a toolbox to systematically combine information from different sources. Natural language processing and information retrieval.
We extended this framework to match SMS queries with cross-language FAQs. Language modeling for information retrieval, Bruce Croft. Such a definition is general enough to include an endless variety of schemes. This chapter describes the twentyone language modeling experiments on a variety of TREC tasks. In previous methods such as the translation model, individual terms or phrases are used to do semantic mapping. A probabilistic approach to term translation for cross-lingual. This paper presents a new dependence language modeling approach to information retrieval. In this paper, we propose a method that uses the language modeling approach to match noisy SMS text with the right FAQ. Challenges in information retrieval and language modeling. Language modeling for information retrieval (book, 2003). It surveys a wide range of retrieval models based on language modeling and attempts to make connections between this new family of models and traditional retrieval models. The relative simplicity and effectiveness of the language modeling approach, together with the fact that it leverages statistical methods that have been developed in. Statistical language models for information retrieval: a.
Statistical language modeling, or language modeling (LM) for short, is the development of probabilistic models that can predict the next word in a sequence given the words that precede it. We use the word document as a general term that could also include non-textual information, such as multimedia objects. Contributions of language modeling to the theory and practice of IR. Croft, Relevance models in information retrieval, in Language Modeling for Information Retrieval, W. Given a query q and a document d, we are interested in estimating the. This led to a number of fruitful TREC participations, in which we evaluated the use of a probabilistic modeling approach known as language modeling. Language modeling for information retrieval (ebook, 2003). In general, statistical language models provide a principled way of modeling various kinds of retrieval problems. Introduction to Information Retrieval by Manning, Raghavan, and Schütze is the. There is a growing discrepancy between the retrieval approach used by existing commercial retrieval systems and the approaches investigated and promoted by a large segment of the information retrieval. Information retrieval and graph analysis approaches for book. The major difference between this book and the first edition is the addition of descriptions of the automated indexing of multimedia documents, as items in information retrieval are now considered to be a combination of text along with graphics.
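Next-word prediction can be illustrated with a minimal bigram sketch, which conditions only on the single preceding word rather than on the whole history. That restriction, the function names, and the toy corpus are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count, for each word, which words follow it in the training data."""
    following = defaultdict(Counter)
    for sentence in sentences:
        for prev, nxt in zip(sentence, sentence[1:]):
            following[prev][nxt] += 1
    return following


def next_word_probs(following, prev):
    """Maximum-likelihood P(next | prev); empty dict if prev was never seen."""
    counts = following.get(prev)
    if not counts:
        return {}
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# Toy tokenized corpus (illustrative data).
corpus = [
    ["language", "models", "rank", "documents"],
    ["language", "models", "predict", "words"],
    ["language", "modeling", "for", "retrieval"],
]
model = train_bigram(corpus)
probs = next_word_probs(model, "language")
```

Here `probs` maps each word observed after "language" to its relative frequency. An unsmoothed model like this assigns no probability to unseen continuations, which is the problem that smoothing addresses.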
Language modeling versus other approaches in IR: The language modeling approach provides a novel way of looking at the problem of text retrieval, which links it with much recent work in speech and language processing. The integration of these two classes of models has been the goal of several researchers, but it is a very difficult problem. Statistical language models for information retrieval. NLP techniques in query processing and the language modeling approach to IR. Zhai, C. and Lafferty, J. Model-based feedback in the language modeling approach to information retrieval. Proceedings of the Tenth International Conference on Information and Knowledge Management, 403-410. The language modeling approach to information retrieval by.
Model-based feedback in the language modeling approach. Over the decades, many different types of retrieval models have been proposed and tested. A statistical language model, or more simply a language model, is a probabilistic mechanism for generating text. As another special case of the risk minimization framework, we derive a Kullback-Leibler divergence retrieval model that can exploit feedback documents to improve the estimation of query models.
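Kullback-Leibler divergence ranking with a feedback-updated query model can be sketched as follows. This is an illustrative sketch, not the cited authors' implementation: the mixing weights, the toy documents, the choice of feedback document, and every function name are assumptions.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


def mix(a, b, weight_b):
    """Linear mixture (1 - weight_b) * a + weight_b * b of two term distributions."""
    terms = set(a) | set(b)
    return {t: (1 - weight_b) * a.get(t, 0.0) + weight_b * b.get(t, 0.0) for t in terms}


def neg_kl(query_model, doc_model):
    """-KL(query_model || doc_model); values closer to zero mean a better match.

    Assumes doc_model gives nonzero probability to every query-model term.
    """
    return -sum(p * math.log(p / doc_model[t]) for t, p in query_model.items() if p > 0)


# Toy collection (illustrative data).
docs = {
    "d1": ["language", "model", "smoothing", "retrieval"],
    "d2": ["image", "retrieval", "system", "index"],
}
collection = mle([t for toks in docs.values() for t in toks])
# Smooth each document model with the collection model (weight 0.5 is arbitrary).
doc_models = {name: mix(mle(toks), collection, 0.5) for name, toks in docs.items()}

query_model = mle(["language", "retrieval"])
feedback_model = mle(docs["d1"])  # assumption: d1 was judged relevant
updated_query = mix(query_model, feedback_model, 0.5)

ranking = sorted(docs, key=lambda name: neg_kl(updated_query, doc_models[name]), reverse=True)
```

The updated query model gives weight to terms such as "model" and "smoothing" that the original query never mentioned, which is how feedback documents change the ranking in this sketch.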
In this paper, we present an information retrieval approach that frees the user from knowing the details of the modeling languages used in the repository and helps the user retrieve models from other domains that are structurally similar to the one being built. Sanda Harabagiu is an assistant professor at Southern Methodist University. This work is first related to the area of document retrieval models, more specifically language models and probabilistic models. Model-based feedback in the language modeling approach to. Natural Language Processing and Information Retrieval is a textbook designed to meet the requirements of engineering students pursuing undergraduate and postgraduate programs in computer science and information technology. Abstract: Models of document indexing and document retrieval have been extensively studied. Recent work has begun to develop more sophisticated models and a sys. Wikipedia-based semantic smoothing for the language modeling approach to information retrieval.
Combining the language model and inference network approaches. Risk minimization and language modeling in text retrieval. A survey by Greengrass [5] on information retrieval includes a comprehensive section on NLP techniques used in IR. However, the language modeling approach also represents a change to the way probability theory is applied in ad hoc information retrieval and makes. Feedback has so far been dealt with heuristically in the language modeling approach to. In this paper, book recommendation is based on complex user queries. Exploiting syntactic structure of queries in a language. This paper presents a novel statistical model for cross. At the time of application, statistical language modeling had been used successfully by the speech recognition community, and Ponte and Croft recognized the value. Language modeling approach to retrieval for SMS and FAQ.
This paper presents an analysis of what language modeling (LM) is in the context of information retrieval (IR). In particular they disagree with Spärck Jones et al. We integrate the linkage of a query as a hidden variable, which expresses the term dependencies within the query as an acyclic, planar, undirected graph. Probabilistic relevance models based on document and query generation. Those areas are retrieval models, cross-lingual retrieval, web search, user modeling, filtering, topic detection and tracking, classification, summarization, question answering, metasearch, distributed retrieval, multimedia retrieval, information. Hiemstra, Term-specific smoothing for the language modeling approach to information retrieval. Gentle introduction to statistical language modeling and. Language Modeling Approach to Information Retrieval, ChengXiang Zhai, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 152, John Lafferty, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 152. Abstract: The language modeling approach to retrieval has been shown to perform well empirically.