Essay on Information Retrieval

CHAPTER 2
IR AND CLIR METHODOLOGIES

Information retrieval (IR) has become a mature technology for finding relevant information in different sources, not only in general information seeking but also in specialized areas. In this research work, the search for information is limited to the information available on the Web. This chapter begins with retrieval models and the techniques used to improve retrieval; it then reviews multilingual information retrieval approaches; and finally, it discusses information retrieval methods applied to Telugu.

2.1. Information retrieval

The term “information retrieval” was coined by Mooers [6]. After many early studies, such as [7, 8, 9], IR came of age in the mid-1990s. In this section, IR refers to information retrieval in which queries and documents are expressed in the same language.

2.1.1 The definition

In [10], “information retrieval” is defined as the technology of finding material of an unstructured nature (text) that satisfies an information need from large collections of information available in different sources. The general workflow of information retrieval is illustrated in Figure 2.1 and can be divided into three parts: the first focuses on techniques for preparing information for retrieval; the second presents the algorithms used to analyze user queries and then improve them; and the third describes the retrieval engine itself. The first step is to collect information from multiple sources, such as online documents, databases, etc. Before the information is indexed, several pre-processing steps are necessary. Generally, terms whose frequency is too high or too low are removed at this stage, because they are sparse......
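The pre-processing and indexing steps described above can be sketched in a few lines. Because the source passage is truncated, this is only one plausible reading of it: terms that appear in too many or too few documents are dropped before an inverted index is built. The function names, the whitespace tokenizer, the thresholds `min_df` and `max_df_ratio`, and the toy collection are all assumptions made for illustration, not part of the original text.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def prune_vocabulary(docs, min_df=2, max_df_ratio=0.8):
    """Keep terms whose document frequency is neither too low nor too high.

    min_df and max_df_ratio are hypothetical thresholds chosen for this sketch.
    """
    df = Counter()
    for text in docs.values():
        # Count each term at most once per document.
        df.update(set(tokenize(text)))
    n_docs = len(docs)
    return {t for t, n in df.items() if n >= min_df and n / n_docs <= max_df_ratio}


def build_index(docs, vocabulary):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            if term in vocabulary:
                index[term].add(doc_id)
    return dict(index)


# Toy collection, invented for the example.
docs = {
    "d1": "the web search engine",
    "d2": "the telugu web corpus",
    "d3": "the telugu search query",
    "d4": "the index",
}

vocabulary = prune_vocabulary(docs)
index = build_index(docs, vocabulary)
```

In this toy collection, "the" occurs in every document and is dropped as too frequent, while terms such as "engine" or "corpus" occur in a single document and are dropped as too rare; only "web", "search" and "telugu" reach the index.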
......assignment of term weights, on which the measurement of query-document relevance depends; iii) the retrieved documents are generally presented in arbitrary order, that is to say without ranking, because the Boolean model does not provide an estimate of query-document relevance; iv) the size of the subset of documents to be returned is difficult to control; and v) it is difficult, if not impossible, to find a satisfactory middle ground between AND and OR. Salton [Sal86] proposed a compromise: using a query formulation that is neither too broad nor too narrow. Several studies [Sal82, SFW83] have extended the basic Boolean model to add term weighting and result ranking.

2.1.3.2 The Vector Space Model

The Vector Space Model (VSM) [Sal71] uses a ranking algorithm that attempts to rank documents based on the overlap between query terms and document terms [Boo82].
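The contrast between the two models can be made concrete with a minimal sketch. The text says only that the VSM ranks by the overlap between query terms and document terms; using raw term counts as vector components and cosine similarity as the overlap measure is an assumption made here, as are the function names and the toy documents. The Boolean function returns an unordered set, while the VSM function returns a scored, ordered list.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def boolean_and(query, docs):
    """Boolean AND: the unordered set of documents containing every query term."""
    terms = set(tokenize(query))
    return {doc_id for doc_id, text in docs.items() if terms <= set(tokenize(text))}


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vsm_rank(query, docs):
    """Return (doc_id, score) pairs sorted by decreasing similarity to the query."""
    q_vec = Counter(tokenize(query))
    scored = [
        (doc_id, cosine(q_vec, Counter(tokenize(text))))
        for doc_id, text in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy collection, invented for the example.
docs = {
    "d1": "telugu information retrieval",
    "d2": "information retrieval on the web",
    "d3": "cooking recipes",
}

query = "telugu retrieval"
matched = boolean_and(query, docs)  # only d1 contains both query terms
ranking = vsm_rank(query, docs)     # d1 first, then d2 (partial overlap), then d3
```

With the query "telugu retrieval", the Boolean AND matches only d1 and says nothing about d2, which shares one of the two terms. The vector model still assigns d2 a non-zero score and places it between d1 and the unrelated d3, which is the kind of graded ranking the Boolean model lacks.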