Related papers: Approximate textual retrieval

200 papers

We present a new efficient method for approximate search in electronic lexica. Given an input string (the pattern) and a similarity threshold, the algorithm retrieves all entries of the lexicon that are sufficiently similar to the pattern.…

Computation and Language · Computer Science 2015-12-04 Stefan Gerdjikov , Stoyan Mihov , Petar Mitankin , Klaus U. Schulz

Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic…

Computation and Language · Computer Science 2024-02-01 Parth Sarthi , Salman Abdullah , Aditi Tuli , Shubh Khanna , Anna Goldie , Christopher D. Manning

In this paper, we present the concept of Approximate grammar and how it can be used to extract information from a document. As the structure of informational strings cannot be defined well in a document, we cannot use the conventional…

Computation and Language · Computer Science 2007-05-23 V. Sriram , B. Ravi Sekar Reddy , R. Sangal

Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For…

Information Retrieval · Computer Science 2020-09-09 Alexander B. Veretennikov

Automatic segmentation of text into minimal content-bearing units is an unsolved problem even for languages like English. Spaces between words offer an easy first approximation, but this approximation is not good enough for machine…

cmp-lg · Computer Science 2008-02-03 I. Dan Melamed

The approximate string matching is a fundamental and recurrent problem that arises in most computer science fields. This problem can be defined as follows: Let $D=\{x_1,x_2,\ldots x_d\}$ be a set of $d$ words defined on an alphabet…

Data Structures and Algorithms · Computer Science 2017-01-31 Ibrahim Chegrane

Information retrieval is an important application area of natural-language processing where one encounters the genuine challenge of processing large quantities of unrestricted natural-language text. This paper reports on the application of…

cmp-lg · Computer Science 2008-02-03 David A. Evans , Chengxiang Zhai

Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. Recently proposed methods for…

Computation and Language · Computer Science 2018-01-22 Goran Glavaš , Marc Franco-Salvador , Simone Paolo Ponzetto , Paolo Rosso

Recent retrieval-augmented models enhance basic methods by building a hierarchical structure over retrieved text chunks through recursive embedding, clustering, and summarization. The most relevant information is then retrieved from both…

Computation and Language · Computer Science 2024-10-03 Charbel Chucri , Rami Azouz , Joachim Ott

Many word-level adversarial attack approaches for textual data have been proposed in recent studies. However, due to the massive search space consisting of combinations of candidate words, the existing approaches face the problem of…

Computation and Language · Computer Science 2022-11-15 Xingyi Zhao , Lu Zhang , Depeng Xu , Shuhan Yuan

We propose an algorithm for approximative dictionary lookup, where altered strings are matched against reference forms. The algorithm makes use of a divergence function between strings -- broadly belonging to the family of edit distances;…

Computation and Language · Computer Science 2021-09-03 Pascal Vaillant

The text retrieval is the task of retrieving similar documents to a search query, and it is important to improve retrieval accuracy while maintaining a certain level of retrieval speed. Existing studies have reported accuracy improvements…

Information Retrieval · Computer Science 2023-11-15 Yuichi Sasazawa , Kenichi Yokote , Osamu Imaichi , Yasuhiro Sogawa

We study strategies of approximate pattern matching that exploit bidirectional text indexes, extending and generalizing ideas of Lam et al. We introduce a formalism, called search schemes, to specify search strategies of this type, then…

Data Structures and Algorithms · Computer Science 2015-09-08 Gregory Kucherov , Kamil Salikhov , Dekel Tsur

Dense retrieval is a basic building block of information retrieval applications. One of the main challenges of dense retrieval in real-world settings is the handling of queries containing misspelled words. A popular approach for handling…

This paper presents a procedure to retrieve subsets of relevant documents from large text collections for Content Analysis, e.g. in social sciences. Document retrieval for this purpose needs to take account of the fact that analysts often…

Information Retrieval · Computer Science 2017-07-12 Gregor Wiedemann , Andreas Niekler

The detection of allusive text reuse is particularly challenging due to the sparse evidence on which allusive references rely---commonly based on none or very few shared words. Arguably, lexical semantics can be resorted to since uncovering…

Computation and Language · Computer Science 2019-05-09 Enrique Manjavacas , Brian Long , Mike Kestemont

We engineer an algorithm to solve the approximate dictionary matching problem. Given a list of words $\mathcal{W}$, maximum distance $d$ fixed at preprocessing time and a query word $q$, we would like to retrieve all words from…

Information Retrieval · Computer Science 2010-08-19 Daniel Karch , Dennis Luxen , Peter Sanders

A search query consists of several words. In a proximity full-text search, we want to find documents that contain these words near each other. This task requires much time when the query consists of frequently occurring words. If we…

Information Retrieval · Computer Science 2020-09-08 Alexander B. Veretennikov

One of the difficulties of neural machine translation (NMT) is the recall and appropriate translation of low-frequency words or phrases. In this paper, we propose a simple, fast, and effective method for recalling previously seen…

Computation and Language · Computer Science 2018-04-10 Jingyi Zhang , Masao Utiyama , Eiichro Sumita , Graham Neubig , Satoshi Nakamura

Retrieval-Augmented Generation (RAG) systems commonly use chunking strategies for retrieval, which enhance large language models (LLMs) by enabling them to access external knowledge, ensuring that the retrieved information is up-to-date and…

Computation and Language · Computer Science 2025-07-15 Hai Toan Nguyen , Tien Dat Nguyen , Viet Ha Nguyen