Related papers: Approximate textual retrieval

Good parts first - a new algorithm for approximate search in lexica and string databases

We present a new efficient method for approximate search in electronic lexica. Given an input string (the pattern) and a similarity threshold, the algorithm retrieves all entries of the lexicon that are sufficiently similar to the pattern.…

Computation and Language · Computer Science 2015-12-04 Stefan Gerdjikov, Stoyan Mihov, Petar Mitankin, Klaus U. Schulz

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic…

Computation and Language · Computer Science 2024-02-01 Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, Christopher D. Manning

Approximate Grammar for Information Extraction

In this paper, we present the concept of approximate grammar and how it can be used to extract information from a document. As the structure of informational strings cannot be defined well in a document, we cannot use the conventional…

Computation and Language · Computer Science 2007-05-23 V. Sriram, B. Ravi Sekar Reddy, R. Sangal

Proximity full-text searches of frequently occurring words with a response time guarantee

Full-text search engines are important tools for information retrieval. In a proximity full-text search, a document is relevant if it contains query terms near each other, especially if the query terms are frequently occurring words. For…

Information Retrieval · Computer Science 2020-09-09 Alexander B. Veretennikov

Automatic Discovery of Non-Compositional Compounds in Parallel Data

Automatic segmentation of text into minimal content-bearing units is an unsolved problem even for languages like English. Spaces between words offer an easy first approximation, but this approximation is not good enough for machine…

cmp-lg · Computer Science 2008-02-03 I. Dan Melamed

Approximate String Matching: Theory and Applications (La Recherche Approchée de Motifs : Théorie et Applications)

Approximate string matching is a fundamental and recurrent problem that arises in most computer science fields. This problem can be defined as follows: Let $D=\{x_1,x_2,\ldots,x_d\}$ be a set of $d$ words defined on an alphabet…

Data Structures and Algorithms · Computer Science 2017-01-31 Ibrahim Chegrane

Noun-Phrase Analysis in Unrestricted Text for Information Retrieval

Information retrieval is an important application area of natural-language processing where one encounters the genuine challenge of processing large quantities of unrestricted natural-language text. This paper reports on the application of…

cmp-lg · Computer Science 2008-02-03 David A. Evans, Chengxiang Zhai

A Resource-Light Method for Cross-Lingual Semantic Textual Similarity

Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. Recently proposed methods for…

Computation and Language · Computer Science 2018-01-22 Goran Glavaš, Marc Franco-Salvador, Simone Paolo Ponzetto, Paolo Rosso

Recursive Abstractive Processing for Retrieval in Dynamic Datasets

Recent retrieval-augmented models enhance basic methods by building a hierarchical structure over retrieved text chunks through recursive embedding, clustering, and summarization. The most relevant information is then retrieved from both…

Computation and Language · Computer Science 2024-10-03 Charbel Chucri, Rami Azouz, Joachim Ott

Generating Textual Adversaries with Minimal Perturbation

Many word-level adversarial attack approaches for textual data have been proposed in recent studies. However, due to the massive search space consisting of combinations of candidate words, the existing approaches face the problem of…

Computation and Language · Computer Science 2022-11-15 Xingyi Zhao, Lu Zhang, Depeng Xu, Shuhan Yuan

Algorithme de recherche approximative dans un dictionnaire fondé sur une distance d'édition définie par blocs

We propose an algorithm for approximate dictionary lookup, where altered strings are matched against reference forms. The algorithm makes use of a divergence function between strings, broadly belonging to the family of edit distances;…

Computation and Language · Computer Science 2021-09-03 Pascal Vaillant

Text Retrieval with Multi-Stage Re-Ranking Models

Text retrieval is the task of retrieving documents similar to a search query, and it is important to improve retrieval accuracy while maintaining a certain level of retrieval speed. Existing studies have reported accuracy improvements…

Information Retrieval · Computer Science 2023-11-15 Yuichi Sasazawa, Kenichi Yokote, Osamu Imaichi, Yasuhiro Sogawa

Approximate String Matching using a Bidirectional Index

We study strategies of approximate pattern matching that exploit bidirectional text indexes, extending and generalizing ideas of Lam et al. We introduce a formalism, called search schemes, to specify search strategies of this type, then…

Data Structures and Algorithms · Computer Science 2015-09-08 Gregory Kucherov, Kamil Salikhov, Dekel Tsur

Typo-Robust Representation Learning for Dense Retrieval

Dense retrieval is a basic building block of information retrieval applications. One of the main challenges of dense retrieval in real-world settings is the handling of queries containing misspelled words. A popular approach for handling…

Information Retrieval · Computer Science 2023-06-21 Panuthep Tasawong, Wuttikorn Ponwitayarat, Peerat Limkonchotiwat, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong

Document Retrieval for Large Scale Content Analysis using Contextualized Dictionaries

This paper presents a procedure to retrieve subsets of relevant documents from large text collections for Content Analysis, e.g. in social sciences. Document retrieval for this purpose needs to take account of the fact that analysts often…

Information Retrieval · Computer Science 2017-07-12 Gregor Wiedemann, Andreas Niekler

On the Feasibility of Automated Detection of Allusive Text Reuse

The detection of allusive text reuse is particularly challenging due to the sparse evidence on which allusive references rely, commonly based on few or no shared words. Arguably, lexical semantics can be resorted to since uncovering…

Computation and Language · Computer Science 2019-05-09 Enrique Manjavacas, Brian Long, Mike Kestemont

Improved Fast Similarity Search in Dictionaries

We engineer an algorithm to solve the approximate dictionary matching problem. Given a list of words $\mathcal{W}$, a maximum distance $d$ fixed at preprocessing time, and a query word $q$, we would like to retrieve all words from…

Information Retrieval · Computer Science 2010-08-19 Daniel Karch, Dennis Luxen, Peter Sanders

An Improved Algorithm for Fast K-Word Proximity Search Based on Multi-Component Key Indexes

A search query consists of several words. In a proximity full-text search, we want to find documents that contain these words near each other. This task requires much time when the query consists of frequently occurring words. If we…

Information Retrieval · Computer Science 2020-09-08 Alexander B. Veretennikov

Guiding Neural Machine Translation with Retrieved Translation Pieces

One of the difficulties of neural machine translation (NMT) is the recall and appropriate translation of low-frequency words or phrases. In this paper, we propose a simple, fast, and effective method for recalling previously seen…

Computation and Language · Computer Science 2018-04-10 Jingyi Zhang, Masao Utiyama, Eiichro Sumita, Graham Neubig, Satoshi Nakamura

Enhancing Retrieval Augmented Generation with Hierarchical Text Segmentation Chunking

Retrieval-Augmented Generation (RAG) systems commonly use chunking strategies for retrieval, which enhance large language models (LLMs) by enabling them to access external knowledge, ensuring that the retrieved information is up-to-date and…

Computation and Language · Computer Science 2025-07-15 Hai Toan Nguyen, Tien Dat Nguyen, Viet Ha Nguyen