Related papers: Optimizing Exact String Matching via Statistical A…

A family of fast exact pattern matching algorithms

A family of comparison-based exact pattern matching algorithms is described. They utilize multi-dimensional arrays in order to process more than one adjacent text window in each iteration of the search cycle. This approach leads to a lower…

Data Structures and Algorithms · Computer Science 2016-08-31 Igor O. Zavadskyi

On Tuning the Bad-Character Rule: the Worst-Character Rule

In this note we present the worst-character rule, an efficient variation of the bad-character heuristic for the exact string matching problem, first introduced in the well-known Boyer-Moore algorithm. Our proposed rule selects a position…

Data Structures and Algorithms · Computer Science 2010-12-08 Domenico Cantone, Simone Faro

A fast implementation of the good-suffix array for the Boyer-Moore string matching algorithm

String matching is the problem of finding all the occurrences of a pattern in a text. It has been intensively studied and the Boyer-Moore string matching algorithm is probably one of the most famous solutions to this problem. This algorithm…

Data Structures and Algorithms · Computer Science 2024-02-27 Thierry Lecroq

A Fast String Matching Algorithm Based on Lowlight Characters in the Pattern

We put forth a new string matching algorithm which matches the pattern from neither the left nor the right end, but instead from a special position. Compared with the Knuth-Morris-Pratt algorithm and the Boyer-Moore algorithm, the new algorithm is…

Data Structures and Algorithms · Computer Science 2014-01-29 Zhengjun Cao, Lihua Liu

A Fast Generic Sequence Matching Algorithm

A string matching -- and more generally, sequence matching -- algorithm is presented that has a linear worst-case computing time bound, a low worst-case bound on the number of comparisons (2n), and sublinear average-case behavior that is…

Data Structures and Algorithms · Computer Science 2008-10-02 David R. Musser, Gor V. Nishanov

Exact Analysis of Pattern Matching Algorithms with Probabilistic Arithmetic Automata

We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer-Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we show how to efficiently…

Data Structures and Algorithms · Computer Science 2010-10-01 Tobias Marschall, Sven Rahmann

A Boyer-Moore Type Algorithm for Timed Pattern Matching

The timed pattern matching problem was formulated by Ulus et al. and has been actively studied since, with its evident application in monitoring real-time systems. The problem takes as input a timed word/signal and a timed pattern (specified…

Formal Languages and Automata Theory · Computer Science 2018-10-22 Masaki Waga, Takumi Akazaki, Ichiro Hasuo

Consistency of Anchor-based Spectral Clustering

Anchor-based techniques reduce the computational complexity of spectral clustering algorithms. Although empirical tests have shown promising results, there is currently a lack of theoretical support for the anchoring approach. We define a…

Machine Learning · Statistics 2020-06-30 Henry-Louis de Kergorlay, Desmond John Higham

Accelerating the Global Aggregation of Local Explanations

Local explanation methods highlight the input tokens that have a considerable impact on the outcome of classifying the document at hand. For example, the Anchor algorithm applies a statistical analysis of the sensitivity of the classifier…

Machine Learning · Computer Science 2024-01-15 Alon Mor, Yonatan Belinkov, Benny Kimelfeld

A Sea of Words: An In-Depth Analysis of Anchors for Text Data

Anchors (Ribeiro et al., 2018) is a post-hoc, rule-based interpretability method. For text data, it proposes to explain a decision by highlighting a small set of words (an anchor) such that the model to explain has similar outputs when they…

Machine Learning · Statistics 2025-10-22 Gianluigi Lopardo, Frederic Precioso, Damien Garreau

A Novel Algorithm for String Matching with Mismatches

We present an online algorithm to deal with pattern matching in strings. The problem we investigate is commonly known as string matching with mismatches in which the objective is to report the number of characters that match when a pattern…

Data Structures and Algorithms · Computer Science 2016-03-11 Vinodprasad P

Efficient Online String Matching Based on Characters Distance Text Sampling

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. Sampled string…

Data Structures and Algorithms · Computer Science 2019-08-19 Simone Faro, Arianna Pavone, Francesco Pio Marino

Text Indexing for Long Patterns using Locally Consistent Anchors

In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to…

Data Structures and Algorithms · Computer Science 2024-07-17 Lorraine A. K. Ayad, Grigorios Loukides, Solon P. Pissis

A New Anchor Word Selection Method for the Separable Topic Discovery

Separable Non-negative Matrix Factorization (SNMF) is an important method for topic modeling, where "separable" assumes every topic contains at least one anchor word, defined as a word that has non-zero probability only on that topic. SNMF…

Information Retrieval · Computer Science 2019-05-16 Kun He, Wu Wang, Xiaosen Wang, John E. Hopcroft

A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems

Modern neural networks have greatly improved performance across speech recognition benchmarks. However, gains are often driven by frequent words with limited semantic weight, which can obscure meaningful differences in word error rate, the…

Computation and Language · Computer Science 2026-04-21 Lasse Borgholt, Jakob Havtorn, Christian Igel, Lars Maaløe, Zheng-Hua Tan

Anchor-Free Correlated Topic Modeling: Identifiability and Algorithm

In topic modeling, many algorithms that guarantee identifiability of the topics have been developed under the premise that there exist anchor words -- i.e., words that only appear (with positive probability) in one topic. Follow-up work has…

Machine Learning · Statistics 2016-11-16 Kejun Huang, Xiao Fu, Nicholas D. Sidiropoulos

Optimal-Hash Exact String Matching Algorithms

String matching is the problem of finding all the occurrences of a pattern in a text. We propose improved versions of the fast family of string matching algorithms based on hashing $q$-grams. The improvement consists of considering minimal…

Data Structures and Algorithms · Computer Science 2023-03-13 Thierry Lecroq

Stochastic Moving Anchor Algorithms and a Popov's Scheme with Moving Anchor

Since their introduction, anchoring methods in extragradient-type saddlepoint problems have inspired a flurry of research due to their ability to provide order-optimal rates of accelerated convergence in very general problem settings. Such…

Optimization and Control · Mathematics 2025-06-10 James Alcala, Yat Tin Chow, Mahesh Sunkula

Speeding Up String Matching by Weak Factor Recognition

String matching is the problem of finding all the substrings of a text which match a given pattern. It is one of the most investigated problems in computer science, mainly due to its very diverse applications in several fields. Recently,…

Data Structures and Algorithms · Computer Science 2017-07-04 Domenico Cantone, Simone Faro, Arianna Pavone

An Algorithm for Aligning Sentences in Bilingual Corpora Using Lexical Information

In this paper we describe an algorithm for aligning sentences with their translations in a bilingual corpus using lexical information of the languages. Existing efficient algorithms ignore word identities and consider only the sentence…

Computation and Language · Computer Science 2007-05-23 Akshar Bharati, V. Sriram, A. Vamshi Krishna, Rajeev Sangal, S. M. Bendre