Related papers: Optimizing Exact String Matching via Statistical A…
A family of comparison-based exact pattern matching algorithms is described. They utilize multi-dimensional arrays in order to process more than one adjacent text window in each iteration of the search cycle. This approach leads to a lower…
In this note we present the worst-character rule, an efficient variation of the bad-character heuristic for the exact string matching problem, first introduced in the well-known Boyer-Moore algorithm. Our proposed rule selects a position…
String matching is the problem of finding all the occurrences of a pattern in a text. It has been intensively studied, and the Boyer-Moore string matching algorithm is probably one of the most famous solutions to this problem. This algorithm…
We put forth a new string matching algorithm which matches the pattern from neither the left nor the right end, but instead from a special position. Compared with the Knuth-Morris-Pratt algorithm and the Boyer-Moore algorithm, the new algorithm is…
A string matching -- and more generally, sequence matching -- algorithm is presented that has a linear worst-case computing time bound, a low worst-case bound on the number of comparisons (2n), and sublinear average-case behavior that is…
We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer-Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we show how to efficiently…
The timed pattern matching problem is formulated by Ulus et al. and has been actively studied since, with its evident application in monitoring real-time systems. The problem takes as input a timed word/signal and a timed pattern (specified…
Anchor-based techniques reduce the computational complexity of spectral clustering algorithms. Although empirical tests have shown promising results, there is currently a lack of theoretical support for the anchoring approach. We define a…
Local explanation methods highlight the input tokens that have a considerable impact on the outcome of classifying the document at hand. For example, the Anchor algorithm applies a statistical analysis of the sensitivity of the classifier…
Anchors (Ribeiro et al., 2018) is a post-hoc, rule-based interpretability method. For text data, it proposes to explain a decision by highlighting a small set of words (an anchor) such that the model to explain has similar outputs when they…
We present an online algorithm to deal with pattern matching in strings. The problem we investigate is commonly known as string matching with mismatches, in which the objective is to report the number of characters that match when a pattern…
Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. Sampled string…
In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to…
Separable Non-negative Matrix Factorization (SNMF) is an important method for topic modeling, where "separable" assumes every topic contains at least one anchor word, defined as a word that has non-zero probability only on that topic. SNMF…
Modern neural networks have greatly improved performance across speech recognition benchmarks. However, gains are often driven by frequent words with limited semantic weight, which can obscure meaningful differences in word error rate, the…
In topic modeling, many algorithms that guarantee identifiability of the topics have been developed under the premise that there exist anchor words -- i.e., words that only appear (with positive probability) in one topic. Follow-up work has…
String matching is the problem of finding all the occurrences of a pattern in a text. We propose improved versions of the fast family of string matching algorithms based on hashing $q$-grams. The improvement consists of considering minimal…
Since their introduction, anchoring methods in extragradient-type saddlepoint problems have inspired a flurry of research due to their ability to provide order-optimal rates of accelerated convergence in very general problem settings. Such…
String matching is the problem of finding all the substrings of a text which match a given pattern. It is one of the most investigated problems in computer science, mainly due to its very diverse applications in several fields. Recently,…
In this paper we describe an algorithm for aligning sentences with their translations in a bilingual corpus using lexical information of the languages. Existing efficient algorithms ignore word identities and consider only the sentence…