Related papers: String Sampling with Bidirectional String Anchors

200 papers

Minimizers sampling is one of the most widely-used mechanisms for sampling strings. Let $S=S[0]\ldots S[n-1]$ be a string over an alphabet $\Sigma$. In addition, let $w\geq 2$ and $k\geq 1$ be two integers and $\rho=(\Sigma^k,\leq)$ be a…

Data Structures and Algorithms · Computer Science 2025-02-25 Wiktor Zuba, Oded Lachish, Solon P. Pissis

Minimizers sampling is one of the most widely-used mechanisms for sampling strings [Roberts et al., Bioinformatics 2004]. Let $S=S[1]\ldots S[n]$ be a string over a totally ordered alphabet $\Sigma$. Further let $w\geq 2$ and $k\geq 1$ be…

Data Structures and Algorithms · Computer Science 2024-05-08 Hilde Verbeek, Lorraine A. K. Ayad, Grigorios Loukides, Solon P. Pissis

Minimizers are sampling schemes with numerous applications in computational biology. Assuming a fixed alphabet of size $\sigma$, a minimizer is defined by two integers $k,w\ge2$ and a linear order $\rho$ on strings of length $k$ (also…

Data Structures and Algorithms · Computer Science 2025-06-06 Arseny Shur

Minimizer schemes, or just minimizers, are a very important computational primitive in sampling and sketching biological strings. Assuming a fixed alphabet of size $\sigma$, a minimizer is defined by two integers $k,w\ge2$ and a total order…

Combinatorics · Mathematics 2024-11-27 Shay Golan, Arseny M. Shur

Sampling (evenly) the suffixes from the suffix array is an old idea trading the pattern search time for reduced index space. A few years ago Claude et al. showed an alphabet sampling scheme allowing for more efficient pattern searches…

Data Structures and Algorithms · Computer Science 2014-12-04 Szymon Grabowski, Marcin Raniszewski

Given a set of strings over a specified alphabet, identifying a median or consensus string that minimizes the total distance to all input strings is a fundamental data aggregation problem. When the Hamming distance is considered as the…

Data Structures and Algorithms · Computer Science 2026-02-11 Diptarka Chakraborty, Rudrayan Kundu, Nidhi Purohit, Aravinda Kanchana Ruwanpathirana

String attractors [STOC 2018] are combinatorial objects recently introduced to unify all known dictionary compression techniques in a single theory. A set $\Gamma\subseteq [1..n]$ is a $k$-attractor for a string $S\in[1..\sigma]^n$ if and…

Data Structures and Algorithms · Computer Science 2020-12-09 Dominik Kempa, Alberto Policriti, Nicola Prezza, Eva Rotenberg

A well-known fact in the field of lossless text compression is that high-order entropy is a weak model when the input contains long repetitions. Motivated by this, decades of research have generated myriads of so-called dictionary…

Data Structures and Algorithms · Computer Science 2020-12-17 Dominik Kempa, Nicola Prezza

Let $S$ be a string of length $n$. In this paper we introduce the notion of string attractor: a subset of the string's positions $[1,n]$ such that every distinct substring of $S$ has an occurrence crossing one of the attractor's…

Data Structures and Algorithms · Computer Science 2017-09-20 Nicola Prezza

Min-entropy sampling gives a bound on the min-entropy of a randomly chosen subset of a string, given a bound on the min-entropy of the whole string. König and Renner showed a min-entropy sampling theorem that holds relative to quantum…

Quantum Physics · Physics 2011-07-18 Jürg Wullschleger

Motivated by the imminent growth of massive, highly redundant genomic databases, we study the problem of compressing a string database while simultaneously supporting fast random access, substring extraction and pattern matching to the…

Data Structures and Algorithms · Computer Science 2012-11-01 Travis Gagie, Paweł Gawrychowski, Christopher Hoobin, Simon J. Puglisi

In many real-world database systems, a large fraction of the data is represented by strings: sequences of letters over some alphabet. This is because strings can easily encode data arising from different sources. It is often crucial to…

Data Structures and Algorithms · Computer Science 2024-07-17 Lorraine A. K. Ayad, Grigorios Loukides, Solon P. Pissis

We propose novel algorithms for sequence prediction based on ideas from stringology. These algorithms are time and space efficient and satisfy mistake bounds related to particular stringological complexity measures of the sequence. In this…

Formal Languages and Automata Theory · Computer Science 2026-03-31 Vanessa Kosoy

MinMax sampling is a technique for downsampling a real-valued vector which minimizes the maximum variance over all vector components. This approach is useful for reducing the amount of data that must be sent over a constrained network link…

Machine Learning · Computer Science 2024-04-30 Joel Wolfrath, Abhishek Chandra

In experimental design, we are given a large collection of vectors, each with a hidden response value that we assume derives from an underlying linear model, and we wish to pick a small subset of the vectors such that querying the…

Machine Learning · Computer Science 2019-02-05 Michał Dereziński, Kenneth L. Clarkson, Michael W. Mahoney, Manfred K. Warmuth

Given a string $w$, another string $v$ is said to be a subsequence of $w$ if $v$ can be obtained from $w$ by removing some of its letters; on the other hand, $v$ is called an absent subsequence of $w$ if $v$ is not a subsequence of $w$. The…

Data Structures and Algorithms · Computer Science 2025-05-01 Florin Manea, Tina Ringleb, Stefan Siemer, Maximilian Winkler

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. Sampled string…

Data Structures and Algorithms · Computer Science 2019-08-19 Simone Faro, Arianna Pavone, Francesco Pio Marino

The problem of finding a center string that is 'close' to every given string arises and has many applications in computational biology and coding theory. This problem has two versions: the Closest String problem and the Closest Substring…

Computational Engineering, Finance, and Science · Computer Science 2007-05-23 Ming Li, Bin Ma, Lusheng Wang

We consider compact representations of collections of similar strings that support random access queries. The collection of strings is given by a rooted tree where edges are labeled by an edit operation (inserting, deleting, or replacing a…

Data Structures and Algorithms · Computer Science 2021-02-12 Philip Bille, Inge Li Gørtz

Numbers and numerical vectors account for a large portion of data. However, recently the amount of string data generated has increased dramatically. Consequently, classifying string data is a common problem in many fields. The most widely…

Machine Learning · Statistics 2016-02-24 Hitoshi Koyano, Morihiro Hayashida, Tatsuya Akutsu