Related papers: Implementing Suffix Array Algorithm Using Apache B…

200 papers

The suffix array is the key to efficient solutions for myriads of string processing problems in different application domains, like data compression, data mining, or Bioinformatics. With the rapid growth of available data, suffix array…

Data Structures and Algorithms · Computer Science 2016-10-11 Timo Bingmann, Simon Gog, Florian Kurpicz

The suffix array is an efficient data structure for in-memory pattern search. Suffix arrays can also be used for external-memory pattern search, via two-level structures that use an internal index to identify the correct block of suffix…

Data Structures and Algorithms · Computer Science 2013-03-27 Simon Gog, Alistair Moffat, J. Shane Culpepper, Andrew Turpin, Anthony Wirth

The Suffix Array is a classic text index enabling on-line pattern matching queries via simple binary search. The main drawback of the Suffix Array is that it takes linear space in the text's length, even if the text itself is extremely…

Data Structures and Algorithms · Computer Science 2025-03-19 Davide Cenzato, Lore Depuydt, Travis Gagie, Sung-Hwan Kim, Giovanni Manzini, Francisco Olivares, Nicola Prezza

We solve the problem of finding interspersed maximal repeats using a suffix array construction. As it is well known, all the functionality of suffix trees can be handled by suffix arrays, gaining practicality. Our solution improves the…

Data Structures and Algorithms · Computer Science 2013-04-03 Veronica Becher, Alejandro Deymonnaz, Pablo Ariel Heiber

Suffix trees have recently become very successful data structures in handling large data sequences such as DNA or Protein sequences. Consequently parallel architectures have become ubiquitous. We present a novel alphabet-dependent parallel…

Data Structures and Algorithms · Computer Science 2017-04-20 Freeson Kaniwa, Venu Madhav Kuthadi, Otlhapile Dinakenyane, Heiko Schroeder

Suffix Array (SA) is a cardinal data structure in many pattern matching applications, including data compression, plagiarism detection and sequence alignment. However, as the volumes of data increase abruptly, the construction of SA is not…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-16 Hsiang-Huang Wu, Chien-Min Wang, Hsuan-Chi Kuo, Wei-Chun Chung, Jan-Ming Ho

The suffix array is a classic full-text index, combining effectiveness with simplicity. We discuss three approaches aiming to improve its efficiency even more: changes to the navigation, data layout and adding extra data. In short, we show…

Data Structures and Algorithms · Computer Science 2016-07-28 Tomasz Kowalski, Szymon Grabowski, Kimmo Fredriksson, Marcin Raniszewski

With the advent of extremely high dimensional datasets, dimensionality reduction techniques are becoming mandatory. Among many techniques, feature selection has been growing in interest as an important tool to identify relevant features on…

We present a distributed full-text index for big data applications in a distributed environment. Our index can answer different types of pattern matching queries (existential, counting and enumeration). We perform experiments on inputs up…

Data Structures and Algorithms · Computer Science 2016-12-07 Johannes Fischer, Florian Kurpicz, Peter Sanders

The suffix array is arguably one of the most important data structures in sequence analysis and consequently there is a multitude of suffix sorting algorithms. However, to this date the GSACA algorithm introduced in 2015 is the only known…

Data Structures and Algorithms · Computer Science 2022-08-31 Jannik Olbrich, Enno Ohlebusch, Thomas Büchler

Much research has been devoted to optimizing algorithms of the Lempel-Ziv (LZ) 77 family, both in terms of speed and memory requirements. Binary search trees and suffix trees (ST) are data structures that have been often used for this…

Data Structures and Algorithms · Computer Science 2016-11-17 Artur Ferreira, Arlindo Oliveira, Mario Figueiredo

We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a…

Data Structures and Algorithms · Computer Science 2024-04-16 Zsuzsanna Lipták, Francesco Masillo, Simon J. Puglisi

Spaced seeds are important tools for similarity search in bioinformatics, and using several seeds together often significantly improves their performance. With existing approaches, however, for each seed we keep a separate linear-size data…

Data Structures and Algorithms · Computer Science 2014-03-11 Travis Gagie, Giovanni Manzini, Daniel Valenzuela

Current metagenomic analysis algorithms require significant computing resources, can report excessive false positives (type I errors), may miss organisms (type II errors / false negatives), or scale poorly on large datasets. This paper…

Databases · Computer Science 2015-01-23 Ashley Mae Conard, Stephanie Dodson, Jeremy Kepner, Darrell Ricke

The suffix array is a data structure that finds numerous applications in string processing problems for both linguistic texts and biological data. It has been introduced as a memory-efficient alternative to suffix trees. The suffix array…

Data Structures and Algorithms · Computer Science 2013-07-05 Sanguthevar Rajasekaran, Marius Nicolae

Supervised learning algorithms are nowadays successfully scaling up to datasets that are very large in volume, leveraging the potential of in-memory cluster-computing Big Data frameworks. Still, massive datasets with a number of…

Machine Learning · Computer Science 2018-05-11 Luca Venturini, Elena Baralis, Paolo Garza

A deterministic BSP algorithm for constructing the suffix array of a given string is presented, based on a technique which we call accelerated sampling. It runs in optimal O(n/p) local computation and communication, and requires a near…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-26 Matthew Felice Pace, Alexander Tiskin

In the big data era researchers face a series of problems. Even standard approaches/methodologies, like linear regression, can be difficult or problematic with huge volumes of data. Traditional approaches for regression in big datasets may…

Methodology · Statistics 2024-11-13 Vasilis Chasiotis, Dimitris Karlis

We study the fundamental question of how efficiently suffix array entries can be accessed when the array cannot be stored explicitly. The suffix array $SA_T[1..n]$ of a text $T$ of length $n$ encodes the lexicographic order of its suffixes…

Data Structures and Algorithms · Computer Science 2025-10-23 Dominik Kempa, Tomasz Kociumaka

Data preprocessing techniques are devoted to correct or alleviate errors in data. Discretization and feature selection are two of the most extended data preprocessing techniques. Although we can find many proposals for static Big Data…

Databases · Computer Science 2018-10-16 Alejandro Alcalde-Barros, Diego García-Gil, Salvador García, Francisco Herrera