Related papers: Mining Statistically Significant Substrings using …

200 papers

Given the vast reservoirs of data stored worldwide, efficient mining of data from a large information store has emerged as a great challenge. Many databases like that of intrusion detection systems, web-click records, player statistics,…

Databases · Computer Science 2010-03-09 Sourav Dutta, Arnab Bhattacharya

The longest square subsequence (LSS) problem consists of computing a longest subsequence of a given string $S$ that is a square, i.e., a longest subsequence of form $XX$ appearing in $S$. It is known that an LSS of a string $S$ of length…

Data Structures and Algorithms · Computer Science 2020-07-30 Takafumi Inoue, Shunsuke Inenaga, Hideo Bannai

Frequent pattern mining is widely used to find ``important'' or ``interesting'' patterns in data. While it is not easy to mathematically define such patterns, maximal frequent patterns are promising candidates, as frequency is a natural…

Data Structures and Algorithms · Computer Science 2025-04-08 Giovanni Buzzega, Alessio Conte, Yasuaki Kobayashi, Kazuhiro Kurita, Giulia Punzi

Given a dataset of $n$ user-contributed strings, each of length at most $\ell$, a key problem is how to identify all frequent substrings while preserving each user's privacy. Recent work by Bernardini et al. (PODS'25) introduced a…

Data Structures and Algorithms · Computer Science 2026-03-11 Peaker Guo, Rayne Holland, Hao Wu

Frequent pattern mining is a flagship problem in data mining. In its most basic form, it asks for the set of substrings of a given string $S$ of length $n$ that occur at least $\tau$ times in $S$, for some integer $\tau\in[1,n]$. We…

Data Structures and Algorithms · Computer Science 2025-06-06 Pengxin Bian, Panagiotis Charalampopoulos, Lorraine A. K. Ayad, Manal Mohamed, Solon P. Pissis, Grigorios Loukides

Significant pattern mining is a fundamental task in mining transactional data, requiring to identify patterns significantly associated with the value of a given feature, the target. In several applications, such as biomedicine, basket…

Machine Learning · Computer Science 2024-06-18 Leonardo Pellegrina, Fabio Vandin

Finding the common subsequences of $L$ multiple strings has many applications in the area of bioinformatics, computational linguistics, and information retrieval. A well-known result states that finding a Longest Common Subsequence (LCS)…

Data Structures and Algorithms · Computer Science 2020-09-09 Jin Cao, Dewei Zhong

We revisit classic string problems considered in the area of parameterized complexity, and study them through the lens of dynamic data structures. That is, instead of asking for a static algorithm that solves the given instance efficiently,…

Data Structures and Algorithms · Computer Science 2022-05-03 Jędrzej Olkowski, Michał Pilipczuk, Mateusz Rychlicki, Karol Węgrzycki, Anna Zych-Pawlewicz

A closed string $u$ is either of length one or contains a border that occurs only as a prefix and as a suffix in $u$ and nowhere else within $u$. In this paper, we present fast $\mathcal{O}(n\log n)$ time algorithms to compute all…

Data Structures and Algorithms · Computer Science 2026-01-12 Samkith K Jain, Neerja Mhaskar

A classical measure of string comparison is given by the longest common subsequence (LCS) problem on a pair of strings. We consider its generalisation, called the semi-local LCS problem, which arises naturally in many string-related…

Data Structures and Algorithms · Computer Science 2015-03-13 Alexander Tiskin

As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In…

Databases · Computer Science 2010-02-08 Adam Kirsch, Michael Mitzenmacher, Andrea Pietracaprina, Geppino Pucci, Eli Upfal, Fabio Vandin

In the longest common substring (LCS) problem, we are given two strings $S$ and $T$, each of length at most $n$, and we are asked to find a longest string occurring as a fragment of both $S$ and $T$. This is a classical and well-studied…

Data Structures and Algorithms · Computer Science 2018-07-17 Amihood Amir, Panagiotis Charalampopoulos, Solon P. Pissis, Jakub Radoszewski

Statistically significant patterns mining (SSPM) is an essential and challenging data mining task in the field of knowledge discovery in databases (KDD), in which each pattern is evaluated via a hypothesis test. Our study aims to introduce…

Methodology · Statistics 2020-08-26 Thien Q. Tran, Kazuto Fukuchi, Youhei Akimoto, Jun Sakuma

This paper studies the classic problem of finding heavy hitters in the turnstile streaming model. We give the first deterministic linear sketch that has $O(\epsilon^{-2} \log n \cdot \log^*(\epsilon^{-1}))$ rows and answers queries in…

Data Structures and Algorithms · Computer Science 2018-06-13 Yi Li, Vasileios Nakos

In this paper we initiate the study of computing a maximal (not necessarily maximum) repeating pattern in a single input string, where the corresponding problems have been studied (e.g., a maximal common subsequence) only in two or more…

Data Structures and Algorithms · Computer Science 2026-01-21 Mingyang Gong, Adiesha Liyanage, Braeden Sopp, Binhai Zhu

For databases consisting of many text documents, one of the most fundamental data analysis tasks is counting (i) how often a pattern appears as a substring in the database (substring counting) and (ii) how many documents in the collection…

Data Structures and Algorithms · Computer Science 2026-03-27 Giulia Bernardini, Philip Bille, Inge Li Gørtz, Teresa Anna Steiner

The longest common subsequence (LCS) problem is a central problem in stringology that finds the longest common subsequence of two given strings $A$ and $B$. More recently, a set of four constrained LCS problems (called generalized…

Data Structures and Algorithms · Computer Science 2020-01-17 Kohei Yamada, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

In image detection, one problem is to test whether the set, though mostly consisting of uniformly scattered points, also contains a small fraction of points sampled from some (a priori unknown) curve, for example, a curve with…

Applications · Statistics 2020-01-03 Kai Ni, Shanshan Cao, Xiaoming Huo

We study quantum algorithms for several fundamental string problems, including Longest Common Substring, Lexicographically Minimal String Rotation, and Longest Square Substring. These problems have been widely studied in the stringology…

Data Structures and Algorithms · Computer Science 2021-10-22 Shyan Akmal, Ce Jin

Suppose we want to seek the longest common subsequences (LCSs) of two strings as informative patterns that explain the relationship between the strings. The dynamic programming algorithm gives us a table from which all LCSs can be extracted…

Data Structures and Algorithms · Computer Science 2025-05-23 Yoshifumi Sakai