Related papers: Automated Pattern Detection--An Algorithm for Cons…
We present an efficient algorithm for checking language equivalence of states in top-down deterministic finite tree automata (DFTAs). Unlike string automata, tree automata operate over hierarchical structures, posing unique challenges for…
Minimizing the size of finite automata is a fundamental problem in theoretical computer science. Beyond standard minimization, further reductions can be achieved by decomposing an automaton into smaller components whose languages combine…
The problem of constructing optimal factoring automata arises in the context of unification factoring for the efficient execution of logic programs. Given an ordered set of $n$ strings of length $m$, the problem is to construct a trie-like…
Automatic segmentation of text into minimal content-bearing units is an unsolved problem even for languages like English. Spaces between words offer an easy first approximation, but this approximation is not good enough for machine…
The approximate string matching is a fundamental and recurrent problem that arises in most computer science fields. This problem can be defined as follows: Let $D=\{x_1,x_2,\ldots x_d\}$ be a set of $d$ words defined on an alphabet…
Most current methods for identifying coherent structures in spatially-extended systems rely on prior information about the form which those structures take. Here we present two new approaches to automatically filter the changing…
Speech processing requires very efficient methods and algorithms. Finite-state transducers have been shown recently both to constitute a very useful abstract model and to lead to highly efficient time and space algorithms in this field. We…
Language models (LMs) are often expected to generate strings in some formal language; for example, structured data, API calls, or code snippets. Although LMs can be tuned to improve their adherence to formal syntax, this does not guarantee…
Automata over infinite words, also known as omega-automata, play a key role in the verification and synthesis of reactive systems. The spectrum of omega-automata is defined by two characteristics: the acceptance condition (e.g. B\"uchi or…
The number of parameters in large-scale language models based on transformers is gradually increasing, and the scale of computing clusters is also growing. The technology of quickly mobilizing large amounts of computing resources for…
We investigate a famous decision problem in automata theory: separation. Given a class of language C, the separation problem for C takes as input two regular languages and asks whether there exists a third one which belongs to C, includes…
The dictionary matching is a task to find all occurrences of patterns in a set $D$ (called a dictionary) on a text $T$. The Aho-Corasick-automaton (AC-automaton) is a data structure which enables us to solve the dictionary matching problem…
Term pattern matching is the problem of finding all pattern matches in a subject term, given a set of patterns. Finding efficient algorithms for this problem is an important direction for research [19]. We present a new set automaton…
In the String Matching in Labeled Graphs (SMLG) problem, we need to determine whether a pattern string appears on a given labeled graph or a given automaton. Under the Orthogonal Vectors hypothesis, the SMLG problem cannot be solved in…
Symmetry reduction is a well-known approach for alleviating the state explosion problem in model checking. Automatically identifying symmetries in concurrent systems, however, is computationally expensive. We propose a symbolic framework…
We consider ways to construct a transducer for a given set of input word to output symbol pairs. This is motivated by the need for representing game playing programs in a low-level mathematical format that can be analyzed by algebraic…
Tandem duplication is the process of inserting a copy of a segment of DNA adjacent to the original position. Motivated by applications that store data in living organisms, Jain et al. (2017) proposed the study of codes that correct tandem…
This paper proposes a new pattern of two dimensional cellular automata linear rules that are used for efficient edge detection of an image. Since cellular automata is inherently parallel in nature, it has produced desired output within a…
Given a pattern string $P$ of length $n$ and a query string $T$ of length $m$, where the characters of $P$ and $T$ are drawn from an alphabet of size $\Delta$, the {\em exact string matching} problem consists of finding all occurrences of…
Context-free grammars (CFGs) are the de-facto formalism for declaratively describing concrete syntax for programming languages and generating parsers. One of the major challenges in defining a desired syntax is ruling out all possible…