Related papers: Application of Generalised sequential crossover of…
In this paper, we propose a new operation, Generalised Sequential Crossover (GSCO) of words, which is in some sense an abstract model of the crossing over of chromosomes in living organisms. We extend GSCO over a language $L$ iteratively…
The theory of splicing is an abstract model of the recombinant behaviour of DNA. In a splicing system, the two strings to be spliced are taken from the same set and the splicing rule is taken from another set. Here we propose a generalised splicing (GS)…
Splicing as a binary word/language operation is inspired by DNA recombination under the action of restriction enzymes and ligases, and was first introduced by Tom Head in 1987. Shortly thereafter, it was proven that the languages…
We identify a subclass of the regular commutative languages that is closed under the iterated shuffle, or shuffle closure. In particular, it is regularity-preserving on this subclass. This subclass contains the commutative group languages…
We introduce a novel approach for building language models based on a systematic, recursive exploration of skip n-gram models which are interpolated using modified Kneser-Ney smoothing. Our approach generalizes language models as it…
Code-switching is a prevalent linguistic phenomenon in which multilingual individuals seamlessly alternate between languages. Despite its widespread use online and recent research trends in this area, research in code-switching presents…
Universal cycles are generalizations of de Bruijn cycles and Gray codes that were introduced originally by Chung, Diaconis, and Graham in 1992. They have been developed by many authors since, for various combinatorial objects such as…
Commutative languages with the semilinear property (SLIP) can be naturally recognized by real-time NLOG-SPACE multi-counter machines. We show that unions and concatenations of such languages can be similarly recognized, relying on -- and…
Systematic Generalization refers to a learning algorithm's ability to extrapolate learned behavior to unseen situations that are distinct but semantically similar to its training data. As shown in recent work, state-of-the-art deep learning…
We prove a general congruence result for bisimilarity in higher-order languages, which generalises previous work to languages specified by a labelled transition system in which programs may occur as labels, and which may rely on operations…
The (bounded) hairpin completion and its iterated versions are operations on formal languages which have been inspired by the hairpin formation in DNA-biochemistry. The paper answers two questions asked in the literature about the…
Circular splicing systems are a formal model of a generative mechanism of circular words, inspired by a recombinant behaviour of circular DNA. Some unanswered questions are related to the computational power of such systems, and finding a…
In this paper, we prove decidability properties and new results on the position of the family of languages generated by (circular) splicing systems within the Chomsky hierarchy. The two main results of the paper are the following. First, we…
Whether language models (LMs) have inductive biases that favor typologically frequent grammatical properties over rare, implausible ones has been investigated, typically using artificial languages (ALs) (White and Cotterell, 2021;…
Splicing systems are generative mechanisms introduced by Tom Head in 1987 to model the biological process of DNA recombination. The computational engine of a splicing system is the "splicing operation", a cut-and-paste binary string…
We describe a generalization of the usual boundary strata classes in the Chow ring of $\overline{\mathcal{M}}_{g,n}$. The generalized boundary strata classes additively span a subring of the tautological ring. We describe a multiplication…
Multilinguality is crucial for extending recent advancements in language modelling to diverse linguistic communities. To maintain high performance while representing multiple languages, multilingual models ideally align representations,…
An integer generalized spline is a set of vertex labels on an edge-labeled graph that satisfy the condition that if two vertices are joined by an edge, the vertex labels are congruent modulo the edge label. Foundational work on these…
The fundamental question considered in algorithms on strings is that of indexing, that is, preprocessing a given string for specific queries. By now we have a number of efficient solutions for this problem when the queries ask for an exact…
It was recently proved that any Straight-Line Program (SLP) generating a given string can be transformed in linear time into an equivalent balanced SLP of the same asymptotic size. We generalize this proof to a general class of grammars we…