Related papers: Random XML sampling the Boltzmann way

Random generation of closed simply-typed $\lambda$-terms: a synergy between logic programming and Boltzmann samplers

A natural approach to software quality assurance consists in writing unit tests securing programmer-declared code invariants. Throughout the literature a great body of work has been devoted to tools and techniques automating this…

Logic in Computer Science · Computer Science 2017-09-14 Maciej Bendkowski, Katarzyna Grygiel, Paul Tarau

Random generation of combinatorial structures: Boltzmann samplers and beyond

The Boltzmann model for the random generation of "decomposable" combinatorial structures is a set of techniques that allows for efficient random sampling algorithms for a large class of families of discrete objects. The usual requirement of…

Data Structures and Algorithms · Computer Science 2011-12-23 Philippe Duchon

Boltzmann samplers for random generation of lambda terms

Randomly generating structured objects is important in testing and optimizing functional programs, whereas generating random $\lambda$-terms is more specifically needed for testing and optimizing compilers. For that a tool called QuickCheck has…

Data Structures and Algorithms · Computer Science 2014-04-29 Pierre Lescanne

Processing XML for Domain Specific Languages

XML is a standard and universal language for representing information. XML processing is supported by two key frameworks: DOM and SAX. SAX is efficient, but leaves the developer to encode much of the processing. This paper introduces a…

Formal Languages and Automata Theory · Computer Science 2015-06-11 Tony Clark

Learning Restricted Regular Expressions with Interleaving

The advantages of having an XML schema for XML documents are numerous. However, many XML documents in practice are not accompanied by a schema or by a valid schema. Relax NG is a popular and powerful schema language, which…

Databases · Computer Science 2019-05-01 Chunmei Dong, Yeting Li, Haiming Chen

Multi-dimensional Boltzmann Sampling of Languages

This paper addresses the uniform random generation of words from a context-free language (over an alphabet of size $k$), while constraining every letter to a targeted frequency of occurrence. Our approach consists in a multidimensional…

Data Structures and Algorithms · Computer Science 2010-12-21 Olivier Bodini, Yann Ponty

Information-Theoretic Generative Clustering of Documents

We present generative clustering (GC) for clustering a set of documents, $\mathrm{X}$, by using texts $\mathrm{Y}$ generated by large language models (LLMs) instead of by clustering the original documents $\mathrm{X}$. Because LLMs…

Machine Learning · Computer Science 2024-12-19 Xin Du, Kumiko Tanaka-Ishii

Uniform random sampling of planar graphs in linear time

This article introduces new algorithms for the uniform random generation of labelled planar graphs. Its principles rely on Boltzmann samplers, as recently developed by Duchon, Flajolet, Louchard, and Schaeffer. It combines the Boltzmann…

Combinatorics · Mathematics 2008-12-18 Eric Fusy

Detecting Structural Irregularity in Electronic Dictionaries Using Language Modeling

Dictionaries are often developed using tools that save to Extensible Markup Language (XML)-based standards. These standards often allow high-level repeating elements to represent lexical entries, and utilize descendants of these repeating…

Computation and Language · Computer Science 2016-02-18 Paul Rodrigues, David Zajic, David Doermann, Michael Bloodgood, Peng Ye

Synthetic Text Generation using Hypergraph Representations

Generating synthetic variants of a document is often posed as text-to-text transformation. We propose an alternate LLM based method that first decomposes a document into semantic frames and then generates text using this interim sparse…

Computation and Language · Computer Science 2023-12-05 Natraj Raman, Sameena Shah

An XML based Document Suite

We report on the current state of development of a document suite and its applications. This collection of tools for the flexible and robust processing of documents in German is based on the use of XML as unifying formalism for encoding…

Computation and Language · Computer Science 2007-05-23 Dietmar Roesner, Manuela Kunze

Optimizing XML Compression

The eXtensible Markup Language (XML) provides a powerful and flexible means of encoding and exchanging data. As it turns out, its main advantage as an encoding format (namely, its requirement that all open and close markup tags are present…

Databases · Computer Science 2015-05-13 Gregory Leighton, Denilson Barbosa

Random Grammar-based Testing for Covering All Non-Terminals

In the context of software testing, generating complex data inputs is frequently performed using a grammar-based specification. For combinatorial reasons, an exhaustive generation of the data -- of a given size -- is practically impossible,…

Software Engineering · Computer Science 2013-11-27 Alois Dreyfus, Pierre-Cyrille Heam, Olga Kouchnarenko

Sampling from Boltzmann densities with physics informed low-rank formats

Our method proposes the efficient generation of samples from an unnormalized Boltzmann density by solving the underlying continuity equation in the low-rank tensor train (TT) format. It is based on the annealing path commonly used in MCMC…

Machine Learning · Computer Science 2024-12-11 Paul Hagemann, Janina Schütte, David Sommer, Martin Eigel, Gabriele Steidl

How to generate random lambda terms?

We survey several methods of generating large random lambda-terms, focusing on their closed and simply-typed variants. We discuss methods of exact- and approximate-size generation, as well as methods of achieving size-uniform and…

Combinatorics · Mathematics 2020-05-20 Maciej Bendkowski

Polynomial tuning of multiparametric combinatorial samplers

Boltzmann samplers and the recursive method are prominent algorithmic frameworks for the approximate-size and exact-size random generation of large combinatorial structures, such as maps, tilings, RNA sequences or various tree-like…

Combinatorics · Mathematics 2017-10-31 Maciej Bendkowski, Olivier Bodini, Sergey Dovgal

Formal Properties of XML Grammars and Languages

XML documents are described by a document type definition (DTD). An XML-grammar is a formal grammar that captures the syntactic features of a DTD. We investigate properties of this family of grammars. We show that every XML-language…

Discrete Mathematics · Computer Science 2007-05-23 Jean Berstel, Luc Boasson

Automatic compile-time synthesis of entropy-optimal Boltzmann samplers

We present a framework for the automatic compilation of multi-parametric Boltzmann samplers for algebraic data types in Haskell. Our framework uses Template Haskell to synthesise efficient, entropy-optimal samplers generating random…

Software Engineering · Computer Science 2022-08-02 Maciej Bendkowski

Blank Language Models

We propose Blank Language Model (BLM), a model that generates sequences by dynamically creating and filling in blanks. The blanks control which part of the sequence to expand, making BLM ideal for a variety of text editing and rewriting…

Computation and Language · Computer Science 2020-11-18 Tianxiao Shen, Victor Quach, Regina Barzilay, Tommi Jaakkola

A Flexible Structured-based Representation for XML Document Mining

This paper reports on the INRIA group's approach to XML mining while participating in the INEX XML Mining track 2005. We use a flexible representation of XML documents that allows taking into account the structure only or both the structure…

Information Retrieval · Computer Science 2007-05-23 Anne-Marie Vercoustre, Mounir Fegas, Saba Gul, Yves Lechevallier