Related papers: A Makefile for Developing Containerized LaTeX Tech…
This paper proposes a procedure to execute external source code from a LaTeX document and include the calculation outputs in the resulting Portable Document Format (PDF) file automatically. It integrates programming tools into the LaTeX…
The rapid growth of preprint servers has accelerated scientific dissemination but has also shifted the technical burden of manuscript preparation to authors. This challenge is particularly acute in computational research, where manuscripts…
We present the sTeX+ system, a user-driven advancement of sTeX - a semantic extension of LaTeX that allows for producing high-quality PDF documents for (proof)reading and printing, as well as semantic XML/OMDoc documents for the Web or…
This document explains how to create a LaTeX document, or modify an existing one, with commands enabling computations in the HTML5 output: when the reader opens the HTML5 output, he can run a computation in his browser, or modify the command to be…
In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and…
This paper presents CodeRefine, a novel framework for automatically transforming research paper methodologies into functional code using Large Language Models (LLMs). Our multi-step approach first extracts and summarizes key text chunks…
Software documentation is an essential but labor-intensive task that often requires a dedicated team of developers to ensure coverage and accuracy. Good documentation will help shorten the development cycle and improve the overall team…
In this work, we tackle the problem of structured text generation, specifically academic paper generation in $\LaTeX{}$, inspired by the surprisingly good results of basic character-level language models. Our motivation is using more recent…
In mathematics, LaTeX is the de facto standard to prepare documents, e.g., scientific publications. While some formulae are still developed using pen and paper, more complicated mathematical expressions are used more and more often with…
This paper presents our approach to use refactoring techniques together with code generation. Refactoring is particularly useful if not only the generated classes but also the generator itself can be adapted in an automatic fashion. We have…
Lexically constrained text generation is one of the constrained text generation tasks, which aims to generate text that covers all the given constraint lexicons. While the existing approaches tackle this problem using a lexically…
We propose a toolkit to generate structured synthetic documents emulating the actual document production process. Synthetic documents can be used to train systems to perform document analysis tasks. In our case we address the record…
Software Documentation plays a major role in the usage and development of a project. Widespread adoption of open source software projects contributes to larger and faster development of the projects, making it difficult to maintain the…
This technical report introduces Docling, an easy to use, self-contained, MIT-licensed open-source package for PDF document conversion. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table…
We introduce Texar, an open-source toolkit aiming to support the broad set of text generation tasks that transform any inputs into natural language, such as machine translation, summarization, dialog, content manipulation, and so forth.…
Portable Document Format (PDF) files are dominantly used for storing and disseminating scientific research, legal documents, and tax information. LaTeX is a popular application for creating PDF documents. Despite its advantages, LaTeX is…
The advent of large pre-trained language models has made it possible to make high-quality predictions on how to add or change a sentence in a document. However, the high branching factor inherent to text generation impedes the ability of…
Automated text generation requires an underlying knowledge base from which to generate, which is often difficult to produce. Software documentation is one domain in which parts of this knowledge base may be derived automatically. In this…
Automatic documentation generation tools, or auto docs, are widely used to visualize information about APIs. However, each auto doc tool comes with its own unique representation of API information. In this paper, I use an information…
Traditional web caching is currently limited to static documents. A page generated on the fly from a server-side script may have different contents on different accesses and hence cannot be cached. A number of proposals for…