Related papers: Outilex, plate-forme logicielle de traitement de t…

200 papers

This paper presents a comprehensive overview of the data preparation pipeline developed for the OpenGPT-X project, a large-scale initiative aimed at creating open and high-performance multilingual large language models (LLMs). The project…

XML is a standard and universal language for representing information. XML processing is supported by two key frameworks: DOM and SAX. SAX is efficient, but leaves the developer to encode much of the processing. This paper introduces a…

Formal Languages and Automata Theory · Computer Science 2015-06-11 Tony Clark

We present the sTeX+ system, a user-driven advancement of sTeX - a semantic extension of LaTeX that allows for producing high-quality PDF documents for (proof)reading and printing, as well as semantic XML/OMDoc documents for the Web or…

Software Engineering · Computer Science 2010-06-24 Andrea Kohlhase, Michael Kohlhase, Christoph Lange

Litrepl is a lightweight text processing tool designed to recognize and evaluate code sections within Markdown or Latex documents. This functionality is useful for both batch document section evaluation and interactive coding within a text…

Software Engineering · Computer Science 2025-01-22 Sergei Mironov

This paper presents Ellogon, a multi-lingual, cross-platform, general-purpose text engineering environment. Ellogon was designed to aid both researchers in natural language processing and companies that produce language…

Computation and Language · Computer Science 2007-05-23 Georgios Petasis, Vangelis Karkaletsis, Georgios Paliouras, Ion Androutsopoulos, Constantine D. Spyropoulos

Legal technology is currently receiving a lot of attention from various angles. In this contribution we describe the main technical components of a system that is currently under development in the European innovation project Lynx, which…

In this paper, we summarize the work done on the resources of Modern Greek on the Lexicon-Grammar of verbs. We detail the definitional features of each table, and all changes made to the names of features to make them consistent. Through…

Computation and Language · Computer Science 2011-11-15 Kyriaki Ioannidou, Elsa Tolone

To address the challenges associated with data processing at scale, we propose Dataverse, a unified open-source Extract-Transform-Load (ETL) pipeline for large language models (LLMs) with a user-friendly design at its core. Easy addition of…

Computation and Language · Computer Science 2025-03-05 Hyunbyung Park, Sukyung Lee, Gyoungjin Gim, Yungi Kim, Dahyun Kim, Chanjun Park

Large language models (LLMs) are increasingly touted as powerful tools for automating scientific information extraction. However, existing methods and tools often struggle with the realities of scientific literature: long-context documents,…

This paper introduces Llettuce, an open-source tool designed to address the complexities of converting medical terms into OMOP standard concepts. Unlike existing solutions such as the Athena database search and Usagi, which struggle with…

The development of Large Language Models (LLMs) has predominantly focused on high-resource languages, leaving extremely low-resource languages like Irish with limited representation. This work presents UCCIX, a pioneering effort on the…

Computation and Language · Computer Science 2024-05-24 Khanh-Tung Tran, Barry O'Sullivan, Hoang D. Nguyen

The paper presents the design and development of an English-Lithuanian-English dictionary-lexicon tool and a lexicon database management system for MT. The system is oriented to support two main requirements: to be open to the user and to…

Computation and Language · Computer Science 2011-05-09 G. Barisevičius, B. Tamulynas

Lexicon-Grammar tables constitute a large-coverage syntactic lexicon but they cannot be directly used in Natural Language Processing (NLP) applications because they sometimes rely on implicit information. In this paper, we introduce…

Computation and Language · Computer Science 2010-10-08 Matthieu Constant, Elsa Tolone

In this paper we present a unification-based lexical platform designed for highly inflected languages (such as the Romance languages). A formalism is proposed for encoding a lemma-based lexical source, well suited for linguistic generalizations. From…

cmp-lg · Computer Science 2016-08-15 José M. Goñi, José C. González

We report on the current state of development of a document suite and its applications. This collection of tools for the flexible and robust processing of documents in German is based on the use of XML as a unifying formalism for encoding…

Computation and Language · Computer Science 2007-05-23 Dietmar Roesner, Manuela Kunze

Helix is an open-source, extensible, Python-based software framework to facilitate reproducible and interpretable machine learning workflows for tabular data. It addresses the growing need for transparent experimental data analytics…

We present preliminary results on Legistix, a tool we are developing to automatically consolidate French and European law. Legistix is based both on regular expressions used in several compound grammars, similar to the successive…

Computation and Language · Computer Science 2023-01-18 Georges-André Silber

Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP), improving the state of the art and exhibiting emergent capabilities across various tasks. However, their application in extracting information from visually rich…

Computation and Language · Computer Science 2024-06-25 Vincent Perot, Kai Kang, Florian Luisier, Guolong Su, Xiaoyu Sun, Ramya Sree Boppana, Zilong Wang, Zifeng Wang, Jiaqi Mu, Hao Zhang, Chen-Yu Lee, Nan Hua

The eXtensible Markup Language (XML) can be used as a data exchange format in different domains. It allows different parties to exchange data by providing a common understanding of the basic concepts in the domain. XML covers the syntactic…

Digital Libraries · Computer Science 2012-06-05 Nora Yahia, Sahar A. Mokhtar, AbdelWahab Ahmed

Although many attempts at automated aids for legal drafting have been made, they were based on the construction of a new tool, completely from scratch. This is at least curious, considering that a strong parallelism can be established…

Computers and Society · Computer Science 2011-09-14 Daniel Gorín, Sergio Mera, Fernando Schapachnik