Related papers: From Texts to Structured Documents: The Case of He…

Automatically Restructuring Practice Guidelines using the GEM DTD

This paper describes a system capable of semi-automatically filling an XML template from free texts in the clinical domain (practice guidelines). The XML template includes semantic information not explicitly encoded in the text (pairs of…

Artificial Intelligence · Computer Science · 2007-06-11 · Amanda Bouffier, Thierry Poibeau

Analyse et structuration automatique des guides de bonnes pratiques cliniques : essai d'évaluation (Automatic analysis and structuring of clinical practice guidelines: an evaluation attempt)

Health Practice Guidelines are supposed to unify practices and propose recommendations to physicians. This paper describes GemFrame, a system capable of semi-automatically filling an XML template from free texts in the clinical domain. The…

Artificial Intelligence · Computer Science · 2008-12-17 · Amanda Bouffier, Thierry Poibeau, Catherine Duclos

Knowledge-guided Text Structuring in Clinical Trials

Clinical trial records are valuable resources for the analysis of patients and diseases. Information extraction from free text such as eligibility criteria and summary of results and conclusions in clinical trials would better support…

Computation and Language · Computer Science · 2020-01-01 · Yingcheng Sun, Kenneth Loparo

Fast, Structured Clinical Documentation via Contextual Autocomplete

We present a system that uses a learned autocompletion mechanism to facilitate rapid creation of semi-structured clinical documentation. We dynamically suggest relevant clinical concepts as a doctor drafts a note by leveraging features from…

Machine Learning · Computer Science · 2020-07-31 · Divya Gopinath, Monica Agrawal, Luke Murray, Steven Horng, David Karger, David Sontag

Identifying Condition-Action Statements in Medical Guidelines Using Domain-Independent Features

This paper advances the state of the art in text understanding of medical guidelines by releasing two new annotated clinical guidelines datasets, and establishing baselines for using machine learning to extract condition-action pairs. In…

Computation and Language · Computer Science · 2017-06-23 · Hossein Hematialam, Wlodek Zadrozny

LLM Based Multi-Agent Generation of Semi-structured Documents from Semantic Templates in the Public Administration Domain

In the digitalization process of recent years, the creation and management of documents in various domains, particularly in Public Administration (PA), have become increasingly complex and diverse. This complexity arises from the need to handle…

Computation and Language · Computer Science · 2024-02-26 · Emanuele Musumeci, Michele Brienza, Vincenzo Suriani, Daniele Nardi, Domenico Daniele Bloisi

ELMTEX: Fine-Tuning Large Language Models for Structured Clinical Information Extraction. A Case Study on Clinical Reports

Europe's healthcare systems require enhanced interoperability and digitalization, driving a demand for innovative solutions to process legacy clinical data. This paper presents the results of our project, which aims to leverage Large…

Computation and Language · Computer Science · 2025-07-09 · Aynur Guluzade, Naguib Heiba, Zeyd Boukhers, Florim Hamiti, Jahid Hasan Polash, Yehya Mohamad, Carlos A Velasco

An XML based Document Suite

We report on the current state of development of a document suite and its applications. This collection of tools for the flexible and robust processing of documents in German is based on the use of XML as unifying formalism for encoding…

Computation and Language · Computer Science · 2007-05-23 · Dietmar Roesner, Manuela Kunze

Structured Semantics from Unstructured Notes: Language Model Approaches to EHR-Based Decision Support

The advent of large language models (LLMs) has opened new avenues for analyzing complex, unstructured data, particularly within the medical domain. Electronic Health Records (EHRs) contain a wealth of information in various formats,…

Information Retrieval · Computer Science · 2025-06-10 · Wu Hao Ran, Xi Xi, Furong Li, Jingyi Lu, Jian Jiang, Hui Huang, Yuzhuan Zhang, Shi Li

Text2MDT: Extracting Medical Decision Trees from Medical Texts

Knowledge of the medical decision process, which can be modeled as medical decision trees (MDTs), is critical to build clinical decision support systems. However, the current MDT construction methods rely heavily on time-consuming and…

Computation and Language · Computer Science · 2024-01-05 · Wei Zhu, Wenfeng Li, Xing Tian, Pengfei Wang, Xiaoling Wang, Jin Chen, Yuanbin Wu, Yuan Ni, Guotong Xie

A High-Quality Multilingual Dataset for Structured Documentation Translation

This paper presents a high-quality multilingual dataset for the documentation domain to advance research on localization of structured text. Unlike widely-used datasets for translation of plain text, we collect XML-structured parallel text…

Computation and Language · Computer Science · 2020-06-25 · Kazuma Hashimoto, Raffaella Buschiazzo, James Bradbury, Teresa Marshall, Richard Socher, Caiming Xiong

From Text to Structure: Using Large Language Models to Support the Development of Legal Expert Systems

Encoding legislative text in a formal representation is an important prerequisite to different tasks in the field of AI & Law. For example, rule-based expert systems focused on legislation can support laypeople in understanding how…

Computation and Language · Computer Science · 2023-11-10 · Samyar Janatian, Hannes Westermann, Jinzhe Tan, Jaromir Savelka, Karim Benyekhlef

AutoMeTS: The Autocomplete for Medical Text Simplification

The goal of text simplification (TS) is to transform difficult text into a version that is easier to understand and more broadly accessible to a wide variety of readers. In some domains, such as healthcare, fully automated approaches cannot…

Computation and Language · Computer Science · 2020-10-22 · Hoang Van, David Kauchak, Gondy Leroy

MedGuideX: Internalizing Decision Logic from Executable Guidelines into Large Language Models for Clinical Reasoning

Clinical practice guidelines (CPGs) encode evidence-based decision logic that clinicians apply by evaluating patient variables, conditional criteria, and recommendation rules. However, existing methods often use CPGs as free-text training…

Artificial Intelligence · Computer Science · 2026-05-27 · Yuhao Shen, Lang Cao, Simo Du, Yuqing Wang, Juexiao Zhou, Hao Peng, Yue Guo

Leveraging LLMs for Structured Data Extraction from Unstructured Patient Records

Manual chart review remains an extremely time-consuming and resource-intensive component of clinical research, requiring experts to extract often complex information from unstructured electronic health record (EHR) narratives. We present a…

Artificial Intelligence · Computer Science · 2025-12-17 · Mitchell A. Klusty, Elizabeth C. Solie, Caroline N. Leach, W. Vaiden Logan, Lynnet E. Richey, John C. Gensel, David P. Szczykutowicz, Bryan C. McLellan, Emily B. Collier, Samuel E. Armstrong, V. K. Cody Bumgardner

Generating Concise and Readable Summaries of XML Documents

XML has become the de facto standard for data representation and exchange, resulting in large scale repositories and warehouses of XML data. In order for users to understand and explore these large collections, a summarized, bird's eye view…

Information Retrieval · Computer Science · 2009-10-14 · Maya Ramanath, Kondreddi Sarath Kumar, Georgiana Ifrim

Mining Semi-structured Data

The need for discovering knowledge from XML documents according to both structure and content features has become challenging, due to the increase in application contexts for which handling both structure and content information in XML data…

Databases · Computer Science · 2015-04-17 · Olfa Arfaoui, Minyar Sassi Hidri

Information Extraction Using the Structured Language Model

The paper presents a data-driven approach to information extraction (viewed as template filling) using the structured language model (SLM) as a statistical parser. The task of template filling is cast as constrained parsing using the SLM.…

Computation and Language · Computer Science · 2007-05-23 · Ciprian Chelba, Milind Mahajan

Clinical Text Deduplication Practices for Efficient Pretraining and Improved Clinical Tasks

Despite being a unique source of information on patients' status and disease progression, clinical notes are characterized by high levels of duplication and information redundancy. In general domain text, it has been shown that…

Computation and Language · Computer Science · 2023-12-18 · Isotta Landi, Eugenia Alleva, Alissa A. Valentine, Lauren A. Lepow, Alexander W. Charney

Automatic Posology Structuration : What role for LLMs?

Automatically structuring posology instructions is essential for improving medication safety and enabling clinical decision support. In French prescriptions, these instructions are often ambiguous, irregular, or colloquial, limiting the…

Computation and Language · Computer Science · 2025-06-25 · Natalia Bobkova, Laura Zanella-Calzada, Anyes Tafoughalt, Raphaël Teboul, François Plesse, Félix Gaschi