Related papers: Text2Struct: A Machine Learning Pipeline for Minin…

Text2MDT: Extracting Medical Decision Trees from Medical Texts

Knowledge of the medical decision process, which can be modeled as medical decision trees (MDTs), is critical to build clinical decision support systems. However, the current MDT construction methods rely heavily on time-consuming and…

Computation and Language · Computer Science · 2024-01-05 · Wei Zhu, Wenfeng Li, Xing Tian, Pengfei Wang, Xiaoling Wang, Jin Chen, Yuanbin Wu, Yuan Ni, Guotong Xie

Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction

Event extraction is challenging due to the complex structure of event records and the semantic gap between text and event. Traditional methods usually extract event records by decomposing the complex structure prediction task into multiple…

Computation and Language · Computer Science · 2021-06-18 · Yaojie Lu, Hongyu Lin, Jin Xu, Xianpei Han, Jialong Tang, Annan Li, Le Sun, Meng Liao, Shaoyi Chen

Analyzing Text Representations under Tight Annotation Budgets: Measuring Structural Alignment

Annotating large collections of textual data can be time consuming and expensive. That is why the ability to train models with limited annotation budgets is of great importance. In this context, it has been shown that under tight annotation…

Computation and Language · Computer Science · 2022-10-13 · César González-Gutiérrez, Audi Primadhanty, Francesco Cazzaro, Ariadna Quattoni

SPSQL: Step-by-step Parsing Based Framework for Text-to-SQL Generation

Converting text into the structured query language (Text2SQL) is a research hotspot in the field of natural language processing (NLP), which has broad application prospects. In the era of big data, the use of databases has penetrated all…

Computation and Language · Computer Science · 2023-05-19 · Ran Shen, Gang Sun, Hao Shen, Yiling Li, Liangfeng Jin, Han Jiang

Text2Math: End-to-end Parsing Text into Math Expressions

We propose Text2Math, a model for semantically parsing text into math expressions. The model can be used to solve different math related problems including arithmetic word problems and equation parsing problems. Unlike previous approaches,…

Computation and Language · Computer Science · 2019-10-16 · Yanyan Zou, Wei Lu

ClinStructor: AI-Powered Structuring of Unstructured Clinical Texts

Clinical notes contain valuable, context-rich information, but their unstructured format introduces several challenges, including unintended biases (e.g., gender or racial bias), and poor generalization across clinical settings (e.g.,…

Computation and Language · Computer Science · 2025-11-18 · Karthikeyan K, Raghuveer Thirukovalluru, David Carlson

A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques

The amount of text that is generated every day is increasing dramatically. This tremendous volume of mostly unstructured text cannot be simply processed and perceived by computers. Therefore, efficient and effective techniques and…

Computation and Language · Computer Science · 2017-07-31 · Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saied Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, Krys Kochut

Neural Pipeline for Zero-Shot Data-to-Text Generation

In data-to-text (D2T) generation, training on in-domain data leads to overfitting to the data representation and repeating training data noise. We examine how to avoid finetuning pretrained language models (PLMs) on D2T generation datasets…

Computation and Language · Computer Science · 2022-03-31 · Zdeněk Kasner, Ondřej Dušek

Text-to-Text Pre-Training for Data-to-Text Tasks

We study the pre-train + fine-tune strategy for data-to-text tasks. Our experiments indicate that text-to-text pre-training in the form of T5, enables simple, end-to-end transformer based models to outperform pipelined neural architectures…

Computation and Language · Computer Science · 2021-07-12 · Mihir Kale, Abhinav Rastogi

Structured Learning from Partial Annotations

Structured learning is appropriate when predicting structured outputs such as trees, graphs, or sequences. Most prior work requires the training set to consist of complete trees, graphs or sequences. Specifying such detailed ground truth…

Machine Learning · Computer Science · 2012-07-03 · Xinghua Lou, Fred Hamprecht

Doc2Dict: Information Extraction as Text Generation

Typically, information extraction (IE) requires a pipeline approach: first, a sequence labeling model is trained on manually annotated documents to extract relevant spans; then, when a new document arrives, a model predicts spans which are…

Computation and Language · Computer Science · 2021-10-12 · Benjamin Townsend, Eamon Ito-Fisher, Lily Zhang, Madison May

Mining Unstructured Medical Texts With Conformal Active Learning

The extraction of relevant data from Electronic Health Records (EHRs) is crucial to identifying symptoms and automating epidemiological surveillance processes. By harnessing the vast amount of unstructured text in EHRs, we can detect…

Computation and Language · Computer Science · 2025-02-10 · Juliano Genari, Guilherme Tegoni Goedert

Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity

End-to-end neural data-to-text (D2T) generation has recently emerged as an alternative to pipeline-based architectures. However, it has faced challenges in generalizing to new domains and generating semantically consistent text. In this…

Computation and Language · Computer Science · 2020-11-12 · Hamza Harkous, Isabel Groves, Amir Saffari

Multi-utility Learning: Structured-output Learning with Multiple Annotation-specific Loss Functions

Structured-output learning is a challenging problem; particularly so because of the difficulty in obtaining large datasets of fully labelled instances for training. In this paper we try to overcome this difficulty by presenting a…

Computer Vision and Pattern Recognition · Computer Science · 2014-06-24 · Roman Shapovalov, Dmitry Vetrov, Anton Osokin, Pushmeet Kohli

Learning from Imperfect Annotations

Many machine learning systems today are trained on large amounts of human-annotated data. Data annotation tasks that require a high level of competency make data acquisition expensive, while the resulting labels are often subjective,…

Machine Learning · Computer Science · 2020-04-08 · Emmanouil Antonios Platanios, Maruan Al-Shedivat, Eric Xing, Tom Mitchell

Text2Data: Low-Resource Data Generation with Textual Control

Natural language serves as a common and straightforward signal for humans to interact seamlessly with machines. Recognizing the importance of this interface, the machine learning community is investing considerable effort in generating data…

Computation and Language · Computer Science · 2025-01-03 · Shiyu Wang, Yihao Feng, Tian Lan, Ning Yu, Yu Bai, Ran Xu, Huan Wang, Caiming Xiong, Silvio Savarese

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly relevant and complementary. In this paper, we investigate the problem of scene…

Computer Vision and Pattern Recognition · Computer Science · 2019-08-23 · Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai

Structure Extraction in Task-Oriented Dialogues with Slot Clustering

Extracting structure information from dialogue data can help us better understand user and system behaviors. In task-oriented dialogues, dialogue structure has often been considered as transition graphs among dialogue states. However,…

Computation and Language · Computer Science · 2022-03-17 · Liang Qiu, Chien-Sheng Wu, Wenhao Liu, Caiming Xiong

Automating biomedical data science through tree-based pipeline optimization

Over the past decade, data science and machine learning has grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline…

Machine Learning · Computer Science · 2016-02-01 · Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, Jason H. Moore

OmniStruct: Universal Text-to-Structure Generation across Diverse Schemas

The ability of Large Language Models (LLMs) to generate structured outputs that follow arbitrary schemas is crucial to a wide range of downstream tasks that require diverse structured representations of results such as information…

Computation and Language · Computer Science · 2025-11-25 · James Y. Huang, Wenxuan Zhou, Nan Xu, Fei Wang, Qin Liu, Sheng Zhang, Hoifung Poon, Muhao Chen