
Related papers: Text2Struct: A Machine Learning Pipeline for Minin…

200 papers

Knowledge of the medical decision process, which can be modeled as medical decision trees (MDTs), is critical to building clinical decision support systems. However, the current MDT construction methods rely heavily on time-consuming and…

Computation and Language · Computer Science 2024-01-05 Wei Zhu, Wenfeng Li, Xing Tian, Pengfei Wang, Xiaoling Wang, Jin Chen, Yuanbin Wu, Yuan Ni, Guotong Xie

Event extraction is challenging due to the complex structure of event records and the semantic gap between text and event. Traditional methods usually extract event records by decomposing the complex structure prediction task into multiple…

Computation and Language · Computer Science 2021-06-18 Yaojie Lu, Hongyu Lin, Jin Xu, Xianpei Han, Jialong Tang, Annan Li, Le Sun, Meng Liao, Shaoyi Chen

Annotating large collections of textual data can be time-consuming and expensive. That is why the ability to train models with limited annotation budgets is of great importance. In this context, it has been shown that under tight annotation…

Computation and Language · Computer Science 2022-10-13 César González-Gutiérrez, Audi Primadhanty, Francesco Cazzaro, Ariadna Quattoni

Converting text into the structured query language (Text2SQL) is a research hotspot in the field of natural language processing (NLP), which has broad application prospects. In the era of big data, the use of databases has penetrated all…

Computation and Language · Computer Science 2023-05-19 Ran Shen, Gang Sun, Hao Shen, Yiling Li, Liangfeng Jin, Han Jiang

We propose Text2Math, a model for semantically parsing text into math expressions. The model can be used to solve different math-related problems, including arithmetic word problems and equation parsing problems. Unlike previous approaches,…

Computation and Language · Computer Science 2019-10-16 Yanyan Zou, Wei Lu

Clinical notes contain valuable, context-rich information, but their unstructured format introduces several challenges, including unintended biases (e.g., gender or racial bias), and poor generalization across clinical settings (e.g.,…

Computation and Language · Computer Science 2025-11-18 Karthikeyan K, Raghuveer Thirukovalluru, David Carlson

The amount of text that is generated every day is increasing dramatically. This tremendous volume of mostly unstructured text cannot be simply processed and perceived by computers. Therefore, efficient and effective techniques and…

Computation and Language · Computer Science 2017-07-31 Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saied Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, Krys Kochut

In data-to-text (D2T) generation, training on in-domain data leads to overfitting to the data representation and repeating training data noise. We examine how to avoid finetuning pretrained language models (PLMs) on D2T generation datasets…

Computation and Language · Computer Science 2022-03-31 Zdeněk Kasner, Ondřej Dušek

We study the pre-train + fine-tune strategy for data-to-text tasks. Our experiments indicate that text-to-text pre-training in the form of T5 enables simple, end-to-end transformer-based models to outperform pipelined neural architectures…

Computation and Language · Computer Science 2021-07-12 Mihir Kale, Abhinav Rastogi

Structured learning is appropriate when predicting structured outputs such as trees, graphs, or sequences. Most prior work requires the training set to consist of complete trees, graphs or sequences. Specifying such detailed ground truth…

Machine Learning · Computer Science 2012-07-03 Xinghua Lou, Fred Hamprecht

Typically, information extraction (IE) requires a pipeline approach: first, a sequence labeling model is trained on manually annotated documents to extract relevant spans; then, when a new document arrives, a model predicts spans which are…

Computation and Language · Computer Science 2021-10-12 Benjamin Townsend, Eamon Ito-Fisher, Lily Zhang, Madison May

The extraction of relevant data from Electronic Health Records (EHRs) is crucial to identifying symptoms and automating epidemiological surveillance processes. By harnessing the vast amount of unstructured text in EHRs, we can detect…

Computation and Language · Computer Science 2025-02-10 Juliano Genari, Guilherme Tegoni Goedert

End-to-end neural data-to-text (D2T) generation has recently emerged as an alternative to pipeline-based architectures. However, it has faced challenges in generalizing to new domains and generating semantically consistent text. In this…

Computation and Language · Computer Science 2020-11-12 Hamza Harkous, Isabel Groves, Amir Saffari

Structured-output learning is a challenging problem; particularly so because of the difficulty in obtaining large datasets of fully labelled instances for training. In this paper we try to overcome this difficulty by presenting a…

Computer Vision and Pattern Recognition · Computer Science 2014-06-24 Roman Shapovalov, Dmitry Vetrov, Anton Osokin, Pushmeet Kohli

Many machine learning systems today are trained on large amounts of human-annotated data. Data annotation tasks that require a high level of competency make data acquisition expensive, while the resulting labels are often subjective,…

Machine Learning · Computer Science 2020-04-08 Emmanouil Antonios Platanios, Maruan Al-Shedivat, Eric Xing, Tom Mitchell

Natural language serves as a common and straightforward signal for humans to interact seamlessly with machines. Recognizing the importance of this interface, the machine learning community is investing considerable effort in generating data…

Computation and Language · Computer Science 2025-01-03 Shiyu Wang, Yihao Feng, Tian Lan, Ning Yu, Yu Bai, Ran Xu, Huan Wang, Caiming Xiong, Silvio Savarese

Unifying text detection and text recognition in an end-to-end training fashion has become a new trend for reading text in the wild, as these two tasks are highly relevant and complementary. In this paper, we investigate the problem of scene…

Computer Vision and Pattern Recognition · Computer Science 2019-08-23 Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai

Extracting structure information from dialogue data can help us better understand user and system behaviors. In task-oriented dialogues, dialogue structure has often been considered as transition graphs among dialogue states. However,…

Computation and Language · Computer Science 2022-03-17 Liang Qiu, Chien-Sheng Wu, Wenhao Liu, Caiming Xiong

Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline…

Machine Learning · Computer Science 2016-02-01 Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, Jason H. Moore

The ability of Large Language Models (LLMs) to generate structured outputs that follow arbitrary schemas is crucial to a wide range of downstream tasks that require diverse structured representations of results such as information…

Computation and Language · Computer Science 2025-11-25 James Y. Huang, Wenxuan Zhou, Nan Xu, Fei Wang, Qin Liu, Sheng Zhang, Hoifung Poon, Muhao Chen