Related papers: Information Extraction Using the Structured Langua…

Structured information extraction from complex scientific text with fine-tuned large language models

Intelligently extracting and linking complex scientific information from unstructured text is a challenging endeavor, particularly for those inexperienced with natural language processing. Here, we present a simple sequence-to-sequence…

Computation and Language · Computer Science 2022-12-13 Alexander Dunn, John Dagdelen, Nicholas Walker, Sanghoon Lee, Andrew S. Rosen, Gerbrand Ceder, Kristin Persson, Anubhav Jain

Unified Text Structuralization with Instruction-tuned Language Models

Text structuralization is one of the important fields of natural language processing (NLP) and consists of information extraction (IE) and structure formalization. However, current studies of text structuralization suffer from a shortage of…

Computation and Language · Computer Science 2023-03-31 Xuanfan Ni, Piji Li, Huayang Li

Extracting Research Instruments from Educational Literature Using LLMs

Large Language Models (LLMs) are transforming information extraction from academic literature, offering new possibilities for knowledge management. This study presents an LLM-based system designed to extract detailed information about…

Information Retrieval · Computer Science 2025-05-29 Jiseung Yoo, Curran Mahowald, Meiyu Li, Wei Ai

Structured Language Modeling for Speech Recognition

A new language model for speech recognition is presented. The model develops hidden hierarchical syntactic-like structure incrementally and uses it to extract meaningful information from the word history, thus complementing the locality of…

Computation and Language · Computer Science 2007-05-23 Ciprian Chelba, Frederick Jelinek

From Text to Insight: Large Language Models for Materials Science Data Extraction

The vast majority of materials science knowledge exists in unstructured natural language, yet structured data is crucial for innovative and systematic materials design. Traditionally, the field has relied on manual curation and partial…

Materials Science · Physics 2024-12-03 Mara Schilling-Wilhelmi, Martiño Ríos-García, Sherjeel Shabih, María Victoria Gil, Santiago Miret, Christoph T. Koch, José A. Márquez, Kevin Maik Jablonka

Improving Information Extraction on Business Documents with Specific Pre-Training Tasks

Transformer-based Language Models are widely used in Natural Language Processing related tasks. Thanks to their pre-training, they have been successfully adapted to Information Extraction in business documents. However, most pre-training…

Computation and Language · Computer Science 2023-09-12 Thibault Douzon, Stefan Duffner, Christophe Garcia, Jérémy Espinas

Schema-Driven Information Extraction from Heterogeneous Tables

In this paper, we explore the question of whether large language models can support cost-efficient information extraction from tables. We introduce schema-driven information extraction, a new task that transforms tabular data into…

Computation and Language · Computer Science 2024-11-22 Fan Bai, Junmo Kang, Gabriel Stanovsky, Dayne Freitag, Mark Dredze, Alan Ritter

Learning to Extract Structured Entities Using Language Models

Recent advances in machine learning have significantly impacted the field of information extraction, with Language Models (LMs) playing a pivotal role in extracting structured information from unstructured text. Prior works typically…

Computation and Language · Computer Science 2024-10-03 Haolun Wu, Ye Yuan, Liana Mikaelyan, Alexander Meulemans, Xue Liu, James Hensman, Bhaskar Mitra

Jointly Learning Span Extraction and Sequence Labeling for Information Extraction from Business Documents

This paper introduces a new information extraction model for business documents. Unlike prior studies, which rely on only span extraction or sequence labeling, the model takes advantage of both span extraction and…

Computation and Language · Computer Science 2022-05-27 Nguyen Hong Son, Hieu M. Vu, Tuan-Anh D. Nguyen, Minh-Tien Nguyen

Dependency Parsing with the Structuralized Prompt Template

Dependency parsing is a fundamental task in natural language processing (NLP), aiming to identify syntactic dependencies and construct a syntactic tree for a given sentence. Traditional dependency parsing models typically construct…

Computation and Language · Computer Science 2025-02-25 Keunha Kim, Youngjoong Ko

Information Extraction in Low-Resource Scenarios: Survey and Perspective

Information Extraction (IE) seeks to derive structured information from unstructured texts, often facing challenges in low-resource scenarios due to data scarcity and unseen classes. This paper presents a review of neural approaches to…

Computation and Language · Computer Science 2024-10-29 Shumin Deng, Yubo Ma, Ningyu Zhang, Yixin Cao, Bryan Hooi

A Span Extraction Approach for Information Extraction on Visually-Rich Documents

Information extraction (IE) for visually-rich documents (VRDs) has achieved SOTA performance recently thanks to the adaptation of Transformer-based language models, which shows the great potential of pre-training methods. In this paper, we…

Artificial Intelligence · Computer Science 2021-07-07 Tuan-Anh D. Nguyen, Hieu M. Vu, Nguyen Hong Son, Minh-Tien Nguyen

MPL: Multiple Programming Languages with Large Language Models for Information Extraction

Recent research in information extraction (IE) focuses on utilizing code-style inputs to enhance structured output generation. The intuition behind this is that the programming languages (PLs) inherently exhibit greater structural…

Computation and Language · Computer Science 2025-05-23 Bo Li, Gexiang Fang, Wei Ye, Zhenghua Xu, Jinglei Zhang, Hao Cheng, Shikun Zhang

Document Spanners for Extracting Incomplete Information: Expressiveness and Complexity

Rule-based information extraction has lately received a fair amount of attention from the database community, with several languages appearing in the last few years. Although information extraction systems are intended to deal with…

Databases · Computer Science 2018-01-01 Francisco Maturana, Cristian Riveros, Domagoj Vrgoč

Leveraging Large Language Models for Web Scraping

Large Language Models (LLMs) demonstrate remarkable capabilities in replicating human tasks and boosting productivity. However, their direct application for data extraction presents limitations due to a prioritisation of fluency over…

Computation and Language · Computer Science 2024-06-13 Aman Ahluwalia, Suhrud Wani

Adaptive Reinforcement Learning Planning: Harnessing Large Language Models for Complex Information Extraction

Existing research on large language models (LLMs) shows that they can solve information extraction tasks through multi-step planning. However, their extraction behavior on complex sentences and tasks is unstable, giving rise to issues such as…

Computation and Language · Computer Science 2024-08-30 Zepeng Ding, Ruiyang Ke, Wenhao Huang, Guochao Jiang, Yanda Li, Deqing Yang, Jiaqing Liang

Key Information Extraction From Documents: Evaluation And Generator

Extracting information from documents usually relies on natural language processing methods working on one-dimensional sequences of text. In some cases, for example, for the extraction of key information from semi-structured documents, such…

Computation and Language · Computer Science 2021-06-29 Oliver Bensch, Mirela Popa, Constantin Spille

Language Model Pre-Training with Sparse Latent Typing

Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most of the LM pre-training objectives only focus on text reconstruction, but have not sought to learn…

Computation and Language · Computer Science 2022-10-28 Liliang Ren, Zixuan Zhang, Han Wang, Clare R. Voss, Chengxiang Zhai, Heng Ji

Assessing the quality of information extraction

Advances in large language models have notably enhanced the efficiency of information extraction from unstructured and semi-structured data sources. As these technologies become integral to various applications, establishing an objective…

Computation and Language · Computer Science 2024-05-24 Filip Seitl, Tomáš Kovářík, Soheyla Mirshahi, Jan Kryštůfek, Rastislav Dujava, Matúš Ondreička, Herbert Ullrich, Petr Gronat

Construction of English Resume Corpus and Test with Pre-trained Language Models

Information extraction (IE) has always been one of the essential tasks of NLP. Moreover, one of the most critical application scenarios of information extraction is the information extraction of resumes. Constructed text is obtained by…

Computation and Language · Computer Science 2023-02-07 Chengguang Gan, Tatsunori Mori