Related papers: Information Extraction - A User Guide

Span-Oriented Information Extraction -- A Unifying Perspective on Information Extraction

Information Extraction refers to a collection of tasks within Natural Language Processing (NLP) that identifies sub-sequences within text and their labels. These tasks have been used for many years to extract relevant information and…

Computation and Language · Computer Science 2024-03-26 Yifan Ding, Michael Yankoski, Tim Weninger

A Review of Keyphrase Extraction

Keyphrase extraction is a textual information processing task concerned with the automatic extraction of representative and characteristic phrases from a document that express all the key aspects of its content. Keyphrases constitute a…

Computation and Language · Computer Science 2019-07-31 Eirini Papagiannopoulou, Grigorios Tsoumakas

Pattern Matching and Discourse Processing in Information Extraction from Japanese Text

Information extraction is the task of automatically picking up information of interest from an unconstrained text. Information of interest is usually extracted in two steps. First, sentence level processing locates relevant pieces of…

Artificial Intelligence · Computer Science 2008-02-03 T. Kitani, Y. Eriguchi, M. Hara

TRIE: End-to-End Text Reading and Information Extraction for Document Understanding

Since real-world ubiquitous documents (e.g., invoices, tickets, resumes and leaflets) contain rich information, automatic document image understanding has become a hot topic. Most existing works decouple the problem into two separate tasks,…

Computer Vision and Pattern Recognition · Computer Science 2021-10-26 Peng Zhang, Yunlu Xu, Zhanzhan Cheng, Shiliang Pu, Jing Lu, Liang Qiao, Yi Niu, Fei Wu

IE as Cache: Information Extraction Enhanced Agentic Reasoning

Information Extraction aims to distill structured, decision-relevant information from unstructured text, serving as a foundation for downstream understanding and reasoning. However, it is traditionally treated merely as a terminal…

Computation and Language · Computer Science 2026-04-17 Hang Lv, Sheng Liang, Hongchao Gu, Wei Guo, Defu Lian, Yong Liu, Hao Wang, Enhong Chen

Dictionary based methods for information extraction

In this paper we present a general method for information extraction that exploits the features of data compression techniques. We first define and focus our attention on the so-called "dictionary" of a sequence. Dictionaries are…

Statistical Mechanics · Physics 2009-11-10 A. Baronchelli, E. Caglioti, V. Loreto, E. Pizzi

Assessing the quality of information extraction

Advances in large language models have notably enhanced the efficiency of information extraction from unstructured and semi-structured data sources. As these technologies become integral to various applications, establishing an objective…

Computation and Language · Computer Science 2024-05-24 Filip Seitl, Tomáš Kovářík, Soheyla Mirshahi, Jan Kryštůfek, Rastislav Dujava, Matúš Ondreička, Herbert Ullrich, Petr Gronat

Approximate Grammar for Information Extraction

In this paper, we present the concept of Approximate grammar and how it can be used to extract information from a document. As the structure of informational strings cannot be defined well in a document, we cannot use the conventional…

Computation and Language · Computer Science 2007-05-23 V. Sriram, B. Ravi Sekar Reddy, R. Sangal

Open Information Extraction on Scientific Text: An Evaluation

Open Information Extraction (OIE) is the task of the unsupervised creation of structured information from text. OIE is often used as a starting point for a number of downstream tasks including knowledge base construction, relation…

Computation and Language · Computer Science 2018-08-23 Paul Groth, Michael Lauruhn, Antony Scerri, Ron Daniel

Unsupervised Data Extraction from Computer-generated Documents with Single Line Formatting

Processing large amounts of data is an essential problem of the big data era. Most of the data exchange is done via direct communication (using APIs) and well-structured file formats (JSON, XML, EDI, etc.), but a significant portion of the…

Information Retrieval · Computer Science 2020-07-17 Vladimir Bernstein, Andrei Afanassenkov

Document Spanners for Extracting Incomplete Information: Expressiveness and Complexity

Rule-based information extraction has lately received a fair amount of attention from the database community, with several languages appearing in the last few years. Although information extraction systems are intended to deal with…

Databases · Computer Science 2018-01-01 Francisco Maturana, Cristian Riveros, Domagoj Vrgoč

A language independent web data extraction using vision based page segmentation algorithm

Web usage mining is a process of extracting useful information from server logs, i.e., users' history. Web usage mining is a process of finding out what users are looking for on the internet. Some users might be looking at only textual data,…

Information Retrieval · Computer Science 2013-10-25 P YesuRaju, P KiranSree

Information Retrieval Model: A Social Network Extraction Perspective

Future Information Retrieval, especially in connection with the internet, will incorporate the content descriptions that are generated with social network extraction technologies and preferably incorporate the probability theory for…

Information Retrieval · Computer Science 2012-07-17 Mahyuddin K. M. Nasution, Shahrul Azman Noah

Natural Language Processing for Information Extraction

With the rise of the digital age, there is an explosion of information in the form of news, articles, social media, and so on. Much of this data lies in unstructured form and manually managing and effectively making use of it is tedious, boring and…

Computation and Language · Computer Science 2018-07-09 Sonit Singh

A Semi-automatic Data Extraction System for Heterogeneous Data Sources: A Case Study from Cotton Industry

With the recent developments in digitisation, there is an increasing number of documents available online. There are several information extraction tools that are available to extract information from digitised documents. However, identifying…

Information Retrieval · Computer Science 2021-11-08 Richi Nayak, Thirunavukarasu Balasubramaniam, Sangeetha Kutty, Sachindra Banduthilaka, Erin Peterson

Information Extraction Framework to Build Legislation Network

This paper concerns an Information Extraction process for building a dynamic Legislation Network from legal documents. Unlike supervised learning approaches which require additional calculations, the idea here is to apply Information…

Information Retrieval · Computer Science 2020-06-16 Neda Sakhaee, Mark C Wilson

Doc2Dict: Information Extraction as Text Generation

Typically, information extraction (IE) requires a pipeline approach: first, a sequence labeling model is trained on manually annotated documents to extract relevant spans; then, when a new document arrives, a model predicts spans which are…

Computation and Language · Computer Science 2021-10-12 Benjamin Townsend, Eamon Ito-Fisher, Lily Zhang, Madison May

Information Extraction from Unstructured data using Augmented-AI and Computer Vision

Information extraction (IE) from unstructured documents remains a critical challenge in data processing pipelines. Traditional optical character recognition (OCR) methods and conventional parsing engines demonstrate limited effectiveness…

Computer Vision and Pattern Recognition · Computer Science 2025-07-28 Aditya Parikh

An Information Extraction Approach to Prescreen Heart Failure Patients for Clinical Trials

To reduce the large amount of time spent screening, identifying, and recruiting patients into clinical trials, we need prescreening systems that are able to automate the data extraction and decision-making tasks that are typically relegated…

Computation and Language · Computer Science 2016-09-07 Abhishek Kalyan Adupa, Ravi Prakash Garg, Jessica Corona-Cox, Sanjiv J. Shah, Siddhartha R. Jonnalagadda

Improving Unstructured Data Quality via Updatable Extracted Views

Improving data quality in unstructured documents is a long-standing challenge. Unstructured data, especially in textual form, inherently lacks defined semantics, which poses significant challenges for effective processing and for ensuring…

Databases · Computer Science 2025-02-26 Besat Kassaie, Frank Wm. Tompa