Related papers: Data-Efficient Information Extraction from Form-Li…

200 papers

This paper introduces a new information extraction model for business documents. Different from prior studies which only base on span extraction or sequence labeling, the model takes into account advantage of both span extraction and…

Computation and Language · Computer Science 2022-05-27 Nguyen Hong Son, Hieu M. Vu, Tuan-Anh D. Nguyen, Minh-Tien Nguyen

Transformer-based Language Models are widely used in Natural Language Processing related tasks. Thanks to their pre-training, they have been successfully adapted to Information Extraction in business documents. However, most pre-training…

Computation and Language · Computer Science 2023-09-12 Thibault Douzon, Stefan Duffner, Christophe Garcia, Jérémy Espinas

Information extraction from copy-heavy documents, characterized by massive volumes of structurally similar content, represents a critical yet understudied challenge in enterprise document processing. We present a systematic framework that…

Computation and Language · Computer Science 2025-10-14 Zilong Wang, Xiaoyu Shen

In this paper, we explore the question of whether large language models can support cost-efficient information extraction from tables. We introduce schema-driven information extraction, a new task that transforms tabular data into…

Computation and Language · Computer Science 2024-11-22 Fan Bai, Junmo Kang, Gabriel Stanovsky, Dayne Freitag, Mark Dredze, Alan Ritter

Techniques for automatically extracting important content elements from business documents such as contracts, statements, and filings have the potential to make business operations more efficient. This problem can be formulated as a…

Computation and Language · Computer Science 2020-02-06 Ruixue Zhang, Wei Yang, Luyun Lin, Zhengkai Tu, Yuqing Xie, Zihang Fu, Yuhao Xie, Luchen Tan, Kun Xiong, Jimmy Lin

Extracting key information from documents represents a large portion of business workloads and therefore offers a high potential for efficiency improvements and process automation. With recent advances in Deep Learning, a plethora of Deep…

Information Retrieval · Computer Science 2025-07-21 Alexander Michael Rombach, Peter Fettke

Over the past few decades, the amount of scientific articles and technical literature has increased exponentially in size. Consequently, there is a great need for systems that can ingest these documents at scale and make their content…

Digital Libraries · Computer Science 2018-05-25 Peter W J Staar, Michele Dolfi, Christoph Auer, Costas Bekas

This paper presents a practical approach to fine-grained information extraction. Through plenty of experiences of authors in practically applying information extraction to business process automation, there can be found a couple of…

Information Retrieval · Computer Science 2020-06-09 Minh-Tien Nguyen, Viet-Anh Phan, Le Thai Linh, Nguyen Hong Son, Le Tien Dung, Miku Hirano, Hajime Hotta

Information extraction (IE) for visually-rich documents (VRDs) has achieved SOTA performance recently thanks to the adaptation of Transformer-based language models, which shows the great potential of pre-training methods. In this paper, we…

Artificial Intelligence · Computer Science 2021-07-07 Tuan-Anh D. Nguyen, Hieu M. Vu, Nguyen Hong Son, Minh-Tien Nguyen

The automation of document processing is gaining recent attention due to the great potential to reduce manual work through improved methods and hardware. Neural networks have been successfully applied before - even though they have been…

Computation and Language · Computer Science 2021-06-15 Martin Holeček

Bioinformatics workflows are essential for complex biological data analyses and are often described in scientific articles with source code in public repositories. Extracting detailed workflow information from articles can improve…

Computation and Language · Computer Science 2025-03-11 Clémence Sebe, Sarah Cohen-Boulakia, Olivier Ferret, Aurélie Névéol

Extracting information from documents usually relies on natural language processing methods working on one-dimensional sequences of text. In some cases, for example, for the extraction of key information from semi-structured documents, such…

Computation and Language · Computer Science 2021-06-29 Oliver Bensch, Mirela Popa, Constantin Spille

Information extraction (IE) from documents is an intensive area of research with a large set of industrial applications. Current state-of-the-art methods focus on scanned documents with approaches combining computer vision, natural language…

Computation and Language · Computer Science 2022-08-16 Ismail Oussaid, William Vanhuffel, Pirashanth Ratnamogan, Mhamed Hajaiej, Alexis Mathey, Thomas Gilles

Processing large amounts of data is an essential problem of the big data era. Most of the data exchange is done via direct communication (using APIs) and well-structured file formats (JSON, XML, EDI, etc.), but a significant portion of the…

Information Retrieval · Computer Science 2020-07-17 Vladimir Bernstein, Andrei Afanassenkov

Span extraction, aiming to extract text spans (such as words or phrases) from plain texts, is a fundamental process in Information Extraction. Recent works introduce the label knowledge to enhance the text representation by formalizing the…

Computation and Language · Computer Science 2021-11-02 Pan Yang, Xin Cong, Zhenyun Sun, Xingwu Liu

Advances in large language models have notably enhanced the efficiency of information extraction from unstructured and semi-structured data sources. As these technologies become integral to various applications, establishing an objective…

Successful Artificial Intelligence systems often require numerous labeled data to extract information from document images. In this paper, we investigate the problem of improving the performance of Artificial Intelligence systems in…

Information Retrieval · Computer Science 2022-09-27 Bao-Sinh Nguyen, Dung Tien Le, Hieu M. Vu, Tuan Anh D. Nguyen, Minh-Tien Nguyen, Hung Le

Traditional information retrieval (such as that offered by web search engines) impedes users with information overload from extensive result pages and the need to manually locate the desired information therein. Conversely,…

Computation and Language · Computer Science 2019-03-11 Bernhard Kratzwald, Stefan Feuerriegel

Procedures are an important knowledge component of documents that can be leveraged by cognitive assistants for automation, question-answering or driving a conversation. It is a challenging problem to parse big dense documents like product…

Artificial Intelligence · Computer Science 2020-10-21 Shivali Agarwal, Shubham Atreja, Vikas Agarwal

Extracting structured information from HTML documents is a long-studied problem with a broad range of applications, including knowledge base construction, faceted search, and personalized recommendation. Prior works rely on a few…

Information Retrieval · Computer Science 2022-08-30 Ritesh Sarkhel, Binxuan Huang, Colin Lockard, Prashant Shiralkar