
Related papers: LAMBERT: Layout-Aware (Language) Modeling for info…

200 papers

Transformer-based Language Models are widely used in Natural Language Processing related tasks. Thanks to their pre-training, they have been successfully adapted to Information Extraction in business documents. However, most pre-training…

Computation and Language · Computer Science 2023-09-12 Thibault Douzon, Stefan Duffner, Christophe Garcia, Jérémy Espinas

We address the challenging problem of Natural Language Comprehension beyond plain-text documents by introducing the TILT neural network architecture which simultaneously learns layout information, visual features, and textual semantics.…

Computation and Language · Computer Science 2021-07-13 Rafał Powalski, Łukasz Borchmann, Dawid Jurkiewicz, Tomasz Dwojak, Michał Pietruszka, Gabriela Pałka

Document layout comprises both structural and visual (e.g., font sizes) information that is vital but often ignored by machine learning models. The few existing models which do use layout information only consider textual contents, and…

Computation and Language · Computer Science 2021-04-20 Te-Lin Wu, Cheng Li, Mingyang Zhang, Tao Chen, Spurthi Amba Hombaiah, Michael Bendersky

Building document-grounded dialogue systems has received growing interest as documents convey a wealth of human knowledge and commonly exist in enterprises. Wherein, how to comprehend and retrieve information from documents is a…

Computation and Language · Computer Science 2022-07-15 Zhenyu Zhang, Bowen Yu, Haiyang Yu, Tingwen Liu, Cheng Fu, Jingyang Li, Chengguang Tang, Jian Sun, Yongbin Li

We propose a novel multimodal architecture for Scene Text Visual Question Answering (STVQA), named Layout-Aware Transformer (LaTr). The task of STVQA requires models to reason over different modalities. Thus, we first investigate the impact…

Computer Vision and Pattern Recognition · Computer Science 2021-12-28 Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha

This paper defines and explores the design space for information extraction (IE) from layout-rich documents using large language models (LLMs). The three core challenges of layout-aware IE with LLMs are 1) data structuring, 2) model…

Computation and Language · Computer Science 2026-02-04 Gaye Colakoglu, Gürkan Solmaz, Jonathan Fürst

Many business documents processed in modern NLP and IR pipelines are visually rich: in addition to text, their semantics can also be captured by visual traits such as layout, format, and fonts. We study the problem of information extraction…

Computation and Language · Computer Science 2020-05-25 Mengxi Wei, Yifan He, Qiong Zhang

Information extraction (IE) from documents is an intensive area of research with a large set of industrial applications. Current state-of-the-art methods focus on scanned documents with approaches combining computer vision, natural language…

Computation and Language · Computer Science 2022-08-16 Ismail Oussaid, William Vanhuffel, Pirashanth Ratnamogan, Mhamed Hajaiej, Alexis Mathey, Thomas Gilles

Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a…

Computation and Language · Computer Science 2024-01-03 Dongsheng Wang, Natraj Raman, Mathieu Sibue, Zhiqiang Ma, Petr Babkin, Simerjot Kaur, Yulong Pei, Armineh Nourbakhsh, Xiaomo Liu

Making an informed choice of pre-trained language model (LM) is critical for performance, yet environmentally costly, and as such widely underexplored. The field of Computer Vision has begun to tackle encoder ranking, with promising forays…

Computation and Language · Computer Science 2022-06-13 Max Müller-Eberstein, Rob van der Goot, Barbara Plank

Document layout understanding is a field of study that analyzes the spatial arrangement of information in a document hoping to understand its structure and layout. Models such as LayoutLM (and its subsequent iterations) can understand…

Computation and Language · Computer Science 2025-01-13 Pablo Melendez, Clemens Havas

This paper introduces a deep learning model tailored for document information analysis, emphasizing document classification, entity relation extraction, and document visual question answering. The proposed model leverages transformer-based…

Computer Vision and Pattern Recognition · Computer Science 2023-10-26 Tofik Ali, Partha Pratim Roy

The volume of academic literature, such as academic conference papers and journals, has increased rapidly worldwide, and research on metadata extraction is ongoing. However, high-performing metadata extraction is still challenging due to…

Machine Learning · Computer Science 2021-12-24 Jongyun Choi, Hyesoo Kong, Hwamook Yoon, Heung-Seon Oh, Yuchul Jung

Modeling and leveraging layout reading order in visually-rich documents (VrDs) is critical in document intelligence as it captures the rich structure semantics within documents. Previous works typically formulated layout reading order as a…

Computation and Language · Computer Science 2024-10-01 Chong Zhang, Yi Tu, Yixi Zhao, Chenshu Yuan, Huan Chen, Yue Zhang, Mingxu Chai, Ya Guo, Huijia Zhu, Qi Zhang, Tao Gui

Automated resume information extraction is critical for scaling talent acquisition, yet its real-world deployment faces three major challenges: the extreme heterogeneity of resume layouts and content, the high cost and latency of large…

Computation and Language · Computer Science 2025-10-14 Fanwei Zhu, Jinke Yu, Zulong Chen, Ying Zhou, Junhao Ji, Zhibo Yang, Yuxue Zhang, Haoyuan Hu, Zhenghao Liu

On a wide range of natural language processing and information retrieval tasks, transformer-based models, particularly pre-trained language models like BERT, have demonstrated tremendous effectiveness. Due to the quadratic complexity of the…

Information Retrieval · Computer Science 2022-10-18 Minghan Li, Diana Nicoleta Popa, Johan Chagnon, Yagmur Gizem Cinar, Eric Gaussier

In recent years, the use of multi-modal pre-trained Transformers has led to significant advancements in visually-rich document understanding. However, existing models have mainly focused on features such as text and vision while neglecting…

Computation and Language · Computer Science 2023-08-16 Qiwei Li, Zuchao Li, Xiantao Cai, Bo Du, Hai Zhao

Structured document understanding has attracted considerable attention and made significant progress recently, owing to its crucial role in intelligent document processing. However, most existing related models can only deal with the…

Computation and Language · Computer Science 2022-03-01 Jiapeng Wang, Lianwen Jin, Kai Ding

Document layout analysis (DLA) is crucial for understanding the physical layout and logical structure of documents, serving information retrieval, document summarization, knowledge extraction, etc. However, previous studies have typically…

Computer Vision and Pattern Recognition · Computer Science 2024-05-21 Jiawei Wang, Kai Hu, Qiang Huo

Key information extraction (KIE) from document images requires understanding the contextual and spatial semantics of texts in two-dimensional (2D) space. Many recent studies try to solve the task by developing pre-trained language models…

Computation and Language · Computer Science 2022-04-06 Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park