Related papers: DOM-LM: Learning Generalizable Representations for…

200 papers

The growing prevalence of visually rich documents, such as webpages and scanned/digital-born documents (images, PDFs, etc.), has led to increased interest in automatic document understanding and information extraction across academia and…

Computation and Language · Computer Science · 2024-02-29 · Hongshen Xu, Lu Chen, Zihan Zhao, Da Ma, Ruisheng Cao, Zichen Zhu, Kai Yu

This paper proposes LayoutLLM, a more flexible document analysis method for understanding imaged documents. Visually Rich Document Understanding tasks, such as document image classification and information extraction, have gained…

Computation and Language · Computer Science · 2024-03-22 · Masato Fujitake

Large language models (LLMs) that have been trained on a corpus that includes a large amount of code exhibit a remarkable ability to understand HTML code. As web interfaces are primarily constructed using HTML, we design an in-depth study to…

Computation and Language · Computer Science · 2023-12-12 · Faria Huq, Jeffrey P. Bigham, Nikolas Martelaro

Enterprise documents, such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a…

Computation and Language · Computer Science · 2024-01-03 · Dongsheng Wang, Natraj Raman, Mathieu Sibue, Zhiqiang Ma, Petr Babkin, Simerjot Kaur, Yulong Pei, Armineh Nourbakhsh, Xiaomo Liu

Large Vision-Language Models (LVLMs) have demonstrated strong multimodal reasoning capabilities on long and complex documents. However, their high memory footprint makes them impractical for deployment on resource-constrained edge devices.…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-24 · Tanveer Hannan, Dimitrios Mallios, Parth Pathak, Faegheh Sardari, Thomas Seidl, Gedas Bertasius, Mohsen Fayyaz, Sunando Sengupta

Large language models (LLMs) have shown exceptional performance on a variety of natural language tasks. Yet, their capabilities for HTML understanding -- i.e., parsing the raw HTML of a webpage, with applications to automation of web-based…

In recent years, the use of multi-modal pre-trained Transformers has led to significant advancements in visually-rich document understanding. However, existing models have mainly focused on features such as text and vision while neglecting…

Computation and Language · Computer Science · 2023-08-16 · Qiwei Li, Zuchao Li, Xiantao Cai, Bo Du, Hai Zhao

Document parsing (DP) transforms unstructured or semi-structured documents into structured, machine-readable representations, enabling downstream applications such as knowledge base construction and retrieval-augmented generation (RAG).…

Recent approaches in literature have exploited the multi-modal information in documents (text, layout, image) to serve specific downstream document tasks. However, they are limited by their - (i) inability to learn cross-modal…

Computation and Language · Computer Science · 2022-01-06 · Subhojeet Pramanik, Shashank Mujumdar, Hima Patel

Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while…

Computation and Language · Computer Science · 2020-06-17 · Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou

The number of web pages is growing at an exponential rate, accumulating massive amounts of data on the web. Classifying webpages is one of the key processes in web information mining. Some classical methods are based on manually building…

Computation and Language · Computer Science · 2023-05-10 · Qiwei Lang, Jingbo Zhou, Haoyi Wang, Shiqi Lyu, Rui Zhang

Text classification is a fundamental task in NLP applications. Latest research in this field has largely been divided into two major sub-fields. Learning representations is one sub-field and learning deeper models, both sequential and…

Computation and Language · Computer Science · 2018-11-09 · Mithun Das Gupta

Retrieval-Augmented Generation (RAG) has been shown to improve knowledge capabilities and alleviate the hallucination problem of LLMs. The Web is a major source of external knowledge used in RAG systems, and many commercial RAG systems have…

Information Retrieval · Computer Science · 2025-02-10 · Jiejun Tan, Zhicheng Dou, Wen Wang, Mang Wang, Weipeng Chen, Ji-Rong Wen

Recently, leveraging large language models (LLMs) or multimodal large language models (MLLMs) for document understanding has been proven very promising. However, previous works that employ LLMs/MLLMs for document understanding have not…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-09 · Chuwei Luo, Yufan Shen, Zhaoqing Zhu, Qi Zheng, Zhi Yu, Cong Yao

We introduce HTLM, a hyper-text language model trained on a large-scale web crawl. Modeling hyper-text has a number of advantages: (1) it is easily gathered at scale, (2) it provides rich document-level and end-task-adjacent supervision…

Computation and Language · Computer Science · 2021-07-16 · Armen Aghajanyan, Dmytro Okhonko, Mike Lewis, Mandar Joshi, Hu Xu, Gargi Ghosh, Luke Zettlemoyer

Extracting structured data from HTML documents is a long-studied problem with a broad range of applications like augmenting knowledge bases, supporting faceted search, and providing domain-specific experiences for key verticals like…

Computation and Language · Computer Science · 2020-10-22 · Bill Yuchen Lin, Ying Sheng, Nguyen Vo, Sandeep Tata

Users demand fast, seamless webpage experiences, yet developers often struggle to meet these expectations within tight constraints. Performance optimization, while critical, is a time-consuming and often manual process. One of the most…

Software Engineering · Computer Science · 2026-01-12 · Gideon Peters, SayedHassan Khatoonabadi, Emad Shihab

Large Language Models (LLMs) demonstrate remarkable capabilities in replicating human tasks and boosting productivity. However, their direct application for data extraction presents limitations due to a prioritisation of fluency over…

Computation and Language · Computer Science · 2024-06-13 · Aman Ahluwalia, Suhrud Wani

Diagrams play a crucial role in visually conveying complex relationships and processes within business documentation. Despite recent advances in Vision-Language Models (VLMs) for various image understanding tasks, accurately identifying and…

Software Engineering · Computer Science · 2025-02-10 · Shue Shiinoki, Ryo Koshihara, Hayato Motegi, Masumi Morishige

Recent methods that integrate spatial layouts with text for document understanding in large language models (LLMs) have shown promising results. A commonly used method is to represent layout information as text tokens and interleave them…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-25 · Zhaoqing Zhu, Chuwei Luo, Zirui Shao, Feiyu Gao, Hangdi Xing, Qi Zheng, Ji Zhang