Related papers: Deep Learning for Technical Document Classificatio…

SelfDoc: Self-Supervised Document Representation Learning

We propose SelfDoc, a task-agnostic pre-training framework for document image understanding. Because documents are multimodal and are intended for sequential reading, our framework exploits the positional, textual, and visual information of…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-08 · Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, Hongfu Liu

Multimodal Pre-training Based on Graph Attention Network for Document Understanding

Document intelligence, a relatively new research topic, supports many business applications. Its main task is to automatically read, understand, and analyze documents. However, due to the diversity of formats (invoices, reports, forms,…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-25 · Zhenrong Zhang, Jiefeng Ma, Jun Du, Licheng Wang, Jianshu Zhang

GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification

Visual document understanding (VDU) has rapidly advanced with the development of powerful multi-modal language models. However, these models typically require extensive document pre-training data to learn intermediate representations and…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-06 · Souhail Bakkali, Sanket Biswas, Zuheng Ming, Mickaël Coustaty, Marçal Rusiñol, Oriol Ramos Terrades, Josep Lladós

M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework

The ability to understand and answer questions over documents can be useful in many business and practical applications. However, documents often contain lengthy and diverse multimodal contents such as texts, figures, and tables, which are…

Computation and Language · Computer Science · 2024-11-12 · Yew Ken Chia, Liying Cheng, Hou Pong Chan, Chaoqun Liu, Maojia Song, Sharifah Mahani Aljunied, Soujanya Poria, Lidong Bing

VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification

Multimodal learning from document data has achieved great success lately, as it allows semantically meaningful features to be pre-trained as a prior for a learnable downstream task. In this paper, we approach the document classification problem…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-12 · Souhail Bakkali, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, Oriol Ramos Terrades

HDLTex: Hierarchical Deep Learning for Text Classification

The continually increasing number of documents produced each year necessitates ever improving information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document…

Machine Learning · Computer Science · 2018-03-29 · Kamran Kowsari, Donald E. Brown, Mojtaba Heidarysafa, Kiana Jafari Meimandi, Matthew S. Gerber, Laura E. Barnes

Generating Multilingual Documents from a Knowledge Base: The TECHDOC Project

TECHDOC is an implemented system demonstrating the feasibility of generating multilingual technical documents on the basis of a language-independent knowledge base. Its application domain is user and maintenance instructions, which are…

cmp-lg · Computer Science · 2008-02-03 · Dietmar Rösner, Manfred Stede

Unified Pretraining Framework for Document Understanding

Document intelligence automates the extraction of information from documents and supports many business applications. Recent self-supervised learning methods on large-scale unlabeled document datasets have opened up promising directions…

Computation and Language · Computer Science · 2022-04-29 · Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Nikolaos Barmpalios, Rajiv Jain, Ani Nenkova, Tong Sun

Multimodal deep networks for text and image-based document classification

Classification of document images is a critical step for archival of old manuscripts, online subscription and administrative procedures. Computer vision and deep learning have been suggested as a first solution to classify documents based…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-16 · Nicolas Audebert, Catherine Herold, Kuider Slimani, Cédric Vidal

Text Classification: A Perspective of Deep Learning Methods

In recent years, with the rapid development of information on the Internet, the number of complex texts and documents has increased exponentially, which requires a deeper understanding of deep learning methods in order to accurately…

Computation and Language · Computer Science · 2023-09-26 · Zhongwei Wan

Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs

In the field of document understanding, significant advances have been made in the fine-tuning of Multimodal Large Language Models (MLLMs) with instruction-following data. Nevertheless, the potential of text-grounding capability within…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-18 · Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li

FastDoc: Domain-Specific Fast Continual Pre-training Technique using Document-Level Metadata and Taxonomy

In this paper, we propose FastDoc (Fast Continual Pre-training Technique using Document-Level Metadata and Taxonomy), a novel, compute-efficient framework that utilizes document metadata and domain-specific taxonomy as supervision signals…

Computation and Language · Computer Science · 2024-11-04 · Abhilash Nandy, Manav Nitin Kapadnis, Sohan Patnaik, Yash Parag Butala, Pawan Goyal, Niloy Ganguly

BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks

Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require…

Machine Learning · Computer Science · 2025-03-18 · Juan Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte, François Savard, Ahmed Masry, Shravan Nayak, Rabiul Awal, Mahsa Massoud, Amirhossein Abaskohi, Zichao Li, Suyuchen Wang, Pierre-André Noël, Mats Leon Richter, Saverio Vadacchino, Shubham Agarwal, Sanket Biswas, Sara Shanian, Ying Zhang, Noah Bolger, Kurt MacDonald, Simon Fauvel, Sathwik Tejaswi, Srinivas Sunkara, Joao Monteiro, Krishnamurthy DJ Dvijotham, Torsten Scholak, Nicolas Chapados, Sepideh Kharagani, Sean Hughes, M. Özsu, Siva Reddy, Marco Pedersoli, Yoshua Bengio, Christopher Pal, Issam Laradji, Spandana Gella, Perouz Taslakian, David Vazquez, Sai Rajeswar

On Using Machine Learning to Identify Knowledge in API Reference Documentation

Using API reference documentation like JavaDoc is an integral part of software development. Previous research introduced a grounded taxonomy that organizes API documentation knowledge into 12 types, including knowledge about the…

Software Engineering · Computer Science · 2019-07-24 · Davide Fucci, Alireza Mollaalizadehbahnemiri, Walid Maalej

Multimodal Tree Decoder for Table of Contents Extraction in Document Images

Table of contents (ToC) extraction aims to extract headings of different levels in documents to better understand the outline of the contents, which can be widely used for document understanding and information retrieval. Existing works…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-07 · Pengfei Hu, Zhenrong Zhang, Jianshu Zhang, Jun Du, Jiajia Wu

A Multi-Modal Multilingual Benchmark for Document Image Classification

Document image classification is different from plain-text document classification and consists of classifying a document by understanding the content and structure of documents such as forms, emails, and other such documents. We show that…

Computation and Language · Computer Science · 2023-10-26 · Yoshinari Fujinuma, Siddharth Varia, Nishant Sankaran, Srikar Appalaraju, Bonan Min, Yogarshi Vyas

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding

In the era of Large Language Models (LLMs), tremendous strides have been made in the field of multimodal understanding. However, existing advanced algorithms are limited in effectively utilizing the immense representation capabilities and…

Artificial Intelligence · Computer Science · 2023-09-06 · Hao Feng, Zijian Wang, Jingqun Tang, Jinghui Lu, Wengang Zhou, Houqiang Li, Can Huang

Unfolding the Structure of a Document using Deep Learning

Understanding and extracting information from large documents, such as business opportunities, academic articles, medical documents, and technical reports, poses challenges not present in short documents. Such large documents may be…

Computation and Language · Computer Science · 2019-10-10 · Muhammad Mahbubur Rahman, Tim Finin

EmbraceNet: A robust deep learning architecture for multimodal classification

Classification using multimodal data arises in many machine learning applications. It is crucial not only to model cross-modal relationship effectively but also to ensure robustness against loss of part of data or modalities. In this paper,…

Machine Learning · Computer Science · 2019-04-22 · Jun-Ho Choi, Jong-Seok Lee

A Robust Hybrid Approach for Textual Document Classification

Text document classification is an important task for diverse natural language processing based applications. Traditional machine learning approaches mainly focused on reducing dimensionality of textual data to perform classification. This…

Computation and Language · Computer Science · 2019-09-13 · Muhammad Nabeel Asim, Muhammad Usman Ghani Khan, Muhammad Imran Malik, Andreas Dengel, Sheraz Ahmed