Related papers: Deep Learning for Technical Document Classificatio…

200 papers

We propose SelfDoc, a task-agnostic pre-training framework for document image understanding. Because documents are multimodal and are intended for sequential reading, our framework exploits the positional, textual, and visual information of…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-08 · Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, Hongfu Liu

Document intelligence as a relatively new research topic supports many business applications. Its main task is to automatically read, understand, and analyze documents. However, due to the diversity of formats (invoices, reports, forms,…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-25 · Zhenrong Zhang, Jiefeng Ma, Jun Du, Licheng Wang, Jianshu Zhang

Visual document understanding (VDU) has rapidly advanced with the development of powerful multi-modal language models. However, these models typically require extensive document pre-training data to learn intermediate representations and…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-06 · Souhail Bakkali, Sanket Biswas, Zuheng Ming, Mickaël Coustaty, Marçal Rusiñol, Oriol Ramos Terrades, Josep Lladós

The ability to understand and answer questions over documents can be useful in many business and practical applications. However, documents often contain lengthy and diverse multimodal contents such as texts, figures, and tables, which are…

Computation and Language · Computer Science · 2024-11-12 · Yew Ken Chia, Liying Cheng, Hou Pong Chan, Chaoqun Liu, Maojia Song, Sharifah Mahani Aljunied, Soujanya Poria, Lidong Bing

Multimodal learning from document data has achieved great success lately, as it allows pre-training semantically meaningful features as a prior for a learnable downstream task. In this paper, we approach the document classification problem…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-12 · Souhail Bakkali, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, Oriol Ramos Terrades

The continually increasing number of documents produced each year necessitates ever improving information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document…

TECHDOC is an implemented system demonstrating the feasibility of generating multilingual technical documents on the basis of a language-independent knowledge base. Its application domain is user and maintenance instructions, which are…

cmp-lg · Computer Science · 2008-02-03 · Dietmar Rösner, Manfred Stede

Document intelligence automates the extraction of information from documents and supports many business applications. Recent self-supervised learning methods on large-scale unlabeled document datasets have opened up promising directions…

Computation and Language · Computer Science · 2022-04-29 · Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Nikolaos Barmpalios, Rajiv Jain, Ani Nenkova, Tong Sun

Classification of document images is a critical step for archival of old manuscripts, online subscription and administrative procedures. Computer vision and deep learning have been suggested as a first solution to classify documents based…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-16 · Nicolas Audebert, Catherine Herold, Kuider Slimani, Cédric Vidal

In recent years, with the rapid development of information on the Internet, the number of complex texts and documents has increased exponentially, which requires a deeper understanding of deep learning methods in order to accurately…

Computation and Language · Computer Science · 2023-09-26 · Zhongwei Wan

In the field of document understanding, significant advances have been made in the fine-tuning of Multimodal Large Language Models (MLLMs) with instruction-following data. Nevertheless, the potential of text-grounding capability within…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-18 · Yonghui Wang, Wengang Zhou, Hao Feng, Keyi Zhou, Houqiang Li

In this paper, we propose FastDoc (Fast Continual Pre-training Technique using Document Level Metadata and Taxonomy), a novel, compute-efficient framework that utilizes Document metadata and Domain-Specific Taxonomy as supervision signals…

Computation and Language · Computer Science · 2024-11-04 · Abhilash Nandy, Manav Nitin Kapadnis, Sohan Patnaik, Yash Parag Butala, Pawan Goyal, Niloy Ganguly

Using API reference documentation like JavaDoc is an integral part of software development. Previous research introduced a grounded taxonomy that organizes API documentation knowledge in 12 types, including knowledge about the…

Software Engineering · Computer Science · 2019-07-24 · Davide Fucci, Alireza Mollaalizadehbahnemiri, Walid Maalej

Table of contents (ToC) extraction aims to extract headings of different levels in documents to better understand the outline of the contents, which can be widely used for document understanding and information retrieval. Existing works…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-07 · Pengfei Hu, Zhenrong Zhang, Jianshu Zhang, Jun Du, Jiajia Wu

Document image classification is different from plain-text document classification and consists of classifying a document by understanding the content and structure of documents such as forms, emails, and other such documents. We show that…

Computation and Language · Computer Science · 2023-10-26 · Yoshinari Fujinuma, Siddharth Varia, Nishant Sankaran, Srikar Appalaraju, Bonan Min, Yogarshi Vyas

In the era of Large Language Models (LLMs), tremendous strides have been made in the field of multimodal understanding. However, existing advanced algorithms are limited to effectively utilizing the immense representation capabilities and…

Artificial Intelligence · Computer Science · 2023-09-06 · Hao Feng, Zijian Wang, Jingqun Tang, Jinghui Lu, Wengang Zhou, Houqiang Li, Can Huang

Understanding and extracting information from large documents, such as business opportunities, academic articles, medical documents and technical reports, poses challenges not present in short documents. Such large documents may be…

Computation and Language · Computer Science · 2019-10-10 · Muhammad Mahbubur Rahman, Tim Finin

Classification using multimodal data arises in many machine learning applications. It is crucial not only to model cross-modal relationship effectively but also to ensure robustness against loss of part of data or modalities. In this paper,…

Machine Learning · Computer Science · 2019-04-22 · Jun-Ho Choi, Jong-Seok Lee

Text document classification is an important task for diverse natural language processing based applications. Traditional machine learning approaches mainly focused on reducing dimensionality of textual data to perform classification. This…

Computation and Language · Computer Science · 2019-09-13 · Muhammad Nabeel Asim, Muhammad Usman Ghani Khan, Muhammad Imran Malik, Andreas Dengel, Sheraz Ahmed