Related papers: TableLab: An Interactive Table Extraction System w…

TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images

With the widespread use of mobile phones and scanners to photograph and upload documents, the need for extracting the information trapped in unstructured document images such as retail receipts, insurance claim forms and financial invoices…

Computer Vision and Pattern Recognition · Computer Science 2020-01-07 Shubham Paliwal , Vishwanath D , Rohit Rahul , Monika Sharma , Lovekesh Vig

PdfTable: A Unified Toolkit for Deep Learning-Based Table Extraction

Currently, a substantial volume of document data exists in an unstructured format, encompassing Portable Document Format (PDF) files and images. Extracting information from these documents presents formidable challenges due to diverse table…

Computer Vision and Pattern Recognition · Computer Science 2024-09-10 Lei Sheng , Shuai-Shuai Xu

Table Detection in the Wild: A Novel Diverse Table Detection Dataset and Method

Recent deep learning approaches in table detection achieved outstanding performance and proved to be effective in identifying document layouts. Currently, available table detection benchmarks have many limitations, including the lack of…

Computer Vision and Pattern Recognition · Computer Science 2023-12-01 Mrinal Haloi , Shashank Shekhar , Nikhil Fande , Siddhant Swaroop Dash , Sanjay G

Financial Table Extraction in Image Documents

Table extraction has long been a pervasive problem in financial services. This is more challenging in the image domain, where content is locked behind cumbersome pixel format. Luckily, advances in deep learning for image segmentation, OCR,…

Computer Vision and Pattern Recognition · Computer Science 2024-05-10 William Watson , Bo Liu

Locating Tables in Scanned Documents for Reconstructing and Republishing (ICIAfS14)

Pool of knowledge available to the mankind depends on the source of learning resources, which can vary from ancient printed documents to present electronic material. The rapid conversion of material available in traditional libraries to…

Computer Vision and Pattern Recognition · Computer Science 2014-12-25 Akmal Jahan Mac , Roshan G Ragel

Flexible Table Recognition and Semantic Interpretation System

Table extraction is an important but still unsolved problem. In this paper, we introduce a flexible and modular table extraction system. We develop two rule-based algorithms that perform the complete table recognition process, including…

Computer Vision and Pattern Recognition · Computer Science 2021-12-03 Marcin Namysl , Alexander M. Esser , Sven Behnke , Joachim Köhler

The Benefits of Close-Domain Fine-Tuning for Table Detection in Document Images

A correct localisation of tables in a document is instrumental for determining their structure and extracting their contents; therefore, table detection is a key step in table understanding. Nowadays, the most successful methods for table…

Computer Vision and Pattern Recognition · Computer Science 2019-12-13 Ángela Casado-García , César Domínguez , Jónathan Heras , Eloy Mata , Vico Pascual

Deep Structured Feature Networks for Table Detection and Tabular Data Extraction from Scanned Financial Document Images

Automatic table detection in PDF documents has achieved a great success but tabular data extraction are still challenging due to the integrity and noise issues in detected table areas. The accurate data extraction is extremely crucial in…

Computation and Language · Computer Science 2022-05-24 Siwen Luo , Mengting Wu , Yiwen Gong , Wanying Zhou , Josiah Poon

Table understanding in structured documents

Abstract--- Table detection and extraction has been studied in the context of documents like reports, where tables are clearly outlined and stand out from the document structure visually. We study this topic in a rather more challenging…

Information Retrieval · Computer Science 2021-08-20 Martin Holeček , Antonín Hoskovec , Petr Baudiš , Pavel Klinger

Efficient Table Retrieval and Understanding with Multimodal Large Language Models

Tabular data is frequently captured in image form across a wide range of real-world scenarios such as financial reports, handwritten records, and document scans. These visual representations pose unique challenges for machine understanding,…

Artificial Intelligence · Computer Science 2026-02-10 Zhuoyan Xu , Haoyang Fang , Boran Han , Bonan Min , Bernie Wang , Cuixiong Hu , Shuai Zhang

TableBank: A Benchmark Dataset for Table Detection and Recognition

We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and Latex documents on the internet. Existing research for image-based table detection and recognition usually…

Computer Vision and Pattern Recognition · Computer Science 2020-07-07 Minghao Li , Lei Cui , Shaohan Huang , Furu Wei , Ming Zhou , Zhoujun Li

TabAug: Data Driven Augmentation for Enhanced Table Structure Recognition

Table Structure Recognition is an essential part of end-to-end tabular data extraction in document images. The recent success of deep learning model architectures in computer vision remains to be non-reflective in table structure…

Computer Vision and Pattern Recognition · Computer Science 2021-05-18 Umar Khan , Sohaib Zahid , Muhammad Asad Ali , Adnan ul Hassan , Faisal Shafait

Graph Neural Networks and Representation Embedding for Table Extraction in PDF Documents

Tables are widely used in several types of documents since they can bring important information in a structured way. In scientific papers, tables can sum up novel discoveries and summarize experimental results, making the research…

Computer Vision and Pattern Recognition · Computer Science 2023-02-21 Andrea Gemelli , Emanuele Vivoli , Simone Marinai

Benchmarking Table Extraction from Heterogeneous Scientific Extraction Documents

Table Extraction (TE) consists in extracting tables from PDF documents, in a structured format which can be automatically processed. While numerous TE tools exist, the variety of methods and techniques makes it difficult for users to choose…

Databases · Computer Science 2025-11-21 Marijan Soric , Cécile Gracianne , Ioana Manolescu , Pierre Senellart

SynFinTabs: A Dataset of Synthetic Financial Tables for Information and Table Extraction

Table extraction from document images is a challenging AI problem, and labelled data for many content domains is difficult to come by. Existing table extraction datasets often focus on scientific tables due to the vast amount of academic…

Machine Learning · Computer Science 2024-12-06 Ethan Bradley , Muhammad Roman , Karen Rafferty , Barry Devereux

Visual Understanding of Complex Table Structures from Document Images

Table structure recognition is necessary for a comprehensive understanding of documents. Tables in unstructured business documents are tough to parse due to the high diversity of layouts, varying alignments of contents, and the presence of…

Computer Vision and Pattern Recognition · Computer Science 2021-11-16 Sachin Raja , Ajoy Mondal , C V Jawahar

Attend, Copy, Parse -- End-to-end information extraction from documents

Document information extraction tasks performed by humans create data consisting of a PDF or document image input, and extracted string outputs. This end-to-end data is naturally consumed and produced when performing the task because it is…

Computation and Language · Computer Science 2021-04-26 Rasmus Berg Palm , Florian Laws , Ole Winther

Deep learning for table detection and structure recognition: A survey

Tables are everywhere, from scientific journals, papers, websites, and newspapers all the way to items we buy at the supermarket. Detecting them is thus of utmost importance to automatically understanding the content of a document. The…

Computer Vision and Pattern Recognition · Computer Science 2022-11-17 Mahmoud Kasem , Abdelrahman Abdallah , Alexander Berendeyev , Ebrahem Elkady , Mahmoud Abdalla , Mohamed Mahmoud , Mohamed Hamada , Daniyar Nurseitov , Islam Taj-Eddin

TableParser: Automatic Table Parsing with Weak Supervision from Spreadsheets

Tables have been an ever-existing structure to store data. There exist now different approaches to store tabular data physically. PDFs, images, spreadsheets, and CSVs are leading examples. Being able to parse table structures and extract…

Computer Vision and Pattern Recognition · Computer Science 2022-01-06 Susie Xi Rao , Johannes Rausch , Peter Egger , Ce Zhang

Tablext: A Combined Neural Network And Heuristic Based Table Extractor

A significant portion of the data available today is found within tables. Therefore, it is necessary to use automated table extraction to obtain thorough results when data-mining. Today's popular state-of-the-art methods for table…

Information Retrieval · Computer Science 2021-04-26 Zach Colter , Morteza Fayazi , Zineb Benameur-El , Serafina Kamp , Shuyan Yu , Ronald Dreslinski