Related papers: S2Doc -- Spatial-Semantic Document Format

SelfDoc: Self-Supervised Document Representation Learning

We propose SelfDoc, a task-agnostic pre-training framework for document image understanding. Because documents are multimodal and are intended for sequential reading, our framework exploits the positional, textual, and visual information of…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-08 · Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, Hongfu Liu

SimDoc: Topic Sequence Alignment based Document Similarity Framework

Document similarity is the problem of estimating the degree to which a given pair of documents has similar semantic content. An accurate document similarity measure can improve several enterprise relevant tasks such as document clustering,…

Computation and Language · Computer Science · 2017-11-15 · Gaurav Maheshwari, Priyansh Trivedi, Harshita Sahijwani, Kunal Jha, Sourish Dasgupta, Jens Lehmann

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

We introduce SmolDocling, an ultra-compact vision-language model targeting end-to-end document conversion. Our model comprehensively processes entire pages by generating DocTags, a new universal markup format that captures all page elements…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-17 · Ahmed Nassar, Andres Marafioti, Matteo Omenetti, Maksym Lysak, Nikolaos Livathinos, Christoph Auer, Lucas Morin, Rafael Teixeira de Lima, Yusik Kim, A. Said Gurbuz, Michele Dolfi, Miquel Farré, Peter W. J. Staar

From Surface to Semantics: Semantic Structure Parsing for Table-Centric Document Analysis

Documents are core carriers of information and knowledge, with broad applications in finance, healthcare, and scientific research. Tables, as the main medium for structured data, encapsulate key information and are among the most critical…

Computation and Language · Computer Science · 2025-08-15 · Xuan Li, Jialiang Dong, Raymond Wong

FlexDoc: Flexible Document Adaptation through Optimizing both Content and Layout

Designing adaptive documents that are visually appealing across various devices and for diverse viewers is a challenging task. This is due to the wide variety of devices and different viewer requirements and preferences. Alterations to a…

Human-Computer Interaction · Computer Science · 2024-10-22 · Yue Jiang, Christof Lutteroth, Rajiv Jain, Christopher Tensmeyer, Varun Manjunatha, Wolfgang Stuerzlinger, Vlad Morariu

DocLLM: A layout-aware generative language model for multimodal document understanding

Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a…

Computation and Language · Computer Science · 2024-01-03 · Dongsheng Wang, Natraj Raman, Mathieu Sibue, Zhiqiang Ma, Petr Babkin, Simerjot Kaur, Yulong Pei, Armineh Nourbakhsh, Xiaomo Liu

Spatial Information Integration in Small Language Models for Document Layout Generation and Classification

Document layout understanding is a field of study that analyzes the spatial arrangement of information in a document hoping to understand its structure and layout. Models such as LayoutLM (and its subsequent iterations) can understand…

Computation and Language · Computer Science · 2025-01-13 · Pablo Melendez, Clemens Havas

M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework

The ability to understand and answer questions over documents can be useful in many business and practical applications. However, documents often contain lengthy and diverse multimodal contents such as texts, figures, and tables, which are…

Computation and Language · Computer Science · 2024-11-12 · Yew Ken Chia, Liying Cheng, Hou Pong Chan, Chaoqun Liu, Maojia Song, Sharifah Mahani Aljunied, Soujanya Poria, Lidong Bing

Simple is not Enough: Document-level Text Simplification using Readability and Coherence

In this paper, we present the SimDoc system, a simplification model considering simplicity, readability, and discourse aspects, such as coherence. In the past decade, the progress of the Text Simplification (TS) field has been mostly shown…

Computation and Language · Computer Science · 2024-12-30 · Laura Vásquez-Rodríguez, Nhung T. H. Nguyen, Piotr Przybyła, Matthew Shardlow, Sophia Ananiadou

S2 Chunking: A Hybrid Framework for Document Segmentation Through Integrated Spatial and Semantic Analysis

Document chunking is a critical task in natural language processing (NLP) that involves dividing a document into meaningful segments. Traditional methods often rely solely on semantic analysis, ignoring the spatial layout of elements, which…

Computation and Language · Computer Science · 2025-01-13 · Prashant Verma

From 2D Document Interactions into Immersive Information Experience: An Example-Based Design by Augmenting Content, Spatializing Placement, Enriching Long-Term Interactions, and Simplifying Content Creations

Documents serve as a crucial and indispensable medium for everyday workplace tasks. However, understanding, interacting with, and creating such documents on today's planar interfaces without any intelligent support are challenging due to our…

Human-Computer Interaction · Computer Science · 2024-11-19 · Chen Chen

Spatial and Spatio-Temporal Multidimensional Data Modelling: A Survey

Data warehouses store and provide access to large volumes of historical data supporting the strategic decisions of organisations. A data warehouse is based on a multidimensional model, which makes it possible to express users' needs for supporting the…

Databases · Computer Science · 2012-08-02 · Saida Aissi, Mohamed Salah Gouider

FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models

Developing document understanding models at enterprise scale requires large, diverse, and well-annotated datasets spanning a wide range of document types. However, collecting such data is prohibitively expensive due to privacy constraints,…

Artificial Intelligence · Computer Science · 2025-10-03 · Karan Dua, Hitesh Laxmichand Patel, Puneet Mittal, Ranjeet Gupta, Amit Agarwal, Praneet Pabolu, Srikant Panda, Hansa Meghwani, Graham Horwood, Fahad Shah

Space-time databases modeling global semantic networks

This paper presents an approach to creating global knowledge systems, using a new philosophy and infrastructure of a global distributed semantic network (frame knowledge representation system) based on space-time database construction.…

Information Theory · Computer Science · 2007-07-16 · A. A. Prikhod'ko, N. A. Prikhod'ko

Ranking Archived Documents for Structured Queries on Semantic Layers

Archived collections of documents (like newspaper and web archives) serve as important information sources in a variety of disciplines, including Digital Humanities, Historical Science, and Journalism. However, the absence of efficient and…

Information Retrieval · Computer Science · 2021-07-30 · Pavlos Fafalios, Vaibhav Kasturia, Wolfgang Nejdl

sTeX+ - a System for Flexible Formalization of Linked Data

We present the sTeX+ system, a user-driven advancement of sTeX - a semantic extension of LaTeX that allows for producing high-quality PDF documents for (proof)reading and printing, as well as semantic XML/OMDoc documents for the Web or…

Software Engineering · Computer Science · 2010-06-24 · Andrea Kohlhase, Michael Kohlhase, Christoph Lange

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents

We propose MultiDoc2Dial, a new task and dataset on modeling goal-oriented dialogues grounded in multiple documents. Most previous works treat document-grounded dialogue modeling as a machine reading comprehension task based on a single…

Computation and Language · Computer Science · 2022-05-04 · Song Feng, Siva Sankalp Patel, Hui Wan, Sachindra Joshi

Combining Linguistic and Spatial Information for Document Analysis

We present a framework to analyze color documents of complex layout. In addition, no assumption is made on the layout. Our framework combines in a content-driven bottom-up approach two different sources of information: textual and spatial.…

Computation and Language · Computer Science · 2007-05-23 · Marco Aiello, Christof Monz, Leon Todoran

The Semantic Scholar Open Data Platform

The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field. Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping…

Digital Libraries · Computer Science · 2025-04-29 · Rodney Kinney, Chloe Anastasiades, Russell Authur, Iz Beltagy, Jonathan Bragg, Alexandra Buraczynski, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Arman Cohan, Miles Crawford, Doug Downey, Jason Dunkelberger, Oren Etzioni, Rob Evans, Sergey Feldman, Joseph Gorney, David Graham, Fangzhou Hu, Regan Huff, Daniel King, Sebastian Kohlmeier, Bailey Kuehl, Michael Langan, Daniel Lin, Haokun Liu, Kyle Lo, Jaron Lochner, Kelsey MacMillan, Tyler Murray, Chris Newell, Smita Rao, Shaurya Rohatgi, Paul Sayre, Zejiang Shen, Amanpreet Singh, Luca Soldaini, Shivashankar Subramanian, Amber Tanaka, Alex D. Wade, Linda Wagner, Lucy Lu Wang, Chris Wilhelm, Caroline Wu, Jiangjiang Yang, Angele Zamarron, Madeleine Van Zuylen, Daniel S. Weld

GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification

Visual document understanding (VDU) has rapidly advanced with the development of powerful multi-modal language models. However, these models typically require extensive document pre-training data to learn intermediate representations and…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-06 · Souhail Bakkali, Sanket Biswas, Zuheng Ming, Mickaël Coustaty, Marçal Rusiñol, Oriol Ramos Terrades, Josep Lladós