Related papers: Revealing digital documents. Concealed structures …

Unfolding the Structure of a Document using Deep Learning

Understanding and extracting of information from large documents, such as business opportunities, academic articles, medical documents and technical reports, poses challenges not present in short documents. Such large documents may be…

Computation and Language · Computer Science 2019-10-10 Muhammad Mahbubur Rahman, Tim Finin

A Study on the usage of Data Structures in Information Retrieval

This paper tries to throw light in the usage of data structures in the field of information retrieval. Information retrieval is an area of study which is gaining momentum as the need and urge for sharing and exploring information is growing…

Information Retrieval · Computer Science 2016-02-26 V. R. Kanagavalli, G. Maheeja

Digitizing scientific data and data retrieval techniques

Storing data is easy, but finding and using data is not. It is desirable that the data is stored in a structured format, which can be preserved and retrieved in future. Creating Metadata for the data is one way of creating structured data…

Information Theory · Computer Science 2011-01-04 Ranjeet Devarakonda, Giri Palanisamy, Jim Green

Understanding and representing the semantics of large structured documents

Understanding large, structured documents like scholarly articles, requests for proposals or business reports is a complex and difficult task. It involves discovering a document's overall purpose and subject(s), understanding the function…

Computation and Language · Computer Science 2018-07-27 Muhammad Mahbubur Rahman, Tim Finin

An Analysis of Structured Data on the Web

In this paper, we analyze the nature and distribution of structured data on the Web. Web-scale information extraction, or the problem of creating structured tables using extraction from the entire web, is gathering lots of research…

Databases · Computer Science 2012-03-30 Nilesh Dalvi, Ashwin Machanavajjhala, Bo Pang

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

Document parsing (DP) transforms unstructured or semi-structured documents into structured, machine-readable representations, enabling downstream applications such as knowledge base construction and retrieval-augmented generation (RAG).…

Multimedia · Computer Science 2026-04-07 Qintong Zhang, Bin Wang, Victor Shea-Jay Huang, Junyuan Zhang, Zhengren Wang, Hao Liang, Conghui He, Wentao Zhang

A Reflection on the Structure and Process of the Web of Data

The Web community has introduced a set of standards and technologies for representing, querying, and manipulating a globally distributed data structure known as the Web of Data. The proponents of the Web of Data envision much of the world's…

Artificial Intelligence · Computer Science 2009-08-05 Marko A. Rodriguez

Towards Semantically Enhanced Data Understanding

In the field of machine learning, data understanding is the practice of getting initial insights in unknown datasets. Such knowledge-intensive tasks require a lot of documentation, which is necessary for data scientists to grasp the meaning…

Databases · Computer Science 2018-06-14 Markus Schröder, Christian Jilek, Jörn Hees, Andreas Dengel

Le travail collaboratif dans le cadre d'un projet architectural

The analysis of the practices and the tendencies of the users at the time of the search for information on Internet makes it possible to highlight several points. The search for information becomes powerful after knowledge of the typology…

Human-Computer Interaction · Computer Science 2007-06-14 Marie-France Ango-Obiang

Supporting Structured Browsing for Full-Text Scientific Research Reports

Scientific research is highly structured and some of that structure is reflected in research reports. Traditional scientific research reports are yielding to interactive documents which expose their internal structure and are richly linked…

Digital Libraries · Computer Science 2012-09-04 Robert B. Allen

Using Structural Metadata to Localize Experience of Digital Content

With the increasing technical sophistication of both information consumers and providers, there is increasing demand for more meaningful experiences of digital information. We present a framework that separates digital object experience, or…

Digital Libraries · Computer Science 2007-05-23 Naomi Dushay

Discovery data topology with the closure structure. Theoretical and practical aspects

In this paper, we are revisiting pattern mining and especially itemset mining, which allows one to analyze binary datasets in searching for interesting and meaningful association rules and respective itemsets in an unsupervised way. While a…

Databases · Computer Science 2021-03-31 Tatiana Makhalova, Aleksey Buzmakov, Sergei O. Kuznetsov, Amedeo Napoli

Discovering Pattern Structure Using Differentiable Compositing

Patterns, which are collections of elements arranged in regular or near-regular arrangements, are an important graphic art form and widely used due to their elegant simplicity and aesthetic appeal. When a pattern is encoded as a flat image…

Computer Vision and Pattern Recognition · Computer Science 2020-10-20 Pradyumna Reddy, Paul Guerrero, Matt Fisher, Wilmot Li, Niloy J. Mitra

Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction

Document collections of various domains, e.g., legal, medical, or financial, often share some underlying collection-wide structure, which captures information that can aid both human users and structure-aware models. We propose to identify…

Computation and Language · Computer Science 2025-08-27 Gili Lior, Yoav Goldberg, Gabriel Stanovsky

Learning with Hidden Factorial Structure

Statistical learning in high-dimensional spaces is challenging without a strong underlying data structure. Recent advances with foundational models suggest that text and image data contain such hidden structures, which help mitigate the…

Machine Learning · Statistics 2025-02-04 Charles Arnal, Clement Berenfeld, Simon Rosenberg, Vivien Cabannes

Hiding Data Hiding

Data hiding is the art of hiding secret data into a cover object such as digital image for covert communication. In this paper, we make the first step towards hiding "data hiding", which is totally different from many conventional works…

Cryptography and Security · Computer Science 2022-12-19 Hanzhou Wu, Gen Liu, Xinpeng Zhang

Natural data structure extracted from neighborhood-similarity graphs

'Big' high-dimensional data are commonly analyzed in low-dimensions, after performing a dimensionality-reduction step that inherently distorts the data structure. For the same purpose, clustering methods are also often used. These methods…

Machine Learning · Statistics 2019-02-20 Tom Lorimer, Karlis Kanders, Ruedi Stoop

Big Data Dimensional Analysis

The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity…

Databases · Computer Science 2016-08-01 Vijay Gadepally, Jeremy Kepner

Text Data Integration

Data comes in many forms. From a shallow perspective, they can be viewed as being either in structured (e.g., as a relation, as key-value pairs) or unstructured (e.g., text, image) formats. So far, machines have been fairly good at…

Computation and Language · Computer Science 2026-03-31 Md Ataur Rahman, Dimitris Sacharidis, Oscar Romero, Sergi Nadal

Visualization of Mined Pattern and Its Human Aspects

Researchers got success in mining the Web usage data effectively and efficiently. But representation of the mined patterns is often not in a form suitable for direct human consumption. Hence mechanisms and tools that can represent mined…

Human-Computer Interaction · Computer Science 2009-09-01 Ratnesh Kumar Jain, Dr. Suresh Jain, Dr. R. S. Kasana