Related papers: Revealing digital documents. Concealed structures …
Understanding and extracting information from large documents, such as business opportunities, academic articles, medical documents and technical reports, poses challenges not present in short documents. Such large documents may be…
This paper tries to shed light on the use of data structures in the field of information retrieval. Information retrieval is an area of study that is gaining momentum as the need and urge for sharing and exploring information is growing…
Storing data is easy, but finding and using data is not. It is desirable that data be stored in a structured format that can be preserved and retrieved in the future. Creating metadata for the data is one way of creating structured data…
Understanding large, structured documents like scholarly articles, requests for proposals or business reports is a complex and difficult task. It involves discovering a document's overall purpose and subject(s), understanding the function…
In this paper, we analyze the nature and distribution of structured data on the Web. Web-scale information extraction, or the problem of creating structured tables using extraction from the entire web, is gathering lots of research…
Document parsing (DP) transforms unstructured or semi-structured documents into structured, machine-readable representations, enabling downstream applications such as knowledge base construction and retrieval-augmented generation (RAG).…
The Web community has introduced a set of standards and technologies for representing, querying, and manipulating a globally distributed data structure known as the Web of Data. The proponents of the Web of Data envision much of the world's…
In the field of machine learning, data understanding is the practice of gaining initial insights into unknown datasets. Such knowledge-intensive tasks require a lot of documentation, which is necessary for data scientists to grasp the meaning…
An analysis of users' practices and tendencies when searching for information on the Internet highlights several points. The search for information becomes powerful after knowledge of the typology…
Scientific research is highly structured and some of that structure is reflected in research reports. Traditional scientific research reports are yielding to interactive documents which expose their internal structure and are richly linked…
With the increasing technical sophistication of both information consumers and providers, there is increasing demand for more meaningful experiences of digital information. We present a framework that separates digital object experience, or…
In this paper, we are revisiting pattern mining and especially itemset mining, which allows one to analyze binary datasets in searching for interesting and meaningful association rules and respective itemsets in an unsupervised way. While a…
Patterns, which are collections of elements arranged in regular or near-regular arrangements, are an important graphic art form and widely used due to their elegant simplicity and aesthetic appeal. When a pattern is encoded as a flat image…
Document collections of various domains, e.g., legal, medical, or financial, often share some underlying collection-wide structure, which captures information that can aid both human users and structure-aware models. We propose to identify…
Statistical learning in high-dimensional spaces is challenging without a strong underlying data structure. Recent advances with foundational models suggest that text and image data contain such hidden structures, which help mitigate the…
Data hiding is the art of hiding secret data in a cover object, such as a digital image, for covert communication. In this paper, we make the first step towards hiding "data hiding", which is totally different from many conventional works…
'Big' high-dimensional data are commonly analyzed in low dimensions, after performing a dimensionality-reduction step that inherently distorts the data structure. For the same purpose, clustering methods are also often used. These methods…
The ability to collect and analyze large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity…
Data comes in many forms. From a shallow perspective, it can be viewed as being in either structured (e.g., as a relation, as key-value pairs) or unstructured (e.g., text, image) formats. So far, machines have been fairly good at…
Researchers have succeeded in mining Web usage data effectively and efficiently. However, the mined patterns are often not represented in a form suitable for direct human consumption. Hence mechanisms and tools that can represent mined…