Related papers: CHARTER: heatmap-based multi-type chart data extra…
In this paper, we fill the research gap by adopting state-of-the-art computer vision techniques for the data extraction stage in a data mining system. As shown in Fig.1, this stage contains two subtasks, namely, plot element detection and…
Charts are an essential part of both graphicacy (graphical literacy), and statistical literacy. As chart understanding has become increasingly relevant in data science, automating chart analysis by processing raster images of the charts has…
Most search engines index the textual content of documents in digital libraries. However, scholarly articles frequently report important findings in figures for visual impact and the contents of these figures are not indexed. These contents…
Charts are an excellent way to convey patterns and trends in data, but they do not facilitate further modeling of the data or close inspection of individual data points. We present a fully automated system for extracting the numerical…
This paper introduces a new approach to extract and analyze vector data from technical drawings in PDF format. Our method involves converting PDF files into SVG format and creating a feature-rich graph representation, which captures the…
Visual chart recognition systems are gaining increasing attention due to the growing demand for automatically identifying table headers and values from chart images. Current methods rely on keypoint detection to estimate data element shapes…
Images greatly help in understanding, interpreting and visualizing data. Adding textual description to images is the first and foremost principle of web accessibility. Visually impaired users using screen readers will use these textual…
The widespread use of charts and infographics as a means of data visualization in various domains has inspired recent research in automated chart understanding. However, information extraction from chart images is a complex multitasked…
Data intensive research requires the support of appropriate datasets. However, it is often time-consuming to discover usable datasets matching a specific research topic. We formulate the dataset discovery problem on an attributed…
Tables are widely used in several types of documents since they can bring important information in a structured way. In scientific papers, tables can sum up novel discoveries and summarize experimental results, making the research…
Most of the textual information available to us are temporally variable. In a world where information is dynamic, time-stamping them is a very important task. Documents are a good source of information and are used for many tasks like,…
Tabular data is a crucial form of information expression, which can organize data in a standard structure for easy information retrieval and comparison. However, in financial industry and many other fields tables are often disclosed in…
Information visualizations such as bar charts and line charts are very popular for exploring data and communicating insights. Interpreting and making sense of such visualizations can be challenging for some people, such as those who are…
Chart images, such as bar charts, pie charts, and line charts, are explosively produced due to the wide usage of data visualizations. Accordingly, knowledge mining from chart images is becoming increasingly important, which can benefit…
Charts represent an essential source of visual information in documents and facilitate a deep understanding and interpretation of information typically conveyed numerically. In the scientific literature, there are many charts, each with its…
Charts are a powerful tool for visually conveying complex data, but their comprehension poses a challenge due to the diverse chart types and intricate components. Existing chart comprehension methods suffer from either heuristic rules or an…
Information representation as tables are compact and concise method that eases searching, indexing, and storage requirements. Extracting and cloning tables from parsable documents is easier and widely used, however industry still faces…
Object detection in documents is a key step to automate the structural elements identification process in a digital or scanned document through understanding the hierarchical structure and relationships between different elements. Large and…
Scientific documents contain tables that list important information in a concise fashion. Structure and content extraction from tables embedded within PDF research documents is a very challenging task due to the existence of visual features…
Vertical bars, horizontal bars, dot, scatter, and line plots provide a diverse set of visualizations to represent data. To understand these plots, one must be able to recognize textual components, locate data points in a plot, and process…