Related papers: Automating Date Format Detection for Data Visualiz…

Automatic Detection of Trends in Dynamical Text: An Evolutionary Approach

This paper presents an evolutionary algorithm for modeling the arrival dates of document streams, which are time-stamped collections of documents, such as newscasts, e-mails, IRC conversations, scientific journal archives and weblog…

Information Retrieval · Computer Science 2007-05-23 Lourdes Araujo, Juan J. Merelo

Unsupervised Data Extraction from Computer-generated Documents with Single Line Formatting

Processing large amounts of data is an essential problem of the big data era. Most of the data exchange is done via direct communication (using APIs) and well-structured file formats (JSON, XML, EDI, etc.), but a significant portion of the…

Information Retrieval · Computer Science 2020-07-17 Vladimir Bernstein, Andrei Afanassenkov

Data Agent: Learning to Select Data via End-to-End Dynamic Optimization

Dynamic data selection aims to accelerate training by prioritizing informative samples during online training. However, existing methods typically rely on task-specific handcrafted metrics or static/snapshot-based criteria to estimate…

Machine Learning · Computer Science 2026-05-14 Suorong Yang, Fangjian Su, Hai Gan, Ziqi Ye, Jie Li, Baile Xu, Furao Shen, Soujanya Poria

An Analytical Survey on Recent Trends in High Dimensional Data Visualization

Data visualization is the process by which data of any size or dimensionality is processed to produce an understandable set of data in a lower dimensionality, allowing it to be manipulated and understood more easily by people. The goal of…

Graphics · Computer Science 2021-07-06 Alexander Kiefer, Md. Khaledur Rahman

LLM-Aided Customizable Profiling of Code Data Based On Programming Language Concepts

Data profiling is critical in machine learning for generating descriptive statistics, supporting both deeper understanding and downstream tasks like data valuation and curation. This work addresses profiling specifically in the context of…

Software Engineering · Computer Science 2025-03-21 Pankaj Thorat, Adnan Qidwai, Adrija Dhar, Aishwariya Chakraborty, Anand Eswaran, Hima Patel, Praveen Jayachandran

One-Shot Template Matching for Automatic Document Data Capture

In this paper, we propose a novel one-shot template-matching algorithm to automatically capture data from business documents with an aim to minimize manual data entry. Given one annotated document, our algorithm can automatically extract…

Information Retrieval · Computer Science 2019-10-23 Pranjal Dhakal, Manish Munikar, Bikram Dahal

A new algorithm for shape matching and pattern recognition using dynamic programming

We propose a new method for shape recognition and retrieval based on dynamic programming. Our approach uses the dynamic programming algorithm to compute the optimal score and to find the optimal alignment between two strings. First, each…

Computer Vision and Pattern Recognition · Computer Science 2019-05-01 Noreddine Gherabi, Bahaj Mohamed

Automatic String Data Validation with Pattern Discovery

In enterprise data pipelines, data insertions occur periodically and may impact downstream services if data quality issues are not addressed. Typically, such problems can be investigated and fixed by on-call engineers, but locating the…

Databases · Computer Science 2024-08-07 Xinwei Lin, Jing Zhao, Peng Di, Chuan Xiao, Rui Mao, Yan Ji, Makoto Onizuka, Zishuo Ding, Weiyi Shang, Jianbin Qin

Investigating Entropy for Extractive Document Summarization

Automatic text summarization aims to cut down readers' time and cognitive effort by reducing the content of a text document without compromising on its essence. Ergo, informativeness is the prime attribute of document summary generated by an…

Information Retrieval · Computer Science 2021-10-01 Alka Khurana, Vasudha Bhatnagar

Assisted Data Annotation for Business Process Information Extraction from Textual Documents

Machine-learning based generation of process models from natural language text process descriptions provides a solution for the time-intensive and expensive process discovery phase. Many organizations have to carry out this phase, before…

Computation and Language · Computer Science 2024-10-03 Julian Neuberger, Han van der Aa, Lars Ackermann, Daniel Buschek, Jannic Herrmann, Stefan Jablonski

Data Point Selection for Line Chart Visualization: Methodological Assessment and Evidence-Based Guidelines

Time series visualization plays a crucial role in identifying patterns and extracting insights across various domains. However, as datasets continue to grow in size, visualizing them effectively becomes challenging. Downsampling, which…

Human-Computer Interaction · Computer Science 2023-04-04 Jonas Van Der Donckt, Jeroen Van Der Donckt, Michael Rademaker, Sofie Van Hoecke

A Dynamic Programming Algorithm for Finding an Optimal Sequence of Informative Measurements

An informative measurement is the most efficient way to gain information about an unknown state. We present a first-principles derivation of a general-purpose dynamic programming algorithm that returns an optimal sequence of informative…

Machine Learning · Computer Science 2023-02-01 Peter N. Loxley, Ka-Wai Cheung

Dictionary-Learning-Based Data Pruning for System Identification

System identification is normally involved in augmenting time series data by time shifting and nonlinearisation (e.g., polynomial basis), both of which introduce redundancy in features and samples. Many research works focus on reducing…

Machine Learning · Computer Science 2025-09-05 Tingna Wang, Sikai Zhang, Mingming Song, Limin Sun

A Novel Framework to Expedite Systematic Reviews by Automatically Building Information Extraction Training Corpora

A systematic review identifies and collates various clinical studies and compares data elements and results in order to provide an evidence-based answer for a particular clinical question. The process is manual and involves a lot of time. A…

Information Retrieval · Computer Science 2016-06-22 Tanmay Basu, Shraman Kumar, Abhishek Kalyan, Priyanka Jayaswal, Pawan Goyal, Stephen Pettifer, Siddhartha R. Jonnalagadda

Data Compression with Stochastic Codes

Machine learning has had a major impact on data compression over the last decade and inspired many new, exciting theoretical and applied questions. This paper describes one such direction -- relative entropy coding -- which focuses on…

Information Theory · Computer Science 2026-02-10 Gergely Flamich, Deniz Gündüz

This study introduces a simple yet effective method for identifying similar data points across non-free text domains, such as tabular and image data, using Large Language Models (LLMs). Our two-step approach involves data point…

Computation and Language · Computer Science 2024-10-01 Xianlong Zeng, Yijing Gao, Fanghao Song, Ang Liu

Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models

Natural Language Explanation (NLE) aims to elucidate the decision-making process by providing detailed, human-friendly explanations in natural language. It helps demystify the decision-making processes of large vision-language models…

Computation and Language · Computer Science 2024-12-10 Patrick Amadeus Irawan, Genta Indra Winata, Samuel Cahyawijaya, Ayu Purwarianti

Automatic Labeling for Entity Extraction in Cyber Security

Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is…

Information Retrieval · Computer Science 2014-06-11 Robert A. Bridges, Corinne L. Jones, Michael D. Iannacone, Kelly M. Testa, John R. Goodall

Detection and Description of Change in Visual Streams

This paper presents a framework for the analysis of changes in visual streams: ordered sequences of images, possibly separated by significant time gaps. We propose a new approach to incorporating unlabeled data into training to generate…

Computer Vision and Pattern Recognition · Computer Science 2020-04-13 Davis Gilton, Ruotian Luo, Rebecca Willett, Greg Shakhnarovich

Deep Visual Template-Free Form Parsing

Automatic, template-free extraction of information from form images is challenging due to the variety of form layouts. This is even more challenging for historical forms due to noise and degradation. A crucial part of the extraction process…

Computer Vision and Pattern Recognition · Computer Science 2019-09-20 Brian Davis, Bryan Morse, Scott Cohen, Brian Price, Chris Tensmeyer