Related papers: Selecting Sub-tables for Data Exploration

Interactive Data Exploration with Smart Drill-Down

We present smart drill-down, an operator for interactively exploring a relational table to discover and summarize "interesting" groups of tuples. Each group of tuples is described by a rule. For instance, the rule $(a, b, \star,…

Databases · Computer Science 2016-12-20 Manas Joglekar, Hector Garcia-Molina, Aditya Parameswaran

FEDEX: An Explainability Framework for Data Exploration Steps

When exploring a new dataset, Data Scientists often apply analysis queries, look for insights in the resulting dataframe, and repeat to apply further queries. We propose in this paper a novel solution that assists data scientists in this…

Databases · Computer Science 2022-09-15 Daniel Deutch, Amir Gilad, Tova Milo, Amit Mualem, Amit Somech

Guided Visual Exploration of Relations in Data Sets

Efficient explorative data analysis systems must take into account both what a user knows and wants to know. This paper proposes a principled framework for interactive visual exploration of relations in data, through views most informative…

Machine Learning · Statistics 2021-07-02 Kai Puolamäki, Emilia Oikarinen, Andreas Henelius

Divisi: Interactive Search and Visualization for Scalable Exploratory Subgroup Analysis

Analyzing data subgroups is a common data science task to build intuition about a dataset and identify areas to improve model performance. However, subgroup analysis is prohibitively difficult in datasets with many features, and existing…

Human-Computer Interaction · Computer Science 2025-02-18 Venkatesh Sivaraman, Zexuan Li, Adam Perer

A Lightweight Algorithm to Uncover Deep Relationships in Data Tables

Many data we collect today are in tabular form, with rows as records and columns as attributes associated with each record. Understanding the structural relationship in tabular data can greatly facilitate the data science process.…

Data Structures and Algorithms · Computer Science 2020-09-09 Jin Cao, Yibo Zhao, Linjun Zhang, Jason Li

Nearly Optimal Subdata Selection

When, in terms of the number of data points, the size of a dataset exceeds available computing resources, or when labeling is expensive, an attractive solution consists of selecting only some of the data points (subdata) for further…

Methodology · Statistics 2026-04-28 Min Yang, Wei Zheng, John Stufken, Ming-Chung Chang, Ting Tian, Xueqin Wang

Answering Table Queries on the Web using Column Keywords

We present the design of a structured search engine which returns a multi-column table in response to a query consisting of keywords describing each of its columns. We answer such queries by exploiting the millions of tables on the Web…

Databases · Computer Science 2017-07-07 Rakesh Pimplikar, Sunita Sarawagi

Predictive Subsampling for Scalable Inference in Networks

Network datasets appear across a wide range of scientific fields, including biology, physics, and the social sciences. To enable data-driven discoveries from these networks, statistical inference techniques like estimation and hypothesis…

Methodology · Statistics 2026-02-19 Arpan Kumar, Minh Tang, Srijan Sengupta

Subjectively Interesting Subgroup Discovery on Real-valued Targets

Deriving insights from high-dimensional data is one of the core problems in data mining. The difficulty mainly stems from the fact that there are exponentially many variable combinations to potentially consider, and there are infinitely…

Machine Learning · Statistics 2021-11-08 Jefrey Lijffijt, Bo Kang, Wouter Duivesteijn, Kai Puolamäki, Emilia Oikarinen, Tijl De Bie

Scientific Dataset Discovery via Topic-level Recommendation

Data intensive research requires the support of appropriate datasets. However, it is often time-consuming to discover usable datasets matching a specific research topic. We formulate the dataset discovery problem on an attributed…

Information Retrieval · Computer Science 2021-06-08 Basmah Altaf, Shichao Pei, Xiangliang Zhang

DataPilot: Utilizing Quality and Usage Information for Subset Selection during Visual Data Preparation

Selecting relevant data subsets from large, unfamiliar datasets can be difficult. We address this challenge by modeling and visualizing two kinds of auxiliary information: (1) quality - the validity and appropriateness of data required to…

Human-Computer Interaction · Computer Science 2023-03-06 Arpit Narechania, Fan Du, Atanu R Sinha, Ryan A. Rossi, Jane Hoffswell, Shunan Guo, Eunyee Koh, Shamkant B. Navathe, Alex Endert

Semantic Table Retrieval using Keyword and Table Queries

Tables on the Web contain a vast amount of knowledge in a structured form. To tap into this valuable resource, we address the problem of table retrieval: answering an information need with a ranked list of tables. We investigate this…

Information Retrieval · Computer Science 2021-05-14 Shuo Zhang, Krisztian Balog

Untidy Data: The Unreasonable Effectiveness of Tables

Working with data in table form is usually considered a preparatory and tedious step in the sensemaking pipeline; a way of getting the data ready for more sophisticated visualization and analytical tools. But for many people, spreadsheets…

Human-Computer Interaction · Computer Science 2021-06-30 Lyn Bartram, Michael Correll, Melanie Tory

Web Table Extraction, Retrieval and Augmentation: A Survey

Tables are a powerful and popular tool for organizing and manipulating data. A vast number of tables can be found on the Web, which represents a valuable knowledge resource. The objective of this survey is to synthesize and present two…

Information Retrieval · Computer Science 2020-02-06 Shuo Zhang, Krisztian Balog

Data Informativeness in Linear Optimization under Uncertainty

We study the problem of determining what data is required to solve a decision-making task when only partial information about the state of the world is available. Focusing on linear programs, we introduce a decision-focused notion of data…

Optimization and Control · Mathematics 2026-02-18 Omar Bennouna, Amine Bennouna, Saurabh Amin, Asuman Ozdaglar

Exploring Scale-Measures of Data Sets

Measurement is a fundamental building block of numerous scientific models and their creation. This is in particular true for data driven science. Due to the high complexity and size of modern data sets, the necessity for the development of…

Artificial Intelligence · Computer Science 2022-04-26 Tom Hanika, Johannes Hirth

rtables -- A Framework For Creating Complex Structured Reporting Tables Via Multi-Level Faceted Computations

Tables form a central component in both exploratory data analysis and formal reporting procedures across many industries. These tables are often complex in their conceptual structure and in the computations that generate their individual…

Computation · Statistics 2023-06-30 Gabriel Becker, Adrian Waddell

Active Data Discovery: Mining Unknown Data using Submodular Information Measures

Active Learning is a very common yet powerful framework for iteratively and adaptively sampling subsets of the unlabeled sets with a human in the loop with the goal of achieving labeling efficiency. Most real world datasets have imbalance…

Computer Vision and Pattern Recognition · Computer Science 2022-06-20 Suraj Kothawade, Shivang Chopra, Saikat Ghosh, Rishabh Iyer

Scalable Sampling for High Utility Patterns

Discovering valuable insights from data through meaningful associations is a crucial task. However, it becomes challenging when trying to identify representative patterns in quantitative databases, especially with large datasets, as…

Databases · Computer Science 2024-10-31 Lamine Diop, Marc Plantevit

Structured Evaluation of Synthetic Tabular Data

Tabular data is common yet typically incomplete, small in volume, and access-restricted due to privacy concerns. Synthetic data generation offers potential solutions. Many metrics exist for evaluating the quality of synthetic tabular data;…

Machine Learning · Computer Science 2024-04-01 Scott Cheng-Hsin Yang, Baxter Eaves, Michael Schmidt, Ken Swanson, Patrick Shafto