Related papers: Computing Multi-Relational Sufficient Statistics f…

Fast and Reliable Missing Data Contingency Analysis with Predicate-Constraints

Today, data analysts largely rely on intuition to determine whether missing or withheld rows of a dataset significantly affect their analyses. We propose a framework that can produce automatic contingency analysis, i.e., the range of values…

Databases · Computer Science 2020-04-09 Xi Liang, Zechao Shang, Aaron J. Elmore, Sanjay Krishnan, Michael J. Franklin

Scalable Join Inference for Large Context Graphs

Context graphs are essential for modern AI applications including question answering, pattern discovery, and data analysis. Building accurate context graphs from structured databases requires inferring join relationships between entities.…

Databases · Computer Science 2026-03-05 Shivani Tripathi, Ravi Shetye, Shi Qiao, Alekh Jindal

Efficiently Estimating Mutual Information Between Attributes Across Tables

Relational data augmentation is a powerful technique for enhancing data analytics and improving machine learning models by incorporating columns from external datasets. However, it is challenging to efficiently discover relevant external…

Databases · Computer Science 2025-03-06 Aécio Santos, Flip Korn, Juliana Freire

Algorithms for Efficient Mining of Statistically Significant Attribute Association Information

Knowledge of the association information between the attributes in a data set provides insight into the underlying structure of the data and explains the relationships (independence, synergy, redundancy) between the attributes and class (if…

Databases · Computer Science 2012-08-21 Pritam Chanda, Aidong Zhang, Murali Ramanathan

A Model Explaining Correlation Between Observed Values in Contingency Tables

In this article, a model is proposed using Bayesian techniques to account for the high correlation between many observed sets of contingency tables. Such high correlation is encountered in many real-life datasets. Simulation studies are also…

Methodology · Statistics 2013-10-01 Abhik Ghosh, Samit Roy, Sujatro Chaklader

Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets

This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of…

Artificial Intelligence · Computer Science 2009-09-25 A. Moore, M. S. Lee

A New Scale for Attribute Dependency in Large Database Systems

Large, data-centric applications are characterized by their different attributes. Today, the vast majority of large data-centric applications are based on the relational model. The databases are collections of tables, and every table…

Information Retrieval · Computer Science 2012-06-28 Soumya Sen, Anjan Dutta, Agostino Cortesi, Nabendu Chaki

Missing and spurious interactions and the reconstruction of complex networks

Network analysis is currently used in a myriad of contexts: from identifying potential drug targets to predicting the spread of epidemics and designing vaccination strategies, and from finding friends to uncovering criminal activity.…

Data Analysis, Statistics and Probability · Physics 2010-04-28 R. Guimera, M. Sales-Pardo

Joining relations under discrete uncertainty

In this paper we introduce and experimentally compare alternative algorithms to join uncertain relations. Different algorithms are based on specific principles, e.g., sorting, indexing, or building intermediate relational tables to apply…

Databases · Computer Science 2012-11-02 Matteo Magnani, Danilo Montesi

Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes

Datasets with hundreds of variables and many missing values are commonplace. In this setting, it is both statistically and computationally challenging to detect true predictive relationships between variables and also to suppress false…

Machine Learning · Statistics 2018-04-03 Feras Saad, Vikash Mansinghka

An Odds Ratio Based Inference Engine

Expert systems applications that involve uncertain inference can be represented by a multidimensional contingency table. These tables offer a general approach to inferring with uncertain evidence, because they can embody any form of…

Artificial Intelligence · Computer Science 2013-04-15 David S. Vaughan, Bruce M. Perrin, Robert M. Yadrick, Peter D. Holden, Karl G. Kempf

Query Significance in Databases via Randomizations

Many sorts of structured data are commonly stored in a multi-relational format of interrelated tables. Under this relational model, exploratory data analysis can be done by using relational queries. As an example, in the Internet Movie…

Databases · Computer Science 2009-07-01 Markus Ojala, Gemma C. Garriga, Aristides Gionis, Heikki Mannila

Scalable Community Detection in Massive Networks Using Aggregated Relational Data

The mixed membership stochastic blockmodel (MMSB) is a popular Bayesian network model for community detection. Fitting such large Bayesian network models quickly becomes computationally infeasible when the number of nodes grows into…

Social and Information Networks · Computer Science 2024-05-24 Timothy Jones, Owen G. Ward, Yiran Jiang, John Paisley, Tian Zheng

Reliable and Efficient Inference of Bayesian Networks from Sparse Data by Statistical Learning Theory

Learning (statistical) dependencies among random variables requires a sample size that grows exponentially in the number of observed random variables if any arbitrary joint probability distribution can occur. We consider the case that sparse data…

Machine Learning · Computer Science 2007-05-23 Dominik Janzing, Daniel Herrmann

A Review of Relational Machine Learning for Knowledge Graphs

Relational machine learning studies methods for the statistical analysis of relational, or graph-structured, data. In this paper, we provide a review of how such statistical models can be "trained" on large knowledge graphs, and then used…

Machine Learning · Statistics 2016-11-18 Maximilian Nickel, Kevin Murphy, Volker Tresp, Evgeniy Gabrilovich

Inferring Multilateral Relations from Dynamic Pairwise Interactions

Correlations between anomalous activity patterns can yield pertinent information about complex social processes: a significant deviation from normal behavior, exhibited simultaneously by multiple pairs of actors, provides evidence for some…

Artificial Intelligence · Computer Science 2013-11-19 Aaron Schein, Juston Moore, Hanna Wallach

Mining tree-query associations in graphs

New applications of data mining, such as in biology, bioinformatics, or sociology, are faced with large datasets structured as graphs. We introduce a novel class of tree-shaped patterns called tree queries, and present algorithms for…

Databases · Computer Science 2010-08-17 Eveline Hoekx, Jan Van den Bussche

A Record Linkage Model Incorporating Relational Data

In this paper we introduce a novel Bayesian approach for linking multiple social networks in order to discover the same real-world person having different accounts across networks. In particular, we develop a latent model that allows us to…

Applications · Statistics 2018-08-15 Juan Sosa, Abel Rodriguez

Fast Counting in Machine Learning Applications

We propose scalable methods to execute counting queries in machine learning applications. To achieve memory and computational efficiency, we abstract counting queries and their context such that the counts can be aggregated as a stream. We…

Machine Learning · Statistics 2019-01-09 Subhadeep Karan, Matthew Eichhorn, Blake Hurlburt, Grant Iraci, Jaroslaw Zola

Fast and Simple Relational Processing of Uncertain Data

This paper introduces U-relations, a succinct and purely relational representation system for uncertain databases. U-relations support attribute-level uncertainty using vertical partitioning. If we consider positive relational algebra…

Databases · Computer Science 2007-07-12 Lyublena Antova, Thomas Jansen, Christoph Koch, Dan Olteanu