Related papers: Why the Standard Data Processing should be changed

Computational Implications of Reducing Data to Sufficient Statistics

Given a large dataset and an estimation task, it is common to pre-process the data by reducing them to a set of sufficient statistics. This step is often regarded as straightforward and advantageous (in that it simplifies statistical…

Computation · Statistics 2015-07-31 Andrea Montanari

Data Consistency Approach to Model Validation

In scientific inference problems, the underlying statistical modeling assumptions have a crucial impact on the end results. There exist, however, only a few automatic means for validating these fundamental modelling assumptions. The…

Methodology · Statistics 2019-05-21 Andreas Svensson, Dave Zachariah, Petre Stoica, Thomas B. Schön

Statistical Validity and Consistency of Big Data Analytics: A General Framework

Informatics and technological advancements have triggered the generation of huge volumes of data, with varied complexity in their management and analysis. Big Data analytics is the practice of revealing hidden aspects of such data and making…

Databases · Computer Science 2018-03-30 Bikram Karmakar, Indranil Mukhopadhyay

Representation Bias in Data: A Survey on Identification and Resolution Techniques

Data-driven algorithms are only as good as the data they work with, while data sets, especially social data, often fail to represent minorities adequately. Representation Bias in data can happen due to various reasons ranging from…

Databases · Computer Science 2023-03-21 Nima Shahbazi, Yin Lin, Abolfazl Asudeh, H. V. Jagadish

Statistical Models for the Analysis of Optimization Algorithms with Benchmark Functions

Frequentist statistical methods, such as hypothesis testing, are standard practice in papers that provide benchmark comparisons. Unfortunately, these methods have often been misused, e.g., without testing for their statistical test…

Methodology · Statistics 2021-05-18 David Issa Mattos, Jan Bosch, Helena Holmström Olsson

Source Coding Optimization for Distributed Average Consensus

Consensus is a common method for computing a function of the data distributed among the nodes of a network. Of particular interest is distributed average consensus, whereby the nodes iteratively compute the sample average of the data stored…

Information Theory · Computer Science 2021-12-06 Ryan Pilgrim

Adjusting for Bias with Procedural Data

3D software is now capable of producing highly realistic images that look nearly indistinguishable from real images. This raises the question: can real datasets be enhanced with 3D rendered data? We investigate this question. In this…

Computer Vision and Pattern Recognition · Computer Science 2022-04-06 Shesh Narayan Gupta, Nicholas Bear Brown

A General Identification Algorithm For Data Fusion Problems Under Systematic Selection

Causal inference is made challenging by confounding, selection bias, and other complications. A common approach to addressing these difficulties is the inclusion of auxiliary data on the superpopulation of interest. Such data may measure a…

Methodology · Statistics 2024-04-16 Jaron J. R. Lee, AmirEmad Ghassami, Ilya Shpitser

Models for the assessment of treatment improvement: the ideal and the feasible

Comparisons of different treatments or production processes are the goals of a significant fraction of applied research. Unsurprisingly, two-sample problems play a main role in Statistics through natural questions such as `Is the new…

Methodology · Statistics 2017-09-05 P. C. Álvarez-Esteban, E. del Barrio, J. A. Cuesta-Albertos, C. Matrán

Data-conforming data-driven control: avoiding premature generalizations beyond data

Data-driven and adaptive control approaches face the problem of introducing sudden distributional shifts beyond the distribution of data encountered during learning. Therefore, they are prone to invalidating the very assumptions used in…

Systems and Control · Electrical Eng. & Systems 2025-08-25 Mohammad Ramadan, Evan Toler, Mihai Anitescu

An Overview of Statistical Data Analysis

The use of statistical software in academia and enterprises has been evolving over the last years. More often than not, students, professors, workers, and users, in general, have all had, at some point, exposure to statistical software.…

Applications · Statistics 2019-08-21 Rui Portocarrero Sarmento, Vera Costa

Creating Synthetic Datasets via Evolution for Neural Program Synthesis

Program synthesis is the task of automatically generating a program consistent with a given specification. A natural way to specify programs is to provide examples of desired input-output behavior, and many current program synthesis…

Machine Learning · Computer Science 2020-07-28 Alexander Suh, Yuval Timen

Towards Efficient Abstractions for Concurrent Consensus

Consensus is a frequently occurring problem in concurrent and distributed programming. We present a programming language with simple semantics and built-in support for consensus in the form of communicating transactions. We motivate the need…

Programming Languages · Computer Science 2013-05-08 Carlo Spaccasassi, Vasileios Koutavas

Do We Really Even Need Data? A Modern Look at Drawing Inference with Predicted Data

As artificial intelligence and machine learning tools become more accessible, and scientists face new obstacles to data collection (e.g., rising costs, declining survey response rates), researchers increasingly use predictions from…

Machine Learning · Statistics 2025-12-08 Stephen Salerno, Kentaro Hoffman, Awan Afiaz, Anna Neufeld, Tyler H. McCormick, Jeffrey T. Leek

Front End Data Cleaning And Transformation In Standard Printed Form Using Neural Models

Manual front-end data collection and loading into a database may introduce errors into data sets and is a very time-consuming process. Scanning of a data document in the form of an image and recognition of corresponding information in…

Databases · Computer Science 2014-01-14 Raju Dara, Dr. Ch. Satyanarayana, Dr. A. Govardhan

Consensus Propagation

We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better…

Information Theory · Computer Science 2007-07-13 Ciamac C. Moallemi, Benjamin Van Roy

Explainable Data Imputation using Constraints

Data values in a dataset can be missing or anomalous due to mishandling or human error. Analysing data with missing values can create bias and affect the inferences. Several analysis methods, such as principal components analysis or…

Artificial Intelligence · Computer Science 2022-05-11 Sandeep Hans, Diptikalyan Saha, Aniya Aggarwal

Balancing Statistical and Computational Precision: A General Theory and Applications to Sparse Regression

Modern technologies are generating ever-increasing amounts of data. Making use of these data requires methods that are both statistically sound and computationally efficient. Typically, the statistical and computational aspects are treated…

Methodology · Statistics 2022-09-15 Mahsa Taheri, Néhémy Lim, Johannes Lederer

Making Online Polls More Accurate: Statistical Methods Explained

Online data has the potential to transform how researchers and companies produce election forecasts. Social media surveys, online panels and even comments scraped from the internet can offer valuable insights into political preferences.…

Applications · Statistics 2025-03-20 Alberto Arletti, Maria Letizia Tanturri, Omar Paccagnella

Solving Inverse Problems with Latent Diffusion Models via Hard Data Consistency

Diffusion models have recently emerged as powerful generative priors for solving inverse problems. However, training diffusion models in the pixel space is both data-intensive and computationally demanding, which restricts their…

Computer Vision and Pattern Recognition · Computer Science 2024-04-17 Bowen Song, Soo Min Kwon, Zecheng Zhang, Xinyu Hu, Qing Qu, Liyue Shen