Related papers: Why the Standard Data Processing should be changed
Given a large dataset and an estimation task, it is common to pre-process the data by reducing them to a set of sufficient statistics. This step is often regarded as straightforward and advantageous (in that it simplifies statistical…
In scientific inference problems, the underlying statistical modeling assumptions have a crucial impact on the end results. There exist, however, only a few automatic means for validating these fundamental modelling assumptions. The…
Informatics and technological advancements have triggered generation of huge volume of data with varied complexity in its management and analysis. Big Data analytics is the practice of revealing hidden aspects of such data and making…
Data-driven algorithms are only as good as the data they work with, while data sets, especially social data, often fail to represent minorities adequately. Representation Bias in data can happen due to various reasons ranging from…
Frequentist statistical methods, such as hypothesis testing, are standard practice in papers that provide benchmark comparisons. Unfortunately, these methods have often been misused, e.g., without testing for their statistical test…
Consensus is a common method for computing a function of the data distributed among the nodes of a network. Of particular interest is distributed average consensus, whereby the nodes iteratively compute the sample average of the data stored…
3D softwares are now capable of producing highly realistic images that look nearly indistinguishable from the real images. This raises the question: can real datasets be enhanced with 3D rendered data? We investigate this question. In this…
Causal inference is made challenging by confounding, selection bias, and other complications. A common approach to addressing these difficulties is the inclusion of auxiliary data on the superpopulation of interest. Such data may measure a…
Comparisons of different treatments or production processes are the goals of a significant fraction of applied research. Unsurprisingly, two-sample problems play a main role in Statistics through natural questions such as `Is the the new…
Data-driven and adaptive control approaches face the problem of introducing sudden distributional shifts beyond the distribution of data encountered during learning. Therefore, they are prone to invalidating the very assumptions used in…
The use of statistical software in academia and enterprises has been evolving over the last years. More often than not, students, professors, workers, and users, in general, have all had, at some point, exposure to statistical software.…
Program synthesis is the task of automatically generating a program consistent with a given specification. A natural way to specify programs is to provide examples of desired input-output behavior, and many current program synthesis…
Consensus is an often occurring problem in concurrent and distributed programming. We present a programming language with simple semantics and build-in support for consensus in the form of communicating transactions. We motivate the need…
As artificial intelligence and machine learning tools become more accessible, and scientists face new obstacles to data collection (e.g., rising costs, declining survey response rates), researchers increasingly use predictions from…
Front end of data collection and loading into database manually may cause potential errors in data sets and a very time consuming process. Scanning of a data document in the form of an image and recognition of corresponding information in…
We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better…
Data values in a dataset can be missing or anomalous due to mishandling or human error. Analysing data with missing values can create bias and affect the inferences. Several analysis methods, such as principle components analysis or…
Modern technologies are generating ever-increasing amounts of data. Making use of these data requires methods that are both statistically sound and computationally efficient. Typically, the statistical and computational aspects are treated…
Online data has the potential to transform how researchers and companies produce election forecasts. Social media surveys, online panels and even comments scraped from the internet can offer valuable insights into political preferences.…
Diffusion models have recently emerged as powerful generative priors for solving inverse problems. However, training diffusion models in the pixel space are both data-intensive and computationally demanding, which restricts their…