Related papers: A Bayesian algorithm for sample selection bias cor…
In analyzing big data for finite population inference, it is critical to adjust for the selection bias in the big data. In this paper, we propose two methods of reducing the selection bias associated with the big data sample. The first…
Big Data often presents as massive non-probability samples. Not only is the selection mechanism often unknown, but larger data volume amplifies the relative contribution of selection bias to total error. Existing bias adjustment approaches…
Accurate and timely population data are essential for disaster response and humanitarian planning, but traditional censuses often cannot capture rapid demographic changes. Social media data offer a promising alternative for dynamic…
An ever-increasing deluge of big data is becoming available to national statistical offices globally, but it is well documented that statistics produced by big data alone often suffer from selection bias and are not usually representative…
Non-representative surveys are commonly used and widely available but suffer from selection bias that generally cannot be entirely eliminated using weighting techniques. Instead, we propose a Bayesian method to synthesize longitudinal…
In the case of informative sampling the sampling scheme explicitly or implicitly depends on the response variable. As a result, the sample distribution of response variable can- not be used for making inference about the population. In this…
Online data has the potential to transform how researchers and companies produce election forecasts. Social media surveys, online panels and even comments scraped from the internet can offer valuable insights into political preferences.…
We consider inference from non-random samples in data-rich settings where high-dimensional auxiliary information is available both in the sample and the target population, with survey inference being a special case. We propose a regularized…
Big data presents potential but unresolved value as a source for analysis and inference. However,selection bias, present in many of these datasets, needs to be accounted for so that appropriate inferences can be made on the target…
Non-Bayesian social learning enables multiple agents to conduct networked signal and information processing through observing environmental signals and information aggregating. Traditional non-Bayesian social learning models only consider…
This paper describes a Bayesian method for learning causal networks using samples that were selected in a non-random manner from a population of interest. Examples of data obtained by non-random sampling include convenience samples and…
Bayesian estimation is increasingly popular for performing model based inference to support policymaking. These data are often collected from surveys under informative sampling designs where subject inclusion probabilities are designed to…
Social media provide access to behavioural data at an unprecedented scale and granularity. However, using these data to understand phenomena in a broader population is difficult due to their non-representativeness and the bias of…
Using social media data for statistical analysis of general population faces commonly two basic obstacles: firstly, social media data are collected for different objects than the population units of interest; secondly, the relevant measures…
In population studies, it is standard to sample data via designs in which the population is divided into strata, with the different strata assigned different probabilities of inclusion. Although there have been some proposals for including…
We propose a general method for distributed Bayesian model choice, using the marginal likelihood, where a data set is split in non-overlapping subsets. These subsets are only accessed locally by individual workers and no data is shared…
In this paper, we study the accuracy of values aggregated over classes predicted by a classification algorithm. The problem is that the resulting aggregates (e.g., sums of a variable) are known to be biased. The bias can be large even for…
Non-probability sampling, for example in the form of online panels, has become a fast and cheap method to collect data. While reliable inference tools are available for classical probability samples, non-probability samples can yield…
We propose the use of a simple intuitive principle for measuring algorithmic classification bias: the significance of the differences in a classifier's error rates across the various demographics is inversely commensurate with the sample…
We have developed a new Bayesian method to correct the flux densities of astronomical sources. The hybrid method combines a simulated likelihood to model survey selection together with an analytic source-count-based prior. The simulated…