Related papers: Perturbation selection and influence measures in l…
We consider the issue of assessing the influence of observations in the class of Birnbaum-Saunders nonlinear regression models, which is useful in lifetime data analysis. Our results generalize those in Galea et al. [2004, Influence diagnostics…
Existing effect measures for compositional features are inadequate for many modern applications, for example, in microbiome research, since they display traits such as high-dimensionality and sparsity that can be poorly modelled with…
Evaluation of treatment effects and more general estimands is typically achieved via parametric modelling, which is unsatisfactory since model misspecification is likely. Data-adaptive model building (e.g. statistical/machine learning) is…
The inverse statistical problem of finding direct interactions in complex networks is difficult. In the natural sciences, well-controlled perturbation experiments are widely used to probe the structure of complex networks. However, our…
The optimization of high dimensional functions is a key issue in engineering problems but it frequently comes at a cost that is not acceptable since it usually involves a complex and expensive computer code. Engineers often overcome this…
A common problem in health research is that we have a large database with many variables measured on a large number of individuals. We are interested in measuring additional variables on a subsample; these measurements may be newly…
Causal influence measures for machine learnt classifiers shed light on the reasons behind classification, and aid in identifying influential input features and revealing their biases. However, such analyses involve evaluating the classifier…
The performance of distance-based classifiers heavily depends on the underlying distance metric, so it is valuable to learn a suitable metric from the data. To address the problem of multimodality, it is desirable to learn local metrics. In…
Estimating causal effects under interference, where the stable unit treatment value assumption is violated, is critical in fields such as regional and public economics. Much of the existing research on causal inference under interference…
Influence diagnosis is important since the presence of influential observations could lead to distorted analysis and misleading interpretations. This is particularly so for high-dimensional data, as the increased dimensionality and complexity…
There is an especially strong need in modern large-scale data analysis to prioritize samples for manual inspection. For example, the inspection could target important mislabeled samples or key vulnerabilities exploitable by an adversarial…
Utilizing recently developed abstract notions of sectional curvature, we introduce a method for constructing a curvature-based geometric profile of discrete metric spaces. The curvature concept that we use here captures the metric relations…
Researchers are often interested in learning not only the effect of treatments on outcomes, but also the pathways through which these effects operate. A mediator is a variable that is affected by treatment and subsequently affects outcome.…
Many analyses in particle and nuclear physics use simulations to infer fundamental, effective, or phenomenological parameters of the underlying physics models. When the inference is performed with unfolded cross sections, the observables…
Sub-sampling is a common and often effective method to deal with the computational challenges of large datasets. However, for most statistical models, there is no well-motivated approach for drawing a non-uniform subsample. We show that the…
We consider the inference problem for parameters in stochastic differential equation models from discrete time observations (e.g. experimental or simulation data). Specifically, we study the case where one does not have access to…
A general approach to a broad class of asymptotic problems related to long-time influence of small perturbations, of both deterministic and stochastic type, is presented in the paper. The main characteristic of this influence is a limiting…
In a standard classification framework, a set of trustworthy learning data is employed to build a decision rule, with the final aim of classifying unlabelled units belonging to the test set. Therefore, unreliable labelled observations,…
Anomaly detection problems (also called change-point detection problems) have been studied in data mining, statistics and computer science over the last several decades in applications such as medical condition monitoring and…
Marginal structural models are a popular tool for investigating the effects of time-varying treatments, but they require an assumption of no unobserved confounders between the treatment and outcome. With observational data, this assumption…