Applied Statistics
We propose a flexible algorithm for feature detection and hypothesis testing in images with ultra low signal-to-noise ratio using cubical persistent homology. Our main application is in the identification of atomic columns and other…
It is proposed to investigate the onset of a disease D based on several risk factors, with a specific interest in the occurrence of Alzheimer's disease. For that purpose, two classes of techniques are available, whose properties are quite different in…
According to the Lancet report on the global burden of disease published in October 2020, air pollution is among the five highest risk factors for global health, reducing life expectancy on average by 20 months. This paper describes a…
Improving public policy is one of the key roles of governments, and they can do this in an evidence-based way using administrative data. Causal inference for observational data improves on current practice of using descriptive or predictive…
The statistical shape analysis called Procrustes analysis minimizes the distance between matrices by similarity transformations. The method returns a set of optimal orthogonal matrices, which project each matrix into a common space. This…
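As a rough illustration of the orthogonal step described in this abstract, the sketch below solves the classical orthogonal Procrustes problem with an SVD. It is not the paper's method: it assumes only two matrices, omits the scaling and translation parts of a full similarity transformation, and the function name `orthogonal_procrustes` and the synthetic data are hypothetical choices made for the example.

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Return the orthogonal matrix R that minimizes ||A @ R - B||_F.

    Classical closed-form solution: if A.T @ B = U S V^T, then R = U V^T.
    """
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# Synthetic check (assumed data): B is an exactly rotated/reflected copy of A.
rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # a random orthogonal matrix
B = A @ Q

R = orthogonal_procrustes(A, B)
residual = np.linalg.norm(A @ R - B)  # close to zero for noise-free data
```

With noise-free data the recovered `R` coincides with `Q`; with noisy data it is only the least-squares best orthogonal alignment.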
This paper presents a novel approach for modeling mortality rates above age 70 by proposing a mixture-based model. This model is compared to four other widely used models: the Beard, Gompertz, Makeham, and Perks models. Our model can…
In most circumstances, probability sampling is the only way to ensure unbiased inference about population quantities where a complete census is not possible. As we enter the era of 'big data', however, nonprobability samples, whose sampling…
Modeling the relationship between vehicle speed and density on the road is a fundamental problem in traffic flow theory. Recent research found that using the least-squares (LS) method to calibrate single-regime speed-density models is…
Ordinal data occur frequently in the social sciences. When applying principal component analysis (PCA), however, those data are often treated as numeric implying linear relationships between the variables at hand, or non-linear PCA is…
Metro systems in megacities such as Beijing, Shenzhen and Guangzhou are under great passenger demand pressure. During peak hours, it is common to see oversaturated conditions (i.e., passenger demand exceeds network capacity), which bring…
This paper proposes a statistical simulator for engine knock based on the Mixture Density Network (MDN) and the accept-reject method. The proposed simulator can generate the random knock intensity signal corresponding to the input…
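The accept-reject method named in this abstract can be sketched generically as follows. This is a minimal illustration only: the target density here is a simple stand-in, f(x) = 6x(1 - x) on [0, 1], not the MDN-predicted knock intensity distribution, and the uniform proposal, the bound of 1.5, and all names are assumptions made for the example.

```python
import numpy as np

def accept_reject(target_pdf, bound, n, rng):
    """Draw n samples on [0, 1] from target_pdf using a Uniform(0, 1) proposal.

    Requires target_pdf(x) <= bound for all x in [0, 1].
    """
    samples = []
    while len(samples) < n:
        x = rng.uniform(0.0, 1.0)  # candidate from the proposal
        u = rng.uniform(0.0, 1.0)  # uniform draw for the acceptance test
        if u * bound <= target_pdf(x):
            samples.append(x)      # accept with probability target_pdf(x) / bound
    return np.array(samples)

def target_pdf(x):
    # Stand-in target density (assumed): Beta(2, 2), whose maximum is 1.5 at x = 0.5.
    return 6.0 * x * (1.0 - x)

rng = np.random.default_rng(0)
draws = accept_reject(target_pdf, bound=1.5, n=5000, rng=rng)
```

A tighter bound raises the acceptance rate; here roughly two of every three candidates are accepted, since the proposal density times the bound encloses an area of 1.5.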
This study compared the effectiveness of COVID-19 control policies, including mask wearing, and vaccination rates through the proportional infection rate in 28 states of the United States using the eSIR model. The effective rate of policies…
With unprecedented and growing interest in data science education, there are limited educator materials that provide meaningful opportunities for learners to practice statistical thinking, as defined by Wild and Pfannkuch (1999), with messy…
We seek to provide an interpretable framework for segmenting users in a population for personalized decision-making. We propose a general methodology, Market Segmentation Trees (MSTs), for learning market segmentations explicitly driven by…
Broadening eligibility criteria in cancer trials has been advocated to represent the true patient population more accurately. While the advantages are clear in terms of generalizability and recruitment, novel dose-finding designs are needed…
We provide a procedure termed Flagged observation analyses that can be applied to all the available time series to help identify time series that should be prioritized. The statistical procedure first applies a structural time series…
This paper presents an approach for estimating Shapley effects for use as global sensitivity metrics to quantify the relative importance of uncertain model parameters. Polynomial Chaos expansion, a well established approach for developing…
Sports analytics -- broadly defined as the pursuit of improvement in athletic performance through the analysis of data -- has expanded its footprint both in the professional sports industry and in academia over the past 30 years. In this…
Low-cost sensors (LCS) for measuring air pollution are increasingly being deployed in mobile applications but questions concerning the quality of the measurements remain unanswered. For example, what is the best way to correct LCS data in a…
Understanding passengers' path choice behavior in urban rail systems is a prerequisite for effective operations and planning. This paper attempts to bridge the gap by proposing a probabilistic approach to infer passengers' path choice…