Applied Statistics
Machine learning and statistical methods can improve conventional motor protection systems, providing early warning and detection of emerging failures. Data-driven methods rely on historical data to learn how the system is expected to…
This paper presents a comprehensive analysis of power plant performance using the inverse Gaussian (IG) distribution framework. We combine theoretical foundations with practical applications, focusing on both combined cycle and nuclear…
The least squares of depth-trimmed (LST) residuals regression, proposed and studied in Zuo and Zuo (2023), serves as a robust alternative to the classic least squares (LS) regression as well as a strong competitor to the renowned robust…
The need to explore and/or optimize expensive simulators with many qualitative factors arises in broad scientific and engineering problems. Our motivating application lies in path planning - the exploration of feasible paths for navigation,…
Fine-grained noise maps are vital for epidemiological studies on traffic noise. However, detailed information on traffic noise is often limited, especially in Eastern Europe. Rigid linear noise land-use regressions are typically employed to…
A core principle of Privacy by Design (PbD) is minimizing the data that is stored or shared about each individual respondent. PbD principles are mandated by the GDPR (see Article 5c and Article 25), as well as informing aspects of…
The national forecasting competition WxChallenge, brainchild of Brad Illston at the University of Oklahoma in 2005, has become a cherished institution played across the United States each year. Participants include students, faculty,…
Online boards offer a platform for sharing and discussing content, where discussion emerges as a cascade of comments in response to a post. Branching point process models offer a practical approach to modelling these cascades; however,…
Topic Modeling is a popular statistical tool commonly used on textual data to identify the hidden thematic structure in a document collection based on the distribution of words. Additionally, it can be used to cluster the documents, with…
The identification of genetic signal regions in the human genome is critical for understanding the genetic architecture of complex traits and diseases. Numerous methods based on scan algorithms (e.g. QSCAN, SCANG, SCANG-STARR) have been…
We study the problem of sequentially testing whether a given stochastic process is generated by a known Markov chain. Formally, given access to a stream of random variables, we want to quickly determine whether this sequence is a trajectory…
Ecosystems tend to fluctuate around stable equilibria in response to internal dynamics and environmental factors. Occasionally, they enter an unstable tipping region and collapse into an alternative stable state. Our understanding of how…
The Belin/Ambrósio Deviation (BAD) model is a widely used diagnostic tool for detecting keratoconus and corneal ectasia. The input to the model is a set of z-score normalized $D$ indices that represent physical characteristics of the…
In many healthcare settings, it is critical to consider fairness when building analytical applications, yet uniquely unacceptable to lower model performance for one group to match that of another (e.g. fairness cannot be achieved…
Substance use disorders (SUDs) are a serious public health concern in the United States. Alcohol and cannabis are two of the most widely used substances. For adolescent/youth users of alcohol or cannabis, we propose a joint Bayesian…
Underwater noise pollution from human activities, particularly shipping, has been recognised as a serious threat to marine life. The sound generated by vessels can have various adverse effects on fish and aquatic ecosystems in general. In…
Because there are similarities between the evaluation of alternative stories in criminal trials and the evaluation of scientific theories, scholars have looked to literature in epistemology and the philosophy of science for insights on the…
In this paper, we develop a framework of 'Benford models' for counter-intelligence investigations which analyze frequency data of a suspect's visits to physical locations, online websites, and communication channels. We accomplish this by…
The energy transition is profoundly reshaping electricity market dynamics. This makes it essential to understand how renewable energy generation actually impacts electricity prices, among all other market drivers. These insights are critical…
Fuel moisture content (FMC) is a key predictor for wildfire rate of spread (ROS). Machine learning models of FMC have been used increasingly in recent years, augmenting or replacing traditional physics-based approaches. Wildfire rate of spread…