Applied Statistics
We describe an experiment in modeling the effects of urban verticalization on perceived safety scores obtained with computer vision from Google Street View data for New York City. Preliminary results suggest that for smaller buildings…
The spatial autoregressive (SAR) model is extended by introducing Markov switching dynamics for the weight matrix and the spatial autoregressive parameter. The framework enables the identification of regime-specific connectivity patterns and…
This study analyzes the service and return landing areas in men's doubles badminton, based on data extracted from 20 matches. We find that most services land near the center line, while returns tend to land in the crossing…
Accurate identification of synergistic treatment combinations and their underlying biological mechanisms is critical across many disease domains, especially cancer. In translational oncology research, preclinical systems such as…
Perturbed by natural hazards, community-level infrastructure networks operate like many-body systems, with behaviors emerging from coupling individual component dynamics with group correlations and interactions. It follows that we can…
The data science revolution has highlighted the varying roles that data analytic products can play in different industries and applications. There has been particular interest in using analytic products coupled with algorithmic prediction…
A novel spatial autoregressive model for panel data is introduced, which incorporates multilayer networks and accounts for time-varying relationships. Moreover, the proposed approach allows the structural variance to evolve smoothly over…
Aerosols play a critical role in atmospheric chemistry, and affect clouds, climate, and human health. However, the spatial coverage of satellite-derived aerosol optical depth (AOD) products is limited by cloud cover, orbit patterns, polar…
Urban-scale traffic congestion occurs when road network supply is insufficient relative to demand. The relationship between supply and demand has therefore been extensively investigated in the literature. Especially the impact…
Nowadays, all major weather centres issue ensemble forecasts which, even when covering the same domain, differ in both ensemble size and spatial resolution. These two parameters strongly determine both the forecast skill of the prediction and…
Functional magnetic resonance imaging (fMRI) maps cerebral activation in response to stimuli, but this activation is often difficult to detect, especially in low-signal contexts and single-subject studies. Accurate activation detection can…
Opportunistic pharmacokinetic (PK) studies have sparse and imbalanced clinical measurement data, and the impact of sample time errors is an important concern when seeking accurate estimates of treatment response. We evaluated an approximate…
The causal roadmap is a formal framework for causal and statistical inference that supports clear specification of the causal question, interpretable and transparent statement of required causal assumptions, robust inference, and optimal…
Manual coding of text data from open-ended questions into different categories is time consuming and expensive. Automated coding uses statistical/machine learning to train on a small subset of manually coded text answers. Recently,…
Since the advent of high-resolution pitch tracking data (PITCHf/x), many in the sabermetrics community have attempted to quantify a Major League Baseball catcher's ability to "frame" a pitch (i.e. increase the chance that a pitch is called…
This work conducts a comprehensive examination of data science vocabulary usage over the past 13 years. The investigation begins with a dataset comprising 16,018 abstracts that feature the term "data science" in either the title,…
This paper considers the use of machine learning algorithms for predicting cocaine use based on magnetic resonance imaging (MRI) connectomic data. The study utilized functional MRI (fMRI) and diffusion MRI (dMRI) data collected from 275…
Cosine similarity is an established similarity metric for computing associations on vectors, and it is commonly used to identify related samples from biological perturbational data. The distribution of cosine similarity changes with the…
Biomarker analysis of athletes' urinary steroid profiles is crucial for the success of anti-doping efforts. Current statistical analysis methods generate personalised limits for each athlete based on univariate modelling of longitudinal…
Many population surveys do not provide information on respondents' residential addresses, instead offering coarse geographies like zip code or higher aggregations. However, fine resolution geography can be beneficial for characterizing…