Related papers: Summarization and Classification of Non-Poisson Po…
The mathematical models used to capture features of complex, biological systems are typically non-linear, meaning that there are no generally valid simple relationships between their outputs and the data that might be used to validate them.…
Fitting models to data is an important part of the practice of science. Advances in machine learning have made it possible to fit more -- and more complex -- models, but have also exacerbated a problem: when multiple models fit the data…
A novel correction algorithm is proposed for multi-class classification problems with corrupted training data. The algorithm is non-intrusive, in the sense that it post-processes a trained classification model by adding a correction…
Changepoint detection is commonly formulated by minimizing the sum of in-sample losses to quantify the model's overall fit. However, for flexible modeling procedures -- especially those involving high-dimensional parameter spaces or…
Consistent experiment data are crucial to adjust parameters of physics models and to determine best estimates of observables. However, often experiment data are not consistent due to unrecognized systematic errors. Standard methods of…
The true process that generated data cannot be determined when multiple explanations are possible. Prediction requires a model of the probability that a process, chosen randomly from the set of candidate explanations, generates some future…
Statistical modeling plays a fundamental role in understanding the underlying mechanism of massive data (statistical inference) and predicting the future (statistical prediction). Although all models are wrong, researchers try their best to…
Modern data analysis depends increasingly on estimating models via flexible high-dimensional or nonparametric machine learning methods, where the identification of structural parameters is often challenging and untestable. In linear…
Real-world data is often incomplete and contains missing values. To train accurate models over real-world datasets, users need to spend a substantial amount of time and resources imputing and finding proper values for missing data items. In…
In the context of computer models, calibration is the process of estimating unknown simulator parameters from observational data. Calibration is variously referred to as model fitting, parameter estimation/inference, an inverse problem, and…
Sphere packings are essential to the development of physical models for powders, composite materials, and the atomic structure of the liquid state. There is a strong scientific need to be able to assess the fit of packing models to data,…
High complexity models are notorious in machine learning for overfitting, a phenomenon in which models well represent data but fail to generalize an underlying data generating process. A typical procedure for circumventing overfitting…
Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on…
While Multiple Instance (MI) data are point patterns -- sets or multi-sets of unordered points -- appropriate statistical point pattern models have not been used in MI learning. This article proposes a framework for model-based MI learning…
Data on count processes arise in a variety of applications, including longitudinal, spatial and imaging studies measuring count responses. The literature on statistical models for dependent count data is dominated by models built from…
We propose the use of the probability integral transform (PIT) for model validation in point process models. The simple PIT diagnostics assess the calibration of the model and can detect inconsistencies in both the intensity and the…
The statistical matching problem is a data integration problem with structured missing data. The general form involves the analysis of multiple datasets that only have a strict subset of variables jointly observed across all datasets. The…
Some statistical models are specified via a data generating process for which the likelihood function cannot be computed in closed form. Standard likelihood-based inference is then not feasible but the model parameters can be inferred by…
Using a time series model to mimic an observed time series has a long history. However, with regard to this objective, conventional estimation methods for discrete-time dynamical models are frequently found to be wanting. In fact, they are…
Classification of datasets into two or more distinct classes is an important machine learning task. Many methods are able to classify binary classification tasks with a very high accuracy on test data, but cannot provide any easily…