Related papers: Towards Practicable Sequential Shift Detectors
When deployed in the real world, machine learning models inevitably encounter changes in the data distribution, and certain -- but not all -- distribution shifts could result in significant performance degradation. In practice, it may make…
The term dataset shift refers to the situation where the data used to train a machine learning model is different from where the model operates. While several types of shifts naturally occur, existing shift detectors are usually designed to…
Deep learning has recently achieved initial success in program analysis tasks such as bug detection. Lacking real bugs, most existing works construct training and test data by injecting synthetic bugs into correct programs. Despite…
While previous distribution shift detection approaches can identify if a shift has occurred, these approaches cannot localize which specific features have caused a distribution shift -- a critical step in diagnosing or fixing any underlying…
As machine learning models increasingly replace traditional business logic in the production system, their lifecycle management is becoming a significant concern. Once deployed into production, the machine learning models are constantly…
Classifier predictions often rely on the assumption that new observations come from the same distribution as training data. When the underlying distribution changes, so does the optimal classification rule, and performance may degrade. We…
Distribution shifts remain a fundamental problem for the safe application of machine learning systems. If undetected, they may impact the real-world performance of such systems or will at least render original performance claims invalid. In…
This paper investigates sequential change-point detection in reconfigurable sensor networks. In this problem, data from multiple sensors are observed sequentially. Each sensor can have a unique change point, and the data distribution…
Supervised learning techniques typically assume training data originates from the target population. Yet, in reality, dataset shift frequently arises, which, if not adequately taken into account, may decrease the performance of their…
Change-point detection studies the problem of detecting the changes in the underlying distribution of the data stream as soon as possible after the change happens. Modern large-scale, high-dimensional, and complex streaming data call for…
We propose a new method for generating realistic datasets with distribution shifts using any decoder-based generative model. Our approach systematically creates datasets with varying intensities of distribution shifts, facilitating a…
In the sequential change-point detection literature, most research specifies a required frequency of false alarms at a given pre-change distribution $f_{\theta}$ and tries to minimize the detection delay for every possible post-change…
Machine learning on data streams is increasingly more present in multiple domains. However, there is often data distribution shift that can lead machine learning models to make incorrect decisions. While there are automatic methods to…
A distribution shift can have fundamental consequences such as signaling a change in the operating environment or significantly reducing the accuracy of downstream models. Thus, understanding distribution shifts is critical for examining…
Online detection of changes in stochastic systems, referred to as sequential change detection or quickest change detection, is an important research topic in statistics, signal processing, and information theory, and has a wide range of…
Dealing with distribution shifts is one of the central challenges for modern machine learning. One fundamental situation is the covariate shift, where the input distributions of data change from training to testing stages while the…
We consider industrial federated learning, a collaboration between a small number of powerful, potentially competing industrial players, mediated by a third party aspiring to improve the service it provides to its customers. We argue that…
In continual learning, understanding the properties of task sequences and their relationships to model performance is important for developing advanced algorithms with better accuracy. However, efforts in this direction remain…
We introduce a novel approach for detecting distribution shifts that negatively impact the performance of machine learning models in continuous production environments, which requires no access to ground truth data labels. It builds upon…
Information from related source studies can often enhance the findings of a target study. However, the distribution shift between target and source studies can severely impact the efficiency of knowledge transfer. In the high-dimensional…