Related papers: Piecewise Linear Approximation in Data Streaming: …
We study the problem of deriving compressibility measures for Piecewise Linear Approximations (PLAs), i.e., error-bounded approximations of a set of two-dimensional increasing data points using a sequence of segments. Such approximations…
A growing trend in the database and system communities is to augment conventional index structures, such as B+-trees, with machine learning (ML) models. Among these, error-bounded Piecewise Linear Approximation ($\epsilon$-PLA) has emerged…
The inherent nonlinearity of the power flow equations poses significant challenges in accurately modeling power systems, particularly when employing linearized approximations. Although power flow linearizations provide computational…
Sparse regression has been a popular approach to perform variable selection and enhance the prediction accuracy and interpretability of the resulting statistical model. Existing approaches focus on offline regularized regression, while the…
Piecewise Aggregate Approximation (PAA) is a competitive basic dimension reduction method for high-dimensional time series mining. When deployed, however, the limitations are obvious that some important information will be missed,…
We introduce a new computational model for data streams: asymptotically exact streaming algorithms. These algorithms have an approximation ratio that tends to one as the length of the stream goes to infinity while the memory used by the…
For many modern applications in science and engineering, data are collected in a streaming fashion carrying time-varying information, and practitioners need to process them with a limited amount of memory and computational resources in a…
In this paper we propose a new algorithm for streaming principal component analysis. With limited memory, small devices cannot store all the samples in the high-dimensional regime. Streaming principal component analysis aims to find the…
Tracking and approximating data matrices in streaming fashion is a fundamental challenge. The problem requires more care and attention when data comes from multiple distributed sites, each receiving a stream of data. This paper considers…
Efficient data streaming is essential for real-time data analytics, visualization, and machine learning model training, particularly when dealing with high-volume datasets. Various streaming technologies and serialization protocols have…
This paper investigates the control of nonlinear systems using a piecewise linear approximation framework. The proposed approach combines a PID controller with locally linearized models obtained by partitioning the nonlinear function into…
In this thesis, we explore streaming algorithms for approximating constraint satisfaction problems (CSPs). The setup is roughly the following: A computer has limited memory space, sees a long "stream" of local constraints on a set of…
Large scale, streaming datasets are ubiquitous in modern machine learning. Streaming algorithms must be scalable, amenable to incremental training and robust to the presence of non-stationarity. In this work consider the problem of learning…
Big data problems frequently require processing datasets in a streaming fashion, either because all data are available at once but collectively are larger than available memory or because the data intrinsically arrive one data point at a…
This paper argues that randomized linear sketching is a natural tool for on-the-fly compression of data matrices that arise from large-scale scientific simulations and data collection. The technical contribution consists in a new algorithm…
In many emerging applications, data streams are monitored in a network environment. Due to limited communication bandwidth and other resource constraints, a critical and practical demand is to online compress data streams continuously with…
Sparse pseudo-point approximations for Gaussian process (GP) models provide a suite of methods that support deployment of GPs in the large data regime and enable analytic intractabilities to be sidestepped. However, the field lacks a…
Large-batch Contrastive Learning (CL), the foundation of modern representation learning, is fundamentally incompatible with the volatile resource constraints of edge devices. This conflict creates a dilemma: small on-device batches degrade…
This paper proposes a tool for dimension reduction where the dimension of the original space is reduced: a Principal Loading Analysis (PLA). PLA is a tool to reduce dimensions by discarding variables. The intuition is that variables are…
We study streaming algorithms for Correlation Clustering. Given a graph as an arbitrary-order stream of edges, with each edge labeled as positive or negative, the goal is to partition the vertices into disjoint clusters, such that the…