Related papers: tsdownsample: high-performance time series downsam…
Time series visualization plays a crucial role in identifying patterns and extracting insights across various domains. However, as datasets continue to grow in size, visualizing them effectively becomes challenging. Downsampling, which…
Visual analytics is arguably the most important step in getting acquainted with your data. This is especially the case for time series, as this data type is hard to describe and cannot be fully understood when using for example summary…
Visualization plays an important role in analyzing and exploring time series data. To facilitate efficient visualization of large datasets, downsampling has emerged as a well-established approach. This work concentrates on LTTB…
Subsampling from a large data set is useful in many supervised learning contexts to provide a global view of the data based on only a fraction of the observations. Diverse (or space-filling) subsampling is an appealing subsampling approach…
We present DySample, an ultra-lightweight and effective dynamic upsampler. While impressive performance gains have been witnessed from recent kernel-based dynamic upsamplers such as CARAFE, FADE, and SAPA, they introduce much workload,…
Online sampling-supported visual analytics is increasingly important, as it allows users to explore large datasets with acceptable approximate answers at interactive rates. However, existing online spatiotemporal sampling techniques are…
Due to ongoing accrual over long durations, a defining characteristic of real-world data streams is the requirement for rolling, often real-time, mechanisms to coarsen or summarize stream history. One common data structure for this purpose…
Repeated Sampling (RS) is a simple inference-time algorithm that has been shown to improve model performance on complex tasks. Although it is an effective way of scaling inference time, it often struggles to generate diverse solution…
Operations over data streams typically hinge on efficient mechanisms to aggregate or summarize history on a rolling basis. For high-volume data steams, it is critical to manage state in a manner that is fast and memory efficient --…
The study of complex many-body systems via analysis of the trajectories of the units that dynamically move and interact within them is a non-trivial task. The workflow for extracting meaningful information from the raw trajectory data is…
Time-series tasks often benefit from signals expressed across multiple representation spaces (e.g., time vs. frequency) and at varying abstraction levels (e.g., local patterns vs. global semantics). However, existing pre-trained time-series…
Exa-scale simulations are on the horizon but almost no new design for the output has been proposed in recent years. In simulations using individual time steps, the traditional snapshots are over resolving particles/cells with large time…
Time series forecasting is a subject of significant scientific and industrial importance. Despite the widespread utilization of forecasting methods, there is a dearth of research aimed at comprehending the conditions under which these…
The t-distributed Stochastic Neighbor Embedding (tSNE) algorithm has become in recent years one of the most used and insightful techniques for the exploratory data analysis of high-dimensional data. tSNE reveals clusters of high-dimensional…
Interactive visualizations are crucial in ad hoc data exploration and analysis. However, with the growing number of massive datasets, generating visualizations in interactive timescales is increasingly challenging. One approach for…
As large language models (LLMs) scale out with tensor parallelism (TP) and pipeline parallelism (PP) and production stacks have aggressively optimized the data plane (attention/GEMM and KV cache), sampling, the decision plane that turns…
This thesis investigates dataset downsampling as a strategy to optimize energy efficiency in recommender systems while maintaining competitive performance. With increasing dataset sizes posing computational and environmental challenges,…
Missing values are pervasive in large-scale time-series data, posing challenges for reliable analysis and decision-making. Many neural architectures have been designed to model and impute the complex and heterogeneous missingness patterns…
Visualizing multiple time series presents fundamental tradeoffs between scalability and visual clarity. Time series capture the behavior of many large-scale real-world processes, from stock market trends to urban activities. Users often…
Many computer vision systems require low-cost segmentation algorithms based on deep learning, either because of the enormous size of input images or limited computational budget. Common solutions uniformly downsample the input images to…