Related papers: A Sequential Addressing Subsampling Method for Mas…

ASSS: A Differentiable Adversarial Framework for Task-Aware Data Reduction

Massive datasets often contain redundancy that inflates computational costs without improving generalization. Existing data reduction methods are typically task-agnostic, discarding informative boundary samples and yielding suboptimal…

Machine Learning · Computer Science 2026-04-07 Jiacheng Lyu, Bihua Bao, Shiyun Yan

Selective Attention System (SAS): Device-Addressed Speech Detection for Real-Time On-Device Voice AI

We study device-addressed speech detection under pre-ASR edge deployment constraints, where systems must decide whether to forward audio before transcription under strict latency and compute limits. We show that, in multi-speaker…

Sound · Computer Science 2026-04-10 David Joohun Kim, Daniyal Anjum, Bonny Banerjee, Omar Abbasi

A Moment-assisted Approach for Improving Subsampling-based MLE with Large-scale data

The maximum likelihood estimation is computationally demanding for large datasets, particularly when the likelihood function includes integrals. Subsampling can reduce the computational burden, but it often results in efficiency loss. This…

Methodology · Statistics 2026-04-27 Miaomiao Su, Qihua Wang, Ruoyu Wang

Novel Subsampling Strategies for Heavily Censored Reliability Data

Computational capability often falls short when confronted with massive data, posing a common challenge in establishing a statistical model or statistical inference method dealing with big data. While subsampling techniques have been…

Methodology · Statistics 2024-10-31 Yixiao Ruan, Zan Li, Zhaohui Li, Dennis K. J. Lin, Qingpei Hu, Dan Yu

SAS: Simulated Attention Score

The attention mechanism is a core component of the Transformer architecture. Various methods have been developed to compute attention scores, including multi-head attention (MHA), multi-query attention, group-query attention and so on. We…

Computation and Language · Computer Science 2025-11-26 Chuanyang Zheng, Jiankai Sun, Yihang Gao, Yuehao Wang, Peihao Wang, Jing Xiong, Liliang Ren, Hao Cheng, Janardhan Kulkarni, Yelong Shen, Atlas Wang, Mac Schwager, Anderson Schneider, Xiaodong Liu, Jianfeng Gao

10,000+ Times Accelerated Robust Subset Selection (ARSS)

Subset selection from massive data with noised information is increasingly popular for various applications. This problem is still highly challenging as current methods are generally slow in speed and sensitive to outliers. To address the…

Machine Learning · Computer Science 2014-11-18 Feiyun Zhu, Bin Fan, Xinliang Zhu, Ying Wang, Shiming Xiang, Chunhong Pan

Diversity Subsampling: Custom Subsamples from Large Data Sets

Subsampling from a large data set is useful in many supervised learning contexts to provide a global view of the data based on only a fraction of the observations. Diverse (or space-filling) subsampling is an appealing subsampling approach…

Methodology · Statistics 2023-11-27 Boyang Shang, Daniel W. Apley, Sanjay Mehrotra

Subsampling and Jackknifing: A Practically Convenient Solution for Large Data Analysis with Limited Computational Resources

Modern statistical analysis often encounters datasets with large sizes. For these datasets, conventional estimation methods can hardly be used immediately because practitioners often suffer from limited computational resources. In most…

Methodology · Statistics 2023-04-14 Shuyuan Wu, Xuening Zhu, Hansheng Wang

Contrastive Self-supervised Neural Architecture Search

This paper proposes a novel cell-based neural architecture search algorithm (NAS), which completely alleviates the expensive costs of data labeling inherited from supervised learning. Our algorithm capitalizes on the effectiveness of…

Computer Vision and Pattern Recognition · Computer Science 2021-11-09 Nam Nguyen, J. Morris Chang

Sequential scaled sparse factor regression

Large-scale association analysis between multivariate responses and predictors is of great practical importance, as exemplified by modern business applications including social media marketing and crisis management. Despite the rapid…

Methodology · Statistics 2020-11-18 Zemin Zheng, Yang Li, Jie Wu, Yuchen Wang

Scalable Stochastic Alternating Direction Method of Multipliers

Stochastic alternating direction method of multipliers (ADMM), which visits only one sample or a mini-batch of samples each time, has recently been proved to achieve better performance than batch ADMM. However, most stochastic methods can…

Machine Learning · Computer Science 2015-07-21 Shen-Yi Zhao, Wu-Jun Li, Zhi-Hua Zhou

Compressed Spaced Suffix Arrays

Spaced seeds are important tools for similarity search in bioinformatics, and using several seeds together often significantly improves their performance. With existing approaches, however, for each seed we keep a separate linear-size data…

Data Structures and Algorithms · Computer Science 2014-03-11 Travis Gagie, Giovanni Manzini, Daniel Valenzuela

RAS: A Bit-Exact rANS Accelerator For High-Performance Neural Lossless Compression

Data centers handle vast volumes of data that require efficient lossless compression, yet emerging probabilistic models based methods are often computationally slow. To address this, we introduce RAS, the Range Asymmetric Numeral System…

Hardware Architecture · Computer Science 2025-11-10 Yuchao Qin, Anjunyi Fan, Bonan Yan

Sparse $L^1$-Autoencoders for Scientific Data Compression

Scientific datasets present unique challenges for machine learning-driven compression methods, including more stringent requirements on accuracy and mitigation of potential invalidating artifacts. Drawing on results from compressed sensing…

Machine Learning · Computer Science 2024-05-24 Matthias Chung, Rick Archibald, Paul Atzberger, Jack Michael Solomon

Orthogonal Subsampling for Big Data Linear Regression

The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, that extracts useful information from datasets is a crucial step in big data analysis. We propose an orthogonal…

Methodology · Statistics 2021-06-01 Lin Wang, Jake Elmstedt, Weng Kee Wong, Hongquan Xu

An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution

Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech. Recently, self-supervised learning (SSL) has shown stellar…

Sound · Computer Science 2025-03-04 Tien-Hong Lo, Fu-An Chao, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

Active Diffusion Subsampling

Subsampling is commonly used to mitigate costs associated with data acquisition, such as time or energy requirements, motivating the development of algorithms for estimating the fully-sampled signal of interest $x$ from partially observed…

Machine Learning · Computer Science 2025-04-23 Oisin Nolan, Tristan S. W. Stevens, Wessel L. van Nierop, Ruud J. G. van Sloun

Adaptive sieving: A dimension reduction technique for sparse optimization problems

In this paper, we propose an adaptive sieving (AS) strategy for solving general sparse machine learning models by effectively exploring the intrinsic sparsity of the solutions, wherein only a sequence of reduced problems with much smaller…

Optimization and Control · Mathematics 2025-04-28 Yancheng Yuan, Meixia Lin, Defeng Sun, Kim-Chuan Toh

Efficient Semi-Supervised Adversarial Training via Latent Clustering-Based Data Reduction

Learning robust models under adversarial settings is widely recognized as requiring a considerably large number of training samples. Recent work proposes semi-supervised adversarial training (SSAT), which utilizes external unlabeled or…

Machine Learning · Computer Science 2026-03-10 Somrita Ghosh, Yuelin Xu, Xiao Zhang

Seismic Arrival-time Picking on Distributed Acoustic Sensing Data using Semi-supervised Learning

Distributed Acoustic Sensing (DAS) is an emerging technology for earthquake monitoring and subsurface imaging. The recorded seismic signals by DAS have several distinct characteristics, such as unknown coupling effects, strong anthropogenic…

Geophysics · Physics 2023-03-16 Weiqiang Zhu, Ettore Biondi, Jiaxuan Li, Jiuxun Yin, Zachary E. Ross, Zhongwen Zhan