Related papers: A Sequential Addressing Subsampling Method for Mas…

200 papers

Massive datasets often contain redundancy that inflates computational costs without improving generalization. Existing data reduction methods are typically task-agnostic, discarding informative boundary samples and yielding suboptimal…

Machine Learning · Computer Science 2026-04-07 Jiacheng Lyu, Bihua Bao, Shiyun Yan

We study device-addressed speech detection under pre-ASR edge deployment constraints, where systems must decide whether to forward audio before transcription under strict latency and compute limits. We show that, in multi-speaker…

Sound · Computer Science 2026-04-10 David Joohun Kim, Daniyal Anjum, Bonny Banerjee, Omar Abbasi

The maximum likelihood estimation is computationally demanding for large datasets, particularly when the likelihood function includes integrals. Subsampling can reduce the computational burden, but it often results in efficiency loss. This…

Methodology · Statistics 2026-04-27 Miaomiao Su, Qihua Wang, Ruoyu Wang

Computational capability often falls short when confronted with massive data, posing a common challenge in establishing a statistical model or statistical inference method dealing with big data. While subsampling techniques have been…

Methodology · Statistics 2024-10-31 Yixiao Ruan, Zan Li, Zhaohui Li, Dennis K. J. Lin, Qingpei Hu, Dan Yu

The attention mechanism is a core component of the Transformer architecture. Various methods have been developed to compute attention scores, including multi-head attention (MHA), multi-query attention, group-query attention and so on. We…

Subset selection from massive data with noised information is increasingly popular for various applications. This problem is still highly challenging as current methods are generally slow in speed and sensitive to outliers. To address the…

Machine Learning · Computer Science 2014-11-18 Feiyun Zhu, Bin Fan, Xinliang Zhu, Ying Wang, Shiming Xiang, Chunhong Pan

Subsampling from a large data set is useful in many supervised learning contexts to provide a global view of the data based on only a fraction of the observations. Diverse (or space-filling) subsampling is an appealing subsampling approach…

Methodology · Statistics 2023-11-27 Boyang Shang, Daniel W. Apley, Sanjay Mehrotra

Modern statistical analysis often encounters datasets with large sizes. For these datasets, conventional estimation methods can hardly be used immediately because practitioners often suffer from limited computational resources. In most…

Methodology · Statistics 2023-04-14 Shuyuan Wu, Xuening Zhu, Hansheng Wang

This paper proposes a novel cell-based neural architecture search algorithm (NAS), which completely alleviates the expensive costs of data labeling inherited from supervised learning. Our algorithm capitalizes on the effectiveness of…

Computer Vision and Pattern Recognition · Computer Science 2021-11-09 Nam Nguyen, J. Morris Chang

Large-scale association analysis between multivariate responses and predictors is of great practical importance, as exemplified by modern business applications including social media marketing and crisis management. Despite the rapid…

Methodology · Statistics 2020-11-18 Zemin Zheng, Yang Li, Jie Wu, Yuchen Wang

Stochastic alternating direction method of multipliers (ADMM), which visits only one sample or a mini-batch of samples each time, has recently been proved to achieve better performance than batch ADMM. However, most stochastic methods can…

Machine Learning · Computer Science 2015-07-21 Shen-Yi Zhao, Wu-Jun Li, Zhi-Hua Zhou

Spaced seeds are important tools for similarity search in bioinformatics, and using several seeds together often significantly improves their performance. With existing approaches, however, for each seed we keep a separate linear-size data…

Data Structures and Algorithms · Computer Science 2014-03-11 Travis Gagie, Giovanni Manzini, Daniel Valenzuela

Data centers handle vast volumes of data that require efficient lossless compression, yet emerging probabilistic-model-based methods are often computationally slow. To address this, we introduce RAS, the Range Asymmetric Numeral System…

Hardware Architecture · Computer Science 2025-11-10 Yuchao Qin, Anjunyi Fan, Bonan Yan

Scientific datasets present unique challenges for machine learning-driven compression methods, including more stringent requirements on accuracy and mitigation of potential invalidating artifacts. Drawing on results from compressed sensing…

Machine Learning · Computer Science 2024-05-24 Matthias Chung, Rick Archibald, Paul Atzberger, Jack Michael Solomon

The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, that extracts useful information from datasets is a crucial step in big data analysis. We propose an orthogonal…

Methodology · Statistics 2021-06-01 Lin Wang, Jake Elmstedt, Weng Kee Wong, Hongquan Xu

Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech. Recently, self-supervised learning (SSL) has shown stellar…

Sound · Computer Science 2025-03-04 Tien-Hong Lo, Fu-An Chao, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

Subsampling is commonly used to mitigate costs associated with data acquisition, such as time or energy requirements, motivating the development of algorithms for estimating the fully-sampled signal of interest $x$ from partially observed…

Machine Learning · Computer Science 2025-04-23 Oisin Nolan, Tristan S. W. Stevens, Wessel L. van Nierop, Ruud J. G. van Sloun

In this paper, we propose an adaptive sieving (AS) strategy for solving general sparse machine learning models by effectively exploring the intrinsic sparsity of the solutions, wherein only a sequence of reduced problems with much smaller…

Optimization and Control · Mathematics 2025-04-28 Yancheng Yuan, Meixia Lin, Defeng Sun, Kim-Chuan Toh

Learning robust models under adversarial settings is widely recognized as requiring a considerably large number of training samples. Recent work proposes semi-supervised adversarial training (SSAT), which utilizes external unlabeled or…

Machine Learning · Computer Science 2026-03-10 Somrita Ghosh, Yuelin Xu, Xiao Zhang

Distributed Acoustic Sensing (DAS) is an emerging technology for earthquake monitoring and subsurface imaging. The seismic signals recorded by DAS have several distinct characteristics, such as unknown coupling effects, strong anthropogenic…

Geophysics · Physics 2023-03-16 Weiqiang Zhu, Ettore Biondi, Jiaxuan Li, Jiuxun Yin, Zachary E. Ross, Zhongwen Zhan