Related papers: Batch-Expansion Training: An Efficient Optimizatio…

Hybrid Batch Bayesian Optimization

Bayesian Optimization aims at optimizing an unknown non-convex/concave function that is costly to evaluate. We are interested in application scenarios where concurrent function evaluations are possible. Under such a setting, BO could choose…

Artificial Intelligence · Computer Science 2012-05-02 Javad Azimi, Ali Jalali, Xiaoli Fern

A Simple and Efficient Approach to Batch Bayesian Optimization

Extending Bayesian optimization to batch evaluation can enable the designer to make the most use of parallel computing technology. However, most of current batch approaches do not scale well with the batch size. That is, their performances…

Machine Learning · Computer Science 2025-04-25 Dawei Zhan, Zhaoxi Zeng, Shuoxiao Wei, Ping Wu

Never Go Full Batch (in Stochastic Convex Optimization)

We study the generalization performance of full-batch optimization algorithms for stochastic convex optimization: these are first-order methods that only access the exact gradient of the empirical risk (rather than gradients with…

Optimization and Control · Mathematics 2021-07-02 Idan Amir, Yair Carmon, Tomer Koren, Roi Livni

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge

Recently, a new trend of exploring sparsity for accelerating neural network training has emerged, embracing the paradigm of training on the edge. This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting for…

Machine Learning · Computer Science 2021-10-28 Geng Yuan, Xiaolong Ma, Wei Niu, Zhengang Li, Zhenglun Kong, Ning Liu, Yifan Gong, Zheng Zhan, Chaoyang He, Qing Jin, Siyue Wang, Minghai Qin, Bin Ren, Yanzhi Wang, Sijia Liu, Xue Lin

The Number of Steps Needed for Nonconvex Optimization of a Deep Learning Optimizer is a Rational Function of Batch Size

Recently, convergence as well as convergence rate analyses of deep learning optimizers for nonconvex optimization have been widely studied. Meanwhile, numerical evaluations for the optimizers have precisely clarified the relationship…

Optimization and Control · Mathematics 2021-08-27 Hideaki Iiduka

Population Based Training of Neural Networks

Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss function, and optimisation algorithm. In…

Machine Learning · Computer Science 2017-11-29 Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, Koray Kavukcuoglu

Hybrid Dual-Batch and Cyclic Progressive Learning for Efficient Distributed Training

Distributed machine learning is critical for training deep learning models on large datasets with numerous parameters. Current research primarily focuses on leveraging additional hardware resources and powerful computing units to accelerate…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-03 Kuan-Wei Lu, Ding-Yong Hong, Pangfeng Liu, Jan-Jan Wu

On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent

Increasing the mini-batch size for stochastic gradient descent offers significant opportunities to reduce wall-clock training time, but there are a variety of theoretical and systems challenges that impede the widespread success of this…

Machine Learning · Computer Science 2018-12-03 Noah Golmant, Nikita Vemuri, Zhewei Yao, Vladimir Feinberg, Amir Gholami, Kai Rothauge, Michael W. Mahoney, Joseph Gonzalez

BEAT: Balanced Frequency Adaptive Tuning for Long-Term Time-Series Forecasting

Time-series forecasting is crucial for numerous real-world applications including weather prediction and financial market modeling. While temporal-domain methods remain prevalent, frequency-domain approaches can effectively capture…

Machine Learning · Computer Science 2025-08-05 Zhixuan Li, Naipeng Chen, Seonghwa Choi, Sanghoon Lee, Weisi Lin

BET: Bayesian Ensemble Trees for Clustering and Prediction in Heterogeneous Data

We propose a novel "tree-averaging" model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian ensemble…

Machine Learning · Statistics 2014-08-20 Leo L. Duan, John P. Clancy, Rhonda D. Szczesniak

DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems

Existing batch size selection approaches in distributed machine learning rely on static allocation or simplistic heuristics that fail to adapt to heterogeneous, dynamic computing environments. We present DYNAMIX, a reinforcement learning…

Machine Learning · Computer Science 2025-10-10 Yuanjun Dai, Keqiang He, An Wang

Parallel Stochastic Optimization Framework for Large-Scale Non-Convex Stochastic Problems

In this paper, we consider the problem of stochastic optimization, where the objective function is in terms of the expectation of a (possibly non-convex) cost function that is parametrized by a random variable. While the convergence speed…

Information Theory · Computer Science 2019-10-23 Naeimeh Omidvar, An Liu, Vincent Lau, Danny H. K. Tsang, Mohammad Reza Pakravan

Budgeted Embedding Table For Recommender Systems

At the heart of contemporary recommender systems (RSs) are latent factor models that provide quality recommendation experience to users. These models use embedding vectors, which are typically of a uniform and fixed size, to represent users…

Information Retrieval · Computer Science 2026-02-05 Yunke Qu, Tong Chen, Quoc Viet Hung Nguyen, Hongzhi Yin

Batch-in-Batch: a new adversarial training framework for initial perturbation and sample selection

Adversarial training methods commonly generate independent initial perturbation for adversarial samples from a simple uniform distribution, and obtain the training batch for the classifier without selection. In this work, we propose a…

Machine Learning · Computer Science 2024-06-07 Yinting Wu, Pai Peng, Bo Cai, Le Li

Adaptive Batch Normalization for Training Data with Heterogeneous Features

Batch Normalization (BN) is an important preprocessing step to many deep learning applications. Since it is a data-dependent process, for some homogeneous datasets it is a redundant or even a performance-degrading process. In this paper, we…

Machine Learning · Computer Science 2022-12-01 Wael Alsobhi, Tarik Alafif, Alaa Abdel-Hakim, Weiwei Zong

A Batch Learning Framework for Scalable Personalized Ranking

In designing personalized ranking algorithms, it is desirable to encourage a high precision at the top of the ranked list. Existing methods either seek a smooth convex surrogate for a non-smooth ranking metric or directly modify updating…

Machine Learning · Statistics 2018-08-15 Kuan Liu, Prem Natarajan

Language Models Improve When Pretraining Data Matches Target Tasks

Every data selection method inherently has a target. In practice, these targets often emerge implicitly through benchmark-driven iteration: researchers develop selection strategies, train models, measure benchmark performance, then refine…

Computation and Language · Computer Science 2025-07-17 David Mizrahi, Anders Boesen Lindbo Larsen, Jesse Allardice, Suzie Petryk, Yuri Gorokhov, Jeffrey Li, Alex Fang, Josh Gardner, Tom Gunter, Afshin Dehghan

A Stochastic Gradient Method with Biased Estimation for Faster Nonconvex Optimization

A number of optimization approaches have been proposed for optimizing nonconvex objectives (e.g. deep learning models), such as batch gradient descent, stochastic gradient descent and stochastic variance reduced gradient descent. Theory…

Machine Learning · Computer Science 2019-05-15 Jia Bi, Steve R. Gunn

Data-Efficient Training by Evolved Sampling

Data selection is designed to accelerate learning with preserved performance. To achieve this, a fundamental thought is to identify informative data samples with significant contributions to the training. In this work, we propose…

Machine Learning · Computer Science 2025-09-30 Ziheng Cheng, Zhong Li, Jiang Bian

Adaptive Batch Sizes for Active Learning: A Probabilistic Numerics Approach

Active learning parallelization is widely used, but typically relies on fixing the batch size throughout experimentation. This fixed approach is inefficient because of a dynamic trade-off between cost and speed -- larger batches are more…

Machine Learning · Computer Science 2024-10-15 Masaki Adachi, Satoshi Hayakawa, Martin Jørgensen, Xingchen Wan, Vu Nguyen, Harald Oberhauser, Michael A. Osborne