Related papers: Dither computing: a hybrid deterministic-stochasti…
Due to the limited number of bits in floating-point or fixed-point arithmetic, rounding is a necessary step in many computations. Although rounding methods can be tailored for different applications, round-off errors are generally…
Recently, stochastic rounding (SR) has been implemented in specialized hardware but most current computing nodes do not yet support this rounding mode. Several works empirically illustrate the benefit of stochastic rounding in various…
Stochastic Rounding is a probabilistic rounding mode that is surprisingly effective in large-scale computations and low-precision arithmetic. Its random nature promotes error cancellation rather than error accumulation, resulting in slower…
Stochastic computing is a paradigm in which logical operations are performed on randomly generated bit streams. Complex arithmetic operations can be executed by simple logic circuits, resulting in a much smaller area footprint compared to…
A number of optimization approaches have been proposed for optimizing nonconvex objectives (e.g. deep learning models), such as batch gradient descent, stochastic gradient descent and stochastic variance reduced gradient descent. Theory…
We introduce a new approach to develop stochastic optimization algorithms for a class of stochastic composite and possibly nonconvex optimization problems. The main idea is to combine two stochastic estimators to create a new hybrid one. We…
Stochastic rounding (SR) offers an alternative to the deterministic IEEE-754 floating-point rounding modes. In some applications such as PDEs, ODEs and neural networks, SR empirically improves the numerical behavior and convergence to…
Recently a majorization method for optimizing partition functions of log-linear models was proposed alongside a novel quadratic variational upper-bound. In the batch setting, it outperformed state-of-the-art first- and second-order…
Deep neural networks can be roughly divided into deterministic neural networks and stochastic neural networks.The former is usually trained to achieve a mapping from input space to output space via maximum likelihood estimation for the…
In this study, we propose a novel computing paradigm "Bit Stream Computing" that is constructed on the logic used in stochastic computing, but does not necessarily employ randomly or Binomially distributed bit streams as stochastic…
We introduce deterministic perturbation schemes for the recently proposed random directions stochastic approximation (RDSA) [17], and propose new first-order and second-order algorithms. In the latter case, these are the first second-order…
This paper addresses the question of whether it can be beneficial for an optimization algorithm to follow directions of negative curvature. Although prior work has established convergence results for algorithms that integrate both descent…
Multiple Importance Sampling (MIS) methods approximate moments of complicated distributions by drawing samples from a set of proposal distributions. Several ways to compute the importance weights assigned to each sample have been recently…
We introduce a hybrid stochastic estimator to design stochastic gradient algorithms for solving stochastic optimization problems. Such a hybrid estimator is a convex combination of two existing biased and unbiased estimators and leads to…
Stochastic optimization problems often involve data distributions that change in reaction to the decision variables. This is the case for example when members of the population respond to a deployed classifier by manipulating their features…
It seems that in the current age, computers, computation, and data have an increasingly important role to play in scientific research and discovery. This is reflected in part by the rise of machine learning and artificial intelligence,…
Although double-precision floating-point arithmetic currently dominates high-performance computing, there is increasing interest in smaller and simpler arithmetic types. The main reasons are potential improvements in energy efficiency and…
We consider distributed optimization problems where forming the Hessian is computationally challenging and communication is a significant bottleneck. We develop unbiased parameter averaging methods for randomized second order optimization…
Stochastic optimisation algorithms are the de facto standard for machine learning with large amounts of data. Handling only a subset of available data in each optimisation step dramatically reduces the per-iteration computational costs,…
Motivated by emerging applications in machine learning, we consider an optimization problem in a general form where the gradient of the objective function is available through a biased stochastic oracle. We assume a bias-control parameter…