Related papers: Fully scalable online-preprocessing algorithm for …

Adaptive Training Distributions with Scalable Online Bilevel Optimization

Large neural networks pretrained on web-scale corpora are central to modern machine learning. In this paradigm, the distribution of the large, heterogeneous pretraining data rarely matches that of the application domain. This work considers…

Machine Learning · Computer Science · 2023-11-21 · David Grangier, Pierre Ablin, Awni Hannun

Online hyperparameter optimization by real-time recurrent learning

Conventional hyperparameter optimization methods are computationally intensive and hard to generalize to scenarios that require dynamically adapting hyperparameters, such as life-long learning. Here, we propose an online hyperparameter…

Machine Learning · Computer Science · 2021-04-09 · Daniel Jiwoong Im, Cristina Savin, Kyunghyun Cho

Online Learning for Matrix Factorization and Sparse Coding

Sparse coding (that is, modelling data vectors as sparse linear combinations of basis elements) is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization…

Machine Learning · Statistics · 2010-02-11 · Julien Mairal, Francis Bach, Jean Ponce, Guillermo Sapiro

A read-filtering algorithm for high-throughput marker-gene studies that greatly improves OTU accuracy

Adequate read filtering is critical when processing high-throughput data in marker-gene-based studies. Sequencing errors can cause the mis-clustering of otherwise similar reads, artificially increasing the number of retrieved Operational…

Quantitative Methods · Quantitative Biology · 2015-06-02 · Fernando Puente-Sánchez, Jacobo Aguirre, Víctor Parro

RPA: Probabilistic analysis of probe performance and robust summarization

Probe-level models have led to improved performance in microarray studies but the various sources of probe-level contamination are still poorly understood. Data-driven analysis of probe performance can be used to quantify the uncertainty in…

Computational Engineering, Finance, and Science · Computer Science · 2013-04-09 · Leo Lahti, Laura L. Elo, Tero Aittokallio, Samuel Kaski

SOL: A Library for Scalable Online Learning Algorithms

SOL is an open-source library for scalable online learning algorithms, and is particularly suitable for learning with high-dimensional data. The library provides a family of regular and sparse online learning algorithms for large-scale…

Machine Learning · Computer Science · 2016-10-31 · Yue Wu, Steven C. H. Hoi, Chenghao Liu, Jing Lu, Doyen Sahoo, Nenghai Yu

Scalable Genomics with R and Bioconductor

This paper reviews strategies for solving problems encountered when analyzing large genomic data sets and describes the implementation of those strategies in R by packages from the Bioconductor project. We treat the scalable processing,…

Genomics · Quantitative Biology · 2014-09-11 · Michael Lawrence, Martin Morgan

Highly Scalable Algorithms for Robust String Barcoding

String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher…

Data Structures and Algorithms · Computer Science · 2016-08-31 · Bhaskar DasGupta, Kishori M. Konwar, Ion I. Mandoiu, Alex A. Shvartsman

Reproducible probe-level analysis of the Affymetrix Exon 1.0 ST array with R/Bioconductor

The presence of different transcripts of a gene across samples can be analysed by whole-transcriptome microarrays. Reproducing results from published microarray data represents a challenge due to the vast amounts of data and the large…

Applications · Statistics · 2013-02-19 · Maria Rodrigo-Domingo, Rasmus Waagepetersen, Julie Støve Bødker, Steffen Falgreen, Malene Krag Kjeldsen, Hans Erik Johnsen, Karen Dybkær, Martin Bøgsted

Scalable Collaborative Targeted Learning for High-Dimensional Data

Robust inference of a low-dimensional parameter in a large semi-parametric model relies on external estimators of infinite-dimensional features of the distribution of the data. Typically, only one of the latter is optimized for the sake of…

Computation · Statistics · 2017-03-08 · Cheng Ju, Susan Gruber, Samuel D. Lendle, Antoine Chambaz, Jessica M. Franklin, Richard Wyss, Sebastian Schneeweiss, Mark J. van der Laan

Online Learning Under A Separable Stochastic Approximation Framework

We propose an online learning algorithm for a class of machine learning models under a separable stochastic approximation framework. The essence of our idea lies in the observation that certain parameters in the models are easier to…

Machine Learning · Computer Science · 2023-05-23 · Min Gan, Xiang-xiang Su, Guang-yong Chen, Jing Chen

Scalable Second Order Optimization for Deep Learning

Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent. Second-order optimization methods, which involve second derivatives and/or second…

Machine Learning · Computer Science · 2021-03-08 · Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer

Learning of networked spreading models from noisy and incomplete data

Recent years have seen a lot of progress in algorithms for learning parameters of spreading dynamics from both full and partial data. Some of the remaining challenges include model selection under the scenarios of unknown network structure,…

Social and Information Networks · Computer Science · 2024-01-02 · Mateusz Wilinski, Andrey Y. Lokhov

Scalable Graph Algorithms

Processing large complex networks has recently attracted considerable interest. Complex graphs are useful in a wide range of applications, from technological networks to biological systems like the human brain. Sometimes these networks are…

Data Structures and Algorithms · Computer Science · 2019-12-03 · Christian Schulz

Scalable Prototype Selection by Genetic Algorithms and Hashing

Classification in the dissimilarity space has become a very active research area, since it makes it possible to learn from data given in the form of pairwise non-metric dissimilarities, which would otherwise be difficult to cope with.…

Machine Learning · Statistics · 2017-12-27 · Yenisel Plasencia-Calaña, Mauricio Orozco-Alzate, Heydi Méndez-Vázquez, Edel García-Reyes, Robert P. W. Duin

Scalable and Accurate Online Feature Selection for Big Data

Feature selection is important in many big data applications. Two critical challenges are closely associated with big data. Firstly, in many big data applications, the dimensionality is extremely high, in the millions, and keeps growing. Secondly,…

Machine Learning · Computer Science · 2016-07-29 · Kui Yu, Xindong Wu, Wei Ding, Jian Pei

Scalable Pattern Matching in Computation Graphs

Graph rewriting is a popular tool for the optimisation and modification of graph expressions in domains such as compilers, machine learning and quantum computing. The underlying data structures are often port graphs: graphs with labels at…

Data Structures and Algorithms · Computer Science · 2025-03-27 · Luca Mondada, Pablo Andrés-Martínez

A Framework of Sparse Online Learning and Its Applications

The amount of data in our society has been exploding in the era of big data today. In this paper, we address several open challenges of big data stream classification, including high volume, high velocity, high dimensionality, high…

Machine Learning · Computer Science · 2015-07-28 · Dayong Wang, Pengcheng Wu, Peilin Zhao, Steven C. H. Hoi

Scalable and Sustainable Deep Learning via Randomized Hashing

Current deep learning architectures are growing larger in order to learn from complex datasets. These architectures require giant matrix multiplication operations to train millions of parameters. Conversely, there is another growing trend…

Machine Learning · Statistics · 2016-12-06 · Ryan Spring, Anshumali Shrivastava

Small Coupling Expansion for Multiple Sequence Alignment

The alignment of biological sequences, such as DNA, RNA, and proteins, is one of the basic tools that allow one to detect evolutionary patterns, as well as functional/structural characterizations between homologous sequences in different…

Quantitative Methods · Quantitative Biology · 2023-05-01 · Louise Budzynski, Andrea Pagnani