Related papers: Exploiting the Structure: Stochastic Gradient Meth…

Computational Complexity of Sub-Linear Convergent Algorithms

Optimizing machine learning algorithms that are used to solve the objective function has been of great interest. Several approaches to optimize common algorithms, such as gradient descent and stochastic gradient descent, were explored. One…

Machine Learning · Computer Science · 2022-10-06 · Hilal AlQuabeh, Farha AlBreiki, Dilshod Azizov

Fast model-based clustering of partial records

Partially recorded data are frequently encountered in many applications and usually clustered by first removing incomplete cases or features with missing values, or by imputing missing values, followed by application of a clustering…

Methodology · Statistics · 2021-10-20 · Emily M. Goren, Ranjan Maitra

Robust EM algorithm for model-based curve clustering

Model-based clustering approaches concern the paradigm of exploratory data analysis relying on the finite mixture model to automatically find a latent structure governing observed data. They are one of the most popular and successful…

Methodology · Statistics · 2014-04-29 · Faicel Chamroukhi

Data Clustering and Graph Partitioning via Simulated Mixing

Spectral clustering approaches have led to well-accepted algorithms for finding accurate clusters in a given dataset. However, their application to large-scale datasets has been hindered by computational complexity of eigenvalue…

Machine Learning · Computer Science · 2016-03-17 · Shahzad Bhatti, Carolyn Beck, Angelia Nedic

Improved initialisation of model-based clustering using Gaussian hierarchical partitions

Initialisation of the EM algorithm in model-based clustering is often crucial. Various starting points in the parameter space often lead to different local maxima of the likelihood function and, so to different clustering partitions. Among…

Methodology · Statistics · 2015-07-28 · Luca Scrucca, Adrian E. Raftery

Robust Hierarchical Clustering

One of the most widely used techniques for data clustering is agglomerative clustering. Such algorithms have been long used across many different fields ranging from computational biology to social sciences to computer vision in part…

Machine Learning · Computer Science · 2014-07-15 · Maria-Florina Balcan, Yingyu Liang, Pramod Gupta

Efficient Large-Scale Learning of Minimax Risk Classifiers

Supervised learning with large-scale data usually leads to complex optimization problems, especially for classification tasks with multiple classes. Stochastic subgradient methods can enable efficient learning with a large number of samples…

Machine Learning · Computer Science · 2025-11-25 · Kartheek Bondugula, Santiago Mazuelas, Aritz Pérez

Scalable Spectral Clustering Using Random Binning Features

Spectral clustering is one of the most effective clustering approaches that capture hidden cluster structures in the data. However, it does not scale well to large-scale problems due to its quadratic complexity in constructing similarity…

Machine Learning · Computer Science · 2019-11-26 · Lingfei Wu, Pin-Yu Chen, Ian En-Hsu Yen, Fangli Xu, Yinglong Xia, Charu Aggarwal

Graph Cut-guided Maximal Coding Rate Reduction for Learning Image Embedding and Clustering

In the era of pre-trained models, image clustering task is usually addressed by two relevant stages: a) to produce features from pre-trained vision models; and b) to find clusters from the pre-trained features. However, these two stages are…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-09 · W. He, Z. Huang, X. Meng, X. Qi, R. Xiao, C.-G. Li

An Efficient Smoothing Proximal Gradient Algorithm for Convex Clustering

Cluster analysis organizes data into sensible groupings and is one of fundamental modes of understanding and learning. The widely used K-means and hierarchical clustering methods can be dramatically suboptimal due to local minima. Recently…

Machine Learning · Computer Science · 2020-06-24 · Xin Zhou, Chunlei Du, Xiaodong Cai

Map / Reduce Deisgn and Implementation of Apriori Alogirthm for handling voluminous data-sets

Apriori is one of the key algorithms to generate frequent itemsets. Analyzing frequent itemset is a crucial step in analysing structured data and in finding association relationship between items. This stands as an elementary foundation to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2012-12-20 · Anjan K. Koundinya, Srinath N. K., K. A. K. Sharma, Kiran Kumar, Madhu M. N., Kiran U. Shanbag

An Accurate and Efficient Large-scale Regression Method through Best Friend Clustering

As the data size in Machine Learning fields grows exponentially, it is inevitable to accelerate the computation by utilizing the ever-growing large number of available cores provided by high-performance computing hardware. However, existing…

Machine Learning · Computer Science · 2021-04-23 · Kun Li, Liang Yuan, Yunquan Zhang, Gongwei Chen

Empirical Risk Minimization and Stochastic Gradient Descent for Relational Data

Empirical risk minimization is the main tool for prediction problems, but its extension to relational data remains unsolved. We solve this problem using recent ideas from graph sampling theory to (i) define an empirical risk for relational…

Machine Learning · Statistics · 2019-02-25 · Victor Veitch, Morgane Austern, Wenda Zhou, David M. Blei, Peter Orbanz

Effective Clustering for Large Multi-Relational Graphs

Multi-relational graphs (MRGs) are an expressive data structure for modeling diverse interactions/relations among real objects (i.e., nodes), which pervade extensive applications and scenarios. Given an MRG G with N nodes, partitioning the…

Machine Learning · Computer Science · 2025-08-26 · Xiaoyang Lin, Runhao Jiang, Renchi Yang

On the Fine-Grained Complexity of Empirical Risk Minimization: Kernel Methods and Neural Networks

Empirical risk minimization (ERM) is ubiquitous in machine learning and underlies most supervised learning methods. While there has been a large body of work on algorithms for various ERM problems, the exact computational complexity of ERM…

Computational Complexity · Computer Science · 2017-04-11 · Arturs Backurs, Piotr Indyk, Ludwig Schmidt

Learning Random Fourier Features by Hybrid Constrained Optimization

The kernel embedding algorithm is an important component for adapting kernel methods to large datasets. Since the algorithm consumes a major computation cost in the testing phase, we propose a novel teacher-learner framework of learning…

Machine Learning · Statistics · 2017-12-08 · Jianqiao Wangni, Jingwei Zhuo, Jun Zhu

Data Skeleton Learning: Scalable Active Clustering with Sparse Graph Structures

In this work, we focus on the efficiency and scalability of pairwise constraint-based active clustering, crucial for processing large-scale data in applications such as data mining, knowledge annotation, and AI model pre-training. Our goals…

Machine Learning · Computer Science · 2025-09-11 · Wen-Bo Xie, Xun Fu, Bin Chen, Yan-Li Lee, Tao Deng, Tian Zou, Xin Wang, Zhen Liu, Jaideep Srivastava

Clustering-based Low Rank Approximation Method

We propose a clustering-based generalized low rank approximation method, which takes advantage of appealing features from both the generalized low rank approximation of matrices (GLRAM) and cluster analysis. It exploits a more general form…

Optimization and Control · Mathematics · 2025-02-21 · Yujun Zhu, Jie Zhu, Hizba Arshad, Zhongming Wang, Ju Ming

Deep Conditional Gaussian Mixture Model for Constrained Clustering

Constrained clustering has gained significant attention in the field of machine learning as it can leverage prior information on a growing amount of only partially labeled data. Following recent advances in deep generative models, we…

Machine Learning · Computer Science · 2022-02-02 · Laura Manduchi, Kieran Chin-Cheong, Holger Michel, Sven Wellmann, Julia E. Vogt

EGRC-Net: Embedding-induced Graph Refinement Clustering Network

Existing graph clustering networks heavily rely on a predefined yet fixed graph, which can lead to failures when the initial graph fails to accurately capture the data topology structure of the embedding space. In order to address this…

Machine Learning · Computer Science · 2023-11-15 · Zhihao Peng, Hui Liu, Yuheng Jia, Junhui Hou