Related papers: Parallel Large-Scale Attribute Reduction on Cloud …

A New Parallel Adaptive Clustering and its Application to Streaming Data

This paper presents a parallel adaptive clustering (PAC) algorithm to automatically classify data while simultaneously choosing a suitable number of classes. Clustering is an important tool for data analysis and understanding in a broad set…

Machine Learning · Computer Science 2021-04-07 Benjamin McLaughlin, Sung Ha Kang

A Parallel Random Forest Algorithm for Big Data in a Spark Cloud Computing Environment

With the emergence of the big data age, the issue of how to obtain valuable knowledge from a dataset efficiently and accurately has attracted increasing attention from both academia and industry. This paper presents a Parallel Random…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-26 Jianguo Chen, Kenli Li, Zhuo Tang, Kashif Bilal, Shui Yu, Chuliang Weng, Keqin Li

An Accurate and Efficient Large-scale Regression Method through Best Friend Clustering

As the data size in Machine Learning fields grows exponentially, it is inevitable to accelerate the computation by utilizing the ever-growing large number of available cores provided by high-performance computing hardware. However, existing…

Machine Learning · Computer Science 2021-04-23 Kun Li, Liang Yuan, Yunquan Zhang, Gongwei Chen

GraphLab: A Distributed Framework for Machine Learning in the Cloud

Machine Learning (ML) techniques are indispensable in a wide range of fields. Unfortunately, the exponential increase of dataset sizes is rapidly extending the runtime of sequential algorithms and threatening to slow future progress in ML.…

Machine Learning · Computer Science 2011-07-06 Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin

SOFAR: large-scale association network learning

Many modern big data applications feature large scale in both numbers of responses and predictors. Better statistical efficiency and scientific insights can be enabled by understanding the large-scale response-predictor association network…

Methodology · Statistics 2017-04-28 Yoshimasa Uematsu, Yingying Fan, Kun Chen, Jinchi Lv, Wei Lin

Parallel feature selection based on the trace ratio criterion

The growth of data today poses a challenge in management and inference. While feature extraction methods are capable of reducing the size of the data for inference, they do not help in minimizing the cost of data storage. On the other hand,…

Machine Learning · Computer Science 2022-03-04 Thu Nguyen, Thanh Nhan Phan, Van Nhuong Nguyen, Thanh Binh Nguyen, Pål Halvorsen, Michael Riegler

N$^3$LARS: Minimum Redundancy Maximum Relevance Feature Selection for Large and High-dimensional Data

We propose a feature selection method that finds non-redundant features from large and high-dimensional data in a nonlinear way. Specifically, we propose a nonlinear extension of the non-negative least-angle regression (LARS) called…

Machine Learning · Statistics 2014-11-11 Makoto Yamada, Avishek Saha, Hua Ouyang, Dawei Yin, Yi Chang

A Distributed Deep Representation Learning Model for Big Image Data Classification

This paper describes an effective and efficient image classification framework named the distributed deep representation learning model (DDRL). The aim is to strike a balance between the computationally intensive deep learning approaches…

Computer Vision and Pattern Recognition · Computer Science 2016-07-05 Le Dong, Na Lv, Qianni Zhang, Shanshan Xie, Ling He, Mengdie Mao

Distributed and parallel time series feature extraction for industrial big data applications

The all-relevant problem of feature selection is the identification of all strongly and weakly relevant attributes. This problem is especially hard to solve for time series classification and regression in industrial applications such as…

Machine Learning · Computer Science 2017-05-23 Maximilian Christ, Andreas W. Kempa-Liehr, Michael Feindt

A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2019-12-10 Biyi Fang, Diego Klabjan

PAIRS: Parametric-Verified Adaptive Information Retrieval and Selection for Efficient RAG

Retrieval-Augmented Generation (RAG) has become a cornerstone technique for enhancing large language models (LLMs) with external knowledge. However, current RAG systems face two critical limitations: (1) they inefficiently retrieve…

Computation and Language · Computer Science 2025-08-07 Wang Chen, Guanqiang Qi, Weikang Li, Yang Li, Deguo Xia, Jizhou Huang

Distributed Silhouette Algorithm: Evaluating Clustering on Big Data

In the big data era, the key feature that each algorithm needs to have is the possibility of efficiently running in parallel in a distributed environment. The popular Silhouette metric to evaluate the quality of a clustering, unfortunately,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-27 Marco Gaido

Robust Locality-Aware Regression for Labeled Data Classification

With the dramatic increase of dimensions in the data representation, extracting latent low-dimensional features becomes of the utmost importance for efficient classification. Aiming at the problems of unclear margin representation and…

Machine Learning · Computer Science 2020-06-16 Liangchen Hu, Wensheng Zhang

Parallel Latent Reasoning for Sequential Recommendation

Capturing complex user preferences from sparse behavioral sequences remains a fundamental challenge in sequential recommendation. Recent latent reasoning methods have shown promise by extending test-time computation through multi-step…

Information Retrieval · Computer Science 2026-01-07 Jiakai Tang, Xu Chen, Wen Chen, Jian Wu, Yuning Jiang, Bo Zheng

Multi-Label Feature Selection Using Adaptive and Transformed Relevance

Multi-label learning has emerged as a crucial paradigm in data analysis, addressing scenarios where instances are associated with multiple class labels simultaneously. With the growing prevalence of multi-label data across diverse…

Computer Vision and Pattern Recognition · Computer Science 2023-09-27 Sadegh Eskandari, Sahar Ghassabi

Iterative MapReduce for Large Scale Machine Learning

Large datasets ("Big Data") are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. In particular, machine learning - one…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-03-15 Joshua Rosen, Neoklis Polyzotis, Vinayak Borkar, Yingyi Bu, Michael J. Carey, Markus Weimer, Tyson Condie, Raghu Ramakrishnan

Intermediate Data Caching Optimization for Multi-Stage and Parallel Big Data Frameworks

In the era of big data and cloud computing, large amounts of data are generated from user applications and need to be processed in the datacenter. Data-parallel computing frameworks, such as Apache Spark, are widely used to perform such…

Performance · Computer Science 2018-05-09 Zhengyu Yang, Danlin Jia, Stratis Ioannidis, Ningfang Mi, Bo Sheng

Distributed Optimization via Adaptive Regularization for Large Problems with Separable Constraints

Many practical applications require solving an optimization over large and high-dimensional data sets, which makes these problems hard to solve and prohibitively time consuming. In this paper, we propose a parallel distributed algorithm…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-12-03 Elad Gilboa, Phani Chavali, Peng Yang, Arye Nehorai

A Review and Analysis of a Parallel Approach for Decision Tree Learning from Large Data Streams

This work studies one of the parallel decision tree learning algorithms, pdsCART, designed for scalable and efficient data analysis. The method incorporates three core capabilities. First, it supports real-time learning from data streams,…

Artificial Intelligence · Computer Science 2025-05-20 Zeinab Shiralizadeh

Learning Adaptive Parallel Reasoning with Language Models

Scaling inference-time computation has substantially improved the reasoning capabilities of language models. However, existing methods have significant limitations: serialized chain-of-thought approaches generate overly long outputs,…

Artificial Intelligence · Computer Science 2025-08-19 Jiayi Pan, Xiuyu Li, Long Lian, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr