Related papers: ds-array: A Distributed Data Structure for Large S…

Revisiting Large Scale Distributed Machine Learning

Nowadays, with the widespread use of smartphones and other portable gadgets equipped with a variety of sensors, data is ubiquitously available and the focus of machine learning has shifted from being able to infer from small training samples to…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-07-07 Radu Cristian Ionescu

DSLib: An open source library for the dominant set clustering method

DSLib is an open-source implementation of the Dominant Set (DS) clustering algorithm written entirely in Matlab. The DS method is a graph-based clustering technique rooted in evolutionary game theory that has started to gain a lot of interest…

Mathematical Software · Computer Science 2020-10-16 Sebastiano Vascon , Samuel Rota Bulò , Vittorio Murino , Marcello Pelillo

NumS: Scalable Array Programming for the Cloud

Scientists increasingly rely on Python tools to perform scalable distributed memory array operations using rich, NumPy-like expressions. However, many of these tools rely on dynamic schedulers optimized for abstract task graphs, which often…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-14 Melih Elibol , Vinamra Benara , Samyu Yagati , Lianmin Zheng , Alvin Cheung , Michael I. Jordan , Ion Stoica

A Survey From Distributed Machine Learning to Distributed Deep Learning

Artificial intelligence has made remarkable progress in handling complex tasks, thanks to advances in hardware acceleration and machine learning algorithms. However, to acquire more accurate outcomes and solve more complex issues,…

Machine Learning · Computer Science 2023-09-12 Mohammad Dehghani , Zahra Yazdanparast

EvoSplit: An evolutionary approach to split a multi-label data set into disjoint subsets

This paper presents a new evolutionary approach, EvoSplit, for the distribution of multi-label data sets into disjoint subsets for supervised machine learning. Currently, data set providers either divide a data set randomly or using…

Machine Learning · Computer Science 2021-03-24 Francisco Florez-Revuelta

Distributed ReliefF based Feature Selection in Spark

Feature selection (FS) is a key research area in the machine learning and data mining fields; removing irrelevant and redundant features usually helps to reduce the effort required to process a dataset while maintaining or even improving…

Machine Learning · Computer Science 2018-11-02 Raul-Jose Palma-Mendoza , Daniel Rodriguez , Luis de-Marcos

Big Data Intelligence Using Distributed Deep Neural Networks

A large amount of data is often required to train and deploy useful machine learning models in industry. Smaller enterprises do not have the luxury of accessing enough data for machine learning. For privacy-sensitive fields such as banking,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-05 Felix Ongati , Eng. Lawrence Muchemi

Launchpad: A Programming Model for Distributed Machine Learning Research

A major driver behind the success of modern machine learning algorithms has been their ability to process ever-larger amounts of data. As a result, the use of distributed systems in both research and production has become increasingly…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-10 Fan Yang , Gabriel Barth-Maron , Piotr Stańczyk , Matthew Hoffman , Siqi Liu , Manuel Kroiss , Aedan Pope , Alban Rrustemi

Distributed Bayesian Matrix Decomposition for Big Data Mining and Clustering

Matrix decomposition is one of the fundamental tools to discover knowledge from big data generated by modern applications. However, it is still inefficient or infeasible to process very big data using such a method in a single machine.…

Machine Learning · Computer Science 2020-02-11 Chihao Zhang , Yang Yang , Wei Zhang , Shihua Zhang

OpenDataLab: Empowering General Artificial Intelligence with Open Datasets

The advancement of artificial intelligence (AI) hinges on the quality and accessibility of data, yet the current fragmentation and variability of data sources hinder efficient data utilization. The dispersion of data sources and diversity…

Digital Libraries · Computer Science 2024-07-22 Conghui He , Wei Li , Zhenjiang Jin , Chao Xu , Bin Wang , Dahua Lin

Privacy Preserving Analytics on Distributed Medical Data

Objective: To enable privacy-preserving learning of high-quality generative and discriminative machine learning models from distributed electronic health records. Methods and Results: We describe a general and scalable strategy to build…

Cryptography and Security · Computer Science 2018-06-19 Marina Blanton , Ah Reum Kang , Subhadeep Karan , Jaroslaw Zola

Linear density-based clustering with a discrete density model

Density-based clustering techniques are used in a wide range of data mining applications. One of their most attractive features consists in not making use of prior knowledge of the number of clusters that a dataset contains along with…

Machine Learning · Computer Science 2018-07-24 Roberto Pirrone , Vincenzo Cannella , Sergio Monteleone , Gabriella Giordano

Deep Learning on Operational Facility Data Related to Large-Scale Distributed Area Scientific Workflows

Distributed computing platforms provide a robust mechanism to perform large-scale computations by splitting the task and data among multiple locations, possibly located thousands of miles apart geographically. Although such distribution of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-24 Alok Singh , Eric Stephan , Malachi Schram , Ilkay Altintas

A Comprehensive Survey of Dataset Distillation

Deep learning technology has developed unprecedentedly in the last decade and has become the primary choice in many application domains. This progress is mainly attributed to a systematic collaboration in which rapidly growing computing…

Machine Learning · Computer Science 2023-12-27 Shiye Lei , Dacheng Tao

Distributional MIPLIB: a Multi-Domain Library for Advancing ML-Guided MILP Methods

Mixed Integer Linear Programming (MILP) is a fundamental tool for modeling combinatorial optimization problems. Recently, a growing body of research has used machine learning to accelerate MILP solving. Despite the increasing popularity of…

Machine Learning · Computer Science 2024-10-29 Weimin Huang , Taoan Huang , Aaron M Ferber , Bistra Dilkina

A Distributed Deep Representation Learning Model for Big Image Data Classification

This paper describes an effective and efficient image classification framework named the distributed deep representation learning model (DDRL). The aim is to strike a balance between the computationally intensive deep learning approaches…

Computer Vision and Pattern Recognition · Computer Science 2016-07-05 Le Dong , Na Lv , Qianni Zhang , Shanshan Xie , Ling He , Mengdie Mao

Data Engineering for HPC with Python

Data engineering is becoming an increasingly important part of scientific discoveries with the adoption of deep learning and machine learning. Data engineering deals with a variety of data formats, storage, data extraction, transformation,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-14 Vibhatha Abeykoon , Niranda Perera , Chathura Widanage , Supun Kamburugamuve , Thejaka Amila Kanewala , Hasara Maithree , Pulasthi Wickramasinghe , Ahmet Uyar , Geoffrey Fox

DistDD: Distributed Data Distillation Aggregation through Gradient Matching

In this paper, we introduce DistDD, a novel approach within the federated learning framework that reduces the need for repetitive communication by distilling data directly on clients' devices. Unlike traditional federated learning that…

Machine Learning · Computer Science 2024-10-14 Peiran Wang , Haohan Wang

Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning

Imbalanced-learn is an open-source Python toolbox aiming to provide a wide range of methods to cope with the problem of imbalanced datasets frequently encountered in machine learning and pattern recognition. The implemented…

Machine Learning · Computer Science 2016-09-22 Guillaume Lemaitre , Fernando Nogueira , Christos K. Aridas

A Unified System for Data Analytics and In Situ Query Processing

In today's world, data is being generated at a high rate, which makes it essential to analyze this data and obtain results from it quickly. Most relational databases primarily support SQL querying, with limited support for…

Databases · Computer Science 2021-04-08 Alex Watson , Suvam Kumar Das , Suprio Ray