
Related papers: HeAT -- a Distributed and GPU-accelerated Tensor F…

200 papers

Collaborative filtering (CF) has been proven to be one of the most effective techniques for recommendation. Among all CF approaches, SimpleX is the state-of-the-art method that adopts a novel loss function and a proper number of negative…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-04 Chengming Zhang, Shaden Smith, Baixi Sun, Jiannan Tian, Jonathan Soifer, Xiaodong Yu, Shuaiwen Leon Song, Yuxiong He, Dingwen Tao

Cloud providers usually offer diverse types of hardware for their users. Customers exploit this option to deploy cloud instances featuring GPUs, FPGAs, architectures other than x86 (e.g., ARM, IBM Power8), or featuring certain specific…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-28 Isabelly Rocha, Christian Göttel, Pascal Felber, Marcelo Pasin, Romain Rouvoy, Valerio Schiavoni

This paper presents the design, implementation, and evaluation of the PyTorch distributed data parallel module. PyTorch is a widely-adopted scientific computing package used in deep learning research and applications. Recent advances in…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-30 Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, Soumith Chintala

Data-intensive applications impact many domains, and their steadily increasing size and complexity demands high-performance, highly usable environments. We integrate a set of ideas developed in various data science and data engineering…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-02 Supun Kamburugamuve, Chathura Widanage, Niranda Perera, Vibhatha Abeykoon, Ahmet Uyar, Thejaka Amila Kanewala, Gregor von Laszewski, Geoffrey Fox

Big data analytics requires high programmer productivity and high performance simultaneously on large-scale clusters. However, current big data analytics frameworks (e.g. Apache Spark) have prohibitive runtime overheads since they are…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-12 Ehsan Totoni, Todd A. Anderson, Tatiana Shpeisman

Recent years have witnessed the booming of various differentiable optimization algorithms. These algorithms exhibit different execution patterns, and their execution needs massive computational resources that go beyond a single CPU and GPU.…

Mathematical Software · Computer Science 2022-11-15 Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang

Modern deep learning systems like PyTorch and TensorFlow are able to train enormous models with billions (or trillions) of parameters on a distributed infrastructure. These systems require that the internal nodes have the same memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-01 Yifan Ding, Nicholas Botzer, Tim Weninger

The Private Aggregation of Teacher Ensembles (PATE) framework enables privacy-preserving machine learning by aggregating responses from disjoint subsets of sensitive data. Adaptations of PATE to tasks with inherent output diversity such as…

Machine Learning · Computer Science 2025-10-02 Edith Cohen, Benjamin Cohen-Wang, Xin Lyu, Jelani Nelson, Tamas Sarlos, Uri Stemmer

Modern time series analysis demands frameworks that are flexible, efficient, and extensible. However, many existing Python libraries exhibit limitations in modularity and in their native support for irregular, multi-source, or sparse data.…

Machine Learning · Computer Science 2025-08-27 Zhijin Wang, Senzhen Wu, Yue Hu, Xiufeng Liu

Distributed learning offers a practical solution for the integrative analysis of multi-source datasets, especially under privacy or communication constraints. However, addressing prospective distributional heterogeneity and ensuring…

Methodology · Statistics 2025-11-27 Yinrui Sun, Yin Xia

The rise of the Internet of Things and edge computing has shifted computing resources closer to end-users, benefiting numerous delay-sensitive, computation-intensive applications. To speed up computation, distributed computing is a…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-10 Ke Ma, Junfei Xie

High level programming languages and GPU accelerators are powerful enablers for a wide range of applications. Achieving scalable vertical (within a compute node), horizontal (across compute nodes), and temporal (over different generations…

Python is the de-facto language for software development in artificial intelligence (AI). Commonly used libraries, such as PyTorch and TensorFlow, rely on parallelization built into their BLAS backends to achieve speedup on CPUs. However,…

Machine Learning · Computer Science 2025-05-02 Maksim Helmann, Alexander Strack, Dirk Pflüger

The AI hardware boom has led modern data centers to adopt HPC-style architectures centered on distributed, GPU-centric computation. Large GPU clusters interconnected by fast RDMA networks and backed by high-bandwidth NVMe storage enable…

Databases · Computer Science 2026-05-21 Jigao Luo, Nils Boeschen, Muhammad El-Hindi, Carsten Binnig

We introduce an open-source GPU-accelerated fully homomorphic encryption (FHE) framework CAT, which surpasses existing solutions in functionality and efficiency. CAT features a three-layer architecture: a foundation of core math, a…

Cryptography and Security · Computer Science 2025-03-31 Qirui Li, Rui Zong

Thermal issues are a major concern in 3D integrated circuit (IC) design. Thermal optimization of 3D ICs often requires a massive number of expensive PDE simulations. Neural network-based thermal prediction models can perform real-time prediction for many…

Machine Learning · Computer Science 2023-02-28 Ziyue Liu, Yixing Li, Jing Hu, Xinling Yu, Shinyu Shiau, Xin Ai, Zhiyu Zeng, Zheng Zhang

The heatmap is a common geovisualization method that interpolates and visualizes a set of point observations on a map surface. Most online web mapping libraries implement a one-pass heatmap algorithm using HTML5 canvas or WebGL for efficient…

Graphics · Computer Science 2022-12-16 Yan Y. Liu, Melissa Allen-Dumas

Many hyperparameter optimization (HyperOpt) methods assume restricted computing resources and mainly focus on enhancing performance. Here we propose a novel cloud-based HyperOpt (CHOPT) framework which can efficiently utilize shared…

Machine Learning · Computer Science 2018-10-17 Jinwoong Kim, Minkyu Kim, Heungseok Park, Ernar Kusdavletov, Dongjun Lee, Adrian Kim, Ji-Hoon Kim, Jung-Woo Ha, Nako Sung

The NeuroEvolution of Augmenting Topologies (NEAT) algorithm has received considerable recognition in the field of neuroevolution. Its effectiveness is derived from initiating with simple networks and incrementally evolving both their…

Neural and Evolutionary Computing · Computer Science 2025-04-14 Lishuang Wang, Mengfei Zhao, Enyu Liu, Kebin Sun, Ran Cheng

To execute scientific computing programs such as deep learning at high speed, GPU acceleration is a powerful option. With the recent advancements in web technologies, interfaces like WebGL and WebGPU, which utilize GPUs on the client side…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-04 Masatoshi Hidaka, Tatsuya Harada