English
Related papers

Related papers: D2O - a distributed data object for parallel high-…

200 papers

PaPy, which stands for parallel pipelines in Python, is a highly flexible framework that enables the construction of robust, scalable workflows for either generating or processing voluminous datasets. A workflow is created from user-written…

Programming Languages · Computer Science 2014-07-17 Marcin Cieslik , Cameron Mura

The discrete distribution clustering algorithm, namely D2-clustering, has demonstrated its usefulness in image classification and annotation where each object is represented by a bag of weighed vectors. The high computational complexity of…

Machine Learning · Computer Science 2013-02-07 Yu Zhang , James Z. Wang , Jia Li

In recent IoT (Internet of Things) and Web 2.0 technologies, a critical problem arises with respect to storing and processing the large amount of collected data. In this paper we develop and evaluate distributed infrastructures for storing…

Databases · Computer Science 2014-04-04 S. Sioutas , E. Sakkopoulos , A. Panaretos , D. Tsoumakos , P. Gerolymatos , G. Tzimas , Y. Manolopoulos

Differential privacy enables general statistical analysis of data with formal guarantees of privacy protection at the individual level. Tools that assist data analysts with utilizing differential privacy have frequently taken the form of…

Programming Languages · Computer Science 2021-03-17 Chike Abuah , Alex Silence , David Darais , Joe Near

Distributed Asynchronous Object Store (DAOS) is a novel software-defined object store leveraging Non-Volatile Memory (NVM) devices, designed for high performance. It provides a number of interfaces for applications to undertake I/O, ranging…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-30 Nicolau Manubens , Johann Lombardi , Simon D. Smart , Emanuele Danovaro , Tiago Quintino , Dean Hildebrand , Adrian Jackson

Data engineering is becoming an increasingly important part of scientific discoveries with the adoption of deep learning and machine learning. Data engineering deals with a variety of data formats, storage, data extraction, transformation,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-14 Vibhatha Abeykoon , Niranda Perera , Chathura Widanage , Supun Kamburugamuve , Thejaka Amila Kanewala , Hasara Maithree , Pulasthi Wickramasinghe , Ahmet Uyar , Geoffrey Fox

In the AI-for-science era, scientific computing scenarios such as concurrent learning and high-throughput computing demand a new generation of infrastructure that supports scalable computing resources and automated workflow management on…

DADApy is a python software package for analysing and characterising high-dimensional data manifolds. It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for…

Hierarchical $\mathcal{H}^2$-matrices are asymptotically optimal representations for the discretizations of non-local operators such as those arising in integral equations or from kernel functions. Their $O(N)$ complexity in both memory and…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-14 Stefano Zampini , Wajih Boukaram , George Turkiyyah , Omar Knio , David E. Keyes

pPython seeks to provide a parallel capability that provides good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library…

In this paper we introduce DISROPT, a Python package for distributed optimization over networks. We focus on cooperative set-ups in which an optimization problem must be solved by peer-to-peer processors (without central coordinators) that…

Optimization and Control · Mathematics 2021-04-21 Francesco Farina , Andrea Camisa , Andrea Testa , Ivano Notarnicola , Giuseppe Notarstefano

Distributed dataflow systems enable data-parallel processing of large datasets on clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. Yet, selecting appropriate cloud…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-03 Jonathan Will , Lauritz Thamsen , Dominik Scheinert , Jonathan Bader , Odej Kao

Today's highly heterogeneous computing landscape places a burden on programmers wanting to achieve high performance on a reasonably broad cross-section of machines. To do so, computations need to be expressed in many different but…

Programming Languages · Computer Science 2014-06-02 Andreas Klöckner

Python has become a standard scientific computing language with fast-growing support of machine learning and data analysis modules, as well as an increasing usage of big data. The Dynamic Distributed Dimensional Data Model (D4M) offers a…

The data science community today has embraced the concept of Dataframes as the de facto standard for data representation and manipulation. Ease of use, massive operator coverage, and popularization of R and Python languages have heavily…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-06 Niranda Perera , Supun Kamburugamuve , Chathura Widanage , Vibhatha Abeykoon , Ahmet Uyar , Kaiying Shan , Hasara Maithree , Damitha Lenadora , Thejaka Amila Kanewala , Geoffrey Fox

We present POLO --- a C++ library for large-scale parallel optimization research that emphasizes ease-of-use, flexibility and efficiency in algorithm design. It uses multiple inheritance and template programming to decompose algorithms into…

Optimization and Control · Mathematics 2018-10-09 Arda Aytekin , Martin Biel , Mikael Johansson

Over the Internet today, computing and communications environments are significantly more complex and chaotic than classical distributed systems, lacking any centralized organization or hierarchical control. There has been much interest in…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-06-28 Ciprian Dobre , Florin Pop , Valentin Cristea

One of the most demanding challenges for the designers of parallel computing architectures is to deliver an efficient network infrastructure providing low latency, high bandwidth communications while preserving scalability. Besides off-chip…

The parallel and distributed processing are becoming de facto industry standard, and a large part of the current research is targeted on how to make computing scalable and distributed, dynamically, without allocating the resources on…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-10 Rajendra Purohit , K R Chowdhary , S D Purohit

Fine-tuning large language models (LLMs) remains resource-intensive due to their sheer scale. While zeroth-order (ZO) optimization provides a memory-efficient alternative by eliminating backward passes, its application to…

Machine Learning · Computer Science 2025-07-08 Liangyu Wang , Huanyi Xie , Di Wang
‹ Prev 1 2 3 10 Next ›