Related papers: Pilot-Data: An Abstraction for Distributed Data

P*: A Model of Pilot-Abstractions

Pilot-Jobs support effective distributed resource utilization, and are arguably one of the most widely-used distributed computing abstractions - as measured by the number and types of applications that use them, as well as the number of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-03-20 · Andre Luckow, Mark Santcroos, Ole Weidner, Andre Merzky, Pradeep Mantha, Shantenu Jha

Pilot-Abstraction: A Valid Abstraction for Data-Intensive Applications on HPC, Hadoop and Cloud Infrastructures?

HPC environments have traditionally been designed to meet the compute demand of scientific applications and data has only been a second order concern. With science moving toward data-driven discoveries relying more on correlations in data…

Distributed, Parallel, and Cluster Computing · Computer Science · 2015-01-22 · Andre Luckow, Pradeep Mantha, Shantenu Jha

Pilot-Edge: Distributed Resource Management Along the Edge-to-Cloud Continuum

Many science and industry IoT applications necessitate data processing across the edge-to-cloud continuum to meet performance, security, cost, and privacy requirements. However, diverse abstractions and infrastructures for managing…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-04-09 · Andre Luckow, Kartik Rattan, Shantenu Jha

A Comprehensive Perspective on Pilot-Job Systems

Pilot-Job systems play an important role in supporting distributed scientific computing. They are used to consume more than 700 million CPU hours a year by the Open Science Grid communities, and by processing up to 1 million jobs a day for…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-03-08 · Matteo Turilli, Mark Santcroos, Shantenu Jha

A Model and Survey of Distributed Data-Intensive Systems

Data is a precious resource in today's society, and is generated at an unprecedented and constantly growing pace. The need to store, analyze, and make data promptly available to a multitude of users introduces formidable challenges in…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-06-08 · Alessandro Margara, Gianpaolo Cugola, Nicolò Felicioni, Stefano Cilloni

Integrating Abstractions to Enhance the Execution of Distributed Applications

One of the factors that limits the scale, performance, and sophistication of distributed applications is the difficulty of concurrently executing them on multiple distributed computing resources. In part, this is due to a poor understanding…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-02-22 · Matteo Turilli, Feng Liu, Zhao Zhang, Andre Merzky, Michael Wilde, Jon Weissman, Daniel S. Katz, Shantenu Jha

Evaluating Distributed Execution of Workloads

Resource selection and task placement for distributed execution pose conceptual and implementation difficulties. Although resource selection and task placement are at the core of many tools and workflow systems, the methods are ad hoc…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-11-04 · Matteo Turilli, Yadu Nand Babuji, Andre Merzky, Ming Tai Ha, Michael Wilde, Daniel S. Katz, Shantenu Jha

Scalable data abstractions for distributed parallel computations

The ability to express a program as a hierarchical composition of parts is an essential tool in managing the complexity of software and a key abstraction this provides is to separate the representation of data from the computation. Many…

Programming Languages · Computer Science · 2012-10-04 · James Hanlon, Simon J. Hollis, David May

New Pilot-Study Design in Functional Data Analysis

Efficient data collection is essential in applied studies where frequent measurements are costly, time-consuming, or burdensome. This challenge is especially pronounced in functional data settings, where each subject is observed at only a…

Methodology · Statistics · 2025-08-04 · Ping-Han Huang, Ming-Hung Kao

Hadoop on HPC: Integrating Hadoop and Pilot-based Dynamic Resource Management

High-performance computing platforms such as supercomputers have traditionally been designed to meet the compute demands of scientific applications. Consequently, they have been architected as producers and not consumers of data. The Apache…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-02-02 · Andre Luckow, Ioannis Paraskevakos, George Chantzialexiou, Shantenu Jha

Scheduling Data-Intensive Workloads in Large-Scale Distributed Systems: Trends and Challenges

With the explosive growth of big data, workloads tend to get more complex and computationally demanding. Such applications are processed on distributed interconnected resources that are becoming larger in scale and computational capacity.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-10-30 · Georgios L. Stavrinides, Helen D. Karatza

Collaborative Cluster Configuration for Distributed Data-Parallel Processing: A Research Overview

Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-06-02 · Lauritz Thamsen, Dominik Scheinert, Jonathan Will, Jonathan Bader, Odej Kao

Pilot-Streaming: A Stream Processing Framework for High-Performance Computing

An increasing number of scientific applications rely on stream processing for generating timely insights from data feeds of scientific instruments, simulations, and Internet-of-Things (IoT) sensors. The development of streaming applications…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-11-13 · Andre Luckow, George Chantzialexiou, Shantenu Jha

Analysis of Distributed Algorithms for Big-data

Parallel and distributed processing is becoming the de facto industry standard, and a large part of current research targets how to make computing scalable and distributed, dynamically, without allocating the resources on…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-04-10 · Rajendra Purohit, K R Chowdhary, S D Purohit

Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow

Industries such as finance, meteorology, and energy generate vast amounts of data daily. Efficiently managing, processing, and displaying this data requires specialized expertise and is often tedious and repetitive. Leveraging large…

Computation and Language · Computer Science · 2025-05-20 · Wenqi Zhang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Weiming Lu, Yueting Zhuang

Pilot-Quantum: A Quantum-HPC Middleware for Resource, Workload and Task Management

As quantum hardware advances, integrating quantum processing units (QPUs) into HPC environments and managing diverse infrastructure and software stacks becomes increasingly essential. Pilot-Quantum addresses these challenges as a middleware…

Quantum Physics · Physics · 2025-05-29 · Pradeep Mantha, Florian J. Kiwit, Nishant Saurabh, Shantenu Jha, Andre Luckow

Towards an Integrated Platform for Big Data Analysis

The amount of data in the world is expanding rapidly. Every day, huge amounts of data are created by scientific experiments, companies, and end users' activities. These large data sets have been labeled as "Big Data", and their storage,…

Databases · Computer Science · 2020-04-29 · Mahdi Bohlouli, Frank Schulz, Lefteris Angelis, David Pahor, Ivona Brandic, David Atlan, Rosemary Tate

Data Diffusion: Dynamic Resource Provision and Data-Aware Scheduling for Data Intensive Applications

Data intensive applications often involve the analysis of large datasets that require large amounts of compute and storage resources. While dedicated compute and/or storage farms offer good task/data throughput, they suffer low resource…

Distributed, Parallel, and Cluster Computing · Computer Science · 2008-08-27 · Ioan Raicu, Yong Zhao, Ian Foster, Alex Szalay

A Multi-Server Information-Sharing Environment for Cross-Party Collaboration on A Private Cloud

Interoperability remains the key problem in multi-discipline collaboration based on building information modeling (BIM). Although various methods have been proposed to solve the technical issues of interoperability, such as data sharing and…

Cryptography and Security · Computer Science · 2024-11-22 · Jianping Zhang, Qiang Liu, Zhenzhong Hu, Jiarui Lin, Fangqiang Yu

Using Pilot Systems to Execute Many Task Workloads on Supercomputers

High performance computing systems have historically been designed to support applications comprised of mostly monolithic, single-job workloads. Pilot systems decouple workload specification, resource selection, and task execution via job…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-07-31 · Andre Merzky, Matteo Turilli, Manuel Maldonado, Mark Santcroos, Shantenu Jha