Related papers: METAQ: Bundle Supercomputing Tasks
High Performance Computing is often performed on scarce and shared computing resources. To ensure computers are used to their full capacity, administrators often incentivize large workloads that are not possible on smaller systems.…
We introduce the Balsam service to manage high-throughput task scheduling and execution on supercomputing systems. Balsam allows users to populate a task database with a variety of tasks ranging from simple independent tasks to dynamic…
We present the implementation of a batch job scheduler designed for single-point management of distributed tasks on a multi-node compute farm. The scheduler uses the notion of a meta-job to launch large computing tasks simultaneously on…
Training machine learning (ML) models with large datasets can incur significant resource contention on shared clusters. This training typically involves many iterations that continually improve the quality of the model. Yet in exploratory…
Cloud-based computing systems could get oversubscribed due to budget constraints of cloud users which causes violation of Quality of Experience(QoE) metrics such as tasks' deadlines. We investigate an approach to achieve robustness against…
Containers are used by an increasing number of Internet service providers to deploy their applications in multi-access edge computing (MEC) systems. Although container-based virtualization technologies significantly increase application…
Taskflow aims to streamline the building of parallel and heterogeneous applications using a lightweight task graph-based approach. Taskflow introduces an expressive task graph programming model to assist developers in the implementation of…
Mobile edge computing (MEC) enables low-latency and high-bandwidth applications by bringing computation and data storage closer to end-users. Intelligent computing is an important application of MEC, where computing resources are used to…
We describe 'staq', a full-stack quantum processing toolkit written in standard C++. 'staq' is a quantum compiler toolkit, comprising of tools that range from quantum optimizers and translators to physical mappers for quantum devices with…
Stream processing is a computing paradigm that supports real-time data processing for a wide variety of applications. At Meta, it's used across the company for various tasks such as deriving product insights, providing and improving user…
Parallel batched data structures are designed to process synchronized batches of operations in a parallel computing model. In this paper, we propose parallel combining, a technique that implements a concurrent data structure from a parallel…
With the continuous improvement of information infrastructures, academia and industry have been constantly exploring new computing paradigms to fully exploit computing powers. In this paper, we propose Meta Computing, a new computing…
There are many science applications that require scalable task-level parallelism and support for flexible execution and coupling of ensembles of simulations. Most high-performance system software and middleware, however, are designed to…
The proliferation of massive datasets combined with the development of sophisticated analytical techniques have enabled a wide variety of novel applications such as improved product recommendations, automatic image tagging, and improved…
The idle time of personal computers has increased steadily due to the generalization of computer usage and cloud computing. Clustering research aims at utilizing idle computer resources for processing a variable workload on a large number…
One typical use case of large-scale distributed computing in data centers is to decompose a computation job into many independent tasks and run them in parallel on different machines, sometimes known as the "embarrassingly parallel"…
The main goal of parallel processing is to provide users with performance that is much better than that of single processor systems. The execution of jobs is scheduled, which requires certain resources in order to meet certain criteria.…
Cloud-based computing systems can get oversubscribed due to the budget constraints of their users or limitations in certain resource types. The oversubscription can, in turn, degrade the users perceived Quality of Service (QoS). The…
In this paper, we explore a multi-task semantic communication (SemCom) system for distributed sources, extending the existing focus on collaborative single-task execution. We build on the cooperative multi-task processing introduced in [1],…
A parallel computer system is a collection of processing elements that communicate and cooperate to solve large computational problems efficiently. To achieve this, at first the large computational problem is partitioned into several tasks…