Related papers: A pseudo-parallel Python environment for database …

FaaS and Furious: abstractions and differential caching for efficient data pre-processing

Data pre-processing pipelines are the bread and butter of any successful AI project. We introduce a novel programming model for pipelines in a data lakehouse, allowing users to interact declaratively with assets in object storage. Motivated…

Databases · Computer Science 2024-11-14 Jacopo Tagliabue, Ryan Curtin, Ciro Greco

PaPy: Parallel and Distributed Data-processing Pipelines in Python

PaPy, which stands for parallel pipelines in Python, is a highly flexible framework that enables the construction of robust, scalable workflows for either generating or processing voluminous datasets. A workflow is created from user-written…

Programming Languages · Computer Science 2014-07-17 Marcin Cieslik, Cameron Mura

tf.data: A Machine Learning Data Processing Framework

Training machine learning models requires feeding input data for models to ingest. Input pipelines for machine learning jobs are often challenging to implement efficiently as they require reading large volumes of data, applying complex…

Machine Learning · Computer Science 2021-02-25 Derek G. Murray, Jiri Simsa, Ana Klimovic, Ihor Indyk

Functional Programming Paradigm of Python for Scientific Computation Pipeline Integration

The advent of modern data processing has led to an increasing tendency towards interdisciplinarity, which frequently involves the importation of different technical approaches. Consequently, there is an urgent need for a unified data…

Machine Learning · Computer Science 2024-06-04 Chen Zhang, Lecheng Jia, Wei Zhang, Ning Wen

In-depth Analysis On Parallel Processing Patterns for High-Performance Dataframes

The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-06 Niranda Perera, Arup Kumar Sarker, Mills Staylor, Gregor von Laszewski, Kaiying Shan, Supun Kamburugamuve, Chathura Widanage, Vibhatha Abeykoon, Thejaka Amila Kanewela, Geoffrey Fox

Audiosockets: A Python socket package for Real-Time Audio Processing

There are many packages in Python which allow one to perform real-time processing on audio data. Unfortunately, due to the synchronous nature of the language, there lacks a framework which allows for distributed parallel processing of the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-18 Nicolas Shu, David V. Anderson

Membrane: Accelerating Database Analytics with Bank-Level DRAM-PIM Filtering

In-memory database query processing frequently involves substantial data transfers between the CPU and memory, leading to inefficiencies due to Von Neumann bottleneck. Processing-in-Memory (PIM) architectures offer a viable solution to…

Hardware Architecture · Computer Science 2025-04-10 Akhil Shekar, Kevin Gaffney, Martin Prammer, Khyati Kiyawat, Lingxi Wu, Helena Caminal, Zhenxing Fan, Yimin Gao, Ashish Venkat, José F. Martínez, Jignesh Patel, Kevin Skadron

High-throughput Execution of Hierarchical Analysis Pipelines on Hybrid Cluster Platforms

We propose, implement, and experimentally evaluate a runtime middleware to support high-throughput execution on hybrid cluster machines of large-scale analysis applications. A hybrid cluster machine consists of computation nodes which have…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-09-18 George Teodoro, Tony Pan, Tahsin M. Kurc, Jun Kong, Lee A. D. Cooper, Joel H. Saltz

Parallel Seismic Data Processing Performance with Cloud-based Storage

This article introduces a general processing framework to effectively utilize waveform data stored on modern cloud platforms. The focus is hybrid processing schemes where a local system drives processing. We show that downloading files and…

Geophysics · Physics 2025-09-03 Sasmita Mohapatra, Weiming Yang, Zhengtang Yang, Chenxiao Wang, Jinxin Ma, Gary L. Pavlis, Yinzhi Wang

A Scalable Pipelined Dataflow Accelerator for Object Region Proposals on FPGA Platform

Region proposal is critical for object detection while it usually poses a bottleneck in improving the computation efficiency on traditional control-flow architectures. We have observed region proposal tasks are potentially suitable for…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-30 Wenzhi Fu, Jianlei Yang, Pengcheng Dai, Yiran Chen, Weisheng Zhao

Scalable FPGA Framework for Real-Time Denoising in High-Throughput Imaging: A DRAM-Optimized Pipeline using High-Level Synthesis

High-throughput imaging workflows, such as Parallel Rapid Imaging with Spectroscopic Mapping (PRISM), generate data at rates that exceed conventional real-time processing capabilities. We present a scalable FPGA-based preprocessing pipeline…

Hardware Architecture · Computer Science 2025-11-26 Weichien Liao

Parallel Astronomical Data Processing with Python: Recipes for multicore machines

High performance computing has been used in various fields of astrophysical research. But most of it is implemented on massively parallel systems (supercomputers) or graphical processing unit clusters. With the advent of multicore…

Instrumentation and Methods for Astrophysics · Physics 2013-07-30 Navtej Singh, Lisa-Marie Browne, Ray Butler

PaPaS: A Portable, Lightweight, and Generic Framework for Parallel Parameter Studies

The current landscape of scientific research is widely based on modeling and simulation, typically with complexity in the simulation's flow of execution and parameterization properties. Execution flows are not necessarily straightforward…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-26 Eduardo Ponce, Brittany Stephenson, Suzanne Lenhart, Judy Day, Gregory D. Peterson

A Holistic Approach for Modeling and Synthesis of Image Processing Applications for Heterogeneous Computing Architectures

Image processing applications are common in every field of our daily life. However, most of them are very complex and contain several tasks with different complexities which result in varying requirements for computing architectures.…

Computer Vision and Pattern Recognition · Computer Science 2015-02-27 Christian Hartmann, Anna Yupatova, Marc Reichenbach, Dietmar Fey, Reinhard German

Warping Cache Simulation of Polyhedral Programs

Techniques to evaluate a program's cache performance fall into two camps: 1. Traditional trace-based cache simulators precisely account for sophisticated real-world cache models and support arbitrary workloads, but their runtime is…

Programming Languages · Computer Science 2022-03-29 Canberk Morelli, Jan Reineke

PynPoint: a modular pipeline architecture for processing and analysis of high-contrast imaging data

The direct detection and characterization of planetary and substellar companions at small angular separations is a rapidly advancing field. Dedicated high-contrast imaging instruments deliver unprecedented sensitivity, enabling detailed…

Earth and Planetary Astrophysics · Physics 2019-01-25 Tomas Stolker, Markus J. Bonse, Sascha P. Quanz, Adam Amara, Gabriele Cugno, Alexander J. Bohn, Anna Boehle

Accelerating Biological Spatial Cluster Analysis with the Parallel Integral Image Technique

Spatial cluster analysis (SCA) offers valuable insights into biological images; a common SCA technique is sliding window analysis (SWA). Unfortunately, SWA's computational cost hinders its application to larger images, limiting its use to…

Computer Vision and Pattern Recognition · Computer Science 2024-10-23 Seth Ockerman, Zachary Klamer, Brian Haab

Corral Framework: Trustworthy and Fully Functional Data Intensive Parallel Astronomical Pipelines

Data processing pipelines represent an important slice of the astronomical software library that include chains of processes that transform raw data into valuable information via data reduction and analysis. In this work we present Corral,…

Instrumentation and Methods for Astrophysics · Physics 2017-08-09 Juan B. Cabral, Bruno Sánchez, Martín Beroiz, Mariano Domínguez, Marcelo Lares, Sebastián Gurovich, Pablo Granitto

A Primer on the Data Cleaning Pipeline

The availability of both structured and unstructured databases, such as electronic health data, social media data, patent data, and surveys that are often updated in real time, among others, has grown rapidly over the past decade. With this…

Databases · Computer Science 2023-07-26 Rebecca C. Steorts

Cache-Conscious Run-time Decomposition of Data Parallel Computations

Multi-core architectures feature an intricate hierarchy of cache memories, with multiple levels and sizes. To adequately decompose an application according to the traits of a particular memory hierarchy is a cumbersome task that may be…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-20 Hervé Paulino, Nuno Delgado