Related papers: A pseudo-parallel Python environment for database …

200 papers

Data pre-processing pipelines are the bread and butter of any successful AI project. We introduce a novel programming model for pipelines in a data lakehouse, allowing users to interact declaratively with assets in object storage. Motivated…

Databases · Computer Science 2024-11-14 Jacopo Tagliabue, Ryan Curtin, Ciro Greco

PaPy, which stands for parallel pipelines in Python, is a highly flexible framework that enables the construction of robust, scalable workflows for either generating or processing voluminous datasets. A workflow is created from user-written…

Programming Languages · Computer Science 2014-07-17 Marcin Cieslik, Cameron Mura

Training machine learning models requires feeding input data for models to ingest. Input pipelines for machine learning jobs are often challenging to implement efficiently as they require reading large volumes of data, applying complex…

Machine Learning · Computer Science 2021-02-25 Derek G. Murray, Jiri Simsa, Ana Klimovic, Ihor Indyk

The advent of modern data processing has led to an increasing tendency towards interdisciplinarity, which frequently involves the importation of different technical approaches. Consequently, there is an urgent need for a unified data…

Machine Learning · Computer Science 2024-06-04 Chen Zhang, Lecheng Jia, Wei Zhang, Ning Wen

The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more…

There are many packages in Python which allow one to perform real-time processing on audio data. Unfortunately, due to the synchronous nature of the language, there is no framework which allows for distributed parallel processing of the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-18 Nicolas Shu, David V. Anderson

In-memory database query processing frequently involves substantial data transfers between the CPU and memory, leading to inefficiencies due to the Von Neumann bottleneck. Processing-in-Memory (PIM) architectures offer a viable solution to…

We propose, implement, and experimentally evaluate a runtime middleware to support high-throughput execution on hybrid cluster machines of large-scale analysis applications. A hybrid cluster machine consists of computation nodes which have…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-09-18 George Teodoro, Tony Pan, Tahsin M. Kurc, Jun Kong, Lee A. D. Cooper, Joel H. Saltz

This article introduces a general processing framework to effectively utilize waveform data stored on modern cloud platforms. The focus is hybrid processing schemes where a local system drives processing. We show that downloading files and…

Region proposal is critical for object detection while it usually poses a bottleneck in improving the computation efficiency on traditional control-flow architectures. We have observed region proposal tasks are potentially suitable for…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-30 Wenzhi Fu, Jianlei Yang, Pengcheng Dai, Yiran Chen, Weisheng Zhao

High-throughput imaging workflows, such as Parallel Rapid Imaging with Spectroscopic Mapping (PRISM), generate data at rates that exceed conventional real-time processing capabilities. We present a scalable FPGA-based preprocessing pipeline…

Hardware Architecture · Computer Science 2025-11-26 Weichien Liao

High performance computing has been used in various fields of astrophysical research. But most of it is implemented on massively parallel systems (supercomputers) or graphical processing unit clusters. With the advent of multicore…

Instrumentation and Methods for Astrophysics · Physics 2013-07-30 Navtej Singh, Lisa-Marie Browne, Ray Butler

The current landscape of scientific research is widely based on modeling and simulation, typically with complexity in the simulation's flow of execution and parameterization properties. Execution flows are not necessarily straightforward…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-26 Eduardo Ponce, Brittany Stephenson, Suzanne Lenhart, Judy Day, Gregory D. Peterson

Image processing applications are common in every field of our daily life. However, most of them are very complex and contain several tasks with different complexities which result in varying requirements for computing architectures.…

Computer Vision and Pattern Recognition · Computer Science 2015-02-27 Christian Hartmann, Anna Yupatova, Marc Reichenbach, Dietmar Fey, Reinhard German

Techniques to evaluate a program's cache performance fall into two camps: 1. Traditional trace-based cache simulators precisely account for sophisticated real-world cache models and support arbitrary workloads, but their runtime is…

Programming Languages · Computer Science 2022-03-29 Canberk Morelli, Jan Reineke

The direct detection and characterization of planetary and substellar companions at small angular separations is a rapidly advancing field. Dedicated high-contrast imaging instruments deliver unprecedented sensitivity, enabling detailed…

Earth and Planetary Astrophysics · Physics 2019-01-25 Tomas Stolker, Markus J. Bonse, Sascha P. Quanz, Adam Amara, Gabriele Cugno, Alexander J. Bohn, Anna Boehle

Spatial cluster analysis (SCA) offers valuable insights into biological images; a common SCA technique is sliding window analysis (SWA). Unfortunately, SWA's computational cost hinders its application to larger images, limiting its use to…

Computer Vision and Pattern Recognition · Computer Science 2024-10-23 Seth Ockerman, Zachary Klamer, Brian Haab

Data processing pipelines represent an important slice of the astronomical software library that includes chains of processes that transform raw data into valuable information via data reduction and analysis. In this work we present Corral,…

Instrumentation and Methods for Astrophysics · Physics 2017-08-09 Juan B. Cabral, Bruno Sánchez, Martín Beroiz, Mariano Domínguez, Marcelo Lares, Sebastián Gurovich, Pablo Granitto

The availability of both structured and unstructured databases, such as electronic health data, social media data, patent data, and surveys that are often updated in real time, among others, has grown rapidly over the past decade. With this…

Databases · Computer Science 2023-07-26 Rebecca C. Steorts

Multi-core architectures feature an intricate hierarchy of cache memories, with multiple levels and sizes. To adequately decompose an application according to the traits of a particular memory hierarchy is a cumbersome task that may be…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-20 Hervé Paulino, Nuno Delgado