Related papers: Apache Calcite: A Foundational Framework for Optim…

Apache VXQuery: A Scalable XQuery Implementation

The wide use of XML for document management and data exchange has created the need to query large repositories of XML data. To efficiently query such large data collections and take advantage of parallelism, we have implemented Apache…

Databases · Computer Science 2015-04-02 E. Preston Carman, Till Westmann, Vinayak R. Borkar, Michael J. Carey, Vassilis J. Tsotras

Icicle: Scalable Metadata Indexing and Real-Time Monitoring for HPC File Systems

Modern HPC file systems can contain billions of files and hundreds of petabytes of data, making even simple questions increasingly intractable to answer. Traditional file system utilities such as find and du fail to scale to these sizes.…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-14 Haochen Pan, Ryan Chard, Song Young Oh, Maxime Gonthier, Valérie Hayot-Sasson, Geoffrey Lentner, Joe Bottigliero, Rachana Ananthakrishnan, Kyle Chard, Ian Foster

Quantum-HPC Software Stacks and the openQSE Reference Architecture: A Survey

Quantum resources are increasingly integrated into high-performance computing (HPC) and cloud environments, but quantum high-performance computing (QHPC) software stacks remain isolated, often proprietary, full-stack solutions lacking…

Quantum Physics · Physics 2026-04-24 Amir Shehata, Brian Austin, Tom Beck, Lukas Burgholzer, Alex Chernoguzov, Spencer Churchill, Andrea Delgado, Yasuko Eckert, Jeffery Heckey, Kevin Kissell, Katherine Klymko, Josh Moles, Thomas Naughton, Lee James O'Riordan, Christian Ortiz Pauyac, Guen Prawiroatmodjo, Ermal Rrapaj, Jiri Schindler, Laura Schulz, Sebastian Stern, Tyler Takeshita, Miwako Tsuji, Aleksander Wennersteen, Travis Humble, Martin Schulz

Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing

Apache Hive is an open-source relational database system for analytic big-data workloads. In this paper we describe the key innovations on the journey from batch tool to fully fledged enterprise data warehousing system. We present a hybrid…

Databases · Computer Science 2019-03-27 Jesús Camacho-Rodríguez, Ashutosh Chauhan, Alan Gates, Eugene Koifman, Owen O'Malley, Vineet Garg, Zoltan Haindrich, Sergey Shelukhin, Prasanth Jayachandran, Siddharth Seth, Deepak Jaiswal, Slim Bouguerra, Nishant Bangarwa, Sankar Hariappan, Anishek Agarwal, Jason Dere, Daniel Dai, Thejas Nair, Nita Dembla, Gopal Vijayaraghavan, Günther Hagleitner

A Framework for Agile Development of Component-Based Applications

Agile development processes and component-based software architectures are two software engineering approaches that contribute to enable the rapid building and evolution of applications. Nevertheless, few approaches have proposed a…

Software Engineering · Computer Science 2010-02-05 Guillaume Waignier, Estéban Duguepéroux, Anne-Françoise Le Meur, Laurence Duchien

Creating Automated Quantum-Assisted Solutions for Optimization Problems

When trying to use quantum-enhanced methods for optimization problems, the sheer number of options inhibits its adoption by industrial end users. Expert knowledge is required for the formulation and encoding of the use case, the selection…

Quantum Physics · Physics 2025-05-27 Benedikt Poggel, Xiomara Runge, Adelina Bärligea, Jeanette Miriam Lorenz

ARCADE: A Real-Time Data System for Hybrid and Continuous Query Processing across Diverse Data Modalities

The explosive growth of multimodal data - spanning text, image, video, spatial, and relational modalities, coupled with the need for real-time semantic search and retrieval over these data - has outpaced the capabilities of existing…

Databases · Computer Science 2025-09-25 Jingyi Yang, Songsong Mo, Jiachen Shi, Zihao Yu, Kunhao Shi, Xuchen Ding, Gao Cong

AgileDART: An Agile and Scalable Edge Stream Processing Engine

Edge applications generate a large influx of sensor data on massive scales, and these massive data streams must be processed shortly to derive actionable intelligence. However, traditional data processing systems are not well-suited for…

Databases · Computer Science 2025-07-31 Cheng-Wei Ching, Xin Chen, Chaeeun Kim, Tongze Wang, Dong Chen, Dilma Da Silva, Liting Hu

Towards Query Optimizer as a Service (QOaaS) in a Unified LakeHouse Ecosystem: Can One QO Rule Them All?

Customer demand, regulatory pressure, and engineering efficiency are the driving forces behind the industry-wide trend of moving from siloed engines and services that are optimized in isolation to highly integrated solutions. This is…

Databases · Computer Science 2024-11-22 Rana Alotaibi, Yuanyuan Tian, Stefan Grafberger, Jesús Camacho-Rodríguez, Nicolas Bruno, Brian Kroth, Sergiy Matusevych, Ashvin Agrawal, Mahesh Behera, Ashit Gosalia, Cesar Galindo-Legaria, Milind Joshi, Milan Potocnik, Beysim Sezgin, Xiaoyu Li, Carlo Curino

Alchemist: An Apache Spark <=> MPI Interface

The Apache Spark framework for distributed computation is popular in the data analytics community due to its ease of use, but its MapReduce-style programming model can incur significant overheads when performing computations that do not map…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-06 Alex Gittens, Kai Rothauge, Shusen Wang, Michael W. Mahoney, Jey Kottalam, Lisa Gerhardt, Prabhat, Michael Ringenburg, Kristyn Maschhoff

Exploring polyglot software frameworks in ALICE with FairMQ and fer

In order to meet the challenges of the Run 3 data rates and volumes, the ALICE collaboration is merging the online and offline infrastructures into a common framework: ALICE-O2. O2 is based on FairRoot and FairMQ, a message-based,…

Instrumentation and Detectors · Physics 2019-10-02 S. Binet

PolyFrame: A Retargetable Query-based Approach to Scaling DataFrames (Extended Version)

In the last few years, the field of data science has been growing rapidly as various businesses have adopted statistical and machine learning techniques to empower their decision making and applications. Scaling data analysis, possibly…

Databases · Computer Science 2021-02-11 Phanwadee Sinthong, Michael J. Carey

DECICE: Device-Edge-Cloud Intelligent Collaboration Framework

DECICE is a Horizon Europe project that is developing an AI-enabled open and portable management framework for automatic and adaptive optimization and deployment of applications in computing continuum encompassing from IoT sensors on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-05 Julian Kunkel, Christian Boehme, Jonathan Decker, Fabrizio Magugliani, Dirk Pleiter, Bastian Koller, Karthee Sivalingam, Sabri Pllana, Alexander Nikolov, Mujdat Soyturk, Christian Racca, Andrea Bartolini, Adrian Tate, Berkay Yaman

On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark

Querying very large RDF data sets in an efficient manner requires a sophisticated distribution strategy. Several innovative solutions have recently been proposed for optimizing data distribution with predefined query workloads. This paper…

Databases · Computer Science 2015-07-10 Olivier Curé, Hubert Naacke, Mohamed-Amine Baazizi, Bernd Amann

SALT. A unified framework for all shortest-path query variants on road networks

Although recent scientific output focuses on multiple shortest-path problem definitions for road networks, none of the existing solutions does efficiently answer all different types of SP queries. This work proposes SALT, a novel framework…

Data Structures and Algorithms · Computer Science 2014-11-04 Alexandros Efentakis, Dieter Pfoser, Yannis Vassiliou

Alchemi: A .NET-based Grid Computing Framework and its Integration into Global Grids

Computational grids that couple geographically distributed resources are becoming the de-facto computing platform for solving large-scale problems in science, engineering, and commerce. Software to enable grid computing has been primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 Akshay Luther, Rajkumar Buyya, Rajiv Ranjan, Srikumar Venugopal

GeoFlink: A Distributed and Scalable Framework for the Real-time Processing of Spatial Streams

Apache Flink is an open-source system for scalable processing of batch and streaming data. Flink does not natively support efficient processing of spatial data streams, which is a requirement of many applications dealing with spatial data.…

Databases · Computer Science 2020-08-04 Salman Ahmed Shaikh, Komal Mariam, Hiroyuki Kitagawa, Kyoung-Sook Kim

Enabling Cross-Language Data Integration and Scalable Analytics in Decentralized Finance

With the agile development process of most academic and corporate entities, designing a robust computational back-end system that can support their ever-changing data needs is a constantly evolving challenge. We propose the implementation…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-07 Conor Flynn, Kristin P. Bennett, John S. Erickson, Aaron Green, Oshani Seneviratne

XACC: A System-Level Software Infrastructure for Heterogeneous Quantum-Classical Computing

Quantum programming techniques and software have advanced significantly over the past five years, with a majority focusing on high-level language frameworks targeting remote REST library APIs. As quantum computing architectures advance and…

Quantum Physics · Physics 2019-11-07 Alexander J. McCaskey, Dmitry I. Lyakh, Eugene F. Dumitrescu, Sarah S. Powers, Travis S. Humble

SparkCL: A Unified Programming Framework for Accelerators on Heterogeneous Clusters

We introduce SparkCL, an open source unified programming framework based on Java, OpenCL and the Apache Spark framework. The motivation behind this work is to bring unconventional compute cores such as FPGAs/GPUs/APUs/DSPs and future core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-06 Oren Segal, Philip Colangelo, Nasibeh Nasiri, Zhuo Qian, Martin Margala