Related papers: Apache Calcite: A Foundational Framework for Optim…
The wide use of XML for document management and data exchange has created the need to query large repositories of XML data. To efficiently query such large data collections and take advantage of parallelism, we have implemented Apache…
Modern HPC file systems can contain billions of files and hundreds of petabytes of data, making even simple questions increasingly intractable to answer. Traditional file system utilities such as find and du fail to scale to these sizes.…
Quantum resources are increasingly integrated into high-performance computing (HPC) and cloud environments, but quantum high-performance computing (QHPC) software stacks remain isolated, often proprietary, full-stack solutions lacking…
Apache Hive is an open-source relational database system for analytic big-data workloads. In this paper we describe the key innovations on the journey from batch tool to fully fledged enterprise data warehousing system. We present a hybrid…
Agile development processes and component-based software architectures are two software engineering approaches that contribute to enable the rapid building and evolution of applications. Nevertheless, few approaches have proposed a…
When trying to use quantum-enhanced methods for optimization problems, the sheer number of options inhibits its adoption by industrial end users. Expert knowledge is required for the formulation and encoding of the use case, the selection…
The explosive growth of multimodal data - spanning text, image, video, spatial, and relational modalities, coupled with the need for real-time semantic search and retrieval over these data - has outpaced the capabilities of existing…
Edge applications generate a large influx of sensor data on massive scales, and these massive data streams must be processed shortly to derive actionable intelligence. However, traditional data processing systems are not well-suited for…
Customer demand, regulatory pressure, and engineering efficiency are the driving forces behind the industry-wide trend of moving from siloed engines and services that are optimized in isolation to highly integrated solutions. This is…
The Apache Spark framework for distributed computation is popular in the data analytics community due to its ease of use, but its MapReduce-style programming model can incur significant overheads when performing computations that do not map…
In order to meet the challenges of the Run 3 data rates and volumes, the ALICE collaboration is merging the online and offline infrastructures into a common framework: ALICE-O2. O2 is based on FairRoot and FairMQ, a message-based,…
In the last few years, the field of data science has been growing rapidly as various businesses have adopted statistical and machine learning techniques to empower their decision making and applications. Scaling data analysis, possibly…
DECICE is a Horizon Europe project that is developing an AI-enabled open and portable management framework for automatic and adaptive optimization and deployment of applications in computing continuum encompassing from IoT sensors on the…
Querying very large RDF data sets in an efficient manner requires a sophisticated distribution strategy. Several innovative solutions have recently been proposed for optimizing data distribution with predefined query workloads. This paper…
Although recent scientific output focuses on multiple shortest-path problem definitions for road networks, none of the existing solutions does efficiently answer all different types of SP queries. This work proposes SALT, a novel framework…
Computational grids that couple geographically distributed resources are becoming the de-facto computing platform for solving large-scale problems in science, engineering, and commerce. Software to enable grid computing has been primarily…
Apache Flink is an open-source system for scalable processing of batch and streaming data. Flink does not natively support efficient processing of spatial data streams, which is a requirement of many applications dealing with spatial data.…
With the agile development process of most academic and corporate entities, designing a robust computational back-end system that can support their ever-changing data needs is a constantly evolving challenge. We propose the implementation…
Quantum programming techniques and software have advanced significantly over the past five years, with a majority focusing on high-level language frameworks targeting remote REST library APIs. As quantum computing architectures advance and…
We introduce SparkCL, an open source unified programming framework based on Java, OpenCL and the Apache Spark framework. The motivation behind this work is to bring unconventional compute cores such as FPGAs/GPUs/APUs/DSPs and future core…