Related papers: Integrating pre-processing pipelines in ODC based …
Deep learning has become the gold standard for image processing over the past decade. Simultaneously, we have seen growing interest in orbital activities such as satellite servicing and debris removal that depend on proximity operations…
The amount of remote sensing data available to applications is constantly growing due to the rise of very-high-resolution sensors and short repeat cycle satellites. Consequently, tackling computational complexity in Earth Observation…
This paper presents the design and the implementation of an interface software component between OLE for Process Control (OPC) formatted data and the Global Sensor Network (GSN) framework for management of data from sensors. This interface,…
Processing data received as a stream is a task commonly performed by modern embedded devices, in a wide range of applications such as multimedia (encoding/decoding/playing media), networking (switching and routing), digital security,…
As data mesh architectures gain traction in federated environments, organizations are increasingly building consumer-specific data-sharing pipelines using modular, cloud-native transformation services. Prior work has shown that structuring…
This paper aims to create a transition path from file-based IO to streaming-based workflows for scientific applications in an HPC environment. By using the openPMD-api, traditional workflows limited by filesystem bottlenecks can be overcome…
The OSG-operated Open Science Pool is an HTCondor-based virtual cluster that aggregates resources from compute clusters provided by several organizations. Most of the resources are not owned by OSG, so demand-based dynamic provisioning is…
The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more…
Apache Flink is an open-source system for scalable processing of batch and streaming data. Flink does not natively support efficient processing of spatial data streams, which is a requirement of many applications dealing with spatial data.…
Data is a valuable asset, and sharing it as a product across organizations is key to building comprehensive and useful insights in fields such as science and industry. Before sharing, data often requires transformation to comply with…
The Open Knowledgebase of Interatomic Models (OpenKIM) project is a framework intended to facilitate access to standardized implementations of interatomic models for molecular simulations along with computational protocols to evaluate them.…
Recent research has shown that grid resources can be accessed by clients on demand with the help of virtualization technology in the cloud. The virtual machines hosted by the hypervisors are being used to build the grid network…
Software-defined vehicles (SDVs) offer a wide range of connected functionalities, including enhanced driving behavior and fleet management. These features are continuously updated via over-the-air (OTA) mechanisms, resulting in a growing…
We propose a methodology to manage and process remote sensing and geo-imagery data for non-expert users. The proposed system provides automated data ingestion and manipulation capability for analytical data-driven purposes. In this paper,…
Cloud infrastructure supports the efficient operation of data pipelines regarding requirements like cost, speed, and resource utilization. We present an integrated view of optimization opportunities for cloud-based data pipelines by…
We propose, implement, and experimentally evaluate a runtime middleware to support high-throughput execution on hybrid cluster machines of large-scale analysis applications. A hybrid cluster machine consists of computation nodes which have…
Advancements in Earth system science have seen a surge in diverse datasets. Earth System Data Cubes (ESDCs) have been introduced to efficiently handle this influx of high-dimensional data. ESDCs offer a structured, intuitive framework for…
This paper describes an architecture for a distributed processing system that would allow remote procedure calls to invoke other services as messages are passed between clients and servers. It proposes that an additional class of data…
The proliferation of Large Language Models (LLMs) with exponentially growing parameters is making cross-data center (DC) training an inevitable trend. However, viable strategies for extending single-DC training frameworks to multi-DC…
We revisit the implementation of iterative solvers on discrete graphics processing units and demonstrate the benefit of implementations using extensive kernel fusion for pipelined formulations over conventional implementations of classical…