Related papers: Optimizing ETL Dataflow Using Shared Caching and P…

200 papers

Extract-Transform-Load (ETL) processes are core components of modern data processing infrastructures. The throughput of processed data records can be adjusted by changing the amount of allocated resources, i.e., the number of parallel…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-04-01 · Levin Maier, Lucas Schulze, Robert Lilow, Lukas Hahn, Nikola Krasowski, Arnulf Barth, Sebastian Gaebel, Ferdi Güran, Oliver Hanau, Giovanni Wagner, Falk Borgmann, Oleg Arenz, Jan Peters

The Extract, Transform, Load (ETL) workflow is fundamental for populating and maintaining data warehouses and other data stores accessed by analysts for downstream tasks. A major shortcoming of modern ETL solutions is the extensive need for…

Software Engineering · Computer Science · 2025-08-01 · Mattia Di Profio, Mingjun Zhong, Yaji Sripada, Marcel Jaspars

Edge networks promise to provide better services to users by provisioning computing and storage resources at the edge of networks. However, due to the uncertainty and diversity of user interests, content popularity, distributed…

Networking and Internet Architecture · Computer Science · 2020-03-16 · Nitish K. Panigrahy, Jian Li, Faheem Zafari, Don Towsley, Paul Yu

Enterprises increasingly adopt multi-cloud architectures to take advantage of diverse database engines, regional availability, and cost models. In these environments, ETL pipelines must process large, distributed datasets while minimizing…

In data warehousing, Extract-Transform-Load (ETL) regularly extracts data from data sources into a central data warehouse to support business decision-making. The data from transaction processing systems are characterized by the…

Databases · Computer Science · 2014-09-16 · Xiufeng Liu

Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk…

Databases · Computer Science · 2012-08-02 · Stephan Ewen, Kostas Tzoumas, Moritz Kaufmann, Volker Markl

KV cache restoration has emerged as a dominant bottleneck in serving long-context LLM workloads, including multi-turn conversations, retrieval-augmented generation, and agentic pipelines. Existing approaches treat restoration as a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-04-29 · Sean Nian, Jiahao Fang, Qilong Feng, Zhiyu Wu, Fan Lai

This paper addresses the challenges of low scheduling efficiency, unbalanced resource allocation, and poor adaptability in ETL (Extract-Transform-Load) processes under heterogeneous data environments by proposing an intelligent scheduling…

Machine Learning · Computer Science · 2025-12-16 · Kangning Gao, Yi Hu, Cong Nie, Wei Li

Entity matching is an important and difficult step for integrating web data. To reduce the typically high execution time for matching, we investigate how we can perform entity matching in parallel on a distributed infrastructure. We propose…

Distributed, Parallel, and Cluster Computing · Computer Science · 2010-06-29 · Toralf Kirsten, Lars Kolb, Michael Hartung, Anika Groß, Hanna Köpcke, Erhard Rahm

Several methods exist today to accelerate Machine Learning (ML) or Deep Learning (DL) model performance for training and inference. However, modern techniques that rely on various graph and operator parallelism methodologies rely on search…

Machine Learning · Computer Science · 2023-08-23 · Srinjoy Das, Lawrence Rauchwerger

A data warehouse efficiently prepares data for effective and fast data analysis and modelling using machine learning algorithms. This paper discusses existing solutions for the Data Extraction, Transformation, and Loading (ETL) process and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-12-21 · Nassi Ebadifard, Ajitesh Parihar, Youry Khmelevsky, Gaetan Hains, Albert Wong, Frank Zhang

Data processing systems offer an ever-increasing degree of parallelism on the levels of cores, CPUs, and processing nodes. Query optimization must exploit high degrees of parallelism in order not to gradually become the bottleneck of query…

Databases · Computer Science · 2015-11-06 · Immanuel Trummer, Christoph Koch

Nowadays, data caching is being adopted at an exponential rate as a high-speed data storage layer in mobile edge computing networks that employ flow control methodologies. This study shows how to discover the best architecture for backhaul…

Networking and Internet Architecture · Computer Science · 2022-11-29 · Amir Ziaeddini, Amin Mohajer, Davoud Yousefi, A. Mirzaei, Shu Gonglee

In modern large-scale distributed systems, analytics jobs submitted by various users often share similar work, for example scanning and processing the same subset of data. Instead of optimizing jobs independently, which may result in…

Databases · Computer Science · 2018-05-23 · Pietro Michiardi, Damiano Carra, Sara Migliorini

The competitive dynamics of the globalized market demand information on the internal and external reality of corporations. Information is a precious asset and is responsible for establishing key advantages to enable companies to maintain…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-07-17 · Gustavo V. Machado, Ítalo Cunha, Adriano C. M. Pereira, Leonardo B. Oliveira

Cache partitioning techniques have been successfully adopted to mitigate interference among concurrently executing real-time tasks on multi-core processors. Considering that the execution time of a cache-sensitive task strongly depends on…

Hardware Architecture · Computer Science · 2023-10-05 · Binqi Sun, Debayan Roy, Tomasz Kloda, Andrea Bastoni, Rodolfo Pellizzoni, Marco Caccamo

A simple method for improving cache efficiency of serial and parallel explicit finite procedure with application to casting solidification simulation over three-dimensional complex geometries is presented. The method is based on division of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2010-05-19 · Ruhollah Tavakoli

Prefix caching is crucial to accelerate multi-turn interactions and requests with shared prefixes. At the cluster level, existing prefix caching systems are tightly coupled with request scheduling to optimize cache efficiency and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-08-26 · Bingyang Wu, Zili Zhang, Yinmin Zhong, Guanzhe Huang, Yibo Zhu, Xuanzhe Liu, Xin Jin

Shared training approaches, such as multi-task learning (MTL) and gradient-based meta-learning, are widely used in various machine learning applications, but they often suffer from negative transfer, leading to performance degradation in…

Machine Learning · Computer Science · 2024-12-10 · Anshul Thakur, Yichen Huang, Soheila Molaei, Yujiang Wang, David A. Clifton

Traditional ETL and ELT design patterns struggle to meet modern requirements of scalability, governance, and real-time data processing. Hybrid approaches such as ETLT (Extract-Transform-Load-Transform) and ELTL (Extract-Load-Transform-Load)…

Databases · Computer Science · 2025-11-06 · Chiara Rucco, Motaz Saad, Antonella Longo