
Related papers: Optimizing Data Lakes' Queries

200 papers

Data lakes are becoming increasingly prevalent for big data management and data analytics. In contrast to traditional 'schema-on-write' approaches such as data warehouses, data lakes are repositories storing raw data in its original formats…

Databases · Computer Science 2023-10-24 Rihan Hai, Christos Koutras, Christoph Quix, Matthias Jarke

Data lakes have emerged as a flexible and scalable solution for storing and analyzing large volumes of heterogeneous data, including structured, semi-structured, and unstructured formats. Despite their growing adoption in both industry and…

Databases · Computer Science 2026-01-28 Yi Lyu, Pei-Chieh Lo, Natan Lidukhover

Over the past two decades, we have witnessed an exponential increase of data production in the world. So-called big data generally come from transactional systems, and even more so from the Internet of Things and social media. They are…

Databases · Computer Science 2021-07-26 Pegdwendé Sawadogo, Jérôme Darmont

Over the past decade, the data lake concept has emerged as an alternative to data warehouses for storing and analyzing big data. A data lake allows storing data without any predefined schema. Therefore, data querying and analysis depend on…

Data lakes have emerged as an alternative to data warehouses for the storage, exploration and analysis of big data. In a data lake, data are stored in a raw state and bear no explicit schema. Thence, an efficient metadata system is…

Databases · Computer Science 2019-05-13 Pegdwendé Sawadogo, Tokio Kibata, Jérôme Darmont

We study the problem of optimizing data storage and access costs on the cloud while ensuring that the desired performance or latency is unaffected. We first propose an optimizer that optimizes the data placement tier (on the cloud) and the…

In the recent past, big data opportunities have gained much momentum to enhance knowledge management in organizations. However, big data, due to its various properties like high volume, variety, and velocity, can no longer be effectively stored…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-16 Mohammad Shorfuzzaman

Organizations use data lakes to store and analyze sensitive data. But hackers may compromise data lake storage to bypass access controls and access sensitive data. To address this, we propose Membrane, a system that (1) cryptographically…

Cryptography and Security · Computer Science 2025-09-11 Sam Kumar, Samyukta Yagati, Conor Power, David E. Culler, Raluca Ada Popa

Large organizations are seeking to create new architectures and scalable platforms to effectively handle data management challenges due to the explosive nature of data rarely seen in the past. These data management challenges are largely…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-29 Ruoran Liu, Haruna Isah, Farhana Zulkernine

Cloud infrastructure supports the efficient operation of data pipelines regarding requirements like cost, speed, and resource utilization. We present an integrated view of optimization opportunities for cloud-based data pipelines by…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-03 Johannes Jablonski, Georg-Daniel Schwarz, Philip Heltweg, Dirk Riehle

In the last few years, the concept of data lake has become trendy for data storage and analysis. Thus, several design alternatives have been proposed to build data lake systems. However, these proposals are difficult to evaluate as there…

Databases · Computer Science 2021-10-05 Pegdwendé Sawadogo, Jérôme Darmont

Data Lake (DL) is a Big Data analysis solution which ingests raw data in their native format and allows users to process these data upon usage. Data ingestion is not a simple copy and paste of data; it is a complicated and important phase…

Databases · Computer Science 2021-07-08 Yan Zhao, Imen Megdiche, Franck Ravat

Cloud computing provides scientists a platform that can deploy computation and data intensive applications without infrastructure investment. With excessive cloud resources and a decision support system, large generated data sets can be…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-01-27 Dong Yuan, Lizhen Cui, Xiao Liu, Erjiang Fu, Yun Yang

Data analytics stands to benefit from the increasing availability of datasets that are held without their conceptual relationships being explicitly known. When collected, these datasets form a data lake from which, by processes like data…

Databases · Computer Science 2020-11-23 Alex Bogatu, Alvaro A. A. Fernandes, Norman W. Paton, Nikolaos Konstantinou

In recent years, data lakes emerged as a way to manage large amounts of heterogeneous data for modern data analytics. One way to prevent data lakes from turning into inoperable data swamps is semantic data management. Some approaches propose…

Databases · Computer Science 2023-10-25 Sayed Hoseini, Johannes Theissen-Lipp, Christoph Quix

A data lake is a repository of data with potential for future analysis. However, both discovering what data is in a data lake and exploring related data sets can take significant effort, as a data lake can contain an intimidating amount of…

Databases · Computer Science 2022-06-09 Nour Alhammad, Alex Bogatu, Norman W Paton

In 2010, the concept of data lake emerged as an alternative to data warehouses for big data management. Data lakes follow a schema-on-read approach to provide rich and flexible analyses. However, although trendy in both the industry and…

Databases · Computer Science 2021-09-06 Pegdwendé Sawadogo, Jérôme Darmont, Camille Noûs

Today's Cloud applications are dominated by composite applications comprising multiple computing and data components with strong communication correlations among them. Although Cloud providers are deploying a large number of computing and…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-06-20 Md Hasanul Ferdaus, Manzur Murshed, Rodrigo N. Calheiros, Rajkumar Buyya

The increasing demand for diverse, mobile applications with various degrees of Quality of Service requirements meets the increasing elasticity of on-demand resource provisioning in virtualized cloud computing infrastructures. This paper…

Networking and Internet Architecture · Computer Science 2018-07-10 Ronny Hans, Björn Richerzhagen, Amr Rizk, Ulrich Lampe, Ralf Steinmetz, Sabrina Klos, Anja Klein

Cloud providers have introduced pricing models to incentivize long-term commitments of compute capacity. These long-term commitments allow the cloud providers to get guaranteed revenue for their investments in data centers and computing…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-01 Murray Stokely, Neel Nadgir, Jack Peele, Orestis Kostakis