Related papers: Optimizing Data Lakes' Queries
Data lakes are becoming increasingly prevalent for big data management and data analytics. In contrast to traditional 'schema-on-write' approaches such as data warehouses, data lakes are repositories storing raw data in its original formats…
Data lakes have emerged as a flexible and scalable solution for storing and analyzing large volumes of heterogeneous data, including structured, semi-structured, and unstructured formats. Despite their growing adoption in both industry and…
Over the past two decades, we have witnessed an exponential increase of data production in the world. So-called big data generally come from transactional systems, and even more so from the Internet of Things and social media. They are…
Over the past decade, the data lake concept has emerged as an alternative to data warehouses for storing and analyzing big data. A data lake allows storing data without any predefined schema. Therefore, data querying and analysis depend on…
Data lakes have emerged as an alternative to data warehouses for the storage, exploration and analysis of big data. In a data lake, data are stored in a raw state and bear no explicit schema. Thence, an efficient metadata system is…
We study the problem of optimizing data storage and access costs on the cloud while ensuring that the desired performance or latency is unaffected. We first propose an optimizer that optimizes the data placement tier (on the cloud) and the…
In recent past, big data opportunities have gained much momentum to enhance knowledge management in organizations. However, big data due to its various properties like high volume, variety, and velocity can no longer be effectively stored…
Organizations use data lakes to store and analyze sensitive data. But hackers may compromise data lake storage to bypass access controls and access sensitive data. To address this, we propose Membrane, a system that (1) cryptographically…
Large organizations are seeking to create new architectures and scalable platforms to effectively handle data management challenges due to the explosive nature of data rarely seen in the past. These data management challenges are largely…
Cloud infrastructure supports the efficient operation of data pipelines regarding requirements like cost, speed, and resource utilization. We present an integrated view of optimization opportunities for cloud-based data pipelines by…
In the last few years, the concept of data lake has become trendy for data storage and analysis. Thus, several design alternatives have been proposed to build data lake systems. However, these proposals are difficult to evaluate as there…
Data Lake (DL) is a Big Data analysis solution which ingests raw data in their native format and allows users to process these data upon usage. Data ingestion is not a simple copy and paste of data, it is a complicated and important phase…
Cloud computing provides scientists a platform that can deploy computation and data intensive applications without infrastructure investment. With excessive cloud resources and a decision support system, large generated data sets can be…
Data analytics stands to benefit from the increasing availability of datasets that are held without their conceptual relationships being explicitly known. When collected, these datasets form a data lake from which, by processes like data…
In recent years, data lakes emerged as away to manage large amounts of heterogeneous data for modern data analytics. One way to prevent data lakes from turning into inoperable data swamps is semantic data management. Some approaches propose…
A data lake is a repository of data with potential for future analysis. However, both discovering what data is in a data lake and exploring related data sets can take significant effort, as a data lake can contain an intimidating amount of…
In 2010, the concept of data lake emerged as an alternative to data warehouses for big data management. Data lakes follow a schema-on-read approach to provide rich and flexible analyses. However, although trendy in both the industry and…
Today's Cloud applications are dominated by composite applications comprising multiple computing and data components with strong communication correlations among them. Although Cloud providers are deploying large number of computing and…
The increasing demand for diverse, mobile applications with various degrees of Quality of Service requirements meets the increasing elasticity of on-demand resource provisioning in virtualized cloud computing infrastructures. This paper…
Cloud providers have introduced pricing models to incentivize long-term commitments of compute capacity. These long-term commitments allow the cloud providers to get guaranteed revenue for their investments in data centers and computing…