Related papers: Push Down Optimization for Distributed Multi Cloud…
Cloud infrastructure supports the efficient operation of data pipelines regarding requirements like cost, speed, and resource utilization. We present an integrated view of optimization opportunities for cloud-based data pipelines by…
Extract-Transform-Load (ETL) handles large amounts of data and manages workloads through dataflows. ETL dataflows are widely regarded as complex and expensive operations in terms of time and system resources. In order to minimize the time and…
Edge computing has become increasingly popular across many domains and enterprises. However, given the locality constraint of edges (i.e., only close-by edges are useful), multiplexing diverse workloads becomes challenging. This results in…
The recent past has seen the adoption of multi-cloud deployments by enterprises due to availability, features, and regulatory requirements. A typical deployment involves parts of an application/workloads running inside a private cloud with…
The increasing demand for diverse, mobile applications with various degrees of Quality of Service requirements meets the increasing elasticity of on-demand resource provisioning in virtualized cloud computing infrastructures. This paper…
The evolution and advances made in the field of Cloud engineering influence the constant changes in the software application development cycle and practices. Software architecture has evolved along with other domains and capabilities of…
The pervasive use of hybrid cloud computing models has changed enterprise as well as Information Technology services infrastructure by giving businesses simple and cost-effective options for combining on-premise IT equipment with public…
The explosion of data volumes generated by an increasing number of applications is strongly impacting the evolution of distributed digital infrastructures for data analytics and machine learning (ML). While data analytics used to be mainly…
SQL-on-Hadoop systems, query optimization, data distribution over multiple nodes, and parallelization techniques are a few of the areas under intense research these days. Big names like Amazon, Google, Microsoft and many more are working on…
Network is a major bottleneck in modern cloud databases that adopt a storage-disaggregation architecture. Computation pushdown is a promising solution to tackle this issue, which offloads some computation tasks to the storage layer to…
Cloud workloads today are typically managed in a distributed environment and processed across geographically distributed data centers. Cloud service providers have been distributing data centers globally to reduce operating costs while also…
Entity matching is an important and difficult step for integrating web data. To reduce the typically high execution time for matching we investigate how we can perform entity matching in parallel on a distributed infrastructure. We propose…
With the rapid transformation of computer hardware and algorithms, mobile networking has evolved from low data carrying capacity and high latency to better-optimized networks, either by enhancing the digital network or using different…
At present, distributed applications are commonly spread across multiple datacenters, reaching out to edge and fog computing locations. The transition away from single datacenter hosting is driven by capacity constraints in…
We address the joint optimization of multiple stream joins in a scale-out architecture by tailoring prior work on multi-way stream joins to predicate-driven data partitioning schemes. We present an integer linear programming (ILP)…
We study the problem of optimizing data storage and access costs on the cloud while ensuring that the desired performance or latency is unaffected. We first propose an optimizer that optimizes the data placement tier (on the cloud) and the…
As users migrate their analytical workloads to cloud databases, it is becoming just as important to reduce monetary costs as it is to optimize query runtime. In the cloud, a query is billed based on either its compute time or the amount of…
The hybrid cloud idea is increasingly gaining momentum because it brings distinct advantages as a hosting platform for complex software systems. However, there are several challenges that need to be surmounted before hybrid hosting can…
Virtual clusters are widely used computing platforms that can be deployed in multiple cloud platforms. The ability to dynamically grow and shrink the number of nodes has paved the way for customised elastic computing both for High…
Distributed computing, such as cloud computing, provides promising platforms to execute multiple workflows. Workflow scheduling plays an important role in multi-workflow execution with multi-objective requirements. Although there exist many…