Related papers: Distributed Data Placement via Graph Partitioning
Distributed computing excels at processing large-scale data, but the communication cost for synchronizing the shared parameters may slow down the overall performance. Fortunately, the interactions between parameter and data in many problems…
Increasing need for large-scale data analytics in a number of application domains has led to a dramatic rise in the number of distributed data management systems, both parallel relational databases, and systems that support alternative…
The efficient parallel execution of complex computations requires balancing the workload across processors while minimizing the communication between them. This inherent trade-off is often captured by graph partitioning or DAG scheduling…
The increasing popularity of cloud computing has resulted in a proliferation of data centers. Effective placement of data centers improves network performance and minimizes clients' perceived latency. The problem of determining the optimal…
Graph learning is often a necessary step in processing or representing structured data, when the underlying graph is not given explicitly. Graph learning is generally performed centrally with a full knowledge of the graph signals, namely…
We study online graph queries that retrieve nearby nodes of a query node from a large network. To answer such queries with high throughput and low latency, we partition the graph and process the data in parallel across a cluster of servers.…
Graph clustering is a fundamental computational problem with a number of applications in algorithm design, machine learning, data mining, and analysis of social networks. Over the past decades, researchers have proposed a number of…
Querying graph data with low latency is an important requirement in application domains such as social networks and knowledge graphs. Graph queries perform multiple hops between vertices. When data is partitioned and stored across multiple…
Several high-throughput distributed data-processing applications require multi-hop processing of streams of data. These applications include continual processing on data streams originating from a network of sensors, composing a multimedia…
We study the joint minimization of communication and computation costs in distributed computing, where a master node coordinates $N$ workers to evaluate a function over a library of $n$ files. Assuming that the function is decomposed into…
The distributed optimization problem has become increasingly relevant recently. It has many advantages, such as processing a large amount of data in less time compared to non-distributed methods. However, most distributed approaches…
Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems, e.g., in some cases, a big graph can be chopped into pieces that fit on one machine to be processed independently before stitching the…
The in-memory graph layout or organization has a considerable impact on the time and energy efficiency of distributed memory graph computations. It affects memory locality, inter-task load balance, communication time, and overall memory…
Electronic data is growing at increasing rates, in both size and connectivity: the increasing presence of, and interest in, relationships between data. An example is the Twitter social network graph. Due to this growth, demand is increasing…
The inherent connectivity and dependency of graph-structured data, combined with its unique topology-driven access patterns, pose fundamental challenges to conventional data replication and request routing strategies in geo-distributed…
Modern networked systems are increasingly reconfigurable, enabling demand-aware infrastructures whose resources can be adjusted according to the workload they currently serve. Such dynamic adjustments can be exploited to improve network…
Today's Cloud applications are dominated by composite applications comprising multiple computing and data components with strong communication correlations among them. Although Cloud providers are deploying a large number of computing and…
Today's data centers have an abundance of computing resources, hosting server clusters consisting of as many as tens or hundreds of thousands of machines. To execute a complex computing task over a data center, it is natural to distribute…
Next-generation communication networks are envisioned to extensively utilize storage-enabled caching units to alleviate unfavorable surges of data traffic by pro-actively storing anticipated highly popular contents across geographically…
We consider a number of fundamental statistical and graph problems in the message-passing model, where we have $k$ machines (sites), each holding a piece of data, and the machines want to jointly solve a problem defined on the union of the…