Related papers: GraphLab: A Distributed Framework for Machine Lear…
While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning…
Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and…
Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and…
Complex networks are relational data sets commonly represented as graphs. The analysis of their intricate structure is relevant to many areas of science and commerce, and data sets may reach sizes that require distributed storage and…
Efficient processing of large-scale graphs in distributed environments has been an increasingly popular topic of research in recent years. Inter-connected data that can be modeled as graphs arise in application domains such as machine…
Applying popular machine learning algorithms to large amounts of data raised new challenges for the ML practitioners. Traditional ML libraries does not support well processing of huge datasets, so that new approaches were needed.…
Training and deploying deep learning models in real-world applications require processing large amounts of data. This is a challenging task when the amount of data grows to a hundred terabytes, or even, petabyte-scale. We introduce a hybrid…
Distributed processing of large-scale graph data has many practical applications and has been widely studied. In recent years, a lot of distributed graph processing frameworks and algorithms have been proposed. While many efforts have been…
What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization…
Graph embedding aims at learning a vector-based representation of vertices that incorporates the structure of the graph. This representation then enables inference of graph properties. Existing graph embedding techniques, however, do not…
Machine Learning (ML) is more than just training models, the whole workflow must be considered. Once deployed, a ML model needs to be watched and constantly supervised and debugged to guarantee its validity and robustness in unexpected…
Given a large graph, a graph sample determines a subgraph with similar characteristics for certain metrics of the original graph. The samples are much smaller thereby accelerating and simplifying the analysis and visualization of large…
We initiate the study of graph algorithms in the streaming setting on massive distributed and parallel systems inspired by practical data processing systems. The objective is to design algorithms that can efficiently process evolving graphs…
Graph clustering has many important applications in computing, but due to growing sizes of graphs, even traditionally fast clustering methods such as spectral partitioning can be computationally expensive for real-world graphs of interest.…
Large deep learning models have shown great potential for delivering exceptional results in various applications. However, the training process can be incredibly challenging due to the models' vast parameter sizes, often consisting of…
Graphs, consisting of vertices and edges, are vital for representing complex relationships in fields like social networks, finance, and blockchain. Visualizing these graphs helps analysts identify structural patterns, with readability…
Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a…
In the era of deep learning (DL), convolutional neural networks (CNNs), and large language models (LLMs), machine learning (ML) models are becoming increasingly complex, demanding significant computational resources for both inference and…
The GraphBLAS high performance library standard has yielded capabilities beyond enabling graph algorithms to be readily expressed in the language of linear algebra. These GraphBLAS capabilities enable new performant ways of thinking about…
In the past decade, increasingly network scheduling techniques have been proposed to boost the distributed application performance. Flow-level metrics, such as flow completion time (FCT), are based on the abstraction of flows yet they…