Related papers: Graph Sampling with Distributed In-Memory Dataflow…
Network embedding is an important step in many different computations based on graph data. However, existing approaches are limited to small or middle size graphs with fewer than a million edges. In practice, web or social network graphs…
In order to efficiently study the characteristics of network domains and support development of network systems (e.g. algorithms, protocols that operate on networks), it is often necessary to sample a representative subgraph from a large…
While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning…
Graph sampling is a technique to pick a subset of vertices and/ or edges from original graph. It has a wide spectrum of applications, e.g. survey hidden population in sociology [54], visualize social graph [29], scale down Internet AS graph…
Sampling technique has become one of the recent research focuses in the graph-related fields. Most of the existing graph sampling algorithms tend to sample the high degree or low degree nodes in the complex networks because of the…
Graph sampling allows mining a small representative subgraph from a big graph. Sampling algorithms deploy different strategies to replicate the properties of a given graph in the sampled graph. In this study, we provide a comprehensive…
Complex networks are relational data sets commonly represented as graphs. The analysis of their intricate structure is relevant to many areas of science and commerce, and data sets may reach sizes that require distributed storage and…
Network sampling is integral to the analysis of social, information, and biological networks. Since many real-world networks are massive in size, continuously evolving, and/or distributed in nature, the network structure is often sampled in…
Transactional frequent subgraph mining identifies frequent subgraphs in a collection of graphs. This research problem has wide applicability and increasingly requires higher scalability over single machine solutions to address the needs of…
Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate the graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population.…
Online social network services provide a platform for human social interactions. Nowadays, many kinds of online interactions generate large-scale social network data. Network analysis helps to mine knowledge and pattern from the…
There has been an increased interest in applying machine learning techniques on relational structured-data based on an observed graph. Often, this graph is not fully representative of the true relationship amongst nodes. In these settings,…
Specify a randomized algorithm that, given a very large graph or network, extracts a random subgraph. What can we learn about the input graph from a single subsample? We derive laws of large numbers for the sampler output, by relating…
Real-world graphs, such as social networks, financial transactions, and recommendation systems, often demonstrate dynamic behavior. This phenomenon, known as graph stream, involves the dynamic changes of nodes and the emergence and…
This proposal presents a graph computing framework intending to support both online and offline computing on large dynamic graphs efficiently. The framework proposes a new data model to support rich evolving vertex and edge data types. It…
Apart from forming the backbone of compiler optimization, static dataflow analysis has been widely applied in a vast variety of applications, such as bug detection, privacy analysis, program comprehension, etc. Despite its importance,…
Graphs may be used to represent many different problem domains -- a concrete example is that of detecting communities in social networks, which are represented as graphs. With big data and more sophisticated applications becoming widespread…
Recently we create so much data (2.5 quintillion bytes every day) that 90% of the data in the world today has been created in the last two years alone [1]. This data comes from sensors used to gather traffic or climate information, posts to…
Random walk based distance measures for graphs such as commute-time distance are useful in a variety of graph algorithms, such as clustering, anomaly detection, and creating low dimensional embeddings. Since such measures hinge on the…
Graph signal sampling is the problem of selecting a subset of representative graph vertices whose values can be used to interpolate missing values on the remaining graph vertices. Optimizing the choice of sampling set using concepts from…