Related papers: Detecting Superbubbles in Assembly Graphs
DNA sequencing is the process of determining the exact order of the nucleotide bases of an individual's genome in order to catalogue sequence variation and understand its biological implications. Whole-genome sequencing techniques produce…
Superbubbles are acyclic induced subgraphs of a digraph with single entrance and exit that naturally arise in the context of genome assembly and the analysis of genome alignments in computational biology. These structures can be computed in…
The first step in any genome assembly algorithm entails the conversion from the domain of strings and overlaps to the language of graphs and paths, typically using one of the two conventional methods: de Bruijn graphs or overlap graphs.…
This paper studies the problem of detecting anomalous graphs using a machine learning model trained on only normal graphs, which has many applications in molecule, biology, and social network data analysis. We present a self-discriminative…
Maximal clique enumeration is a fundamental graph mining task, but its utility is often limited by computational intractability and highly redundant output. To address these challenges, we introduce \emph{$\rho$-dense aggregators}, a novel…
The problem of finding dense components of a graph is a widely explored area in data analysis, with diverse applications in fields and branches of study including community mining, spam detection, computer security and bioinformatics. This…
The maximum clique problem is a well known NP-Hard problem with applications in data mining, network analysis, information retrieval and many other areas related to the World Wide Web. There exist several algorithms for the problem with…
Community detection is a fundamental task in graph analysis, with methods often relying on fitting models like the Stochastic Block Model (SBM) to observed networks. While many algorithms can accurately estimate SBM parameters when the…
De novo genome assembly focuses on finding connections between a vast amount of short sequences in order to reconstruct the original genome. The central problem of genome assembly could be described as finding a Hamiltonian path through a…
Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories -- based on the data structures which…
Bidirected graphs are a common generalisation of directed graphs where arcs can also be incoming to both their incident nodes, or outgoing from both their incident nodes. Such arcs allow a walk to change direction. Some algorithms can…
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasiclique,…
Genome assembly is a prominent problem studied in bioinformatics, which computes the source string using a set of its overlapping substrings. Classically, genome assembly uses assembly graphs built using this set of substrings to compute…
Molecule generation is central to a variety of applications. Current attention has been paid to approaching the generation task as subgraph prediction and assembling. Nevertheless, these methods usually rely on hand-crafted or external…
Graphs are naturally used to describe the structures of various real-world systems in biology, society, computer science etc., where subgraphs or motifs as basic blocks play an important role in function expression and information…
A fundamental algorithmic problem in computational biology is to find all subgraphs of a given type (superbubbles, snarls, and ultrabubbles) in a directed or bidirected input graph. These correspond to regions of genetic variation and are…
To understand the structure of a network, it can be useful to break it down into its constituent pieces. This is the approach taken in a multitude of successful network analysis methods, such as motif analysis. These methods require one to…
In the subgraph counting problem, we are given a input graph $G(V, E)$ and a target graph $H$; the goal is to estimate the number of occurrences of $H$ in $G$. Our focus here is on designing sublinear-time algorithms for approximately…
Recently we have developed a new method in graph theory based on the Regularity Lemma. The method is applied to find certain spanning subgraphs in dense graphs. The other main general tool of the method, beside the Regularity Lemma, is the…
We develop random graph models where graphs are generated by connecting not only pairs of vertices by edges but also larger subsets of vertices by copies of small atomic subgraphs of arbitrary topology. This allows the for the generation of…