Related papers: Coding Sequence Density Estimation Via Topological…
In this study, we present a method of pattern mining based on network theory that enables the identification of protein structures or complexes from synthetic volume densities, without the knowledge of predefined templates or human biases…
This work presents a new approach for classification of genomic sequences from measurements of complex networks and information theory. For this, it is considered the nucleotides, dinucleotides and trinucleotides of a genomic sequence. For…
A probability density function (pdf) encodes the entire stochastic knowledge about data distribution, where data may represent stochastic observations in robotics, transition state pairs in reinforcement learning or any other empirically…
This study proposes a data condensation method for multivariate kernel density estimation by genetic algorithm. First, our proposed algorithm generates multiple subsamples of a given size with replacement from the original sample. The…
Sequence data, such as DNA, RNA, and protein sequences, exhibit intricate, multi-scale structures that pose significant challenges for conventional analysis methods, particularly those relying on alignment or purely statistical…
While many approaches to make neural networks more fathomable have been proposed, they are restricted to interrogating the network with input data. Measures for characterizing and monitoring structural properties, however, have not been…
Kernel density estimation is a key component of a wide variety of algorithms in machine learning, Bayesian inference, stochastic dynamics and signal processing. However, the unsupervised density estimation technique requires tuning a…
Topology optimization is computationally demanding that requires the assembly and solution to a finite element problem for each material distribution hypothesis. As a complementary alternative to the traditional physics-based topology…
This manuscript delves into the intersection of genomics and phenotypic prediction, focusing on the statistical innovation required to navigate the complexities introduced by noisy covariates and confounders. The primary emphasis is on the…
We live in a period where bio-informatics is rapidly expanding, a significant quantity of genomic data has been produced as a result of the advancement of high-throughput genome sequencing technology, raising concerns about the costs…
Complexity metrics and machine learning (ML) models have been utilized to analyze the lengths of segmental genomic entities like: exons, introns, intergenic and repeat/unique DNA sequences, in each of the 22 human chromosomes. The purpose…
Modelling semantic similarity plays a fundamental role in lexical semantic applications. A natural way of calculating semantic similarity is to access handcrafted semantic networks, but similarity prediction can also be anticipated in a…
In this project, we present a deep neural network (DNN)-based biophysics model that uses multi-scale and uniform topological and electrostatic features to predict protein properties, such as Coulomb energies or solvation energies. The…
Methods of topological data analysis have been successfully applied in a wide range of fields to provide useful summaries of the structure of complex data sets in terms of topological descriptors, such as persistence diagrams. While there…
We address the problem of nonparametric estimation of characteristics for stationary and ergodic time series. We consider finite-alphabet time series and real-valued ones and the following four problems: i) estimation of the (limiting)…
Stochastic Boolean networks, or more generally, stochastic discrete networks, are an important class of computational models for molecular interaction networks. The stochasticity stems from the updating schedule. Standard updating schedules…
Selection pressures on proteins are usually measured by comparing homologous nucleotide sequences (Zuckerkandl and Pauling 1965). Recently we introduced a novel method, termed `volatility', to estimate selection pressures on protein…
Many online networks are measured and studied via sampling techniques, which typically collect a relatively small fraction of nodes and their associated edges. Past work in this area has primarily focused on obtaining a representative…
We introduce a method to estimate the complexity function of symbolic dynamical systems from a finite sequence of symbols. We test such complexity estimator on several symbolic dynamical systems whose complexity functions are known exactly.…
Neuroscientific data analysis has classically involved methods for statistical signal and image processing, drawing on linear algebra and stochastic process theory. However, digitized neuroanatomical data sets containing labelled neurons,…