Related papers: Linear normalised hash function for clustering gen…

Maximum Match Subsequence Alignment Algorithm Finely Grained (MMSAA FG)

Sequence alignment is now widely used in many fields to determine how closely two sequences are related and, at times, how little they differ. In computational biology and bioinformatics, there are many algorithms developed…

Information Theory · Computer Science 2023-05-02 Bharath Reddy, Richard Fields

Algorithms for normalized multiple sequence alignments

Sequence alignment supports numerous tasks in bioinformatics, natural language processing, pattern recognition, social sciences, and other fields. While the alignment of two sequences may be performed swiftly in many applications, the…

Data Structures and Algorithms · Computer Science 2021-12-06 Eloi Araujo, Luiz Rozante, Diego P. Rubert, Fabio V. Martinez

Defining Reference Sequences for Nocardia Species by Similarity and Clustering Analyses of 16S rRNA Gene Sequence Data

The intra- and inter-species genetic diversity of bacteria and the absence of 'reference', or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study…

Genomics · Quantitative Biology 2023-12-01 Manal Helal, Fanrong Kong, Sharon C. A. Chen, Michael Bain, Richard Christen, Vitali Sintchenko

MuSAlS: A Fast Multiple Sequence Alignment Approach Using Hierarchical Clustering

Motivation: The multiple sequence alignment (MSA) problem has been extensively studied, with numerous approaches developed over recent years. With the rapid growth of sequence data, there is an increasing need for fast and accurate MSA…

Computational Engineering, Finance, and Science · Computer Science 2026-01-23 Emily G. Light, Morgan Prior, Noah M. Daniels, Najib Ishaq

Parallel and Scalable Precise Clustering for Homologous Protein Discovery

This paper presents a new, parallel implementation of clustering and demonstrates its utility in greatly speeding up the process of identifying homologous proteins. Clustering is a technique to reduce the number of comparisons needed to find…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-29 Stuart Byma, Akash Dhasade, Adrian Altenhoff, Christophe Dessimoz, James R. Larus

A Domain Decomposition Strategy for Alignment of Multiple Biological Sequences on Multiprocessor Platforms

Multiple Sequences Alignment (MSA) of biological sequences is a fundamental problem in computational biology due to its critical significance in wide ranging applications including haplotype reconstruction, sequence homology, phylogenetic…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-05-13 Fahad Saeed, Ashfaq Khokhar

A Fast Template Based Heuristic For Global Multiple Sequence Alignment

Advances in bio-technology have made available massive amounts of functional, structural and genomic data for many biological sequences. This increased availability of heterogeneous biological data has resulted in biological applications…

Computational Engineering, Finance, and Science · Computer Science 2013-02-26 Srikrishnan Divakaran, Arpit Mithal, Namit Jain

A Survey of the State-of-the-Art Parallel Multiple Sequence Alignment Algorithms on Multicore Systems

Evolutionary modeling applications are the best way to provide full information to support in-depth understanding of evaluation of organisms. These applications mainly depend on identifying the evolutionary history of existing organisms and…

Computational Engineering, Finance, and Science · Computer Science 2018-06-01 Sara Shehab, Sameh Abdulah, Arabi E. Keshk

Feature Screening in Large Scale Cluster Analysis

We propose a novel methodology for feature screening in clustering massive datasets, in which both the number of features and the number of observations can potentially be very large. Taking advantage of a fusion penalization based convex…

Methodology · Statistics 2017-10-05 Trambak Banerjee, Gourab Mukherjee, Peter Radchenko

Unsupervised Gene Expression Data using Enhanced Clustering Method

Microarrays have made it possible to simultaneously monitor the expression profiles of thousands of genes under various experimental conditions. Identification of co-expressed genes and coherent patterns is the central goal in microarray or…

Computational Engineering, Finance, and Science · Computer Science 2013-07-15 T. Chandrasekhar, K. Thangavel, E. Elayaraja, E. N. Sathishkumar

Clustering pipeline for determining consensus sequences in targeted next-generation sequencing

Analyses of targeted genomic sequencing data from next-generation-sequencing (NGS) technologies typically involve mapping reads to a reference sequence or clustering reads. For a number of species a reference genome is not available so the…

Genomics · Quantitative Biology 2016-02-16 Raunaq Malhotra, Daniel Elleder, Le Bao, David R Hunter, Raj Acharya, Mary Poss

Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis

Clustering is a difficult and widely-studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g. Euclidean distance) to…

Neural and Evolutionary Computing · Computer Science 2019-10-24 Andrew Lensen, Bing Xue, Mengjie Zhang

Nonparametric clustering of RNA-sequencing data

Identification of clusters of co-expressed genes in transcriptomic data is a difficult task. Most algorithms used for this purpose can be classified into two broad categories: distance-based or model-based approaches. Distance-based…

Applications · Statistics 2022-09-26 Gabriel Lozano, Nadia Atallah, Michael Levine

Heteroskedastic Tensor Clustering

Tensor clustering, which seeks to extract underlying cluster structures from noisy tensor observations, has gained increasing attention. One extensively studied model for tensor clustering is the tensor block model, which postulates the…

Statistics Theory · Mathematics 2023-11-07 Yuchen Zhou, Yuxin Chen

Classification and clustering of sequencing data using a Poisson model

In recent years, advances in high throughput sequencing technology have led to a need for specialized methods for the analysis of digital gene expression data. While gene expression data measured on a microarray take on continuous values…

Applications · Statistics 2012-02-29 Daniela M. Witten

Learning the Precise Feature for Cluster Assignment

Clustering is one of the fundamental tasks in computer vision and pattern recognition. Recently, deep clustering methods (algorithms based on deep learning) have attracted wide attention with their impressive performance. Most of these…

Computer Vision and Pattern Recognition · Computer Science 2021-06-14 Yanhai Gan, Xinghui Dong, Huiyu Zhou, Feng Gao, Junyu Dong

Clustering through Feature Space Sequence Discovery and Analysis

Identifying high-dimensional data patterns without a priori knowledge is an important task of data science. This paper proposes a simple and efficient nonparametric algorithm: Data Convert to Sequence Analysis, DCSA, which dynamically…

Machine Learning · Computer Science 2022-12-05 Shi Guobin

Hierarchical clustered multiclass discriminant analysis via cross-validation

Linear discriminant analysis (LDA) is a well-known method for multiclass classification and dimensionality reduction. However, in general, ordinary LDA does not achieve high prediction accuracy when observations in some classes are…

Methodology · Statistics 2021-07-07 Kei Hirose, Kanta Miura, Atori Koie

UNCA: A Neutrosophic-Based Framework for Robust Clustering and Enhanced Data Interpretation

Accurately representing the complex linkages and inherent uncertainties included in huge datasets is still a major difficulty in the field of data clustering. We address these issues with our proposed Unified Neutrosophic Clustering…

Machine Learning · Computer Science 2025-02-26 D. Dhinakaran, S. Edwin Raja, S. Gopalakrishnan, D. Selvaraj, S. D. Lalitha

A Graphical Method for Identifying Gene Clusters from RNA Sequencing Data

The identification of disease-gene associations is instrumental in understanding the mechanisms of diseases and developing novel treatments. Besides identifying genes from RNA-Seq datasets, it is often necessary to identify gene clusters…

Genomics · Quantitative Biology 2025-11-14 Jake R. Patock, Rinki Ratnapriya, Arko Barman