Related papers: Pyro-Align: Sample-Align based Multiple Alignment …
Pyrosequencing is among the emerging sequencing techniques, capable of generating up to 100,000 overlapping reads in a single run. This technique is much faster and cheaper than existing state-of-the-art sequencing techniques such as…
Various methods have been developed to analyze the association between organisms and their genomic sequences. Among them, sequence alignment is the most frequently used for comparative analysis of biological genomes. However, the…
Multiple sequence alignment is a basic procedure in molecular biology, and it is often treated as being essentially a solved computational problem. However, this is not so, and here I review the evidence for this claim, and outline the…
An overview of multiple alignment systems to date is given. The useful algorithms, the procedures adopted, and their limitations are presented. We also present the quality of the alignments obtained and in which cases (kind of…
Real world deployments of word alignment are almost certain to cover both high and low resource languages. However, the state-of-the-art for this task recommends a different model class depending on the availability of gold alignment…
Multiple sequence alignment (MSA) has been one of the most important problems in bioinformatics for decades and it is still heavily examined by many mathematicians and biologists. However, mostly because of the practical motivation of…
Massively parallel sequencing techniques have revolutionized biological and medical sciences by providing unprecedented insight into the genomes of humans, animals, and microbes. Modern sequencing platforms generate enormous amounts of…
We present SentAlign, an accurate sentence alignment tool designed to handle very large parallel document pairs. Given user-defined parameters, the alignment algorithm evaluates all possible alignment paths in fairly large documents of…
We propose a metric for the space of multiple sequence alignments that can be used to compare two alignments to each other. In the case where one of the alignments is a reference alignment, the resulting accuracy measure improves upon…
We address the text-to-text generation problem of sentence-level paraphrasing -- a phenomenon distinct from and more difficult than word- or phrase-level paraphrasing. Our approach applies multiple-sequence alignment to sentences gathered…
Multiple sequence alignment is typically performed using the criterion of the maximum scored information content computed from a weight matrix, but two or more alignments may share the same highest score, leading…
The alignment of biological sequences such as DNA, RNA, and proteins is one of the basic tools that allow the detection of evolutionary patterns, as well as functional/structural characterizations between homologous sequences in different…
Alignment of large language models (LLMs) involves training models on preference-contrastive output pairs to adjust their responses according to human preferences. To obtain such contrastive pairs, traditional methods like RLHF and RLAIF…
While alignment of texts on the sentential level is often seen as being too coarse, and word alignment as being too fine-grained, bi- or multilingual texts which are aligned on a level in-between are a useful resource for many purposes.…
Summary: BWA-MEM is a new alignment algorithm for aligning sequence reads or long query sequences against a large reference genome such as human. It automatically chooses between local and end-to-end alignments, supports paired-end reads…
Current large language models (LLMs) generally show a significant performance gap in alignment between English and other languages. To bridge this gap, existing research typically leverages the model's responses in English as a reference to…
Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal perception and generation, yet their safety alignment remains a critical challenge. Existing defenses are vulnerable to multimodal jailbreaks, as visual…
Word alignment, which aims to align translationally equivalent words between source and target sentences, plays an important role in many natural language processing tasks. Current unsupervised neural alignment methods focus on inducing…
Genome sequencing is the basis for many modern biological and medicinal studies. With recent technological advances, metagenomics has become a problem of interest. This problem entails the analysis and reconstruction of multiple DNA…
We introduce a new heuristic for the multiple alignment of a set of sequences. The heuristic is based on a set cover of the residue alphabet of the sequences, and also on the determination of a significant set of blocks comprising…