Related papers: Neural String Edit Distance

Learning string edit distance

In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one…

cmp-lg · Computer Science 2008-02-03 Eric Sven Ristad, Peter N. Yianilos

Improved Algorithms for Approximate String Matching (Extended Abstract)

The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants…

Data Structures and Algorithms · Computer Science 2008-07-29 Dimitris Papamichail, Georgios Papamichail

Convolutional Embedding for Edit Distance

Edit-distance-based string similarity search has many applications such as spell correction, data de-duplication, and sequence alignment. However, computing edit distance is known to have high complexity, which makes string similarity…

Databases · Computer Science 2020-05-25 Xinyan Dai, Xiao Yan, Kaiwen Zhou, Yuxuan Wang, Han Yang, James Cheng

Towards Normalizing the Edit Distance Using a Genetic Algorithms Based Scheme

The normalized edit distance is one of the distances derived from the edit distance. It is useful in some applications because it takes into account the lengths of the two strings compared. The normalized edit distance is not defined in…

Neural and Evolutionary Computing · Computer Science 2013-12-09 Muhammad Marwan Muhammad Fuad

Pairing Orthographically Variant Literary Words to Standard Equivalents Using Neural Edit Distance Models

We present a novel corpus consisting of orthographically variant words found in works of 19th century U.S. literature annotated with their corresponding "standard" word pair. We train a set of neural edit distance models to pair these…

Computation and Language · Computer Science 2026-02-18 Craig Messner, Tom Lippincott

Approximating Edit Distance in the Fully Dynamic Model

The edit distance is a fundamental measure of sequence similarity, defined as the minimum number of character insertions, deletions, and substitutions needed to transform one string into the other. Given two strings of length at most $n$,…

Data Structures and Algorithms · Computer Science 2023-07-17 Tomasz Kociumaka, Anish Mukherjee, Barna Saha

Many Flavors of Edit Distance

Several measures exist for string similarity, including notable ones like the edit distance and the indel distance. The former measures the count of insertions, deletions, and substitutions required to transform one string into another,…

Data Structures and Algorithms · Computer Science 2024-10-15 Sudatta Bhattacharya, Sanjana Dey, Elazar Goldenberg, Michal Koucký

Learning Graph Edit Distance by Graph Neural Networks

The emergence of geometric deep learning as a novel framework to deal with graph-based representations has faded away traditional approaches in favor of completely new methodologies. In this paper, we propose a new framework able to combine…

Computer Vision and Pattern Recognition · Computer Science 2020-08-19 Pau Riba, Andreas Fischer, Josep Lladós, Alicia Fornés

On Estimating Edit Distance: Alignment, Dimension Reduction, and Embeddings

Edit distance is a fundamental measure of distance between strings and has been widely studied in computer science. While the problem of estimating edit distance has been studied extensively, the equally important question of actually…

Data Structures and Algorithms · Computer Science 2018-05-08 Moses Charikar, Ofir Geri, Michael P. Kim, William Kuszmaul

A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance

The need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence analysis, and other domains. This paper presents discriminative string-edit CRFs, a finite-state conditional random…

Machine Learning · Computer Science 2012-07-09 Andrew McCallum, Kedar Bellare, Fernando Pereira

Tree Edit Distance Learning via Adaptive Symbol Embeddings: Supplementary Materials and Results

Metric learning has the aim to improve classification accuracy by learning a distance measure which brings data points from the same class closer together and pushes data points from different classes further apart. Recent research has…

Machine Learning · Computer Science 2018-05-21 Benjamin Paaßen

Transduce: learning transduction grammars for string transformation

The synthesis of string transformation programs from input-output examples utilizes various techniques, all based on an inductive bias that comprises a restricted set of basic operators to be combined. A new algorithm, Transduce, is…

Machine Learning · Computer Science 2024-01-19 Francis Frydman, Philippe Mangion

Unified Compression-Based Acceleration of Edit-Distance Computation

The edit distance problem is a classical fundamental problem in computer science in general, and in combinatorial pattern matching in particular. The standard dynamic programming solution for this problem computes the edit-distance between…

Data Structures and Algorithms · Computer Science 2016-10-05 Danny Hermelin, Gad M. Landau, Shir Landau, Oren Weimann

Exponent-Strings and Their Edit Distance

An exponent-string is an extension of traditional strings that can incorporate real-numbered exponents, indicating the quantity of characters. This novel representation overcomes the limitations of traditional discrete string by enabling…

Formal Languages and Automata Theory · Computer Science 2024-08-26 Ingyu Baek

Streaming Algorithms For Computing Edit Distance Without Exploiting Suffix Trees

The edit distance is a way of quantifying how similar two strings are to one another by counting the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. In this paper we…

Data Structures and Algorithms · Computer Science 2016-07-14 Diptarka Chakraborty, Elazar Goldenberg, Michal Koucký

On Practical Accuracy of Edit Distance Approximation Algorithms

The edit distance is a basic string similarity measure used in many applications such as text mining, signal processing, bioinformatics, and so on. However, the computational cost can be a problem when we repeat many distance calculations…

Data Structures and Algorithms · Computer Science 2017-01-24 Hiroyuki Hanada, Mineichi Kudo, Atsuyoshi Nakamura

Online Pattern Matching for String Edit Distance with Moves

Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string to the other. Although optimizing EDM is intractable, it has many applications…

Data Structures and Algorithms · Computer Science 2014-08-27 Yoshimasa Takabatake, Yasuo Tabei, Hiroshi Sakamoto

Extracting alignment data in open models

In this work, we show that it is possible to extract significant amounts of alignment training data from a post-trained model -- useful to steer the model to improve certain capabilities such as long-context reasoning, safety, instruction…

Artificial Intelligence · Computer Science 2025-10-27 Federico Barbero, Xiangming Gu, Christopher A. Choquette-Choo, Chawin Sitawarin, Matthew Jagielski, Itay Yona, Petar Veličković, Ilia Shumailov, Jamie Hayes

Supervised Attentions for Neural Machine Translation

In this paper, we improve the attention or alignment accuracy of neural machine translation by utilizing the alignments of training sentence pairs. We simply compute the distance between the machine attentions and the "true" alignments, and…

Computation and Language · Computer Science 2016-08-02 Haitao Mi, Zhiguo Wang, Abe Ittycheriah

Robustness to Programmable String Transformations via Augmented Abstract Training

Deep neural networks for natural language processing tasks are vulnerable to adversarial input perturbations. In this paper, we present a versatile language for programmatically specifying string transformations -- e.g., insertions,…

Machine Learning · Computer Science 2020-09-03 Yuhao Zhang, Aws Albarghouthi, Loris D'Antoni