Related papers: Word Embedding based Edit Distance

Word Mover's Embedding: From Word2Vec to Document Embedding

While the celebrated Word2Vec technique yields semantically rich representations for individual words, there has been relatively less success in extending it to generate unsupervised sentence or document embeddings. Recent work has…

Computation and Language · Computer Science 2018-11-06 Lingfei Wu, Ian E. H. Yen, Kun Xu, Fangli Xu, Avinash Balakrishnan, Pin-Yu Chen, Pradeep Ravikumar, Michael J. Witbrock

Convolutional Embedding for Edit Distance

Edit-distance-based string similarity search has many applications such as spell correction, data de-duplication, and sequence alignment. However, computing edit distance is known to have high complexity, which makes string similarity…

Databases · Computer Science 2020-05-25 Xinyan Dai, Xiao Yan, Kaiwen Zhou, Yuxuan Wang, Han Yang, James Cheng

Time Warp Edit Distance with Stiffness Adjustment for Time Series Matching

In a way similar to the string-to-string correction problem, we address time series similarity in the light of a time-series-to-time-series-correction problem for which the similarity between two time series is measured as the minimum cost…

Information Retrieval · Computer Science 2008-12-28 Pierre-François Marteau

Learning aligned embeddings for semi-supervised word translation using Maximum Mean Discrepancy

Word translation is an integral part of language translation. In machine translation, each language is considered a domain with its own word embedding. The alignment between word embeddings allows linking semantically equivalent words in…

Computation and Language · Computer Science 2020-06-23 Antonio H. O. Fonseca, David van Dijk

Comparative Analysis of Word Embeddings for Capturing Word Similarities

Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning…

Computation and Language · Computer Science 2020-05-11 Martina Toshevska, Frosina Stojanovska, Jovan Kalajdjieski

Unsupervised learning of text line segmentation by differentiating coarse patterns

Despite recent advances in the field of supervised deep learning for text line segmentation, unsupervised deep learning solutions are beginning to gain popularity. In this paper, we present an unsupervised deep learning method that embeds…

Computer Vision and Pattern Recognition · Computer Science 2021-05-24 Berat Kurar Barakat, Ahmad Droby, Raid Saabni, Jihad El-Sana

Combinatorial Learning of Graph Edit Distance via Dynamic Embedding

Graph Edit Distance (GED) is a popular similarity measurement for pairwise graphs and it also refers to the recovery of the edit path from the source graph to the target graph. The traditional A* algorithm suffers from scalability issues due to its…

Machine Learning · Computer Science 2020-12-03 Runzhong Wang, Tianqi Zhang, Tianshu Yu, Junchi Yan, Xiaokang Yang

Evaluating the impact of word embeddings on similarity scoring in practical information retrieval

Search behaviour is characterised using synonymy and polysemy as users often want to search information based on meaning. Semantic representation strategies represent a move towards richer associative connections that can adequately capture…

Information Retrieval · Computer Science 2026-02-06 Niall McCarroll, Kevin Curran, Eugene McNamee, Angela Clist, Andrew Brammer

LLM-Assisted Content Conditional Debiasing for Fair Text Embedding

Mitigating biases in machine learning models has become an increasing concern in Natural Language Processing (NLP), particularly in developing fair text embeddings, which are crucial yet challenging for real-world applications like search…

Computation and Language · Computer Science 2024-06-25 Wenlong Deng, Blair Chen, Beidi Zhao, Chiyu Zhang, Xiaoxiao Li, Christos Thrampoulidis

Soft edit distance for differentiable comparison of symbolic sequences

Edit distance, also known as Levenshtein distance, is an essential way to compare two strings that has proved particularly useful in the analysis of genetic sequences and natural language processing. However, edit distance is a discrete…

Machine Learning · Computer Science 2019-04-30 Evgenii Ofitserov, Vasily Tsvetkov, Vadim Nazarov

Fast Subtrajectory Similarity Search in Road Networks under Weighted Edit Distance Constraints

In this paper, we address a similarity search problem for spatial trajectories in road networks. In particular, we focus on the subtrajectory similarity search problem, which involves finding in a database the subtrajectories similar to a…

Databases · Computer Science 2020-07-13 Satoshi Koide, Chuan Xiao, Yoshiharu Ishikawa

Angular-Based Word Meta-Embedding Learning

Ensembling word embeddings to improve distributed word representations has shown good success for natural language processing tasks in recent years. These approaches either carry out straightforward mathematical operations over a set of…

Computation and Language · Computer Science 2018-08-14 James O'Neill, Danushka Bollegala

RPD: A Distance Function Between Word Embeddings

It is well-understood that different algorithms, training processes, and corpora produce different word embeddings. However, less is known about the relation between different embedding spaces, i.e. how far different sets of embeddings…

Computation and Language · Computer Science 2020-05-19 Xuhui Zhou, Zaixiang Zheng, Shujian Huang

Unsupervised Sentence Representations as Word Information Series: Revisiting TF-IDF

Sentence representation at the semantic level is a challenging task for Natural Language Processing and Artificial Intelligence. Despite the advances in word embeddings (i.e. word vector representations), capturing sentence meaning is an…

Computation and Language · Computer Science 2017-10-23 Ignacio Arroyo-Fernández, Carlos-Francisco Méndez-Cruz, Gerardo Sierra, Juan-Manuel Torres-Moreno, Grigori Sidorov

Learning Graph Edit Distance by Graph Neural Networks

The emergence of geometric deep learning as a novel framework to deal with graph-based representations has pushed traditional approaches aside in favor of completely new methodologies. In this paper, we propose a new framework able to combine…

Computer Vision and Pattern Recognition · Computer Science 2020-08-19 Pau Riba, Andreas Fischer, Josep Lladós, Alicia Fornés

Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study

Several variants of deep neural networks have been successfully employed for building parametric models that project variable-duration spoken word segments onto fixed-size vector representations, or acoustic word embeddings (AWEs). However,…

Computation and Language · Computer Science 2021-06-17 Badr M. Abdullah, Marius Mosbach, Iuliia Zaitova, Bernd Möbius, Dietrich Klakow

Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection

As a fundamental task in natural language processing, word embedding converts each word into a representation in a vector space. A challenge with word embedding is that as the vocabulary grows, the vector space's dimension increases, which…

Computation and Language · Computer Science 2024-11-05 Jintang Xue, Yun-Cheng Wang, Chengwei Wei, C.-C. Jay Kuo

PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks

Unsupervised text embedding methods, such as Skip-gram and Paragraph Vector, have been attracting increasing attention due to their simplicity, scalability, and effectiveness. However, compared to sophisticated deep learning architectures…

Computation and Language · Computer Science 2015-08-04 Jian Tang, Meng Qu, Qiaozhu Mei

Evaluation of taxonomic and neural embedding methods for calculating semantic similarity

Modelling semantic similarity plays a fundamental role in lexical semantic applications. A natural way of calculating semantic similarity is to access handcrafted semantic networks, but similarity prediction can also be anticipated in a…

Computation and Language · Computer Science 2022-10-03 Dongqiang Yang, Yanqin Yin

Learning Sentence Embeddings for Coherence Modelling and Beyond

We present a novel and effective technique for performing text coherence tasks while facilitating deeper insights into the data. Despite obtaining ever-increasing task performance, modern deep-learning approaches to NLP tasks often only…

Computation and Language · Computer Science 2019-08-09 Tanner Bohn, Yining Hu, Jinhang Zhang, Charles X. Ling