Open Knowledge Graphs Canonicalization using Variational Autoencoders

Computation and Language 2021-09-29 v2 Artificial Intelligence Information Retrieval Machine Learning

Abstract

Noun phrases and relation phrases in open knowledge graphs are not canonicalized, leading to an explosion of redundant and ambiguous subject-relation-object triples. Existing approaches address this problem in two steps: they first generate embedding representations for both noun and relation phrases, and then apply a clustering algorithm that groups the phrases using the embeddings as features. In this work, we propose Canonicalizing Using Variational Autoencoders (CUVA), a joint model that learns both embeddings and cluster assignments end to end, which leads to better vector representations for the noun and relation phrases. Our evaluation over multiple benchmarks shows that CUVA outperforms the existing state-of-the-art approaches. Moreover, we introduce CanonicNell, a novel dataset for evaluating entity canonicalization systems.
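
To make the two-step baseline concrete, here is a minimal sketch of "embed first, then cluster". It is not CUVA and not any published system: the character n-gram vectors stand in for learned embeddings, the greedy threshold clustering stands in for a real clustering algorithm, and the names `embed`, `cosine`, and `cluster` are hypothetical.

```python
from collections import Counter
from math import sqrt


def embed(phrase, n=3):
    """Step 1 (toy stand-in): bag of character n-grams for a phrase."""
    text = f" {phrase.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(phrases, threshold=0.5):
    """Step 2 (toy stand-in): greedy clustering over the fixed step-1 vectors.

    A phrase joins the first existing cluster that contains a member whose
    similarity to it reaches the threshold; otherwise it starts a new cluster.
    """
    vectors = [embed(p) for p in phrases]
    clusters = []  # each cluster is a list of indices into `phrases`
    for i, vec in enumerate(vectors):
        for members in clusters:
            if any(cosine(vec, vectors[j]) >= threshold for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return [[phrases[i] for i in members] for members in clusters]


groups = cluster(["Barack Obama", "New York", "barack obama", "NEW YORK"])
```

In this sketch the vectors are fixed before clustering starts, so the clustering step cannot influence them. A joint model, as the abstract describes, instead learns the embeddings and the cluster assignments together.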

Cite

@article{arxiv.2012.04780,
  title  = {Open Knowledge Graphs Canonicalization using Variational Autoencoders},
  author = {Sarthak Dash and Gaetano Rossiello and Nandana Mihindukulasooriya and Sugato Bagchi and Alfio Gliozzo},
  journal= {arXiv preprint arXiv:2012.04780},
  year   = {2021}
}

Comments

Accepted to EMNLP 2021
