Related papers: Learning Character-level Compositionality with Vis…

VCWE: Visual Character-Enhanced Word Embeddings

Chinese is a logographic writing system, and the shape of Chinese characters contains rich syntactic and semantic information. In this paper, we propose a model to learn Chinese word embeddings via three-level composition: (1) a…

Computation and Language · Computer Science 2019-03-26 Chi Sun, Xipeng Qiu, Xuanjing Huang

Charagram: Embedding Words and Sentences via Character n-grams

We present Charagram embeddings, a simple approach for learning character-based compositional models to embed textual sequences. A word or sentence is represented using a character n-gram count vector, followed by a single nonlinear…

Computation and Language · Computer Science 2016-07-12 John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu

A Joint Model for Word Embedding and Word Morphology

This paper presents a joint model for performing unsupervised morphological analysis on words, and learning a character-level composition function from morphemes to word embeddings. Our model splits individual words into segments, and…

Computation and Language · Computer Science 2016-06-09 Kris Cao, Marek Rei

Effective Character-augmented Word Embedding for Machine Reading Comprehension

Machine reading comprehension is a task to model the relationship between passage and query. In terms of deep learning frameworks, most state-of-the-art models simply concatenate word and character level representations, which has been shown…

Computation and Language · Computer Science 2021-01-08 Zhuosheng Zhang, Yafang Huang, Pengfei Zhu, Hai Zhao

Text segmentation with character-level text embeddings

Learning word representations has recently seen much success in computational linguistics. However, assuming sequences of word tokens as input to linguistic analysis is often unjustified. For many languages word segmentation is a…

Computation and Language · Computer Science 2013-09-19 Grzegorz Chrupała

Component-Enhanced Chinese Character Embeddings

Distributed word representations are very useful for capturing semantic information and have been successfully applied in a variety of NLP tasks, especially on English. In this work, we innovatively develop two component-enhanced Chinese…

Computation and Language · Computer Science 2015-08-28 Yanran Li, Wenjie Li, Fei Sun, Sujian Li

Effective Subword Segmentation for Text Comprehension

Representation learning is the foundation of machine reading comprehension and inference. In state-of-the-art models, character-level representations have been broadly adopted to alleviate the problem of effectively representing rare or…

Computation and Language · Computer Science 2019-06-12 Zhuosheng Zhang, Hai Zhao, Kangwei Ling, Jiangtong Li, Zuchao Li, Shexia He, Guohong Fu

Learning Chinese Word Representations From Glyphs Of Characters

In this paper, we propose new methods to learn Chinese word representations. Chinese characters are composed of graphical components, which carry rich semantics. It is common for a Chinese learner to comprehend the meaning of a word from…

Computation and Language · Computer Science 2017-08-17 Tzu-Ray Su, Hung-Yi Lee

Exploiting Word Semantics to Enrich Character Representations of Chinese Pre-trained Models

Most of the Chinese pre-trained models adopt characters as basic units for downstream tasks. However, these models ignore the information carried by words and thus lead to the loss of some important semantics. In this paper, we propose a…

Computation and Language · Computer Science 2022-07-14 Wenbiao Li, Rui Sun, Yunfang Wu

An Unsupervised Character-Aware Neural Approach to Word and Context Representation Learning

In the last few years, neural networks have been intensively used to develop meaningful distributed representations of words and contexts around them. When these representations, also known as "embeddings", are learned from unsupervised…

Computation and Language · Computer Science 2019-08-07 Giuseppe Marra, Andrea Zugarini, Stefano Melacci, Marco Maggini

Measuring Compositionality in Representation Learning

Many machine learning algorithms represent input data with vector embeddings or discrete codes. When inputs exhibit compositional structure (e.g. objects built from parts or procedures from subroutines), it is natural to ask whether this…

Machine Learning · Computer Science 2019-04-09 Jacob Andreas

Attending to Characters in Neural Sequence Labeling Models

Sequence labeling architectures use word embeddings for capturing similarity, but suffer when handling previously unseen or rare words. We investigate character-level extensions to such models and propose a novel architecture for combining…

Computation and Language · Computer Science 2016-11-15 Marek Rei, Gamal K. O. Crichton, Sampo Pyysalo

From Characters to Words to in Between: Do We Capture Morphology?

Words can be represented by composing the representations of subword units such as word segments, characters, and/or character n-grams. While such representations are effective and may capture the morphological regularities of words, they…

Computation and Language · Computer Science 2017-04-28 Clara Vania, Adam Lopez

Enhancing Chinese Intent Classification by Dynamically Integrating Character Features into Word Embeddings with Ensemble Techniques

Intent classification has been widely researched on English data with deep learning approaches that are based on neural networks and word embeddings. The challenge for Chinese intent classification stems from the fact that, unlike English…

Computation and Language · Computer Science 2018-05-24 Ruixi Lin, Charles Costello, Charles Jankowski

On the Correspondence between Compositionality and Imitation in Emergent Neural Communication

Compositionality is a hallmark of human language that not only enables linguistic generalization, but also potentially facilitates acquisition. When simulating language emergence with neural networks, compositionality has been shown to…

Computation and Language · Computer Science 2023-05-23 Emily Cheng, Mathieu Rita, Thierry Poibeau

Character Composition Model with Convolutional Neural Networks for Dependency Parsing on Morphologically Rich Languages

We present a transition-based dependency parser that uses a convolutional neural network to compose word representations from characters. The character composition model shows great improvement over the word-lookup model, especially for…

Computation and Language · Computer Science 2017-06-01 Xiang Yu, Ngoc Thang Vu

Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network

Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Processing and Information Retrieval. We introduce a model that is able to represent the…

Computation and Language · Computer Science 2014-06-17 Misha Denil, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, Nando de Freitas

Word Shape Matters: Robust Machine Translation with Visual Embedding

Neural machine translation has achieved remarkable empirical performance over standard benchmark datasets, yet recent evidence suggests that the models can still fail easily when dealing with substandard inputs such as misspelled words. To…

Computation and Language · Computer Science 2020-10-21 Haohan Wang, Peiyan Zhang, Eric P. Xing

Linear Spaces of Meanings: Compositional Structures in Vision-Language Models

We investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs). Traditionally, compositionality has been associated with algebraic operations on embeddings of words from a pre-existing vocabulary.…

Machine Learning · Computer Science 2024-01-12 Matthew Trager, Pramuditha Perera, Luca Zancato, Alessandro Achille, Parminder Bhatia, Stefano Soatto

A Probabilistic Framework for Learning Domain Specific Hierarchical Word Embeddings

The meaning of a word often varies depending on its usage in different domains. The standard word embedding models struggle to represent this variation, as they learn a single global representation for a word. We propose a method to learn…

Computation and Language · Computer Science 2019-10-22 Lahari Poddar, Gyorgy Szarvas, Lea Frermann