Related papers: Embedding Compression for Text Classification Usin…

Semantic Text Compression for Classification

We study semantic compression for text where meanings contained in the text are conveyed to a source decoder, e.g., for classification. The main motivator to move to such an approach of recovering the meaning without requiring exact…

Information Theory · Computer Science 2023-09-20 Emrecan Kutay, Aylin Yener

Utility of General and Specific Word Embeddings for Classifying Translational Stages of Research

Conventional text classification models make a bag-of-words assumption reducing text into word occurrence counts per document. Recent algorithms such as word2vec are capable of learning semantic meaning and similarity between words in an…

Computation and Language · Computer Science 2018-07-11 Vincent Major, Alisa Surkis, Yindalon Aphinyanaphongs

Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

Keyword extraction is a fundamental task in natural language processing that facilitates mapping of documents to a concise set of representative single and multi-word phrases. Keywords from text documents are primarily extracted using…

Computation and Language · Computer Science 2018-07-17 Debanjan Mahata, John Kuriakose, Rajiv Ratn Shah, Roger Zimmermann, John R. Talburt

Text Ranking and Classification using Data Compression

A well-known but rarely used approach to text categorization uses conditional entropy estimates computed using data compression tools. Text affinity scores derived from compressed sizes can be used for classification and ranking tasks, but…

Machine Learning · Computer Science 2021-12-08 Nitya Kasturi, Igor L. Markov

Sentence Compression as Deletion with Contextual Embeddings

Sentence compression is the task of creating a shorter version of an input sentence while keeping important information. In this paper, we extend the task of compression by deletion with the use of contextual embeddings. Different from…

Information Retrieval · Computer Science 2020-06-08 Minh-Tien Nguyen, Bui Cong Minh, Dung Tien Le, Le Thai Linh

A Simple and Effective Approach for Fine Tuning Pre-trained Word Embeddings for Improved Text Classification

This work presents a new and simple approach for fine-tuning pretrained word embeddings for text classification tasks. In this approach, the class in which a term appears acts as an additional contextual variable during the fine-tuning…

Computation and Language · Computer Science 2019-12-17 Amr Al-Khatib, Samhaa R. El-Beltagy

Joint Embedding of Words and Labels for Text Classification

Word embeddings are effective intermediate representations for capturing semantic regularities between words, when learning the representations of text sequences. We propose to view text classification as a label-word joint embedding…

Computation and Language · Computer Science 2018-05-14 Guoyin Wang, Chunyuan Li, Wenlin Wang, Yizhe Zhang, Dinghan Shen, Xinyuan Zhang, Ricardo Henao, Lawrence Carin

From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings

In this paper, we propose a novel approach for text classification based on clustering word embeddings, inspired by the bag of visual words model, which is widely used in computer vision. After each word in a collection of documents is…

Computation and Language · Computer Science 2017-07-26 Andrei M. Butnaru, Radu Tudor Ionescu

A Novel Approach to Compress Centralized Text Data using Indexed Dictionary

Data compression is a very important feature for saving memory space. In this proposal, indexed-dictionary-based compression is used for text data, where the word's reference in the dictionary is used for compression. This approach…

Other Computer Science · Computer Science 2015-12-23 Vivek Dimri, Prof. Ranjit Biswas

Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection

As a fundamental task in natural language processing, word embedding converts each word into a representation in a vector space. A challenge with word embedding is that as the vocabulary grows, the vector space's dimension increases, which…

Computation and Language · Computer Science 2024-11-05 Jintang Xue, Yun-Cheng Wang, Chengwei Wei, C.-C. Jay Kuo

Efficient Unsupervised Sentence Compression by Fine-tuning Transformers with Reinforcement Learning

Sentence compression reduces the length of text by removing non-essential content while preserving important facts and grammaticality. Unsupervised objective driven methods for sentence compression can be used to create customized models…

Computation and Language · Computer Science 2022-05-18 Demian Gholipour Ghalandari, Chris Hokamp, Georgiana Ifrim

Compositional Coding Capsule Network with K-Means Routing for Text Classification

Text classification is a challenging problem that aims to identify the category of texts. During training, word embeddings account for a large share of the parameters. Under limited computing resources, it indirectly…

Machine Learning · Computer Science 2022-06-03 Hao Ren, Hong Lu

Online Embedding Compression for Text Classification using Low Rank Matrix Factorization

Deep learning models have become state of the art for natural language processing (NLP) tasks; however, deploying these models in production systems poses significant memory constraints. Existing compression methods are either lossy or…

Machine Learning · Computer Science 2018-11-05 Anish Acharya, Rahul Goel, Angeliki Metallinou, Inderjit Dhillon

Improve Lexicon-based Word Embeddings By Word Sense Disambiguation

There have been some works that learn a lexicon together with the corpus to improve the word embeddings. However, they either model the lexicon separately but update the neural networks for both the corpus and the lexicon by the same…

Computation and Language · Computer Science 2017-07-25 Yuanzhi Ke, Masafumi Hagiwara

TF-CR: Weighting Embeddings for Text Classification

Text classification, the task of assigning categories to textual instances, is very common in information science. Methods learning distributed representations of words, such as word embeddings, have become popular in…

Computation and Language · Computer Science 2020-12-15 Arkaitz Zubiaga

Evaluating Word Embeddings for Sentence Boundary Detection in Speech Transcripts

This paper is motivated by the automation of neuropsychological tests involving discourse analysis in the retellings of narratives by patients with potential cognitive impairment. In this scenario the task of sentence boundary detection in…

Computation and Language · Computer Science 2017-08-17 Marcos V. Treviso, Christopher D. Shulby, Sandra M. Aluisio

Keyword Embeddings for Query Suggestion

Nowadays, search engine users commonly rely on query suggestions to improve their initial inputs. Current systems are very good at recommending lexical adaptations or spelling corrections to users' queries. However, they often struggle to…

Information Retrieval · Computer Science 2023-01-24 Jorge Gabín, M. Eduardo Ares, Javier Parapar

Dictionary-based Debiasing of Pre-trained Word Embeddings

Word embeddings trained on large corpora have been shown to encode high levels of unfair discriminatory gender, racial, religious and ethnic biases. In contrast, human-written dictionaries describe the meanings of words in a concise, objective…

Computation and Language · Computer Science 2021-01-26 Masahiro Kaneko, Danushka Bollegala

Text Classification using Data Mining

Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement…

Information Retrieval · Computer Science 2010-09-28 S. M. Kamruzzaman, Farhana Haider, Ahmed Ryadh Hasan

Word Embeddings and Their Use In Sentence Classification Tasks

This paper has two parts. In the first part we discuss word embeddings. We discuss the need for them, some of the methods to create them, and some of their interesting properties. We also compare them to image embeddings and see how word…

Machine Learning · Computer Science 2016-10-27 Amit Mandelbaum, Adi Shalev