Related papers: Text Classification with Compression Algorithms

Compressed Support Vector Machines

Support vector machines (SVMs) can classify data sets along highly non-linear decision boundaries because of the kernel trick. This expressiveness comes at a price: at test time, the SVM classifier needs to compute the kernel…

Machine Learning · Computer Science 2015-02-03 Zhixiang Xu, Jacob R. Gardner, Stephen Tyree, Kilian Q. Weinberger

Efficient Approximation Algorithms for String Kernel Based Sequence Classification

Sequence classification algorithms, such as SVM, require a definition of distance (similarity) measure between two sequences. A commonly used notion of similarity is the number of matches between $k$-mers ($k$-length subsequences) in the…

Data Structures and Algorithms · Computer Science 2017-12-13 Muhammad Farhan, Juvaria Tariq, Arif Zaman, Mudassir Shabbir, Imdad Ullah Khan

Text Ranking and Classification using Data Compression

A well-known but rarely used approach to text categorization uses conditional entropy estimates computed using data compression tools. Text affinity scores derived from compressed sizes can be used for classification and ranking tasks, but…

Machine Learning · Computer Science 2021-12-08 Nitya Kasturi, Igor L. Markov

Classifying text using machine learning models and determining conversation drift

Text classification helps analyse texts for semantic meaning and relevance, by mapping the words against this hierarchy. An analysis of various types of texts is invaluable to understanding both their semantic meaning, as well as their…

Machine Learning · Computer Science 2022-11-16 Chaitanya Chadha, Vandit Gupta, Deepak Gupta, Ashish Khanna

Text Classification Algorithms: A Survey

In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine…

Machine Learning · Computer Science 2020-05-21 Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura E. Barnes, Donald E. Brown

Accelerating Kernel Classifiers Through Borders Mapping

Support vector machines (SVM) and other kernel techniques represent a family of powerful statistical classification methods with high accuracy and broad applicability. Because they use all or a significant portion of the training data,…

Machine Learning · Statistics 2023-01-31 Peter Mills

Embedding Compression for Text Classification Using Dictionary Screening

In this paper, we propose a dictionary screening method for embedding compression in text classification tasks. The key purpose of this method is to evaluate the importance of each keyword in the dictionary. To this end, we first train a…

Computation and Language · Computer Science 2022-11-24 Jing Zhou, Xinru Jing, Muyu Liu, Hansheng Wang

A Comparison of Neural Network Training Methods for Text Classification

We study the impact of neural networks in text classification. Our focus is on training deep neural networks with proper weight initialization and greedy layer-wise pretraining. Results are compared with 1-layer neural networks and Support…

Computation and Language · Computer Science 2019-10-29 Anderson de Andrade

Using compression to identify acronyms in text

Text mining is about looking for patterns in natural language text, and may be defined as the process of analyzing text to extract information from it for particular purposes. In previous work, we claimed that compression is a key…

Digital Libraries · Computer Science 2007-05-23 Stuart Yeates, David Bainbridge, Ian H. Witten

Text classification with pixel embedding

We propose a novel framework to understand the text by converting sentences or articles into video-like 3-dimensional tensors. Each frame, corresponding to a slice of the tensor, is a word image that is rendered by the word's shape. The…

Computation and Language · Computer Science 2021-11-08 Bin Liu, Guosheng Yin, Wenbin Du

Light-Weighted CNN for Text Classification

For management purposes, documents are sorted into specific categories, and most organizations do this by manual labor. In today's automation era, manual effort on such a task is not justified, and to avoid this, we have so many…

Machine Learning · Computer Science 2020-04-20 Ritu Yadav

Exploring Kernel Functions in the Softmax Layer for Contextual Word Classification

Prominently used in support vector machines and logistic regressions, kernel functions (kernels) can implicitly map data points into high dimensional spaces and make it easier to learn complex decision boundaries. In this work, by replacing…

Computation and Language · Computer Science 2019-10-29 Yingbo Gao, Christian Herold, Weiyue Wang, Hermann Ney

Conformal Transformation of Kernels: A Geometric Perspective on Text Classification

In this article we investigate the effects of conformal transformations on kernel functions used in Support Vector Machines. Our focus lies in the task of text document categorization, which involves assigning each document to a particular…

Machine Learning · Computer Science 2024-06-04 Ioana Rădulescu, Alexandra Băicoianu, Adela Mihai

Arabic Text Categorization Algorithm using Vector Evaluation Method

Text categorization is the process of grouping documents into categories based on their contents. This process is important to make information retrieval easier, and it became more important due to the huge textual information available…

Information Retrieval · Computer Science 2015-01-08 Ashraf Odeh, Aymen Abu-Errub, Qusai Shambour, Nidal Turab

Text Classification using Data Mining

Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement…

Information Retrieval · Computer Science 2010-09-28 S. M. Kamruzzaman, Farhana Haider, Ahmed Ryadh Hasan

A comparison of SVM and RVM for Document Classification

Document classification is the task of assigning a new unclassified document to one of a predefined set of classes. Content-based document classification uses the content of the document with some weighting criteria to assign it to one…

Information Retrieval · Computer Science 2013-01-15 Muhammad Rafi, Mohammad Shahid Shaikh

Text Categorization via Similarity Search: An Efficient and Effective Novel Algorithm

We present a supervised learning algorithm for text categorization which has brought the team of authors the 2nd place in the text categorization division of the 2012 Cybersecurity Data Mining Competition (CDMC'2012) and a 3rd prize…

Information Retrieval · Computer Science 2013-07-11 Hubert Haoyang Duan, Vladimir Pestov, Varun Singla

Kernel methods for interpretable machine learning of order parameters

Machine learning is capable of discriminating phases of matter, and finding associated phase transitions, directly from large data sets of raw state configurations. In the context of condensed matter physics, most progress in the field of…

Statistical Mechanics · Physics 2017-12-06 Pedro Ponte, Roger G. Melko

Investigating the Working of Text Classifiers

Text classification is one of the most widely studied tasks in natural language processing. Motivated by the principle of compositionality, large multilayer neural network models have been employed for this task in an attempt to effectively…

Computation and Language · Computer Science 2018-08-07 Devendra Singh Sachan, Manzil Zaheer, Ruslan Salakhutdinov

Semantic Text Compression for Classification

We study semantic compression for text where meanings contained in the text are conveyed to a source decoder, e.g., for classification. The main motivator to move to such an approach of recovering the meaning without requiring exact…

Information Theory · Computer Science 2023-09-20 Emrecan Kutay, Aylin Yener