Related papers: Document Classification Using Expectation Maximiza…

Performance Analysis of Supervised Machine Learning Algorithms for Text Classification

The demand for text classification is growing significantly in web searching, data mining, web ranking, recommendation systems, and so many other fields of information and technology. This paper illustrates the text classification process…

Computation and Language · Computer Science 2025-09-03 Sadia Zaman Mishu, S M Rafiuddin

Implicitly Constrained Semi-Supervised Linear Discriminant Analysis

Semi-supervised learning is an important and active topic of research in pattern recognition. For classification using linear discriminant analysis specifically, several semi-supervised variants have been proposed. Using any one of these…

Machine Learning · Statistics 2014-11-18 Jesse H. Krijthe, Marco Loog

Semi-supervised Classification for Natural Language Processing

Semi-supervised classification is an interesting idea where classification models are learned from both labeled and unlabeled data. It has several advantages over supervised classification in natural language processing domain. For…

Computation and Language · Computer Science 2014-09-29 Rushdi Shams

Text Classification using Data Mining

Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement…

Information Retrieval · Computer Science 2010-09-28 S. M. Kamruzzaman, Farhana Haider, Ahmed Ryadh Hasan

Machine Learning in Automated Text Categorization

The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize…

Information Retrieval · Computer Science 2021-09-21 Fabrizio Sebastiani

A Survey on Semi-Supervised Learning Techniques

Semisupervised learning is a learning standard which deals with the study of how computers and natural systems such as human beings acquire knowledge in the presence of both labeled and unlabeled data. Semisupervised learning based methods…

Machine Learning · Computer Science 2014-02-20 V. Jothi Prakash, Dr. L. M. Nithya

Text Classification: A Perspective of Deep Learning Methods

In recent years, with the rapid development of information on the Internet, the number of complex texts and documents has increased exponentially, which requires a deeper understanding of deep learning methods in order to accurately…

Computation and Language · Computer Science 2023-09-26 Zhongwei Wan

ClassMix: Segmentation-Based Data Augmentation for Semi-Supervised Learning

The state of the art in semantic segmentation is steadily increasing in performance, resulting in more precise and reliable segmentations in many different applications. However, progress is limited by the cost of generating labels for…

Computer Vision and Pattern Recognition · Computer Science 2020-12-01 Viktor Olsson, Wilhelm Tranheden, Juliano Pinto, Lennart Svensson

Combining Deep Generative Models and Multi-lingual Pretraining for Semi-supervised Document Classification

Semi-supervised learning through deep generative models and multi-lingual pretraining techniques have orchestrated tremendous success across different areas of NLP. Nonetheless, their development has happened in isolation, while the…

Computation and Language · Computer Science 2021-01-27 Yi Zhu, Ehsan Shareghi, Yingzhen Li, Roi Reichart, Anna Korhonen

Automating Document Classification with Distant Supervision to Increase the Efficiency of Systematic Reviews

Objective: Systematic reviews of scholarly documents often provide complete and exhaustive summaries of literature relevant to a research question. However, well-done systematic reviews are expensive, time-demanding, and labor-intensive.…

Computation and Language · Computer Science 2020-12-15 Xiaoxiao Li, Rabah Al-Zaidy, Amy Zhang, Stefan Baral, Le Bao, C. Lee Giles

DOC: Deep Open Classification of Text Documents

Traditional supervised learning makes the closed-world assumption that the classes appearing in the test data must have appeared in training. This also applies to text learning or text classification. As learning is used increasingly in…

Computation and Language · Computer Science 2017-09-27 Lei Shu, Hu Xu, Bing Liu

Semi-supervised Text Categorization Using Recursive K-means Clustering

In this paper, we present a semi-supervised learning algorithm for classification of text documents. A method of labeling unlabeled text documents is presented. The presented method is based on the principle of divide and conquer strategy.…

Machine Learning · Computer Science 2017-06-27 Harsha S. Gowda, Mahamad Suhil, D. S. Guru, Lavanya Narayana Raju

Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers

Unsupervised extractive document summarization aims to select important sentences from a document without using labeled summaries during training. Existing methods are mostly graph-based with sentences as nodes and edge weights measured by…

Computation and Language · Computer Science 2021-12-14 Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei, Ming Zhou

A hybrid learning algorithm for text classification

Text classification is the process of classifying documents into predefined categories based on their content. Existing supervised learning algorithms to automatically classify text need sufficient documents to learn accurately. This paper…

Neural and Evolutionary Computing · Computer Science 2010-09-27 S. M. Kamruzzaman, Farhana Haider

Document classification methods

Information on different fields which are collected by users requires appropriate management and organization to be structured in a standard way and retrieved fast and more easily. Document classification is a conventional method to…

Information Retrieval · Computer Science 2019-09-18 Madjid Khalilian, Shiva Hassanzadeh

Improvability Through Semi-Supervised Learning: A Survey of Theoretical Results

Semi-supervised learning is a setting in which one has labeled and unlabeled data available. In this survey we explore different types of theoretical results when one uses unlabeled data in classification and regression tasks. Most methods…

Machine Learning · Computer Science 2020-07-31 Alexander Mey, Marco Loog

Unsupervised and Distributional Detection of Machine-Generated Text

The power of natural language generation models has provoked a flurry of interest in automatic methods to detect if a piece of text is human or machine-authored. The problem so far has been framed in a standard supervised way and consists…

Computation and Language · Computer Science 2021-11-05 Matthias Gallé, Jos Rozen, Germán Kruszewski, Hady Elsahar

Utility-Theoretic Ranking for Semi-Automated Text Classification

Semi-Automated Text Classification (SATC) may be defined as the task of ranking a set $\mathcal{D}$ of automatically labelled textual documents in such a way that, if a human annotator validates (i.e., inspects and corrects where…

Machine Learning · Computer Science 2021-09-21 Giacomo Berardi, Andrea Esuli, Fabrizio Sebastiani

Unsupervised Data Selection for Supervised Learning

Recent research has put considerable effort into the development of deep learning architectures and optimizers, obtaining impressive results in areas ranging from vision to language processing. However, little attention has been paid to the need of a…

Computer Vision and Pattern Recognition · Computer Science 2018-12-20 Gabriele Valvano, Andrea Leo, Daniele Della Latta, Nicola Martini, Gianmarco Santini, Dante Chiappino, Emiliano Ricciardi

Unsupervised Image Matching and Object Discovery as Optimization

Learning with complete or partial supervision is powerful but relies on ever-growing human annotation efforts. As a way to mitigate this serious problem, as well as to serve specific applications, unsupervised learning has emerged as an…

Computer Vision and Pattern Recognition · Computer Science 2019-04-08 Huy V. Vo, Francis Bach, Minsu Cho, Kai Han, Yann LeCun, Patrick Perez, Jean Ponce