Related papers: Document Classification using File Names

Learning Term Discrimination

Document indexing is a key component for efficient information retrieval (IR). After preprocessing steps such as stemming and stop-word removal, document indexes usually store term-frequencies (tf). Along with tf (that only reflects the…

Information Retrieval · Computer Science 2020-04-29 Jibril Frej , Phillipe Mulhem , Didier Schwab , Jean-Pierre Chevallet

Text Classification Using Label Names Only: A Language Model Self-Training Approach

Current text classification methods typically require a good number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Humans can perform classification without seeing any labeled…

Computation and Language · Computer Science 2020-10-15 Yu Meng , Yunyi Zhang , Jiaxin Huang , Chenyan Xiong , Heng Ji , Chao Zhang , Jiawei Han

Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines

This paper presents an approach for real-time training and testing for document image classification. In production environments, it is crucial to perform accurate and (time-)efficient training. Existing deep learning approaches for…

Computer Vision and Pattern Recognition · Computer Science 2018-03-28 Andreas Kölsch , Muhammad Zeshan Afzal , Markus Ebbecke , Marcus Liwicki

File Fragment Classification using Light-Weight Convolutional Neural Networks

In digital forensics, file fragment classification is an important step toward completing the file carving process. There exist several techniques to identify the type of file fragments without relying on meta-data, such as using features like…

Cryptography and Security · Computer Science 2025-04-15 Mustafa Ghaleb , Kunwar Saaim , Muhamad Felemban , Saleh Al-Saleh , Ahmad Al-Mulhem

Efficient Classification of Long Documents Using Transformers

Several methods have been proposed for classifying long textual documents using Transformers. However, there is a lack of consensus on a benchmark to enable a fair comparison among different approaches. In this paper, we provide a…

Computation and Language · Computer Science 2022-03-23 Hyunji Hayley Park , Yogarshi Vyas , Kashif Shah

Improving the Efficiency of Long Document Classification using Sentence Ranking Approach

Long document classification poses challenges due to the computational limitations of transformer-based models, particularly BERT, which are constrained by fixed input lengths and quadratic attention complexity. Moreover, using the full…

Computation and Language · Computer Science 2025-06-24 Prathamesh Kokate , Mitali Sarnaik , Manavi Khopade , Raviraj Joshi

Web Document Categorization Using Naive Bayes Classifier and Latent Semantic Analysis

The rapid growth of web documents due to heavy use of the World Wide Web necessitates efficient techniques to classify documents on the web. High volumes of data are thus produced per second with high diversity. Automatically…

Computation and Language · Computer Science 2020-06-03 Alireza Saleh Sedghpour , Mohammad Reza Saleh Sedghpour

FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification

Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data. Recent research shows that keyword-driven methods can achieve state-of-the-art performance on various tasks. However, these…

Computation and Language · Computer Science 2022-12-16 Tingyu Xia , Yue Wang , Yuan Tian , Yi Chang

Automating Document Classification with Distant Supervision to Increase the Efficiency of Systematic Reviews

Objective: Systematic reviews of scholarly documents often provide complete and exhaustive summaries of literature relevant to a research question. However, well-done systematic reviews are expensive, time-demanding, and labor-intensive.…

Computation and Language · Computer Science 2020-12-15 Xiaoxiao Li , Rabah Al-Zaidy , Amy Zhang , Stefan Baral , Le Bao , C. Lee Giles

Comparative Study of Long Document Classification

The amount of information stored in the form of documents on the internet has been increasing rapidly. Thus it has become a necessity to organize and maintain these documents in an optimum manner. Text classification algorithms study the…

Computation and Language · Computer Science 2022-02-22 Vedangi Wagh , Snehal Khandve , Isha Joshi , Apurva Wani , Geetanjali Kale , Raviraj Joshi

Semantic Sensitive TF-IDF to Determine Word Relevance in Documents

Keyword extraction has received increasing attention as an important research topic that can lead to advancements in diverse applications such as document context categorization, text indexing and document classification. In this…

Information Retrieval · Computer Science 2021-01-27 Amir Jalilifard , Vinicius F. Caridá , Alex F. Mansano , Rogers S. Cristo , Felipe Penhorate C. da Fonseca

Towards Intelligent Legal Document Analysis: CNN-Driven Classification of Case Law Texts

Legal practitioners and judicial institutions face an ever-growing volume of case-law documents characterised by formalised language, lengthy sentence structures, and highly specialised terminology, making manual triage both time-consuming…

Computation and Language · Computer Science 2026-04-21 Moinul Hossain , Sourav Rabi Das , Zikrul Shariar Ayon , Sadia Afrin Promi , Ahnaf Atef Choudhury , Shakila Rahman , Jia Uddin

An Effective Approach for Web Document Classification using the Concept of Association Analysis of Data Mining

Exponential growth of the web has increased the importance of web document classification and data mining. Obtaining exact information about which classes a web document belongs to is expensive. Automatic classification of…

Information Retrieval · Computer Science 2014-06-24 R. K. Roul , S. K. Sahay

Web Document Clustering and Ranking using Tf-Idf based Apriori Approach

The dynamic web has grown exponentially over the past few years, with thousands of documents related to a subject now available to the user. Most web documents are unstructured and not organized, and hence…

Information Retrieval · Computer Science 2014-06-24 R. K. Roul , O. R. Devanand , S. K. Sahay

Efficient Document Image Classification Using Region-Based Graph Neural Network

Document image classification remains a popular research area because it can be commercialized in many enterprise applications across different industries. Recent advancements in large pre-trained computer vision and language models and…

Computer Vision and Pattern Recognition · Computer Science 2021-06-28 Jaya Krishna Mandivarapu , Eric Bunch , Qian You , Glenn Fung

Text Classification: A Perspective of Deep Learning Methods

In recent years, with the rapid development of information on the Internet, the number of complex texts and documents has increased exponentially, which requires a deeper understanding of deep learning methods in order to accurately…

Computation and Language · Computer Science 2023-09-26 Zhongwei Wan

Machine learning approach for text and document mining

Text Categorization (TC), also known as Text Classification, is the task of automatically classifying a set of text documents into different categories from a predefined set. If a document belongs to exactly one of the categories, it is a…

Information Retrieval · Computer Science 2014-06-09 Vishwanath Bijalwan , Pinki Kumari , Jordan Pascual , Vijay Bhaskar Semwal

Text Classification using Data Mining

Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement…

Information Retrieval · Computer Science 2010-09-28 S. M. Kamruzzaman , Farhana Haider , Ahmed Ryadh Hasan

A Robust Hybrid Approach for Textual Document Classification

Text document classification is an important task for diverse natural language processing based applications. Traditional machine learning approaches mainly focused on reducing dimensionality of textual data to perform classification. This…

Computation and Language · Computer Science 2019-09-13 Muhammad Nabeel Asim , Muhammad Usman Ghani Khan , Muhammad Imran Malik , Andreas Dengel , Sheraz Ahmed

Text Classification and Distributional features techniques in Datamining and Warehousing

Text Categorization is traditionally done by using the term frequency and inverse document frequency. This type of method is not very good because some words which are not so important may appear in the document. The term frequency of…

Information Retrieval · Computer Science 2016-11-25 Srikanth Bethu , G Charless Babu , J Vinoda , E Priyadarshini , M Raghavendra rao