Related papers: Efficient Classification of Long Documents Using T…

Can Model Fusing Help Transformers in Long Document Classification? An Empirical Study

Text classification is an area of research which has been studied over the years in Natural Language Processing (NLP). Adapting NLP to multiple domains has introduced many new challenges for text classification and one of them is long…

Computation and Language · Computer Science 2023-07-20 Damith Premasiri, Tharindu Ranasinghe, Ruslan Mitkov

Revisiting Transformer-based Models for Long Document Classification

The recent literature in text classification is biased towards short text sequences (e.g., sentences or paragraphs). In real-world applications, multi-page multi-paragraph documents are common and they cannot be efficiently encoded by…

Computation and Language · Computer Science 2022-10-26 Xiang Dai, Ilias Chalkidis, Sune Darkner, Desmond Elliott

Comparative Study of Long Document Classification

The amount of information stored in the form of documents on the internet has been increasing rapidly. Thus it has become a necessity to organize and maintain these documents in an optimum manner. Text classification algorithms study the…

Computation and Language · Computer Science 2022-02-22 Vedangi Wagh, Snehal Khandve, Isha Joshi, Apurva Wani, Geetanjali Kale, Raviraj Joshi

Long-length Legal Document Classification

One of the principal tasks of machine learning with major applications is text classification. This paper focuses on the legal domain and, in particular, on the classification of lengthy legal documents. The main challenge that this study…

Computation and Language · Computer Science 2019-12-17 Lulu Wan, George Papageorgiou, Michael Seddon, Mirko Bernardoni

Investigating the Working of Text Classifiers

Text classification is one of the most widely studied tasks in natural language processing. Motivated by the principle of compositionality, large multilayer neural network models have been employed for this task in an attempt to effectively…

Computation and Language · Computer Science 2018-08-07 Devendra Singh Sachan, Manzil Zaheer, Ruslan Salakhutdinov

A Survey on Long Text Modeling with Transformers

Modeling long texts has been an essential technique in the field of natural language processing (NLP). With the ever-growing number of long documents, it is important to develop effective modeling methods that can process and analyze such…

Computation and Language · Computer Science 2025-06-11 Zican Dong, Tianyi Tang, Junyi Li, Wayne Xin Zhao

Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling

Transformer is important for text modeling. However, it has difficulty in handling long documents due to the quadratic complexity with input text length. In order to handle this problem, we propose a hierarchical interactive Transformer…

Computation and Language · Computer Science 2021-12-10 Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

Beyond Document Page Classification: Design, Datasets, and Challenges

This paper highlights the need to bring document classification benchmarking closer to real-world applications, both in the nature of data tested ($X$: multi-channel, multi-paged, multi-industry; $Y$: class distributions and label set…

Computer Vision and Pattern Recognition · Computer Science 2023-11-01 Jordy Van Landeghem, Sanket Biswas, Matthew B. Blaschko, Marie-Francine Moens

Constructing Datasets for Multi-hop Reading Comprehension Across Documents

Most Reading Comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine…

Computation and Language · Computer Science 2018-06-12 Johannes Welbl, Pontus Stenetorp, Sebastian Riedel

Hierarchical Neural Network Approaches for Long Document Classification

Text classification algorithms investigate the intricate relationships between words or phrases and attempt to deduce the document's interpretation. In the last few years, these algorithms have progressed tremendously. Transformer…

Computation and Language · Computer Science 2022-06-28 Snehal Khandve, Vedangi Wagh, Apurva Wani, Isha Joshi, Raviraj Joshi

Text Classification Algorithms: A Survey

In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine…

Machine Learning · Computer Science 2020-05-21 Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura E. Barnes, Donald E. Brown

A Survey on Transformer Context Extension: Approaches and Evaluation

Large language models (LLMs) based on Transformer have been widely applied in the field of natural language processing (NLP), demonstrating strong performance, particularly in handling short text tasks. However, when it comes to long…

Computation and Language · Computer Science 2025-07-09 Yijun Liu, Jinzheng Yu, Yang Xu, Zhongyang Li, Qingfu Zhu

Transformer-based Models for Long-Form Document Matching: Challenges and Empirical Analysis

Recent advances in the area of long document matching have primarily focused on using transformer-based models for long document encoding and matching. There are two primary challenges associated with these models. Firstly, the performance…

Computation and Language · Computer Science 2023-02-09 Akshita Jha, Adithya Samavedhi, Vineeth Rakesh, Jaideep Chandrashekar, Chandan K. Reddy

Transformers are Short Text Classifiers: A Study of Inductive Short Text Classifiers on Benchmarks and Real-world Datasets

Short text classification is a crucial and challenging aspect of Natural Language Processing. For this reason, there are numerous highly specialized short text classifiers. However, in recent short text research, State of the Art (SOTA)…

Computation and Language · Computer Science 2023-08-14 Fabian Karl, Ansgar Scherp

Document classification methods

Information on different fields which are collected by users requires appropriate management and organization to be structured in a standard way and retrieved fast and more easily. Document classification is a conventional method to…

Information Retrieval · Computer Science 2019-09-18 Madjid Khalilian, Shiva Hassanzadeh

Investigating Length Issues in Document-level Machine Translation

Transformer architectures are increasingly effective at processing and generating very long chunks of texts, opening new perspectives for document-level machine translation (MT). In this work, we challenge the ability of MT systems to…

Computation and Language · Computer Science 2025-04-29 Ziqian Peng, Rachel Bawden, François Yvon

Understanding Long Documents with Different Position-Aware Attentions

Despite several successes in document understanding, the practical task for long document understanding is largely under-explored due to several challenges in computation and how to efficiently absorb long multimodal input. Most current…

Computation and Language · Computer Science 2022-08-18 Hai Pham, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang

Evaluating Nonlinear Decision Trees for Binary Classification Tasks with Other Existing Methods

Classification of datasets into two or more distinct classes is an important machine learning task. Many methods are able to classify binary classification tasks with a very high accuracy on test data, but cannot provide any easily…

Machine Learning · Computer Science 2020-08-26 Yashesh Dhebar, Sparsh Gupta, Kalyanmoy Deb

An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification

Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big Bird, are popular approaches to working with long documents. There are clear benefits to these approaches compared to the original Transformer in terms…

Computation and Language · Computer Science 2022-10-12 Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, Desmond Elliott

Exploring Machine Learning and Transformer-based Approaches for Deceptive Text Classification: A Comparative Analysis

Deceptive text classification is a critical task in natural language processing that aims to identify deceptive or fraudulent content. This study presents a comparative analysis of machine learning and transformer-based approaches for…

Computation and Language · Computer Science 2023-08-14 Anusuya Krishnan