Related papers: Two-Stage Document Length Normalization for Inform…

Improving Term Frequency Normalization for Multi-topical Documents, and Application to Language Modeling Approaches

Term frequency normalization is a serious issue since lengths of documents are various. Generally, documents become long due to two different reasons - verbosity and multi-topicality. First, verbosity means that the same topic is repeatedly…

Information Retrieval · Computer Science 2015-02-10 Seung-Hoon Na, In-Su Kang, Jong-Hyeok Lee

A Novel LLM-based Two-stage Summarization Approach for Long Dialogues

Long document summarization poses a significant challenge in natural language processing due to input lengths that exceed the capacity of most state-of-the-art pre-trained language models. This study proposes a hierarchical framework that…

Computation and Language · Computer Science 2024-10-10 Yuan-Jhe Yin, Bo-Yu Chen, Berlin Chen

Rule based Approach for Word Normalization by resolving Transcription Ambiguity in Transliterated Search Queries

Query term matching with document term matching is the basic function of any best effort Information Retrieval models like Vector Space Model. In our problem of SMS based Information Systems we expect common people to participate in…

Information Retrieval · Computer Science 2019-10-17 Varsha Pathak, Manish Joshi

Regularization approaches for support vector machines with applications to biomedical data

The support vector machine (SVM) is a widely used machine learning tool for classification based on statistical learning theory. Given a set of training data, the SVM finds a hyperplane that separates two different classes of data points by…

Machine Learning · Computer Science 2017-10-31 Daniel Lopez-Martinez

Text Retrieval with Multi-Stage Re-Ranking Models

The text retrieval is the task of retrieving similar documents to a search query, and it is important to improve retrieval accuracy while maintaining a certain level of retrieval speed. Existing studies have reported accuracy improvements…

Information Retrieval · Computer Science 2023-11-15 Yuichi Sasazawa, Kenichi Yokote, Osamu Imaichi, Yasuhiro Sogawa

Document Expansion by Query Prediction

One technique to improve the retrieval effectiveness of a search engine is to expand documents with terms that are related or representative of the documents' content. From the perspective of a question answering system, this might comprise…

Information Retrieval · Computer Science 2019-09-26 Rodrigo Nogueira, Wei Yang, Jimmy Lin, Kyunghyun Cho

Bridging the Modality Gap by Similarity Standardization with Pseudo-Positive Samples

Advances in vision-language models (VLMs) have enabled effective cross-modality retrieval. However, when both text and images exist in the database, similarity scores would differ in scale by modality. This phenomenon, known as the modality…

Computation and Language · Computer Science 2025-12-01 Shuhei Yamashita, Daiki Shirafuji, Tatsuhiko Saito

Information retrieval for label noise document ranking by bag sampling and group-wise loss

Long Document retrieval (DR) has always been a tremendous challenge for reading comprehension and information retrieval. The pre-training model has achieved good results in the retrieval stage and Ranking for long documents in recent years.…

Information Theory · Computer Science 2022-03-15 Chunyu Li, Jiajia Ding, Xing hu, Fan Wang

TOME: A Two-stage Approach for Model-based Retrieval

Recently, model-based retrieval has emerged as a new paradigm in text retrieval that discards the index in the traditional retrieval model and instead memorizes the candidate corpora using model parameters. This design employs a…

Information Retrieval · Computer Science 2023-05-19 Ruiyang Ren, Wayne Xin Zhao, Jing Liu, Hua Wu, Ji-Rong Wen, Haifeng Wang

A Bayesian Approach to Estimation of Speaker Normalization Parameters

In this work, a Bayesian approach to speaker normalization is proposed to compensate for the degradation in performance of a speaker independent speech recognition system. The speaker normalization method proposed herein uses the technique…

Sound · Computer Science 2016-10-20 Dhananjay Ram, Debasis Kundu, Rajesh M. Hegde

Encoded Summarization: Summarizing Documents into Continuous Vector Space for Legal Case Retrieval

We present our method for tackling a legal case retrieval task by introducing our method of encoding documents by summarizing them into continuous vector space via our phrase scoring framework utilizing deep neural networks. On the other…

Computation and Language · Computer Science 2023-09-18 Vu Tran, Minh Le Nguyen, Satoshi Tojo, Ken Satoh

Multilevel Text Normalization with Sequence-to-Sequence Networks and Multisource Learning

We define multilevel text normalization as sequence-to-sequence processing that transforms naturally noisy text into a sequence of normalized units of meaning (morphemes) in three steps: 1) writing normalization, 2) lemmatization, 3)…

Computation and Language · Computer Science 2019-04-01 Tatyana Ruzsics, Tanja Samardžić

A Comparative Study on Different Types of Approaches to Bengali document Categorization

Document categorization is a technique where the category of a document is determined. In this paper three well-known supervised learning techniques which are Support Vector Machine (SVM), Naïve Bayes (NB) and Stochastic Gradient…

Computation and Language · Computer Science 2017-01-31 Md. Saiful Islam, Fazla Elahi Md Jubayer, Syed Ikhtiar Ahmed

A Robust Hybrid Approach for Textual Document Classification

Text document classification is an important task for diverse natural language processing based applications. Traditional machine learning approaches mainly focused on reducing dimensionality of textual data to perform classification. This…

Computation and Language · Computer Science 2019-09-13 Muhammad Nabeel Asim, Muhammad Usman Ghani Khan, Muhammad Imran Malik, Andreas Dengel, Sheraz Ahmed

Improving Abstraction in Text Summarization

Abstractive text summarization aims to shorten long text documents into a human readable form that contains the most important facts from the original document. However, the level of actual abstraction as measured by novel phrases that do…

Computation and Language · Computer Science 2018-08-27 Wojciech Kryściński, Romain Paulus, Caiming Xiong, Richard Socher

Document Retrieval using Predication Similarity

Document retrieval has been an important research problem over many years in the information retrieval community. State-of-the-art techniques utilize various methods in matching documents to a given document including keywords, phrases, and…

Information Retrieval · Computer Science 2016-04-21 Kalpa Gunaratna

Optimizing Multi-Stage Language Models for Effective Text Retrieval

Efficient text retrieval is critical for applications such as legal document analysis, particularly in specialized contexts like Japanese legal systems. Existing retrieval methods often underperform in such domain-specific scenarios,…

Information Retrieval · Computer Science 2024-12-30 Quang Hoang Trung, Le Trung Hoang, Nguyen Van Hoang Phuc

On the Reproducibility of Learned Sparse Retrieval Adaptations for Long Documents

Document retrieval is one of the most challenging tasks in Information Retrieval. It requires handling longer contexts, often resulting in higher query latency and increased computational overhead. Recently, Learned Sparse Retrieval (LSR)…

Information Retrieval · Computer Science 2025-04-09 Emmanouil Georgios Lionis, Jia-Huei Ju

Uncovering the Bigger Picture: Comprehensive Event Understanding Via Diverse News Retrieval

Access to diverse perspectives is essential for understanding real-world events, yet most news retrieval systems prioritize textual relevance, leading to redundant results and limited viewpoint exposure. We propose NEWSCOPE, a two-stage…

Computation and Language · Computer Science 2025-09-01 Yixuan Tang, Yuanyuan Shi, Yiqun Sun, Anthony Kum Hoe Tung

Content Reduction, Surprisal and Information Density Estimation for Long Documents

Many computational linguistic methods have been proposed to study the information content of languages. We consider two interesting research questions: 1) how is information distributed over long documents, and 2) how does content…

Computation and Language · Computer Science 2023-09-13 Shaoxiong Ji, Wei Sun, Pekka Marttinen