Related papers: Scalable Text Mining with Sparse Generative Models

A Comprehensive Survey of Text Classification Techniques and Their Research Applications: Observational and Experimental Insights

The exponential growth of textual data presents substantial challenges in management and analysis, notably due to high storage and processing costs. Text classification, a vital aspect of text mining, provides robust solutions by enabling…

Computation and Language · Computer Science 2025-01-22 Kamal Taha, Paul D. Yoo, Chan Yeun, Aya Taha

Variational Deep Semantic Hashing for Text Documents

As the amount of textual data has been rapidly increasing over the past decade, efficient similarity search methods have become a crucial component of large-scale information retrieval systems. A popular strategy is to represent original…

Information Retrieval · Computer Science 2017-08-14 Suthee Chaidaroon, Yi Fang

Text Mining for Processing Interview Data in Computational Social Science

We use commercially available text analysis technology to process interview text data from a computational social science study. We find that topical clustering and terminological enrichment provide for convenient exploration and…

Computation and Language · Computer Science 2020-12-01 Jussi Karlgren, Renee Li, Eva M Meyersson Milgrom

Text Classification using Data Mining

Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement…

Information Retrieval · Computer Science 2010-09-28 S. M. Kamruzzaman, Farhana Haider, Ahmed Ryadh Hasan

Evolving Text Data Stream Mining

A text stream is an ordered sequence of text documents generated over time. A massive amount of such text data is generated by online social platforms every day. Designing an algorithm for such text streams to extract useful information is…

Information Retrieval · Computer Science 2024-09-04 Jay Kumar

A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques

The amount of text that is generated every day is increasing dramatically. This tremendous volume of mostly unstructured text cannot be simply processed and perceived by computers. Therefore, efficient and effective techniques and…

Computation and Language · Computer Science 2017-07-31 Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saied Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, Krys Kochut

How Does Generative Retrieval Scale to Millions of Passages?

Popularized by the Differentiable Search Index, the emerging paradigm of generative retrieval re-frames the classic information retrieval problem into a sequence-to-sequence modeling task, forgoing external indices and encoding an entire…

Information Retrieval · Computer Science 2023-05-22 Ronak Pradeep, Kai Hui, Jai Gupta, Adam D. Lelkes, Honglei Zhuang, Jimmy Lin, Donald Metzler, Vinh Q. Tran

Using Genetic Algorithms for Texts Classification Problems

The avalanche of information produced by mankind has led to the concept of automation of knowledge extraction - Data Mining ([1]). This direction is connected with a wide spectrum of problems - from recognition of the fuzzy set to…

Machine Learning · Computer Science 2009-06-05 A. A. Shumeyko, S. L. Sotnik

Scalable Topical Phrase Mining from Text Corpora

While most topic modeling algorithms model text corpora with unigrams, human interpretation often relies on inherent grouping of terms into phrases. As such, we consider the problem of discovering topical phrases of mixed lengths. Existing…

Computation and Language · Computer Science 2014-11-20 Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare Voss, Jiawei Han

Very Large Language Model as a Unified Methodology of Text Mining

Text data mining is the process of deriving essential information from language text. Typical text mining tasks include text categorization, text clustering, topic modeling, information extraction, and text summarization. Various data sets…

Databases · Computer Science 2022-12-21 Meng Jiang

Learning Sparse Prototypes for Text Generation

Prototype-driven text generation uses non-parametric models that first choose from a library of sentence "prototypes" and then modify the prototype to generate the output text. While effective, these methods are inefficient at test time as…

Computation and Language · Computer Science 2020-11-05 Junxian He, Taylor Berg-Kirkpatrick, Graham Neubig

Deep Generative Model for Sparse Graphs using Text-Based Learning with Augmentation in Generative Examination Networks

Graphs and networks are a key research tool for a variety of science fields, most notably chemistry, biology, engineering and social sciences. Modeling and generation of graphs with efficient sampling is a key challenge for graphs. In…

Machine Learning · Computer Science 2019-09-26 Ruud van Deursen, Guillaume Godin

Text data mining and data quality management for research information systems in the context of open data and open science

In the implementation and use of research information systems (RIS) in scientific institutions, text data mining and semantic technologies are a key technology for the meaningful use of large amounts of data. It is not the collection of…

Digital Libraries · Computer Science 2018-12-12 Otmane Azeroual, Gunter Saake, Mohammad Abuosba, Joachim Schöpfel

Accessing accurate documents by mining auxiliary document information

Earlier techniques of text mining included algorithms like k-means, Naive Bayes, SVM which classify and cluster the text document for mining relevant information about the documents. The need for improving the mining techniques has us…

Information Retrieval · Computer Science 2016-05-10 Jinju Joby, Jyothi Korra

A Survey of Generative Search and Recommendation in the Era of Large Language Models

With the information explosion on the Web, search and recommendation are foundational infrastructures to satisfying users' information needs. As the two sides of the same coin, both revolve around the same core research problem, matching…

Information Retrieval · Computer Science 2024-04-29 Yongqi Li, Xinyu Lin, Wenjie Wang, Fuli Feng, Liang Pang, Wenjie Li, Liqiang Nie, Xiangnan He, Tat-Seng Chua

A Scalable Document-based Architecture for Text Analysis

Analyzing textual data is a very challenging task because of the huge volume of data generated daily. Fundamental issues in text analysis include the lack of structure in document datasets, the need for various preprocessing steps (e.g.,…

Databases · Computer Science 2016-12-20 Ciprian-Octavian Truică, Jérôme Darmont, Julien Velcin

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly in the number of retrieved documents, causing a dramatic…

Computation and Language · Computer Science 2024-05-28 Yun Zhu, Jia-Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, Chu-Cheng Lin, Lei Shu, Liangchen Luo, Lei Meng, Bang Liu, Jindong Chen

Text Modeling using Unsupervised Topic Models and Concept Hierarchies

Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover a broad range of themes in a data set,…

Artificial Intelligence · Computer Science 2008-08-08 Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyvers

Text Classification Algorithms: A Survey

In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine…

Machine Learning · Computer Science 2020-05-21 Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura E. Barnes, Donald E. Brown

Statistical Topic Models for Multi-Label Document Classification

Machine learning approaches to multi-label document classification have to date largely relied on discriminative modeling techniques such as support vector machines. A drawback of these approaches is that performance rapidly drops off as…

Machine Learning · Statistics 2011-11-11 Timothy N. Rubin, America Chambers, Padhraic Smyth, Mark Steyvers