Related papers: Computing n-Gram Statistics in MapReduce

200 papers

The number of n-gram features grows exponentially in n, making it computationally demanding to compute the most frequent n-grams even for n as small as 3. Motivated by our production machine learning system built on n-gram features, we ask:…

Data Structures and Algorithms · Computer Science · 2025-11-20 · Ryan R. Curtin, Fred Lu, Edward Raff, Priyanka Ranade

Stemming is a process that can be utilized to trim inflected words to stem or root form. It is useful for enhancing the retrieval effectiveness, especially for text search in order to solve the mismatch problems. Previous research on Bangla…

Computation and Language · Computer Science · 2019-12-30 · Rabeya Sadia, Md Ataur Rahman, Md Hanif Seddiqui

This paper deals with the two fundamental problems concerning the handling of large n-gram language models: indexing, that is compressing the n-gram strings and associated satellite data without compromising their retrieval speed; and…

Information Retrieval · Computer Science · 2022-02-08 · Giulio Ermanno Pibiri, Rossano Venturini

This report describes the MUDOS-NG summarization system, which applies a set of language-independent and generic methods for generating extractive summaries. The proposed methods are mostly combinations of simple operators on a generic…

Computation and Language · Computer Science · 2010-12-10 · George Giannakopoulos, George Vouros, Vangelis Karkaletsis

Efficient evaluation of regular expressions (regex, for short) is crucial for text analysis, and n-gram indexes are fundamental to achieving fast regex evaluation performance. However, these indexes face scalability challenges because of…

Databases · Computer Science · 2025-09-08 · Ling Zhang, Shaleen Deep, Jignesh M. Patel, Karthikeyan Sankaralingam

Recent advances in Deep Learning have led to a significant performance increase on several NLP tasks; however, the models become more and more computationally demanding. Therefore, this paper tackles the domain of computationally efficient…

Computation and Language · Computer Science · 2022-05-18 · Pedro Alonso, Kumar Shridhar, Denis Kleyko, Evgeny Osipov, Marcus Liwicki

This work extends the set of works which deal with the popular problem of sentiment analysis in Twitter. It investigates the most popular document ("tweet") representation methods which feed sentiment evaluation mechanisms. In particular,…

Computation and Language · Computer Science · 2015-05-14 · Evangelos Psomakelis, Konstantinos Tserpes, Dimosthenis Anagnostopoulos, Theodora Varvarigou

In multimedia, text or bioinformatics databases, applications query sequences of n consecutive symbols called n-grams. Estimating the number of distinct n-grams is a view-size estimation problem. While view sizes can be estimated by…

Databases · Computer Science · 2014-02-05 · Daniel Lemire, Owen Kaser

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science · 2017-12-06 · Yaron Gonen

Searching techniques for Case Based Reasoning systems involve extensive methods of elimination. In this paper, we look at a new method of arriving at the right solution by performing a series of transformations upon the data. These involve…

Artificial Intelligence · Computer Science · 2007-05-23 · M. N. Karthik, Moshe Davis

MapReduce (and its open source implementation Hadoop) has become the de facto platform for processing large data sets. MapReduce offers a streamlined computational framework by interleaving sequential and parallel computation while hiding…

Computational Complexity · Computer Science · 2019-04-22 · Sungjin Im, Benjamin Moseley

We present NN-grams, a novel, hybrid language model integrating n-grams and neural networks (NN) for speech recognition. The model takes as input both word histories as well as n-gram counts. Thus, it combines the memorization capacity and…

Computation and Language · Computer Science · 2016-06-27 · Babak Damavandi, Shankar Kumar, Noam Shazeer, Antoine Bruguier

This paper proposes a novel Recurrent Neural Network (RNN) language model that takes advantage of character information. We focus on character n-grams based on research in the field of word embedding construction (Wieting et al. 2016). Our…

Computation and Language · Computer Science · 2019-06-14 · Sho Takase, Jun Suzuki, Masaaki Nagata

There have been multiple attempts to resolve various inflection matching problems in information retrieval. Stemming is a common approach to this end. Among many techniques for stemming, statistical stemming has been shown to be effective…

Information Retrieval · Computer Science · 2016-06-22 · Javid Dadashkarimi, Hossein Nasr Esfahani, Heshaam Faili, Azadeh Shakery

We consider the enumeration of maximal bipartite cliques (bicliques) from a large graph, a task central to many practical data mining problems in social network analysis and bioinformatics. We present novel parallel algorithms for the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2014-04-22 · Arko Provo Mukherjee, Srikanta Tirthapura

The problem of finding locally dense components of a graph is an important primitive in data analysis, with wide-ranging applications from community mining to spam detection and the discovery of biological network modules. In this paper we…

Databases · Computer Science · 2012-02-01 · Bahman Bahmani, Ravi Kumar, Sergei Vassilvitskii

String-based (or viewpoint) models of tonal harmony often struggle with data sparsity in pattern discovery and prediction tasks, particularly when modeling composite events like triads and seventh chords, since the number of distinct n-note…

Information Retrieval · Computer Science · 2017-07-19 · David R. W. Sears, Andreas Arzt, Harald Frostel, Reinhard Sonnleitner, Gerhard Widmer

Natural language processing models have attracted much interest in the deep learning community. This branch of study is composed of some applications such as machine translation, sentiment analysis, named entity recognition, question and…

Computation and Language · Computer Science · 2020-07-22 · Flávio Santos, Hendrik Macedo, Thiago Bispo, Cleber Zanchettin

We study distributed algorithms for some fundamental problems in data summarization. Given a communication graph $G$ of $n$ nodes each of which may hold a value initially, we focus on computing $\sum_{i=1}^N g(f_i)$, where $f_i$ is the…

Data Structures and Algorithms · Computer Science · 2019-08-07 · Hsin-Hao Su, Hoa T. Vu

Given a graph G and the desired size k in bits, how can we summarize G within k bits, while minimizing the information loss? Large-scale graphs have become omnipresent, posing considerable computational challenges. Analyzing such large…

Databases · Computer Science · 2021-02-23 · Kyuhan Lee, Hyeonsoo Jo, Jihoon Ko, Sungsu Lim, Kijung Shin