Related papers: Parameterizing Kterm Hashing

An Improved System for Sentence-level Novelty Detection in Textual Streams

Novelty detection in news events has long been a difficult problem. A number of models performed well on specific data streams but certain issues are far from being solved, particularly in large data streams from the WWW where…

Information Retrieval · Computer Science · 2017-06-06 · Xinyu Fu, Eugene Ch'ng, Uwe Aickelin, Lanyun Zhang

A Novel Approach for Detection and Ranking of Trendy and Emerging Cyber Threat Events in Twitter Streams

We present a new machine learning and text information extraction approach to detection of cyber threat events in Twitter that are novel (previously non-extant) and developing (marked by significance with respect to similarity with a…

Information Retrieval · Computer Science · 2019-07-19 · Avishek Bose, Vahid Behzadan, Carlos Aguirre, William H. Hsu

Using temporal IDF for efficient novelty detection in text streams

Novelty detection in text streams is a challenging task that emerges in quite a few different scenarios, ranging from email thread filtering to RSS news feed recommendation on a smartphone. An efficient novelty detection algorithm can save…

Information Retrieval · Computer Science · 2014-11-11 · Margarita Karkali, Francois Rousseau, Alexandros Ntoulas, Michalis Vazirgiannis

Double-Hashing Algorithm for Frequency Estimation in Data Streams

Frequency estimation of elements is an important task for summarizing data streams and machine learning applications. The problem is often addressed by using streaming algorithms with sublinear space data structures. These algorithms allow…

Data Structures and Algorithms · Computer Science · 2022-04-05 · Nikita Seleznev, Senthil Kumar, C. Bayan Bruss

NoveltyRank: A Retrieval-Augmented Framework for Conceptual Novelty Estimation in AI Research

The accelerating pace of scientific publication makes it difficult to identify truly original research among incremental work. We propose a framework for estimating the conceptual novelty of research papers by combining semantic…

Machine Learning · Computer Science · 2026-01-06 · Zhengxu Yan, Han Li, Yuming Feng

Pattern Recognition and Event Detection on IoT Data-streams

Big data streams are possibly one of the most essential underlying notions. However, data streams are often challenging to handle owing to their rapid pace and limited information lifetime. It is difficult to collect and communicate stream…

Machine Learning · Computer Science · 2022-03-03 · Christos Karras, Aristeidis Karras, Spyros Sioutas

An Event Detection Approach Based On Twitter Hashtags

Twitter is one of the most popular microblogging services in the world. The great amount of information within Twitter makes it an important information channel for people to learn and share news. The Twitter hashtag is a popular feature that…

Social and Information Networks · Computer Science · 2018-05-01 · Shih-Feng Yang, Julia Taylor Rayz

Frequency Estimation in Data Streams: Learning the Optimal Hashing Scheme

We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. Contrary to state-of-the-art streaming frequency estimation algorithms, which heavily rely on random…

Data Structures and Algorithms · Computer Science · 2022-07-19 · Dimitris Bertsimas, Vassilis Digalakis

Linear-Time Parameterized Algorithms with Limited Local Resources

We propose a new (theoretical) computational model for the study of massive data processing with limited computational resources. Our model measures the complexity of reading the very large data sets in terms of the data size N and analyzes…

Data Structures and Algorithms · Computer Science · 2020-03-09 · Jianer Chen, Ying Guo, Qin Huang

How to detect novelty in textual data streams? A comparative study of existing methods

Since datasets with annotation for novelty at the document and/or word level are not easily available, we present a simulation framework that allows us to create different textual datasets in which we control the way novelty occurs. We also…

Machine Learning · Computer Science · 2019-09-12 · Clément Christophe, Julien Velcin, Jairo Cugliari, Philippe Suignard, Manel Boumghar

Event Detection in Twitter by Weighting Tweet's Features

In recent years, people spend a lot of time on social networks. They use social networks as a place to comment on personal or public events. Thus, a large amount of information is generated and shared daily in these networks. Using such a…

Social and Information Networks · Computer Science · 2020-10-05 · Parinaz Rahimizadeh, Mohammad Javad Shayegan

A novel graph-based model for hybrid recommendations in cold-start scenarios

Cold-start is a very common and still open problem in the Recommender Systems literature. Since cold-start items do not have any interactions, collaborative algorithms are not applicable. One of the main strategies is to use pure or hybrid…

Machine Learning · Computer Science · 2019-07-16 · Cesare Bernardis, Maurizio Ferrari Dacrema, Paolo Cremonesi

Combining Feature and Prototype Pruning by Uncertainty Minimization

We focus in this paper on dataset reduction techniques for use in k-nearest neighbor classification. In such a context, feature and prototype selections have always been independently treated by the standard storage reduction algorithms.…

Machine Learning · Computer Science · 2013-01-18 · Marc Sebban, Richard Nock

Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding

Unsupervised discovery of stories with correlated news articles in real-time helps people digest massive news streams without expensive human annotations. A common approach of the existing studies for unsupervised online story discovery is…

Information Retrieval · Computer Science · 2023-05-05 · Susik Yoon, Dongha Lee, Yunyi Zhang, Jiawei Han

Towards Target High-Utility Itemsets

For applied intelligence, utility-driven pattern discovery algorithms can identify insightful and useful patterns in databases. However, in these techniques for pattern discovery, the number of patterns can be huge, and the user is often…

Databases · Computer Science · 2022-06-14 · Jinbao Miao, Wensheng Gan, Shicheng Wan, Yongdong Wu, Philippe Fournier-Viger

Model-based Classification and Novelty Detection For Point Pattern Data

Point patterns are sets or multi-sets of unordered elements that can be found in numerous data sources. However, in data analysis tasks such as classification and novelty detection, appropriate statistical models for point pattern data have…

Machine Learning · Computer Science · 2017-02-09 · Ba-Ngu Vo, Quang N. Tran, Dinh Phung, Ba-Tuong Vo

A Novel String Distance Function based on Most Frequent K Characters

This study aims to publish a novel similarity metric to increase the speed of comparison operations. Also, the new metric is suitable for distance-based operations among strings. Most of the simple calculation methods, such as string length…

Data Structures and Algorithms · Computer Science · 2014-01-28 · Sadi Evren Seker, Oguz Altun, Uğur Ayan, Cihan Mert

Novelty Detection in Time Series via Weak Innovations Representation: A Deep Learning Approach

We consider novelty detection in time series with unknown and nonparametric probability structures. A deep learning approach is proposed to causally extract an innovations sequence consisting of novelty samples statistically independent of…

Machine Learning · Computer Science · 2022-10-25 · Xinyi Wang, Mei-jen Lee, Qing Zhao, Lang Tong

Optimal Densification for Fast and Accurate Minwise Hashing

Minwise hashing is a fundamental hashing algorithm and one of the most successful in the literature. Recent advances based on the idea of densification [Proc:OneHashLSH_ICML14, Proc:Shrivastava_UAI14] have shown that it is possible to…

Data Structures and Algorithms · Computer Science · 2017-03-16 · Anshumali Shrivastava

Detecting Reliable Novel Word Senses: A Network-Centric Approach

In this era of Big Data, due to the expeditious exchange of information on the web, words are being used to denote newer meanings, causing linguistic shift. With the recent availability of large amounts of digitized texts, an automated analysis…

Computation and Language · Computer Science · 2018-12-17 · Abhik Jana, Animesh Mukherjee, Pawan Goyal