
Related papers: Parameterizing Kterm Hashing

200 papers

Novelty detection in news events has long been a difficult problem. A number of models performed well on specific data streams, but certain issues are far from being solved, particularly in large data streams from the WWW where…

Information Retrieval · Computer Science 2017-06-06 Xinyu Fu, Eugene Ch'ng, Uwe Aickelin, Lanyun Zhang

We present a new machine learning and text information extraction approach to detection of cyber threat events in Twitter that are novel (previously non-extant) and developing (marked by significance with respect to similarity with a…

Information Retrieval · Computer Science 2019-07-19 Avishek Bose, Vahid Behzadan, Carlos Aguirre, William H. Hsu

Novelty detection in text streams is a challenging task that emerges in quite a few different scenarios, ranging from email thread filtering to RSS news feed recommendation on a smartphone. An efficient novelty detection algorithm can save…

Information Retrieval · Computer Science 2014-11-11 Margarita Karkali, Francois Rousseau, Alexandros Ntoulas, Michalis Vazirgiannis

Frequency estimation of elements is an important task for summarizing data streams and machine learning applications. The problem is often addressed by using streaming algorithms with sublinear space data structures. These algorithms allow…

Data Structures and Algorithms · Computer Science 2022-04-05 Nikita Seleznev, Senthil Kumar, C. Bayan Bruss
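The abstract above mentions streaming algorithms with sublinear-space data structures for frequency estimation. A classic example of such a structure is the Count-Min sketch (Cormode and Muthukrishnan). The following is a minimal, generic sketch of that standard structure, not the method of the listed paper. The per-row hashing scheme (SHA-256 salted with the row number) is an illustrative assumption; practical implementations usually draw from a pairwise-independent hash family.

```python
import hashlib


class CountMinSketch:
    """Approximate item frequencies in a fixed depth x width counter table."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        # Memory is width * depth counters, independent of the number of distinct items.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # Assumption for illustration: salt SHA-256 with the row number
        # to obtain a different hash function for each row.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # With non-negative counts, collisions can only inflate counters, so the
        # row minimum is the tightest estimate and never undercounts.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


cms = CountMinSketch(width=64, depth=4)
for token in ["news", "event", "news", "stream", "news"]:
    cms.add(token)
print(cms.estimate("news"))  # at least 3
```

The width sets the additive error and the depth sets the failure probability. With width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉, each estimate exceeds the true count by at most εN with probability at least 1 − δ, where N is the total count added.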

The accelerating pace of scientific publication makes it difficult to identify truly original research among incremental work. We propose a framework for estimating the conceptual novelty of research papers by combining semantic…

Machine Learning · Computer Science 2026-01-06 Zhengxu Yan, Han Li, Yuming Feng

Big data streams are possibly one of the most essential underlying notions. However, data streams are often challenging to handle owing to their rapid pace and limited information lifetime. It is difficult to collect and communicate stream…

Machine Learning · Computer Science 2022-03-03 Christos Karras, Aristeidis Karras, Spyros Sioutas

Twitter is one of the most popular microblogging services in the world. The great amount of information within Twitter makes it an important information channel for people to learn and share news. The Twitter hashtag is a popular feature that…

Social and Information Networks · Computer Science 2018-05-01 Shih-Feng Yang, Julia Taylor Rayz

We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. Contrary to state-of-the-art streaming frequency estimation algorithms, which heavily rely on random…

Data Structures and Algorithms · Computer Science 2022-07-19 Dimitris Bertsimas, Vassilis Digalakis

We propose a new (theoretical) computational model for the study of massive data processing with limited computational resources. Our model measures the complexity of reading the very large data sets in terms of the data size N and analyzes…

Data Structures and Algorithms · Computer Science 2020-03-09 Jianer Chen, Ying Guo, Qin Huang

Since datasets with annotation for novelty at the document and/or word level are not easily available, we present a simulation framework that allows us to create different textual datasets in which we control the way novelty occurs. We also…

Machine Learning · Computer Science 2019-09-12 Clément Christophe, Julien Velcin, Jairo Cugliari, Philippe Suignard, Manel Boumghar

In recent years, people spend a lot of time on social networks. They use social networks as a place to comment on personal or public events. Thus, a large amount of information is generated and shared daily in these networks. Using such a…

Social and Information Networks · Computer Science 2020-10-05 Parinaz Rahimizadeh, Mohammad Javad Shayegan

Cold-start is a very common and still open problem in the Recommender Systems literature. Since cold-start items do not have any interactions, collaborative algorithms are not applicable. One of the main strategies is to use pure or hybrid…

Machine Learning · Computer Science 2019-07-16 Cesare Bernardis, Maurizio Ferrari Dacrema, Paolo Cremonesi

We focus in this paper on dataset reduction techniques for use in k-nearest neighbor classification. In such a context, feature and prototype selections have always been independently treated by the standard storage reduction algorithms.…

Machine Learning · Computer Science 2013-01-18 Marc Sebban, Richard Nock

Unsupervised discovery of stories with correlated news articles in real-time helps people digest massive news streams without expensive human annotations. A common approach of the existing studies for unsupervised online story discovery is…

Information Retrieval · Computer Science 2023-05-05 Susik Yoon, Dongha Lee, Yunyi Zhang, Jiawei Han

For applied intelligence, utility-driven pattern discovery algorithms can identify insightful and useful patterns in databases. However, in these techniques for pattern discovery, the number of patterns can be huge, and the user is often…

Databases · Computer Science 2022-06-14 Jinbao Miao, Wensheng Gan, Shicheng Wan, Yongdong Wu, Philippe Fournier-Viger

Point patterns are sets or multi-sets of unordered elements that can be found in numerous data sources. However, in data analysis tasks such as classification and novelty detection, appropriate statistical models for point pattern data have…

Machine Learning · Computer Science 2017-02-09 Ba-Ngu Vo, Quang N. Tran, Dinh Phung, Ba-Tuong Vo

This study proposes a novel similarity metric that increases the speed of comparison operations. The new metric is also suitable for distance-based operations among strings. Most of the simple calculation methods, such as string length…

Data Structures and Algorithms · Computer Science 2014-01-28 Sadi Evren Seker, Oguz Altun, Uğur Ayan, Cihan Mert

We consider novelty detection in time series with unknown and nonparametric probability structures. A deep learning approach is proposed to causally extract an innovations sequence consisting of novelty samples statistically independent of…

Machine Learning · Computer Science 2022-10-25 Xinyi Wang, Mei-jen Lee, Qing Zhao, Lang Tong

Minwise hashing is a fundamental hashing algorithm and one of the most successful in the literature. Recent advances based on the idea of densification [Proc:OneHashLSH_ICML14, Proc:Shrivastava_UAI14] have shown that it is possible to…

Data Structures and Algorithms · Computer Science 2017-03-16 Anshumali Shrivastava
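For readers unfamiliar with minwise hashing, the following minimal sketch shows the classical k-hash version. It is not the densified one-permutation scheme that the abstract refers to. Each of the `num_hashes` salted SHA-1 functions stands in for a random permutation, which is an assumption made for illustration. The fraction of signature slots on which two sets agree estimates their Jaccard similarity.

```python
import hashlib


def minhash_signature(tokens: set[str], num_hashes: int = 128) -> list[int]:
    """Return a MinHash signature for a non-empty set of tokens."""
    if not tokens:
        raise ValueError("MinHash is undefined for an empty set")
    signature = []
    for i in range(num_hashes):
        # Assumption: a salted hash simulates one random permutation;
        # the slot keeps the smallest hash value over the whole set.
        signature.append(
            min(
                int.from_bytes(hashlib.sha1(f"{i}:{t}".encode("utf-8")).digest()[:8], "big")
                for t in tokens
            )
        )
    return signature


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    # For a random permutation, P[min(A) == min(B)] = |A ∩ B| / |A ∪ B|,
    # so the fraction of agreeing slots is an unbiased Jaccard estimate.
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


doc_a = set("the quick brown fox jumps".split())
doc_b = set("the quick brown dog jumps".split())
est = estimate_jaccard(minhash_signature(doc_a), minhash_signature(doc_b))
print(f"estimated {est:.2f}, exact {len(doc_a & doc_b) / len(doc_a | doc_b):.2f}")
```

Computing `num_hashes` hash values per token is the cost that one-permutation hashing and its densification aim to reduce.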

In this era of Big Data, due to the rapid exchange of information on the web, words are being used to denote newer meanings, causing linguistic shift. With the recent availability of large amounts of digitized texts, an automated analysis…

Computation and Language · Computer Science 2018-12-17 Abhik Jana, Animesh Mukherjee, Pawan Goyal