Related papers: Effective and Efficient Variable-Length Data Serie…

Scalable Data Series Subsequence Matching with ULISSE

Data series similarity search is an important operation and at the core of several analysis tasks and applications related to data series collections. Despite the fact that data series indexes enable fast similarity search, all existing…

Databases · Computer Science 2020-09-23 Michele Linardi, Themis Palpanas

Data Series Indexing Gone Parallel

Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance required for interactive…

Databases · Computer Science 2020-09-04 Botao Peng

Matrix Profile Goes MAD: Variable-Length Motif And Discord Discovery in Data Series

In the last fifteen years, data series motif and discord discovery have emerged as two useful and well-used primitives for data series mining, with applications to many domains, including robotics, entomology, seismology, medicine, and…

Databases · Computer Science 2020-09-01 Michele Linardi, Yan Zhu, Themis Palpanas, Eamonn Keogh

The Lernaean Hydra of Data Series Similarity Search: An Experimental Evaluation of the State of the Art

Increasingly large data series collections are becoming commonplace across many different domains and applications. A key operation in the analysis of data series collections is similarity search, which has attracted lots of attention and…

Databases · Computer Science 2020-06-23 Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas, Houda Benbrahim

Scaling Active Search using Linear Similarity Functions

Active Search has become an increasingly useful tool in information retrieval problems where the goal is to discover as many target elements as possible using only limited label queries. With the advent of big data, there is a growing…

Machine Learning · Statistics 2017-08-23 Sibi Venkatesan, James K. Miller, Jeff Schneider, Artur Dubrawski

Return of the Lernaean Hydra: Experimental Evaluation of Data Series Approximate Similarity Search

Data series are a special type of multidimensional data present in numerous domains, where similarity search is a key operation that has been extensively studied in the data series literature. In parallel, the multidimensional community has…

Databases · Computer Science 2020-06-23 Karima Echihabi, Kostas Zoumpatianos, Themis Palpanas, Houda Benbrahim

Indexing Metric Spaces for Exact Similarity Search

With the continued digitization of societal processes, we are seeing an explosion in available data. This is referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when…

Databases · Computer Science 2022-05-24 Lu Chen, Yunjun Gao, Xuan Song, Zheng Li, Yifan Zhu, Xiaoye Miao, Christian S. Jensen

Efficient Non-Learning Similar Subtrajectory Search

Similar subtrajectory search is a finer-grained operator that can better capture the similarities between one query trajectory and a portion of a data trajectory than the traditional similar trajectory search, which requires the two checked…

Databases · Computer Science 2023-08-09 Jiabao Jin, Peng Cheng, Lei Chen, Xuemin Lin, Wenjie Zhang

In this vision paper, we propose a shift in perspective for improving the effectiveness of similarity search. Rather than focusing solely on enhancing the data quality, particularly machine learning-generated embeddings, we advocate for a…

Databases · Computer Science 2023-08-03 Renzhi Wu, Jingfan Meng, Jie Jeff Xu, Huayi Wang, Kexin Rong

LLM-assisted Vector Similarity Search

As data retrieval demands become increasingly complex, traditional search methods often fall short in addressing nuanced and conceptual queries. Vector similarity search has emerged as a promising technique for finding semantically similar…

Artificial Intelligence · Computer Science 2024-12-31 Md Riyadh, Muqi Li, Felix Haryanto Lie, Jia Long Loh, Haotian Mi, Sayam Bohra

Extreme-scale many-against-many protein similarity search

Similarity search is one of the most fundamental computations that are regularly performed on ever-increasing protein datasets. Scalability is of paramount importance for uncovering novel phenomena that occur at very large scales. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-06 Oguz Selvitopi, Saliya Ekanayake, Giulia Guidi, Muaaz G. Awan, Georgios A. Pavlopoulos, Ariful Azad, Nikos Kyrpides, Leonid Oliker, Katherine Yelick, Aydın Buluç

Graph-Based Algorithms for Diverse Similarity Search

Nearest neighbor search is a fundamental data structure problem with many applications in machine learning, computer vision, recommendation systems and other fields. Although the main objective of the data structure is to quickly report…

Data Structures and Algorithms · Computer Science 2025-02-20 Piyush Anand, Piotr Indyk, Ravishankar Krishnaswamy, Sepideh Mahabadi, Vikas C. Raykar, Kirankumar Shiragur, Haike Xu

The Extended Edit Distance Metric

Similarity search is an important problem in information retrieval. This similarity is based on a distance. Symbolic representation of time series has attracted many researchers recently, since it reduces the dimensionality of these high…

Information Retrieval · Computer Science 2010-06-18 Muhammad Marwan Muhammad Fuad, Pierre-François Marteau

Fishing in the Stream: Similarity Search over Endless Data

Similarity search is the task of retrieving data items that are similar to a given query. In this paper, we introduce the time-sensitive notion of similarity search over endless data-streams (SSDS), which takes into account data quality and…

Information Retrieval · Computer Science 2017-08-08 Naama Kraus, David Carmel, Idit Keidar

Uncertain Time-Series Similarity: Return to the Basics

In the last years there has been a considerable increase in the availability of continuous sensor measurements in a wide range of application domains, such as Location-Based Services (LBS), medical monitoring systems, manufacturing plants…

Databases · Computer Science 2015-03-20 Michele Dallachiesa, Besmira Nushi, Katsiaryna Mirylenka, Themis Palpanas

Inferring the Most Similar Variable-length Subsequences between Multidimensional Time Series

Finding the most similar subsequences between two multidimensional time series has many applications: e.g. capturing dependency in stock market or discovering coordinated movement of baboons. Considering one pattern occurring in one time…

Machine Learning · Computer Science 2025-05-19 Thanadej Rattanakornphan, Piyanon Charoenpoonpanich, Chainarong Amornbunchornvej

DaiSy: A Library for Scalable Data Series Similarity Search

Exact similarity search over large collections of data series is a fundamental operation in modern applications, yet existing solutions are often fragmented, specialized, or tailored to specific execution environments. In this paper, we…

Databases · Computer Science 2026-03-31 Francesca Del Gaudio, Manos Chatzakis, Gayathiri Ravendirane, Botao Peng, Themis Palpanas

Set Similarity Search for Skewed Data

Set similarity join, as well as the corresponding indexing problem set similarity search, are fundamental primitives for managing noisy or uncertain data. For example, these primitives can be used in data cleaning to identify different…

Data Structures and Algorithms · Computer Science 2018-04-10 Samuel McCauley, Jesper W. Mikkelsen, Rasmus Pagh

Combining Visual Analytics and Content Based Data Retrieval Technology for Efficient Data Analysis

One of the most useful techniques to help visual data analysis systems is interactive filtering (brushing). However, visualization techniques often suffer from overlap of graphical items and multiple attributes complexity, making visual…

Graphics · Computer Science 2015-07-07 Jose Rodrigues, Luciana Romani, Agma Traina, Caetano Traina

Comparative Visual Analytics for Assessing Medical Records with Sequence Embedding

Machine learning for data-driven diagnosis has been actively studied in medicine to provide better healthcare. Supporting analysis of a patient cohort similar to a patient under treatment is a key task for clinicians to make decisions with…

Medical Physics · Physics 2020-03-25 Rongchen Guo, Takanori Fujiwara, Yiran Li, Kelly M. Lima, Soman Sen, Nam K. Tran, Kwan-Liu Ma