Related papers: Scalable Blocking for Very Large Databases

200 papers

Clustering analysis is of substantial significance for data mining. The properties of big data raise higher demand for more efficient and economical distributed clustering methods. However, existing distributed clustering methods mainly…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-03 Yifeng Xiao , Jiang Xue , Deyu Meng

Active search is the process of identifying high-value data points in a large and often high-dimensional parameter space that can be expensive to evaluate. Traditional active search techniques like Bayesian optimization trade off…

Machine Learning · Computer Science 2020-07-21 Vivek Myers , Peyton Greenside

Modern high load applications store data using multiple database instances. Such an architecture requires data consistency, and it is important to ensure even distribution of data among nodes. Load balancing is used to achieve these goals.…

Databases · Computer Science 2022-11-03 Alexander Slesarev , Mikhail Mikhailov , George Chernishev

To accommodate the needs of large-scale distributed P2P systems, scalable data management strategies are required, allowing applications to efficiently cope with continuously growing, highly distributed data. This paper addresses the…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-09-30 Bogdan Nicolae , Gabriel Antoniu , Luc Bougé

Similarity search is critical for many database applications, including the increasingly popular online services for Content-Based Multimedia Retrieval (CBMR). These services, which include image search engines, must handle an overwhelming…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-10-16 Thiago S. F. X. Teixeira , George Teodoro , Eduardo Valle , Joel H. Saltz

Current deep learning architectures are growing larger in order to learn from complex datasets. These architectures require giant matrix multiplication operations to train millions of parameters. Conversely, there is another growing trend…

Machine Learning · Statistics 2016-12-06 Ryan Spring , Anshumali Shrivastava

De-duplication, the identification of distinct records referring to the same real-world entity, is a well-known challenge in data integration. Since very large datasets prohibit the comparison of every pair of records, blocking has been…

Databases · Computer Science 2011-11-17 Anish Das Sarma , Ankur Jain , Ashwin Machanavajjhala , Philip Bohannon

Entity matching seeks to identify data records over one or multiple data sources that refer to the same real-world entity. Virtually every entity matching task on large datasets requires blocking, a step that reduces the number of record…

Databases · Computer Science 2019-12-10 Wei Zhang , Hao Wei , Bunyamin Sisman , Xin Luna Dong , Christos Faloutsos , David Page

Modern cloud databases present scaling as a binary decision: scale-out by adding nodes or scale-up by increasing per-node resources. This one-dimensional view is limiting because database performance, cost, and coordination overhead emerge…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-05 Shahir Abdullah , Syed Rohit Zaman

High-performance computing (HPC) requires resilience techniques such as checkpointing in order to tolerate failures in supercomputers. As the number of nodes and memory in supercomputers keeps on increasing, the size of checkpoint data also…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-06-13 Kai Keller , Leonardo Bautista Gomez

Most cloud services and distributed applications rely on hashing algorithms that allow dynamic scaling of a robust and efficient hash table. Examples include AWS, Google Cloud and BitTorrent. Consistent and rendezvous hashing are algorithms…

Data Structures and Algorithms · Computer Science 2022-05-17 Mike Heddes , Igor Nunes , Tony Givargis , Alexandru Nicolau , Alex Veidenbaum

Duplication, whether exact or partial, is a common issue in many datasets. In clinical notes data, duplication (and near duplication) can arise for many reasons, such as the pervasive use of templates, copy-pasting, or notes being generated…

Databases · Computer Science 2017-04-20 Sanjeev Shenoy , Tsung-Ting Kuo , Rodney Gabriel , Julian McAuley , Chun-Nan Hsu

We present a new algorithm for the widely used density-based clustering method DBscan. Our algorithm computes the DBscan-clustering in $O(n\log n)$ time in $\mathbb{R}^2$, irrespective of the scale parameter $\varepsilon$ (and assuming the…

Computational Geometry · Computer Science 2017-03-01 Mark de Berg , Ade Gunawan , Marcel Roeloffzen

Many real-world applications operate on dynamic graphs that undergo rapid changes in their topological structure over time. However, it is challenging to design dynamic algorithms that are capable of supporting such graph changes…

Databases · Computer Science 2022-04-26 Muhammad Farhan , Qing Wang , Henning Koehler

In this paper, we introduce DLB, a Deep Learning based load Balancing mechanism, to effectively address the data skew problem. The key idea of DLB is to replace hash functions in the load balancing mechanisms with deep learning models,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-14 Xiaoke Zhu , Qi Zhang , Taining Cheng , Ling Liu , Wei Zhou , Jing He

Hashing methods map similar data to binary hash codes with small Hamming distance, and they have received broad attention due to their low storage cost and fast retrieval speed. However, the existing limitations make the present algorithms…

Computer Vision and Pattern Recognition · Computer Science 2016-09-29 Shifeng Zhang , Jianmin Li , Jinma Guo , Bo Zhang

In the world of deep learning, Transformer models have become very significant, leading to improvements in many areas from understanding language to recognizing images, covering a wide range of applications. Despite their success, the…

Machine Learning · Computer Science 2024-07-19 Ghadeer Jaradat , Mohammed Tolba , Ghada Alsuhli , Hani Saleh , Mahmoud Al-Qutayri , Thanos Stouraitis , Baker Mohammad

Case-based Reasoning (CBR) on high-dimensional and heterogeneous data is a trending yet challenging and computationally expensive task in the real world. A promising approach is to obtain low-dimensional hash codes representing cases and…

Information Retrieval · Computer Science 2022-06-30 Qi Zhang , Liang Hu , Chongyang Shi , Ke Liu , Longbing Cao

Supervised hashing methods are widely used for nearest neighbor search in computer vision applications. Most state-of-the-art supervised hashing approaches employ batch-learners. Unfortunately, batch-learning strategies can be inefficient…

Computer Vision and Pattern Recognition · Computer Science 2015-11-11 Fatih Cakir , Sarah Adel Bargal , Stan Sclaroff

Blockchain uses the idea of storing transaction data in the form of a distributed ledger wherein each node in the network stores a current copy of the sequence of transactions in the form of a hash chain. This requirement of storing the…

Information Theory · Computer Science 2018-01-09 Ravi Kiran Raman , Lav R. Varshney