Related papers: Efficient Compression Technique for Sparse Sets

200 papers

A new line of research uses compression methods to measure the similarity between signals. Two signals are considered similar if one can be compressed significantly when the information of the other is known. The existing compression-based…

Computer Vision and Pattern Recognition · Computer Science 2019-09-30 Tanaya Guha, Rabab K. Ward

The rise of internet has resulted in an explosion of data consisting of millions of articles, images, songs, and videos. Most of this data is high dimensional and sparse. The need to perform an efficient search for similar objects in such…

Data Structures and Algorithms · Computer Science 2016-12-20 Raghav Kulkarni, Rameshwar Pratap

Traditionally, data compression deals with the problem of concisely representing a data source, e.g. a sequence of letters, for the purpose of eventual reproduction (either exact or approximate). In this work we are interested in the case…

Information Theory · Computer Science 2013-12-10 Amir Ingber, Tsachy Weissman

To improve the temporal and spatial storage efficiency, researchers have intensively studied various techniques, including compression and deduplication. Through our evaluation, we find that methods such as photo tags or local features help…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-20 Binqi Zhang, Chen Wang, Bing Bing Zhou, Albert Y. Zomaya

High-energy, large-scale particle colliders in nuclear and high-energy physics generate data at extraordinary rates, reaching up to $1$ terabyte and several petabytes per second, respectively. The development of real-time, high-throughput…

Artificial Intelligence · Computer Science 2024-12-03 Xihaier Luo, Samuel Lurvey, Yi Huang, Yihui Ren, Jin Huang, Byung-Jun Yoon

Electronic information is increasingly often shared among entities without complete mutual trust. To address related security and privacy issues, a few cryptographic techniques have emerged that support privacy-preserving information…

Cryptography and Security · Computer Science 2013-09-23 Carlo Blundo, Emiliano De Cristofaro, Paolo Gasti

Embeddings provide compact representations of signals in order to perform efficient inference in a wide variety of tasks. In particular, random projections are common tools to construct Euclidean distance-preserving embeddings, while…

Data Structures and Algorithms · Computer Science 2019-09-05 Diego Valsesia, Sophie Marie Fosson, Chiara Ravazzi, Tiziano Bianchi, Enrico Magli

Sets have been used for modeling various types of objects (e.g., a document as the set of keywords in it and a customer as the set of the items that she has purchased). Measuring similarity (e.g., Jaccard Index) between sets has been a key…

Social and Information Networks · Computer Science 2022-10-10 Geon Lee, Chanyoung Park, Kijung Shin

Compression-based similarity measures are effectively employed in applications on diverse data types with a basically parameter-free approach. Nevertheless, there are problems in applying these techniques to medium-to-large datasets which…

Machine Learning · Statistics 2012-10-03 Daniele Cerra, Mihai Datcu

The Jaccard index is an important similarity measure for item sets and Boolean data. On large datasets, an exact similarity computation is often infeasible for all item pairs both due to time and space constraints, giving rise to faster…

Data Structures and Algorithms · Computer Science 2021-03-09 Marc Bury, Chris Schwiegelshohn, Mara Sorella

Document sketching using Jaccard similarity has been a workable effective technique in reducing near-duplicates in Web page and image search results, and has also proven useful in file system synchronization, compression and learning…

Data Structures and Algorithms · Computer Science 2014-10-17 Bernhard Haeupler, Mark Manasse, Kunal Talwar

Modern statistical analysis often encounters datasets with large sizes. For these datasets, conventional estimation methods can hardly be used immediately because practitioners often suffer from limited computational resources. In most…

Methodology · Statistics 2023-04-14 Shuyuan Wu, Xuening Zhu, Hansheng Wang

Variational inequalities are an important tool, which includes minimization, saddles, games, fixed-point problems. Modern large-scale and computationally expensive practical applications make distributed methods for solving these problems…

Optimization and Control · Mathematics 2023-03-01 Aleksandr Beznosikov, Alexander Gasnikov

Recently, sparsity has become a key concept in various areas of applied mathematics, computer science, and electrical engineering. One application of this novel methodology is the separation of data, which is composed of two (or more)…

Numerical Analysis · Mathematics 2011-02-23 Gitta Kutyniok

The Jaccard similarity index has often been employed in science and technology as a means to quantify the similarity between two sets. When modified to operate on real-valued values, the Jaccard similarity index can be applied to compare…

Data Analysis, Statistics and Probability · Physics 2024-10-23 Gonzalo Travieso , Alexandre Benatti , Luciano da F. Costa

The most effective dimensionality reduction procedures produce interpretable features from the raw input space while also providing good performance for downstream supervised learning tasks. For many methods, this requires optimizing one or…

Machine Learning · Computer Science 2023-02-22 Leland Barnard, Farwa Ali, Hugo Botha, David T. Jones

A new approach to data compression is developed and applied to multimedia content. This method separates messages into components suitable for both lossless coding and 'lossy' or statistical coding techniques, compressing complex objects by…

Information Theory · Computer Science 2011-12-26 John Scoville

The probability Jaccard similarity was recently proposed as a natural generalization of the Jaccard similarity to measure the proximity of sets whose elements are associated with relative frequencies or probabilities. In combination with a…

Data Structures and Algorithms · Computer Science 2020-10-27 Otmar Ertl

We consider the problem of optimally compressing and caching data across a communication network. Given the data generated at edge nodes and a routing path, our goal is to determine the optimal data compression ratios and caching decisions…

Networking and Internet Architecture · Computer Science 2018-01-25 Jian Li, Faheem Zafari, Don Towsley, Kin K. Leung, Ananthram Swami

Recent results in compressed sensing showed that the optimal subsampling strategy should take into account the sparsity pattern of the signal at hand. This oracle-like knowledge, even though desirable, nevertheless remains elusive in most…

Information Theory · Computer Science 2023-06-28 Simon Ruetz