Related papers: Distinct counting with a self-learning bitmap
Metric learning has attracted extensive interest for its ability to provide personalized recommendations based on the importance of observed user-item interactions. Current metric learning methods aim to push negative items away from the…
Due to the outstanding capability of capturing underlying data distributions, deep learning techniques have been recently utilized for a series of traditional database problems. In this paper, we investigate the possibilities of utilizing…
Weakly supervised person search aims to jointly detect and match persons with only bounding box annotations. Existing approaches typically focus on improving the features by exploring relations of persons. However, scale variation problem…
Cardinality estimation algorithms receive a stream of elements, with possible repetitions, and return the number of distinct elements in the stream. Such algorithms seek to minimize the required memory and CPU resource consumption at the…
Scale is often seen as a given, disturbing factor in many vision tasks. When doing so it is one of the factors why we need more data during learning. In recent work scale equivariance was added to convolutional neural networks. It was shown…
Even though convolutional neural networks (CNN) has achieved near-human performance in various computer vision tasks, its ability to tolerate scale variations is limited. The popular practise is making the model bigger first, and then train…
This paper considers the problem of cardinality estimation in data stream applications. We present a statistical analysis of probabilistic counting algorithms, focusing on two techniques that use pseudo-random variates to form…
Estimating cardinality, i.e., the number of distinct elements, of a data stream is a fundamental problem in areas like databases, computer networks, and information retrieval. This study delves into a broader scenario where each element…
This paper studies an application of a recurrent neural network to clustering method for the S-P chart: a binary data set used widely in education. As the number of students increases, the S-P chart becomes hard to handle. In order to…
Cardinality estimation is a critical component and a longstanding challenge in modern data warehouses. ByteHouse, ByteDance's cloud-native engine for extensive data analysis in exabyte-scale environments, serves numerous internal…
Many modern applications use computer vision to detect and count objects in massive image collections. However, when the detection task is very difficult or in the presence of domain shifts, the counts may be inaccurate even with…
Most image matching methods perform poorly when encountering large scale changes in images. To solve this problem, firstly, we propose a scale-difference-aware image matching method (SDAIM) that reduces image scale differences before local…
The recent breakthrough achieved by contrastive learning accelerates the pace for deploying unsupervised training on real-world data applications. However, unlabeled data in reality is commonly imbalanced and shows a long-tail distribution,…
Online monitoring user cardinalities (or degrees) in graph streams is fundamental for many applications. For example in a bipartite graph representing user-website visiting activities, user cardinalities (the number of distinct visited…
Support Vector Machines (SVMs) are powerful learners that have led to state-of-the-art results in various computer vision problems. SVMs suffer from various drawbacks in terms of selecting the right kernel, which depends on the image…
We consider a variant of online convex optimization in which both the instances (input vectors) and the comparator (weight vector) are unconstrained. We exploit a natural scale invariance symmetry in our unconstrained setting: the…
We provide a differentially private algorithm for producing synthetic data simultaneously useful for multiple tasks: marginal queries and multitask machine learning (ML). A key innovation in our algorithm is the ability to directly handle…
Partitioning a set of elements into an unknown number of mutually exclusive subsets is essential in many machine learning problems. However, assigning elements, such as samples in a dataset or neurons in a network layer, to an unknown and…
Crowd counting is an important vision task, which faces challenges on continuous scale variation within a given scene and huge density shift both within and across images. These challenges are typically addressed using multi-column…
Hashing method maps similar data to binary hashcodes with smaller hamming distance, and it has received a broad attention due to its low storage cost and fast retrieval speed. However, the existing limitations make the present algorithms…