Related papers: Comprehensive and Efficient Workload Compression

Joint Data Compression and Caching: Approaching Optimality with Guarantees

We consider the problem of optimally compressing and caching data across a communication network. Given the data generated at edge nodes and a routing path, our goal is to determine the optimal data compression ratios and caching decisions…

Networking and Internet Architecture · Computer Science · 2018-01-25 · Jian Li, Faheem Zafari, Don Towsley, Kin K. Leung, Ananthram Swami

Forming Coordinated Teams that Balance Task Coverage and Expert Workload

We study a new formulation of the team-formation problem, where the goal is to form teams to work on a given set of tasks requiring different skills. Deviating from the classic problem setting where one is asking to cover all skills of each…

Social and Information Networks · Computer Science · 2025-03-11 · Karan Vombatkere, Evimaria Terzi, Aristides Gionis

Efficient Approximation Algorithms for Optimal Large-scale Network Monitoring

The growing amount of applications that generate vast amount of data in short time scales render the problem of partial monitoring, coupled with prediction, a rather fundamental one. We study the aforementioned canonical problem under the…

Data Structures and Algorithms · Computer Science · 2016-08-02 · Michalis Kallitsis, Stilian Stoev, George Michailidis

Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain

Real-world data typically contain repeated and periodic patterns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appropriate basis (e.g., Fourier, Wavelets, etc.). However, distance…

Machine Learning · Statistics · 2014-05-26 · Michail Vlachos, Nikolaos Freris, Anastasios Kyrillidis

Adaptive Sampling Strategies to Construct Equitable Training Datasets

In domains ranging from computer vision to natural language processing, machine learning models have been shown to exhibit stark disparities, often performing worse for members of traditionally underserved groups. One factor contributing to…

Machine Learning · Computer Science · 2022-02-04 · William Cai, Ro Encarnacion, Bobbie Chern, Sam Corbett-Davies, Miranda Bogen, Stevie Bergman, Sharad Goel

On Computing Compression Trees for Data Collection in Sensor Networks

We address the problem of efficiently gathering correlated data from a wired or a wireless sensor network, with the aim of designing algorithms with provable optimality guarantees, and understanding how close we can get to the known…

Networking and Internet Architecture · Computer Science · 2009-08-03 · Jian Li, Amol Deshpande, Samir Khuller

Compressed Representations of Conjunctive Query Results

Relational queries, and in particular join queries, often generate large output results when executed over a huge dataset. In such cases, it is often infeasible to store the whole materialized output if we plan to reuse it further down a…

Databases · Computer Science · 2018-03-28 · Shaleen Deep, Paraschos Koutris

Randomized Composable Core-sets for Distributed Submodular Maximization

An effective technique for solving optimization problems over massive data sets is to partition the data into smaller pieces, solve the problem on each piece and compute a representative solution from it, and finally obtain a solution…

Data Structures and Algorithms · Computer Science · 2015-06-23 · Vahab Mirrokni, Morteza Zadimoghaddam

Compressed and Penalized Linear Regression

Modern applications require methods that are computationally feasible on large datasets but also preserve statistical efficiency. Frequently, these two concerns are seen as contradictory: approximation methods that enable computation are…

Methodology · Statistics · 2021-06-11 · Darren Homrighausen, Daniel J. McDonald

Quality-Assured Synchronized Task Assignment in Crowdsourcing

With the rapid development of crowdsourcing platforms that aggregate the intelligence of Internet workers, crowdsourcing has been widely utilized to address problems that require human cognitive abilities. Considering great dynamics of…

Databases · Computer Science · 2018-06-05 · Jiayang Tu, Peng Cheng, Lei Chen

Constant Approximation Algorithm for Non-Uniform Capacitated Multi-Item Lot-Sizing via Strong Covering Inequalities

We study the non-uniform capacitated multi-item lot-sizing problem. In this problem, there is a set of demands over a planning horizon of $T$ time periods and all demands must be satisfied on time. We can place an order at the…

Data Structures and Algorithms · Computer Science · 2016-10-10 · Shi Li

Greedy Column Subset Selection: New Bounds and Distributed Algorithms

The problem of column subset selection has recently attracted a large body of research, with feature selection serving as one obvious and important application. Among the techniques that have been applied to solve this problem, the greedy…

Data Structures and Algorithms · Computer Science · 2021-11-16 · Jason Altschuler, Aditya Bhaskara, Gang Fu, Vahab Mirrokni, Afshin Rostamizadeh, Morteza Zadimoghaddam

Achieving Representative Data via Convex Hull Feasibility Sampling Algorithms

Sampling biases in training data are a major source of algorithmic biases in machine learning systems. Although there are many methods that attempt to mitigate such algorithmic biases during training, the most direct and obvious way is…

Machine Learning · Statistics · 2022-04-15 · Laura Niss, Yuekai Sun, Ambuj Tewari

Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds

We present an efficient coresets-based neural network compression algorithm that sparsifies the parameters of a trained fully-connected neural network in a manner that provably approximates the network's output. Our approach is based on an…

Machine Learning · Computer Science · 2019-05-21 · Cenk Baykal, Lucas Liebenwein, Igor Gilitschenski, Dan Feldman, Daniela Rus

Cheaper and Better: Selecting Good Workers for Crowdsourcing

Crowdsourcing provides a popular paradigm for data collection at scale. We study the problem of selecting subsets of workers from a given worker pool to maximize the accuracy under a budget constraint. One natural question is whether we…

Machine Learning · Statistics · 2015-02-04 · Hongwei Li, Qiang Liu

New Approximation Guarantees for The Economic Warehouse Lot Scheduling Problem

In this paper, we present long-awaited algorithmic advances toward the efficient construction of near-optimal replenishment policies for a true inventory management classic, the economic warehouse lot scheduling problem. While this paradigm…

Data Structures and Algorithms · Computer Science · 2026-01-23 · Danny Segev

Revisit Visual Representation in Analytics Taxonomy: A Compression Perspective

Visual analytics have played an increasingly critical role in the Internet of Things, where massive visual signals have to be compressed and fed into machines. But facing such big data and constrained bandwidth capacity, existing…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-17 · Yueyu Hu, Wenhan Yang, Haofeng Huang, Jiaying Liu

Optimal Content Replication and Request Matching in Large Caching Systems

We consider models of content delivery networks in which the servers are constrained by two main resources: memory and bandwidth. In such systems, the throughput crucially depends on how contents are replicated across servers and how the…

Performance · Computer Science · 2018-01-10 · Arpan Mukhopadhyay, Nidhi Hegde, Marc Lelarge

Compression-Based Regularization with an Application to Multi-Task Learning

This paper investigates, from information theoretic grounds, a learning problem based on the principle that any regularity in a given dataset can be exploited to extract compact features from data, i.e., using fewer bits than needed to…

Machine Learning · Statistics · 2018-11-14 · Matías Vera, Leonardo Rey Vega, Pablo Piantanida

Towards Fair Representation: Clustering and Consensus

Consensus clustering, a fundamental task in machine learning and data analysis, aims to aggregate multiple input clusterings of a dataset, potentially based on different non-sensitive attributes, into a single clustering that best…

Machine Learning · Computer Science · 2025-06-18 · Diptarka Chakraborty, Kushagra Chatterjee, Debarati Das, Tien Long Nguyen, Romina Nobahari