Related papers: Transfer Learning Across Heterogeneous Features Fo…

Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation

Auto-scheduling for tensor programs is a process where a search algorithm automatically explores candidate schedules (program transformations) for a given program on a target hardware platform to improve its performance. However, this can be…

Machine Learning · Computer Science 2022-09-08 Perry Gibson, José Cano

Pruner: A Draft-then-Verify Exploration Mechanism to Accelerate Tensor Program Tuning

Tensor program tuning is essential for the efficient deployment of deep neural networks. Search-based approaches have demonstrated scalability and effectiveness in automatically finding high-performance programs for specific hardware.…

Machine Learning · Computer Science 2025-04-10 Liang Qiao, Jun Shi, Xiaoyu Hao, Xi Fang, Sen Zhang, Minfan Zhao, Ziqi Zhu, Junshi Chen, Hong An, Xulong Tang, Bing Li, Honghui Yuan, Xinyang Wang

A Learned Performance Model for Tensor Processing Units

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for…

Performance · Computer Science 2021-03-19 Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, Mike Burrows

Tensor Network Structure Search with Program Synthesis

Tensor networks provide a powerful framework for compressing multi-dimensional data. The optimal tensor network structure for a given data tensor depends on both data characteristics and specific optimality criteria, making tensor network…

Computational Engineering, Finance, and Science · Computer Science 2026-03-23 Zheng Guo, Aditya Deshpande, Brian Kiedrowski, Xinyu Wang, Alex Gorodetsky

Ansor: Generating High-Performance Tensor Programs for Deep Learning

High-performance tensor programs are crucial to guarantee efficient execution of deep neural networks. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously challenging.…

Machine Learning · Computer Science 2023-10-17 Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, Ion Stoica

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

ThinResNet: A New Baseline for Structured Convolutional Networks Pruning

Pruning is a compression method which aims to improve the efficiency of neural networks by reducing their number of parameters while maintaining a good performance, thus enhancing the performance-to-cost ratio in nontrivial ways. Of…

Neural and Evolutionary Computing · Computer Science 2023-09-25 Hugo Tessier, Ghouti Boukli Hacene, Vincent Gripon

Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning

Massive data is often considered essential for deep learning applications, but it also incurs significant computational and infrastructural costs. Therefore, dataset pruning (DP) has emerged as an effective way to improve data efficiency by…

Machine Learning · Computer Science 2023-11-21 Yihua Zhang, Yimeng Zhang, Aochuan Chen, Jinghan Jia, Jiancheng Liu, Gaowen Liu, Mingyi Hong, Shiyu Chang, Sijia Liu

FPGA Resource-aware Structured Pruning for Real-Time Neural Networks

Neural networks achieve state-of-the-art performance in image classification, speech recognition, scientific analysis and many more application areas. Due to the high computational complexity and memory footprint of neural networks, various…

Hardware Architecture · Computer Science 2025-04-21 Benjamin Ramhorst, Vladimir Loncar, George A. Constantinides

Top-Tuning: a study on transfer learning for an efficient alternative to fine tuning for image classification with fast kernel methods

The impressive performance of deep learning architectures is associated with a massive increase in model complexity. Millions of parameters need to be tuned, with training and inference time scaling accordingly, together with energy…

Machine Learning · Computer Science 2023-11-10 Paolo Didier Alfano, Vito Paolo Pastore, Lorenzo Rosasco, Francesca Odone

Target Aware Network Architecture Search and Compression for Efficient Knowledge Transfer

Transfer Learning enables Convolutional Neural Networks (CNN) to acquire knowledge from a source domain and transfer it to a target domain, where collecting large-scale annotated examples is time-consuming and expensive. Conventionally,…

Computer Vision and Pattern Recognition · Computer Science 2024-01-25 S. H. Shabbeer Basha, Debapriya Tula, Sravan Kumar Vinakota, Shiv Ram Dubey

HCE: Improving Performance and Efficiency with Heterogeneously Compressed Neural Network Ensemble

Ensemble learning has gained attention in recent deep learning research as a way to further boost the accuracy and generalizability of deep neural network (DNN) models. Recent ensemble training methods explore different training algorithms or…

Machine Learning · Computer Science 2023-01-20 Jingchi Zhang, Huanrui Yang, Hai Li

UNSEEN: Enhancing Dataset Pruning from a Generalization Perspective

The growing scale of datasets in deep learning has introduced significant computational challenges. Dataset pruning addresses this challenge by constructing a compact but informative coreset from the full dataset with comparable…

Computer Vision and Pattern Recognition · Computer Science 2025-11-19 Furui Xu, Shaobo Wang, Jiajun Zhang, Chenghao Sun, Haixiang Tang, Linfeng Zhang

Parameter-Efficient Transfer Learning with Diff Pruning

While task-specific finetuning of pretrained networks has led to significant empirical advances in NLP, the large size of networks makes finetuning difficult to deploy in multi-task, memory-constrained settings. We propose diff pruning as a…

Computation and Language · Computer Science 2021-06-10 Demi Guo, Alexander M. Rush, Yoon Kim

Subset Sampling For Progressive Neural Network Learning

Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data. While this approach exempts users from the manual task of designing…

Machine Learning · Computer Science 2020-05-26 Dat Thanh Tran, Moncef Gabbouj, Alexandros Iosifidis

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis

Single-Program-Multiple-Data (SPMD) parallelism has recently been adopted to train large deep neural networks (DNNs). Few studies have explored its applicability on heterogeneous clusters, to fully exploit available resources for large…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-12 Shiwei Zhang, Lansong Diao, Chuan Wu, Zongyan Cao, Siyu Wang, Wei Lin

Efficient Conditional Pre-training for Transfer Learning

Almost all state-of-the-art neural networks for computer vision tasks are trained by (1) pre-training on a large-scale dataset and (2) finetuning on the target dataset. This strategy helps reduce dependence on the target dataset and…

Computer Vision and Pattern Recognition · Computer Science 2021-11-22 Shuvam Chakraborty, Burak Uzkent, Kumar Ayush, Kumar Tanmay, Evan Sheehan, Stefano Ermon

Self-Balancing Gradient Allocation for Heterogeneity-Aware Feature Generation in Click-Through Rate Prediction

Generative pre-training via discrete diffusion provides dense reconstruction supervision across all feature fields simultaneously, mitigating representation collapse from data sparsity in CTR prediction. However, all existing generative CTR…

Information Retrieval · Computer Science 2026-05-26 Moyu Zhang, Yun Chen, Yujun Jin, Jinxin Hu, Yu Zhang, Xiaoyi Zeng

TensorSocket: Shared Data Loading for Deep Learning Training

Training deep learning models is a repetitive and resource-intensive process. Data scientists often train several models before landing on a set of parameters (e.g., hyper-parameter tuning) and model architecture (e.g., neural architecture…

Machine Learning · Computer Science 2025-08-04 Ties Robroek, Neil Kim Nielsen, Pınar Tözün

Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning

High-performance deep learning depends on efficient tensor programs. In recent years, automatic tensor program optimization, also known as tensor compilation, has emerged as the primary approach to generating efficient tensor programs.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-18 Hangda Liu, Boyu Diao, Yu Yang, Wenxin Chen, Xiaohui Peng, Yongjun Xu