Related papers: TLP: A Deep Learning-based Cost Model for Tensor P…

TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning

Deep learning (DL) compilers rely on cost models and auto-tuning to optimize tensor programs for target hardware. However, existing approaches depend on large offline datasets, incurring high collection costs and offering suboptimal…

Machine Learning · Computer Science 2026-04-15 Chaoyao Shen, Linfeng Jiang, Yixian Shen, Tao Xu, Guoqing Li, Anuj Pathania, Andy D. Pimentel, Meng Zhang

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Ansor: Generating High-Performance Tensor Programs for Deep Learning

High-performance tensor programs are crucial to guarantee efficient execution of deep neural networks. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously challenging.…

Machine Learning · Computer Science 2023-10-17 Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, Ion Stoica

Pruner: A Draft-then-Verify Exploration Mechanism to Accelerate Tensor Program Tuning

Tensor program tuning is essential for the efficient deployment of deep neural networks. Search-based approaches have demonstrated scalability and effectiveness in automatically finding high-performance programs for specific hardware.…

Machine Learning · Computer Science 2025-04-10 Liang Qiao, Jun Shi, Xiaoyu Hao, Xi Fang, Sen Zhang, Minfan Zhao, Ziqi Zhu, Junshi Chen, Hong An, Xulong Tang, Bing Li, Honghui Yuan, Xinyang Wang

A Learned Performance Model for Tensor Processing Units

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for…

Performance · Computer Science 2021-03-19 Samuel J. Kaufman, Phitchaya Mangpo Phothilimthana, Yanqi Zhou, Charith Mendis, Sudip Roy, Amit Sabne, Mike Burrows

Learned Token Pruning for Transformers

Deploying transformer models in practice is challenging due to their inference cost, which scales quadratically with input sequence length. To address this, we present a novel Learned Token Pruning (LTP) method which adaptively removes…

Computation and Language · Computer Science 2022-06-06 Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer

MPLP: Massively Parallelized Lazy Planning

Lazy search algorithms have been developed to efficiently solve planning problems in domains where the computational effort is dominated by the cost of edge evaluation. The existing algorithms operate by intelligently balancing…

Robotics · Computer Science 2023-01-16 Shohin Mukherjee, Sandip Aine, Maxim Likhachev

Auto-tuning TensorFlow Threading Model for CPU Backend

TensorFlow is a popular deep learning framework used by data scientists to solve a wide range of machine learning and deep learning problems such as image classification and speech recognition. It also operates at a large scale and in…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-06 Niranjan Hasabnis

IPM-LSTM: A Learning-Based Interior Point Method for Solving Nonlinear Programs

Solving constrained nonlinear programs (NLPs) is of great importance in various domains such as power systems, robotics, and wireless communication networks. One widely used approach for addressing NLPs is the interior point method (IPM).…

Optimization and Control · Mathematics 2024-10-22 Xi Gao, Jinxin Xiong, Akang Wang, Qihong Duan, Jiang Xue, Qingjiang Shi

MetaTune: Meta-Learning Based Cost Model for Fast and Efficient Auto-tuning Frameworks

Deep learning compiler frameworks are gaining ground as a more portable back-end for deep learning applications on increasingly diverse hardware. However, they face the daunting challenge of matching performance offered by hand-tuned…

Machine Learning · Computer Science 2021-02-10 Jaehun Ryu, Hyojin Sung

vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving

Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests. This surge in demand poses significant challenges in optimizing throughput and latency while keeping costs manageable. The Key-Value…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-23 Jiale Xu, Rui Zhang, Cong Guo, Weiming Hu, Zihan Liu, Feiyang Wu, Yu Feng, Shixuan Sun, Changxu Shao, Yuhong Guo, Junping Zhao, Ke Zhang, Minyi Guo, Jingwen Leng

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs

As deep learning models nowadays are widely adopted by both cloud services and edge devices, reducing the latency of deep learning model inferences becomes crucial to provide efficient model serving. However, it is challenging to develop…

Machine Learning · Computer Science 2023-02-16 Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko

CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning

Utilizing large language models (LLMs) for tool planning has emerged as a promising avenue for developing general AI systems, where LLMs automatically schedule external tools (e.g., vision models) to tackle complex tasks based on task…

Artificial Intelligence · Computer Science 2025-07-15 Duo Wu, Jinghe Wang, Yuan Meng, Yanning Zhang, Le Sun, Zhi Wang

Step-TP: A Grounded, Step-Level Dataset with Chain-of-Thought Reasoning for LLM-Guided Tensor Program Optimization

Despite the strong reasoning capabilities of large language models (LLMs), optimizing the execution efficiency of tensor programs remains challenging due to the need for precise, composable transformation decisions. Recent LLM-guided…

Machine Learning · Computer Science 2026-05-26 Mengfan Liu, Da Zheng, Junwei Su, Chuan Wu

Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation

Auto-scheduling for tensor programs is a process where a search algorithm automatically explores candidate schedules (program transformations) for a given program on a target hardware platform to improve its performance. However, this can be…

Machine Learning · Computer Science 2022-09-08 Perry Gibson, José Cano

tnGPS: Discovering Unknown Tensor Network Structure Search Algorithms via Large Language Models (LLMs)

Tensor networks are efficient for extremely high-dimensional representation, but their model selection, known as tensor network structure search (TN-SS), is a challenging problem. Although several works have targeted TN-SS, most existing…

Machine Learning · Computer Science 2024-06-04 Junhua Zeng, Chao Li, Zhun Sun, Qibin Zhao, Guoxu Zhou

Challenges and Opportunities of Using Transformer-Based Multi-Task Learning in NLP Through ML Lifecycle: A Survey

The increasing adoption of natural language processing (NLP) models across industries has led to practitioners' need for machine learning systems to handle these models efficiently, from training to serving them in production. However,…

Computation and Language · Computer Science 2023-08-17 Lovre Torbarina, Tin Ferkovic, Lukasz Roguski, Velimir Mihelcic, Bruno Sarlija, Zeljko Kraljevic

LOP: Learning Optimal Pruning for Efficient On-Demand MLLMs Scaling

Structural pruning techniques are essential for deploying multimodal large language models (MLLMs) across various hardware platforms, from edge devices to cloud servers. However, current pruning methods typically determine optimal…

Computer Vision and Pattern Recognition · Computer Science 2025-06-17 Zhihan Zhang, Xiang Pan, Hongchen Wei, Zhenzhong Chen

QiMeng-TensorOp: Automatically Generating High-Performance Tensor Operators with Hardware Primitives

Computation-intensive tensor operators constitute over 90% of the computations in Large Language Models (LLMs) and Deep Neural Networks. Automatically and efficiently generating high-performance tensor operators with hardware primitives is…

Machine Learning · Computer Science 2025-05-13 Xuzhi Zhang, Shaohui Peng, Qirui Zhou, Yuanbo Wen, Qi Guo, Ruizhi Chen, Xinguo Zhu, Weiqiang Xiong, Haixin Chen, Congying Ma, Ke Gao, Chen Zhao, Yanjun Wu, Yunji Chen, Ling Li

A Two Level Neural Approach Combining Off-Chip Prediction with Adaptive Prefetch Filtering

To alleviate the performance and energy overheads of contemporary applications with large data footprints, we propose the Two Level Perceptron (TLP) predictor, a neural mechanism that effectively combines predicting whether an access will…

Hardware Architecture · Computer Science 2025-11-04 Alexandre Valentin Jamet, Georgios Vavouliotis, Daniel A. Jiménez, Lluc Alvarez, Marc Casas