Related papers: TLP: A Deep Learning-based Cost Model for Tensor P…

200 papers

Deep learning (DL) compilers rely on cost models and auto-tuning to optimize tensor programs for target hardware. However, existing approaches depend on large offline datasets, incurring high collection costs and offering suboptimal…

Machine Learning · Computer Science 2026-04-15 Chaoyao Shen, Linfeng Jiang, Yixian Shen, Tao Xu, Guoqing Li, Anuj Pathania, Andy D. Pimentel, Meng Zhang

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

High-performance tensor programs are crucial to guarantee efficient execution of deep neural networks. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously challenging.…

Tensor program tuning is essential for the efficient deployment of deep neural networks. Search-based approaches have demonstrated scalability and effectiveness in automatically finding high-performance programs for specific hardware.…

Machine Learning · Computer Science 2025-04-10 Liang Qiao, Jun Shi, Xiaoyu Hao, Xi Fang, Sen Zhang, Minfan Zhao, Ziqi Zhu, Junshi Chen, Hong An, Xulong Tang, Bing Li, Honghui Yuan, Xinyang Wang

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for…

Deploying transformer models in practice is challenging due to their inference cost, which scales quadratically with input sequence length. To address this, we present a novel Learned Token Pruning (LTP) method which adaptively removes…

Computation and Language · Computer Science 2022-06-06 Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer

Lazy search algorithms have been developed to efficiently solve planning problems in domains where the computational effort is dominated by the cost of edge evaluation. The existing algorithms operate by intelligently balancing…

Robotics · Computer Science 2023-01-16 Shohin Mukherjee, Sandip Aine, Maxim Likhachev

TensorFlow is a popular deep learning framework used by data scientists to solve a wide range of machine learning and deep learning problems such as image classification and speech recognition. It also operates at a large scale and in…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-06 Niranjan Hasabnis

Solving constrained nonlinear programs (NLPs) is of great importance in various domains such as power systems, robotics, and wireless communication networks. One widely used approach for addressing NLPs is the interior point method (IPM).…

Optimization and Control · Mathematics 2024-10-22 Xi Gao, Jinxin Xiong, Akang Wang, Qihong Duan, Jiang Xue, Qingjiang Shi

Deep learning compiler frameworks are gaining ground as a more portable back-end for deep learning applications on increasingly diverse hardware. However, they face the daunting challenge of matching performance offered by hand-tuned…

Machine Learning · Computer Science 2021-02-10 Jaehun Ryu, Hyojin Sung

Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests. This surge in demand poses significant challenges in optimizing throughput and latency while keeping costs manageable. The Key-Value…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-23 Jiale Xu, Rui Zhang, Cong Guo, Weiming Hu, Zihan Liu, Feiyang Wu, Yu Feng, Shixuan Sun, Changxu Shao, Yuhong Guo, Junping Zhao, Ke Zhang, Minyi Guo, Jingwen Leng

As deep learning models nowadays are widely adopted by both cloud services and edge devices, reducing the latency of deep learning model inferences becomes crucial to provide efficient model serving. However, it is challenging to develop…

Machine Learning · Computer Science 2023-02-16 Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko

Utilizing large language models (LLMs) for tool planning has emerged as a promising avenue for developing general AI systems, where LLMs automatically schedule external tools (e.g., vision models) to tackle complex tasks based on task…

Artificial Intelligence · Computer Science 2025-07-15 Duo Wu, Jinghe Wang, Yuan Meng, Yanning Zhang, Le Sun, Zhi Wang

Despite the strong reasoning capabilities of large language models (LLMs), optimizing the execution efficiency of tensor programs remains challenging due to the need for precise, composable transformation decisions. Recent LLM-guided…

Machine Learning · Computer Science 2026-05-26 Mengfan Liu, Da Zheng, Junwei Su, Chuan Wu

Auto-scheduling for tensor programs is a process where a search algorithm automatically explores candidate schedules (program transformations) for a given program on a target hardware platform to improve its performance. However, this can be…

Machine Learning · Computer Science 2022-09-08 Perry Gibson, José Cano

Tensor networks are efficient for extremely high-dimensional representation, but their model selection, known as tensor network structure search (TN-SS), is a challenging problem. Although several works have targeted TN-SS, most existing…

Machine Learning · Computer Science 2024-06-04 Junhua Zeng, Chao Li, Zhun Sun, Qibin Zhao, Guoxu Zhou

The increasing adoption of natural language processing (NLP) models across industries has led to practitioners' need for machine learning systems to handle these models efficiently, from training to serving them in production. However,…

Computation and Language · Computer Science 2023-08-17 Lovre Torbarina, Tin Ferkovic, Lukasz Roguski, Velimir Mihelcic, Bruno Sarlija, Zeljko Kraljevic

Structural pruning techniques are essential for deploying multimodal large language models (MLLMs) across various hardware platforms, from edge devices to cloud servers. However, current pruning methods typically determine optimal…

Computer Vision and Pattern Recognition · Computer Science 2025-06-17 Zhihan Zhang, Xiang Pan, Hongchen Wei, Zhenzhong Chen

Computation-intensive tensor operators constitute over 90% of the computations in Large Language Models (LLMs) and Deep Neural Networks. Automatically and efficiently generating high-performance tensor operators with hardware primitives is…

To alleviate the performance and energy overheads of contemporary applications with large data footprints, we propose the Two Level Perceptron (TLP) predictor, a neural mechanism that effectively combines predicting whether an access will…

Hardware Architecture · Computer Science 2025-11-04 Alexandre Valentin Jamet, Georgios Vavouliotis, Daniel A. Jiménez, Lluc Alvarez, Marc Casas