Related papers: Mamba base PKD for efficient knowledge compression

200 papers

Neural network potentials (NNPs) offer a powerful alternative to traditional force fields for molecular dynamics (MD) simulations. Accurate and stable MD simulations, crucial for evaluating material properties, require training data…

Machine Learning · Computer Science 2025-06-23 Naoki Matsumura, Yuta Yoshimoto, Yuto Iwasaki, Meguru Yamazaki, Yasufumi Sakai

Deep neural networks (DNNs) have proven to be effective models for accurate Memory Access Prediction (MAP), a critical task in mitigating memory latency through data prefetching. However, existing DNN-based MAP models suffer from the…

Machine Learning · Computer Science 2024-02-22 Neelesh Gupta, Pengmiao Zhang, Rajgopal Kannan, Viktor Prasanna

The LiDAR 3D object detector that strikes a balance between accuracy and speed is crucial for achieving real-time perception in autonomous driving. However, many existing LiDAR detection models depend on complex feature transformations,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Rui Yu, Runkai Zhao, Jiagen Li, Qingsong Zhao, HuaiCheng Yan, Meng Wang

Although Deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs are hard to be deployed in real-world systems due to their voluminous parameters. To tackle this issue, Teacher-Student…

Machine Learning · Computer Science 2023-08-09 Chengming Hu, Xuan Li, Dan Liu, Haolun Wu, Xi Chen, Ju Wang, Xue Liu

Pre-trained language models (PLMs) have emerged as powerful tools for code understanding. However, deploying these PLMs in large-scale applications faces practical challenges due to their computational intensity and inference latency.…

Software Engineering · Computer Science 2025-08-22 Ruiqi Wang, Zezhou Yang, Cuiyun Gao, Xin Xia, Qing Liao

Linear RNN architectures, like Mamba, can be competitive with Transformer models in language modeling while having advantageous deployment characteristics. Given the focus on training large-scale Transformer models, we consider the…

Machine Learning · Computer Science 2025-06-30 Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao

Deep Neural Networks (DNNs) have achieved notable performance in the fields of computer vision and natural language processing with various applications in both academia and industry. However, with recent advancements in DNNs and…

Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classification to translation or reinforcement learning. One aspect of the field receiving considerable attention is efficiently executing deep…

Neural and Evolutionary Computing · Computer Science 2018-02-16 Antonio Polino, Razvan Pascanu, Dan Alistarh

Deep neural networks (DNNs) have achieved great success in various machine learning tasks. However, most existing powerful DNN models are computationally expensive and memory demanding, hindering their deployment in devices with low memory…

Signal Processing · Electrical Eng. & Systems 2021-05-19 Alexey Ozerov, Ngoc Duong

Although Deep Neural Networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs with voluminous parameters are hard to be deployed in a real-time system. To tackle this issue, Teacher-Student…

Machine Learning · Computer Science 2022-11-01 Chengming Hu, Xuan Li, Dan Liu, Xi Chen, Ju Wang, Xue Liu

Transformer-based large language models (LLMs) are increasingly being adopted in networking research to address domain-specific challenges. However, their quadratic time complexity and substantial model sizes often result in significant…

Networking and Internet Architecture · Computer Science 2025-10-21 Linhan Xia, Mingzhan Yang, Jingjing Wang, Ziwei Yan, Yakun Ren, Guo Yu, Kai Lei

Low-resolution fine-grained image classification has recently made significant progress, largely thanks to the super-resolution techniques and knowledge distillation methods. However, these approaches lead to an exponential increase in the…

Computer Vision and Pattern Recognition · Computer Science 2024-11-28 Yao Chen, Jiabao Wang, Peichao Wang, Rui Zhang, Yang Li

Deep cascaded architectures for magnetic resonance imaging (MRI) acceleration have shown remarkable success in providing high-quality reconstruction. However, as the number of cascades increases, the improvements in reconstruction tend to…

Image and Video Processing · Electrical Eng. & Systems 2024-02-06 Matcha Naga Gayathri, Sriprabha Ramanarayanan, Mohammad Al Fahim, Rahul G S, Keerthi Ram, Mohanasankar Sivaprakasam

Knowledge distillation (KD) is a well-known method for compressing neural models. However, works focusing on distilling knowledge from large multilingual neural machine translation (MNMT) models into smaller ones are practically…

Computation and Language · Computer Science 2023-04-20 Varun Gumma, Raj Dabre, Pratyush Kumar

Knowledge Distillation (KD) has emerged as a promising technique for model compression but faces critical limitations: (1) sensitivity to hyperparameters requiring extensive manual tuning, (2) capacity gap when distilling from very large…

Machine Learning · Computer Science 2025-12-11 Gustavo Coelho Haase, Paulo Henrique Dourado da Silva

The model reduction problem that eases the computation costs and latency of complex deep learning architectures has received an increasing number of investigations owing to its importance in model deployment. One promising method is…

Machine Learning · Computer Science 2018-12-04 Wei-Chun Chen, Chia-Che Chang, Chien-Yu Lu, Che-Rung Lee

Pretrained language models have led to significant performance gains in many NLP tasks. However, the intensive computing resources to train such models remain an issue. Knowledge distillation alleviates this problem by learning a…

Computation and Language · Computer Science 2020-05-04 Linqing Liu, Huan Wang, Jimmy Lin, Richard Socher, Caiming Xiong

Multispectral fusion object detection is a critical task for edge-based maritime surveillance and remote sensing, demanding both high inference efficiency and robust feature representation for high-resolution inputs. However, current State…

Computer Vision and Pattern Recognition · Computer Science 2026-03-10 Qianqian Zhang, Leon Tabaro, Ahmed M. Abdelmoniem, Junshe An

State Space Models (SSMs) such as Mamba have become a popular alternative to Transformer models, due to their reduced memory consumption and higher throughput at generation compared to their Attention-based counterparts. On the other hand,…

Computation and Language · Computer Science 2026-04-17 Abhinav Moudgil, Ningyuan Huang, Eeshan Gunesh Dhekane, Pau Rodríguez, Luca Zappella, Federico Danieli

Pre-trained language models have been applied to various NLP tasks with considerable performance gains. However, the large model sizes, together with the long inference time, limit the deployment of such models in real-time applications.…

Computation and Language · Computer Science 2022-11-03 Haojie Pan, Chengyu Wang, Minghui Qiu, Yichang Zhang, Yaliang Li, Jun Huang