Related papers: Block removal for large language models through co…

Scalable iterative pruning of large language and vision models using block coordinate descent

Pruning neural networks, which involves removing a fraction of their weights, can often maintain high accuracy while significantly reducing model complexity, at least up to a certain limit. We present a neural network pruning technique that…

Machine Learning · Computer Science 2024-11-28 Gili Rosenberg, J. Kyle Brubaker, Martin J. A. Schuetz, Elton Yechao Zhu, Serdar Kadıoğlu, Sima E. Borujeni, Helmut G. Katzgraber

IteRABRe: Iterative Recovery-Aided Block Reduction

Large Language Models (LLMs) have grown increasingly expensive to deploy, driving the need for effective model compression techniques. While block pruning offers a straightforward approach to reducing model size, existing methods often…

Computation and Language · Computer Science 2025-03-11 Haryo Akbarianto Wibowo, Haiyue Song, Hideki Tanaka, Masao Utiyama, Alham Fikri Aji, Raj Dabre

Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning

Pre-trained large-scale language models have increasingly demonstrated high accuracy on many natural language processing (NLP) tasks. However, the limited weight storage and computational speed on hardware platforms have impeded the…

Computation and Language · Computer Science 2020-11-18 Bingbing Li, Zhenglun Kong, Tianyun Zhang, Ji Li, Zhengang Li, Hang Liu, Caiwen Ding

MI-PRUN: Optimize Large Language Model Pruning via Mutual Information

Large Language Models (LLMs) have become indispensable across various domains, but this comes at the cost of substantial computational and memory resources. Model pruning addresses this by removing redundant components from models. In…

Computation and Language · Computer Science 2026-01-13 Hao Zhang, Zhibin Zhang, Guangxin Wu, He Chen, Jiafeng Guo, Xueqi Cheng

Entropy-Based Block Pruning for Efficient Large Language Models

As large language models continue to scale, their growing computational and storage demands pose significant challenges for real-world deployment. In this work, we investigate redundancy within Transformer-based models and propose an…

Computation and Language · Computer Science 2025-04-08 Liangwei Yang, Yuhui Xu, Juntao Tan, Doyen Sahoo, Silvio Savarese, Caiming Xiong, Huan Wang, Shelby Heinecke

IG-Pruning: Input-Guided Block Pruning for Large Language Models

With the growing computational demands of large language models (LLMs), efficient inference has become increasingly critical for practical deployment. Depth pruning has emerged as a promising approach for reducing the computational costs of…

Computation and Language · Computer Science 2025-11-05 Kangyu Qiao, Shaolei Zhang, Yang Feng

MultiPruner: Balanced Structure Removal in Foundation Models

Recently, state-of-the-art approaches for pruning large pre-trained models (LPMs) have demonstrated that the training-free removal of non-critical residual blocks in Transformers is viable for reducing model size, achieving results that…

Machine Learning · Computer Science 2025-01-20 J. Pablo Muñoz, Jinjie Yuan, Nilesh Jain

FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing

The rapid proliferation of large language models (LLMs) in natural language processing (NLP) has created a critical need for techniques that enable efficient deployment on memory-constrained devices without compromising performance. We…

Computation and Language · Computer Science 2025-02-03 James Seale Smith, Chi-Heng Lin, Shikhar Tuli, Haris Jeelani, Shangqian Gao, Yilin Shen, Hongxia Jin, Yen-Chang Hsu

Contribution to Blocker and Interdiction optimization problems in networks

This manuscript describes the notions of blocker and interdiction applied to well-known optimization problems. The main interest of these two concepts is the capability to analyze the existence of a combinatorial structure after some…

Discrete Mathematics · Computer Science 2024-12-12 Sébastien Martin

Iterative Structured Pruning for Large Language Models with Multi-Domain Calibration

Large Language Models (LLMs) have achieved remarkable success across a wide spectrum of natural language processing tasks. However, their ever-growing scale introduces significant barriers to real-world deployment, including substantial…

Computation and Language · Computer Science 2026-01-07 Guangxin Wu, Hao Zhang, Zhang Zhibin, Jiafeng Guo, Xueqi Cheng

A constrained optimization approach to improve robustness of neural networks

In this paper, we present a novel nonlinear programming-based approach to fine-tune pre-trained neural networks to improve robustness against adversarial attacks while maintaining high accuracy on clean data. Our method introduces…

Machine Learning · Computer Science 2024-10-28 Shudian Zhao, Jan Kronqvist

Block building programming for symbolic regression

Symbolic regression that aims to detect underlying data-driven models has become increasingly important for industrial data analysis. For most existing algorithms such as genetic programming (GP), the convergence speed might be too slow for…

Neural and Evolutionary Computing · Computer Science 2017-10-31 Chen Chen, Changtong Luo, Zonglin Jiang

Efficient Optimization Accelerator Framework for Multistate Ising Problems

Ising Machines are emerging hardware architectures that efficiently solve NP-Hard combinatorial optimization problems. Generally, combinatorial problems are transformed into quadratic unconstrained binary optimization (QUBO) form, but this…

Hardware Architecture · Computer Science 2025-09-12 Chirag Garg, Sayeef Salahuddin

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Large language models (LLMs) have proven to be highly effective across various natural language processing tasks. However, their large number of parameters poses significant challenges for practical deployment. Pruning, a technique aimed at…

Computation and Language · Computer Science 2024-12-16 Jiwon Song, Kyungseok Oh, Taesu Kim, Hyungjun Kim, Yulhwa Kim, Jae-Joon Kim

Parallelizable Search-Space Decomposition for Large-Scale Combinatorial Optimization Problems Using Ising Machines

Combinatorial optimization problems are crucial in industry. However, many COPs are NP-hard, causing the search space to grow exponentially with problem size and rendering large-scale instances computationally intractable. Conventional…

Emerging Technologies · Computer Science 2026-02-27 Eiji Kawase, Shuta Kikuchi, Hideaki Tamai, Shu Tanaka

Compressing Large Language Models with Automated Sub-Network Search

Large Language Models (LLMs) demonstrate exceptional reasoning abilities, enabling strong generalization across diverse tasks such as commonsense reasoning and instruction following. However, as LLMs scale, inference costs become…

Computation and Language · Computer Science 2025-02-06 Rhea Sanjay Sukthanker, Benedikt Staffler, Frank Hutter, Aaron Klein

Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning

We present a new method for large language models to solve compositional tasks. Although they have shown strong performance on traditional language understanding tasks, large language models struggle to solve compositional tasks, where the…

Computation and Language · Computer Science 2024-07-09 Eric Pasewark, Kyle Montgomery, Kefei Duan, Dawn Song, Chenguang Wang

Progtuning: Progressive Fine-tuning Framework for Transformer-based Language Models

Fine-tuning is a promising technique for leveraging Transformer-based language models in downstream tasks. As model sizes continue to grow, updating all model parameters becomes increasingly costly. Parameter-efficient fine-tuning methods…

Computation and Language · Computer Science 2025-06-27 Xiaoshuang Ji, Zhendong Zhao, Xiaojun Chen, Xin Zhao, Zeyao Liu

Block local elimination algorithms for solving sparse discrete optimization problems

Block elimination algorithms for solving sparse discrete optimization problems are considered. The numerical example is provided. The benchmarking is done in order to define real computational capabilities of block elimination algorithms…

Discrete Mathematics · Computer Science 2012-01-04 Alexander Sviridenko, Oleg Shcherbina

BlockPruner: Fine-grained Pruning for Large Language Models

With the rapid growth in the size and complexity of large language models (LLMs), the costs associated with their training and inference have escalated significantly. Research indicates that certain layers in LLMs harbor substantial…

Computation and Language · Computer Science 2025-05-23 Longguang Zhong, Fanqi Wan, Ruijun Chen, Xiaojun Quan, Liangzhi Li