Related papers: Super Tiny Language Models

A Survey of Small Language Models

Small Language Models (SLMs) have become increasingly important due to their efficiency and performance to perform various language tasks with minimal computational resources, making them ideal for various settings including on-device,…

Computation and Language · Computer Science · 2024-10-29 · Chien Van Nguyen, Xuan Shen, Ryan Aponte, Yu Xia, Samyadeep Basu, Zhengmian Hu, Jian Chen, Mihir Parmar, Sasidhar Kunapuli, Joe Barrow, Junda Wu, Ashish Singh, Yu Wang, Jiuxiang Gu, Franck Dernoncourt, Nesreen K. Ahmed, Nedim Lipka, Ruiyi Zhang, Xiang Chen, Tong Yu, Sungchul Kim, Hanieh Deilamsalehy, Namyong Park, Mike Rimer, Zhehao Zhang, Huanrui Yang, Ryan A. Rossi, Thien Huu Nguyen

Small Language Models: Survey, Measurements, and Insights

Small language models (SLMs), despite their widespread adoption in modern smart devices, have received significantly less academic attention compared to their large language model (LLM) counterparts, which are predominantly deployed in data…

Computation and Language · Computer Science · 2025-02-27 · Zhenyan Lu, Xiang Li, Dongqi Cai, Rongjie Yi, Fangming Liu, Xiwen Zhang, Nicholas D. Lane, Mengwei Xu

Small Language Models (SLMs) Can Still Pack a Punch: A survey (updated 2026)

As foundation AI models continue to increase in size, an important question arises - is massive scale the only path forward? This survey of about 160 papers presents a family of Small Language Models (SLMs) in the 1 to 8 billion parameter…

Computation and Language · Computer Science · 2026-05-15 · Akanksha Gupta, Bijo Thomas, Harshita Asnani, Phanindra Reddy Madduru, Samia Feroze, Shreyas Subramanian, Vikram Elango, Mecit Gungor

A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms

Large language models (LLMs) have achieved remarkable advancements in natural language processing, showcasing exceptional performance across various tasks. However, the expensive memory and computational requirements present significant…

Artificial Intelligence · Computer Science · 2025-11-13 · Ruihao Gong, Yifu Ding, Zining Wang, Chengtao Lv, Xingyu Zheng, Jinyang Du, Haotong Qin, Jinyang Guo, Michele Magno, Xianglong Liu

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Tokenization is a fundamental component of large language models (LLMs), yet its influence on model scaling and performance is not fully explored. In this paper, we introduce Over-Tokenized Transformers, a novel framework that decouples…

Computation and Language · Computer Science · 2025-05-26 · Hongzhi Huang, Defa Zhu, Banggu Wu, Yutao Zeng, Ya Wang, Qiyang Min, Xun Zhou

Accelerating Multilingual Language Model for Excessively Tokenized Languages

Recent advancements in large language models (LLMs) have remarkably enhanced performances on a variety of tasks in multiple languages. However, tokenizers in LLMs trained primarily on English-centric corpora often overly fragment a text…

Computation and Language · Computer Science · 2024-08-07 · Jimin Hong, Gibbeum Lee, Jaewoong Cho

Small Language Models: Architectures, Techniques, Evaluation, Problems and Future Adaptation

Small Language Models (SLMs) have gained substantial attention due to their ability to execute diverse language tasks successfully while using fewer computer resources. These models are particularly ideal for deployment in limited…

Computation and Language · Computer Science · 2025-05-30 · Tanjil Hasan Sakib, Md. Tanzib Hosain, Md. Kishor Morol

Do Generative Large Language Models need billions of parameters?

This paper presents novel systems and methodologies for the development of efficient large language models (LLMs). It explores the trade-offs between model size, performance, and computational resources, with the aim of maximizing the…

Computation and Language · Computer Science · 2023-09-14 · Sia Gholami, Marwan Omar

Achieving Peak Performance for Large Language Models: A Systematic Review

In recent years, large language models (LLMs) have achieved remarkable success in natural language processing (NLP). LLMs require an extreme amount of parameters to attain high performance. As models grow into the trillion-parameter range,…

Computation and Language · Computer Science · 2024-09-10 · Zhyar Rzgar K Rostam, Sándor Szénási, Gábor Kertész

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

While large language models have facilitated breakthroughs in many applications of artificial intelligence, their inherent largeness makes them computationally expensive and challenging to deploy in resource-constrained settings. In this…

Computation and Language · Computer Science · 2025-02-06 · Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Gabriel Martín Blázquez, Guilherme Penedo, Lewis Tunstall, Andrés Marafioti, Hynek Kydlíček, Agustín Piqueres Lajarín, Vaibhav Srivastav, Joshua Lochner, Caleb Fahlgren, Xuan-Son Nguyen, Clémentine Fourrier, Ben Burtenshaw, Hugo Larcher, Haojun Zhao, Cyril Zakka, Mathieu Morlon, Colin Raffel, Leandro von Werra, Thomas Wolf

LAST: Language Model Aware Speech Tokenization

Speech tokenization serves as the foundation of speech language model (LM), enabling them to perform various tasks such as spoken language modeling, text-to-speech, speech-to-text, etc. Most speech tokenizers are trained independently of…

Computation and Language · Computer Science · 2024-09-11 · Arnon Turetzky, Yossi Adi

Scaling Spoken Language Models with Syllabic Speech Tokenization

Spoken language models (SLMs) typically discretize speech into high-frame-rate tokens extracted from SSL speech models. As the most successful LMs are based on the Transformer architecture, processing these long token streams with…

Computation and Language · Computer Science · 2026-02-05 · Nicholas Lee, Cheol Jun Cho, Alan W Black, Gopala K. Anumanchipalli

Specializing Smaller Language Models towards Multi-Step Reasoning

The surprising ability of Large Language Models (LLMs) to perform well on complex reasoning with only few-shot chain-of-thought prompts is believed to emerge only in very large-scale models (100+ billion parameters). We show that such…

Computation and Language · Computer Science · 2023-01-31 · Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, Tushar Khot

The Super Weight in Large Language Models

Recent works have shown a surprising result: a small fraction of Large Language Model (LLM) parameter outliers are disproportionately important to the quality of the model. LLMs contain billions of parameters, so these small fractions, such…

Computation and Language · Computer Science · 2025-07-08 · Mengxia Yu, De Wang, Qi Shan, Colorado J Reed, Alvin Wan

Effective Learning for Small Reasoning Models: An Empirical Study on 0.5B Reasoning LLMs

The ongoing evolution of language models has led to the development of large-scale architectures that demonstrate exceptional performance across a wide range of tasks. However, these models come with significant computational and energy…

Artificial Intelligence · Computer Science · 2025-11-19 · Xialie Zhuang, Peixian Ma, Zhikai Jia, Zane Cao, Shiwei Liu

QSLM: A Performance- and Memory-aware Quantization Framework with Tiered Search Strategy for Spike-driven Language Models

Large Language Models (LLMs) have been emerging as prominent AI models for solving many natural language tasks due to their high performance (e.g., accuracy) and capabilities in generating high-quality responses to the given inputs.…

Neural and Evolutionary Computing · Computer Science · 2026-04-22 · Rachmad Vidya Wicaksana Putra, Pasindu Wickramasinghe, Muhammad Shafique

Assessing Small Language Models for Code Generation: An Empirical Study with Benchmarks

The recent advancements of Small Language Models (SLMs) have opened new possibilities for efficient code generation. SLMs offer lightweight and cost-effective alternatives to Large Language Models (LLMs), making them attractive for use in…

Software Engineering · Computer Science · 2026-01-21 · Md Mahade Hasan, Muhammad Waseem, Kai-Kristian Kemell, Jussi Rasku, Juha Ala-Rantala, Pekka Abrahamsson

Scaling Performance of Large Language Model Pretraining

Large language models (LLMs) show best-in-class performance across a wide range of natural language processing applications. Training these models is an extremely computationally expensive task; frontier Artificial Intelligence (AI)…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-10-10 · Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, Chris Connelly, Albert Reuther

Scaling Properties of Speech Language Models

Speech Language Models (SLMs) aim to learn language from raw audio, without textual resources. Despite significant advances, our current models exhibit weak syntax and semantic abilities. However, if the scaling properties of neural…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-12-13 · Santiago Cuervo, Ricard Marxer

Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking

Fueled by their remarkable ability to tackle diverse tasks across multiple domains, large language models (LLMs) have grown at an unprecedented rate, with some recent models containing trillions of parameters. This growth is accompanied by…

Machine Learning · Computer Science · 2025-05-30 · Athanasios Glentis, Jiaxiang Li, Qiulin Shang, Andi Han, Ioannis Tsaknakis, Quan Wei, Mingyi Hong