Related papers: Augmenting Parameter-Efficient Pre-trained Languag…

Scaling Performance of Large Language Model Pretraining

Large language models (LLMs) show best-in-class performance across a wide range of natural language processing applications. Training these models is an extremely computationally expensive task; frontier Artificial Intelligence (AI)…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-10 Alexander Interrante-Grant , Carla Varela-Rosa , Suhaas Narayan , Chris Connelly , Albert Reuther

Predictions For Pre-training Language Models

Language model pre-training has proven to be useful in many language understanding tasks. In this paper, we investigate whether it is still helpful to add the self-training method in the pre-training step and the fine-tuning step. Towards…

Computation and Language · Computer Science 2023-02-17 Tong Guo

Fine-tuning Large Language Models with Limited Data: A Survey and Practical Guide

Fine-tuning large language models (LLMs) with limited data poses a practical challenge in low-resource languages, specialized domains, and constrained deployment settings. While pre-trained LLMs provide strong foundations, effective…

Computation and Language · Computer Science 2025-10-29 Marton Szep , Daniel Rueckert , Rüdiger von Eisenhart-Rothe , Florian Hinterwimmer

Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques

Large Language Models (LLMs) are transforming cybersecurity by enabling intelligent, adaptive, and automated approaches to threat detection, vulnerability assessment, and incident response. With their advanced language understanding and…

Cryptography and Security · Computer Science 2025-07-21 Niveen O. Jaffal , Mohammed Alkhanafseh , David Mohaisen

Preserving Pre-trained Features Helps Calibrate Fine-tuned Language Models

Large pre-trained language models (PLMs) have demonstrated strong performance on natural language understanding (NLU) tasks through fine-tuning. However, fine-tuned models still suffer from overconfident predictions, especially in…

Computation and Language · Computer Science 2023-05-31 Guande He , Jianfei Chen , Jun Zhu

A Survey of Large Language Models in Cybersecurity

Large Language Models (LLMs) have quickly risen to prominence due to their ability to perform at or close to the state-of-the-art in a variety of fields while handling natural language. An important field of research is the application of…

Cryptography and Security · Computer Science 2024-02-28 Gabriel de Jesus Coelho da Silva , Carlos Becker Westphall

Semi-Supervised Learning for Large Language Models Safety and Content Moderation

Safety for Large Language Models (LLMs) has been an ongoing research focus since their emergence and is even more relevant nowadays with the increasing capacity of those models. Currently, there are several guardrails in place for all…

Computation and Language · Computer Science 2025-12-25 Eduard Stefan Dinuta , Iustin Sirbu , Traian Rebedea

Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking

Fueled by their remarkable ability to tackle diverse tasks across multiple domains, large language models (LLMs) have grown at an unprecedented rate, with some recent models containing trillions of parameters. This growth is accompanied by…

Machine Learning · Computer Science 2025-05-30 Athanasios Glentis , Jiaxiang Li , Qiulin Shang , Andi Han , Ioannis Tsaknakis , Quan Wei , Mingyi Hong

Optimising Language Models for Downstream Tasks: A Post-Training Perspective

Language models (LMs) have demonstrated remarkable capabilities in NLP, yet adapting them efficiently and robustly to specific tasks remains challenging. As their scale and complexity grow, fine-tuning LMs on labelled data often…

Computation and Language · Computer Science 2025-06-27 Zhengyan Shi

A Survey of Large Language Models

Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach,…

Computation and Language · Computer Science 2026-03-19 Wayne Xin Zhao , Kun Zhou , Junyi Li , Tianyi Tang , Xiaolei Wang , Yupeng Hou , Yingqian Min , Beichen Zhang , Junjie Zhang , Zican Dong , Yifan Du , Chen Yang , Yushuo Chen , Zhipeng Chen , Jinhao Jiang , Ruiyang Ren , Yifan Li , Xinyu Tang , Zikang Liu , Peiyu Liu , Jian-Yun Nie , Ji-Rong Wen

Large Language Models as a (Bad) Security Norm in the Context of Regulation and Compliance

The use of Large Language Models (LLM) by providers of cybersecurity and digital infrastructures of all kinds is an ongoing development. It is suggested and on an experimental basis used to write the code for the systems, and potentially…

Computers and Society · Computer Science 2025-12-19 Kaspar Rosager Ludvigsen

Training Bilingual LMs with Data Constraints in the Targeted Language

Large language models are trained on massive scrapes of the web, as required by current scaling laws. Most progress is made for English, given its abundance of high-quality pretraining data. For most other languages, however, such high…

Computation and Language · Computer Science 2025-02-07 Skyler Seto , Maartje ter Hoeve , Richard He Bai , Natalie Schluter , David Grangier

Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization

The pre-training phase of language models often begins with randomly initialized parameters. With the current trends in scaling models, training their large number of parameters can be extremely slow and costly. In contrast, small language…

Computation and Language · Computer Science 2024-09-23 Mohammad Samragh , Iman Mirzadeh , Keivan Alizadeh Vahid , Fartash Faghri , Minsik Cho , Moin Nabi , Devang Naik , Mehrdad Farajtabar

Efficient Strategy for Improving Large Language Model (LLM) Capabilities

Large Language Models (LLMs) have become a milestone in the field of artificial intelligence and natural language processing. However, their large-scale deployment remains constrained by the need for significant computational resources.…

Computation and Language · Computer Science 2025-08-07 Julián Camilo Velandia Gutiérrez

Backdoor Attacks for In-Context Learning with Language Models

Because state-of-the-art language models are expensive to train, most practitioners must make use of one of the few publicly available language models or language model APIs. This consolidation of trust increases the potency of backdoor…

Cryptography and Security · Computer Science 2023-07-28 Nikhil Kandpal , Matthew Jagielski , Florian Tramèr , Nicholas Carlini

Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning

The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic models and the scalable capabilities of large language models. Despite their potential, it remains elusive whether diffusion language…

Computation and Language · Computer Science 2025-02-25 Jiasheng Ye , Zaixiang Zheng , Yu Bao , Lihua Qian , Quanquan Gu

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based…

Computation and Language · Computer Science 2022-01-24 Jack W. Rae , Sebastian Borgeaud , Trevor Cai , Katie Millican , Jordan Hoffmann , Francis Song , John Aslanides , Sarah Henderson , Roman Ring , Susannah Young , Eliza Rutherford , Tom Hennigan , Jacob Menick , Albin Cassirer , Richard Powell , George van den Driessche , Lisa Anne Hendricks , Maribeth Rauh , Po-Sen Huang , Amelia Glaese , Johannes Welbl , Sumanth Dathathri , Saffron Huang , Jonathan Uesato , John Mellor , Irina Higgins , Antonia Creswell , Nat McAleese , Amy Wu , Erich Elsen , Siddhant Jayakumar , Elena Buchatskaya , David Budden , Esme Sutherland , Karen Simonyan , Michela Paganini , Laurent Sifre , Lena Martens , Xiang Lorraine Li , Adhiguna Kuncoro , Aida Nematzadeh , Elena Gribovskaya , Domenic Donato , Angeliki Lazaridou , Arthur Mensch , Jean-Baptiste Lespiau , Maria Tsimpoukelli , Nikolai Grigorev , Doug Fritz , Thibault Sottiaux , Mantas Pajarskas , Toby Pohlen , Zhitao Gong , Daniel Toyama , Cyprien de Masson d'Autume , Yujia Li , Tayfun Terzi , Vladimir Mikulik , Igor Babuschkin , Aidan Clark , Diego de Las Casas , Aurelia Guy , Chris Jones , James Bradbury , Matthew Johnson , Blake Hechtman , Laura Weidinger , Iason Gabriel , William Isaac , Ed Lockhart , Simon Osindero , Laura Rimell , Chris Dyer , Oriol Vinyals , Kareem Ayoub , Jeff Stanway , Lorrayne Bennett , Demis Hassabis , Koray Kavukcuoglu , Geoffrey Irving

Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning

Fine-tuning pre-trained generative language models to down-stream language generation tasks has shown promising results. However, this comes with the cost of having a single, large model for each task, which is not ideal in low-memory/power…

Computation and Language · Computer Science 2020-09-22 Zhaojiang Lin , Andrea Madotto , Pascale Fung

Model Merging in Pre-training of Large Language Models

Model merging has emerged as a promising technique for enhancing large language models, though its application in large-scale pre-training remains relatively unexplored. In this paper, we present a comprehensive investigation of model…

Computation and Language · Computer Science 2025-05-23 Yunshui Li , Yiyuan Ma , Shen Yan , Chaoyi Zhang , Jing Liu , Jianqiao Lu , Ziwen Xu , Mengzhao Chen , Minrui Wang , Shiyi Zhan , Jin Ma , Xunhao Lai , Deyi Liu , Yao Luo , Xingyan Bin , Hongbin Ren , Mingji Han , Wenhao Hao , Bairen Yi , LingJun Liu , Bole Ma , Xiaoying Jia , Xun Zhou , Siyuan Qiao , Liang Xiang , Yonghui Wu

Enhancing Pre-Trained Language Models for Vulnerability Detection via Semantic-Preserving Data Augmentation

With the rapid development and widespread use of advanced network systems, software vulnerabilities pose a significant threat to secure communications and networking. Learning-based vulnerability detection systems, particularly those…

Cryptography and Security · Computer Science 2024-10-04 Weiliang Qi , Jiahao Cao , Darsh Poddar , Sophia Li , Xinda Wang