Related papers: Tiny language models

Large Language Models as Universal Predictors? An Empirical Study on Small Tabular Datasets

Large Language Models (LLMs), originally developed for natural language processing (NLP), have demonstrated the potential to generalize across modalities and domains. With their in-context learning (ICL) capabilities, LLMs can perform…

Artificial Intelligence · Computer Science 2025-08-26 Nikolaos Pavlidis, Vasilis Perifanis, Symeon Symeonidis, Pavlos S. Efraimidis

Learning Mechanism Underlying NLP Pre-Training and Fine-Tuning

Natural language processing (NLP) enables the understanding and generation of meaningful human language, typically using a pre-trained complex architecture on a large dataset to learn the language and next fine-tune its weights to implement…

Computation and Language · Computer Science 2025-09-04 Yarden Tzach, Ronit D. Gross, Ella Koresh, Shalom Rosner, Or Shpringer, Tal Halevi, Ido Kanter

Understanding HTML with Large Language Models

Large language models (LLMs) have shown exceptional performance on a variety of natural language tasks. Yet, their capabilities for HTML understanding -- i.e., parsing the raw HTML of a webpage, with applications to automation of web-based…

Machine Learning · Computer Science 2023-05-22 Izzeddin Gur, Ofir Nachum, Yingjie Miao, Mustafa Safdari, Austin Huang, Aakanksha Chowdhery, Sharan Narang, Noah Fiedel, Aleksandra Faust

A Comprehensive Comparison of Pre-training Language Models

Recently, the development of pre-trained language models has brought natural language processing (NLP) tasks to the new state-of-the-art. In this paper we explore the efficiency of various pre-trained language models. We pre-train a list of…

Computation and Language · Computer Science 2023-07-27 Tong Guo

Improving the Ability of Pre-trained Language Model by Imparting Large Language Model's Experience

Large Language Models (LLMs) and pre-trained Language Models (LMs) have achieved impressive success on many software engineering tasks (e.g., code completion and code generation). By leveraging huge existing code corpora (e.g., GitHub),…

Software Engineering · Computer Science 2025-01-16 Xin Yin, Chao Ni, Xiaodan Xu, Xinrui Li, Xiaohu Yang

A Survey of Large Language Models

Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach,…

Computation and Language · Computer Science 2026-03-19 Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen

MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications

Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While general artificial intelligence is leveraged by…

Computation and Language · Computer Science 2023-10-31 Yizhe Yang, Huashan Sun, Jiawei Li, Runheng Liu, Yinghao Li, Yuhang Liu, Heyan Huang, Yang Gao

NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework

Pretrained language models have become the standard approach for many NLP tasks due to strong performance, but they are very expensive to train. We propose a simple and efficient learning framework, TLM, that does not rely on large-scale…

Computation and Language · Computer Science 2022-07-25 Xingcheng Yao, Yanan Zheng, Xiaocong Yang, Zhilin Yang

LaoPLM: Pre-trained Language Models for Lao

Trained on the large corpus, pre-trained language models (PLMs) can capture different levels of concepts in context and hence generate universal language representations. They can benefit multiple downstream natural language processing…

Computation and Language · Computer Science 2021-10-15 Nankai Lin, Yingwen Fu, Chuwei Chen, Ziyu Yang, Shengyi Jiang

Large Language Models as Data Preprocessors

Large Language Models (LLMs), typified by OpenAI's GPT, have marked a significant advancement in artificial intelligence. Trained on vast amounts of text data, LLMs are capable of understanding and generating human-like text across a…

Artificial Intelligence · Computer Science 2024-10-29 Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada

Pre-training LLMs using human-like development data corpus

Pre-trained Large Language Models (LLMs) have shown success in a diverse set of language inference and understanding tasks. The pre-training stage of LLMs looks at a large corpus of raw textual data. The BabyLM shared task compares LLM…

Computation and Language · Computer Science 2024-01-11 Khushi Bhardwaj, Raj Sanjay Shah, Sashank Varma

Large Language Models For Text Classification: Case Study And Comprehensive Review

Unlocking the potential of Large Language Models (LLMs) in data classification represents a promising frontier in natural language processing. In this work, we evaluate the performance of different LLMs in comparison with state-of-the-art…

Computation and Language · Computer Science 2025-01-16 Arina Kostina, Marios D. Dikaiakos, Dimosthenis Stefanidis, George Pallis

A Survey on Large Language Models with some Insights on their Capabilities and Limitations

The rapid advancement of artificial intelligence, particularly with the development of Large Language Models (LLMs) built on the transformer architecture, has redefined the capabilities of natural language processing. These models now…

Computation and Language · Computer Science 2025-02-11 Andrea Matarazzo, Riccardo Torlone

Heterogeneity in Formal Linguistic Competence of Language Models: Is Data the Real Bottleneck?

Large Language Models (LLMs) exhibit a puzzling disparity in their formal linguistic competence: while they learn some linguistic phenomena with near-perfect mastery, they often perform below chance on others, even after training on…

Computation and Language · Computer Science 2026-04-21 H S V N S Kowndinya Renduchintala, Sumit Bhatia

Test-Time Learning for Large Language Models

While Large Language Models (LLMs) have exhibited remarkable emergent capabilities through extensive pre-training, they still face critical limitations in generalizing to specialized domains and handling diverse linguistic variations, known…

Computation and Language · Computer Science 2025-05-28 Jinwu Hu, Zhitian Zhang, Guohao Chen, Xutao Wen, Chao Shuai, Wei Luo, Bin Xiao, Yuanqing Li, Mingkui Tan

A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness

Large language models (LLMs) have demonstrated emergent abilities in text generation, question answering, and reasoning, facilitating various tasks and domains. Despite their proficiency in various tasks, LLMs like PaLM 540B and Llama-3.1…

Computation and Language · Computer Science 2024-12-31 Fali Wang, Zhiwei Zhang, Xianren Zhang, Zongyu Wu, Tzuhao Mo, Qiuhao Lu, Wanjing Wang, Rui Li, Junjie Xu, Xianfeng Tang, Qi He, Yao Ma, Ming Huang, Suhang Wang

Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

Large Language Models (LLMs) represent a class of deep learning models adept at understanding natural language and generating coherent responses to various prompts or queries. These models far exceed the complexity of conventional neural…

Machine Learning · Computer Science 2024-12-05 Minghao Shao, Abdul Basit, Ramesh Karri, Muhammad Shafique

MicroBERT: Effective Training of Low-resource Monolingual BERTs through Parameter Reduction and Multitask Learning

Transformer language models (TLMs) are critical for most NLP tasks, but they are difficult to create for low-resource languages because of how much pretraining data they require. In this work, we investigate two techniques for training…

Computation and Language · Computer Science 2023-01-06 Luke Gessler, Amir Zeldes

Beyond Human-Like Processing: Large Language Models Perform Equivalently on Forward and Backward Scientific Text

The impressive performance of large language models (LLMs) has led to their consideration as models of human language processing. Instead, we suggest that the success of LLMs arises from the flexibility of the transformer learning…

Computation and Language · Computer Science 2024-11-19 Xiaoliang Luo, Michael Ramscar, Bradley C. Love

Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study

In the rapidly evolving field of Explainable Natural Language Processing (NLP), textual explanations, i.e., human-like rationales, are pivotal for explaining model predictions and enriching datasets with interpretable labels. Traditional…

Computation and Language · Computer Science 2025-11-12 Mahdi Dhaini, Juraj Vladika, Ege Erdogan, Zineb Attaoui, Gjergji Kasneci