Related papers: Pretrained Generative Language Models as General L…

Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning

Fine-tuning pre-trained generative language models to downstream language generation tasks has shown promising results. However, this comes at the cost of having a single, large model for each task, which is not ideal in low-memory/power…

Computation and Language · Computer Science 2020-09-22 Zhaojiang Lin, Andrea Madotto, Pascale Fung

Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning

Recent pretrained language models extend from millions to billions of parameters. Thus the need to fine-tune an extremely large pretrained model with a limited training corpus arises in various downstream tasks. In this paper, we propose a…

Computation and Language · Computer Science 2021-09-14 Runxin Xu, Fuli Luo, Zhiyuan Zhang, Chuanqi Tan, Baobao Chang, Songfang Huang, Fei Huang

Language Models are General-Purpose Interfaces

Foundation models have received much attention due to their effectiveness across a broad range of downstream applications. Though there is a big convergence in terms of architecture, most pretrained models are typically still developed for…

Computation and Language · Computer Science 2022-06-14 Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei

Investigating Pre-trained Language Models on Cross-Domain Datasets, a Step Closer to General AI

Pre-trained language models have recently emerged as a powerful tool for fine-tuning a variety of language tasks. Ideally, when models are pre-trained on large amounts of data, they are expected to gain implicit knowledge. In this paper, we…

Computation and Language · Computer Science 2023-06-22 Mohamad Ballout, Ulf Krumnack, Gunther Heidemann, Kai-Uwe Kühnberger

Pre-Trained Language Models for Interactive Decision-Making

Language model (LM) pre-training is useful in many language processing tasks. But can pre-trained LMs be further leveraged for more general machine learning problems? We propose an approach for using LMs to scaffold learning and…

Machine Learning · Computer Science 2022-11-01 Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

Sequence-to-Sequence Learning with Latent Neural Grammars

Sequence-to-sequence learning with neural networks has become the de facto standard for sequence prediction tasks. This approach typically models the local distribution over the next word with a powerful neural network that can condition on…

Computation and Language · Computer Science 2021-11-17 Yoon Kim

Domain-Adaptive Continued Pre-Training of Small Language Models

Continued pre-training of small language models offers a promising path for domain adaptation with limited computational resources. I've investigated this approach within educational domains, evaluating it as a resource-efficient…

Computation and Language · Computer Science 2025-04-15 Salman Faroz

Generative Pre-training for Speech with Flow Matching

Generative models have gained more and more attention in recent years for their remarkable success in tasks that require estimating and sampling data distributions to generate high-fidelity synthetic data. In speech, text-to-speech…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-27 Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu

EduBERT: Pretrained Deep Language Models for Learning Analytics

The use of large pretrained neural networks to create contextualized word embeddings has drastically improved performance on several natural language processing (NLP) tasks. These computationally expensive models have begun to be applied to…

Computers and Society · Computer Science 2019-12-03 Benjamin Clavié, Kobi Gal

Generative Model for Less-Resourced Language with 1 billion parameters

Large language models (LLMs) are a basic infrastructure for modern natural language processing. Many commercial and open-source LLMs exist for English, e.g., ChatGPT, Llama, Falcon, and Mistral. As these models are trained on mostly English…

Computation and Language · Computer Science 2024-10-10 Domen Vreš, Martin Božič, Aljaž Potočnik, Tomaž Martinčič, Marko Robnik-Šikonja

Meta-Learning the Difference: Preparing Large Language Models for Efficient Adaptation

Large pretrained language models (PLMs) are often domain- or task-adapted via fine-tuning or prompting. Finetuning requires modifying all of the parameters and having enough data to avoid overfitting while prompting requires no training and…

Computation and Language · Computer Science 2022-07-11 Zejiang Hou, Julian Salazar, George Polovets

Progressive Generation of Long Text with Pretrained Language Models

Large-scale language models (LMs) pretrained on massive corpora of text, such as GPT-2, are powerful open-domain text generators. However, as our systematic examination reveals, it is still challenging for such models to generate coherent…

Computation and Language · Computer Science 2021-04-15 Bowen Tan, Zichao Yang, Maruan Al-Shedivat, Eric P. Xing, Zhiting Hu

Preference-grounded Token-level Guidance for Language Model Fine-tuning

Aligning language models (LMs) with preferences is an important problem in natural language generation. A key challenge is that preferences are typically provided at the sequence level while LM training and generation both occur at the…

Computation and Language · Computer Science 2025-01-09 Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mingyuan Zhou

Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning

The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic models and the scalable capabilities of large language models. Despite their potential, it remains elusive whether diffusion language…

Computation and Language · Computer Science 2025-02-25 Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, Quanquan Gu

Video-Grounded Dialogues with Pretrained Generation Language Models

Pre-trained language models have shown remarkable success in improving various downstream NLP tasks due to their ability to capture dependencies in textual data and generate natural responses. In this paper, we leverage the power of…

Computation and Language · Computer Science 2020-06-30 Hung Le, Steven C. H. Hoi

Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins

We report a flexible language-model based deep learning strategy, applied here to solve complex forward and inverse problems in protein modeling, based on an attention neural network that integrates transformer and graph convolutional…

Biomolecules · Quantitative Biology 2023-10-20 Markus J. Buehler

The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities

Large Language Models (LLMs), trained on extensive web-scale corpora, have demonstrated remarkable abilities across diverse tasks, especially as they are scaled up. Nevertheless, even state-of-the-art models struggle in certain cases,…

Computation and Language · Computer Science 2025-01-16 Irina Bigoulaeva, Harish Tayyar Madabushi, Iryna Gurevych

ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Pre-trained models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up pre-trained language models can improve their generalization…

Computation and Language · Computer Science 2021-07-06 Yu Sun, Shuohuan Wang, Shikun Feng, Siyu Ding, Chao Pang, Junyuan Shang, Jiaxiang Liu, Xuyi Chen, Yanbin Zhao, Yuxiang Lu, Weixin Liu, Zhihua Wu, Weibao Gong, Jianzhong Liang, Zhizhou Shang, Peng Sun, Wei Liu, Xuan Ouyang, Dianhai Yu, Hao Tian, Hua Wu, Haifeng Wang

Structural Guidance for Transformer Language Models

Transformer-based language models pre-trained on large amounts of text data have proven remarkably successful in learning generic transferable linguistic representations. Here we study whether structural guidance leads to more human-like…

Computation and Language · Computer Science 2021-08-03 Peng Qian, Tahira Naseem, Roger Levy, Ramón Fernandez Astudillo

Pretrained Language Models are Symbolic Mathematics Solvers too!

Solving symbolic mathematics has always been in the arena of human ingenuity that needs compositional reasoning and recurrence. However, recent studies have shown that large-scale language models such as transformers are universal and…

Machine Learning · Statistics 2023-03-15 Kimia Noorbakhsh, Modar Sulaiman, Mahdi Sharifi, Kallol Roy, Pooyan Jamshidi