Related papers

Related papers: AutoTinyBERT: Automatic Hyper-parameter Optimizati…

200 papers

Pre-trained language models (PLM), for example BERT or RoBERTa, mark the state-of-the-art for natural language understanding tasks when fine-tuned on labeled data. However, their large size poses challenges in deploying them for inference in…

Machine Learning · Computer Science 2024-08-27 Aaron Klein, Jacek Golebiowski, Xingchen Ma, Valerio Perrone, Cedric Archambeau

Parameter-efficient tuning (PET) methods fit pre-trained language models (PLMs) to downstream tasks by either computing a small compressed update for a subset of model parameters, or appending and fine-tuning a small number of new model…

Computation and Language · Computer Science 2023-05-29 Neal Lawton, Anoop Kumar, Govind Thattai, Aram Galstyan, Greg Ver Steeg

Transformer-based pre-trained language models like BERT and its variants have recently achieved promising performance in various natural language processing (NLP) tasks. However, the conventional paradigm constructs the backbone by purely…

Computation and Language · Computer Science 2022-02-08 Jiahui Gao, Hang Xu, Han Shi, Xiaozhe Ren, Philip L. H. Yu, Xiaodan Liang, Xin Jiang, Zhenguo Li

This paper introduces a novel framework for designing efficient neural network architectures specifically tailored to tiny machine learning (TinyML) platforms. By leveraging large language models (LLMs) for neural architecture search (NAS),…

Machine Learning · Computer Science 2025-04-15 Christophe El Zeinaty, Wassim Hamidouche, Glenn Herrou, Daniel Menard, Merouane Debbah

While pre-trained language models (e.g., BERT) have achieved impressive results on different natural language processing tasks, they have large numbers of parameters and suffer from high computational and memory costs, which make them…

Computation and Language · Computer Science 2021-06-01 Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Jian Li, Tao Qin, Tie-Yan Liu

Automatic speech recognition (ASR) systems based on deep neural networks (DNNs) are often designed using expert knowledge and empirical evaluation. In this paper, a range of neural architecture search (NAS) techniques are used to automatically…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-09 Shoukang Hu, Xurong Xie, Shansong Liu, Mingyu Cui, Mengzhe Geng, Xunying Liu, Helen Meng

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer…

Computation and Language · Computer Science 2020-02-11 Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

The abilities of modern large language models (LLMs) in solving natural language processing, complex reasoning, sentiment analysis and other tasks have been extraordinary, which has prompted their extensive adoption. Unfortunately, these…

Artificial Intelligence · Computer Science 2024-05-29 Anthony Sarah, Sharath Nittur Sridhar, Maciej Szankin, Sairam Sundaresan

Transformer language models (TLMs) are critical for most NLP tasks, but they are difficult to create for low-resource languages because of how much pretraining data they require. In this work, we investigate two techniques for training…

Computation and Language · Computer Science 2023-01-06 Luke Gessler, Amir Zeldes

Despite the remarkable success of pre-trained language models (PLMs), they still face two challenges: First, large-scale PLMs are inefficient in terms of memory footprint and computation. Second, on the downstream tasks, PLMs tend to rely…

Computation and Language · Computer Science 2022-10-12 Yuanxin Liu, Fandong Meng, Zheng Lin, Jiangnan Li, Peng Fu, Yanan Cao, Weiping Wang, Jie Zhou

Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks. However, these models usually contain millions of parameters, which prevents them from practical deployment on…

Computation and Language · Computer Science 2022-01-03 Changsheng Zhao, Ting Hua, Yilin Shen, Qian Lou, Hongxia Jin

Pre-training of language models on large text corpora is common practice in Natural Language Processing. Subsequently, these models are fine-tuned to achieve the best results on a variety of tasks. In this paper we question the…

Artificial Intelligence · Computer Science 2024-03-28 Philip Kenneweg, Sarah Schröder, Barbara Hammer

Large Language Models (LLMs) have long held sway in the realms of artificial intelligence research. Numerous efficient techniques, including weight pruning, quantization, and distillation, have been embraced to compress LLMs, targeting…

Artificial Intelligence · Computer Science 2024-11-01 Xuan Shen, Pu Zhao, Yifan Gong, Zhenglun Kong, Zheng Zhan, Yushu Wu, Ming Lin, Chao Wu, Xue Lin, Yanzhi Wang

State-of-the-art automatic speech recognition (ASR) system development is data and computation intensive. The optimal design of deep neural networks (DNNs) for these systems often requires expert knowledge and empirical evaluation. In this…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-30 Shoukang Hu, Xurong Xie, Mingyu Cui, Jiajun Deng, Shansong Liu, Jianwei Yu, Mengzhe Geng, Xunying Liu, Helen Meng

Large pre-trained language models such as BERT have shown their effectiveness in various natural language processing tasks. However, their huge parameter size makes them difficult to deploy in real-time applications that require quick…

Computation and Language · Computer Science 2021-01-25 Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou

Previous works on meta-learning either relied on elaborately hand-designed network structures or adopted specialized learning rules to a particular domain. We propose a universal framework to optimize the meta-learning process automatically…

Machine Learning · Computer Science 2019-09-10 Xinyue Zheng, Peng Wang, Qigang Wang, Zhongchao Shi, Feiyu Xu

Transformer-based models have achieved state-of-the-art results in many tasks in natural language processing. However, such models are usually slow at inference time, making deployment difficult. In this paper, we develop an efficient…

Machine Learning · Computer Science 2020-08-18 Henry Tsai, Jayden Ooi, Chun-Sung Ferng, Hyung Won Chung, Jason Riesa

Neural Architecture Search (NAS) is a powerful approach to automating the design of efficient neural architectures. In contrast to traditional NAS methods, recently proposed one-shot NAS methods prove to be more efficient in performing NAS.…

Computer Vision and Pattern Recognition · Computer Science 2025-01-16 Waqwoya Abebe, Sadegh Jafari, Sixing Yu, Akash Dutta, Jan Strube, Nathan R. Tallent, Luanzheng Guo, Pablo Munoz, Ali Jannesari

The rapid proliferation of computing domains relying on Internet of Things (IoT) devices has created a pressing need for efficient and accurate deep-learning (DL) models that can run on low-power devices. However, traditional DL models tend…

Currently, the most widespread neural network architecture for training language models is the so-called BERT, which led to improvements in various Natural Language Processing (NLP) tasks. In general, the larger the number of parameters in a…

Computation and Language · Computer Science 2021-11-02 Jochen Zöllner, Konrad Sperfeld, Christoph Wick, Roger Labahn