Kenneth Heafield

The Llama 3 Herd of Models

Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and…

Artificial Intelligence · Computer Science 2024-11-26 Aaron Grattafiori , Abhimanyu Dubey , Abhinav Jauhri , Abhinav Pandey , Abhishek Kadian , Ahmad Al-Dahle , Aiesha Letman , Akhil Mathur , Alan Schelten , Alex Vaughan , Amy Yang , Angela Fan , Anirudh Goyal , Anthony Hartshorn , Aobo Yang , Archi Mitra , Archie Sravankumar , Artem Korenev , Arthur Hinsvark , Arun Rao , Aston Zhang , Aurelien Rodriguez , Austen Gregerson , Ava Spataru , Baptiste Roziere , Bethany Biron , Binh Tang , Bobbie Chern , Charlotte Caucheteux , Chaya Nayak , Chloe Bi , Chris Marra , Chris McConnell , Christian Keller , Christophe Touret , Chunyang Wu , Corinne Wong , Cristian Canton Ferrer , Cyrus Nikolaidis , Damien Allonsius , Daniel Song , Danielle Pintz , Danny Livshits , Danny Wyatt , David Esiobu , Dhruv Choudhary , Dhruv Mahajan , Diego Garcia-Olano , Diego Perino , Dieuwke Hupkes , Egor Lakomkin , Ehab AlBadawy , Elina Lobanova , Emily Dinan , Eric Michael Smith , Filip Radenovic , Francisco Guzmán , Frank Zhang , Gabriel Synnaeve , Gabrielle Lee , Georgia Lewis Anderson , Govind Thattai , Graeme Nail , Gregoire Mialon , Guan Pang , Guillem Cucurell , Hailey Nguyen , Hannah Korevaar , Hu Xu , Hugo Touvron , Iliyan Zarov , Imanol Arrieta Ibarra , Isabel Kloumann , Ishan Misra , Ivan Evtimov , Jack Zhang , Jade Copet , Jaewon Lee , Jan Geffert , Jana Vranes , Jason Park , Jay Mahadeokar , Jeet Shah , Jelmer van der Linde , Jennifer Billock , Jenny Hong , Jenya Lee , Jeremy Fu , Jianfeng Chi , Jianyu Huang , Jiawen Liu , Jie Wang , Jiecao Yu , Joanna Bitton , Joe Spisak , Jongsoo Park , Joseph Rocca , Joshua Johnstun , Joshua Saxe , Junteng Jia , Kalyan Vasuden Alwala , Karthik Prasad , Kartikeya Upasani , Kate Plawiak , Ke Li , Kenneth Heafield , Kevin Stone , Khalid El-Arini , Krithika Iyer , Kshitiz Malik , Kuenley Chiu , Kunal Bhalla , Kushal Lakhotia , Lauren Rantala-Yeary , Laurens van der Maaten , Lawrence Chen , Liang Tan , Liz Jenkins , Louis Martin , Lovish Madaan , Lubo Malo , Lukas Blecher , Lukas Landzaat , Luke de Oliveira , Madeline Muzzi , Mahesh Pasupuleti , Mannat Singh , Manohar Paluri , Marcin Kardas , Maria Tsimpoukelli , Mathew Oldham , Mathieu Rita , Maya Pavlova , Melanie Kambadur , Mike Lewis , Min Si , Mitesh Kumar Singh , Mona Hassan , Naman Goyal , Narjes Torabi , Nikolay Bashlykov , Nikolay Bogoychev , Niladri Chatterji , Ning Zhang , Olivier Duchenne , Onur Çelebi , Patrick Alrassy , Pengchuan Zhang , Pengwei Li , Petar Vasic , Peter Weng , Prajjwal Bhargava , Pratik Dubal , Praveen Krishnan , Punit Singh Koura , Puxin Xu , Qing He , Qingxiao Dong , Ragavan Srinivasan , Raj Ganapathy , Ramon Calderer , Ricardo Silveira Cabral , Robert Stojnic , Roberta Raileanu , Rohan Maheswari , Rohit Girdhar , Rohit Patel , Romain Sauvestre , Ronnie Polidoro , Roshan Sumbaly , Ross Taylor , Ruan Silva , Rui Hou , Rui Wang , Saghar Hosseini , Sahana Chennabasappa , Sanjay Singh , Sean Bell , Seohyun Sonia Kim , Sergey Edunov , Shaoliang Nie , Sharan Narang , Sharath Raparthy , Sheng Shen , Shengye Wan , Shruti Bhosale , Shun Zhang , Simon Vandenhende , Soumya Batra , Spencer Whitman , Sten Sootla , Stephane Collot , Suchin Gururangan , Sydney Borodinsky , Tamar Herman , Tara Fowler , Tarek Sheasha , Thomas Georgiou , Thomas Scialom , Tobias Speckbacher , Todor Mihaylov , Tong Xiao , Ujjwal Karn , Vedanuj Goswami , Vibhor Gupta , Vignesh Ramanathan , Viktor Kerkez , Vincent Gonguet , Virginie Do , Vish Vogeti , Vítor Albiero , Vladan Petrovic , Weiwei Chu , Wenhan Xiong , Wenyin Fu , Whitney Meers , Xavier Martinet , Xiaodong Wang , Xiaofang Wang , Xiaoqing Ellen Tan , Xide Xia , Xinfeng Xie , Xuchao Jia , Xuewei Wang , Yaelle Goldschlag , Yashesh Gaur , Yasmine Babaei , Yi Wen , Yiwen Song , Yuchen Zhang , Yue Li , Yuning Mao , Zacharie Delpierre Coudert , Zheng Yan , Zhengxing Chen , Zoe Papakipos , Aaditya Singh , Aayushi Srivastava , Abha Jain , Adam Kelsey , Adam Shajnfeld , Adithya Gangidi , Adolfo Victoria , Ahuva Goldstand , Ajay Menon , Ajay Sharma , Alex Boesenberg , Alexei Baevski , Allie Feinstein , Amanda Kallet , Amit Sangani , Amos Teo , Anam Yunus , Andrei Lupu , Andres Alvarado , Andrew Caples , Andrew Gu , Andrew Ho , Andrew Poulton , Andrew Ryan , Ankit Ramchandani , Annie Dong , Annie Franco , Anuj Goyal , Aparajita Saraf , Arkabandhu Chowdhury , Ashley Gabriel , Ashwin Bharambe , Assaf Eisenman , Azadeh Yazdan , Beau James , Ben Maurer , Benjamin Leonhardi , Bernie Huang , Beth Loyd , Beto De Paola , Bhargavi Paranjape , Bing Liu , Bo Wu , Boyu Ni , Braden Hancock , Bram Wasti , Brandon Spence , Brani Stojkovic , Brian Gamido , Britt Montalvo , Carl Parker , Carly Burton , Catalina Mejia , Ce Liu , Changhan Wang , Changkyu Kim , Chao Zhou , Chester Hu , Ching-Hsiang Chu , Chris Cai , Chris Tindal , Christoph Feichtenhofer , Cynthia Gao , Damon Civin , Dana Beaty , Daniel Kreymer , Daniel Li , David Adkins , David Xu , Davide Testuggine , Delia David , Devi Parikh , Diana Liskovich , Didem Foss , Dingkang Wang , Duc Le , Dustin Holland , Edward Dowling , Eissa Jamil , Elaine Montgomery , Eleonora Presani , Emily Hahn , Emily Wood , Eric-Tuan Le , Erik Brinkman , Esteban Arcaute , Evan Dunbar , Evan Smothers , Fei Sun , Felix Kreuk , Feng Tian , Filippos Kokkinos , Firat Ozgenel , Francesco Caggioni , Frank Kanayet , Frank Seide , Gabriela Medina Florez , Gabriella Schwarz , Gada Badeer , Georgia Swee , Gil Halpern , Grant Herman , Grigory Sizov , Guangyi Zhang , Guna Lakshminarayanan , Hakan Inan , Hamid Shojanazeri , Han Zou , Hannah Wang , Hanwen Zha , Haroun Habeeb , Harrison Rudolph , Helen Suk , Henry Aspegren , Hunter Goldman , Hongyuan Zhan , Ibrahim Damlaj , Igor Molybog , Igor Tufanov , Ilias Leontiadis , Irina-Elena Veliche , Itai Gat , Jake Weissman , James Geboski , James Kohli , Janice Lam , Japhet Asher , Jean-Baptiste Gaya , Jeff Marcus , Jeff Tang , Jennifer Chan , Jenny Zhen , Jeremy Reizenstein , Jeremy Teboul , Jessica Zhong , Jian Jin , Jingyi Yang , Joe Cummings , Jon Carvill , Jon Shepard , Jonathan McPhie , Jonathan Torres , Josh Ginsburg , Junjie Wang , Kai Wu , Kam Hou U , Karan Saxena , Kartikay Khandelwal , Katayoun Zand , Kathy Matosich , Kaushik Veeraraghavan , Kelly Michelena , Keqian Li , Kiran Jagadeesh , Kun Huang , Kunal Chawla , Kyle Huang , Lailin Chen , Lakshya Garg , Lavender A , Leandro Silva , Lee Bell , Lei Zhang , Liangpeng Guo , Licheng Yu , Liron Moshkovich , Luca Wehrstedt , Madian Khabsa , Manav Avalani , Manish Bhatt , Martynas Mankus , Matan Hasson , Matthew Lennie , Matthias Reso , Maxim Groshev , Maxim Naumov , Maya Lathi , Meghan Keneally , Miao Liu , Michael L. Seltzer , Michal Valko , Michelle Restrepo , Mihir Patel , Mik Vyatskov , Mikayel Samvelyan , Mike Clark , Mike Macey , Mike Wang , Miquel Jubert Hermoso , Mo Metanat , Mohammad Rastegari , Munish Bansal , Nandhini Santhanam , Natascha Parks , Natasha White , Navyata Bawa , Nayan Singhal , Nick Egebo , Nicolas Usunier , Nikhil Mehta , Nikolay Pavlovich Laptev , Ning Dong , Norman Cheng , Oleg Chernoguz , Olivia Hart , Omkar Salpekar , Ozlem Kalinli , Parkin Kent , Parth Parekh , Paul Saab , Pavan Balaji , Pedro Rittner , Philip Bontrager , Pierre Roux , Piotr Dollar , Polina Zvyagina , Prashant Ratanchandani , Pritish Yuvraj , Qian Liang , Rachad Alao , Rachel Rodriguez , Rafi Ayub , Raghotham Murthy , Raghu Nayani , Rahul Mitra , Rangaprabhu Parthasarathy , Raymond Li , Rebekkah Hogan , Robin Battey , Rocky Wang , Russ Howes , Ruty Rinott , Sachin Mehta , Sachin Siby , Sai Jayesh Bondu , Samyak Datta , Sara Chugh , Sara Hunt , Sargun Dhillon , Sasha Sidorov , Satadru Pan , Saurabh Mahajan , Saurabh Verma , Seiji Yamamoto , Sharadh Ramaswamy , Shaun Lindsay , Shaun Lindsay , Sheng Feng , Shenghao Lin , Shengxin Cindy Zha , Shishir Patil , Shiva Shankar , Shuqiang Zhang , Shuqiang Zhang , Sinong Wang , Sneha Agarwal , Soji Sajuyigbe , Soumith Chintala , Stephanie Max , Stephen Chen , Steve Kehoe , Steve Satterfield , Sudarshan Govindaprasad , Sumit Gupta , Summer Deng , Sungmin Cho , Sunny Virk , Suraj Subramanian , Sy Choudhury , Sydney Goldman , Tal Remez , Tamar Glaser , Tamara Best , Thilo Koehler , Thomas Robinson , Tianhe Li , Tianjun Zhang , Tim Matthews , Timothy Chou , Tzook Shaked , Varun Vontimitta , Victoria Ajayi , Victoria Montanez , Vijai Mohan , Vinay Satish Kumar , Vishal Mangla , Vlad Ionescu , Vlad Poenaru , Vlad Tiberiu Mihailescu , Vladimir Ivanov , Wei Li , Wenchen Wang , Wenwen Jiang , Wes Bouaziz , Will Constable , Xiaocheng Tang , Xiaojian Wu , Xiaolan Wang , Xilun Wu , Xinbo Gao , Yaniv Kleinman , Yanjun Chen , Ye Hu , Ye Jia , Ye Qi , Yenda Li , Yilin Zhang , Ying Zhang , Yossi Adi , Youngjin Nam , Yu Wang , Yu Zhao , Yuchen Hao , Yundi Qian , Yunlu Li , Yuzi He , Zach Rait , Zachary DeVito , Zef Rosnbrick , Zhaoduo Wen , Zhenyu Yang , Zhiwei Zhao , Zhiyu Ma

Iterative Translation Refinement with Large Language Models

We propose iteratively prompting a large language model to self-correct a translation, with inspiration from their strong language understanding and translation capability as well as a human-like translation approach. Interestingly,…

Computation and Language · Computer Science 2024-05-03 Pinzhen Chen , Zhicheng Guo , Barry Haddow , Kenneth Heafield

Code-Switched Language Identification is Harder Than You Think

Code switching (CS) is a very common phenomenon in written and spoken communication but one that is handled poorly by many natural language processing applications. Looking to the application of building CS corpora, we explore CS language…

Computation and Language · Computer Science 2024-02-05 Laurie Burchell , Alexandra Birch , Robert P. Thompson , Kenneth Heafield

Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca

Foundational large language models (LLMs) can be instruction-tuned to perform open-domain question answering, facilitating applications like chat assistants. While such efforts are often carried out in a single language, we empirically…

Computation and Language · Computer Science 2024-02-01 Pinzhen Chen , Shaoxiong Ji , Nikolay Bogoychev , Andrey Kutuzov , Barry Haddow , Kenneth Heafield

Exploring Diversity in Back Translation for Low-Resource Machine Translation

Back translation is one of the most widely used methods for improving the performance of neural machine translation systems. Recent research has sought to enhance the effectiveness of this method by increasing the 'diversity' of the…

Computation and Language · Computer Science 2023-09-01 Laurie Burchell , Alexandra Birch , Kenneth Heafield

An Open Dataset and Model for Language Identification

Language identification (LID) is a fundamental step in many natural language processing pipelines. However, current LID systems are far from perfect, particularly on lower-resource languages. We present a LID model which achieves a…

Computation and Language · Computer Science 2023-08-31 Laurie Burchell , Alexandra Birch , Nikolay Bogoychev , Kenneth Heafield

Efficient Methods for Natural Language Processing: A Survey

Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources…

Computation and Language · Computer Science 2023-03-28 Marcos Treviso , Ji-Ung Lee , Tianchu Ji , Betty van Aken , Qingqing Cao , Manuel R. Ciosici , Michael Hassid , Kenneth Heafield , Sara Hooker , Colin Raffel , Pedro H. Martins , André F. T. Martins , Jessica Zosa Forde , Peter Milder , Edwin Simpson , Noam Slonim , Jesse Dodge , Emma Strubell , Niranjan Balasubramanian , Leon Derczynski , Iryna Gurevych , Roy Schwartz

Approaching Neural Chinese Word Segmentation as a Low-Resource Machine Translation Task

Chinese word segmentation has entered the deep learning era which greatly reduces the hassle of feature engineering. Recently, some researchers attempted to treat it as character-level translation, which further simplified model designing,…

Computation and Language · Computer Science 2022-10-12 Pinzhen Chen , Kenneth Heafield

No Language Left Behind: Scaling Human-Centered Machine Translation

Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of…

Computation and Language · Computer Science 2022-08-26 NLLB Team , Marta R. Costa-jussà , James Cross , Onur Çelebi , Maha Elbayad , Kenneth Heafield , Kevin Heffernan , Elahe Kalbassi , Janice Lam , Daniel Licht , Jean Maillard , Anna Sun , Skyler Wang , Guillaume Wenzek , Al Youngblood , Bapi Akula , Loic Barrault , Gabriel Mejia Gonzalez , Prangthip Hansanti , John Hoffman , Semarley Jarrett , Kaushik Ram Sadagopan , Dirk Rowe , Shannon Spruit , Chau Tran , Pierre Andrews , Necip Fazil Ayan , Shruti Bhosale , Sergey Edunov , Angela Fan , Cynthia Gao , Vedanuj Goswami , Francisco Guzmán , Philipp Koehn , Alexandre Mourachko , Christophe Ropers , Safiyyah Saleem , Holger Schwenk , Jeff Wang

Making Asynchronous Stochastic Gradient Descent Work for Transformers

Asynchronous stochastic gradient descent (SGD) is attractive from a speed perspective because workers do not wait for synchronization. However, the Transformer model converges poorly with asynchronous SGD, resulting in substantially lower…

Computation and Language · Computer Science 2021-11-30 Alham Fikri Aji , Kenneth Heafield

Sparse Communication for Distributed Gradient Descent

We make distributed stochastic gradient descent faster by exchanging sparse updates instead of dense updates. Gradient updates are positively skewed as most updates are near zero, so we map the 99% smallest updates (by absolute value) to…

Computation and Language · Computer Science 2021-11-30 Alham Fikri Aji , Kenneth Heafield

TranslateLocally: Blazing-fast translation running on the local CPU

Every day, millions of people sacrifice their privacy and browsing habits in exchange for online machine translation. Companies and governments with confidentiality requirements often ban online translation or pay a premium to disable…

Computation and Language · Computer Science 2021-09-22 Nikolay Bogoychev , Jelmer Van der Linde , Kenneth Heafield

Fully Synthetic Data Improves Neural Machine Translation with Knowledge Distillation

This paper explores augmenting monolingual data for knowledge distillation in neural machine translation. Source language monolingual text can be incorporated as a forward translation. Interestingly, we find the best way to incorporate…

Computation and Language · Computer Science 2021-09-16 Alham Fikri Aji , Kenneth Heafield

Exploring Hyper-Parameter Optimization for Neural Machine Translation on GPU Architectures

Neural machine translation (NMT) has been accelerated by deep learning neural networks over statistical-based approaches, due to the plethora and programmability of commodity heterogeneous computing architectures such as FPGAs and GPUs and…

Computation and Language · Computer Science 2021-09-15 Robert Lim , Kenneth Heafield , Hieu Hoang , Mark Briers , Allen Malony

Gender Bias Amplification During Speed-Quality Optimization in Neural Machine Translation

Is bias amplified when neural machine translation (NMT) models are optimized for speed and evaluated on generic test sets using BLEU? We investigate architectures and techniques commonly used to speed up decoding in Transformer-based…

Computation and Language · Computer Science 2021-06-02 Adithya Renduchintala , Denise Diaz , Kenneth Heafield , Xian Li , Mona Diab

The Sockeye 2 Neural Machine Translation Toolkit at AMTA 2020

We present Sockeye 2, a modernized and streamlined version of the Sockeye neural machine translation (NMT) toolkit. New features include a simplified code base through the use of MXNet's Gluon API, a focus on state of the art model…

Computation and Language · Computer Science 2020-08-12 Tobias Domhan , Michael Denkowski , David Vilar , Xing Niu , Felix Hieber , Kenneth Heafield

Neural Machine Translation with 4-Bit Precision and Beyond

Neural Machine Translation (NMT) is resource intensive. We design a quantization procedure to compress NMT models better for devices with limited hardware capability. Because most neural network parameters are near zero, we employ…

Computation and Language · Computer Science 2019-09-23 Alham Fikri Aji , Kenneth Heafield

Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation

In order to extract the best possible performance from asynchronous stochastic gradient descent one must increase the mini-batch size and scale the learning rate accordingly. In order to achieve further speedup we introduce a technique that…

Computation and Language · Computer Science 2018-09-17 Nikolay Bogoychev , Marcin Junczys-Dowmunt , Kenneth Heafield , Alham Fikri Aji

Multi-Source Syntactic Neural Machine Translation

We introduce a novel multi-source technique for incorporating source syntax into neural machine translation using linearized parses. This is achieved by employing separate encoders for the sequential and parsed versions of the same source…

Computation and Language · Computer Science 2018-08-31 Anna Currey , Kenneth Heafield

Fast Neural Machine Translation Implementation

This paper describes the submissions to the efficiency track for GPUs at the Workshop for Neural Machine Translation and Generation by members of the University of Edinburgh, Adam Mickiewicz University, Tilde and University of Alicante. We…

Computation and Language · Computer Science 2018-06-11 Hieu Hoang , Tomasz Dwojak , Rihards Krislauks , Daniel Torregrosa , Kenneth Heafield