Dan Iter — Scifaro

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5…

Computation and Language · Computer Science · 2024-09-04 · Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, Matthew Dixon, Ronen Eldan, Victor Fragoso, Jianfeng Gao, Mei Gao, Min Gao, Amit Garg, Allie Del Giorno, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Wenxiang Hu, Jamie Huynh, Dan Iter, Sam Ade Jacobs, Mojan Javaheripi, Xin Jin, Nikos Karampatziakis, Piero Kauffmann, Mahoud Khademi, Dongwoo Kim, Young Jin Kim, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Xihui Lin, Zeqi Lin, Ce Liu, Liyuan Liu, Mengchen Liu, Weishung Liu, Xiaodong Liu, Chong Luo, Piyush Madan, Ali Mahmoudzadeh, David Majercak, Matt Mazzola, Caio César Teodoro Mendes, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Liliang Ren, Gustavo de Rosa, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Yelong Shen, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Praneetha Vaddamanu, Chunyu Wang, Guanhua Wang, Lijuan Wang, Shuohang Wang, Xin Wang, Yu Wang, Rachel Ward, Wen Wen, Philipp Witte, Haiping Wu, Xiaoxia Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Jilong Xue, Sonali Yadav, Fan Yang, Jianwei Yang, Yifan Yang, Ziyi Yang, Donghan Yu, Lu Yuan, Chenruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou

In-Context Demonstration Selection with Cross Entropy Difference

Large language models (LLMs) can use in-context demonstrations to improve performance on zero-shot tasks. However, selecting the best in-context examples is challenging because model performance can vary widely depending on the selected…

Computation and Language · Computer Science · 2023-11-29 · Dan Iter, Reid Pryzant, Ruochen Xu, Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu

Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models

Large language models (LLMs) can perform a wide range of tasks by following natural language instructions, without the necessity of task-specific fine-tuning. Unfortunately, the performance of LLMs is greatly influenced by the quality of…

Computation and Language · Computer Science · 2023-10-23 · Zhihan Zhang, Shuohang Wang, Wenhao Yu, Yichong Xu, Dan Iter, Qingkai Zeng, Yang Liu, Chenguang Zhu, Meng Jiang

The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions

Recent progress in Large Language Models (LLMs) has produced models that exhibit remarkable performance across a variety of NLP tasks. However, it remains unclear whether the existing focus of NLP research accurately captures the genuine…

Computation and Language · Computer Science · 2023-10-20 · Siru Ouyang, Shuohang Wang, Yang Liu, Ming Zhong, Yizhu Jiao, Dan Iter, Reid Pryzant, Chenguang Zhu, Heng Ji, Jiawei Han

Automatic Prompt Optimization with "Gradient Descent" and Beam Search

Large Language Models (LLMs) have shown impressive performance as general purpose agents, but their abilities remain highly dependent on prompts which are hand written with onerous trial-and-error effort. We propose a simple and…

Computation and Language · Computer Science · 2023-10-20 · Reid Pryzant, Dan Iter, Jerry Li, Yin Tat Lee, Chenguang Zhu, Michael Zeng

Focus on what matters: Applying Discourse Coherence Theory to Cross Document Coreference

Performing event and entity coreference resolution across documents vastly increases the number of candidate mentions, making it intractable to do the full $n^2$ pairwise comparisons. Existing approaches simplify by considering coreference…

Computation and Language · Computer Science · 2023-05-29 · William Held, Dan Iter, Dan Jurafsky

G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

The quality of texts generated by natural language generation (NLG) systems is hard to measure automatically. Conventional reference-based metrics, such as BLEU and ROUGE, have been shown to have relatively low correlation with human…

Computation and Language · Computer Science · 2023-05-25 · Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, Chenguang Zhu

LMGQS: A Large-scale Dataset for Query-focused Summarization

Query-focused summarization (QFS) aims to extract or generate a summary of an input document that directly answers or is relevant to a given query. The lack of large-scale datasets in the form of documents, queries, and summaries has…

Computation and Language · Computer Science · 2023-05-23 · Ruochen Xu, Song Wang, Yang Liu, Shuohang Wang, Yichong Xu, Dan Iter, Chenguang Zhu, Michael Zeng

InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT

While large models such as GPT-3 demonstrate exceptional performance in zero-shot and few-shot summarization tasks, their extensive serving and fine-tuning costs hinder their utilization in various applications. Conversely, previous studies…

Computation and Language · Computer Science · 2023-05-23 · Yichong Xu, Ruochen Xu, Dan Iter, Yang Liu, Shuohang Wang, Chenguang Zhu, Michael Zeng

How Does In-Context Learning Help Prompt Tuning?

Fine-tuning large language models is becoming ever more impractical due to their rapidly-growing scale. This motivates the use of parameter-efficient adaptation methods such as prompt tuning (PT), which adds a small number of tunable…

Computation and Language · Computer Science · 2023-02-23 · Simeng Sun, Yang Liu, Dan Iter, Chenguang Zhu, Mohit Iyyer

Generate rather than Retrieve: Large Language Models are Strong Context Generators

Knowledge-intensive tasks, such as open-domain question answering (QA), require access to a large amount of world or domain knowledge. A common approach for knowledge-intensive tasks is to employ a retrieve-then-read pipeline that first…

Computation and Language · Computer Science · 2023-01-26 · Wenhao Yu, Dan Iter, Shuohang Wang, Yichong Xu, Mingxuan Ju, Soumya Sanyal, Chenguang Zhu, Michael Zeng, Meng Jiang

The Trade-offs of Domain Adaptation for Neural Language Models

This work connects language model adaptation with concepts of machine learning theory. We consider a training setup with a large out-of-domain set and a small in-domain set. We derive how the benefit of training a model on either set…

Computation and Language · Computer Science · 2022-03-23 · David Grangier, Dan Iter

On the Complementarity of Data Selection and Fine Tuning for Domain Adaptation

Domain adaptation of neural networks commonly relies on three training phases: pretraining, selected data training and then fine tuning. Data selection improves target domain generalization by training further on pretraining data identified…

Computation and Language · Computer Science · 2021-09-17 · Dan Iter, David Grangier

Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models

Recent models for unsupervised representation learning of text have employed a number of techniques to improve contextual word representations but have put little focus on discourse-level representations. We propose CONPONO, an…

Computation and Language · Computer Science · 2020-05-22 · Dan Iter, Kelvin Guu, Larry Lansing, Dan Jurafsky

Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data

A challenge in training discriminative models like neural networks is obtaining enough labeled training data. Recent approaches use generative models to combine weak supervision sources, like user-defined heuristics or knowledge bases, to…

Machine Learning · Computer Science · 2017-09-29 · Paroma Varma, Bryan He, Dan Iter, Peng Xu, Rose Yu, Christopher De Sa, Christopher Ré

Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs

We study the factors affecting training time in multi-device deep learning systems. Given a specification of a convolutional neural network, our goal is to minimize the time to train this model on a cluster of commodity CPUs and GPUs. We…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-10-20 · Stefan Hadjis, Ce Zhang, Ioannis Mitliagkas, Dan Iter, Christopher Ré