Ting-Rui Chiang
In this work, we propose a simple theoretical framework, Pelican Soup, aiming to better understand how pretraining allows LLMs to (1) generalize to unseen instructions and (2) perform in-context learning, even when the verbalizers are…
The Rotary Position Embedding (RoPE) is widely used in the attention heads of many large language models (LLM). It rotates dimensions in the query and the key vectors by different angles according to their positions in the input sequence.…
The ability to locate an object in an image according to natural language instructions is crucial for many real-world applications. In this work we propose LocateBench, a high-quality benchmark dedicated to evaluating this ability. We…
Augmenting a language model (LM) with $k$-nearest neighbors ($k$NN) retrieval on its training data alone can decrease its perplexity, though the underlying reasons for this remain elusive. In this work, we rule out one previously posited…
We analyze the masked language modeling pretraining objective function from the perspective of the distributional hypothesis. We investigate whether better sample efficiency and the better generalization capability of models pretrained with…
Dialog system developers need high-quality data to train, fine-tune and assess their systems. They often use crowdsourcing for this since it provides large quantities of data from many workers. However, the data may not be of sufficiently…
While multilingual training is now an essential ingredient in machine translation (MT) systems, recent work has demonstrated that it has different effects in different multilingual settings, such as many-to-one, one-to-many, and…
Dialogue state tracking models play an important role in a task-oriented dialogue system. However, most of them model the slot types conditionally independently given the input. We discover that it may cause the model to be confused by slot…
ALFRED is a recently proposed benchmark that requires a model to complete tasks in simulated house environments specified by instructions in natural language. We hypothesize that key to success is accurately aligning the text modality with…
Despite the success of pretrained masked language models (MLM), why MLM pretraining is useful is still a qeustion not fully answered. In this work we theoretically and empirically show that MLM pretraining makes models robust to…
This work focuses on relating two mysteries in neural-based text generation: exposure bias, and text degeneration. Despite the long time since exposure bias was mentioned and the numerous studies for its remedy, to our knowledge, its impact…
The huge size of the widely used BERT family models has led to recent efforts about model distillation. The main goal of distillation is to create a task-agnostic pre-trained model that can be fine-tuned on downstream tasks without…
With a lot of work about context-free question answering systems, there is an emerging trend of conversational question answering models in the natural language processing field. Thanks to the recently collected datasets, including QuAC and…
Solving math word problems is a challenging task that requires accurate natural language understanding to bridge natural language texts and math expressions. Motivated by the intuition about how human generates the equations given the…
With the increasing research interest in dialogue response generation, there is an emerging branch formulating this task as selecting next sentences, where given the partial dialogue contexts, the goal is to determine the most probable next…
The response selection has been an emerging research topic due to the growing interest in dialogue modeling, where the goal of the task is to select an appropriate response for continuing dialogues. To further push the end-to-end dialogue…