Related papers: Does Pretraining for Summarization Require Knowled…
Biomedical summarization requires large datasets to train for text generation. We show that while transfer learning offers a viable option for addressing this challenge, an in-domain pre-training does not always offer advantages in a BioASQ…
Pre-training models with large crawled corpora can lead to issues such as toxicity and bias, as well as copyright and privacy concerns. A promising way of alleviating such concerns is to conduct pre-training with synthetic tasks and data,…
Deep neural networks are data hungry models and thus face difficulties when attempting to train on small text datasets. Transfer learning is a potential solution but their effectiveness in the text domain is not as explored as in areas such…
Neural abstractive summarization has been studied in many pieces of literature and achieves great success with the aid of large corpora. However, when encountering novel tasks, one may not always benefit from transfer learning due to the…
Pre-trained language models (e.g. BART) have shown impressive results when fine-tuned on large summarization datasets. However, little is understood about this fine-tuning process, including what knowledge is retained from pre-training time…
Large-scale pretraining instills large amounts of knowledge in deep neural networks. This, in turn, improves the generalization behavior of these models in downstream tasks. What exactly are the limits to the generalization benefits of…
Recommender systems aim to provide item recommendations for users, and are usually faced with data sparsity problem (e.g., cold start) in real-world scenarios. Recently pre-trained models have shown their effectiveness in knowledge transfer…
Pre-training has exhibited notable benefits to downstream tasks by boosting accuracy and speeding up convergence, but the exact reasons for these benefits still remain unclear. To this end, we propose to quantitatively and explicitly…
For most natural language processing tasks, the dominant practice is to finetune large pretrained transformer models (e.g., BERT) using smaller downstream datasets. Despite the success of this approach, it remains unclear to what extent…
This empirical study analyzes the effects of the pre-training corpus on the quality of learned transformer representations. We focus on the representation quality induced solely through pre-training. Our experiments show that pre-training…
Despite recent advancements in automatic summarization, state-of-the-art models do not summarize all documents equally well, raising the question: why? While prior research has extensively analyzed summarization models, little attention has…
Pre-training a deep neural network on the ImageNet dataset is a common practice for training deep learning models, and generally yields improved performance and faster training times. The technique of pre-training on one task and then…
Most studies on language model pretraining focus on large datasets, leaving open questions about optimization in data-constrained settings. In such settings, the effects of training data order and of including alternative versions of the…
Distributed representation learned with neural networks has recently shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences. Whether and how such an approach can be extended to…
Text summarization aims to extract essential information from a piece of text and transform the text into a concise version. Existing unsupervised abstractive summarization models leverage recurrent neural networks framework while the…
Pretraining on large, semantically rich datasets is key for developing language models. Surprisingly, recent studies have shown that even synthetic data, generated procedurally through simple semantic-free algorithms, can yield some of the…
The goal of text summarization is to compress documents to the relevant information while excluding background information already known to the receiver. So far, summarization researchers have given considerably more attention to relevance…
When training data is scarce, the incorporation of additional prior knowledge can assist the learning process. While it is common to initialize neural networks with weights that have been pre-trained on other large data sets, pre-training…
Pre-trained language models are still far from human performance in tasks that need understanding of properties (e.g. appearance, measurable quantity) and affordances of everyday objects in the real world since the text lacks such…
Summarization is a core task in Natural Language Processing (NLP). Recent advances in Large Language Models (LLMs) and the introduction of large context windows reaching millions of tokens make it possible to process entire books in a…