Related papers: Mediators in Determining what Processing BERT Perf…

Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work?

While pretrained models such as BERT have shown large gains across natural language understanding tasks, their performance can be improved by further training the model on a data-rich intermediate task, before fine-tuning it on a target…

Computation and Language · Computer Science · 2020-05-12 · Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman

Understanding the Behaviors of BERT in Ranking

This paper studies the performances and behaviors of BERT in ranking tasks. We explore several different ways to leverage the pre-trained BERT and fine-tune it on two ranking tasks: MS MARCO passage reranking and TREC Web Track ad hoc…

Information Retrieval · Computer Science · 2019-04-29 · Yifan Qiao, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu

On Explaining Your Explanations of BERT: An Empirical Study with Sequence Classification

BERT, as one of the pretrained language models, attracts the most attention in recent years for creating new benchmarks across GLUE tasks via fine-tuning. One pressing issue is to open up the blackbox and explain the decision makings of…

Computation and Language · Computer Science · 2021-01-05 · Zhengxuan Wu, Desmond C. Ong

Undivided Attention: Are Intermediate Layers Necessary for BERT?

In recent times, BERT-based models have been extremely successful in solving a variety of natural language processing (NLP) tasks such as reading comprehension, natural language inference, sentiment analysis, etc. All BERT-based…

Computation and Language · Computer Science · 2023-04-06 · Sharath Nittur Sridhar, Anthony Sarah

Does Dialog Length matter for Next Response Selection task? An Empirical Study

In the last few years, the release of BERT, a multilingual transformer based model, has taken the NLP community by storm. BERT-based models have achieved state-of-the-art results on various NLP tasks, including dialog tasks. One of the…

Computation and Language · Computer Science · 2021-01-26 · Jatin Ganhotra, Sachindra Joshi

The Power of Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval

On a wide range of natural language processing and information retrieval tasks, transformer-based models, particularly pre-trained language models like BERT, have demonstrated tremendous effectiveness. Due to the quadratic complexity of the…

Information Retrieval · Computer Science · 2022-10-18 · Minghan Li, Diana Nicoleta Popa, Johan Chagnon, Yagmur Gizem Cinar, Eric Gaussier

Limited Generalizability in Argument Mining: State-Of-The-Art Models Learn Datasets, Not Arguments

Identifying arguments is a necessary prerequisite for various tasks in automated discourse analysis, particularly within contexts such as political debates, online discussions, and scientific reasoning. In addition to theoretical advances…

Computation and Language · Computer Science · 2025-05-29 · Marc Feger, Katarina Boland, Stefan Dietze

Pretrained Transformers for Text Ranking: BERT and Beyond

The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural…

Information Retrieval · Computer Science · 2021-08-20 · Jimmy Lin, Rodrigo Nogueira, Andrew Yates

The Quest for the Right Mediator: Surveying Mechanistic Interpretability Through the Lens of Causal Mediation Analysis

Interpretability provides a toolset for understanding how and why neural networks behave in certain ways. However, there is little unity in the field: most studies employ ad-hoc evaluations and do not share theoretical foundations, making…

Machine Learning · Computer Science · 2025-10-01 · Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, Nikhil Prakash, Can Rager, Aruna Sankaranarayanan, Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov

Probing Context Localization of Polysemous Words in Pre-trained Language Model Sub-Layers

In the era of high performing Large Language Models, researchers have widely acknowledged that contextual word representations are one of the key drivers in achieving top performances in downstream tasks. In this work, we investigate the…

Computation and Language · Computer Science · 2024-09-24 · Soniya Vijayakumar, Josef van Genabith, Simon Ostermann

Hierarchical Multitask Learning Approach for BERT

Recent works show that learning contextualized embeddings for words is beneficial for downstream tasks. BERT is one successful example of this approach. It learns embeddings by solving two tasks, which are masked language model (masked LM)…

Computation and Language · Computer Science · 2020-11-10 · Çağla Aksoy, Alper Ahmetoğlu, Tunga Güngör

Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias

Common methods for interpreting neural models in natural language processing typically examine either their structure or their behavior, but not both. We propose a methodology grounded in the theory of causal mediation analysis for…

Computation and Language · Computer Science · 2020-11-24 · Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Simas Sakenis, Jason Huang, Yaron Singer, Stuart Shieber

Better Reasoning Behind Classification Predictions with BERT for Fake News Detection

Fake news detection has become a major task to solve as there has been an increasing amount of fake news on the internet in recent years. Although many classification models have been proposed based on statistical learning methods showing…

Computation and Language · Computer Science · 2022-07-26 · Daesoo Lee

Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study

Large pre-trained language models help to achieve state of the art on a variety of natural language processing (NLP) tasks; nevertheless, they still suffer from forgetting when incrementally learning a sequence of tasks. To alleviate this…

Computation and Language · Computer Science · 2023-03-03 · Mingxu Tao, Yansong Feng, Dongyan Zhao

What does BERT know about books, movies and music? Probing BERT for Conversational Recommendation

Heavily pre-trained transformer models such as BERT have recently been shown to be remarkably powerful at language modelling by achieving impressive results on numerous downstream tasks. It has also been shown that they are able to implicitly…

Information Retrieval · Computer Science · 2021-03-05 · Gustavo Penha, Claudia Hauff

Pretrained Language Models for Document-Level Neural Machine Translation

Previous work on document-level NMT usually focuses on limited contexts because of degraded performance on larger contexts. In this paper, we investigate using large contexts with three main contributions: (1) Different from previous…

Computation and Language · Computer Science · 2019-11-11 · Liangyou Li, Xin Jiang, Qun Liu

Enhancing Legal Argument Mining with Domain Pre-training and Neural Networks

The contextual word embedding model, BERT, has proved its ability on downstream tasks with limited quantities of annotated data. BERT and its variants help to reduce the burden of complex annotation work in many interdisciplinary research…

Computation and Language · Computer Science · 2022-04-07 · Gechuan Zhang, Paul Nulty, David Lillis

Probing Neural Network Comprehension of Natural Language Arguments

We are surprised to find that BERT's peak performance of 77% on the Argument Reasoning Comprehension Task reaches just three points below the average untrained human baseline. However, we show that this result is entirely accounted for by…

Computation and Language · Computer Science · 2019-09-17 · Timothy Niven, Hung-Yu Kao

Profitable Trade-Off Between Memory and Performance In Multi-Domain Chatbot Architectures

The text classification problem is a very broad field of study in natural language processing. In short, the text classification problem is to determine which of the previously determined classes the given text belongs to.…

Computation and Language · Computer Science · 2021-12-28 · D. Emre Taşar, Şükrü Ozan, M. Fatih Akca, Oğuzhan Ölmez, Semih Gülüm, Seçilay Kutal, Ceren Belhan

Probing as Quantifying Inductive Bias

Pre-trained contextual representations have led to dramatic performance improvements on a range of downstream tasks. Such performance improvements have motivated researchers to quantify and understand the linguistic information encoded in…

Computation and Language · Computer Science · 2022-03-28 · Alexander Immer, Lucas Torroba Hennigen, Vincent Fortuin, Ryan Cotterell