Related papers: Dynamic Language Models for Continuously Evolving …

TweetBERT: A Pretrained Language Representation Model for Twitter Text Analysis

Twitter is a well-known microblogging social site where users express their views and opinions in real-time. As a result, tweets tend to contain valuable information. With the advancements of deep learning in the domain of natural language…

Computation and Language · Computer Science 2020-10-22 Mohiuddin Md Abdul Qudar, Vijay Mago

A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media

Generated hateful and toxic content by a portion of users in social media is a rising phenomenon that motivated researchers to dedicate substantial efforts to the challenging direction of hateful content identification. We not only need an…

Social and Information Networks · Computer Science 2019-10-29 Marzieh Mozafari, Reza Farahbakhsh, Noel Crespi

Capturing Evolution in Word Usage: Just Add More Clusters?

The way the words are used evolves through time, mirroring cultural or technological evolution of society. Semantic change detection is the task of detecting and analysing word evolution in textual data, even in short periods of time. In…

Computation and Language · Computer Science 2020-04-21 Matej Martinc, Syrielle Montariol, Elaine Zosa, Lidia Pivovarova

Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media

Language use differs between domains and even within a domain, language use changes over time. For pre-trained language models like BERT, domain adaptation through continued pre-training has been shown to improve performance on in-domain…

Computation and Language · Computer Science 2021-09-09 Paul Röttger, Janet B. Pierrehumbert

Back to the Future -- Sequential Alignment of Text Representations

Language evolves over time in many ways relevant to natural language processing tasks. For example, recent occurrences of tokens 'BERT' and 'ELMO' in publications refer to neural network architectures rather than persons. This type of…

Computation and Language · Computer Science 2019-11-25 Johannes Bjerva, Wouter Kouw, Isabelle Augenstein

Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning

Recently, fine-tuning pre-trained language models (e.g., multilingual BERT) to downstream cross-lingual tasks has shown promising results. However, the fine-tuning process inevitably changes the parameters of the pre-trained model and…

Computation and Language · Computer Science 2020-10-06 Zihan Liu, Genta Indra Winata, Andrea Madotto, Pascale Fung

TSSuBERT: Tweet Stream Summarization Using BERT

The development of deep neural networks and the emergence of pre-trained language models such as BERT allow to increase performance on many NLP tasks. However, these models do not meet the same popularity for tweet summarization, which can…

Information Retrieval · Computer Science 2021-06-17 Alexis Dusart, Karen Pinel-Sauvagnat, Gilles Hubert

Fighting Redundancy and Model Decay with Embeddings

Every day, hundreds of millions of new Tweets containing over 40 languages of ever-shifting vernacular flow through Twitter. Models that attempt to extract insight from this firehose of information must face the torrential covariate shift…

Social and Information Networks · Computer Science 2018-09-21 Dan Shiebler, Luca Belli, Jay Baxter, Hanchen Xiong, Abhishek Tayal

Temporal Embeddings and Transformer Models for Narrative Text Understanding

We present two deep learning approaches to narrative text understanding for character relationship modelling. The temporal evolution of these relations is described by dynamic word embeddings, that are designed to learn semantic changes…

Computation and Language · Computer Science 2020-03-20 Vani K, Simone Mellace, Alessandro Antonucci

TimeLMs: Diachronic Language Models from Twitter

Despite its importance, the time variable has been largely neglected in the NLP and language model literature. In this paper, we present TimeLMs, a set of language models specialized on diachronic Twitter data. We show that a continual…

Computation and Language · Computer Science 2022-04-04 Daniel Loureiro, Francesco Barbieri, Leonardo Neves, Luis Espinosa Anke, Jose Camacho-Collados

Sentiment analysis in tweets: an assessment study from classical to modern text representation models

With the growth of social medias, such as Twitter, plenty of user-generated data emerge daily. The short texts published on Twitter -- the tweets -- have earned significant attention as a rich source of information to guide many…

Artificial Intelligence · Computer Science 2021-06-01 Sérgio Barreto, Ricardo Moura, Jonnathan Carvalho, Aline Paes, Alexandre Plastino

Feature Selection Empowered BERT for Detection of Hate Speech with Vocabulary Augmentation

Abusive speech on social media poses a persistent and evolving challenge, driven by the continuous emergence of novel slang and obfuscated terms designed to circumvent detection systems. In this work, we present a data efficient strategy…

Computation and Language · Computer Science 2025-12-03 Pritish N. Desai, Tanay Kewalramani, Srimanta Mandal

Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations

Current language models are usually trained using a self-supervised scheme, where the main focus is learning representations at the word or sentence level. However, there has been limited progress in generating useful discourse-level…

Computation and Language · Computer Science 2021-09-13 Vladimir Araujo, Andrés Villa, Marcelo Mendoza, Marie-Francine Moens, Alvaro Soto

Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation

Fine-tuning pre-trained language models like BERT has become an effective way in NLP and yields state-of-the-art results on many downstream tasks. Recent studies on adapting BERT to new tasks mainly focus on modifying the model structure,…

Computation and Language · Computer Science 2020-02-25 Yige Xu, Xipeng Qiu, Ligao Zhou, Xuanjing Huang

Neural Models for Offensive Language Detection

Offensive language detection is an ever-growing natural language processing (NLP) application. This growth is mainly because of the widespread usage of social networks, which becomes a mainstream channel for people to communicate, work, and…

Computation and Language · Computer Science 2021-06-29 Ehab Hamdy

Hashing it Out: Predicting Unhealthy Conversations on Twitter

Personal attacks in the context of social media conversations often lead to fast-paced derailment, leading to even more harmful exchanges being made. State-of-the-art systems for the detection of such conversational derailment often make…

Computation and Language · Computer Science 2023-11-20 Steven Leung, Filippos Papapolyzos

Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora

Pretrained language models (PTLMs) are typically learned over a large, static corpus and further fine-tuned for various downstream tasks. However, when deployed in the real world, a PTLM-based model must deal with data distributions that…

Computation and Language · Computer Science 2022-07-20 Xisen Jin, Dejiao Zhang, Henghui Zhu, Wei Xiao, Shang-Wen Li, Xiaokai Wei, Andrew Arnold, Xiang Ren

Out of vocabulary words decrease, running texts prevail and hashtags coalesce: Twitter as an evolving sociolinguistic system

Twitter is one of the most popular social media. Due to the ease of availability of data, Twitter is used significantly for research purposes. Twitter is known to evolve in many aspects from what it was at its birth; nevertheless, how it…

Social and Information Networks · Computer Science 2015-09-18 Suman Kalyan Maity, Bhadreswar Ghuku, Abhishek Upmanyu, Animesh Mukherjee

Evaluating BERT-based Pre-training Language Models for Detecting Misinformation

It is challenging to control the quality of online information due to the lack of supervision over all the information posted online. Manual checking is almost impossible given the vast number of posts made on online media and how quickly…

Computation and Language · Computer Science 2022-03-16 Rini Anggrainingsih, Ghulam Mubashar Hassan, Amitava Datta

RoBERTweet: A BERT Language Model for Romanian Tweets

Developing natural language processing (NLP) systems for social media analysis remains an important topic in artificial intelligence research. This article introduces RoBERTweet, the first Transformer architecture trained on Romanian…

Computation and Language · Computer Science 2023-06-13 Iulian-Marius Tăiatu, Andrei-Marius Avram, Dumitru-Clementin Cercel, Florin Pop