Related papers: Regularizing Transformers With Deep Probabilistic …

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional…

Computation and Language · Computer Science 2019-05-28 Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

Exploring Transformers in Natural Language Generation: GPT, BERT, and XLNet

Recent years have seen a proliferation of attention mechanisms and the rise of Transformers in Natural Language Generation (NLG). Previously, state-of-the-art NLG architectures such as RNN and LSTM ran into vanishing gradient problems; as…

Computation and Language · Computer Science 2021-02-17 M. Onat Topal, Anil Bas, Imke van Heerden

Multi-Head Multi-Layer Attention to Deep Language Representations for Grammatical Error Detection

It is known that a deep neural network model pre-trained with large-scale data greatly improves the accuracy of various tasks, especially when there are resource constraints. However, the information needed to solve a given task can vary,…

Computation and Language · Computer Science 2019-04-17 Masahiro Kaneko, Mamoru Komachi

Syntax-Infused Transformer and BERT models for Machine Translation and Natural Language Understanding

Attention-based models have shown significant improvement over traditional algorithms in several NLP tasks. The Transformer, for instance, is an illustrative example that generates abstract representations of tokens inputted to an encoder…

Computation and Language · Computer Science 2019-11-15 Dhanasekar Sundararaman, Vivek Subramanian, Guoyin Wang, Shijing Si, Dinghan Shen, Dong Wang, Lawrence Carin

LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multimodal Large Language Models

Deep generative models like VAEs and diffusion models have advanced various generation tasks by leveraging latent variables to learn data distributions and generate high-quality samples. Despite the field of explainable AI making strides in…

Machine Learning · Computer Science 2025-12-22 Mengdan Zhu, Raasikh Kanjiani, Jiahui Lu, Andrew Choi, Qirui Ye, Liang Zhao

Advancements in Natural Language Processing: Exploring Transformer-Based Architectures for Text Understanding

Natural Language Processing (NLP) has witnessed a transformative leap with the advent of transformer-based architectures, which have significantly enhanced the ability of machines to understand and generate human-like text. This paper…

Computation and Language · Computer Science 2025-03-27 Tianhao Wu, Yu Wang, Ngoc Quach

How is BERT surprised? Layerwise detection of linguistic anomalies

Transformer language models have shown remarkable ability in detecting when a word is anomalous in context, but likelihood scores offer no information about the cause of the anomaly. In this work, we use Gaussian models for density…

Computation and Language · Computer Science 2021-05-18 Bai Li, Zining Zhu, Guillaume Thomas, Yang Xu, Frank Rudzicz

Differentiable Gaussianization Layers for Inverse Problems Regularized by Deep Generative Models

Deep generative models such as GANs, normalizing flows, and diffusion models are powerful regularizers for inverse problems. They exhibit great potential for helping reduce ill-posedness and attain high-quality results. However, the latent…

Computer Vision and Pattern Recognition · Computer Science 2024-07-30 Dongzhuo Li

LlaMaVAE: Guiding Large Language Model Generation via Continuous Latent Sentence Spaces

Deep generative neural networks, such as Variational AutoEncoders (VAEs), offer an opportunity to better understand and control language models from the perspective of sentence-level latent spaces. To combine the controllability of VAE…

Computation and Language · Computer Science 2023-12-21 Yingji Zhang, Danilo S. Carvalho, Ian Pratt-Hartmann, André Freitas

Training Deeper Neural Machine Translation Models with Transparent Attention

While current state-of-the-art NMT models, such as RNN seq2seq and Transformers, possess a large number of parameters, they are still shallow in comparison to convolutional models used for both text and vision applications. In this work we…

Computation and Language · Computer Science 2018-09-06 Ankur Bapna, Mia Xu Chen, Orhan Firat, Yuan Cao, Yonghui Wu

A Combined Encoder and Transformer Approach for Coherent and High-Quality Text Generation

This research introduces a novel text generation model that combines BERT's semantic interpretation strengths with GPT-4's generative capabilities, establishing a high standard in generating coherent, contextually accurate language. Through…

Computation and Language · Computer Science 2024-11-20 Jiajing Chen, Shuo Wang, Zhen Qi, Zhenhong Zhang, Chihang Wang, Hongye Zheng

Multimodal Latent Language Modeling with Next-Token Diffusion

Multimodal generative models require a unified approach to handle both discrete data (e.g., text and code) and continuous data (e.g., image, audio, video). In this work, we propose Latent Language Modeling (LatentLM), which seamlessly…

Computation and Language · Computer Science 2024-12-12 Yutao Sun, Hangbo Bao, Wenhui Wang, Zhiliang Peng, Li Dong, Shaohan Huang, Jianyong Wang, Furu Wei

mu-Forcing: Training Variational Recurrent Autoencoders for Text Generation

It has been previously observed that training Variational Recurrent Autoencoders (VRAE) for text generation suffers from a serious uninformative latent variables problem. The model would collapse into a plain language model that totally…

Computation and Language · Computer Science 2019-11-20 Dayiheng Liu, Xu Yang, Feng He, Yuanyuan Chen, Jiancheng Lv

Deep Transformer based Data Augmentation with Subword Units for Morphologically Rich Online ASR

Recently Deep Transformer models have proven to be particularly powerful in language modeling tasks for ASR. Their high complexity, however, makes them very difficult to apply in the first (single) pass of an online system. Recent studies…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-05 Balázs Tarján, György Szaszák, Tibor Fegyó, Péter Mihajlik

Enhanced Transformer Architecture for Natural Language Processing

Transformer is a state-of-the-art model in the field of natural language processing (NLP). Current NLP models primarily increase the number of transformers to improve processing performance. However, this technique requires a lot of…

Computation and Language · Computer Science 2023-10-18 Woohyeon Moon, Taeyoung Kim, Bumgeun Park, Dongsoo Har

Efficient Sparsely Activated Transformers

Transformer-based neural networks have achieved state-of-the-art task performance in a number of machine learning domains including natural language processing and computer vision. To further improve their accuracy, recent work has explored…

Machine Learning · Computer Science 2022-09-01 Salar Latifi, Saurav Muralidharan, Michael Garland

GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures

Attention based language models have become a critical component in state-of-the-art natural language processing systems. However, these models have significant computational requirements, due to long training times, dense operations and…

Computation and Language · Computer Science 2021-06-11 Ivan Chelombiev, Daniel Justus, Douglas Orr, Anastasia Dietrich, Frithjof Gressmann, Alexandros Koliousis, Carlo Luschi

Data Augmentation using Pre-trained Transformer Models

Language model based pre-trained models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of transformer based pre-trained models such as auto-regressive models (GPT-2),…

Computation and Language · Computer Science 2021-02-02 Varun Kumar, Ashutosh Choudhary, Eunah Cho

Explainable Verbal Deception Detection using Transformers

People are regularly confronted with potentially deceptive statements (e.g., fake news, misleading product reviews, or lies about activities). Only a few works on automated text-based deception detection have exploited the potential of deep…

Computation and Language · Computer Science 2022-10-07 Loukas Ilias, Felix Soldner, Bennett Kleinberg

Enhancing Grammatical Error Detection using BERT with Cleaned Lang-8 Dataset

This paper presents an improved LLM based model for Grammatical Error Detection (GED), which is a very challenging and equally important problem for many applications. The traditional approach to GED involved hand-designed features, but…

Computation and Language · Computer Science 2024-11-26 Rahul Nihalani, Kushal Shah