Related papers: Big Bidirectional Insertion Representations for Do…

Graph-tree Fusion Model with Bidirectional Information Propagation for Long Document Classification

Long document classification presents challenges in capturing both local and global dependencies due to their extensive content and complex structure. Existing methods often struggle with token limits and fail to adequately model…

Computation and Language · Computer Science 2024-10-07 Sudipta Singha Roy, Xindi Wang, Robert E. Mercer, Frank Rudzicz

Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling

Transformer is important for text modeling. However, it has difficulty in handling long documents due to the quadratic complexity with input text length. In order to handle this problem, we propose a hierarchical interactive Transformer…

Computation and Language · Computer Science 2021-12-10 Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang

Hierarchical Transformers for Long Document Classification

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm. We extend its fine-tuning procedure to address one of its…

Computation and Language · Computer Science 2019-10-25 Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Yishay Carmiel, Najim Dehak

Key Information Retrieval to Classify the Unstructured Data Content of Preferential Trade Agreements

With the rapid proliferation of textual data, predicting long texts has emerged as a significant challenge in the domain of natural language processing. Traditional text prediction methods encounter substantial difficulties when grappling…

Computation and Language · Computer Science 2024-01-24 Jiahui Zhao, Ziyi Meng, Stepan Gordeev, Zijie Pan, Dongjin Song, Sandro Steinbach, Caiwen Ding

Transformer based Multilingual document Embedding model

One of the current state-of-the-art multilingual document embedding model LASER is based on the bidirectional LSTM neural machine translation model. This paper presents a transformer-based sentence/document embedding model, T-LASER, which…

Computation and Language · Computer Science 2020-08-21 Wei Li, Brian Mak

Long-Range Transformer Architectures for Document Understanding

Since their release, Transformers have revolutionized many fields from Natural Language Understanding to Computer Vision. Document Understanding (DU) was not left behind with first Transformer based models for DU dating from late 2019.…

Computation and Language · Computer Science 2023-09-12 Thibault Douzon, Stefan Duffner, Christophe Garcia, Jérémy Espinas

Microsoft Translator at WMT 2019: Towards Large-Scale Document-Level Neural Machine Translation

This paper describes the Microsoft Translator submissions to the WMT19 news translation shared task for English-German. Our main focus is document-level neural machine translation with deep transformer models. We start with strong…

Computation and Language · Computer Science 2019-07-16 Marcin Junczys-Dowmunt

Pre-training Tasks for Embedding-based Large-scale Retrieval

We consider the large-scale query-document retrieval problem: given a query (e.g., a question), return the set of relevant documents (e.g., paragraphs containing the answer) from a large document corpus. This problem is often solved in two…

Machine Learning · Computer Science 2020-02-11 Wei-Cheng Chang, Felix X. Yu, Yin-Wen Chang, Yiming Yang, Sanjiv Kumar

Transformer-based Models for Long-Form Document Matching: Challenges and Empirical Analysis

Recent advances in the area of long document matching have primarily focused on using transformer-based models for long document encoding and matching. There are two primary challenges associated with these models. Firstly, the performance…

Computation and Language · Computer Science 2023-02-09 Akshita Jha, Adithya Samavedhi, Vineeth Rakesh, Jaideep Chandrashekar, Chandan K. Reddy

LAWDR: Language-Agnostic Weighted Document Representations from Pre-trained Models

Cross-lingual document representations enable language understanding in multilingual contexts and allow transfer learning from high-resource to low-resource languages at the document level. Recently large pre-trained language models such as…

Computation and Language · Computer Science 2021-06-08 Hongyu Gong, Vishrav Chaudhary, Yuqing Tang, Francisco Guzmán

Enhancing Document-level Translation of Large Language Model via Translation Mixed-instructions

Existing large language models (LLMs) for machine translation are typically fine-tuned on sentence-level translation instructions and achieve satisfactory performance at the sentence level. However, when applied to document-level…

Computation and Language · Computer Science 2024-01-17 Yachao Li, Junhui Li, Jing Jiang, Min Zhang

Multilingual Contextualization of Large Language Models for Document-Level Machine Translation

Large language models (LLMs) have demonstrated strong performance in sentence-level machine translation, but scaling to document-level translation remains challenging, particularly in modeling long-range dependencies and discourse phenomena…

Computation and Language · Computer Science 2025-08-29 Miguel Moura Ramos, Patrick Fernandes, Sweta Agrawal, André F. T. Martins

Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

Large language model (LLM)-based embedding models, benefiting from large scale pre-training and post-training, have begun to surpass BERT and T5-based models on general-purpose text embedding tasks such as document retrieval. However, a…

Computation and Language · Computer Science 2025-05-22 Siyue Zhang, Yilun Zhao, Liyuan Geng, Arman Cohan, Anh Tuan Luu, Chen Zhao

SPECTER: Document-level Representation Learning using Citation-informed Transformers

Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are targeted towards token- and sentence-level…

Computation and Language · Computer Science 2020-05-21 Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld

BIRD: Behavior Induction via Representation-structure Distillation

Human-aligned deep learning models exhibit behaviors consistent with human values, such as robustness, fairness, and honesty. Transferring these behavioral properties to models trained on different tasks or data distributions remains…

Machine Learning · Computer Science 2025-06-02 Galen Pogoncheff, Michael Beyeler

BIG MOOD: Relating Transformers to Explicit Commonsense Knowledge

We introduce a simple yet effective method of integrating contextual embeddings with commonsense graph embeddings, dubbed BERT Infused Graphs: Matching Over Other embeDdings. First, we introduce a preprocessing method to improve the speed…

Computation and Language · Computer Science 2019-10-18 Jeff Da

Text Summarization with Pretrained Encoders

Bidirectional Encoder Representations from Transformers (BERT) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how…

Computation and Language · Computer Science 2019-09-06 Yang Liu, Mirella Lapata

A text autoencoder from transformer for fast encoding language representation

In recent years BERT shows apparent advantages and great potential in natural language processing tasks. However, both training and applying BERT requires intensive time and resources for computing contextual language representations, which…

Computation and Language · Computer Science 2021-11-05 Tan Huang

Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences

Transformers-based models, such as BERT, have dramatically improved the performance for various natural language processing tasks. The clinical knowledge enriched model, namely ClinicalBERT, also achieved state-of-the-art results when…

Computation and Language · Computer Science 2022-04-18 Yikuan Li, Ramsey M. Wehbe, Faraz S. Ahmad, Hanyin Wang, Yuan Luo

BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation

The success of bidirectional encoders using masked language models, such as BERT, on numerous natural language processing tasks has prompted researchers to attempt to incorporate these pre-trained models into neural machine translation…

Computation and Language · Computer Science 2021-09-13 Haoran Xu, Benjamin Van Durme, Kenton Murray