Related papers: CDLM: Cross-Document Language Modeling

Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training

In this paper, we introduce Cross-View Language Modeling, a simple and effective pre-training framework that unifies cross-lingual and cross-modal pre-training with shared architectures and objectives. Our approach is motivated by a key…

Computation and Language · Computer Science 2023-06-13 Yan Zeng, Wangchunshu Zhou, Ao Luo, Ziming Cheng, Xinsong Zhang

XDLM: Cross-lingual Diffusion Language Model for Machine Translation

Recently, diffusion models have excelled in image generation tasks and have also been applied to natural language processing (NLP) for controllable text generation. However, the application of diffusion models in a cross-lingual setting is…

Computation and Language · Computer Science 2023-08-01 Linyao Chen, Aosong Feng, Boming Yang, Zihui Li

Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering

The integration of multi-document pre-training objectives into language models has resulted in remarkable improvements in multi-document downstream tasks. In this work, we propose extending this idea by pre-training a generic multi-document…

Computation and Language · Computer Science 2023-05-25 Avi Caciularu, Matthew E. Peters, Jacob Goldberger, Ido Dagan, Arman Cohan

Efficient Training for Cross-lingual Speech Language Models

Currently, large language models (LLMs) predominantly focus on the text modality. To enable more natural human-AI interaction, speech LLMs are emerging, but building effective end-to-end speech LLMs remains challenging due to limited data…

Computation and Language · Computer Science 2026-04-14 Yan Zhou, Qingkai Fang, Yun Hong, Yang Feng

InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts. The unified view helps us to better…

Computation and Language · Computer Science 2021-04-08 Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, Heyan Huang, Ming Zhou

Efficient Pre-training of Masked Language Model via Concept-based Curriculum Masking

Masked language modeling (MLM) has been widely used for pre-training effective bidirectional representations, but incurs substantial training costs. In this paper, we propose a novel concept-based curriculum masking (CCM) method to…

Computation and Language · Computer Science 2022-12-16 Mingyu Lee, Jun-Hyung Park, Junho Kim, Kang-Min Kim, SangKeun Lee

CDLM: Consistency Diffusion Language Models For Faster Sampling

Diffusion Language Models (DLMs) offer a promising parallel generation paradigm but suffer from slow inference due to numerous refinement steps and the inability to use standard KV caching. We introduce CDLM (Consistency Diffusion Language…

Machine Learning · Computer Science 2026-02-23 Minseo Kim, Chenfeng Xu, Coleman Hooper, Harman Singh, Ben Athiwaratkun, Ce Zhang, Kurt Keutzer, Amir Gholami

Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment

The cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences. In this paper, we introduce denoising word alignment as a new cross-lingual pre-training task.…

Computation and Language · Computer Science 2021-09-14 Zewen Chi, Li Dong, Bo Zheng, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei

DocLLM: A layout-aware generative language model for multimodal document understanding

Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a…

Computation and Language · Computer Science 2024-01-03 Dongsheng Wang, Natraj Raman, Mathieu Sibue, Zhiqiang Ma, Petr Babkin, Simerjot Kaur, Yulong Pei, Armineh Nourbakhsh, Xiaomo Liu

LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding

This paper proposes LayoutLLM, a more flexible document analysis method for understanding imaged documents. Visually Rich Document Understanding tasks, such as document image classification and information extraction, have gained…

Computation and Language · Computer Science 2024-03-22 Masato Fujitake

In-context Pretraining: Language Modeling Beyond Document Boundaries

Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining…

Computation and Language · Computer Science 2024-06-25 Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Gergely Szilvasy, Rich James, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis

Document Context Language Models

Text documents are structured on multiple levels of detail: individual words are related by syntax, but larger units of text are related by discourse structure. Existing language models generally fail to account for discourse structure, but…

Computation and Language · Computer Science 2016-02-23 Yangfeng Ji, Trevor Cohn, Lingpeng Kong, Chris Dyer, Jacob Eisenstein

DOCmT5: Document-Level Pretraining of Multilingual Language Models

In this paper, we introduce DOCmT5, a multilingual sequence-to-sequence language model pretrained with large scale parallel documents. While previous approaches have focused on leveraging sentence-level parallel data, we try to build a…

Computation and Language · Computer Science 2022-05-06 Chia-Hsuan Lee, Aditya Siddhant, Viresh Ratnakar, Melvin Johnson

FltLM: An Intergrated Long-Context Large Language Model for Effective Context Filtering and Understanding

The development of Long-Context Large Language Models (LLMs) has markedly advanced natural language processing by facilitating the processing of textual data across long documents and multiple corpora. However, Long-Context LLMs still face two…

Computation and Language · Computer Science 2024-10-10 Jingyang Deng, Zhengyang Shen, Boyang Wang, Lixin Su, Suqi Cheng, Ying Nie, Junfeng Wang, Dawei Yin, Jinwen Ma

Data Efficient Masked Language Modeling for Vision and Language

Masked language modeling (MLM) is one of the key sub-tasks in vision-language pretraining. In the cross-modal setting, tokens in the sentence are masked at random, and the model predicts the masked tokens given the image and the text. In…

Computation and Language · Computer Science 2021-09-07 Yonatan Bitton, Gabriel Stanovsky, Michael Elhadad, Roy Schwartz

MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding

Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially for fixed-layout documents such as scanned document images. However, there are still a large number…

Computation and Language · Computer Science 2022-03-14 Junlong Li, Yiheng Xu, Lei Cui, Furu Wei

CLMSM: A Multi-Task Learning Framework for Pre-training on Procedural Text

In this paper, we propose CLMSM, a domain-specific, continual pre-training framework, that learns from a large set of procedural recipes. CLMSM uses a Multi-Task Learning Framework to optimize two objectives - a) Contrastive Learning using…

Computation and Language · Computer Science 2023-10-24 Abhilash Nandy, Manav Nitin Kapadnis, Pawan Goyal, Niloy Ganguly

DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding

Text-rich document understanding (TDU) requires comprehensive analysis of documents containing substantial textual content and complex layouts. While Multimodal Large Language Models (MLLMs) have achieved fast progress in this domain,…

Computer Vision and Pattern Recognition · Computer Science 2025-03-20 Wenhui Liao, Jiapeng Wang, Hongliang Li, Chengyu Wang, Jun Huang, Lianwen Jin

VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification

Multimodal learning from document data has achieved great success lately, as it allows semantically meaningful features to be pre-trained as a prior for a learnable downstream task. In this paper, we approach the document classification problem…

Computer Vision and Pattern Recognition · Computer Science 2023-05-12 Souhail Bakkali, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol, Oriol Ramos Terrades

Multilingual Contextualization of Large Language Models for Document-Level Machine Translation

Large language models (LLMs) have demonstrated strong performance in sentence-level machine translation, but scaling to document-level translation remains challenging, particularly in modeling long-range dependencies and discourse phenomena…

Computation and Language · Computer Science 2025-08-29 Miguel Moura Ramos, Patrick Fernandes, Sweta Agrawal, André F. T. Martins