Related papers: Connecting and Comparing Language Model Interpolat…

Bayesian Reconstruction of Missing Observations

We focus on an interpolation method referred to as Bayesian reconstruction in this paper. Whereas in standard interpolation methods missing data are interpolated deterministically, in Bayesian reconstruction, missing data are interpolated…

Machine Learning · Statistics 2015-03-27 Shun Kataoka, Muneki Yasuda, Kazuyuki Tanaka

Bayes Merging of Multiple Vocabularies for Scalable Image Retrieval

The Bag-of-Words (BoW) representation is well applied to recent state-of-the-art image retrieval works. Typically, multiple vocabularies are generated to correct quantization artifacts and improve recall. However, this routine is corrupted…

Computer Vision and Pattern Recognition · Computer Science 2014-04-15 Liang Zheng, Shengjin Wang, Wengang Zhou, Qi Tian

An Attribute Interpolation Method in Speech Synthesis by Model Merging

With the development of speech synthesis, recent research has focused on challenging tasks, such as speaker generation and emotion intensity control. Attribute interpolation is a common approach to these tasks. However, most previous…

Sound · Computer Science 2024-07-02 Masato Murata, Koichi Miyazaki, Tomoki Koriyama

Revisiting Model Interpolation for Efficient Reasoning

Model merging, typically on Instruct and Thinking models, has shown remarkable performance for efficient reasoning. In this paper, we systematically revisit the simplest merging method that interpolates two weights directly. Particularly,…

Artificial Intelligence · Computer Science 2026-01-27 Taiqiang Wu, Runming Yang, Tao Liu, Jiahao Wang, Ngai Wong

Better Language Models with Model Merging

This paper investigates model merging, a technique for deriving Markov models from text or speech corpora. Models are derived by starting with a large and specific model and by successively combining states to build smaller and more general…

cmp-lg · Computer Science 2008-02-03 Thorsten Brants

Model Fusion through Bayesian Optimization in Language Model Fine-Tuning

Fine-tuning pre-trained models for downstream tasks is a widely adopted technique known for its adaptability and reliability across various domains. Despite its conceptual simplicity, fine-tuning entails several troublesome engineering…

Artificial Intelligence · Computer Science 2024-12-30 Chaeyun Jang, Hyungi Lee, Jungtaek Kim, Juho Lee

Model Merging in Pre-training of Large Language Models

Model merging has emerged as a promising technique for enhancing large language models, though its application in large-scale pre-training remains relatively unexplored. In this paper, we present a comprehensive investigation of model…

Computation and Language · Computer Science 2025-05-23 Yunshui Li, Yiyuan Ma, Shen Yan, Chaoyi Zhang, Jing Liu, Jianqiao Lu, Ziwen Xu, Mengzhao Chen, Minrui Wang, Shiyi Zhan, Jin Ma, Xunhao Lai, Deyi Liu, Yao Luo, Xingyan Bin, Hongbin Ren, Mingji Han, Wenhao Hao, Bairen Yi, LingJun Liu, Bole Ma, Xiaoying Jia, Xun Zhou, Siyuan Qiao, Liang Xiang, Yonghui Wu

Is the Best Better? Bayesian Statistical Model Comparison for Natural Language Processing

Recent work raises concerns about the use of standard splits to compare natural language processing models. We propose a Bayesian statistical model comparison technique which uses k-fold cross-validation across multiple data sets to…

Computation and Language · Computer Science 2020-10-08 Piotr Szymański, Kyle Gorman

Bayes and empirical Bayes: do they merge?

Bayesian inference is attractive for its coherence and good frequentist properties. However, it is a common experience that eliciting an honest prior may be difficult and, in practice, people often take an empirical Bayes approach,…

Statistics Theory · Mathematics 2012-04-09 Sonia Petrone, Judith Rousseau, Catia Scricciolo

Bayesian Model Merging

Model merging aims to combine multiple task-specific expert models into a single model without joint retraining, offering a practical alternative to multi-task learning when data access or computational budget is limited. Existing methods,…

Machine Learning · Computer Science 2026-05-14 Kaiyang Li, Shaobo Han, Qing Su, Shihao Ji

Confronting Quasi-Separation in Logistic Mixed Effects for Linguistic Data: A Bayesian Approach

Mixed effects regression models are widely used by language researchers. However, these regressions are implemented with an algorithm which may not converge on a solution. While convergence issues in linear mixed effects models can often be…

Applications · Statistics 2018-09-10 Amelia Kimball, Kailen Shantz, Christopher Eager, Joseph Roy

Extrapolation Merging: Keep Improving With Extrapolation and Merging

Large Language Models (LLMs) require instruction fine-tuning to perform different downstream tasks. However, the instruction fine-tuning phase still demands significant computational resources and labeled data, lacking a paradigm that can…

Computation and Language · Computer Science 2025-03-10 Yiguan Lin, Bin Xu, Yinghao Li, Yang Gao

The adaptive interpolation method: A simple scheme to prove replica formulas in Bayesian inference

In recent years important progress has been achieved towards proving the validity of the replica predictions for the (asymptotic) mutual information (or "free energy") in Bayesian inference problems. The proof techniques that have emerged…

Information Theory · Computer Science 2018-10-30 Jean Barbier, Nicolas Macris

A Bit of Progress in Language Modeling

In the past several years, a number of different language modeling improvements over simple trigram models have been found, including caching, higher-order n-grams, skipping, interpolated Kneser-Ney smoothing, and clustering. We present…

Computation and Language · Computer Science 2007-05-23 Joshua Goodman

Bayesian Efficient Multiple Kernel Learning

Multiple kernel learning algorithms are proposed to combine kernels in order to obtain a better similarity measure or to integrate feature representations coming from different data sources. Most of the previous research on such methods is…

Machine Learning · Computer Science 2012-07-03 Mehmet Gonen

Cross-lingual Models of Word Embeddings: An Empirical Comparison

Despite interest in using cross-lingual knowledge to learn word embeddings for various tasks, a systematic comparison of the possible approaches is lacking in the literature. We perform an extensive evaluation of four popular approaches of…

Computation and Language · Computer Science 2016-06-09 Shyam Upadhyay, Manaal Faruqui, Chris Dyer, Dan Roth

Leveraging Interpolation Models and Error Bounds for Verifiable Scientific Machine Learning

Effective verification and validation techniques for modern scientific machine learning workflows are challenging to devise. Statistical methods are abundant and easily deployed, but often rely on speculative assumptions about the data and…

Machine Learning · Computer Science 2025-02-11 Tyler Chang, Andrew Gillette, Romit Maulik

Multiple Bayesian Filtering as Message Passing

In this manuscript, a general method for deriving filtering algorithms that involve a network of interconnected Bayesian filters is proposed. This method is based on the idea that the processing accomplished inside each of the Bayesian…

Statistics Theory · Mathematics 2020-04-22 Giorgio M. Vitetta, Pasquale Di Viesti, Emilio Sirignano, Francesco Montorsi

Recent methods from statistical inference and machine learning to improve integrative modeling of macromolecular assemblies

Integrative modeling of macromolecular assemblies allows for structural characterization of large assemblies that are recalcitrant to direct experimental observation. A Bayesian inference approach facilitates combining data from…

Biomolecules · Quantitative Biology 2026-01-13 Shreyas Arvindekar, Kartik Majila, Shruthi Viswanath

Combining predictions from linear models when training and test inputs differ

Methods for combining predictions from different models in a supervised learning setting must somehow estimate/predict the quality of a model's predictions at unknown future inputs. Many of these methods (often implicitly) make the…

Methodology · Statistics 2014-06-25 Thijs van Ommen