Beyond Bilingual Transfer: Multilingual Code-Switching in Instruction Tuning
Abstract
Recent studies have shown that code-switching data (CSD), in which multiple languages are mixed within the same context, can improve cross-lingual transfer and multilingual alignment in large language models (LLMs). However, existing studies primarily focus on bilingual transfer between English and a target language, leaving multilingual settings involving three or more languages largely unexplored. In this work, we investigate multilingual code-switching instruction tuning across four languages: English, Japanese, Korean, and Chinese. We evaluate multilingual understanding on the Belebele benchmark. Our experiments show that simple sentence-level multilingual CSD consistently improves performance averaged over all four languages, indicating that multilingual code-switching can be effective beyond bilingual transfer settings.
Cite
@article{arxiv.2605.29414,
title = {Beyond Bilingual Transfer: Multilingual Code-Switching in Instruction Tuning},
author = {Shunta Asano and Jeonghun Baek and Toshihiko Yamasaki},
journal = {arXiv preprint arXiv:2605.29414},
year = {2026}
}