Virtual Width Networks

Machine Learning, Artificial Intelligence | 2025-11-18 (v2)

Abstract

We introduce Virtual Width Networks (VWN), a framework that delivers the benefits of wider representations without incurring the quadratic cost of increasing the hidden size. VWN decouples representational width from backbone width, expanding the embedding space while keeping backbone compute nearly constant. In our large-scale experiment, an 8× expansion accelerates optimization by more than 2× for next-token prediction and 3× for next-2-token prediction. The advantage amplifies over training, as both the loss gap and the convergence-speedup ratio grow, showing that VWN is not only token-efficient but also increasingly effective with scale. Moreover, we identify an approximately log-linear scaling relation between virtual width and loss reduction, offering an initial empirical basis and motivation for exploring virtual-width scaling as a new dimension of large-model efficiency.
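
To make the "quadratic cost of increasing the hidden size" concrete, the following minimal sketch counts the weights of a plain bias-free dense layer. It does not reproduce VWN or any part of the authors' architecture. The helper `dense_params` and the width `hidden = 1024` are hypothetical, chosen only for illustration. The factor of 8 mirrors the expansion mentioned in the abstract.

```python
def dense_params(d_in: int, d_out: int) -> int:
    """Weight count of a bias-free dense layer mapping d_in -> d_out."""
    return d_in * d_out


hidden = 1024  # hypothetical backbone width (assumption, not from the paper)
factor = 8     # the 8x expansion mentioned in the abstract

# Widening the backbone itself: a hidden-to-hidden matrix grows on both
# sides, so its size scales with the square of the factor.
widened_ratio = dense_params(factor * hidden, factor * hidden) / dense_params(hidden, hidden)

# Widening only one side of a projection: growth is linear in the factor.
one_sided_ratio = dense_params(factor * hidden, hidden) / dense_params(hidden, hidden)

print(widened_ratio)    # 64.0
print(one_sided_ratio)  # 8.0
```

Under these assumptions, an 8× wider backbone multiplies each hidden-to-hidden matrix by 64×, whereas widening only one side of a projection costs 8×. That gap is the kind of saving that decoupling representational width from backbone width would aim for.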

Cite

@article{arxiv.2511.11238,
  title  = {Virtual Width Networks},
  author = {Seed and Baisheng Li and Banggu Wu and Bole Ma and Bowen Xiao and Chaoyi Zhang and Cheng Li and Chengyi Wang and Chengyin Xu and Chi Zhang and Chong Hu and Daoguang Zan and Defa Zhu and Dongyu Xu and Du Li and Faming Wu and Fan Xia and Ge Zhang and Guang Shi and Haobin Chen and Hongyu Zhu and Hongzhi Huang and Huan Zhou and Huanzhang Dou and Jianhui Duan and Jianqiao Lu and Jianyu Jiang and Jiayi Xu and Jiecao Chen and Jin Chen and Jin Ma and Jing Su and Jingji Chen and Jun Wang and Jun Yuan and Juncai Liu and Jundong Zhou and Kai Hua and Kai Shen and Kai Xiang and Kaiyuan Chen and Kang Liu and Ke Shen and Liang Xiang and Lin Yan and Lishu Luo and Mengyao Zhang and Ming Ding and Mofan Zhang and Nianning Liang and Peng Li and Penghao Huang and Pengpeng Mu and Qi Huang and Qianli Ma and Qiyang Min and Qiying Yu and Renming Pang and Ru Zhang and Shen Yan and Shen Yan and Shixiong Zhao and Shuaishuai Cao and Shuang Wu and Siyan Chen and Siyu Li and Siyuan Qiao and Tao Sun and Tian Xin and Tiantian Fan and Ting Huang and Ting-Han Fan and Wei Jia and Wenqiang Zhang and Wenxuan Liu and Xiangzhong Wu and Xiaochen Zuo and Xiaoying Jia and Ximing Yang and Xin Liu and Xin Yu and Xingyan Bin and Xintong Hao and Xiongcai Luo and Xujing Li and Xun Zhou and Yanghua Peng and Yangrui Chen and Yi Lin and Yichong Leng and Yinghao Li and Yingshuan Song and Yiyuan Ma and Yong Shan and Yongan Xiang and Yonghui Wu and Yongtao Zhang and Yongzhen Yao and Yu Bao and Yuehang Yang and Yufeng Yuan and Yunshui Li and Yuqiao Xian and Yutao Zeng and Yuxuan Wang and Zehua Hong and Zehua Wang and Zengzhi Wang and Zeyu Yang and Zhengqiang Yin and Zhenyi Lu and Zhexi Zhang and Zhi Chen and Zhi Zhang and Zhiqi Lin and Zihao Huang and Zilin Xu and Ziyun Wei and Zuo Wang},
  journal= {arXiv preprint arXiv:2511.11238},
  year   = {2025}
}