Software Engineering · Computer Science
The Struggles of LLMs in Cross-lingual Code Clone Detection
Micheline Bénédicte Moumoula, Abdoul Kader Kabore, Jacques Klein, Tegawendé Bissyande
2025-05-07
Computation and Language · Computer Science
Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations
Zhihui Xie, Handong Zhao, Tong Yu, Shuai Li
2024-01-12
Computer Vision and Pattern Recognition · Computer Science
Language-Agnostic Visual Embeddings for Cross-Script Handwriting Retrieval
Fangke Chen, Tianhao Dong, Sirry Chen, Guobin Zhang +2
2026-01-19
Computation and Language · Computer Science
Are Multilingual Models Effective in Code-Switching?
Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin +2
2021-03-25
Software Engineering · Computer Science
An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models
Zixiang Xian, Chenhui Cui, Rubing Huang, Chunrong Fang +1
2025-06-04
Computation and Language · Computer Science
On the Robustness of Unsupervised and Semi-supervised Cross-lingual Word Embedding Learning
Yerai Doval, Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert
2020-03-04
Computation and Language · Computer Science
Exploring Alignment in Shared Cross-lingual Spaces
Basel Mousi, Nadir Durrani, Fahim Dalvi, Majd Hawasly +1
2024-05-24
Software Engineering · Computer Science
Beyond Language Boundaries: Uncovering Programming Language Families for Code Language Models
Shangbo Yun, Xiaodong Gu, Jianghong Huang, Beijun Shen
2025-12-23
Computation and Language · Computer Science
Inducing Language-Agnostic Multilingual Representations
Wei Zhao, Steffen Eger, Johannes Bjerva, Isabelle Augenstein
2021-06-22
Computation and Language · Computer Science
Hierarchical Meta-Embeddings for Code-Switching Named Entity Recognition
Genta Indra Winata, Zhaojiang Lin, Jamin Shin, Zihan Liu +1
2019-09-19
Machine Learning · Computer Science
Agnostics: Learning to Code in Any Programming Language via Reinforcement with a Universal Learning Environment
Aleksander Boruch-Gruszecki, Yangtian Zi, Zixuan Wu, Tejas Oberoi +3
2026-03-24
Computation and Language · Computer Science
Coding Triangle: How Does Large Language Model Understand Code?
Taolin Zhang, Zihan Ma, Maosong Cao, Junnan Liu +2
2025-07-09
Software Engineering · Computer Science
Functional Consistency of LLM Code Embeddings: A Self-Evolving Data Synthesis Framework for Benchmarking
Zhuohao Li, Wenqing Chen, Jianxing Yu, Zhichao Lu
2025-08-28
Computation and Language · Computer Science
Cross-lingual Models of Word Embeddings: An Empirical Comparison
Shyam Upadhyay, Manaal Faruqui, Chris Dyer, Dan Roth
2016-06-09
Software Engineering · Computer Science
BERT2Code: Can Pretrained Language Models be Leveraged for Code Search?
Abdullah Al Ishtiaq, Masum Hasan, Md. Mahim Anjum Haque, Kazi Sajeed Mehrab +4
2021-04-19
Computation and Language · Computer Science
Meemi: A Simple Method for Post-processing and Integrating Cross-lingual Word Embeddings
Yerai Doval, Jose Camacho-Collados, Luis Espinosa-Anke, Steven Schockaert
2020-11-12
Computation and Language · Computer Science
Language Models are Universal Embedders
Xin Zhang, Zehan Li, Yanzhao Zhang, Dingkun Long +3
2025-05-23
Artificial Intelligence · Computer Science
Beyond Embeddings: Interpretable Feature Extraction for Binary Code Similarity
Charles E. Gagnon, Steven H. H. Ding, Philippe Charland, Benjamin C. M. Fung
2025-09-30
Computation and Language · Computer Science
Code-switching Language Modeling With Bilingual Word Embeddings: A Case Study for Egyptian Arabic-English
Injy Hamed, Moritz Zhu, Mohamed Elmahdy, Slim Abdennadher +1
2019-09-25
Computation and Language · Computer Science
Text and Code Embeddings by Contrastive Pre-Training
Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford +21
2022-01-26