Related papers: Traceability Support for Multi-Lingual Software Pr…

TraceLLM: Leveraging Large Language Models with Prompt Engineering for Enhanced Requirements Traceability

Requirements traceability, the process of establishing and maintaining relationships between requirements and various software development artifacts, is paramount for ensuring system integrity and fulfilling requirements throughout the…

Software Engineering · Computer Science 2026-05-25 Nouf Alturayeif, Irfan Ahmad, Jameleddine Hassine

Enhancing Automated Software Traceability by Transfer Learning from Open-World Data

Software requirements traceability is a critical component of the software engineering process, enabling activities such as requirements validation, compliance verification, and safety assurance. However, the cost and effort of manually…

Software Engineering · Computer Science 2022-07-05 Jinfeng Lin, Amrit Poudel, Wenhao Yu, Qingkai Zeng, Meng Jiang, Jane Cleland-Huang

Traceability Transformed: Generating more Accurate Links with Pre-Trained BERT Models

Software traceability establishes and leverages associations between diverse development artifacts. Researchers have proposed the use of deep learning trace models to link natural language artifacts, such as requirements and issue…

Software Engineering · Computer Science 2021-02-24 Jinfeng Lin, Yalin Liu, Qingkai Zeng, Meng Jiang, Jane Cleland-Huang

Semantically Enhanced Software Traceability Using Deep Learning Techniques

In most safety-critical domains the need for traceability is prescribed by certifying bodies. Trace links are generally created among requirements, design, source code, test cases and other artifacts; however, creating such links manually…

Software Engineering · Computer Science 2018-04-10 Jin Guo, Jinghui Cheng, Jane Cleland-Huang

An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models

The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code…

Software Engineering · Computer Science 2025-06-04 Zixiang Xian, Chenhui Cui, Rubing Huang, Chunrong Fang, Zhenyu Chen

Improving Trace Link Recommendation by Using Non-Isotropic Distances and Combinations

The existence of trace links between artifacts of the software development life cycle can improve the efficiency of many activities during software development, maintenance and operations. Unfortunately, the creation and maintenance of…

Software Engineering · Computer Science 2023-07-18 Christof Tinnes

Evaluating the Use of LLMs for Documentation to Code Traceability

Large Language Models (LLMs) offer new potential for automating documentation-to-code traceability, yet their capabilities remain underexplored. We present a comprehensive evaluation of LLMs (Claude 3.5 Sonnet, GPT-4o, and o3-mini) in…

Software Engineering · Computer Science 2025-08-08 Ebube Alor, SayedHassan Khatoonabadi, Emad Shihab

Vector embedding of multi-modal texts: a tool for discovery?

Computer science texts are particularly rich in both narrative content and illustrative charts, algorithms, images, annotated diagrams, etc. This study explores the extent to which vector-based multimodal retrieval, powered by…

Information Retrieval · Computer Science 2025-09-11 Beth Plale, Sai Navya Jyesta, Sachith Withana

Prompts Matter: Insights and Strategies for Prompt Engineering in Automated Software Traceability

Large Language Models (LLMs) have the potential to revolutionize automated traceability by overcoming the challenges faced by previous methods and introducing new possibilities. However, the optimal utilization of LLMs for automated…

Software Engineering · Computer Science 2023-08-02 Alberto D. Rodriguez, Katherine R. Dearstyne, Jane Cleland-Huang

Do Visual-Language Grid Maps Capture Latent Semantics?

Visual-language models (VLMs) have recently been introduced in robotic mapping using the latent representations, i.e., embeddings, of the VLMs to represent semantics in the map. They allow moving from a limited set of human-created labels…

Robotics · Computer Science 2025-09-23 Matti Pekkanen, Tsvetomila Mihaylova, Francesco Verdoja, Ville Kyrki

Benchmarking Large Language Models for Multi-Language Software Vulnerability Detection

Recent advancements in generative AI have led to the widespread adoption of large language models (LLMs) in software engineering, addressing numerous long-standing challenges. However, a comprehensive study examining the capabilities of…

Software Engineering · Computer Science 2025-03-04 Ting Zhang, Chengran Yang, Yindu Su, Martin Weyssow, Hung Nguyen, Tan Bui, Hong Jin Kang, Yikun Li, Eng Lieh Ouh, Lwin Khin Shar, David Lo

Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval

Contrastive learning has been successfully used for retrieval of semantically aligned sentences, but it often requires large batch sizes or careful engineering to work well. In this paper, we instead propose a generative model for learning…

Computation and Language · Computer Science 2023-06-06 John Wieting, Jonathan H. Clark, William W. Cohen, Graham Neubig, Taylor Berg-Kirkpatrick

R2Code: A Self-Reflective LLM Framework for Requirements-to-Code Traceability

Accurate requirement-to-code traceability is crucial for software maintenance. However, existing IR- and embedding-based methods are heavily dependent on lexical similarity, often yielding incomplete or inconsistent links across projects…

Software Engineering · Computer Science 2026-04-27 Yifei Wang, Jacky Keung, Xiaoxue Ma, Zhenyu Mao, Kehui Chen, Yishu Li

Towards General Continuous Memory for Vision-Language Models

Language models (LMs) and their extension, vision-language models (VLMs), have achieved remarkable performance across various tasks. However, they still struggle with complex reasoning tasks that require multimodal or multilingual…

Machine Learning · Computer Science 2025-07-09 Wenyi Wu, Zixuan Song, Kun Zhou, Yifei Shao, Zhiting Hu, Biwei Huang

Unsupervised Cross-lingual Word Embedding by Multilingual Neural Language Models

We propose an unsupervised method to obtain cross-lingual embeddings without any parallel data or pre-trained word embeddings. The proposed model, which we call multilingual neural language models, takes sentences of multiple languages as…

Computation and Language · Computer Science 2018-09-10 Takashi Wada, Tomoharu Iwata

Cross-lingual Transfer in Programming Languages: An Extensive Empirical Study

Large language models (LLMs) have achieved state-of-the-art performance in various software engineering tasks, including error detection, clone detection, and code translation, primarily leveraging high-resource programming languages like…

Computation and Language · Computer Science 2025-06-11 Razan Baltaji, Saurabh Pujar, Louis Mandel, Martin Hirzel, Luca Buratti, Lav Varshney

Are the Best Multilingual Document Embeddings simply Based on Sentence Embeddings?

Dense vector representations for textual data are crucial in modern NLP. Word embeddings and sentence embeddings estimated from raw texts are key in achieving state-of-the-art results in various tasks requiring semantic understanding.…

Computation and Language · Computer Science 2023-07-06 Sonal Sannigrahi, Josef van Genabith, Cristina Espana-Bonet

A Literature Study of Embeddings on Source Code

Natural language processing has improved tremendously after the success of word embedding techniques such as word2vec. Recently, the same idea has been applied on source code with encouraging results. In this survey, we aim to collect and…

Machine Learning · Computer Science 2019-04-08 Zimin Chen, Martin Monperrus

xVLM2Vec: Adapting LVLM-based embedding models to multilinguality using Self-Knowledge Distillation

In the current literature, most embedding models are based on the encoder-only transformer architecture to extract a dense and meaningful representation of the given input, which can be a text, an image, and more. With the recent advances…

Computation and Language · Computer Science 2025-03-18 Elio Musacchio, Lucia Siciliani, Pierpaolo Basile, Giovanni Semeraro

Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations

Sparse language vectors from linguistic typology databases and learned embeddings from tasks like multilingual machine translation have been investigated in isolation, without analysing how they could benefit from each other's language…

Computation and Language · Computer Science 2020-10-27 Arturo Oncevay, Barry Haddow, Alexandra Birch