Data Race Detection Using Large Language Models

Le Chen; Xianzhong Ding; Murali Emani; Tristan Vanderbruggen; Pei-hung Lin; Chuanhua Liao

doi:10.1145/3624062.3624088

Machine Learning 2023-11-28 v2 Computation and Language

Abstract

Large language models (LLMs) show significant promise as an alternative strategy for analyzing and optimizing high-performance computing programs, circumventing the need for resource-intensive manual tool creation. In this paper, we explore a novel LLM-based data race detection approach that combines prompt engineering and fine-tuning techniques. We create a dedicated dataset named DRB-ML, derived from DataRaceBench, with fine-grained labels showing the presence of data race pairs and their associated variables, line numbers, and read/write information. DRB-ML is then used to evaluate representative LLMs and fine-tune open-source ones. Our experiments show that LLMs can be a viable approach to data race detection. However, they still cannot compete with traditional data race detection tools when detailed information about the variable pairs causing data races is needed.
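
To make the kind of fine-grained label described above concrete, the following is a minimal sketch of what one labeled record might look like. The OpenMP snippet, the class names `Access` and `RaceLabel`, and every field name are hypothetical and chosen for illustration; they are not the actual DRB-ML schema.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical OpenMP snippet (not taken from DataRaceBench). Iteration i
# writes a[i + 1] while iteration i + 1 reads the same element, so running
# the loop in parallel creates a data race.
CODE = """\
#pragma omp parallel for
for (i = 0; i < n - 1; i++)
  a[i + 1] = a[i] + 1;
"""


@dataclass
class Access:
    """One memory access taking part in a race (illustrative fields)."""
    variable: str   # source expression of the access
    line: int       # 1-based line number within CODE
    operation: str  # "R" for read, "W" for write


@dataclass
class RaceLabel:
    """Label for one code sample (assumed layout, not the DRB-ML format)."""
    has_data_race: bool
    pairs: list  # list of (Access, Access) tuples


label = RaceLabel(
    has_data_race=True,
    pairs=[(Access("a[i + 1]", 3, "W"), Access("a[i]", 3, "R"))],
)

# Serialize the label so it could be stored next to the code sample.
print(json.dumps(asdict(label), indent=2))
```

A record of this shape carries the four pieces of information the abstract lists: whether a race is present, which variables form the pair, their line numbers, and whether each access is a read or a write.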

Keywords

large language model instruction tuning large language model evaluation

Cite

@article{arxiv.2308.07505,
  title  = {Data Race Detection Using Large Language Models},
  author = {Le Chen and Xianzhong Ding and Murali Emani and Tristan Vanderbruggen and Pei-hung Lin and Chuanhua Liao},
  journal= {arXiv preprint arXiv:2308.07505},
  year   = {2023}
}
