TIGER: A Generating-Then-Ranking Framework for Practical Python Type Inference

Chong Wang; Jian Zhang; Yiling Lou; Mingwei Liu; Weisong Sun; Yang Liu; Xin Peng

Software Engineering 2024-08-14 v3

Abstract

Python's dynamic typing system offers flexibility and expressiveness but can lead to type-related errors, prompting the need for automated type inference to enhance type hinting. While existing learning-based approaches show promising inference accuracy, they struggle with practical challenges in comprehensively handling various types, including complex generic types and (unseen) user-defined types. In this paper, we introduce TIGER, a two-stage generating-then-ranking (GTR) framework designed to effectively handle Python's diverse type categories. TIGER fine-tunes pre-trained code models to obtain a generative model, trained with a span masking objective, and a similarity model, trained with a contrastive objective. This allows TIGER to generate a wide range of type candidates, including complex generics, in the generating stage, and to rank them accurately together with user-defined types in the ranking stage. Our evaluation on the ManyTypes4Py dataset shows TIGER's advantage over existing methods across type categories, notably improving Top-5 Exact Match accuracy on user-defined and unseen types by 11.2% and 20.1%, respectively. Moreover, the experimental results demonstrate TIGER's superior performance and efficiency and underscore the significance of both its generating and ranking stages in enhancing automated type inference.
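
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`generate_candidates`, `rank_types`, `embed`) are hypothetical, the generating stage is a hard-coded stub standing in for a fine-tuned generative model, and the similarity score uses toy character-bigram vectors in place of a contrastively trained similarity model. It only shows how generated candidates and user-defined types might be pooled and ranked against the code context.

```python
from collections import Counter
from math import sqrt


def embed(text: str) -> Counter:
    """Toy stand-in for a learned encoder: lowercase character-bigram counts."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def generate_candidates(context: str) -> list[str]:
    """Hypothetical generating stage: a fixed list instead of model output."""
    return ["str", "int", "List[str]", "Dict[str, int]"]


def rank_types(context: str, generated: list[str],
               user_defined: list[str], top_k: int = 5) -> list[str]:
    """Ranking stage: pool both candidate sources, then sort by similarity."""
    # Deduplicate while preserving order.
    candidates = list(dict.fromkeys(generated + user_defined))
    ctx = embed(context)
    ranked = sorted(candidates, key=lambda t: cosine(ctx, embed(t)), reverse=True)
    return ranked[:top_k]


# Assumed example: a variable whose type is a project-specific class.
context = "user_profile = load_user_profile(path)"
top = rank_types(context, generate_candidates(context),
                 user_defined=["UserProfile", "Config"], top_k=3)
print(top)
```

In this toy setup the user-defined type `UserProfile` ranks first because its name overlaps with the context; a learned similarity model would take the place of that surface-level overlap.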

Keywords

text generation; software library; generative adversarial network

Cite

@article{arxiv.2407.02095,
  title  = {TIGER: A Generating-Then-Ranking Framework for Practical Python Type Inference},
  author = {Chong Wang and Jian Zhang and Yiling Lou and Mingwei Liu and Weisong Sun and Yang Liu and Xin Peng},
  journal= {arXiv preprint arXiv:2407.02095},
  year   = {2024}
}

Comments

Accepted by ICSE'25
