Code Digital Twin: A Knowledge Infrastructure for AI-Assisted Complex Software Development

Xin Peng; Chong Wang

Software Engineering 2026-02-03 v4

Abstract

Recent advances in AI coding tools powered by large language models (LLMs) have shown strong capabilities in software engineering tasks, raising expectations of major productivity gains. Tools such as Cursor and Claude Code have popularized "vibe coding" (where developers steer development through high-level intent), commonly relying on context engineering and Retrieval-Augmented Generation (RAG) to ground generation in a codebase. However, these paradigms struggle in ultra-complex enterprise systems, where software evolves incrementally under pervasive design constraints and depends on tacit knowledge such as responsibilities, intent, and decision rationales distributed across code, configurations, discussions, and version history. In this environment, context engineering faces a fundamental barrier: the required context is scattered across artifacts and entangled across time, beyond the capacity of LLMs to reliably capture, prioritize, and fuse evidence into correct and trustworthy decisions, even as context windows grow. To bridge this gap, we propose the Code Digital Twin, a persistent and evolving knowledge infrastructure built on the codebase. It separates long-term knowledge engineering from task-time context engineering and serves as a backend "context engine" for AI coding assistants. The Code Digital Twin models both the physical and conceptual layers of software and co-evolves with the system. By integrating hybrid knowledge representations, multi-stage extraction pipelines, incremental updates, AI-empowered applications, and human-in-the-loop feedback, it transforms fragmented knowledge into explicit and actionable representations, providing a roadmap toward sustainable and resilient development and evolution of ultra-complex systems.

Keywords

digital twin; code generation; artificial intelligence

Cite

@article{arxiv.2503.07967,
  title  = {Code Digital Twin: A Knowledge Infrastructure for AI-Assisted Complex Software Development},
  author = {Xin Peng and Chong Wang},
  journal = {arXiv preprint arXiv:2503.07967},
  year   = {2026}
}

Comments

A vision paper that will be continuously updated
