AIDev: Studying AI Coding Agents on GitHub

Software Engineering 2026-02-11 v1 Artificial Intelligence

Abstract

AI coding agents are rapidly transforming software engineering by performing tasks such as feature development, debugging, and testing. Despite their growing impact, the research community lacks a comprehensive dataset capturing how these agents are used in real-world projects. To address this gap, we introduce AIDev, a large-scale dataset focused on agent-authored pull requests (Agentic-PRs) in real-world GitHub repositories. AIDev aggregates 932,791 Agentic-PRs produced by five agents: OpenAI Codex, Devin, GitHub Copilot, Cursor, and Claude Code. These PRs span 116,211 repositories and involve 72,189 developers. In addition, AIDev includes a curated subset of 33,596 Agentic-PRs from 2,807 repositories with over 100 stars, providing further information such as comments, reviews, commits, and related issues. This dataset offers a foundation for future research on AI adoption, developer productivity, and human-AI collaboration in the new era of software engineering.

Keywords: AI Agent, Agentic AI, Coding Agent, Agentic Coding, Agentic Software Engineering, Agentic Engineering

Cite

@article{arxiv.2602.09185,
  title  = {AIDev: Studying AI Coding Agents on GitHub},
  author = {Hao Li and Haoxiang Zhang and Ahmed E. Hassan},
  journal= {arXiv preprint arXiv:2602.09185},
  year   = {2026}
}