Data-driven active learning approaches for accelerating materials discovery
Abstract
Materials discovery is a cornerstone of modern technological advancement, yet it remains constrained by traditional trial-and-error paradigms and the inherent bias of human intuition. Artificial intelligence (AI) has emerged as a transformative tool in materials science by effectively modeling structure-property relationships. Although substantial effort has gone into enhancing model expressiveness, data efficiency remains an equally critical challenge, given the limited availability of experimental and computational resources. Active learning (AL), a data-driven machine learning paradigm, has shown great promise for discovering novel materials and efficiently navigating vast materials spaces. In this review, we trace the evolution of sampling strategy design in AL, from Bayesian optimization to advanced deep learning-based strategies. We then highlight how AL enhances data efficiency across various data regimes, ranging from task-specific settings with limited data to the development of general-purpose datasets and large-scale models. We further provide a systematic overview of AL applications throughout the materials research pipeline, including computational simulation, composition and structural design, process optimization, and self-driving laboratory systems. Finally, we identify key challenges and outline future perspectives for AL in materials discovery.
Cite
@article{arxiv.2601.06971,
title = {Data-driven active learning approaches for accelerating materials discovery},
author = {Jiaxin Chen and Tianjiao Wan and Hui Geng and Liang Xiong and Guohong Wang and Yihan Zhao and Longxiang Deng and Zijian Gao and Susu Fang and Zheng Luo and Huaimin Wang and Shanshan Wang and Kele Xu},
journal = {arXiv preprint arXiv:2601.06971},
year = {2026}
}