Related papers: How do Machine Learning Models Change?

Analyzing the Evolution and Maintenance of ML Models on Hugging Face

Hugging Face (HF) has established itself as a crucial platform for the development and sharing of machine learning (ML) models. This repository mining study, which delves into more than 380,000 models using data gathered via the HF Hub API,…

Software Engineering · Computer Science 2024-02-16 Joel Castaño , Silverio Martínez-Fernández , Xavier Franch , Justus Bogner

Lessons Learned from Mining the Hugging Face Repository

The rapidly evolving fields of Machine Learning (ML) and Artificial Intelligence have witnessed the emergence of platforms like Hugging Face (HF) as central hubs for model development and sharing. This experience report synthesizes insights…

Software Engineering · Computer Science 2024-02-13 Joel Castaño , Silverio Martínez-Fernández , Xavier Franch

Anatomy of a Machine Learning Ecosystem: 2 Million Models on Hugging Face

Many have observed that the development and deployment of generative machine learning (ML) and artificial intelligence (AI) models follow a distinctive pattern in which pre-trained models are adapted and fine-tuned for specific downstream…

Social and Information Networks · Computer Science 2025-08-12 Benjamin Laufer , Hamidah Oderinwale , Jon Kleinberg

An Empirical Analysis of Machine Learning Model and Dataset Documentation, Supply Chain, and Licensing Challenges on Hugging Face

The last decade has seen widespread adoption of Machine Learning (ML) components in software systems. This has occurred in nearly every domain, from natural language processing to computer vision. These ML components range from relatively…

Software Engineering · Computer Science 2025-09-30 Trevor Stalnaker , Nathan Wintersgill , Oscar Chaparro , Laura A. Heymann , Massimiliano Di Penta , Daniel M German , Denys Poshyvanyk

Exploring the Carbon Footprint of Hugging Face's ML Models: A Repository Mining Study

The rise of machine learning (ML) systems has exacerbated their carbon footprint due to increased capabilities and model sizes. However, there is scarce knowledge on how the carbon footprint of ML models is actually measured, reported, and…

Machine Learning · Computer Science 2023-12-01 Joel Castaño , Silverio Martínez-Fernández , Xavier Franch , Justus Bogner

The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub

Open model developers have emerged as key actors in the political economy of artificial intelligence (AI), but we still have a limited understanding of collaborative practices in the open AI ecosystem. This paper responds to this gap with a…

Software Engineering · Computer Science 2024-06-25 Cailean Osborne , Jennifer Ding , Hannah Rose Kirk

Towards Semantic Versioning of Open Pre-trained Language Model Releases on Hugging Face

The proliferation of open Pre-trained Language Models (PTLMs) on model registry platforms like Hugging Face (HF) presents both opportunities and challenges for companies building products around them. Similar to traditional software…

Software Engineering · Computer Science 2025-02-20 Adekunle Ajibode , Abdul Ali Bangash , Filipe Roseiro Cogo , Bram Adams , Ahmed E. Hassan

On the synchronization between Hugging Face pre-trained language models and their upstream GitHub repository

Pre-trained language models (PTLMs) have transformed natural language processing (NLP), enabling major advances in tasks such as text generation and translation. Similar to software package management, PTLMs are developed using code and…

Software Engineering · Computer Science 2026-01-27 Adekunle Ajibode , Abdul Ali Bangash , Oussama Ben Sghaier , Bram Adams , Ahmed E. Hassan

An Empirical Framework for Evaluating Semantic Preservation Using Hugging Face

As machine learning (ML) becomes an integral part of high-autonomy systems, it is critical to ensure the trustworthiness of learning-enabled software systems (LESS). Yet, the nondeterministic and run-time-defined semantics of ML complicate…

Software Engineering · Computer Science 2025-12-10 Nan Jia , Anita Raja , Raffi Khatchadourian

Model Hubs and Beyond: Analyzing Model Popularity, Performance, and Documentation

With the massive surge in ML models on platforms like Hugging Face, users often lose track and struggle to choose the best model for their downstream tasks, frequently relying on model popularity indicated by download counts, likes, or…

Computation and Language · Computer Science 2025-04-08 Pritam Kadasi , Sriman Reddy Kondam , Srivathsa Vamsi Chaturvedula , Rudranshu Sen , Agnish Saha , Soumavo Sikdar , Sayani Sarkar , Suhani Mittal , Rohit Jindal , Mayank Singh

Identifying Disruptive Models in the Open-Source LLM Community

The rapid growth of open-source large language models (LLMs) has created a complex ecosystem of model inheritance and reuse. However, existing research has focused mainly on descriptive analyses of lineage evolution, with limited attention…

Social and Information Networks · Computer Science 2026-04-14 Xiaoting Wei , Lele Kang , Xuelian Pan , Jiannan Yang

Towards a Change Taxonomy for Machine Learning Systems

Machine Learning (ML) research publications commonly provide open-source implementations on GitHub, allowing their audience to replicate, validate, or even extend machine learning algorithms, data sets, and metadata. However, thus far…

Software Engineering · Computer Science 2022-12-14 Aaditya Bhatia , Ellis E. Eghan , Manel Grichi , William G. Cavanagh , Zhen Ming Jiang , Bram Adams

Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem

Since 2019, the Hugging Face Model Hub has been the primary global platform for sharing open weight AI models. By releasing a dataset of the complete history of weekly model downloads (June 2020-August 2025) alongside model metadata, we…

Computers and Society · Computer Science 2025-12-04 Shayne Longpre , Christopher Akiki , Campbell Lund , Atharva Kulkarni , Emily Chen , Irene Solaiman , Avijit Ghosh , Yacine Jernite , Lucie-Aimée Kaffee

The State of Documentation Practices of Third-party Machine Learning Models and Datasets

Model stores offer third-party ML models and datasets for easy project integration, minimizing coding efforts. One might hope to find detailed specifications of these models and datasets in the documentation, leveraging documentation…

Software Engineering · Computer Science 2024-06-19 Ernesto Lang Oreamuno , Rohan Faiyaz Khan , Abdul Ali Bangash , Catherine Stinson , Bram Adams

New Tools are Needed for Tracking Adherence to AI Model Behavioral Use Clauses

Foundation models have had a transformative impact on AI. A combination of large investments in research and development, growing sources of digital data for training, and architectures that scale with data and compute has led to models…

Computers and Society · Computer Science 2025-05-29 Daniel McDuff , Tim Korjakow , Kevin Klyman , Danish Contractor

Navigating Dataset Documentations in AI: A Large-Scale Analysis of Dataset Cards on Hugging Face

Advances in machine learning are closely tied to the creation of datasets. While data documentation is widely recognized as essential to the reliability, reproducibility, and transparency of ML, we lack a systematic empirical understanding…

Machine Learning · Computer Science 2024-01-26 Xinyu Yang , Weixin Liang , James Zou

On the Suitability of Hugging Face Hub for Empirical Studies

Background. The development of empirical studies in software engineering mainly relies on the data available on code hosting platforms, being GitHub the most representative. Nevertheless, in the last years, the emergence of Machine Learning…

Software Engineering · Computer Science 2023-07-28 Adem Ait , Javier Luis Cánovas Izquierdo , Jordi Cabot

A Large-Scale Study on the Development and Issues of Multi-Agent AI Systems

The rapid emergence of multi-agent AI systems (MAS), including LangChain, CrewAI, and AutoGen, has shaped how large language model (LLM) applications are developed and orchestrated. However, little is known about how these systems evolve…

Software Engineering · Computer Science 2026-01-13 Daniel Liu , Krishna Upadhyay , Vinaik Chhetri , A. B. Siddique , Umar Farooq

Variability-Aware Machine Learning Model Selection: Feature Modeling, Instantiation, and Experimental Case Study

The emergence of machine learning (ML) has led to a transformative shift in software techniques and guidelines for building software applications that support data analysis process activities such as data ingestion, modeling, and…

Software Engineering · Computer Science 2025-01-03 Cristina Tavares , Nathalia Nascimento , Paulo Alencar , Donald Cowan

An Empirical Study of Perceptions of General LLMs and Multimodal LLMs on Hugging Face

Large language models (LLMs) have rapidly evolved from general-purpose systems to multimodal models capable of processing text, images, and audio. As both general-purpose LLMs (GLLMs) and multimodal LLMs (MLLMs) gain widespread adoption,…

Software Engineering · Computer Science 2026-04-08 Yujian Liu , Xiao Yu , Jacky Keung , Xing Hu , Xin Xia , Xiaoxue Ma