Simulating User Diversity in Task-Oriented Dialogue Systems using Large Language Models

Computation and Language 2025-02-19 v1

Abstract

In this study, we explore the use of Large Language Models (LLMs) to generate synthetic users and simulate their conversations with a task-oriented dialogue system, and we present detailed results and analysis. We propose a novel, comprehensive user simulation technique that uses LLMs to create diverse user profiles, set goals, engage in multi-turn dialogues, and evaluate conversation success. We employ two proprietary LLMs, GPT-4o and GPT-o1 (Achiam et al., 2023), to generate a heterogeneous base of user profiles characterized by varied demographics, multiple user goals, different conversational styles, initial knowledge levels, interests, and conversational objectives. We analyze the LLM-generated user profiles in detail to assess their diversity, consistency, and potential biases. We find that GPT-o1 generates a more heterogeneous user distribution across most user attributes, while GPT-4o produces more skewed attribute distributions. The generated user profiles are then used to simulate dialogue sessions with a task-oriented dialogue system.
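
One way to make the heterogeneous-versus-skewed comparison concrete is to compute a normalized Shannon entropy for each profile attribute. The sketch below is illustrative only: the `UserProfile` fields, the function names, the sample values, and the choice of entropy as a diversity measure are all assumptions, not the schema or metric reported by the authors.

```python
from collections import Counter
from dataclasses import dataclass, fields
from math import log2


@dataclass(frozen=True)
class UserProfile:
    # Hypothetical fields, loosely based on attributes named in the abstract.
    age_group: str
    conversational_style: str
    knowledge_level: str


def normalized_entropy(values):
    """Shannon entropy of the observed categories, scaled to [0, 1]."""
    counts = Counter(values)
    if len(counts) <= 1:
        return 0.0  # empty input or a single category: no diversity
    total = sum(counts.values())
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    # Divide by the maximum entropy for this many observed categories.
    return entropy / log2(len(counts))


def attribute_diversity(profiles):
    """Map each profile attribute name to its normalized entropy."""
    return {
        f.name: normalized_entropy([getattr(p, f.name) for p in profiles])
        for f in fields(UserProfile)
    }


# Made-up sample profiles, for illustration only.
profiles = [
    UserProfile("18-29", "formal", "novice"),
    UserProfile("30-49", "casual", "novice"),
    UserProfile("50-64", "formal", "novice"),
    UserProfile("65+", "casual", "expert"),
]
scores = attribute_diversity(profiles)
```

In this sketch, a score near 1 means the attribute values are spread evenly over the categories that appear, and a lower score indicates a skewed attribute. Because the normalization uses only observed categories, a category that a model never generates does not lower the score; comparing against a fixed category list would be needed to capture that case.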

Cite

@article{arxiv.2502.12813,
  title  = {Simulating User Diversity in Task-Oriented Dialogue Systems using Large Language Models},
  author = {Adnan Ahmad and Stefan Hillmann and Sebastian Möller},
  journal= {arXiv preprint arXiv:2502.12813},
  year   = {2025}
}