Engineering LLM Powered Multi-agent Framework for Autonomous CloudOps
Abstract
Cloud Operations (CloudOps) is a rapidly growing field focused on the automated management and optimization of cloud infrastructure, which is essential for organizations navigating increasingly complex cloud environments. MontyCloud Inc. is one of the major companies in the CloudOps domain and uses autonomous bots to manage cloud compliance, security, and continuous operations. To make the platform more accessible and effective for customers, we adopted GenAI. Developing a GenAI-based solution for autonomous CloudOps on top of the existing MontyCloud system presented several challenges, including (i) diverse data sources; (ii) orchestration of multiple processes; and (iii) handling complex workflows to automate routine tasks. To address these challenges, we developed MOYA, a multi-agent framework that leverages GenAI and balances autonomy with the necessary human control. The framework integrates various internal and external systems and is optimized for factors such as task orchestration, security, and error mitigation, while using Retrieval Augmented Generation (RAG) to produce accurate, reliable, and relevant insights. Evaluations of our multi-agent system, conducted both with practitioners and with automated checks, demonstrate enhanced accuracy, responsiveness, and effectiveness over non-agentic approaches across complex workflows.
Cite
@article{arxiv.2501.08243,
title = {Engineering LLM Powered Multi-agent Framework for Autonomous CloudOps},
author = {Kannan Parthasarathy and Karthik Vaidhyanathan and Rudra Dhar and Venkat Krishnamachari and Basil Muhammed and Adyansh Kakran and Sreemaee Akshathala and Shrikara Arun and Sumant Dubey and Mohan Veerubhotla and Amey Karan},
journal = {arXiv preprint arXiv:2501.08243},
year = {2025}
}
Comments
The paper has been accepted as a full paper at CAIN 2025 (https://conf.researchr.org/home/cain-2025), co-located with ICSE 2025 (https://conf.researchr.org/home/icse-2025). The paper was submitted to CAIN for review on 9 November 2024.