Authors:
Subash Banala
Addresses:
Department of Financial Services, Capgemini, Texas, United States of America. banala.subash@gmail.com
Integrating Generative AI into Site Reliability Engineering (SRE) marks a transformative step in maintaining and improving the reliability, scalability, and efficiency of complex systems. This paper explores the synergistic potential of Generative AI in SRE, focusing on three areas: predictive maintenance, automated incident response, and dynamic resource management. We adopt a mixed-methods approach that combines quantitative data from real-world case studies with qualitative insights from industry experts. The datasets comprise system logs, performance metrics, incident reports, and resource allocation records from organizations that have implemented AI-driven SRE solutions. Statistical analysis software and thematic analysis techniques were used to validate the findings and derive insights. The results show significant improvements in system uptime, a reduced mean time to recovery (MTTR), and optimized resource allocation. This study concludes that Generative AI is not merely an enhancement but a necessity for future-proofing SRE practices, and it offers a blueprint for successful integration. We also discuss the implications, limitations, and future research directions in this rapidly evolving field.
Keywords: Generative AI; Site Reliability Engineering (SRE); Predictive Maintenance; Automated Incident Response; Resource Management; Software Engineering; IT Operations; AI-driven SRE Solutions.
Received on: 12/10/2023, Revised on: 05/12/2023, Accepted on: 30/12/2023, Published on: 07/03/2024
DOI: 10.69888/FTSCL.2024.000178
FMDB Transactions on Sustainable Computer Letters, 2024 Vol. 2 No. 1, Pages: 14-25