Empirical Evaluation of Multi-Source Monitoring Signal Effectiveness and Lead Time for Performance Degradation Prediction in Kubernetes-Based Microservices

Hao Cao; Liqun Long

doi:10.69987/JACS.2026.60402

Authors

Hao Cao, Master of Computer Engineering, Stevens Institute of Technology, NJ, USA
Liqun Long, Master of Business Administration (MBA), Hong Kong Baptist University, Hong Kong SAR, China

Keywords:

microservice performance degradation, Kubernetes anomaly detection, multi-source monitoring signals, cloud-native early warning

Abstract

Cloud-native microservice architectures deployed on Kubernetes have become the backbone of mission-critical enterprise systems, including financial transaction platforms and real-time data processing pipelines. Detecting performance degradation before it escalates into user-facing incidents remains an open challenge, particularly when Horizontal Pod Autoscaler (HPA) dynamics introduce metric volatility that obscures genuine anomaly signals. This paper empirically evaluates how effectively, and with how much lead time, multi-source monitoring signals predict performance degradation in Kubernetes-based microservice deployments. The signals span infrastructure-level resource metrics, application-level latency indicators, distributed trace features, and structured log patterns. Through controlled fault injection experiments on the TrainTicket benchmark system running on Amazon EKS, we systematically measure the predictive lead time, detection precision, and false alarm rate of 14 distinct monitoring signals under six degradation scenarios, each with and without active HPA. Our results reveal that application-level signals, specifically database query latency percentile shifts and cache hit rate deviations, provide 2.3 to 4.7 minutes of advance warning before infrastructure metrics register anomalies. The findings also quantify a 23.6% increase in false positive rates attributable to HPA-induced pod scaling events, and we propose signal filtering heuristics that reduce this noise by 67.2%. These results carry direct implications for financial infrastructure resilience and SLA assurance in cloud-native environments.