Optimizing Latency-Sensitive AI Applications Through Edge-Cloud Collaboration
DOI: https://doi.org/10.69987/JACS.2023.30303

Keywords: Edge-cloud collaboration, Latency optimization, Adaptive workload partitioning, Resource allocation

Abstract
This paper presents a novel framework for optimizing latency-sensitive AI applications through intelligent edge-cloud collaboration. The proposed approach addresses critical challenges in deploying computationally intensive AI workloads across distributed computing environments while meeting stringent timing requirements. The framework introduces an adaptive workload partitioning mechanism that dynamically distributes computational tasks based on application-specific latency requirements, resource availability, and network conditions. A comprehensive resource allocation strategy optimizes utilization across the computing continuum through specialized scheduling algorithms that prioritize time-sensitive operations. Communication protocol optimizations reduce data transfer overhead through context-aware compression techniques and adaptive packet sizing. Experimental evaluation conducted across heterogeneous computing environments demonstrates significant performance improvements, achieving latency reductions of 50-62% compared to baseline approaches. Resource utilization patterns show improved edge resource efficiency (83.4%) alongside a 31.1% reduction in cloud resource consumption. Energy efficiency metrics indicate substantial improvements across application categories, with energy-per-transaction reductions ranging from 50.0% to 60.6%. The framework maintains performance standards under challenging operational conditions, including network congestion and limited resource availability, validating its applicability for real-world deployment scenarios. The results demonstrate that intelligent edge-cloud collaboration can significantly enhance performance for latency-sensitive AI applications while improving overall system efficiency.
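The adaptive partitioning mechanism summarized above can be sketched as a latency-budget decision rule: estimate end-to-end latency (transfer plus compute) for each target and keep work at the edge whenever the budget is met. This is a minimal illustrative sketch, not the paper's algorithm; all names, parameters, and numeric values below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """Hypothetical execution target with compute throughput and link bandwidth."""
    name: str
    ops_per_ms: float       # compute throughput at this node
    bandwidth_kb_ms: float  # effective link bandwidth to this node

def estimated_latency_ms(node: Node, task_ops: float, payload_kb: float) -> float:
    """End-to-end latency = data transfer time to the node + compute time on it."""
    return payload_kb / node.bandwidth_kb_ms + task_ops / node.ops_per_ms

def partition(task_ops: float, payload_kb: float, budget_ms: float,
              edge: Node, cloud: Node) -> str:
    """Keep the task at the edge if it meets the latency budget; otherwise
    send it to whichever target is faster under current conditions."""
    edge_lat = estimated_latency_ms(edge, task_ops, payload_kb)
    if edge_lat <= budget_ms:
        return edge.name
    cloud_lat = estimated_latency_ms(cloud, task_ops, payload_kb)
    return edge.name if edge_lat <= cloud_lat else cloud.name

# Illustrative conditions: a nearby edge node with modest compute, and a
# powerful cloud node behind a slower wide-area link.
edge = Node("edge", ops_per_ms=50.0, bandwidth_kb_ms=500.0)
cloud = Node("cloud", ops_per_ms=400.0, bandwidth_kb_ms=20.0)

# A small task fits the 30 ms budget at the edge, so it stays local.
print(partition(task_ops=1_000, payload_kb=200, budget_ms=30.0,
                edge=edge, cloud=cloud))  # → edge
```

A fuller version of this rule would re-estimate bandwidth and node load continuously, which is what makes the partitioning adaptive to changing network conditions.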