Multi-Objective Deep Reinforcement Learning for Carbon-Aware Spatiotemporal Workload Scheduling in Geo-Distributed Data Centers
DOI: https://doi.org/10.69987/JACS.2025.51002

Keywords: carbon-aware scheduling, multi-objective reinforcement learning, geo-distributed data centers, spatiotemporal workload shifting, Pareto optimization

Abstract
The rapid expansion of artificial intelligence training and cloud computing workloads has transformed United States data centers into major contributors to national carbon emissions, consuming between 1% and 1.3% of total national electricity output with projections indicating sustained double-digit annual growth. A fundamental yet underexploited characteristic of the US power grid is the spatiotemporal heterogeneity of carbon intensity: marginal emission rates vary by a factor of 5–10 across the seven major independent system operator (ISO) regions and exhibit pronounced diurnal and seasonal oscillations driven by renewable penetration patterns. Existing scheduling frameworks optimize for throughput and operational cost while treating carbon emissions as an externality, leaving substantial decarbonization potential untapped. This paper presents a multi-objective deep reinforcement learning (MO-DRL) framework that jointly exploits temporal deferral and geographic migration to minimize carbon emissions, job completion latency, and operational cost for delay-tolerant batch workloads across geo-distributed data centers. By formulating the scheduling problem as a multi-objective Markov decision process (MDP) and training a Pareto-conditioned policy network using multi-objective proximal policy optimization (MO-PPO), the proposed approach learns a rich set of Pareto-optimal scheduling strategies that enable operators to navigate the three-way tradeoff without rerunning optimization. Evaluated against real carbon intensity traces from six US ISO regions and Google/Alibaba cluster workload datasets, the framework achieves up to 41.3% carbon reduction compared to carbon-agnostic baselines while maintaining 95th-percentile job completion time within a 15% overhead bound.
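The abstract's central mechanism, a single Pareto-conditioned policy that covers the three-way tradeoff without rerunning optimization, rests on conditioning the policy on a preference vector that scalarizes the multi-objective reward. The following is a minimal sketch of that scalarization step under assumed conventions (penalty-style rewards, weighted-sum scalarization); the function name `scalarize` and all numeric values are illustrative, not from the paper.

```python
# Illustrative sketch of preference-conditioned scalarization, as used in
# Pareto-conditioned multi-objective RL: one policy is trained across many
# preference vectors w, each collapsing the 3-objective reward
# (carbon, latency, cost) into a single scalar return signal.

def scalarize(reward_vec, weights):
    """Weighted-sum scalarization of a multi-objective reward.

    reward_vec: (carbon_penalty, latency_penalty, cost_penalty), each <= 0
    weights:    preference vector on the probability simplex
                (non-negative entries summing to 1)
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must lie on the simplex"
    return sum(w * r for w, r in zip(weights, reward_vec))

# Same transition, two operator preferences: a carbon-first weighting
# penalizes this state far more heavily than a latency-first one.
reward = (-5.0, -2.0, -1.0)                          # (carbon, latency, cost)
carbon_first  = scalarize(reward, (0.8, 0.1, 0.1))   # ~ -4.3
latency_first = scalarize(reward, (0.1, 0.8, 0.1))   # ~ -2.2
```

At deployment, sweeping the weight vector over the simplex recovers the learned set of Pareto-optimal scheduling strategies from the one conditioned network, which is what lets an operator move along the tradeoff surface without retraining.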
