QoE-Driven Reinforcement Learning for Joint Bitrate, Rebuffering, and TTFF Optimization in HLS/DASH

Authors

  • Eric Wang, Computer Science, UCLA, CA, USA
  • Heyu Wang, Electrical and Computer Engineering, Rice University, TX, USA

DOI:

https://doi.org/10.69987/JACS.2023.30204

Keywords:

adaptive bitrate streaming, QoE, reinforcement learning, HLS, MPEG-DASH

Abstract

HTTP adaptive streaming over HLS/DASH must balance delivered visual quality against playback interruptions, bitrate variation, and startup delay. In many deployed players, time-to-first-frame (TTFF) is still handled through startup heuristics rather than being optimized jointly with steady-state adaptive bitrate (ABR) decisions. This paper studies a trace-driven controller family that combines a PPO+GAE actor-critic policy with two deployment-oriented constraints: a safety supervisor that caps bitrate by an online throughput estimate and an optional startup cap that operates only before playback begins. We evaluate the controller family on 40 mobile HSDPA throughput traces from MMSys’13 using a simulator with 2 s segments, a 6-level bitrate ladder, and a unified QoE metric that rewards bitrate and penalizes rebuffering, bitrate changes, and TTFF. In the four-way controller comparison on the held-out 8-trace test split, the 750 kbps startup-cap operating point (SafeRL-TTFF-750) achieves the highest mean QoE (136.125 ± 58.994), improves mean TTFF by 16.6% relative to the throughput-based RB baseline, and keeps mean rebuffering at 0.228 ± 0.556 s. On the full 40-trace set, SafeRL-TTFF-750 and RB are effectively tied in mean QoE, with the former trading slightly higher bitrate and lower TTFF for higher rebuffering. An ablation study shows that the safety supervisor is essential, and that stricter startup caps can reduce TTFF further with only small changes in scalar QoE. The results support a practical conclusion: learned ABR can be useful on mobile traces when RL decisions are wrapped in transparent safety and startup controls.
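The QoE metric and the two deployment constraints described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract states only that the ladder has six levels and that the metric rewards bitrate while penalizing rebuffering, bitrate changes, and TTFF, so the ladder values, the penalty weights, and the names `LADDER_KBPS`, `QoEWeights`, `session_qoe`, and `supervise` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical 6-level ladder in kbps. The abstract gives only the level count.
LADDER_KBPS = (300, 750, 1200, 1850, 2850, 4300)


@dataclass(frozen=True)
class QoEWeights:
    """Assumed penalty weights; the abstract does not state the real values."""
    rebuffer: float = 4.3  # per second of stall
    smooth: float = 1.0    # per Mbps of bitrate change between segments
    ttff: float = 1.0      # per second of startup delay


def session_qoe(bitrates_kbps: Sequence[float], rebuffer_s: float,
                ttff_s: float, w: QoEWeights = QoEWeights()) -> float:
    """Reward delivered bitrate; penalize stalls, switches, and TTFF."""
    mbps = [b / 1000.0 for b in bitrates_kbps]
    quality = sum(mbps)
    switches = sum(abs(b - a) for a, b in zip(mbps, mbps[1:]))
    return (quality
            - w.rebuffer * rebuffer_s
            - w.smooth * switches
            - w.ttff * ttff_s)


def supervise(policy_level: int, throughput_est_kbps: float, playing: bool,
              startup_cap_kbps: Optional[float] = None) -> int:
    """Clamp the policy's ladder index to what the caps allow.

    The safety cap always applies. The startup cap applies only before
    playback begins, mirroring the description in the abstract.
    """
    cap = throughput_est_kbps
    if not playing and startup_cap_kbps is not None:
        cap = min(cap, startup_cap_kbps)
    # Highest ladder index whose bitrate fits under the cap; if none fits,
    # fall back to the lowest level.
    allowed = max((i for i, b in enumerate(LADDER_KBPS) if b <= cap), default=0)
    return min(policy_level, allowed)
```

With these assumed values, `supervise(5, 2000.0, playing=False, startup_cap_kbps=750)` limits the first segment to the 750 kbps level, while the same call with `playing=True` allows up to the 1850 kbps level. A supervisor of this shape can only lower the policy's choice, never raise it, which keeps the constraint easy to inspect.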

Published

2023-02-12

How to Cite

Eric Wang, & Heyu Wang. (2023). QoE-Driven Reinforcement Learning for Joint Bitrate, Rebuffering, and TTFF Optimization in HLS/DASH. Journal of Advanced Computing Systems, 3(2), 50-59. https://doi.org/10.69987/JACS.2023.30204
