We propose a safe curriculum generation method (SCG) that reduces safety constraint violations during training while boosting the learning speed of constrained reinforcement learning agents.
Curriculum learning aims to accelerate reinforcement learning (RL) by generating curricula, i.e., sequences of tasks of increasing difficulty. Although existing curriculum generation approaches improve sample efficiency, they overlook safety-critical settings where an RL agent must adhere to safety constraints. Thus, these approaches may generate tasks that cause RL agents to violate safety constraints during training and behave suboptimally afterward. We develop a safe curriculum generation approach (SCG) that aligns the objectives of constrained RL and curriculum learning: improving safety during training and boosting sample efficiency. SCG generates sequences of tasks in which the RL agent can be both safe and performant by initially prioritizing tasks with minimal safety violations over high-reward ones. We empirically show that, compared to state-of-the-art curriculum learning approaches and their naively modified safe versions, SCG achieves optimal performance and the fewest constraint violations during training.
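To make the safety-prioritizing idea concrete, the sketch below scores candidate tasks by an estimated constraint cost and expected return, and shifts weight from low-cost tasks toward high-reward tasks as training progresses. This is a minimal illustrative sketch, not the paper's actual algorithm: the function names, the penalty-based scoring rule, and the linear progress schedule are all assumptions introduced here for clarity.

import numpy as np

def select_next_task(candidate_tasks, est_cost, est_return, cost_budget, progress):
    """Hypothetical safety-prioritizing task selection (illustrative only).

    candidate_tasks: list of task parameters (e.g., goal/obstacle configurations)
    est_cost, est_return: callables estimating constraint cost and return
                          of the current policy on a task
    cost_budget: constraint threshold the agent should not exceed
    progress: training progress in [0, 1]; 0 = start, 1 = end of training
    """
    costs = np.array([est_cost(t) for t in candidate_tasks])
    returns = np.array([est_return(t) for t in candidate_tasks])

    # Penalize only the portion of estimated cost that exceeds the budget.
    violation = np.maximum(costs - cost_budget, 0.0)

    # Early in training, the score is dominated by the safety penalty;
    # later, weight shifts toward expected return on candidate tasks.
    safety_weight = 1.0 - progress
    scores = (1.0 - safety_weight) * returns - safety_weight * violation

    return candidate_tasks[int(np.argmax(scores))]

Under this sketch, a task with high estimated return but a cost above the budget is avoided early on (progress near 0) and only becomes attractive once the agent has learned to act safely; how the actual SCG method trades off these quantities is specified in the paper.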
State-of-the-art curriculum learning methods focus on the standard multi-task RL problem, i.e., maximizing expected return on tasks drawn from a target distribution. They overlook constrained RL and therefore cannot distinguish unsafe behaviors. We call this the misalignment between the objectives of curriculum learning and constrained RL. As a result, these methods propose tasks that, while yielding high rewards, also incur high costs, leading to constraint violations.
We study three constrained RL domains that showcase this misalignment: safety-maze, safety-goal, and safety-push. Safety-goal and safety-push involve navigation tasks with realistic sensory observations. TLDR: SCG consistently yields optimal policies, achieving zero cost and the highest success rate in all environments, and it incurs the fewest constraint violations among the methods that learn optimal policies.
@inproceedings{koprulu2025safetyprioritizing,
  title={Safety-Prioritizing Curricula for Constrained Reinforcement Learning},
  author={Cevahir Koprulu and Thiago D. Sim{\~a}o and Nils Jansen and Ufuk Topcu},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=f3QR9TEERH}
}