We develop an uncertainty-aware, offline model-based reinforcement learning approach based on neural stochastic differential equations that outperforms the state of the art on continuous control benchmarks, particularly on low-quality datasets.
Offline model-based reinforcement learning (RL) offers a principled approach to using a learned dynamics model as a simulator to optimize a control policy. Despite the near-optimal performance of existing approaches on benchmarks with high-quality datasets, most struggle on datasets with low state-action space coverage or suboptimal demonstrations. We develop a novel offline model-based RL approach that particularly shines in low-quality data regimes while maintaining competitive performance on high-quality datasets. Neural Stochastic Differential Equations for Uncertainty-Aware Offline RL (NUNO) learns a dynamics model as neural stochastic differential equations (SDEs), whose drift term can leverage prior physics knowledge as an inductive bias. In parallel, the diffusion term provides distance-aware estimates of model uncertainty by matching the dynamics' underlying stochasticity near the training data regime while providing high but bounded estimates beyond it. To address the so-called model exploitation problem in offline model-based RL, NUNO builds on existing studies by penalizing and adaptively truncating the neural SDE's rollouts according to these uncertainty estimates. Our empirical results on the D4RL and NeoRL MuJoCo benchmarks show that NUNO outperforms state-of-the-art methods on low-quality datasets by up to 93% while matching their performance, or surpassing it by up to 55%, on some high-quality counterparts.
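To make the above concrete, here is a minimal, illustrative PyTorch sketch of a neural SDE dynamics model with a physics-prior drift, a bounded diffusion term, and an uncertainty-penalized, adaptively truncated rollout. The class names, hyperparameters, penalty form, and truncation rule are our assumptions for exposition, not the authors' implementation.

```python
# Illustrative sketch only: names, hyperparameters, and the truncation rule
# are assumptions for exposition, not NUNO's actual implementation.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.SiLU(),
        nn.Linear(hidden, hidden), nn.SiLU(),
        nn.Linear(hidden, out_dim),
    )

class NeuralSDEDynamics(nn.Module):
    """Dynamics model dx = f(x, a) dt + g(x, a) dW.

    The drift f adds a learned residual to an optional physics prior;
    the diffusion g is kept bounded and doubles as the uncertainty signal.
    """
    def __init__(self, state_dim, action_dim, physics_prior=None, sigma_max=1.0):
        super().__init__()
        self.residual_drift = mlp(state_dim + action_dim, state_dim)
        self.diffusion_net = mlp(state_dim + action_dim, state_dim)
        self.physics_prior = physics_prior  # callable (x, a) -> dx/dt, or None
        self.sigma_max = sigma_max

    def drift(self, x, a):
        prior = self.physics_prior(x, a) if self.physics_prior is not None else 0.0
        return prior + self.residual_drift(torch.cat([x, a], dim=-1))

    def diffusion(self, x, a):
        # High but bounded far from the data, small where the dynamics are well covered.
        return self.sigma_max * torch.sigmoid(self.diffusion_net(torch.cat([x, a], dim=-1)))

    def step(self, x, a, dt):
        # One Euler-Maruyama integration step of the SDE.
        g = self.diffusion(x, a)
        x_next = x + self.drift(x, a) * dt + g * torch.randn_like(x) * dt ** 0.5
        return x_next, g

def rollout(model, policy, x0, horizon, dt=0.05, threshold=0.5, penalty_coef=1.0):
    """Model rollout with uncertainty penalties and adaptive truncation."""
    x, traj = x0, []
    for _ in range(horizon):
        a = policy(x)
        x, g = model.step(x, a, dt)
        uncertainty = g.mean(dim=-1)          # scalar uncertainty per state
        penalty = penalty_coef * uncertainty  # to be subtracted from the predicted reward
        traj.append((x, a, penalty))
        if bool((uncertainty > threshold).all()):
            break  # truncate once the model leaves its trusted region
    return traj
```

In this sketch the diffusion magnitude serves as the uncertainty estimate: it scales a penalty on predicted rewards and stops rollouts once the model strays from the data it was trained on.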
We propose a parametric, distance-aware uncertainty estimator that captures the distance to the k-th closest neighbor in the dataset without requiring a k-nearest-neighbor (k-NN) search. Besides bypassing costly k-NN searches, the parametric estimator can be trained alongside the neural SDE model so that the model captures both aleatoric and epistemic uncertainty in the dynamics. The estimator is smooth and differentiable, so it satisfies the requirements for numerical integration of the neural SDE (see the illustrative sketch below).
We empirically evaluate NUNO against state-of-the-art (SOTA) offline model-based and model-free approaches on continuous control benchmarks, namely the MuJoCo datasets in D4RL and NeoRL.
NUNO outperforms SOTA methods on low-quality datasets by up to 93% while matching their performance, or surpassing it by up to 55%, on some high-quality counterparts.
We assess how NUNO addresses the model exploitation problem along two axes: (1) the conservativeness of the reward function of the pessimistic learned MDP, and (2) the prediction accuracy of the learned dynamics model.
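For the distance-aware estimator described above, one simple construction (a hedged sketch, not necessarily the paper's exact formulation) trains a bounded network to output values near zero on dataset state-action pairs and near its cap on samples perturbed away from the data, which avoids any nearest-neighbor search at query time while staying smooth and differentiable:

```python
# Hypothetical construction of a parametric distance-aware uncertainty estimator;
# the loss and sampling scheme below are assumptions, not the paper's exact recipe.
import torch
import torch.nn as nn

class DistanceAwareUncertainty(nn.Module):
    """Smooth, bounded surrogate for distance-to-data over (state, action) pairs."""
    def __init__(self, input_dim, hidden=256, u_max=1.0):
        super().__init__()
        self.u_max = u_max
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z):
        # Sigmoid squashing keeps the estimate high but bounded far from the data.
        return self.u_max * torch.sigmoid(self.net(z)).squeeze(-1)

def estimator_loss(estimator, batch, noise_scale=1.0):
    """Push the estimate toward zero on dataset points and toward its cap
    on points perturbed away from the data (no k-NN search needed)."""
    on_data = estimator(batch)
    off_data = estimator(batch + noise_scale * torch.randn_like(batch))
    return on_data.mean() + (estimator.u_max - off_data).mean()
```

Because the module is differentiable, it can be trained jointly with the neural SDE and queried at every step of the SDE solver.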
@inproceedings{koprulu2025neural,
title={Neural Stochastic Differential Equations for Uncertainty-Aware Offline {RL}},
author={Cevahir Koprulu and Franck Djeumou and Ufuk Topcu},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=hxUMQ4fic3}
}