Non-conflicting Energy Minimization in Reinforcement Learning based Robot Control

Skand Peri, Akhil Perincherry*, Bikram Pandit*, Stefan Lee, CoRL 2025 (ORAL)

Abstract

Efficient robot control often requires balancing task performance with energy expenditure. A common approach in reinforcement learning (RL) is to penalize energy use directly as part of the reward function. This requires carefully tuning weight terms to avoid undesirable trade-offs where energy minimization harms task success. In this work, we propose a hyperparameter-free gradient optimization method to minimize energy expenditure without conflicting with task performance. Inspired by recent works in multitask learning, our method applies policy gradient projection between task and energy objectives to derive policy updates that minimize energy expenditure in ways that do not impact task performance. We evaluate this technique on standard locomotion benchmarks of DM-Control and HumanoidBench and demonstrate a reduction of 64% energy usage while maintaining comparable task performance. Further, we conduct experiments on a Unitree GO2 quadruped showcasing Sim2Real transfer of energy efficient policies. Our method is easy to implement in standard RL pipelines with minimal code changes, is applicable to any policy gradient method, and offers a principled alternative to reward shaping for energy efficient control policies.


pegrad_teaser
RL control policies often optimize a weighted combination of task reward and energy penalties, i.e., \(r_{\text{task}} + \lambda r_{\text{energy}}\). However, tuning the weighting factor \(\lambda\) is challenging due to high variability in its optimal value across tasks, environments, and embodiments. (Left) When a Soft Actor Critic (SAC) agent is trained on the dog-run task, \(\lambda=0.01\) and \(\lambda=0.1\) result in significantly different performance -- with the policy at \(\lambda=0.1\) achieving low returns by crawling rather than running. However, \(\lambda=0.1\) works well in the less dynamic dog-walk environment (Not shown). (Right) For a humanoid sitting task, both \(\lambda=0.01\) and \(0.1\) yield policies that are equally energy-efficient and task-effective, showcasing the inter-environment variability. In both cases, our proposed hyperparameter-free method, PEGrad(\(\bigstar\)), leads to performant and energy efficient policies.
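The baseline described above folds an energy penalty directly into the reward. As a minimal sketch (the squared-torque penalty form and function name are illustrative assumptions, not necessarily the exact penalty used in the paper), the shaped reward looks like:

```python
def shaped_reward(r_task: float, torques: list[float], lam: float) -> float:
    """Weighted combination of task reward and an energy penalty.

    r_task:  environment task reward for this step.
    torques: applied joint torques for this step.
    lam:     the weighting factor lambda that must be tuned per task.
    """
    r_energy = -sum(t * t for t in torques)  # penalize applied torque
    return r_task + lam * r_energy
```

Because the scale of `r_energy` varies with the embodiment and the dynamism of the task, a single `lam` rarely transfers across environments, which is the tuning burden PEGrad removes.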


Projecting Energy Gradients (PEGrad)

pegrad_algo
We propose PEGrad, a hyperparameter-free gradient-based method for optimizing energy-efficient robot control policies. The core idea is to project the energy gradient onto the direction orthogonal to the reward gradient (\(g_{E_{\perp R}}\)), and to scale it adaptively so that the projected energy gradient norm satisfies \( ||g_{E_{\perp R}}||_2 \leq ||g_R||_2\).
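The projection-and-rescaling step above can be sketched on flattened gradient vectors as follows (a minimal illustration of the stated rule, not the authors' implementation; function and variable names are our own):

```python
import numpy as np

def pegrad_update(g_R: np.ndarray, g_E: np.ndarray) -> np.ndarray:
    """Combine reward and (negated-)energy gradients without conflict.

    g_R: flattened policy gradient of the task-reward objective.
    g_E: flattened gradient of the energy-minimization objective.
    Returns the combined update direction g_R + g_{E_perp_R}.
    """
    r_norm_sq = float(np.dot(g_R, g_R))
    if r_norm_sq == 0.0:
        return g_R.copy()  # no task signal: nothing to project against

    # Remove the component of g_E that lies along g_R, so the energy
    # update cannot oppose (or assist) the task update.
    g_E_perp = g_E - (np.dot(g_E, g_R) / r_norm_sq) * g_R

    # Adaptively rescale so that ||g_E_perp||_2 <= ||g_R||_2.
    r_norm = np.sqrt(r_norm_sq)
    e_norm = float(np.linalg.norm(g_E_perp))
    if e_norm > r_norm:
        g_E_perp *= r_norm / e_norm

    return g_R + g_E_perp
```

By construction, the energy component of the returned update is orthogonal to \(g_R\), so a first-order step along it leaves the task objective unchanged, and the norm cap keeps the energy term from dominating the update.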


Results on DMControl (Medium and Hard tasks)

dmcontrol
We show results on six tasks from the DMControl suite. Lower applied torque and higher returns are better. Across all tasks, PEGrad achieves highly performant policies that are also energy-efficient. On 4 out of 6 tasks, PEGrad achieves results beyond the Pareto front identified by adjusting \(\lambda\).


Results on Humanoid-Bench

hb
We show results on four tasks from the HumanoidBench suite. Lower applied torque and higher returns are better. Across all tasks, PEGrad achieves highly performant policies that are also energy-efficient. Further, energy minimization also improves sample-efficiency on the \(\texttt{h1-run-v0}\) and \(\texttt{h1-walk-v0}\) tasks.


Sim2Real Quantitative Results

hb
Current and torque usage in the real world: We compare Unitree's \(\texttt{Factory}\) controller and the \(\texttt{AMP+PPO}(\lambda)\) baseline against PEGrad on the \(\texttt{Stand}\) and \(\texttt{Walk}\) tasks, reporting current drawn and net torque applied. We find that PEGrad is ~20% more efficient than the tuned multi-objective \(\texttt{AMP+PPO}(\lambda=0.0002)\) on \(\texttt{Walk}\) and performs comparably on \(\texttt{Stand}\).


Video (no audio)


References

2025

  1. CoRLOral
    Non-conflicting Energy Minimization in Reinforcement Learning based Robot Control
    Skand Peri, Akhil Perincherry*, Bikram Pandit*, and 1 more author
    Conference on Robot Learning, 2025