Embodiment-Aware Generalist Specialist Distillation for Unified Humanoid Whole-Body Control

ICRA 2026

Abstract

Humanoid whole-body controllers trained with reinforcement learning (RL) have recently achieved remarkable performance, yet many target a single robot embodiment. Differences in dynamics, degrees of freedom (DoFs), and kinematic topology still make it difficult for one policy to command diverse humanoids. Moreover, obtaining a generalist controller that not only transfers across embodiments but also supports behaviors richer than simple walking, such as squatting or leaning, remains challenging.

We address these obstacles with EAGLE, an iterative generalist-specialist distillation framework that yields a single unified policy capable of controlling multiple heterogeneous humanoids without per-robot reward tuning. In each cycle, embodiment-specific specialists are forked from the current generalist and refined on their respective robots, and their new skills are then distilled back into the generalist by training on the pooled set of embodiments. Repeating this loop until convergence produces a robust whole-body controller, validated on robots including the Unitree H1 and G1 and the Fourier N1.
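Read as an optimization problem, the distillation step of each cycle might be written as follows. This is a sketch under two assumptions the page does not state: the imitation loss is a squared error, and the pooled objective weights all $N$ robots equally. Here $\pi_g^{(k)}$ is the generalist after cycle $k$, $\pi_s^i$ is the specialist forked from it and refined on robot $i$, and $d_i(\pi_g^{(k)})$ is the distribution of observations visited when $\pi_g^{(k)}$ runs on robot $i$.

$$
\pi_g^{(k+1)} = \arg\min_{\pi} \; \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}_{o \sim d_i(\pi_g^{(k)})} \left[ \left\| \pi(o) - \pi_s^{i}(o) \right\|_2^2 \right]
$$

Because the expectation is taken over states visited by the current generalist rather than by the specialists, the generalist is supervised on the situations it actually encounters. In practice the arg min would presumably be approximated by a finite number of gradient steps starting from $\pi_g^{(k)}$.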

We conducted experiments on five robots in simulation and four in the real world. Quantitative evaluations show that EAGLE achieves high tracking accuracy and robustness compared with alternative methods, marking a step toward scalable, fleet-level humanoid control.

Method Overview

Figure: (a) unified command interface; (b) generalist-specialist distillation.

(a) Unified command interface. The command vector $c_t$ combines task commands $v_t = (v_x, v_y, \omega)$, namely the linear velocities $v_x, v_y$ and the angular velocity $\omega$, with behavior commands $b_t = (h, p)$, namely the base height $h$ and the body pitch $p$. Together with a short window of proprioception $s_t$, they form the observation $o_t$.
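The observation layout described in (a) can be sketched as follows. This is a minimal illustration, not EAGLE's code. It assumes that commands and proprioception are concatenated into one flat vector. It also assumes that the proprioceptive window is a fixed-length FIFO, zero-padded at start-up. `ObservationBuilder`, `proprio_dim`, and `history_len` are hypothetical names. The actual ordering, normalization, and window length are not specified on this page.

```python
from collections import deque

import numpy as np


class ObservationBuilder:
    """Hypothetical helper that assembles o_t = [c_t, s_t].

    Assumptions (not from the source): flat concatenation; c_t = [v_t, b_t]
    with v_t = (vx, vy, omega) and b_t = (h, p); s_t holds the last
    `history_len` proprioceptive readings, oldest first.
    """

    def __init__(self, proprio_dim: int, history_len: int):
        self.proprio_dim = proprio_dim
        # Zero-padded FIFO so o_t has a fixed size from the first step onward.
        self.history = deque(
            [np.zeros(proprio_dim) for _ in range(history_len)], maxlen=history_len
        )

    def step(self, vx, vy, omega, height, pitch, proprio):
        proprio = np.asarray(proprio, dtype=np.float64)
        if proprio.shape != (self.proprio_dim,):
            raise ValueError(f"expected proprio of shape ({self.proprio_dim},)")
        self.history.append(proprio)  # deque drops the oldest reading automatically

        task_cmd = np.array([vx, vy, omega], dtype=np.float64)  # v_t
        behavior_cmd = np.array([height, pitch], dtype=np.float64)  # b_t
        command = np.concatenate([task_cmd, behavior_cmd])  # c_t
        window = np.concatenate(list(self.history))  # s_t, oldest to newest
        return np.concatenate([command, window])  # o_t
```

With `proprio_dim=4` and `history_len=3`, each call returns a 17-dimensional vector: 5 command entries followed by 12 proprioceptive entries. A command layout that stays identical across robots is presumably what makes the interface "unified." This section does not describe how robot-specific proprioception sizes are handled.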

(b) Generalist-specialist distillation. Each round copies the generalist policy $\pi_g$ into $N$ specialists $\{\pi_s^i\}_{i=1}^{N}$, one per robot, and fine-tunes each specialist on its own embodiment. The round then distills back into $\pi_g$: the generalist is rolled out, the states it visits are relabeled with actions from the corresponding specialist, and $\pi_g$ is updated with an imitation loss on these labels. Repeating this loop yields a single controller that scales across embodiments while retaining rich whole-body commands.
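One distillation round from (b) can be sketched with toy components. This is a hedged illustration, not EAGLE's implementation, and it rests on several assumptions:

- Policies are linear maps (`LinearPolicy`).
- Each robot is a toy linear system given by hypothetical `dynamics` matrices.
- Specialist fine-tuning is a caller-supplied placeholder (`fine_tune`) that stands in for per-robot RL.
- The imitation loss is mean squared error, minimized by plain gradient descent.
- All robots share observation and action sizes. This sidesteps the DoF differences the real method must handle.

The names `distillation_round`, `rollout_states`, `horizon`, `lr`, and `epochs` are hypothetical.

```python
import copy

import numpy as np


class LinearPolicy:
    """Toy stand-in for a neural policy: a = W @ o."""

    def __init__(self, W):
        self.W = np.array(W, dtype=np.float64)

    def act(self, obs):
        return self.W @ obs


def rollout_states(policy, dynamics, horizon, rng):
    """Run `policy` on a toy linear robot (o' = A o + B a + noise); return visited states."""
    A, B = dynamics
    obs = rng.normal(size=A.shape[0])
    states = []
    for _ in range(horizon):
        states.append(obs)
        obs = A @ obs + B @ policy.act(obs) + 0.01 * rng.normal(size=A.shape[0])
    return np.stack(states)


def distillation_round(generalist, robots, fine_tune, horizon=64, lr=0.1, epochs=200, rng=None):
    """One hypothetical fork -> fine-tune -> relabel -> imitate round."""
    if rng is None:
        rng = np.random.default_rng(0)

    # 1) Fork: every robot gets an independent copy of the current generalist,
    #    which `fine_tune` (placeholder for per-robot RL) turns into a specialist.
    specialists = [fine_tune(copy.deepcopy(generalist), robot) for robot in robots]

    # 2) Roll out the *generalist* on each robot and relabel the visited states
    #    with the matching specialist's actions; pool data across embodiments.
    obs_pool, act_pool = [], []
    for specialist, robot in zip(specialists, robots):
        states = rollout_states(generalist, robot["dynamics"], horizon, rng)
        obs_pool.append(states)
        act_pool.append(states @ specialist.W.T)  # row-wise specialist.act(state)
    X, Y = np.concatenate(obs_pool), np.concatenate(act_pool)

    # 3) Imitation update on the pooled dataset: gradient descent on the
    #    per-sample squared action error averaged over samples (assumed loss).
    losses = []
    for _ in range(epochs):
        residual = X @ generalist.W.T - Y
        losses.append(float(np.mean(np.sum(residual**2, axis=1))))
        generalist.W -= lr * 2.0 * residual.T @ X / len(X)
    return generalist, specialists, losses
```

The sketch preserves one design choice from the description: states come from the generalist's own rollouts, while action labels come from the specialists. As a result, the generalist is trained on the states it actually visits, which resembles DAgger-style relabeling. Calling `distillation_round` repeatedly with an RL-based `fine_tune` would correspond to the outer loop of forking, refining, and distilling until convergence.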

BibTeX

@misc{peng2026eaglewbc,
    title={Embodiment-Aware Generalist Specialist Distillation for Unified Humanoid Whole-Body Control}, 
    author={Quanquan Peng and Yunfeng Lin and Yufei Xue and Jiangmiao Pang and Weinan Zhang},
    year={2026},
    eprint={2602.02960},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
    url={https://arxiv.org/abs/2602.02960}, 
}