[Paper Notes] Towards Bridging the Gap: Systematic Sim-to-Real Transfer for Diverse Legged Robots
TL;DR
PACE is a sim-to-real pipeline for legged robots that treats actuator and joint dynamics as the main reality gap. The key move is practical: collect short fixed-base, in-air encoder trajectories, fit a compact physical parameterization in simulation with CMA-ES, then train locomotion policies directly in the fitted simulator without dynamics randomization. The paper is useful if you are thinking about parameter identification as an alternative to ActuatorNet-style black-box actuator modeling.
My read: PACE is strongest as a joint-space dynamics alignment recipe. It does not try to learn a full residual simulator. It fits per-joint effective inertia, viscous damping, Coulomb friction, joint bias, and one global command delay. That gives a small parameter vector, enough physical meaning to debug, and a workflow that works even when the robot has only joint encoders and no joint torque sensors.
Paper and Resources
The paper is “Towards Bridging the Gap: Systematic Sim-to-Real Transfer for Diverse Legged Robots” by Filip Bjelonic, Fabian Tischhauser, and Marco Hutter from ETH Zurich’s Robotic Systems Lab. It is available as arXiv:2509.06342, with the project code at leggedrobotics/pace-sim2real, documentation at pace.filipbjelonic.com, and an ETH Research Collection dataset for actuator model identification and locomotion experiments at PACE Dataset for Sim-to-Real Transfer in Legged Robots.
The repository frames PACE as Precise Adaptation through Continuous Evolution. In the public code path, the basic example collects excitation data and runs scripts/pace/fit.py, which estimates actuator and joint parameters with CMA-ES for Isaac Lab / Isaac Sim 5.0-style workflows.
Why Actuator Modeling Matters Here
For legged locomotion, the actuator model can dominate sim-to-real transfer. A policy trained against a URDF-only model may learn joint trajectories that look plausible in simulation but land in the wrong phase and energy regime on hardware. Earlier ETH work used actuator networks: learned models that map histories of commands and joint states to torques, often requiring torque-instrumented data and careful data collection.
PACE takes a more physically constrained route. It asks whether the important low-level discrepancy can be captured by a small number of joint-space parameters. The paper’s answer is yes for the tested quadrupeds: a fitted simulator can match in-air joint trajectories, generalize across gains and trajectories, and remain competitive with an ActuatorNet baseline on ANYmal while using far less data, and less specialized data at that.
The important distinction is interpretability. An actuator network can be very expressive, but its errors are harder to attribute. In PACE, if the fitted \(I_a\) is too large, you can reason about rotor inertia, CAD link inertia, firmware compensation, and apparent inertia. If damping shifts, you can look at gearbox, motor, compensation, temperature, or motor-constant mismatch. That makes the method feel more like a debugging loop than a pure function approximator.
Data Collection Setup
PACE collects data with the robot base fixed and the legs moving freely in air. This removes unmeasured contact forces and avoids base-motion coupling. The authors excite all joints simultaneously with chirp signals, usually 20-60 seconds per sequence, with high-rate synchronized logging. The target is not to reproduce locomotion contact directly; the target is to isolate the joint and drive dynamics that will later shape contact behavior.
Three practical details matter:
- Fixed base: the simulated replay uses the same base pose as the real experiment.
- No contact: legs move in air, so the loss does not need foot forces or contact estimates.
- Excitation bandwidth: the chirp should cover the frequencies the policy can excite, or at least twice the highest frequency expected in the actual walking motion.
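To make the excitation-bandwidth point concrete, here is a minimal sketch of a linear chirp used as a joint-position target. The function name `linear_chirp`, the amplitude, the logging rate, and the frequency range are my own illustrative assumptions, not values from the paper or the PACE repository.

```python
import numpy as np

def linear_chirp(duration_s, rate_hz, f_start_hz, f_end_hz, amplitude_rad, offset_rad=0.0):
    """Joint-position target whose frequency sweeps linearly from f_start_hz to f_end_hz."""
    n = int(round(duration_s * rate_hz))
    t = np.arange(n) / rate_hz
    # The phase is the integral of the instantaneous frequency
    # f(t) = f_start + (f_end - f_start) * t / duration.
    sweep = (f_end_hz - f_start_hz) / duration_s
    phase = 2.0 * np.pi * (f_start_hz * t + 0.5 * sweep * t**2)
    return t, offset_rad + amplitude_rad * np.sin(phase)

# Hypothetical numbers: if the gait has content up to about 5 Hz,
# the rule of thumb above suggests sweeping to at least 10 Hz.
t, q_target = linear_chirp(duration_s=20.0, rate_hz=400.0,
                           f_start_hz=0.1, f_end_hz=10.0, amplitude_rad=0.2)
```

On a real robot the amplitude would also have to respect joint limits and self-collision, which this sketch ignores.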
The paper also emphasizes PD gains. If gains are too high, the closed-loop poles move to high frequencies, making the required excitation bandwidth hard or unsafe. PACE uses small gains for identification and policy training so that the characteristic dynamics are visible in the collected data.
The joint transfer function used to explain this is:
\[ H_q(s) = e^{-sT_d}\frac{P_\tau}{I_a s^2 + (d + D_\tau)s + P_\tau} \]
Here \(I_a\) is effective armature inertia, \(d\) is viscous damping, \(T_d\) is lumped delay, and \(P_\tau,D_\tau\) are the joint-level PD gains.
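The gain argument can be checked with the standard second-order formulas for the denominator of \(H_q(s)\): \(\omega_n = \sqrt{P_\tau / I_a}\) and \(\zeta = (d + D_\tau) / (2\sqrt{P_\tau I_a})\). The sketch below uses hypothetical parameter values (not from the paper) to show how a 16x larger proportional gain moves the natural frequency up by 4x.

```python
import math

def closed_loop_characteristics(I_a, d, P_tau, D_tau):
    """Natural frequency (Hz) and damping ratio of I_a s^2 + (d + D_tau) s + P_tau."""
    omega_n = math.sqrt(P_tau / I_a)                       # rad/s
    zeta = (d + D_tau) / (2.0 * math.sqrt(P_tau * I_a))    # dimensionless
    return omega_n / (2.0 * math.pi), zeta

# Hypothetical values, chosen only for illustration.
f_low, zeta_low = closed_loop_characteristics(I_a=0.01, d=0.1, P_tau=10.0, D_tau=0.2)
f_high, zeta_high = closed_loop_characteristics(I_a=0.01, d=0.1, P_tau=160.0, D_tau=0.2)
```

With these numbers the low-gain loop resonates near 5 Hz and the high-gain loop near 20 Hz, so the high-gain case would need a chirp that reaches well beyond 20 Hz to expose the same dynamics.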
Parameter Identification
The fitted parameter vector is deliberately small:
\[ \mathbf{p} = [\mathbf{I}_a,\mathbf{d},\boldsymbol{\tau}_f,\tilde{\mathbf{q}}_b,T_d]^\top \in \mathbb{R}^{4n+1} \]
For \(n\) actuated joints, PACE fits per-joint effective inertia \(\mathbf{I}_a\), viscous damping \(\mathbf{d}\), Coulomb friction \(\boldsymbol{\tau}_f\), joint bias \(\tilde{\mathbf{q}}_b\), and one global command delay \(T_d\). For a 12-joint quadruped such as ANYmal, this gives \(4 \cdot 12 + 1 = 49\) parameters, small enough for evolutionary search in massively parallel simulation.
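As a bookkeeping aid, the layout of \(\mathbf{p}\) can be sketched as a flat vector with four per-joint blocks and one trailing scalar. The `JointParams` container and the `pack` / `unpack` helpers are hypothetical names of mine, not part of the PACE code.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class JointParams:
    inertia: np.ndarray   # I_a, shape (n,)
    damping: np.ndarray   # d, shape (n,)
    friction: np.ndarray  # tau_f, shape (n,)
    bias: np.ndarray      # q_b, shape (n,)
    delay: float          # T_d, shared by all joints

def pack(p: JointParams) -> np.ndarray:
    """Flatten to the (4n + 1)-vector that the optimizer searches over."""
    return np.concatenate([p.inertia, p.damping, p.friction, p.bias, [p.delay]])

def unpack(vec: np.ndarray, n: int) -> JointParams:
    """Inverse of pack for a robot with n actuated joints."""
    if vec.shape != (4 * n + 1,):
        raise ValueError(f"expected {4 * n + 1} parameters, got {vec.shape}")
    I_a, d, tau_f, q_b = np.split(vec[:4 * n], 4)
    return JointParams(I_a, d, tau_f, q_b, float(vec[-1]))
```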
The simulator runs \(N=4096\) environments in parallel. Each environment samples a candidate parameter vector, replays the recorded real joint targets, and compares simulated joint positions to the measured ones:
\[ \ell_e = \frac{1}{k}\sum_{i=1}^{k} \left\|\mathbf{q}_i^{\mathrm{real}}-\mathbf{q}_{i,e}^{\mathrm{sim}}\right\|^2 \]
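A minimal NumPy sketch of this per-environment loss, assuming the real log is stored as a `(k, n)` array and the simulated replays as an `(N, k, n)` array (the array layout and the function name are my assumptions):

```python
import numpy as np

def trajectory_loss(q_real: np.ndarray, q_sim: np.ndarray) -> np.ndarray:
    """Mean squared joint-position error per environment.

    q_real: (k, n) measured joint positions over k time steps.
    q_sim:  (N, k, n) simulated positions for N candidate parameter vectors.
    Returns an (N,) array with one loss value per environment.
    """
    err = q_sim - q_real[None, :, :]
    # Squared Euclidean norm over joints, then the average over time steps.
    return np.mean(np.sum(err**2, axis=-1), axis=-1)
```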
PACE then solves:
\[ \mathbf{p}^{\ast} = \arg\min_{\mathbf{p}}\mathbb{E}[\ell_e] \]
using CMA-ES over the parallel population. The choice makes sense: gradients through the full simulator are not required, the dimension is moderate, and the objective can have local traps caused by delay, saturation, compensation, and friction.
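To show the shape of such a sample-evaluate-update loop, here is a toy sketch that fits \(I_a\) and \(d\) of a single linear joint to a synthetic log. Note the swap: this is a simplified cross-entropy-style evolution strategy with a diagonal covariance, not CMA-ES, and the batched Python integrator stands in for the parallel simulator. All numeric values, the search bounds, and the function names are assumptions for illustration; a real fit would use a proper CMA-ES implementation and the full simulator.

```python
import numpy as np

DT = 1.0 / 400.0          # hypothetical control/logging period
P_TAU, D_TAU = 10.0, 0.2  # PD gains are treated as known (hypothetical values)

def simulate(log_params, q_target):
    """Replay q_target on a batch of linear joints; log_params is (N, 2) = log[I_a, d]."""
    I_a, d = np.exp(log_params).T
    q = np.zeros(len(I_a))
    qd = np.zeros(len(I_a))
    out = np.empty((len(q_target), len(I_a)))
    for i, target in enumerate(q_target):
        # Damping is integrated implicitly so poor candidates cannot blow up the step.
        qd = (qd + DT * P_TAU * (target - q) / I_a) / (1.0 + DT * (d + D_TAU) / I_a)
        q = q + DT * qd
        out[i] = q
    return out

def fit(q_target, q_real, generations=30, pop=64, elite=16, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.log(1e-3), np.log(1.0)            # search box in log space
    mean, std = np.log([0.03, 0.3]), np.ones(2)   # deliberately wrong initial guess
    best, best_loss = mean.copy(), np.inf
    for _ in range(generations):
        # Sample a population, replay the log for every candidate, score each one.
        cand = np.clip(mean + std * rng.standard_normal((pop, 2)), lo, hi)
        loss = np.mean((simulate(cand, q_target) - q_real[:, None]) ** 2, axis=0)
        order = np.argsort(loss)
        if loss[order[0]] < best_loss:
            best, best_loss = cand[order[0]].copy(), loss[order[0]]
        # Refit the sampling distribution to the elite candidates.
        elites = cand[order[:elite]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return np.exp(best), best_loss

# Synthetic "hardware" log: a 0.5 to 8 Hz chirp replayed on a joint with known parameters.
t = np.arange(1600) * DT
q_target = 0.2 * np.sin(2 * np.pi * (0.5 * t + 0.5 * (8.0 - 0.5) / 4.0 * t**2))
q_real = simulate(np.log([[0.01, 0.1]]), q_target)[:, 0]
initial_loss = np.mean((simulate(np.log([[0.03, 0.3]]), q_target)[:, 0] - q_real) ** 2)
(I_a_fit, d_fit), best_loss = fit(q_target, q_real)
```

Searching in log space keeps inertia and damping positive without explicit constraints, which is a convenience of this sketch rather than something the paper prescribes.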
At the single-joint level, the reference model is:
\[ I_a\ddot q+d\dot q=\tau_i+\tau_{\mathrm{comp}}+\tau_f \]
and the practical closed-loop form is:
\[ I_a\ddot q+d\dot q= \mathrm{sat}\left(P_\tau(\hat q-q+\tilde q_b)-D_\tau\dot q+\tau_{\mathrm{comp}}\right)+\tau_f \]
This equation is the heart of the paper for me. The authors are not just matching trajectories; they are choosing a parameterization that absorbs the effects that matter at the joint: inertia-like terms, damping-like terms, Coulomb friction, bias, firmware compensation, and saturation.
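A single-joint sketch of the closed-loop form, integrated with semi-implicit Euler. Several things here are my assumptions rather than the paper's implementation: Coulomb friction is modeled as a torque of magnitude `tau_c` opposing motion and smoothed with `tanh`, the delay is rounded to whole samples, saturation is a symmetric clip at `tau_max`, and all names and numbers are illustrative.

```python
import numpy as np

def simulate_joint(q_target, dt, I_a, d, tau_c, q_b, T_d, P_tau, D_tau,
                   tau_max=np.inf, tau_comp=0.0, v_eps=0.05):
    """Replay a target trajectory on one joint and return the simulated positions."""
    delay_steps = int(round(T_d / dt))
    q, qd = 0.0, 0.0
    q_out = np.empty(len(q_target))
    for i in range(len(q_target)):
        # Delayed command: hold the first sample until the delay has elapsed.
        q_hat = q_target[max(i - delay_steps, 0)]
        tau_cmd = P_tau * (q_hat - q + q_b) - D_tau * qd + tau_comp
        tau_motor = np.clip(tau_cmd, -tau_max, tau_max)   # sat(.)
        # Assumption: friction opposes motion, smoothed near zero velocity.
        tau_f = -tau_c * np.tanh(qd / v_eps)
        qdd = (tau_motor + tau_f - d * qd) / I_a
        qd += dt * qdd          # update velocity first (semi-implicit Euler)
        q += dt * qd
        q_out[i] = q
    return q_out

# Hypothetical step response: without friction the joint settles at target + bias.
dt = 1.0 / 400.0
step = np.full(800, 0.3)
q_step = simulate_joint(step, dt, I_a=0.01, d=0.1, tau_c=0.0, q_b=0.02,
                        T_d=0.0, P_tau=10.0, D_tau=0.2)
```

The step response is a quick sanity check on the bias term: with zero friction the fixed point of the equation is \(q = \hat q + \tilde q_b\), so `q_step` should end near 0.32 rad.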
One subtle but important point: PACE does not co-optimize PD gains with the dynamics. If \(I_a,d,P_\tau,D_\tau\) are scaled together, the same trajectories can be preserved, which creates non-uniqueness. The gains are treated as known, and the fit focuses on the physical simulator parameters.
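The non-uniqueness is easy to see on the linear transfer function: multiplying \(I_a\), \(d\), \(P_\tau\), and \(D_\tau\) by the same factor scales the numerator and denominator of \(H_q(s)\) equally, so the poles and the DC gain do not move. A small numerical check with hypothetical values:

```python
import numpy as np

def poles(I_a, d, P_tau, D_tau):
    """Roots of the closed-loop denominator I_a s^2 + (d + D_tau) s + P_tau."""
    return np.sort_complex(np.roots([I_a, d + D_tau, P_tau]))

alpha = 3.0
base = dict(I_a=0.01, d=0.1, P_tau=10.0, D_tau=0.2)     # hypothetical values
scaled = {name: alpha * value for name, value in base.items()}

# Same poles, and the DC gain P_tau / P_tau is 1 in both cases,
# so encoder trajectories alone cannot tell the two parameter sets apart.
same_poles = np.allclose(poles(**base), poles(**scaled))
```

This check covers only the linear part of the model; saturation and friction are not scaled here. Fixing the gains at their commanded values removes this degree of freedom from the fit.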
PACE versus ActuatorNet
The paper’s ANYmal comparison is the cleanest place to read PACE against ActuatorNet. The authors compare three settings: URDF-only, actuator network, and PACE. URDF-only diverges in in-air replay and fails in forward walking. Both the actuator network and PACE transfer, but PACE shows a smaller spread in the delta phase portrait in the reported in-air comparison and avoids the joint-position bias visible in the actuator-network baseline.
The data story is also different. PACE uses roughly 20 seconds of encoder-only in-air data per robot. Actuator networks generally need minutes of torque-instrumented data, and the deployed vendor LSTM baseline was likely trained on an even larger dataset. That changes the engineering trade-off: ActuatorNet is appealing when torque sensing and broad training logs are available; PACE is attractive when you want a lower-data, encoder-only path with parameters that can be inspected.
For me the useful mental model is:
| Method | What it learns | Data pressure | Debuggability |
|---|---|---|---|
| ActuatorNet | black-box actuator mapping, often recurrent | higher, often torque-instrumented | lower |
| PACE | compact physical joint-space parameters | lower, encoder-only in-air trajectories | higher |
The paper does not make ActuatorNet obsolete. It shows that for many legged robots driven by permanent-magnet synchronous motors (PMSMs), a small physically meaningful parameter set can cover the main gap well enough to train blind locomotion without dynamics randomization.
Results and What to Keep
The single-drive experiments validate that the fitted inertia tracks known analytic changes in load. At the full-robot level, Tytan, ANYmal, and Minimal show close real-sim trajectory overlays in in-air replay. The fitted simulators generalize across unseen gains and trajectories, which is important because a parameter fit that only memorizes one chirp would be much less useful.
On the locomotion side, policies are trained in the fitted simulation and deployed zero-shot. The paper reports deployment across three main platforms and more than ten additional robots. It also reports an energy result: ANYmal D reaches a full Cost of Transport (CoT) of 1.27, about 32% lower than the state-of-the-art ANYmal C reference in the paper’s comparison. Tytan reaches a CoT of 0.97 in the same running-track analysis.
I would keep the RL part secondary. The more reusable idea is the upstream alignment loop:
fixed-base encoder logs
-> replay target trajectories in simulation
-> fit {inertia, damping, friction, bias, delay}
-> train policy in fitted simulation
-> zero-shot hardware deployment
That is a clean recipe for teams trying to reduce sim-to-real iteration cost without building a large learned actuator model first.
Limitations
PACE depends on the assumptions behind its fitted parameterization. The paper is explicit about this. Identification and deployment need consistent firmware compensation modes and filters. Finite excitation bandwidth can hide higher-frequency dynamics, especially on suspended setups where structural constraints cap the chirp. Temperature, wear, and aging can shift effective parameters over time. The method currently folds many electrical effects into joint-space terms; future work targets bus-voltage/current limits, inverter switching behavior, compliance, and higher-order motion terms such as jerk and snap.
The contact side is also deliberately indirect. PACE identifies in-air joint dynamics, then shows that this suffices for the tested contact tasks. If foot contact parameters, compliance, or terrain interaction dominate the gap for another platform, the recipe may need contact-parameter refinement or online adaptation.
Takeaways
PACE is worth remembering because it gives a concrete middle path between hand tuning and fully learned actuator models. The parameter vector has only \(4n+1\) entries, the data requirement is modest, and the resulting simulator is interpretable enough to diagnose. For a new legged robot, I would treat this as an early sim-to-real checklist: verify torque/current bandwidth, collect fixed-base chirps, fit the joint-space parameters, compare phase portraits and time traces, then train the policy only after the low-level dynamics stop lying.
