Oxford Robotics Institute

Using AI to Advance Robot Learning

Published 30 JUN 2022

 

Next Steps: Learning a Disentangled Gait Representation for Versatile Quadruped Locomotion



Project Background

Quadruped locomotion is rapidly maturing to a degree where robots now routinely traverse a variety of unstructured terrains. However, while gaits can typically be varied by selecting from a range of pre-computed styles, current planners are unable to vary key gait parameters continuously while the robot is in motion. The on-the-fly synthesis of gaits with unexpected operational characteristics, or even the blending of dynamic manoeuvres, lies beyond the capabilities of the current state of the art. In this case study, ORI addresses this limitation by learning a latent space capturing the key stance phases of a particular gait, via a generative model trained on a single trot style. The generative model facilitates the detection and mitigation of disturbances, providing a versatile and robust planning framework. ORI evaluated its approach on an ANYmal quadruped robot and demonstrated that its method achieves a continuous blend of dynamic trot styles while remaining robust and reactive to external perturbations.

Quadruped locomotion has advanced significantly in recent years, extending its capability towards applications of significant value to industry and the public domain. Driven primarily by advances in optimisation-based [1-4] and reinforcement learning-based methods [5-7], quadrupeds are now able to traverse a wide variety of terrains, making them a popular choice for tasks such as inspection, monitoring, search and rescue, or goods delivery in difficult, unstructured environments. However, despite recent advances, important limitations remain. Due to the complexity of the system, models used for gait planning and control are often overly simplified and handcrafted for particular gait types such as crawl, trot or gallop [1-8].

Project Approach

Inspired by recent work on a quadruped that achieves a crawl gait via the traversal of a learned latent space [9], ORI approached the challenge of continuous contact-schedule variation from the perspective of learning and traversing a structured latent space. This is enabled by learning a generative model of locomotion data which, in addition to capturing relevant structure in the space, enables the detection and mitigation of disturbances to provide a versatile and robust planning framework. In particular, ORI trains a variational auto-encoder (VAE) [10,11] on short sequences of state-space trajectories taken from a single gait type (trot), and predicts a set of future states.
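To make the idea concrete, the following is a minimal sketch of a VAE forward pass and its Evidence Lower Bound on a flattened snippet of state-space trajectory. All dimensions, the linear encoder/decoder, and the random weights standing in for trained parameters are illustrative assumptions, not the ORI architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 12    # assumed per-timestep robot state size (illustrative)
SEQ_LEN = 5       # short history of states fed to the encoder
LATENT_DIM = 8    # assumed latent dimensionality
IN_DIM = STATE_DIM * SEQ_LEN

# Random weights stand in for trained parameters.
W_enc = rng.normal(0.0, 0.1, (2 * LATENT_DIM, IN_DIM))
W_dec = rng.normal(0.0, 0.1, (IN_DIM, LATENT_DIM))

def encode(x):
    """Map a flattened state history to a Gaussian over the latent variable."""
    h = W_enc @ x
    return h[:LATENT_DIM], h[LATENT_DIM:]   # mean, log-variance

def reparameterise(mu, log_var):
    """Sample z = mu + sigma * eps, keeping the sample differentiable in mu, sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode(z):
    """Reconstruct the state sequence (standing in for the future-state prediction)."""
    return W_dec @ z

def elbo(x):
    """Evidence Lower Bound: reconstruction term minus KL to the unit Gaussian prior."""
    mu, log_var = encode(x)
    z = reparameterise(mu, log_var)
    recon = decode(z)
    recon_ll = -0.5 * np.sum((x - recon) ** 2)                       # Gaussian log-likelihood, unit variance
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)     # KL(q(z|x) || N(0, I))
    return recon_ll - kl

x = rng.normal(size=IN_DIM)   # stand-in for a real trot trajectory snippet
print(elbo(x))
```

Training would maximise this ELBO over many trot snippets; the same quantity is what later serves as the disturbance signal at run time.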

Figure 2: Using a variational auto-encoder (VAE), the ORI approach learns a structured latent space capturing the key stance phases constituting a particular gait. The space is disentangled to such a degree that applying a drive signal to a single dimension of the latent variable induces gait styles that can be seamlessly interpolated between. ORI encodes raw sensor information to infer the robot’s gait phase using g_enc before applying the drive signal, then decodes the augmented latent variable and the base twist action a_k via g_dec, and predicts the feet in contact using g_pp. The drive signal’s amplitude and phase provide continuous control over the robot’s cadence, full-support duration and foot swing height.
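The drive-signal mechanism described in the caption can be sketched as an oscillation written into a single latent dimension, with amplitude and phase as the continuous gait handles. The latent dimensionality, the drive frequency and the choice of dimension below are all illustrative assumptions:

```python
import numpy as np

def drive_signal(t, amplitude, frequency, phase):
    """Oscillatory drive applied to one latent dimension. In the ORI work the
    amplitude and phase give continuous control over cadence, swing height
    and full-support duration; here they are just sinusoid parameters."""
    return amplitude * np.sin(2.0 * np.pi * frequency * t + phase)

def apply_drive(z, t, dim=0, amplitude=1.0, frequency=1.5, phase=0.0):
    """Overwrite a single latent dimension with the drive, leaving the rest intact."""
    z = z.copy()
    z[dim] = drive_signal(t, amplitude, frequency, phase)
    return z

# Blending between two gait styles reduces to ramping a drive parameter over time.
t = np.linspace(0.0, 2.0, 200)
amp = np.linspace(0.5, 1.0, t.size)      # e.g. gradually increasing swing height
z = np.zeros(8)                          # assumed 8-dimensional latent variable
trajectory = np.array([apply_drive(z, ti, amplitude=ai)[0] for ti, ai in zip(t, amp)])
```

Because only one dimension is driven, the remaining latent dimensions continue to encode the rest of the gait structure untouched, which is what makes the interpolation seamless.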

The VAE is fast enough to act as a planner in a closed-loop controller, so the ORI approach can react to external disturbances and mitigate real-world effects such as unmodelled dynamics and hardware latency. For closed-loop control, ORI encodes a history of robot states, built from a buffer of past raw sensor measurements, to infer the current gait phase. This proved able to both detect and react to disturbances: because the VAE is trained on canonical feasible trajectories, any disturbance is characterised as out of distribution with respect to the training set. Given the generative nature of the approach, this discrepancy is quantified during operation by the trained model via the Evidence Lower Bound (ELBO).
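One simple way to turn the ELBO into a disturbance detector is to calibrate a threshold on nominal trajectories and flag any step whose ELBO falls below it. The ELBO values, the threshold rule and the constant `k` below are illustrative stand-ins, not figures from the ORI experiments:

```python
import numpy as np

def calibrate_threshold(train_elbos, k=3.0):
    """Place the disturbance threshold k standard deviations below the mean
    ELBO observed on nominal (in-distribution) training trajectories."""
    return np.mean(train_elbos) - k * np.std(train_elbos)

def detect_disturbance(elbo_value, threshold):
    """A sharp drop in the ELBO marks the current state history as out of distribution."""
    return elbo_value < threshold

# Hypothetical numbers: nominal ELBOs cluster high; a push yields a much lower value.
rng = np.random.default_rng(1)
nominal = rng.normal(-50.0, 2.0, size=1000)   # stand-in training-set ELBOs
threshold = calibrate_threshold(nominal)
print(detect_disturbance(-49.0, threshold))   # nominal step -> False
print(detect_disturbance(-120.0, threshold))  # pushed -> True
```

Once a step is flagged, the planner can trigger its recovery response, such as the cadence increase described below.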

Figure 3: The above image depicts the ELBO trace for three push events along with the robot’s contact schedule. The widths of the white spaces in the contact schedule halve as the cadence increases to mitigate the disturbance. The robot images above are snapshots taken from the first push and show the robot’s recovery. The robot successfully recovers, typically within three to four steps.

Further information on the robotic experiments and their results can be seen in the accompanying video from ORI.


Conclusions

ORI presented a robust and flexible approach to locomotion planning via traversal of a structured latent space, utilising a deep generative model to capture features from locomotion data and enable the detection and mitigation of disturbances. The resulting latent space is disentangled such that key locomotion features are automatically discovered from a single style of trot gait. This disentanglement is exploited using an oscillatory drive signal, whose amplitude and phase directly control the gait parameters, namely the cadence, swing height and full-support duration. Once deployed, modulation of the drive signal is shown to give rise to seamless interpolation between gait styles. Utilising a generative model affords detection of disturbances as out of the distribution seen during training. The VAE-planner is able to reject a wide range of impulses applied to the robot’s base. This operating window is enlarged by increasing the robot’s cadence once a disturbance is detected, a rudimentary response that mirrors reports of humans increasing their cadence to recover from slippage [12].

References

[1] C. D. Bellicoso, F. Jenelten, C. Gehring, and M. Hutter, “Dynamic locomotion through online nonlinear motion optimization for quadrupedal robots,” IEEE Robot. Automat. Lett. (RA-L), vol. 3, no. 3, pp. 2261–2268, 2018.

[2] C. Mastalli, W. Merkt, J. Marti-Saumell, H. Ferrolho et al., “A direct-indirect hybridization approach to control-limited DDP,” arXiv:2010.00411, 2021.

[3] O. Melon, R. Orsolino, D. Surovik, M. Geisert et al., “Receding-horizon perceptive trajectory optimization for dynamic legged locomotion with learned initialization,” in IEEE Int. Conf. Rob. Autom. (ICRA), 2021.

[4] A. W. Winkler, C. D. Bellicoso, M. Hutter, and J. Buchli, “Gait and trajectory optimization for legged systems through phase-based end- effector parameterization,” IEEE Robot. Automat. Lett. (RA-L), vol. 3, no. 3, pp. 1560–1567, July 2018.

[5] J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso et al., “Learning agile and dynamic motor skills for legged robots,” Science Robotics, vol. 4, no. 26, 2019.

[6] S. Gangapurwala, A. Mitchell, and I. Havoutis, “Guided constrained policy optimization for dynamic quadrupedal robot locomotion,” IEEE Robot. Automat. Lett. (RA-L), vol. 5, no. 2, pp. 3642–3649, 2020.

[7] S. Gangapurwala, M. Geisert, R. Orsolino, M. Fallon, and I. Havoutis, “RLOC: Terrain-aware legged locomotion using reinforcement learning and optimal control,” arXiv preprint arXiv:2012.03094, 2020.

[8] A. W. Winkler, F. Farshidian, D. Pardo, M. Neunert, and J. Buchli, “Fast trajectory optimization for legged robots using vertex-based ZMP constraints,” IEEE Robot. Automat. Lett. (RA-L), vol. 2, no. 4, pp. 2201–2208, Oct 2017.

[9] A. L. Mitchell, M. Engelcke, O. Parker Jones, D. Surovik et al., “First steps: Latent-space control with semantic constraints for quadruped locomotion,” in IEEE/RSJ Int. Conf. Intell. Rob. Sys. (IROS), 2020, pp. 5343–5350.

[10] D. Kingma and M. Welling, “Auto-encoding variational bayes,” in Int. Conf. on Learn. Repr. (ICLR), 2014.

[11] D. J. Rezende, S. Mohamed, and D. Wierstra, “Stochastic backpropagation and approximate inference in deep generative models,” in Int. Conf. on Mach. Learn. (ICML), 2014.

[12] B. E. Moyer, A. J. Chambers, M. S. Redfern, and R. Cham, “Gait parameters as predictors of slip severity in younger and older adults,” Ergonomics, vol. 49, pp. 329–343, 2006.


The Scan Partnership

The Scan AI team supports ORI projects by providing remote access to a cluster of six NVIDIA DGX appliances combined with an NVIDIA EGX server and a PNY 3S-2400 AI-optimised NVMe all-flash storage array. This cluster is overlaid with Run:AI software to virtualise the GPU pool across the compute nodes, maximising utilisation and providing a mechanism for scheduling and allocating ORI workflows across the combined GPU resource. The infrastructure is delivered to the ORI team over the Scan Cloud platform and is hosted in a secure UK datacentre.



‘Using the Scan cluster, we are able to iterate over multiple learned models in parallel on their dedicated Deep Learning hardware. This translates to more time testing on our real robots, and less time waiting for models to train.’

Professor Ingmar Posner, Head of the Applied AI group at the Oxford Robotics Institute
