Prediction of Human Full-Body Movements with Motion Optimization and Recurrent Neural Networks


Philipp Kratzer, Marc Toussaint and Jim Mainprice

Machine Learning and Robotics Lab, University of Stuttgart

International Conference on Robotics and Automation 2020


Introduction


  • Planning robot motion in close proximity to humans is challenging
  • Key ability is to forecast human behavior

  • In this work:
    1. Novel prediction method for full-body motion
    2. Joint human-robot planning framework

The Motion Prediction Problem


  • Given a trajectory of full-body motion $s_{0:t}$, predict the subsequent states $s_{t+1:T}$
  • A good prediction respects both human body dynamics and environmental context

Related Work: Human Body Dynamics


  • Good results using recurrent neural networks (RNNs) [1, 2]


  • No notion of environmental context $\rightarrow$ challenging to incorporate
  1. K. Fragkiadaki, S. Levine, P. Felsen, and J. Malik, "Recurrent network models for human dynamics" (2015)
  2. H. Wang and J. Feng, "VRED: A position-velocity recurrent encoder-decoder for human motion prediction" (2019)

Related Work: Trajectory Optimization



  • Gradient-based optimization algorithms widely used in robotics

  • Flexible framework for motion planning

  • Motion planning with non-convex obstacles possible (e.g. CHOMP [1])
  1. N. Ratliff, M. Zucker, J. A. Bagnell, and S. Srinivasa, "CHOMP: Gradient optimization techniques for efficient motion planning" (2009)

Our Approach



Offline: Learn a state-of-the-art RNN model from data $\mathcal{D}$

Online: use model to predict trajectory

Extract constraints from environment and update the prediction using numerical optimization techniques

RNN Architecture for Human Internal Dynamics



RNN cells with multiple Gated Recurrent Units and a final linear layer

We input the base rotation, joint angles and velocities to the network and make it predict the next velocity

By adding the velocities to the state we can retrieve the following state $s_{t+1}$

By looping the velocity and states back into the RNN cell, multiple future steps can be predicted

Finally, we add our controls $\delta$ to the velocity input, which makes it possible to change the prediction of the final network
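
The rollout described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `rollout`, `velocity_model`, and `damped_model` are hypothetical names, the GRU network is replaced by a trivial stand-in function, and the recurrent hidden state is omitted for brevity.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def rollout(
    state: Sequence[float],
    velocity: Sequence[float],
    velocity_model: Callable[[Sequence[float], Sequence[float]], Vector],
    deltas: Sequence[Sequence[float]],
) -> List[Vector]:
    """Autoregressively predict len(deltas) future states."""
    s, v = list(state), list(velocity)
    trajectory = []
    for delta in deltas:
        # The controls delta perturb the velocity that is fed to the model.
        v_in = [vi + di for vi, di in zip(v, delta)]
        # The model predicts the next velocity from state and velocity.
        v = velocity_model(s, v_in)
        # Adding the velocity to the state gives the following state.
        s = [si + vi for si, vi in zip(s, v)]
        trajectory.append(s)
    return trajectory


def damped_model(state, velocity):
    # Hypothetical stand-in for the trained network: halves the velocity.
    return [0.5 * vi for vi in velocity]


# With zero controls the rollout is the unmodified prediction.
predicted = rollout([0.0, 0.0], [1.0, 2.0], damped_model, [[0.0, 0.0]] * 3)

# A non-zero control at the first step changes every later state.
steered = rollout([0.0, 0.0], [1.0, 2.0], damped_model, [[1.0, 0.0], [0.0, 0.0]])
```

Because each predicted velocity and state is fed back in, a control applied at one step propagates to all later steps, which is what lets an optimizer over $\delta$ reshape the whole predicted trajectory.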

Optimization Objective

$$ c(\delta) = \underbrace{\lambda_1 \|\delta\|^2}_{\text{low-level}} + \underbrace{\lambda_2 \|\phi_T(f_\text{RNN}(\delta)) - p^*\|^2}_{\text{goalset}} + \underbrace{\lambda_3 \sum_{d=t+1}^{T} \exp \big\{ -\alpha~\text{SDF}(p_d) \big\} \Delta_H}_{\text{collision}} $$
$\phi_T \ldots$ inverse kinematics map at time $T$
$f_\text{RNN} \ldots$ recurrent neural network
$p_d = \phi_d(f_\text{RNN}(\delta)) \ldots$ position at time $d$, with $\Delta_H = \|p_{d+1} - p_d\|$
SDF $\ldots$ signed distance function
$\lambda_1, \lambda_2, \lambda_3, \alpha \ldots$ hyperparameters


  • Low-level: Keep changes to the inputs small
  • Goalset: End up close to a specified point $p^*$
  • Collision: Do not collide with obstacles
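
To make the three terms concrete, the following sketch evaluates the objective for a given set of controls and predicted positions. It is an illustration under stated assumptions: `prediction_cost` and `sphere_sdf` are hypothetical names, the positions $p_d$ are passed in directly rather than computed from the network, the obstacle is a single sphere, and the collision sum runs over consecutive position pairs because the last position has no successor.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def sq_norm(v: Sequence[float]) -> float:
    return sum(x * x for x in v)


def dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prediction_cost(
    deltas: Sequence[Sequence[float]],
    positions: Sequence[Point],
    goal: Point,
    sdf: Callable[[Point], float],
    lambdas: Sequence[float] = (1.0, 1.0, 1.0),
    alpha: float = 1.0,
) -> float:
    l1, l2, l3 = lambdas
    # Low-level term: keep the controls small.
    low_level = l1 * sum(sq_norm(d) for d in deltas)
    # Goalset term: squared distance of the final position to the goal p*.
    goalset = l2 * sq_norm([p - g for p, g in zip(positions[-1], goal)])
    # Collision term: exp(-alpha * SDF(p_d)) weighted by the step length Delta_H.
    collision = l3 * sum(
        math.exp(-alpha * sdf(p)) * dist(p_next, p)
        for p, p_next in zip(positions, positions[1:])
    )
    return low_level + goalset + collision


def sphere_sdf(p: Point, center: Point = (0.0, 0.0), radius: float = 1.0) -> float:
    # Signed distance to a sphere: negative inside, positive outside.
    return dist(p, center) - radius


path = [(2.0, 0.0), (3.0, 0.0)]
cost = prediction_cost([[0.0, 0.0], [0.0, 0.0]], path, goal=(3.0, 0.0), sdf=sphere_sdf)
```

Weighting the collision term by the step length $\Delta_H$ means a slow pass near an obstacle is not penalized more than a fast one simply because it contains more time steps.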

Combined Human and Robot Optimization

  • Idea: Optimize the human and robot trajectories at the same time
  • Allows planning for the robot while predicting human motion
  • Bidirectional information flow

  • Human-robot: The human prediction and a robot agent $x$ should not collide:
    $$ c_{\text{j}}(\delta, x) = \sum_{d=t+1}^{T} \exp \big\{ -\alpha \| p_d - x_d \| \big\} \Delta_H \Delta_R $$
    $p_d = \phi_d(f_\text{RNN}(\delta)) \ldots$ position at time $d$, with $\Delta_H = \|p_{d+1} - p_d\|$
    $\phi_d \ldots$ inverse kinematics map at time $d$
    $f_\text{RNN} \ldots$ recurrent neural network
    $x_d \ldots$ robot position at time $d$, with $\Delta_R = \|x_{d+1} - x_d\|$
    $\alpha \ldots$ hyperparameter
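
The joint term can be evaluated directly from two position sequences. The sketch below is illustrative only: `joint_cost` is a hypothetical name, both trajectories are assumed to be sampled at the same time steps, and the sum runs over consecutive pairs because the step lengths need a successor position.

```python
import math
from typing import Sequence

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def joint_cost(human: Sequence[Point], robot: Sequence[Point], alpha: float = 1.0) -> float:
    total = 0.0
    for d in range(len(human) - 1):
        step_h = dist(human[d + 1], human[d])  # Delta_H
        step_r = dist(robot[d + 1], robot[d])  # Delta_R
        # Proximity penalty decays exponentially with human-robot distance.
        total += math.exp(-alpha * dist(human[d], robot[d])) * step_h * step_r
    return total


human_path = [(0.0, 0.0), (1.0, 0.0)]
near_robot = [(0.0, 1.0), (0.0, 2.0)]
far_robot = [(0.0, 5.0), (0.0, 6.0)]

near = joint_cost(human_path, near_robot)
far = joint_cost(human_path, far_robot)
```

Since the cost depends on both $\delta$ (through $p_d$) and $x$, its gradient pushes the human prediction and the robot plan apart at the same time, which is the bidirectional information flow mentioned above.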

Optimization



  • Final loss is a weighted sum of the objectives

  • Compute gradients using automatic differentiation

  • We optimize using a gradient-based optimizer (limited-memory BFGS)
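
The optimization loop can be illustrated on a toy problem. Note the substitutions: the slides use automatic differentiation and limited-memory BFGS, whereas this dependency-free sketch swaps in central finite differences and fixed-step gradient descent. The 1-D "prediction" (a point moving at a constant nominal velocity), the constants, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence


def numerical_gradient(f: Callable[[Sequence[float]], float],
                       x: Sequence[float], eps: float = 1e-6) -> List[float]:
    # Central finite differences (stand-in for automatic differentiation).
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def gradient_descent(f, x0, step=0.05, iters=200):
    # Fixed-step gradient descent (stand-in for L-BFGS).
    x = list(x0)
    for _ in range(iters):
        g = numerical_gradient(f, x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


NOMINAL_VELOCITY = 1.0  # toy prediction: move +1 per step
STEPS = 4
GOAL = 6.0              # toy goal position p*


def final_position(delta):
    # Controls are added to the velocity at every step.
    return sum(NOMINAL_VELOCITY + d for d in delta)


def total_cost(delta, lam_low=0.1, lam_goal=1.0):
    # Weighted sum of the low-level and goalset terms.
    low_level = lam_low * sum(d * d for d in delta)
    goalset = lam_goal * (final_position(delta) - GOAL) ** 2
    return low_level + goalset


delta0 = [0.0] * STEPS
delta_opt = gradient_descent(total_cost, delta0)
```

The optimized controls move the final position from 4 toward the goal at 6 while the low-level weight keeps them from matching it exactly, which is the trade-off the weights $\lambda$ control.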

Experiments

Full-Body Motion Capture Setup and visualization of the scene
  • 1 actor, total of ~120 min
  • Data split into training and testing (9:1)
  • 3 data sets: 1) walking, 2) pick and place, 3) pick and place with chairs as obstacles
  • Purely kinematic model with joint angles

Results: Goal Constraint

Results: Obstacle Objective

Results: Quantitative


| Method | 0.5 s | 1 s | 1.5 s | 2 s |
|---|---|---|---|---|
| Zero velocity baseline | 2.45 | 5.50 | 9.49 | 11.74 |
| Initial prediction without optimization | 1.25 | 2.48 | 4.56 | 6.39 |
| Ours with goal constraint | 1.25 | 2.27 | 2.52 | 2.18 |
| Ours with goal and obstacle objective | 1.25 | 2.18 | 2.41 | 2.10 |

  • Average distance to ground truth over 22 reaching trajectories
  • Best performance with goal and obstacle optimization

Results: Joint Objective

Summary and Future Work


  • Presented novel prediction method for full-body motion

  • Possible to adapt prediction to environmental context

  • Combined human and robot trajectory optimization


  • More compact derivative computation

  • Object affordances (e.g. similar to Koppula and Saxena [1]) could be used to guide the optimizer
  1. H. S. Koppula and A. Saxena, "Anticipating human activities using object affordances for reactive robotic response" (2015)