Accelerating Reinforcement Learning with Learned Skill Priors (SPiRL)
Authors: Karl Pertsch, Youngwoon Lee, Joseph J. Lim
Code: https://github.com/clvrai/spirl
Website: https://clvrai.github.io/spirl/
Motivation
Previous Work
- Extracting skills from unstructured data and leveraging them for downstream learning tasks
- Meta-learning approaches - learn online and require a pre-defined set of training tasks
- Learning skill embeddings
Contribution
Leveraging skill embeddings extracted from large offline datasets and learning a state-conditioned prior over them for efficient downstream reinforcement learning
Methodology
Learning the Embedding Space and the Prior over the embeddings
- The skill embedding space is learned with a variational autoencoder over $H$-step action sequences, with a unit-Gaussian latent prior $p(z) = \mathcal{N}(0, I)$
- A state-conditioned skill prior $p_a(z \mid s_t)$ is learned with a KL-divergence loss against the skill posterior (see the pre-training sketch below)
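A minimal PyTorch-style sketch of these two pre-training losses, assuming simple MLP networks and illustrative hyperparameters (the class names, layer sizes, and `beta` weight below are my own placeholders, not the official implementation):

```python
import torch
import torch.nn as nn
import torch.distributions as D

# Hypothetical sizes -- not taken from the paper or the official repo
H, ACT_DIM, STATE_DIM, Z_DIM = 10, 8, 60, 10


class SkillVAE(nn.Module):
    """Encodes an H-step action sequence into a skill embedding z and decodes it back."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(H * ACT_DIM, 128), nn.ReLU(), nn.Linear(128, 2 * Z_DIM))
        self.decoder = nn.Sequential(
            nn.Linear(Z_DIM, 128), nn.ReLU(), nn.Linear(128, H * ACT_DIM))

    def forward(self, actions):                       # actions: (B, H, ACT_DIM)
        mu, log_std = self.encoder(actions.flatten(1)).chunk(2, dim=-1)
        q = D.Normal(mu, log_std.exp())               # skill posterior q(z | a_0..a_{H-1})
        z = q.rsample()                               # reparameterized sample
        recon = self.decoder(z).view_as(actions)
        return q, recon


class SkillPrior(nn.Module):
    """State-conditioned prior over skill embeddings, p_a(z | s_t)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, 2 * Z_DIM))

    def forward(self, state):
        mu, log_std = self.net(state).chunk(2, dim=-1)
        return D.Normal(mu, log_std.exp())


def pretraining_losses(vae, skill_prior, states, actions, beta=5e-4):
    """states: (B, STATE_DIM) first state of each sequence; actions: (B, H, ACT_DIM)."""
    q, recon = vae(actions)
    # VAE loss: reconstruction + beta-weighted KL to the unit-Gaussian latent prior N(0, I)
    unit_gaussian = D.Normal(torch.zeros(Z_DIM), torch.ones(Z_DIM))
    vae_loss = (recon - actions).pow(2).mean() \
        + beta * D.kl_divergence(q, unit_gaussian).sum(-1).mean()
    # Skill-prior loss: KL between the *detached* posterior and p_a(z | s_t), so the
    # prior is regressed onto the posterior without affecting the skill encoder
    q_detached = D.Normal(q.loc.detach(), q.scale.detach())
    prior_loss = D.kl_divergence(q_detached, skill_prior(states)).sum(-1).mean()
    return vae_loss, prior_loss
```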
Using the prior over the skills for downstream reinforcement learning tasks
- Replacing the one-step reward with the summed $H$-step reward and single-step transitions with $H$-step transitions (see the wrapper sketch after this list)
- Learning the policy over skill embeddings instead of raw actions for model-free reinforcement learning: $\pi_\theta(z \mid s_t)$
- Integration with the maximum-entropy RL framework: the policy's entropy term is replaced by a KL divergence to the learned skill prior (see the actor-loss sketch below)
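To make the $H$-step abstraction concrete, here is a hedged sketch of a Gym-style wrapper (the class `SkillSpaceWrapper` and its interface are hypothetical, not part of the SPiRL codebase): the high-level agent emits a skill embedding $z$, a frozen skill decoder unrolls it into $H$ low-level actions, and the wrapper returns the summed reward and the state reached after $H$ steps.

```python
class SkillSpaceWrapper:
    """Hypothetical Gym-style wrapper: one step in skill space executes H primitive
    actions through the frozen skill decoder, returning the summed reward and s_{t+H}."""

    def __init__(self, env, skill_decoder, horizon=10):
        self.env = env
        self.decoder = skill_decoder   # frozen decoder mapping a skill embedding z to H actions
        self.horizon = horizon

    def reset(self):
        return self.env.reset()

    def step(self, z):
        actions = self.decoder(z)      # assumed shape (H, act_dim)
        total_reward, done, info = 0.0, False, {}
        for t in range(self.horizon):
            obs, reward, done, info = self.env.step(actions[t])
            total_reward += reward     # accumulate the H-step reward \tilde{r}
            if done:
                break
        # (s_t, z, \tilde{r}, s_{t+H}) is one transition for the high-level agent
        return obs, total_reward, done, info
```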
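And a minimal sketch of the resulting prior-regularized actor update, assuming Gaussian policy and prior heads (function and argument names such as `prior_regularized_actor_loss`, `critic`, and `alpha` are placeholders, not the repo's API):

```python
import torch
import torch.distributions as D


def prior_regularized_actor_loss(policy, critic, skill_prior, states, alpha):
    """SAC-style actor objective where the entropy bonus is replaced by
    KL(pi(z|s) || p_a(z|s)) to the learned, frozen skill prior."""
    pi = policy(states)                          # assumed to return a Normal over skill embeddings z
    z = pi.rsample()                             # reparameterized sample for the pathwise gradient
    q_value = critic(states, z).squeeze(-1)      # Q(s, z) from the high-level critic
    with torch.no_grad():
        prior = skill_prior(states)              # p_a(z | s), kept fixed during downstream RL
    kl = D.kl_divergence(pi, prior).sum(-1)
    # Maximize Q while staying close to the skill prior; the paper tunes alpha
    # automatically against a target divergence, which is omitted here
    return (alpha * kl - q_value).mean()
```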
Experiments
- Results are reported with Soft Actor-Critic (SAC) as the base RL algorithm
- Three environments:
  - Maze Navigation - D4RL
  - Block Stacking
  - Kitchen Environment - D4RL benchmark
- Performance comparison against baselines:
  - SPiRL - Skill-Prior Reinforcement Learning (the proposed method)
  - Flat Prior - single-action prior without temporal abstraction
  - SSP w/o Prior - learned skills without any prior
  - BC + SAC - a behavior cloning policy learned from the offline data and fine-tuned on the downstream RL task with SAC
  - SAC - Soft Actor-Critic
- Ablation studies: experiments over different skill horizons
Further work
- Experimenting with skill priors conditioned on information other than the current state; the paper reports results only for state-conditioned priors
- How does this approach scale to multi-task learning and continual learning agents?