Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning
- Dipendra Misra,
- Mikael Henaff,
- Akshay Krishnamurthy,
- John Langford
We present an algorithm, HOMER, for exploration and reinforcement learning in rich observation environments that are summarizable by an unknown latent state space. The algorithm interleaves representation learning to identify a new notion of kinematic state abstraction with strategic exploration to reach new states using the learned abstraction. The algorithm provably explores the environment with sample complexity scaling polynomially in the number of latent states and the time horizon, and, crucially, with no dependence on the size of the observation space, which could be infinitely large. This exploration guarantee further enables sample-efficient global policy optimization for any reward function. On the computational side, we show that the algorithm can be implemented efficiently whenever certain supervised learning problems are tractable. Empirically, we evaluate HOMER on a challenging exploration problem, where we show that the algorithm is exponentially more sample efficient than standard reinforcement learning baselines.
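The abstract describes an interleaved loop: learn a state abstraction, then use it to reach new states. The sketch below illustrates that loop's shape in Python under loose assumptions; the helper names (`collect_transitions`, `train_state_abstraction`, `train_reaching_policy`) are hypothetical placeholders supplied by the caller, not the authors' implementation of HOMER.

```python
# Hedged sketch of the interleaved loop the abstract describes.
# The helpers passed in are hypothetical placeholders, not HOMER itself.

def homer_style_loop(env, horizon, collect_transitions,
                     train_state_abstraction, train_reaching_policy):
    """Alternate representation learning with strategic exploration,
    building a cover of policies that reach every discovered latent state."""
    policy_cover = []  # policies reaching the latent states found so far
    for h in range(horizon):
        # Roll in with the current cover to gather observations at step h.
        transitions = collect_transitions(env, policy_cover, h)
        # Supervised representation learning: fit a state abstraction
        # (this is where the tractable supervised learning problems arise).
        abstraction = train_state_abstraction(transitions)
        # Strategic exploration: learn one reaching policy per abstract state.
        for state in abstraction.states():
            policy_cover.append(
                train_reaching_policy(env, abstraction, state, h))
    # The resulting cover enables policy optimization for any reward function.
    return policy_cover
```

Note that the loop's cost is driven by the number of abstract states and the horizon, never by the raw observation space, which is the source of the sample-complexity guarantee stated above.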
Foundations of Real-World Reinforcement Learning
Reinforcement learning (RL) is an approach to sequential decision making under uncertainty that formalizes the principles for designing an autonomous learning agent. The broad goal of an RL agent is to find an optimal policy, one that maximizes its cumulative reward over time. Its applications continue to grow as the technology matures, reaching into areas such as education, health, advertising, autonomous systems, and gaming.
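Written out, the objective this paragraph refers to is the standard episodic one (a textbook formulation, not something specific to this webinar): find a policy whose expected cumulative reward over a horizon $H$ is maximal.

```latex
% Standard episodic RL objective: actions a_h are drawn from the policy
% given the observation x_h, and rewards r_h are summed over the horizon H.
\pi^{\star} \in \arg\max_{\pi} \;
  \mathbb{E}\!\left[ \sum_{h=1}^{H} r_h \;\middle|\; a_h \sim \pi(\cdot \mid x_h) \right]
```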
By starting from the perspective of an agent that interacts with and affects its environment, RL goes beyond supervised learning in situations that require decisions rather than just predictions. In particular, it motivates exploratory actions to discover novel rewarding behavior in the environment, a hallmark of intelligent agents.
In this webinar, led by Microsoft researchers John Langford (Partner Research Manager with over a decade of experience in reinforcement learning research) and Alekh Agarwal (Principal Research Manager and leader of the Reinforcement Learning group in Redmond), learn how RL can be applied to real-world problems across a variety of domains.
Together, you’ll explore:
- The definition and uses of RL, from a general paradigm to its broad range of applications
- The various benefits of using RL as well as its current challenges
- Specific flavors of RL: contextual bandits, imitation learning, and strategic exploration (a minimal contextual-bandit sketch follows this list)
- Where these cutting-edge methods might take the future of RL
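To make the contextual-bandit item concrete, here is a minimal epsilon-greedy sketch in Python. The linear reward model, constants, and function names are illustrative assumptions for this page, not the methods presented in the webinar.

```python
import numpy as np

# Minimal epsilon-greedy contextual bandit (illustrative only).
# One linear reward estimator per action, updated online from
# observed (context, action, reward) feedback.

rng = np.random.default_rng(0)
n_actions, dim, epsilon, lr = 3, 5, 0.1, 0.05
weights = np.zeros((n_actions, dim))  # one weight vector per action

def choose(context):
    """Explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))     # exploratory action
    return int(np.argmax(weights @ context))    # greedy action

def update(context, action, reward):
    """Online least-squares step on the chosen action's estimator."""
    error = reward - weights[action] @ context
    weights[action] += lr * error * context

# Toy loop: random contexts; action 0 is secretly the best choice.
for _ in range(1000):
    x = rng.normal(size=dim)
    a = choose(x)
    r = float(a == 0) + 0.1 * rng.normal()
    update(x, a, r)
```

The key contrast with supervised learning is visible in `update`: only the chosen action's estimator receives feedback, which is why some exploration is necessary to learn about the alternatives.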
Resource list:
- Reinforcement learning for the real world with Dr. John Langford and Rafah Hosn (Podcast)
- Real World Reinforcement Learning (Project page)
- Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning (Publication)
- Provably efficient reinforcement learning with rich observations (Blog)
- ICML 2017 Tutorial on Real World Interactive Learning (Tutorial)
- Machine Learning (Theory) (John Langford’s blog)
- Vowpal Wabbit (Open source project)
- Reinforcement Learning (Career opportunities)
- Alekh Agarwal (Researcher profile)
- John Langford (Researcher profile)
*This on-demand webinar features a previously recorded Q&A session and open captioning.
This webinar originally aired on December 5, 2019.
Explore more Microsoft Research webinars: https://aka.ms/msrwebinars