We’ve trained an agent to achieve a high score of 74,500 on Montezuma’s Revenge from a single human demonstration, better than any previously published result. Our algorithm is simple: the agent plays a sequence of games starting from carefully selected states from the demonstration, and learns from them by optimizing the game score using PPO, the same reinforcement learning algorithm that underpins OpenAI Five.
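To make the idea of starting episodes from demonstration states concrete, here is a minimal sketch on a toy problem, not the actual training code. Everything in it is an assumption made for illustration: the chain environment, the names `play_episode`, `update_policy`, and `train_from_demo`, the success threshold, and the rule for picking start states (begin near the end of the demonstration and move earlier once the agent succeeds often enough). The policy update is a crude reward-weighted rule standing in for PPO, which is too long to reproduce here.

```python
import random

GOAL = 10      # toy chain: states 0..GOAL, reward only on reaching GOAL
HORIZON = 30   # maximum number of steps per episode

def play_episode(start, p_right, rng):
    """Roll out one episode from `start`; return (trajectory, score)."""
    state, trajectory = start, []
    for _ in range(HORIZON):
        # Sample an action: +1 (right) with probability p_right[state], else -1.
        action = 1 if rng.random() < p_right[state] else -1
        trajectory.append((state, action))
        state = max(0, state + action)
        if state == GOAL:
            return trajectory, 1.0   # sparse reward, like a game score
    return trajectory, 0.0

def update_policy(p_right, trajectory, score, lr=0.2):
    """Stand-in for a PPO update (assumption): reinforce rewarded actions."""
    if score == 0.0:
        return
    for state, action in trajectory:
        target = 1.0 if action == 1 else 0.0
        p_right[state] += lr * (target - p_right[state])

def train_from_demo(demo, iterations=200, batch=20, threshold=0.8, seed=0):
    """Train from start states taken from `demo`, moving earlier over time."""
    rng = random.Random(seed)
    p_right = [0.5] * (GOAL + 1)     # initial policy: move at random
    start_idx = len(demo) - 2        # begin one step before the demo's end
    history = [start_idx]
    for _ in range(iterations):
        wins = 0
        for _ in range(batch):
            traj, score = play_episode(demo[start_idx], p_right, rng)
            update_policy(p_right, traj, score)
            wins += score > 0
        # Hypothetical schedule: once the agent usually succeeds from this
        # start state, move the start one step earlier in the demonstration.
        if wins / batch >= threshold and start_idx > 0:
            start_idx -= 1
            history.append(start_idx)
    return start_idx, history, p_right

# A hypothetical single demonstration: the states a human visited, in order.
demo = list(range(GOAL + 1))
final_idx, history, p_right = train_from_demo(demo)
print("earliest demonstration index used as a start state:", final_idx)
```

The point of the sketch is that the agent never has to find the reward by random exploration from the very beginning. Each start state is chosen so that the remaining distance to a reward is short enough for ordinary score optimization to work.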

