Reinforcement studying with prediction-based rewards



We’ve advanced Random Community Distillation (RND), a prediction-based approach for encouraging reinforcement studying brokers to discover their environments via interest, which for the primary time exceeds reasonable human efficiency on Montezuma’s Revenge.


Leave a Comment

Your email address will not be published. Required fields are marked *