This paper presents a study in Distributed Deep Reinforcement Learning (DDRL) focused on the scalability of a state-of-the-art deep reinforcement learning algorithm known as Batch Asynchronous Advantage Actor-Critic (BA3C). By making training synchronous at the node level (while keeping the local, single-node part of the algorithm asynchronous) and minimizing the memory footprint of the model, the authors achieved linear scaling for up to 64 CPU nodes. This corresponds to a training time of 21 minutes on 768 CPU cores, compared with 10 hours for a baseline single-node implementation running on 24 cores.
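The node-level synchronization described above can be illustrated with a minimal sketch. This is not the authors' implementation; it is a toy, pure-Python illustration of synchronous data-parallel training, where each node produces a local gradient (in BA3C, from its own asynchronous actors) and all gradients are averaged before every parameter update, keeping the replicas in lockstep. The function name `synchronous_step` and the toy numbers are assumptions for illustration only.

```python
def synchronous_step(params, local_gradients, lr=0.01):
    """One synchronous update: average the gradients reported by all
    nodes, then apply a single SGD step to the shared parameters."""
    n = len(local_gradients)
    # Element-wise average across nodes (an all-reduce in a real cluster).
    avg = [sum(g[i] for g in local_gradients) / n
           for i in range(len(params))]
    # Every node applies the same averaged update, so replicas stay identical.
    return [p - lr * g for p, g in zip(params, avg)]

# Toy usage: 4 simulated nodes training a 2-parameter model.
params = [1.0, -2.0]
grads = [[0.1, -0.2], [0.3, 0.0], [0.1, -0.2], [0.3, 0.0]]
params = synchronous_step(params, grads)
```

The synchronous barrier trades a little waiting time per step for exact consistency across nodes, which is what makes the near-linear scaling to 64 nodes possible without the gradient staleness of fully asynchronous schemes.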
Distributed Deep Reinforcement Learning: learn how to play Atari games in 21 minutes
By Quantilus | January 3rd, 2018 | AI, NLP, Machine Learning