Learning about Reinforcement Learning
It took me more than two months to understand how these 48 lines of code work. It’s a policy-based algorithm, which means that it does not try to predict future rewards and choose actions that would produce the highest rewards. Instead, REINFORCE learns to take an action based on the current state, aiming for an optimal policy. This policy is approximated using a neural network.
It’s been very fun to learn about neural nets and reinforcement learning algorithms. Code implemented along the way is available at ai-playground.