How to build your own AlphaZero
With this, AlphaZero was born — the general algorithm for getting good at something, quickly, without any prior knowledge of human expert strategy. - Applied Data Science
see also
Ref
- AlphaGo Zero - How and Why it Works
- Mastering the game of Go without human knowledge
- AlphaGo Zero cheat sheet
Learning Algorithm summary
- Mentally play through possible future scenarios, giving priority to promising paths, whilst also considering how others are most likely to react to your actions and continuing to explore the unknown.
- After reaching a state that is unfamiliar, evaluate how favourable you believe the position to be and cascade the score back through previous positions in the mental pathway that led to this point.
- After you’ve finished thinking about future possibilities, take the action that you’ve explored the most.
- At the end of the game, go back and evaluate where you misjudged the value of the future positions and update your understanding accordingly.
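The four steps above can be sketched in a few dozen lines of Python. This is only an illustration under stated assumptions, not the code from the references: the game is a toy Nim variant (take 1 to 3 stones, whoever takes the last stone wins), `evaluate` is a placeholder for the neural network that returns uniform priors and a neutral value, `C_PUCT = 1.5` is an arbitrary choice, and the selection score uses `sqrt(total + 1)` so the exploration term is non-zero on the first visit. All function and class names are hypothetical.

```python
import math

C_PUCT = 1.5  # exploration constant; an assumed value, not a tuned one


class Node:
    """Search statistics for one position, from the view of the player to move."""

    def __init__(self, prior):
        self.prior = prior      # evaluator's prior for the move leading here
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}      # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]


def evaluate(stones):
    """Placeholder for the neural network: uniform priors, neutral value."""
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}, 0.0


def select_child(node):
    total = sum(child.visits for child in node.children.values())

    def score(item):
        child = item[1]
        explore = C_PUCT * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return -child.q() + explore  # child.q() is the opponent's view, so negate

    return max(node.children.items(), key=score)


def simulate(root, stones):
    path, node = [root], root
    while node.children:                  # step 1: follow promising paths
        move, node = select_child(node)
        stones -= move
        path.append(node)
    if stones == 0:
        value = -1.0                      # player to move faces an empty pile: lost
    else:
        priors, value = evaluate(stones)  # step 2: evaluate the unfamiliar state
        node.children = {m: Node(p) for m, p in priors.items()}
    for visited in reversed(path):        # step 2: cascade the score back up
        visited.visits += 1
        visited.value_sum += value
        value = -value                    # flip perspective at every ply


def search(stones, n_simulations):
    root = Node(prior=1.0)
    for _ in range(n_simulations):
        simulate(root, stones)
    return root


def best_move(stones, n_simulations=2000):
    root = search(stones, n_simulations)
    # step 3: take the action that was explored the most
    return max(root.children, key=lambda m: root.children[m].visits)


def self_play(stones, n_simulations=200):
    """Play one game and return (state, visit distribution, outcome) examples."""
    history = []
    while stones > 0:
        root = search(stones, n_simulations)
        total = sum(child.visits for child in root.children.values())
        pi = {m: child.visits / total for m, child in root.children.items()}
        history.append((stones, pi))
        stones -= max(pi, key=pi.get)
    # step 4: label every position with the final result, seen from the
    # player to move there; the last mover took the final stone and won.
    examples, z = [], 1.0
    for state, pi in reversed(history):
        examples.append((state, pi, z))
        z = -z
    return examples[::-1]
```

Calling `best_move(5)` runs the search from a pile of five stones, and `self_play(5)` returns the examples a network would be trained on. The actual update of the network (the "update your understanding" part of step 4) is left out here, since it depends on the model and framework used.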
Code
see also AlphaGo Zero cheat sheet
- Jupyter notebook for DeepReinforcementLearning (github)