MixMax (MCTS)

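MCTS has been blamed for cowardly behavior: it often prefers a safe, certain option over a more promising but uncertain one. To counter this, [1] proposed MixMax, which scores a node by mixing its maximum and its average reward:

    MixMax = q · max_reward + (1 − q) · avg_reward,

where q ∈ [0, 1] sets the weight on the maximum. With q = 0 this is the plain average used by standard MCTS; with q = 1 only the best observed reward counts.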
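To make the effect concrete, here is a minimal Python sketch that plugs the MixMax value, q · max + (1 − q) · average, into a UCB1-style child selection step. It assumes each node tracks its visit count, reward sum and best reward seen. The Node class, the mixmax_value and select_child helpers, and the use of UCB1 with c = √2 for the exploration term are illustrative assumptions, not details taken from [1]. The usage at the bottom pits a "safe" child that always returns 0.5 against a "risky" child that usually returns 0 but occasionally 1.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node statistics; the field names are illustrative.
    visits: int = 0
    total_reward: float = 0.0
    max_reward: float = float("-inf")
    children: list = field(default_factory=list)

    def update(self, reward: float) -> None:
        """Backpropagate one rollout reward into this node."""
        self.visits += 1
        self.total_reward += reward
        self.max_reward = max(self.max_reward, reward)


def mixmax_value(node: Node, q: float) -> float:
    """Blend of the best and the mean reward seen through this node."""
    avg = node.total_reward / node.visits
    return q * node.max_reward + (1.0 - q) * avg


def select_child(parent: Node, q: float, c: float = math.sqrt(2)) -> Node:
    """UCB1 selection with the mean reward replaced by the MixMax value."""
    def score(child: Node) -> float:
        if child.visits == 0:
            return float("inf")  # visit every child at least once
        explore = c * math.sqrt(math.log(parent.visits) / child.visits)
        return mixmax_value(child, q) + explore

    return max(parent.children, key=score)


# Usage: equal visit counts, so only the exploitation term differs.
root, safe, risky = Node(), Node(), Node()
root.children = [safe, risky]
for r in [0.5, 0.5, 0.5, 0.5]:
    safe.update(r)
    root.update(r)
for r in [0.0, 0.0, 0.0, 1.0]:
    risky.update(r)
    root.update(r)

print(select_child(root, q=0.0) is safe)   # True: plain average, 0.5 > 0.25
print(select_child(root, q=0.5) is risky)  # True: MixMax, 0.625 > 0.5
```

With q = 0 the safe child wins on its higher average; raising q lets the rare high reward of the risky child pull it ahead, which is exactly the "less cowardly" behavior MixMax aims for. How large q should be is problem-dependent and is left as a free parameter here.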

Written on March 10, 2020; last updated on March 10, 2020.