By Bot
Jun 09'24

Exercise

Write a program that lets you compare the strategies play-the-winner and play-the-best-machine for the two-armed bandit problem of Example. Have your program determine the initial payoff probabilities for each machine by choosing a pair of random numbers between 0 and 1. Have your program carry out 20 plays and keep track of the number of wins for each of the two strategies. Finally, have your program make 1000 repetitions of the 20 plays and compute the average winnings per 20 plays for each strategy. Which strategy seems to be better? Repeat these simulations with 20 replaced by 100. Does your answer to the above question change?
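
One possible starting point is sketched below in Python. It is not a reference solution, and it rests on assumptions the exercise text does not spell out: play-the-winner is taken to mean "stay on the same machine after a win, switch after a loss", with the first machine picked at random; play-the-best-machine is taken to mean "play the machine with the larger estimate (wins + 1) / (plays + 2)", the posterior mean under a uniform prior, with ties broken at random; a fresh pair of payoff probabilities is drawn for every repetition; and each win counts as one unit of winnings. The function names and the `seed` parameter are hypothetical and chosen only for illustration.

```python
import random


def play_the_winner(p, n_plays, rng):
    """Assumed rule: stay on a machine after a win, switch after a loss."""
    machine = rng.randrange(2)  # assumption: first machine is chosen at random
    wins = 0
    for _ in range(n_plays):
        if rng.random() < p[machine]:
            wins += 1
        else:
            machine = 1 - machine  # a loss sends us to the other machine
    return wins


def play_the_best_machine(p, n_plays, rng):
    """Assumed rule: play the machine with the higher estimated payoff."""
    wins = [0, 0]
    plays = [0, 0]
    for _ in range(n_plays):
        # Posterior mean of the payoff probability under a uniform prior.
        est = [(wins[i] + 1) / (plays[i] + 2) for i in range(2)]
        if est[0] == est[1]:
            machine = rng.randrange(2)  # assumption: ties broken at random
        else:
            machine = 0 if est[0] > est[1] else 1
        plays[machine] += 1
        if rng.random() < p[machine]:
            wins[machine] += 1
    return sum(wins)


def compare(n_plays, n_reps=1000, seed=None):
    """Return the average wins per n_plays for (winner, best-machine)."""
    rng = random.Random(seed)
    total_winner = 0
    total_best = 0
    for _ in range(n_reps):
        # Assumption: new payoff probabilities for every repetition.
        p = (rng.random(), rng.random())
        total_winner += play_the_winner(p, n_plays, rng)
        total_best += play_the_best_machine(p, n_plays, rng)
    return total_winner / n_reps, total_best / n_reps


if __name__ == "__main__":
    for n in (20, 100):
        avg_winner, avg_best = compare(n)
        print(f"{n} plays: play-the-winner {avg_winner:.2f}, "
              f"play-the-best-machine {avg_best:.2f}")
```

Both strategies see the same payoff probabilities within a repetition but draw their own outcomes, so the two averages are noisy estimates; running the script several times, or raising `n_reps`, gives a sense of how stable the comparison is before answering the questions above.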