exercise:C67e5a9a0b

From Stochiki

Latest revision as of 23:44, 13 June 2024

Consider the two-armed bandit problem of Example. Bruce Barnes proposed the following strategy, which is a variation on the play-the-best-machine strategy. The machine with the greater probability of winning is played unless both of the following conditions hold: (a) the difference between the two probabilities of winning is less than 0.08, and (b) the ratio of the number of plays on the more often played machine to the number of plays on the less often played machine is greater than 1.4. If both conditions hold, the machine with the smaller probability of winning is played. Write a program to simulate this strategy. Have your program choose the initial payoff probabilities at random from the unit interval [math][0,1][/math], make 20 plays, and keep track of the number of wins. Repeat this experiment 1000 times and obtain the average number of wins per 20 plays. Then implement a second strategy (for example, play-the-best-machine, or one of your own choice) and compare its average number of wins with that of Bruce's strategy.
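A minimal Python sketch of the simulation follows, with play-the-best-machine as the second strategy. It rests on several assumptions not stated in the exercise: a machine's "probability of winning" is estimated as (wins + 1)/(plays + 2), the posterior mean under a uniform prior on its unknown payoff probability; ties in the estimates go to machine 0; and in condition (b) a machine with no plays makes the ratio infinite (unless neither machine has been played yet). The function names `estimate`, `play_the_best`, `bruce_barnes`, `run_experiment` and `average_wins` are hypothetical.

```python
import random

# Hypothetical helper names throughout; the exercise does not prescribe an API.


def estimate(wins, plays):
    """Estimated probability that the next play on a machine wins.

    Assumption: a uniform prior on the unknown payoff probability, so after
    `wins` successes in `plays` plays the posterior mean is (wins + 1) / (plays + 2).
    """
    return (wins + 1) / (plays + 2)


def play_the_best(wins, plays):
    """Play the machine with the larger estimate (ties go to machine 0)."""
    return 0 if estimate(wins[0], plays[0]) >= estimate(wins[1], plays[1]) else 1


def bruce_barnes(wins, plays, diff_limit=0.08, ratio_limit=1.4):
    """Play the best machine unless both conditions (a) and (b) hold."""
    best = play_the_best(wins, plays)
    diff = abs(estimate(wins[0], plays[0]) - estimate(wins[1], plays[1]))
    more, less = max(plays), min(plays)
    if less > 0:
        ratio = more / less
    else:
        # Assumption: an unplayed machine makes the ratio infinite,
        # unless neither machine has been played yet (treated as ratio 1).
        ratio = float("inf") if more > 0 else 1.0
    if diff < diff_limit and ratio > ratio_limit:
        return 1 - best  # the machine with the smaller estimated probability
    return best


def run_experiment(strategy, rng, n_plays=20):
    """One experiment: draw true payoff probabilities, play n_plays times, count wins."""
    p = [rng.random(), rng.random()]  # true probabilities, uniform on [0, 1)
    wins, plays = [0, 0], [0, 0]
    total = 0
    for _ in range(n_plays):
        m = strategy(wins, plays)
        won = rng.random() < p[m]
        plays[m] += 1
        wins[m] += won
        total += won
    return total


def average_wins(strategy, trials=1000, n_plays=20, seed=None):
    """Average number of wins per n_plays over `trials` independent experiments."""
    rng = random.Random(seed)
    return sum(run_experiment(strategy, rng, n_plays) for _ in range(trials)) / trials


if __name__ == "__main__":
    for name, strategy in [("Bruce Barnes", bruce_barnes), ("play-the-best", play_the_best)]:
        print(f"{name}: {average_wins(strategy, seed=1):.3f} average wins per 20 plays")
```

Two reference points help when reading the averages: choosing a machine at random wins 10 times per 20 plays on average (each payoff probability has mean 1/2), while a player who always knew the better machine would average 20 · 2/3 ≈ 13.3 wins, since the larger of two independent uniform numbers has mean 2/3. Because each average is itself a noisy estimate from 1000 experiments, rerunning with several seeds indicates whether an observed difference between the strategies is meaningful.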