A Bayesian analysis of human decision-making on bandit problems
Steyvers, Lee, and Wagenmakers (2009) used Bayesian analysis to study individual differences in how people balance exploration and exploitation when solving bandit problems. In a bandit problem, the individual must choose among a set of alternatives that have different reward rates, which are unknown at the outset, and must try to maximize the total reward received over a fixed number of trials (Steyvers et al., 2009). Bandit problems therefore require the individual to approach the environment in two distinct ways, exploratory and exploitative. The individual must exploit the alternatives they are already familiar with and explore those they are less familiar with (Steyvers et al., 2009). Thus, striking a balance between exploitation and exploration is critical to effective decision-making in bandit problems.
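To make the task concrete, the following is a minimal sketch of one bandit problem, written for this review and not taken from the article. It uses an epsilon-greedy rule, a simple heuristic chosen here only for illustration and not the authors' Bayesian model. The function name, the parameters, and the 0.5 default estimate for untried alternatives are all hypothetical.

```python
import random


def run_bandit(reward_rates, n_trials, epsilon, seed=0):
    """Play one bandit problem with an epsilon-greedy rule.

    reward_rates: true success probability of each alternative (hidden
                  from the decision rule, used only to generate rewards).
    epsilon:      probability of exploring on any given trial.
    Returns the total reward collected over n_trials.
    """
    rng = random.Random(seed)
    n_arms = len(reward_rates)
    pulls = [0] * n_arms      # how often each alternative was chosen
    successes = [0] * n_arms  # how often each choice was rewarded
    total = 0

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: pick any alternative at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the best observed success rate so far.
            # Untried alternatives get 0.5 (an illustrative assumption).
            estimates = [
                successes[a] / pulls[a] if pulls[a] else 0.5
                for a in range(n_arms)
            ]
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1 if rng.random() < reward_rates[arm] else 0
        pulls[arm] += 1
        successes[arm] += reward
        total += reward

    return total
```

Calling `run_bandit([0.2, 0.5, 0.8], n_trials=100, epsilon=0.1)` plays 100 trials on three alternatives. Setting `epsilon` to 0 gives a purely exploitative player and setting it to 1 a purely exploratory one, which is one rough way to see the trade-off described above.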
Steyvers et al. (2009) used a Bayesian extension of an optimal decision-making model to characterize differences in human decision-making when reward rates differ across the environments an individual faces. The goal was to determine in which situations an individual would be more willing to make optimistic, as opposed to pessimistic, assumptions about reward rates. The sample comprised 451 participants, who completed a series of bandit problems as well as a battery of psychological tests. These tests provided psychometric measures of the participants' cognitive abilities, intelligence, and personality traits (Steyvers et al., 2009).
Over the course of the study, the authors found that participants learned more effective decision-making processes as they completed a larger number of problems. In other words, becoming more familiar with a given environment improved decision-making. Completing more problems allowed participants to form better assumptions about how to maximize rewards and minimize losses in different environments. The authors also report that participants were more likely to explore in environments with high reward rates and more likely to exploit in environments with limited rewards (Steyvers et al., 2009). Moreover, Steyvers et al. (2009) found that standard psychometric measures of intelligence were directly correlated with whether one tends to be exploratory or exploitative in decision-making.
Overall, because I am not very familiar with Bayesian analysis, I found the article a little hard to follow at times. The authors did, however, explain the decision-making variables that appear in the study's Bayesian equations. A more thorough explanation would certainly help the reader follow how the experiment and calculations were conducted. Most significantly, I found the study interesting in the way it used Bayesian analysis to examine bandit problems. Because Bayesian analysis updates the probability of an event as more evidence becomes available, it is well suited to analyzing bandit problems, which challenge the decision-maker to explore unfamiliar ground or exploit situations they have more direct experience with.
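As a rough illustration of the updating idea above, here is my own sketch, not code or equations from the article. It assumes each reward is either 0 or 1 and that beliefs about an alternative's reward rate are held as a Beta distribution, which is a standard textbook choice for this kind of problem. The function names and the example priors are hypothetical.

```python
def update_beta(alpha, beta, reward):
    """Return the Beta parameters after observing one 0/1 reward.

    alpha counts prior "successes" plus observed rewards;
    beta counts prior "failures" plus observed non-rewards.
    """
    if reward not in (0, 1):
        raise ValueError("reward must be 0 or 1")
    return alpha + reward, beta + (1 - reward)


def expected_rate(alpha, beta):
    """Mean of a Beta(alpha, beta) belief about the reward rate."""
    return alpha / (alpha + beta)


# Start from a uniform belief, Beta(1, 1), and observe four outcomes.
alpha, beta = 1, 1
for reward in [1, 1, 0, 1]:
    alpha, beta = update_beta(alpha, beta, reward)
# alpha is now 4 and beta is now 2: three rewards and one non-reward
# have been added to the prior counts.

# Illustrative "optimistic" and "pessimistic" starting assumptions
# (hypothetical values): Beta(2, 1) expects a reward rate of 2/3,
# while Beta(1, 2) expects 1/3.
optimistic_start = expected_rate(2, 1)
pessimistic_start = expected_rate(1, 2)
```

Each new observation shifts the expected reward rate a little, and the shift shrinks as evidence accumulates. This is why early choices in a bandit problem are the most informative and why the starting assumptions matter most when few trials have been played.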
I agree with the authors that the study would need to be expanded to determine whether more individuals approach decision-making in bandit situations in the manner observed here. To that end, it would be necessary to include more factors that affect the respondents' cognitive abilities. One factor that should be considered as a variable is learning. The authors found that continued testing allowed participants to choose the appropriate strategy, either exploitative or exploratory. I think it would be interesting to find out at what point over a series of tests respondents come to sense that they are making the right decision, or whether this type of decision-making is inherently present in our cognitive abilities without testing.
Steyvers, M., Lee, M. D., & Wagenmakers, E.-J. (2009). A Bayesian analysis of human decision-making on bandit problems. Journal of Mathematical Psychology, 53(3), 168-179. Retrieved from http://www.sciencedirect.com/science/article/pii/S0022249608001090.