In this study, authors Megan E. Piper, Wei-Yin Loh, Stevens S. Smith, Sandra J. Japuntich, and Timothy B. Baker used decision tree analysis to identify how variables linked to smoking relapse interact with one another. The researchers then used this information to identify subgroups of smokers at higher risk of relapse and compared the results with those obtained through traditional methods.
The researchers entered 70 variables associated with smoking relapse, including predictors related to environmental factors, dependence, and smoking history, into a decision tree model. This type of model is designed to account for the combined effects of multiple factors. The more traditional approach, logistic regression analysis, is good at identifying factors that are influential across large groups, and it was run alongside the decision trees so that the two methods could be compared. Previous studies that depended solely on linear or logistic regression techniques were unable to identify smaller subgroups. Decision tree analysis allowed the researchers to divide the data set in order to look at the effects of certain factors on specific groups, such as men and women.
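To make the idea of dividing the data concrete, here is a minimal sketch of how a tree might choose its first split. It uses a CART-style Gini impurity criterion, which is an assumption made for illustration and not the variable selection method that GUIDE uses. The records, the field names, and the helpers `gini` and `best_binary_split` are all hypothetical and do not come from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 outcomes (0 = relapsed, 1 = abstinent)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_binary_split(rows, features, outcome):
    """Return (feature, weighted_impurity) for the yes/no feature whose
    two subgroups are purest with respect to the outcome."""
    best = None
    for feature in features:
        yes = [r[outcome] for r in rows if r[feature]]
        no = [r[outcome] for r in rows if not r[feature]]
        weighted = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
        if best is None or weighted < best[1]:
            best = (feature, weighted)
    return best


# Invented toy records for illustration only, not data from the study.
rows = [
    {"smokes_within_30_min": True,  "married": True,  "abstinent": 0},
    {"smokes_within_30_min": True,  "married": False, "abstinent": 0},
    {"smokes_within_30_min": True,  "married": True,  "abstinent": 0},
    {"smokes_within_30_min": False, "married": False, "abstinent": 1},
    {"smokes_within_30_min": False, "married": True,  "abstinent": 1},
    {"smokes_within_30_min": False, "married": False, "abstinent": 1},
]

feature, impurity = best_binary_split(
    rows, ["smokes_within_30_min", "married"], "abstinent"
)
print(feature, round(impurity, 3))
```

A real tree would repeat this search inside each resulting subgroup, which is how interactions between factors (for example, a factor that matters only for one gender) can surface where a single regression equation might miss them.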
This study used the GUIDE decision tree program to analyze data from 1,071 smokers. The researchers built models for three time points: one week postquit, end of treatment, and six months postquit. The figure below shows the “pruned” decision tree from the one-week postquit model, which begins with the question “How soon after you wake up do you smoke your first cigarette?” From the answers to this question, the tree splits further to show the abstinence rates of smokers who did or did not receive treatment, and of those who are or are not married. The study showed that people who smoke their first cigarette more than 30 minutes after waking are the most likely to quit smoking, and that treatment is especially important for those who smoke their first cigarette within 30 minutes of waking.
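The way such a tree is read can be sketched in a few lines: each path from the root to a leaf defines a subgroup, and the leaf reports that subgroup's abstinence rate. The sketch below assumes a two-level tree (time to first cigarette, then treatment) and uses invented participants. The field names and the helpers `abstinence_rate` and `leaf_rates` are hypothetical, and the resulting rates are not results from the study.

```python
def abstinence_rate(group):
    """Fraction of a group that was abstinent, or None for an empty group."""
    if not group:
        return None
    return sum(p["abstinent"] for p in group) / len(group)


def leaf_rates(participants):
    """Split on time to first cigarette, then on treatment, and return the
    abstinence rate in each of the four resulting leaves."""
    leaves = {}
    for early in (True, False):
        for treated in (True, False):
            group = [
                p for p in participants
                if p["first_cig_within_30_min"] == early
                and p["treated"] == treated
            ]
            leaves[(early, treated)] = abstinence_rate(group)
    return leaves


def make(early, treated, abstinent):
    """Build one invented participant record."""
    return {
        "first_cig_within_30_min": early,
        "treated": treated,
        "abstinent": abstinent,
    }


# Invented toy participants for illustration only.
participants = [
    make(True, True, 1), make(True, True, 0),
    make(True, False, 0), make(True, False, 0),
    make(False, True, 1), make(False, True, 1),
    make(False, False, 1), make(False, False, 0),
]

rates = leaf_rates(participants)
for (early, treated), rate in rates.items():
    print(f"first cigarette within 30 min={early}, treated={treated}: {rate}")
```

Laying the leaves out this way shows why a tree is easy to explain: every subgroup is described by a short chain of yes/no answers.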
The second figure shows how marital status, gender, and the age at which participants began smoking daily interacted to affect abstinence at the end of treatment. Household income, health status, and longest previous quit attempt were also identified as significant factors influencing successful smoking cessation. The results showed that it is important to consider environmental and contextual factors in addition to those related to dependence.
Although the decision tree analysis and the logistic regression technique were applied to the same data, they produced significantly different results. According to the authors, the decision tree model was better suited to handling a large number of factors; however, they were not able to determine which method is more accurate. Overall, decision tree analysis produced statistically significant results that subsequent studies will hopefully replicate.
This study is a very good example of how decision tree analysis can be applied in real-world situations to predict human behavior. One important characteristic of decision trees is that, although they are based on scientific calculations, they are easy to understand and could be shown to decision makers to help explain the results of an analysis. Although this case study uses quantitative data, it would be interesting to see how decision trees could be adapted for use with the qualitative data more commonly used in intelligence analysis.
Piper, M.E., Loh, W.-Y., Smith, S.S., Japuntich, S.J., & Baker, T.B. (2011). Using decision tree analysis to identify risk factors for relapse to smoking. Substance Use & Misuse, 46, 492-510. doi: 10.3109/10826081003682222