Sunday, March 10, 2013

Use of Decision Trees in Preventing Car Accidents

Juan de Oña, Griselda López, and Joaquín Abellán’s article “Extracting Decision Rules from Police Accident Reports through Decision Trees” explores how decision trees can identify the main factors that contribute to the severity of road accidents. The authors aim to extract decision rules from decision trees that road safety analysts can use to address the specific problems behind severe car accidents, with the goal of preventing crashes and, in turn, fatalities.

According to de Oña et al. (2013), “the term decision trees (DTs) encompasses a series of techniques for extracting processable knowledge, implicit in databases, which is based on artificial intelligence and statistical analysis” (p. 1151). DTs are easy to interpret because they are presented as a graphical, hierarchical structure. They are particularly useful for studying traffic accidents because they yield decision rules of the “if-then” type and help observers understand the events leading up to a crash and the variables that determine its severity.
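To make the “if-then” idea concrete, here is a minimal Python sketch, assuming a toy tree stored as nested dictionaries. The tree and its attribute names (`lighting`, `sex`) are invented for illustration and are not the authors’ model; the point is only that every path from the root to a leaf reads as one rule.

```python
# Illustrative sketch: each root-to-leaf path in a decision tree becomes one
# "if-then" rule. The tree below is hypothetical, not taken from the paper.
from typing import Union

# A leaf is a class label ("SI" or "KSI"); an internal node is a dict holding
# the attribute it splits on and one subtree per attribute value.
Tree = Union[str, dict]
Rule = tuple[tuple[tuple[str, str], ...], str]

toy_tree: Tree = {
    "attribute": "lighting",
    "branches": {
        "daylight": "SI",
        "insufficient": {
            "attribute": "sex",
            "branches": {"male": "SI", "female": "KSI"},
        },
    },
}


def extract_rules(tree: Tree, conditions: tuple = ()) -> list[Rule]:
    """Depth-first walk that returns one (conditions, class) pair per leaf."""
    if isinstance(tree, str):  # leaf reached: the accumulated path is the "if" part
        return [(conditions, tree)]
    rules: list[Rule] = []
    for value, subtree in tree["branches"].items():
        step = (tree["attribute"], value)
        rules.extend(extract_rules(subtree, conditions + (step,)))
    return rules


def format_rule(conditions: tuple, label: str) -> str:
    """Render a rule as 'IF a = x AND b = y THEN severity = label'."""
    antecedent = " AND ".join(f"{attr} = {val}" for attr, val in conditions)
    return f"IF {antecedent} THEN severity = {label}"


for conds, label in extract_rules(toy_tree):
    print(format_rule(conds, label))
```

Run as written, it prints three rules, the last being `IF lighting = insufficient AND sex = female THEN severity = KSI`.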

The authors seek results that can be used for predictive purposes, so they assess several algorithms for building DTs, namely Classification and Regression Trees (CART), ID3, and C4.5, and carry forward the results of the best-performing methods. Their assessment rests on four quantitative measures: accuracy, sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve. They then transform the DT structure into if-then, or X→Y, rules, where X is a set of states of several attribute variables, such as accident type and atmospheric conditions, and Y is the state of the class variable, namely the severity of the accident. The authors consider 19 attribute variables related to the driver, road, vehicle, and context, along with a class variable that takes two states: slightly injured (SI) and killed or seriously injured (KSI).
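The four measures can be computed from a model’s predictions as in the Python sketch below. It assumes KSI is treated as the positive class and that the tree outputs a score for KSI (for example, the share of KSI cases in a leaf); the labels and scores are made up, not taken from the paper. The ROC area uses the rank-based formulation rather than tracing the curve point by point, which yields the same value.

```python
# Sketch of the four evaluation measures named in the post, treating KSI as
# the "positive" class (an assumption made for this example).

def confusion_counts(y_true, y_pred, positive="KSI"):
    """Return (TP, TN, FP, FN) for a two-class problem."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p != positive:
            tn += 1
        elif p == positive:  # predicted positive, actually negative
            fp += 1
        else:  # predicted negative, actually positive
            fn += 1
    return tp, tn, fp, fn


def roc_auc(y_true, scores, positive="KSI"):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the chance that a random positive case outscores a random negative one,
    with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, positive="KSI", negative="SI", threshold=0.5):
    """Threshold the scores into labels, then compute all four measures."""
    y_pred = [positive if s >= threshold else negative for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of KSI cases caught
        "specificity": tn / (tn + fp),  # share of SI cases correctly left out
        "roc_auc": roc_auc(y_true, scores, positive),
    }


# Hypothetical labels and predicted P(KSI) scores, not data from the paper.
y_true = ["KSI", "KSI", "SI", "SI", "SI", "KSI"]
scores = [0.9, 0.4, 0.35, 0.2, 0.7, 0.8]
print(evaluate(y_true, scores))
```

On these six made-up cases, accuracy, sensitivity, and specificity all come out at 2/3, and the ROC area is 8/9.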

Figure 1: Example of the top nodes of a decision tree built with CART. For simplicity, this model combines the values Female and Unknown into a single branch.

The authors find that ID3 gives the worst results, while the difference in quality between CART and C4.5 is insignificant. For this reason, they extract decision rules from these two methods. The two models produced very different numbers of nodes: 19 for CART and 52 for C4.5. From these nodes, the authors identify a combined total of 15 decision rules that meet minimum thresholds for population, support, and probability. The authors also rank the importance of the variables in each model, and the two models agree on the importance of 11 variables.
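The screening of rules by population, support, and probability can be sketched in Python as below. The definitions used here are assumptions in the usual association-rule style (population as the share of records matching X, support as the share matching both X and Y, probability as P(Y | X)), and the records and threshold values are hypothetical; the paper’s exact definitions and cut-offs may differ.

```python
# Sketch of the rule-screening step. The definitions of population, support
# and probability below are assumptions (common association-rule style), and
# the thresholds are hypothetical; the paper's exact values may differ.

def rule_stats(records, antecedent, consequent, class_attr="severity"):
    """Return (population, support, probability) for the rule X -> Y.

    population  = share of all records that match X
    support     = share of all records that match both X and Y
    probability = share of records matching X that also match Y, i.e. P(Y | X)
    """
    covered = [r for r in records
               if all(r.get(attr) == value for attr, value in antecedent.items())]
    hits = [r for r in covered if r[class_attr] == consequent]
    n = len(records)
    population = len(covered) / n
    support = len(hits) / n
    probability = len(hits) / len(covered) if covered else 0.0
    return population, support, probability


def keep_rule(stats, min_population, min_support, min_probability):
    """A rule is kept only if it clears all three minimum thresholds."""
    population, support, probability = stats
    return (population >= min_population
            and support >= min_support
            and probability >= min_probability)


# Hypothetical accident records, invented for illustration.
records = [
    {"lighting": "insufficient", "sex": "female", "severity": "KSI"},
    {"lighting": "insufficient", "sex": "female", "severity": "KSI"},
    {"lighting": "insufficient", "sex": "female", "severity": "SI"},
    {"lighting": "insufficient", "sex": "male", "severity": "SI"},
    {"lighting": "daylight", "sex": "female", "severity": "SI"},
    {"lighting": "daylight", "sex": "male", "severity": "SI"},
    {"lighting": "daylight", "sex": "male", "severity": "KSI"},
    {"lighting": "daylight", "sex": "male", "severity": "SI"},
]

stats = rule_stats(records, {"lighting": "insufficient", "sex": "female"}, "KSI")
print(stats, keep_rule(stats, min_population=0.1, min_support=0.05, min_probability=0.6))
```

With these eight made-up records, the example rule covers 3 of 8 cases (population 0.375), 2 of which are KSI (support 0.25, probability 2/3), so it passes the hypothetical thresholds.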

Figure 2: Example of the same variables in the top nodes of a decision tree built with C4.5. This model creates a separate branch for each value, regardless of significance.

Conclusion and Critique:
The use of decision trees, particularly several different DT models, allows researchers to identify patterns in the accidents that occurred in Granada, Spain, over a seven-year period and to determine how variables interact. Authorities can use these results to prioritize efforts to prevent severe crashes, especially since most of the rules extracted from the DTs coincide with well-known problems on the rural highways of developed countries. Specifically, the authors advise experts to investigate why women, as opposed to men, face a significantly higher risk of severe injury when driving in insufficient light.

While I agree that authorities can use the decision rules, I feel that the authors underemphasize the value of doing so. Despite having statistical data to support their findings, the authors did not attempt to estimate the number of severe accidents that could be prevented or how many of the variables road safety experts could feasibly address. Additionally, the authors explain how the different DT models allow for pruning to produce simpler results; however, they do not describe the pruning process or explain why they prune their decision trees the way they do. This would be a valuable addition to the research, particularly for readers with little familiarity with the methodology.

The methodology as a whole is well suited to this type of research, and it provides supporting evidence for several factors that previous studies have identified as contributing to accident severity. The variety of DT models allows them to be applied to many different situations, though researchers should assess which model is most appropriate for their research before drawing conclusions from the resulting decision rules.

De Oña, J., López, G., & Abellán, J. (2013). Extracting Decision Rules from Police Accident Reports through Decision Trees. Accident Analysis & Prevention, 50, 1151–1160. Retrieved from


  1. Will you provide your critique later?

  2. My mistake. I have edited the post to include a critique in the conclusion.

  3. I had no idea that there were multiple types of decision trees. I see that the CART model combines variables and C4.5 lists variables individually. It is interesting that they produced results of similar quality even though they used such different numbers of nodes. Given that the authors identified 15 important variables and CART has 19 nodes (compared to 52 for C4.5), I would assume that CART is the more useful one, since it appears to have arrived at the same conclusion as the more complicated model.

  4. Ethan, the authors discuss even more DT models, but they test only these three because they felt these were the most common and applicable for this type of research. While I tend to agree that the conciseness of the CART model makes it more appealing, C4.5's more extensive set of nodes gives analysts more opportunity to decide for themselves what is key and what to explore further.
