Introduction:
Juan de Oña, Griselda López, and Joaquín Abellán’s article Extracting Decision Rules from Police
Accident Reports through Decision Trees explores the use of decision trees
to identify the main factors that contribute to the severity of road accidents.
The authors’ goal is to find ways to extract decision rules from the decision
tree methodology that can be used by road safety analysts to address specific
problems that contribute to severe car accidents in order to prevent crashes
and in turn fatalities.
Summary:
According to de Oña et al. (2013), “the
term decision trees (DTs) encompasses a series of techniques for extracting
processable knowledge, implicit in databases, which is based on artificial
intelligence and statistical analysis” (p. 1151). DTs are easy to interpret
given their presentation as a graphical hierarchical structure. DTs are
particularly useful in studying traffic accidents because the technique allows
for the extraction of decision rules of the “if-then” type and also helps observers
to understand the events leading up to a crash and the variables that determine
the severity of the accident.
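To make the idea of "if-then" rule extraction concrete, here is a minimal sketch that walks each root-to-leaf path of a small tree and prints it as a rule. The `Node` class, the `extract_rules` and `format_rule` helpers, and the toy tree (its attribute names, values, and leaf labels) are all hypothetical illustrations; they are not taken from the article or its data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # A leaf carries a class label; an internal node carries the attribute
    # it splits on and a mapping from attribute value to child node.
    label: Optional[str] = None
    attribute: Optional[str] = None
    children: Optional[dict] = None


def extract_rules(node, conditions=()):
    """Return one (antecedent, consequent) pair per root-to-leaf path."""
    if node.children is None:
        return [(list(conditions), node.label)]
    rules = []
    for value, child in node.children.items():
        rules.extend(extract_rules(child, conditions + ((node.attribute, value),)))
    return rules


def format_rule(antecedent, consequent):
    """Render a rule in the X -> Y form as readable text."""
    tests = " AND ".join(f"{attr} = {val}" for attr, val in antecedent)
    return f"IF {tests} THEN severity = {consequent}"


# Hypothetical toy tree: the splits and labels are made up for illustration.
toy_tree = Node(attribute="lighting", children={
    "insufficient": Node(attribute="accident_type", children={
        "rollover": Node(label="KSI"),
        "other": Node(label="SI"),
    }),
    "daylight": Node(label="SI"),
})

rules = extract_rules(toy_tree)
for antecedent, consequent in rules:
    print(format_rule(antecedent, consequent))
```

Each leaf yields exactly one rule, so a tree with more leaves produces more candidate rules for an analyst to review.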
The authors seek results that can
be used for predictive purposes by assessing different algorithms used to build
DTs including Classification and Regression Trees (CART), ID3, and C4.5 and
taking the results from the best method. Their assessment is based on four
quantifiable evaluations: accuracy, sensitivity, specificity, and receiver
operating characteristic curve area. They transform the DT structure into
if-then or X→Y
rules where X is a set of statuses of
several attribute variables, such as accident type and atmospheric condition,
and Y is the status of the class
variable, or the severity of the accident. The authors consider 19 attribute
variables related to the driver, road, vehicle, and context, along with a
class variable with two states: slightly injured (SI) and killed or seriously injured (KSI).
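The four evaluation measures named above can be sketched with their standard textbook definitions. This is an assumption-laden illustration, not the authors' code: treating KSI as the positive class, the 0.5 cut-off, and the six labels and scores below are all invented for the example. The ROC area is computed with the pairwise-ranking (Mann-Whitney) formulation.

```python
def confusion_counts(y_true, y_pred, positive="KSI"):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    # Share of actual positives that the model flags as positive.
    return tp / (tp + fn)


def specificity(tn, fp):
    # Share of actual negatives that the model flags as negative.
    return tn / (tn + fp)


def roc_auc(y_true, scores, positive="KSI"):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted KSI scores, purely for illustration.
y_true = ["KSI", "KSI", "SI", "SI", "SI", "KSI"]
scores = [0.9, 0.4, 0.3, 0.6, 0.2, 0.7]
y_pred = ["KSI" if s >= 0.5 else "SI" for s in scores]  # assumed 0.5 cut-off

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
auc = roc_auc(y_true, scores)
print(acc, sens, spec, auc)
```

Reporting sensitivity and specificity alongside accuracy matters when one class is much rarer than the other, since a model can score high accuracy while missing most of the rare cases.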
The authors find that ID3 gives
the worst results, while the difference in quality between results from CART
and C4.5 is insignificant. For this reason, the authors pinpoint decision rules
resulting from the latter two methods. The two models produced significantly
different quantities of nodes, 19 and 52 respectively. Through these nodes, the
authors identify a combined total of 15 decision rules that meet minimum
standards in terms of population, support, and probability. The authors also
rank the importance of the variables in each model; 11 of the variables
identified as important are common to both models.
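Filtering rules by population, support, and probability might look like the sketch below. The definitions are one common formulation and are an assumption here: population is the share of all records that reach a leaf, support is the share of all records that reach the leaf and also have the predicted class, and probability is support divided by population. The `LeafRule` class, the counts, and the threshold values are hypothetical placeholders, not figures from the article.

```python
from dataclasses import dataclass


@dataclass
class LeafRule:
    description: str
    n_leaf: int     # records that satisfy the rule's antecedent
    n_correct: int  # of those, records whose class matches the consequent


def rule_measures(rule, n_total):
    """Return (population, support, probability) for one rule."""
    population = rule.n_leaf / n_total
    support = rule.n_correct / n_total
    probability = rule.n_correct / rule.n_leaf
    return population, support, probability


def select_rules(rules, n_total, min_population, min_support, min_probability):
    """Keep only the rules that meet all three minimum thresholds."""
    kept = []
    for rule in rules:
        population, support, probability = rule_measures(rule, n_total)
        if (population >= min_population
                and support >= min_support
                and probability >= min_probability):
            kept.append(rule)
    return kept


# Hypothetical counts for three candidate rules in a dataset of 1,000 records.
candidates = [
    LeafRule("rule A", n_leaf=50, n_correct=40),   # passes all three checks
    LeafRule("rule B", n_leaf=5, n_correct=5),     # too few records in the leaf
    LeafRule("rule C", n_leaf=100, n_correct=45),  # consequent holds too rarely
]

# Placeholder thresholds chosen only to exercise the filter.
kept = select_rules(candidates, n_total=1000,
                    min_population=0.01, min_support=0.006, min_probability=0.6)
print([rule.description for rule in kept])
```

Thresholds like these explain how a tree with many leaves can yield a much shorter list of usable rules: leaves that cover too few accidents, or that predict severity unreliably, are dropped.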
Figure 2: Example of same variables in top nodes of
decision tree built with C4.5. This model creates individual nodes for each
variable, regardless of significance.
Conclusion and Critique:
The use of decision trees, particularly
different models of decision trees, allows researchers to identify patterns in
the accidents that occurred in Granada, Spain over a 7-year period and
determine interactions of variables. Authorities can use these results to
prioritize efforts to prevent severe crashes, especially given that most of the
rules extracted from the DTs coincide with conventional problems found in the rural
highways of developed countries. Specifically, the authors advise experts to
address the question of why women, unlike men, face a significantly higher
risk of a severe accident when driving in insufficient light.
While I agree that the decision rules can be used by authorities, I feel that the authors under-emphasize the utility of doing so. Despite having statistical data to support their findings, the authors did not attempt to estimate the number of severe accidents that could be prevented or the number of variables that could feasibly be addressed by road safety experts. Additionally, the authors explain that the different models of DTs allow for pruning to present simple results, but they do not describe the pruning process or explain why they pruned their decision trees the way they did. Such an explanation would be a valuable addition to the research, particularly for readers with little familiarity with the methodology.
The methodology as a whole is useful for this type of research, and it corroborates several factors that previous studies identified as contributing to accident severity. The various models of DTs allow for application to a range of situations, though it is important that researchers assess which model is most appropriate for their research before drawing conclusions from the resulting decision rules.
Source:
De Oña, J., López, G., and Abellán, J. (2013). Extracting
Decision Rules from Police Accident Reports through Decision Trees. Accident Analysis & Prevention, 50,
1151-1160. Retrieved from http://www.sciencedirect.com/science/article/pii/S0001457512003132
Will you provide your critique later?
My mistake. An edit of the post has been made to include a critique in the conclusion of the post.
I had no idea that there were multiple types of decision trees. I see that the CART model combines variables and C4.5 lists variables individually. It is interesting that they produced results with similar quality even though the number of nodes that they utilized was so different. Since the authors identified 15 important variables and CART lists 19 (compared to 52), I would assume that CART is the more useful one since it appears to have arrived at the same conclusion as the more complicated model.
Ethan, they discuss the existence of even more models of DTs, but only test these three as the authors felt they were the most common and applicable for this type of research. While I tend to agree that the conciseness of the CART model makes it more appealing, C4.5's extensive nodes give analysts more opportunity to decide for themselves what is key and what to explore further.