Simplifying Decision Trees
Summary and Critique by Michael Pouch
Summary:
J.R. Quinlan presents techniques for simplifying decision trees while maintaining their predictive accuracy. The purpose of the study is to examine four methods that make induced decision trees more compact and intelligible, so that the knowledge they encode is easier to understand and apply.
The author begins by explaining a difficulty in building knowledge-based systems: the ability to perform a task at an expert level does not necessarily confer a corresponding ability to articulate that know-how. Consequently, the knowledge a “knowledge engineer” elicits from an expert tends to state the expert’s conclusions and reasoning only in broad terms. Quinlan therefore examines four methods for simplifying the decision trees that induction produces. On the whole, three of the methods involve pruning the decision tree by replacing one or more subtrees with leaves. The fourth method reformulates the decision tree as a set of production rules.
The first method that Quinlan introduces is cost-complexity pruning. After a large decision tree has been grown, each internal subtree is evaluated by trading off the errors it would introduce if collapsed to a single leaf against the number of leaves that collapse removes. Repeatedly pruning the subtree that scores best on this trade-off yields a sequence of progressively smaller trees, from which a final tree is selected according to its accuracy on separate data. The basic idea of cost-complexity pruning is not to consider all pruned subtrees, but only those that are the “best of their kind” in this sense.
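To make the trade-off concrete, here is a minimal sketch in Python of a single cost-complexity pruning step on a toy tree; the dictionary encoding and the error counts are invented for the illustration and are not Quinlan’s notation:

```python
# Toy tree encoding (invented for this sketch): an internal node is a
# dict with "left", "right", and "leaf_errors" (the errors it would
# make if collapsed to a leaf); a leaf is a dict with "errors".

def is_leaf(node):
    return "errors" in node

def leaves(node):
    if is_leaf(node):
        return [node]
    return leaves(node["left"]) + leaves(node["right"])

def subtree_error(node):
    return sum(leaf["errors"] for leaf in leaves(node))

def alpha(node):
    # Cost-complexity measure: extra error incurred by collapsing this
    # node, per leaf removed. The subtree with the smallest alpha is
    # the "best of its kind" candidate for pruning.
    return (node["leaf_errors"] - subtree_error(node)) / (len(leaves(node)) - 1)

def internal_nodes(node):
    if is_leaf(node):
        return []
    return [node] + internal_nodes(node["left"]) + internal_nodes(node["right"])

def prune_weakest(root):
    # Collapse the internal node with the smallest cost-complexity.
    target = min(internal_nodes(root), key=alpha)
    errors = target["leaf_errors"]
    target.clear()
    target["errors"] = errors

tree = {
    "leaf_errors": 10,
    "left": {"errors": 1},
    "right": {"leaf_errors": 5,
              "left": {"errors": 2},
              "right": {"errors": 3}},
}
prune_weakest(tree)  # collapses the right subtree, whose alpha is 0
```

Repeating this step until only the root remains generates the sequence of candidate trees from which the final tree is chosen.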
The second method introduced is reduced error pruning, which uses a separate set of test cases. Each internal node is a candidate for replacement by a single leaf; at each step, the node removed is the one whose replacement most improves (or least degrades) the tree’s accuracy on those cases. The pruning process continues until further pruning would be harmful.
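A greedy version of this loop can be sketched as follows; the node encoding and the validation error counts are assumptions made for the example:

```python
# Toy encoding (invented for this sketch): an internal node records
# "leaf_errors", the validation-set errors it would make if turned
# into a leaf; a leaf records "errors", its validation-set errors.

def is_leaf(node):
    return "errors" in node

def subtree_error(node):
    if is_leaf(node):
        return node["errors"]
    return subtree_error(node["left"]) + subtree_error(node["right"])

def internal_nodes(node):
    if is_leaf(node):
        return []
    return [node] + internal_nodes(node["left"]) + internal_nodes(node["right"])

def reduced_error_prune(root):
    # Repeatedly collapse the node whose removal most reduces (or at
    # least does not increase) validation error; stop as soon as every
    # remaining prune would be harmful.
    while True:
        candidates = internal_nodes(root)
        if not candidates:
            return
        best = min(candidates, key=lambda n: n["leaf_errors"] - subtree_error(n))
        if best["leaf_errors"] - subtree_error(best) > 0:
            return  # further pruning is harmful
        errors = best["leaf_errors"]
        best.clear()
        best["errors"] = errors

tree = {
    "leaf_errors": 10,
    "left": {"errors": 1},
    "right": {"leaf_errors": 5,
              "left": {"errors": 2},
              "right": {"errors": 3}},
}
reduced_error_prune(tree)  # prunes only the right subtree
```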
The third method is pessimistic pruning, which requires no separate test cases and only a single top-down pass over the tree. It compensates for the optimism of error counts measured on the training data by applying a continuity correction of one half per leaf: a subtree is replaced by a leaf when the leaf’s corrected error count falls within one standard error of the subtree’s corrected error count.
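One common formulation of pessimistic pruning, using training-set error counts with a one-half continuity correction per leaf, can be sketched like this; the tree encoding and all the numbers are invented for illustration:

```python
import math

# Toy encoding (invented for this sketch): internal nodes carry
# "samples" (training cases reaching the node), "leaf_errors"
# (training errors if collapsed), "left", and "right"; leaves carry
# "errors" (their training errors).

def is_leaf(node):
    return "errors" in node

def leaf_list(node):
    if is_leaf(node):
        return [node]
    return leaf_list(node["left"]) + leaf_list(node["right"])

def pessimistic_prune(node):
    # Single top-down pass: prune when the node's corrected error as a
    # leaf is within one standard error of the subtree's corrected
    # error (each leaf contributes a 1/2 continuity correction).
    if is_leaf(node):
        return
    subtree_leaves = leaf_list(node)
    corrected = sum(l["errors"] for l in subtree_leaves) + 0.5 * len(subtree_leaves)
    n = node["samples"]
    std_err = math.sqrt(corrected * (n - corrected) / n)
    if node["leaf_errors"] + 0.5 <= corrected + std_err:
        errors = node["leaf_errors"]
        node.clear()
        node["errors"] = errors
    else:
        pessimistic_prune(node["left"])
        pessimistic_prune(node["right"])

tree = {
    "samples": 100, "leaf_errors": 20,
    "left": {"errors": 2},
    "right": {"samples": 60, "leaf_errors": 4,
              "left": {"errors": 1},
              "right": {"errors": 2}},
}
pessimistic_prune(tree)  # keeps the root, collapses the right subtree
```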
The fourth method simplifies decision trees by recasting them as production rules. Each path from the root of a tree to a leaf corresponds to a rule whose left-hand side is the conjunction of the conditions tested along that path. A rule can then be generalized by dropping conditions whose removal does not reduce its accuracy on the training cases, and each generalized rule is assigned a certainty factor. To classify a case, you must find a rule that applies to it; if more than one applies, choose the rule with the higher certainty factor. Overall, this method is a condition-elimination strategy.
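The condition-elimination step can be sketched as a small search that drops any condition whose removal does not lower the rule’s accuracy on the training cases; the attribute names and cases below are hypothetical:

```python
# A rule is a list of (attribute, value) conditions; a case is a dict
# of attribute values plus a "label". (Toy encoding for the sketch.)

def rule_matches(rule, case):
    return all(case[attr] == value for attr, value in rule)

def rule_accuracy(rule, label, cases):
    covered = [c for c in cases if rule_matches(rule, c)]
    if not covered:
        return 0.0
    return sum(c["label"] == label for c in covered) / len(covered)

def generalize(rule, label, cases):
    # Drop conditions one at a time as long as accuracy on the
    # training cases does not fall.
    rule = list(rule)
    improved = True
    while improved:
        improved = False
        base = rule_accuracy(rule, label, cases)
        for cond in list(rule):
            trial = [c for c in rule if c != cond]
            if trial and rule_accuracy(trial, label, cases) >= base:
                rule = trial
                improved = True
                break
    return rule

cases = [
    {"outlook": "sunny", "windy": 0, "label": "play"},
    {"outlook": "rain",  "windy": 0, "label": "play"},
    {"outlook": "sunny", "windy": 1, "label": "stay"},
]
rule = [("outlook", "sunny"), ("windy", 0)]
simplified = generalize(rule, "play", cases)  # drops the irrelevant outlook test
```

The certainty factor attached to each generalized rule would then be estimated from how reliably the rule’s covered cases carry its class.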
Overall, J.R. Quinlan’s intention is to suggest methods for simplifying decision trees without compromising their accuracy.
Critique:
Overall, I find the first three methods easy to interpret and visualize. The pruning methods are helpful because they narrow the scope of the tree in a way that suits its overall validation. When the goal is to produce a sufficiently accurate, compact concept description, pruning is highly useful. However, the process involves a great deal of swapping of conditions. Decision trees perform well when a few highly relevant attributes exist, but less so when many complex interactions are present. Since most decision trees divide the instance space into mutually exclusive regions to represent a concept, in some cases a tree must contain several duplications of the same subtree to represent the classifier. To conclude, decision trees provide a framework for weighing the probabilities and payoffs of decisions and so help the analyst evaluate the likely course of events; however, it is impossible to plan for all contingencies in their outcomes. And although the process is simplified, the article offers no statistical backing that it actually works.
Reference:
Quinlan, J. R. (n.d.). Simplifying decision trees. Massachusetts Institute of Technology Artificial Intelligence Laboratory. Retrieved from https://dspace.mit.edu/bitstream/handle/1721.1/6453/AIM-930.pdf?sequence=2
Comments:
As a visualization technique, I believe that decision trees are much like mind maps in that they allow you to see how ideas are connected and categorized. However, as an analytical method, while it has some good applications, it must be used with caution. Much as in any analysis, one takes the low-valued information and either groups it or discards it. However, if used without care, especially in analytical work, pruning could remove information that affects both the words of estimative probability used and the analytic confidence.
The article does a good job breaking down the decision tree process without complexities. The pruning methods are very practical as described and help to streamline the process. In cost-complexity pruning, was there any mention in the article of how the “best of their kind” is determined? Any specific parameters?
I share Ian’s point. My concern is that some researchers may attempt to over-simplify variables within their decision trees. The reliability of decision trees depends on feeding in precise information at the outset. Efforts to simplify a decision tree through pruning can result in classification errors, and thus should be used with caution. Furthermore, although decision trees follow a natural course of events by tracing relevant relationships between them, it may not be possible to plan for all contingencies that may arise from a decision, and such oversight can lead to bad decisions.
I agree on all points here. This article seems to stress decision trees in a qualitative, thinking/mind-mapping sense, whereas these methods couldn’t really be applied in the study Jared posted about (the classification of Japanese names). I think it would be interesting to see this process applied to something like an ACH, where evidence can get out of control.
A decision tree is a visual model for decision making that represents consequences, including chance event outcomes, resource costs, and utility. It is also one way to display an algorithm that contains only conditional control statements.