Thursday, September 14, 2017

Simplifying Decision Trees

Simplifying Decision Trees

Summary and Critique by Michael Pouch

J.R. Quinlan offers different techniques to help simplify the making of decision trees while at the same time maintaining their accuracy. The purpose of the study is to simplify decision trees by using four methods that help prove their fluency and the ability to apply the knowledge.

The author begins to explain that in order to build knowledge based systems, the ability to function at an expert level in some task does not necessarily confer a corresponding ability to articulate this know-how. Therefore, all this knowledge that the “knowledge engineer” produced, tends to state the engineer’s conclusions and reasoning in broad terms for the analysis. Thus, Quinlan examines four methods to help simplify the method going into creating decision trees. On the whole, three of the methods that Quinlan examines involve pruning the decision tree by replacing one or two subtrees with leaves. The fourth method that Quinlan examines reformulates the decision trees as a set of production rules.

The first method that Quinlan introduces is the method of Cost-Complexity Pruning.  After creating a large decision tree, we look at the subtrees that have one to zero nodes. To prune a tree, we look at these nodes that can be evaluated based on its classification accuracy with the other nodes. Next, we denote these nodes that are to be pruned because they are not focused on the overall validation of the tree. The basic idea of cost-complexity pruning is not to consider all pruned subtrees, but only those that are the “best of their kind” in a sense to be defined.

The second method introduced is Reduced Error Pruning. First, you prune the nodes that are classified as the subtrees and you remove them to make one leaf. Nodes are removed on the basis of choosing the node whose removal most increases the decision tree accuracy on the graph. Pruning process continues until further pruning is harmful.

The third method is called Pessimistic Pruning, where it’s based on a single pass algorithm on a True or False approach. At each step, we remove one rule, such that its removal brings the lowest valued node among all possible removals of the rule in the current tree.

The fourth method simplifies the production rules that go into decision trees. As decision trees classify a case and the leaves are satisfied along the path of conditions, examining the conditions or rules along the path could be generalized on the left side of the tree. Once generalize, there is a certainty factor that comes along the cases. To classify a case, you must find a rule that applies to it. If there is more than one, choose the rule with the higher certainty factor. Overall, this method is a condition-elimination strategy.

Overall, J.R. Quinlan intention is to suggest methods for simplifying decision trees without compromising their accuracy.

Overall, I find the first three methods easy to interpret and visualized. The pruning methods are helpful due to their ability to narrow and scope the tree the way that fits the overall validation. When the goal is to produce a sufficiently accurate compact concept description, pruning is highly useful. However, there is a lot swapping of conditions throughout the process. In reaction, decision trees perform well if a few highly relevant attributes exist, but less so if many complex interactions are present.  Since most decision trees divide the leaf into mutually exclusive regions to represent a concept, in some cases the tree should contain several duplications of the same sub-tree to represent the classifier. To conclude, decision trees provide a framework to consider the probability and payoffs of decisions to help the analyst evaluate the likely course, however, it is impossible to plan for all contingencies when it comes to its outcomes. Though the process is simplified, there is no statistical backing if it actually works.


Quinlan, J. (n.d.). Simplifying Decision Trees. Massachusetts Institute Of Technology Artificial   Laboratory. Retrieved from 930.pdf?sequence=2.


  1. As a visualization technique I believe that decisions tress are much like mind mapping in they allow you to see how ideas are connected and categorized. However, as an analytical method while having some good applications it must be used with caution. Much like an analysis one will take the low valued information and either group it or discard it. However if used without care especially in an analytical work it pruning could take out information that could impact both the words of estimative probability used along with analytic confidence.

  2. The article does a good job breaking down the decision tree process without complexities. The pruning methods are very practical as describes and helps to streamline the process. In the cost-complexity pruning, was there any mention in the article about how the "best of their kind" is determined? Any specific parameters?

  3. I share Ian's point. My concern is that some researchers may attempt to over-simplify variables within their decision trees. Reliability of decision trees depend on feeding precise information during the onset. Efforts to simplify a decision tree through pruning could result in classification errors, and thus should be used with caution. Furthermore, although decision trees follow a natural course of events by tracing relevant relationships between events,it may not be possible to plan for all contingencies that may arise from a decision, and such oversight can lead to bad decisions.

  4. I agree on all points here. This article seems to stress decision trees in a qualitative, tinking/mind mapping tool sense. Where as these methods couldn't really be applied in the study Jared posted about(The classification of Japanese names). I think that it would be interesting to see this process applied to like an ACH where evidence can get out of control.

  5. A decision tree is a visual model for decision making which represents consequences, including chance event outcomes, resource costs, and utility. It is also one way to display an algorithm that only contains conditional control statements. Making decision trees are super easy with a decision tree maker with free templates.