Tuesday, March 20, 2012

Constructing a decision tree from data with hierarchical class labels

Decision tree construction algorithms have usually assumed that the class labels are categorical or Boolean variables, i.e., that the label set is flat. In real-world applications, however, there are more complex classification scenarios in which the class labels to be predicted are hierarchically related. For example, in a digital library application, a document can be assigned to topics organized into a topic hierarchy; in web site management, a web page can be placed into categories organized as a hierarchical catalog. In both cases, the class labels are naturally organized into a hierarchical structure that defines an abstraction over the labels.

The HLC (Hierarchical class Label Classifier) algorithm is designed to construct a decision tree (DT) from data with hierarchical class labels. It follows the standard framework of classical DT induction methods such as ID3 and C4.5. The HLC process is:

1. Attribute Selection Measure
An attribute selection measure is a heuristic for deciding which attribute is the most discriminatory for splitting the current node. In view of the weaknesses of the traditional entropy-based measure, a new measure, called the hierarchical-entropy value, is proposed by modifying the traditional entropy measure. It measures the appropriateness of a node with respect to the given class hierarchical tree.

2. Hierarchical-entropy value
The hierarchical-entropy value of a node vb is denoted Hentropy(vb) and is computed with an equation defined in the paper (not reproduced here).

With this hierarchical-entropy measure, the appropriateness of a node with respect to a class hierarchical tree (CHT) can be properly measured.
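Since the defining equation is not reproduced in this post, here is a minimal Python sketch of one plausible reading: Hentropy as a weighted combination of the node's entropy measured at successive levels of the class hierarchy. The function names, the `parent` map, and the per-level `weights` are all illustrative assumptions, not the paper's exact definition.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a flat multiset of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def hierarchical_entropy(labels, parent, weights):
    """Illustrative Hentropy(vb): a weighted sum of the entropy of the
    node's data measured at successive levels of the class hierarchy,
    from the bottom level upward. `parent` maps each label to its parent
    concept; `weights[l]` weights level l. The paper's exact weighting
    scheme may differ -- this is a stand-in to make the idea concrete."""
    total, level = 0.0, list(labels)
    for w in weights:
        total += w * entropy(level)
        level = [parent.get(x, x) for x in level]  # climb one level up
    return total
```

For instance, two bottom-level labels that share a parent concept are maximally impure at the bottom level but pure one level up, so their hierarchical entropy is lower than that of two unrelated labels.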

3. Hierarchical Information Gain
Next, the algorithm decides whether an attribute is a good splitting attribute according to how much hierarchical information is gained by splitting the node on that attribute. The hierarchical information gain is first defined and then used to develop a method for choosing the best splitting attribute. It is used to select a test attribute ar to split the current node vb and is denoted H-info-Gain(ar, vb). Let vx denote a child node obtained by splitting vb on the test attribute ar; the gain is then computed from the hierarchical-entropy values of vb and its children (the exact equation is given in the paper).

The attribute with the highest H-info-Gain value is chosen as the next test attribute for the current node.
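By analogy with classical information gain in ID3/C4.5, one would expect H-info-Gain(ar, vb) to be the hierarchical entropy of vb minus the size-weighted hierarchical entropies of the children vx. A sketch under that assumption follows; the `hierarchical_entropy` helper is an illustrative stand-in for the paper's Hentropy, included here so the example is self-contained.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a flat multiset of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def hierarchical_entropy(labels, parent, weights):
    """Illustrative stand-in for Hentropy: level-weighted entropy over
    the class hierarchy (not the paper's exact formula)."""
    total, level = 0.0, list(labels)
    for w in weights:
        total += w * entropy(level)
        level = [parent.get(x, x) for x in level]
    return total

def h_info_gain(node_labels, partitions, parent, weights):
    """Illustrative H-info-Gain(ar, vb): the node's hierarchical entropy
    minus the size-weighted hierarchical entropy of the child nodes vx
    produced by splitting on attribute ar (passed as `partitions`)."""
    n = len(node_labels)
    after = sum(len(p) / n * hierarchical_entropy(p, parent, weights)
                for p in partitions)
    return hierarchical_entropy(node_labels, parent, weights) - after
```

With one level and unit weight this reduces to the classical information gain, which is the sanity check one would want before trusting the hierarchical generalization.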

4. Stop criteria
For the labels at the bottom level, let majority be the label with the most data in vb, and let percent(vb, majority) be the percentage of data in vb whose label is majority. If the stopping conditions (stated in terms of these quantities in the paper) are met, stop growing node vb; otherwise, the node must be expanded further.

Once stopped, a concept label is assigned to the stop node vb so as to cover most of its data without losing too much precision. This is accomplished with the function getLabel, detailed in the next section.

5. Label assignment
A heuristic is used to assign a class label when a node matches the stop criteria. The goal is to label the node with the concept label in the class hierarchical tree that has the highest accuracy and precision. Each concept label of a leaf node vb has three vital indices: accuracy, precision, and score.

Applying the function getLabel yields the accuracy, precision, and score values for each concept label of a node (the paper works this out for example nodes v1 and v2).
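A sketch of how getLabel might trade accuracy against precision when scoring candidate concept labels. The scoring formula here, a weighted sum with weight `alpha`, with precision taken as the reciprocal of the number of bottom-level labels a concept covers, is an assumption for illustration, not the paper's definition.

```python
def descendants(concept, children):
    """All bottom-level labels reachable from `concept` in the class
    hierarchical tree; a concept with no children is itself bottom-level."""
    kids = children.get(concept)
    if not kids:
        return {concept}
    leaves = set()
    for k in kids:
        leaves |= descendants(k, children)
    return leaves

def get_label(labels, concepts, children, alpha=0.5):
    """Illustrative getLabel: score each candidate concept by a weighted
    sum of accuracy (fraction of the node's data the concept covers) and
    precision (here 1 / number of bottom-level labels it covers), and
    return the highest-scoring concept. The formula is an assumption."""
    n = len(labels)
    best, best_score = None, float("-inf")
    for c in concepts:
        leaves = descendants(c, children)
        accuracy = sum(1 for x in labels if x in leaves) / n
        precision = 1.0 / len(leaves)   # more specific => more precise
        score = alpha * accuracy + (1 - alpha) * precision
        if score > best_score:
            best, best_score = c, score
    return best
```

With a balanced `alpha` the most specific label tends to win; raising `alpha` weights accuracy more heavily and favors broader labels that cover more of the node's data, mirroring the accuracy/precision tradeoff described above.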

6. Evaluation
An empirical study was conducted to demonstrate that the proposed method achieves good results in both predictive accuracy and predictive precision. Two real-world data sets were used in the experiments, and two criteria were used to compare the performance of the algorithms: (1) accuracy, the percentage of future data that is correctly classified into the right concept label, and (2) precision, a measure of how specific the assigned label is.

To highlight the tradeoffs between predictive accuracy and predictive precision of the resulting hierarchical concept labels, and to explore the performance of our proposed method, we chose a well-known decision tree system, C4.5, to compare with our method, HLC.

The HLC algorithm can achieve higher accuracy and better precision simultaneously. By carefully examining the C4.5 performance data level by level, it was found that when accuracy is high, precision becomes low, and vice versa. This is because the traditional decision tree classifier does not consider data with hierarchical class labels, and thus it cannot handle the tradeoff between accuracy and precision. On the other hand, since the aim of the project was to construct a DT from data with a class hierarchical tree, the ability to optimize the tradeoff between these two criteria is embedded into the design of the algorithm. The results in Fig. 14 indicate that the design achieved this goal.

The HLC algorithm can be used when evaluating hierarchical data as it will likely allow for more accurate and precise analysis.

Chen, Yen-Liang; Hu, Hsiao-Wei; Tang, Kwei (2009). Constructing a decision tree from data with hierarchical class labels. Expert Systems with Applications, Volume 36, Issue 3, Part 1, pp. 4838–4847.


  1. I'm not afraid to admit, this post sailed well over my head. I know what all the words mean, like 'entropy', but in this context they're more elusive.

  2. Now that I'm rereading it, I agree. I should have simplified the language. Thanks for pointing that out.