Tuesday, March 20, 2012

Constructing a decision tree from data with hierarchical class labels

Decision tree construction algorithms have usually assumed that the class labels are categorical or Boolean variables, i.e., that the label set is flat. In real-world applications, however, there are more complex classification scenarios in which the class labels to be predicted are hierarchically related. For example, in a digital library application, a document can be assigned to topics organized into a topic hierarchy; in web site management, a web page can be placed into categories organized as a hierarchical catalog. In both cases, the class labels are naturally organized into a hierarchical structure that defines an abstraction over the labels.

The HLC (Hierarchical class Label Classifier) algorithm is designed to construct a decision tree (DT) from data with hierarchical class labels. It follows the standard framework of classical DT induction methods such as ID3 and C4.5. The HLC process is:

1. Attribute Selection Measure
An attribute selection measure is a heuristic for deciding which attribute is the most discriminatory for splitting the current node. In view of the weaknesses of the traditional entropy-based measure, a new measure, called the hierarchical-entropy value, is proposed by modifying the traditional entropy measure. It measures the appropriateness of a node with respect to the given class hierarchical tree.

2. Hierarchical-entropy value
The hierarchical-entropy value of a node vb is denoted Hentropy(vb) and is computed with an equation defined in the paper (not reproduced here).

With this hierarchical-entropy measure, the appropriateness of a node with respect to a class hierarchical tree (CHT) can be properly measured.
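Since the defining equation is not reproduced in this post, here is a minimal Python sketch of one plausible reading: Hentropy as a weighted combination of the node's entropy measured at successive levels of the class hierarchy. The function names, the `parent` map, and the per-level `weights` are all illustrative assumptions, not the paper's exact definition.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a flat multiset of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def hierarchical_entropy(labels, parent, weights):
    """Illustrative Hentropy(vb): a weighted sum of the entropy of the
    node's data measured at successive levels of the class hierarchy,
    from the bottom level upward. `parent` maps each label to its parent
    concept; `weights[l]` weights level l. The paper's exact weighting
    scheme may differ -- this is a stand-in to make the idea concrete."""
    total, level = 0.0, list(labels)
    for w in weights:
        total += w * entropy(level)
        level = [parent.get(x, x) for x in level]  # climb one level up
    return total
```

For instance, two bottom-level labels that share a parent concept are maximally impure at the bottom level but pure one level up, so their hierarchical entropy is lower than that of two unrelated labels.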

3. Hierarchical Information Gain
Next, the algorithm decides whether an attribute is a good splitting attribute according to how much hierarchical information is gained by splitting the node on that attribute. The hierarchical information gain is first defined and then used to develop a method for choosing the best splitting attribute. It is used to select a test attribute ar to split the current node vb and is denoted H-info-Gain(ar, vb). Let vx denote a child node obtained by splitting vb on the test attribute ar; the gain is then computed from the hierarchical-entropy values of vb and its children (the exact equation is given in the paper).

The attribute with the highest H-info-Gain value is chosen as the next test attribute for the current node.
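By analogy with classical information gain in ID3/C4.5, one would expect H-info-Gain(ar, vb) to be the hierarchical entropy of vb minus the size-weighted hierarchical entropies of the children vx. A sketch under that assumption follows; the `hierarchical_entropy` helper is an illustrative stand-in for the paper's Hentropy, included here so the example is self-contained.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a flat multiset of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def hierarchical_entropy(labels, parent, weights):
    """Illustrative stand-in for Hentropy: level-weighted entropy over
    the class hierarchy (not the paper's exact formula)."""
    total, level = 0.0, list(labels)
    for w in weights:
        total += w * entropy(level)
        level = [parent.get(x, x) for x in level]
    return total

def h_info_gain(node_labels, partitions, parent, weights):
    """Illustrative H-info-Gain(ar, vb): the node's hierarchical entropy
    minus the size-weighted hierarchical entropy of the child nodes vx
    produced by splitting on attribute ar (passed as `partitions`)."""
    n = len(node_labels)
    after = sum(len(p) / n * hierarchical_entropy(p, parent, weights)
                for p in partitions)
    return hierarchical_entropy(node_labels, parent, weights) - after
```

With one level and unit weight this reduces to the classical information gain, which is the sanity check one would want before trusting the hierarchical generalization.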

4. Stop criteria
For the labels at the bottom level, let majority be the label with the most data in vb, and let percent(vb, majority) be the percentage of data in vb whose label is majority. If the stopping conditions (stated in terms of these quantities in the paper) are met, stop growing node vb; otherwise, the node must be expanded further.

Once stopped, a concept label is assigned to the stop node vb so as to cover most of its data without losing too much precision. This is accomplished with the function getLabel, detailed in the next section.

5. Label assignment
A heuristic is used to assign a class label when a node matches the stop criteria. The goal is to label the node with the concept label in the class hierarchical tree that has the highest accuracy and precision. Each concept label of a leaf node vb has three vital indices: accuracy, precision, and score.

Applying the function getLabel yields the accuracy, precision, and score values for each concept label of a node (the paper works this out for example nodes v1 and v2).
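A sketch of how getLabel might trade accuracy against precision when scoring candidate concept labels. The scoring formula here, a weighted sum with weight `alpha`, with precision taken as the reciprocal of the number of bottom-level labels a concept covers, is an assumption for illustration, not the paper's definition.

```python
def descendants(concept, children):
    """All bottom-level labels reachable from `concept` in the class
    hierarchical tree; a concept with no children is itself bottom-level."""
    kids = children.get(concept)
    if not kids:
        return {concept}
    leaves = set()
    for k in kids:
        leaves |= descendants(k, children)
    return leaves

def get_label(labels, concepts, children, alpha=0.5):
    """Illustrative getLabel: score each candidate concept by a weighted
    sum of accuracy (fraction of the node's data the concept covers) and
    precision (here 1 / number of bottom-level labels it covers), and
    return the highest-scoring concept. The formula is an assumption."""
    n = len(labels)
    best, best_score = None, float("-inf")
    for c in concepts:
        leaves = descendants(c, children)
        accuracy = sum(1 for x in labels if x in leaves) / n
        precision = 1.0 / len(leaves)   # more specific => more precise
        score = alpha * accuracy + (1 - alpha) * precision
        if score > best_score:
            best, best_score = c, score
    return best
```

With a balanced `alpha` the most specific label tends to win; raising `alpha` weights accuracy more heavily and favors broader labels that cover more of the node's data, mirroring the accuracy/precision tradeoff described above.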

6. Evaluation
An empirical study was conducted to demonstrate that the proposed method achieves good results in both predictive accuracy and predictive precision. Two real-world data sets were used in the experiments, and two criteria were used to compare the performance of the algorithms: (1) accuracy, the percentage of future data that is correctly classified into the right concept label, and (2) precision, a measure of how specific the assigned label is.

To highlight the tradeoffs between predictive accuracy and predictive precision of the resulting hierarchical concept labels, and to explore the performance of our proposed method, we chose a well-known decision tree system, C4.5, to compare with our method, HLC.

The HLC algorithm can achieve higher accuracy and better precision simultaneously. By carefully examining the C4.5 performance data level by level, it was found that when accuracy is high, precision becomes low, and vice versa. This is because the traditional decision tree classifier does not consider data with hierarchical class labels, and thus it cannot handle the tradeoff between accuracy and precision. On the other hand, since the aim of the project was to construct a DT from data with a class hierarchical tree, the ability to optimize the tradeoff between these two criteria is embedded into the design of the algorithm. The results in Fig. 14 indicate that the design achieved this goal.

The HLC algorithm can be used when evaluating hierarchical data as it will likely allow for more accurate and precise analysis.

Chen, Yen-Liang; Hu, Hsiao-Wei; Tang, Kwei (2009). Constructing a decision tree from data with hierarchical class labels. Expert Systems with Applications, Volume 36, Issue 3, Part 1, pp. 4838–4847.


  1. I'm not afraid to admit, this post sailed well over my head. I know what all the words mean, like 'entropy', but in this context they're more elusive.

  2. Now that I'm rereading it, I agree. I should have simplified the language. Thanks for pointing that out.