Monday, April 6, 2009

An introduction to decision tree modeling

http://www.udel.edu/chemo/SDB/~pdf_papers/JChemo_18_275_2004.pdf

Summary:

The article offers a critique of decision tree mapping as a technique used to generalize data sets.

"In its simplest description, decision tree analysis is a divide-and-conquer approach to classification. Decision trees can be used to discover features and extract patterns in large databases that are important for discrimination and predictive modeling." Its most common uses are in exploratory data analysis and predictive modeling applications.

The advantages of decision trees include the interpretability of the constructed model and the ability to reveal relationships within the data. A tree takes the form of a hierarchical model built from decision rules, each represented by a node. The first node is referred to as the root node; nodes that split the data further are branch nodes, and the terminal nodes that assign a class are leaf nodes.
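
To make the hierarchy concrete, here is a minimal sketch of a hand-built tree and the rule-following loop that classifies a sample. The feature names ("ph", "conductivity"), thresholds, and class labels are hypothetical and chosen only for illustration; they do not come from the article.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Terminal node: holds the class assigned to samples that reach it."""
    label: str


@dataclass
class Node:
    """Decision node: tests one feature against a threshold."""
    feature: str
    threshold: float
    left: Union["Node", Leaf]   # followed when sample[feature] <= threshold
    right: Union["Node", Leaf]  # followed otherwise


def classify(node, sample):
    """Walk from the root down to a leaf, applying one decision rule per node."""
    while isinstance(node, Node):
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Hypothetical two-level tree: the outer Node is the root, the inner Node
# is a branch node, and the three Leaf objects are leaf nodes.
tree = Node(
    "ph", 7.0,
    left=Leaf("acidic"),
    right=Node(
        "conductivity", 500.0,
        left=Leaf("basic, low salt"),
        right=Leaf("basic, high salt"),
    ),
)

print(classify(tree, {"ph": 6.2, "conductivity": 120.0}))
print(classify(tree, {"ph": 8.1, "conductivity": 640.0}))
```

Because every prediction is just a path of explicit rules from root to leaf, the model can be read directly, which is the interpretability advantage mentioned above.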

The general consensus is to overgrow a decision tree by incorporating all relevant criteria. The tree can then be "pruned" to reduce complexity. Generalizability is enhanced by incorporating ensemble methods such as bagging (training many trees on randomly resampled versions of the data and combining their votes) or boosting (reweighting samples so that later trees concentrate on the cases earlier trees misclassified).
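
As a rough illustration of bagging, the sketch below trains several one-split trees ("stumps") on bootstrap resamples of a toy data set and classifies by majority vote. Everything here is an assumption made for the example: the data, the function names, the number of trees, and the use of stumps instead of full trees. It is not the procedure from the article.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Fit a one-split tree on (value, label) pairs by minimizing training errors.

    Returns (threshold, left_label, right_label).
    """
    values = sorted({x for x, _ in samples})
    labels = sorted({y for _, y in samples})
    best = None
    for t in values:
        for left in labels:
            for right in labels:
                errors = sum((left if x <= t else right) != y for x, y in samples)
                if best is None or errors < best[0]:
                    best = (errors, t, left, right)
    return best[1:]


def predict_stump(stump, x):
    threshold, left, right = stump
    return left if x <= threshold else right


def bagged_stumps(samples, n_trees=25, seed=0):
    """Train one stump per bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        boot = [rng.choice(samples) for _ in samples]
        stumps.append(fit_stump(boot))
    return stumps


def predict_bagged(stumps, x):
    """Combine the individual stumps by majority vote."""
    votes = Counter(predict_stump(s, x) for s in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature data set with two well-separated classes.
data = [(1.0, "A"), (2.0, "A"), (3.0, "A"), (7.0, "B"), (8.0, "B"), (9.0, "B")]
ensemble = bagged_stumps(data)

print(predict_bagged(ensemble, 0.5))
print(predict_bagged(ensemble, 9.5))
```

Each stump sees a slightly different sample of the data, so individual split points vary; voting averages out that variation, which is where the gain in generalizability is expected to come from.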

1 comment:

  1. Interesting article. I was taken by the idea that decision trees are most (only?) appropriate for "large databases". Is that a consensus opinion among the team? Do DTs have to do primarily with databases? If so, then it limits their usefulness in an intel context.
