Tuesday, March 20, 2012

Using Decision Tree Analysis to Identify Risk Factors for Relapse to Smoking


Introduction
In this study, authors Megan E. Piper, Wei-Yin Loh, Stevens S. Smith, Sandra J. Japuntich, and Timothy B. Baker used decision tree analysis to identify how variables linked to relapse of smoking interact with each other.  The researchers then used this information to identify subgroups of smokers at a higher risk for relapse, and compared the results to those found through traditional methods.

Summary
Researchers conducted the study by inserting 70 variables with a connection to relapse in smoking, such as predictors related to environmental factors, dependence, and smoking history, into a decision tree model. This model is designed to take into account the effects of multiple factors. The more traditional model, logistic regression analysis, is a good predictor of factors influential to large groups and was used in addition to decision trees in order to create a comparison between the methods. Previous studies that depended solely on linear or logistic regression techniques were unable to identify smaller subgroups.  Decision tree analysis allowed researchers to divide the data sets in order to look at the effects of certain factors on specific groups, such as men and women. 

This study used the GUIDE decision tree program to analyze results from 1,071 smokers.  The researchers designed models for one week after quitting, end of treatment, and six months postquit. The figure below shows the “pruned” decision tree of the question “How soon after you wake up do you smoke your first cigarette?” From the results of this question, taken one week postquit, the decision tree splits further to show the abstinence rates of smokers who did or did not receive treatment, and those who are or are not married. The study showed that people who smoke their first cigarette more than 30 minutes after waking are the most likely to quit smoking, and that it is important to give treatment to those whose first cigarette is less than 30 minutes after waking up.



The second figure shows how marital status, gender, and the age began daily smoking interacted to affect abstinence at the end of treatment. Household income, health status, and longest previous quit attempt were also significant factors identified to influence successful cessation of smoking. The results showed that it is important to consider environmental and contextual factors in addition to those related to dependence.


Although both the decision tree analysis and logistic regression technique used the same data, they produced significantly different results. According to the authors, the decision tree model was better suited to handling a large number of factors; however, they were not able to determine which method is more accurate.  Overall, decision tree analysis produced statistically significant results that will hopefully be replicated in subsequent studies.

Conclusion
This study is a very good example of how decision tree analysis can be applied in real world situations to predict human behavior. One important characteristic of decision trees is that although they are based on scientific calculations, they are easy to understand and could be shown to decision makers to help explain the results of the analysis.  Although this case study uses quantitative data, it would be interesting to see how decision trees can be translated for use with qualitative data more commonly used for intelligence analysis.

Source
Piper, M.E., Wei-Yin, L., Stevens, S.S., Japuntich, S.J., Baker, T.B. (2011). Using decision tree analysis to identify risk factors for relapse to smoking. Substance Use & Misuse 46, p. 492-510. doi: 10.3109/10826081003682222

Probabilistic Part-of-Speech Tagging Using Decision Trees

Introduction:

In Probabilistic Part-of-Speech Tagging Using Decision Trees, author Helmut Schmid describes a new method for tagging words and their appropriate parts-of-speech in a tex by using a decision tree. In languages such as English, where a word can have several different meanings depending on the context (try “store”), highly accurate tagging can improve the ability of programming like spell-check to properly recognize and correct for errors. Earlier methods for tagging often degraded as the size of the text decreased, creating problems in accuracy. Schmid’s new method using a decision tree was intended to greatly increase the accuracy of probabilistic tagging regardless of the size of the text.

Summary:

In an effort to improve on the accuracy of probabilistic part-of-speech tagging, Helmut Schmid developed a method using a binary decision tree to develop paths that accurately predict the probability of a word by using the context around it. Each estimation of a word’s part-of-speech begins with a root node. If looking for a specific type of word, such as a noun preceded by an adjective, the root node asks if the preceding word is an adjective or not. The decision tree follows a path until it comes to an ending node (leaf) that determines the appropriate probabilities of what the word is.

In the design of the decision tree, each root node asks a simple question of the context surrounding a word. In the example above, this is whether the preceding word is an adjective or not. If the answer is “yes”, then the program follows to the next node. This node refines the search by asking more detail of preceding and following words to improve context. This path
continues until it comes to a node that leads only to leafs, in which case the word is then weighted and probabilities are given for the word’s part of speech.

The decision tree also incorporated probabilities and parts-of-speech from “fullform” and “suffix” lexicons. The “fullform” lexicon was built on a large tagged text to present probabilities in other texts. The lexicon allows for searches of words and their potential parts-of-speech. As interesting is the “suffix” lexicon, which is built on a decision tree. Starting at a root node (labeled “#”), the tree follows a suffix backward from the last letter to the first of the suffix. For example, when searching for the word “tagging” in the suffix lexicon, the tree starts at the root “#” then proceeds to the “g” node, then “n” node and finally the “i” node. Once there, the lexicon gives the probability of the word and its part-of-speech.

In Schmid’s testing, the TreeTagger was able to tag 10,000 words a second and over 2 million words in 6 minutes. After testing, TreeTagger turned out to be more accurate than other methods for tagging a large training text (corpus). In addition, the decision tree method allowed TreeTagger to tag smaller texts significantly more accurately. While other methods saw significant degradation in accuracy as the number of words dropped below 10,000, TreeTagger’s use of decision trees allowed it to maintain a respectable 84 percent accuracy.

Conclusion:

The TreeTagger method uses a decision tree and probabilities for accurate part-of-speech tagging in texts. The decision trees allow the method to work efficiently and accurately with texts regardless of the size while other methods saw significant reductions in accuracy as texts decreased in size and offered fewer examples for them to draw on. TreeTagger instead uses prebuilt lexicons and decision trees to track context and words.

This is an interesting way to employ decision trees. While it steers away from the intelligence field in application, its use of probabilities provides a different perspective on decision trees and probability. The application could be changed by switching out words for conditions or facts and linguistic contexts for cultural, social, and political contexts. While this is easier said than done, it may be an avenue of research worth pursuing in the intelligence field, especially since the end goal of both is to produce an accurate probability or estimate.

Source:

Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. Proceedings of International Conference on New Methods in Language Processing. 12(4) pp 44-49
Retrieved from: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.1139

Decision Trees in Intelligence Analysis

Introduction:

Written during the Cold War, Edwin Sapp’s article on decision trees begins by noting the large amount of information available to intelligence analysts, the sophistication and seriousness of the weapons available, and ever-shortening time span in which effective decisions must be made. Sapp advocates the decision tree as a competent method for processing large amounts of data, while at the same time communicating degrees of certainty to an extent which language cannot.

Summary:

Edwin Greenlaw Sapp begins his article entitled Decision Trees by discussing the nature of intelligence analysis. He identifies four major categories that intelligence requirements fall into:

1) places (geographic locations, physical resources);

2) people (strengths and attitudes);

3) organizations (what people form and belong to); and

4) objects (things people make and possess; cities and weapons systems

Information is collected in these categories in order to make forecasts about futures states of affair. In Sapp’s view, Kent’s words of estimative probability are both vague and incomplete. He finds the decision tree more favorable, as it serves to “organize sizeable amounts of data…and communicate the degree of certainty relating to possible outcomes or the likelihood of the occurrence of specific events at some given time in the future.”

Sapp goes on to explain the technique used to create decision trees with two examples. His first is a simplistic question displaying the use of probabilities and subjective value of the outcomes to determine whether or not to hold a dinner party outside given a 40 percent chance of rain. The second example uses the Biblical story of the spies sent into Canaan. Sapp illustrates the intelligence cycle and how a decision tree would have aided this cycle by linking all the collection requirements and ensuring that the important questions were answered first.

According to Sapp there are two types of decision trees: deterministic and probabilistic. Deterministic decision trees deal with situations where no probability needs to be calculated in order to solve the problem on hand. The other use for decision trees is probabilistic, allowing for the forecasting of future events with a measured degree of certainty. Sapp also believes this type of decision tree allows the analyst to communicate more clearly to the decision maker, in that it provides solid numbers instead of seemingly vague language. The article goes on to address the difficulty of including all alternatives within a decision tree that is seeking to forecast the future. While one can never be certain that all possibilities are addressed within a probabilistic decision tree, Sapp notes that it is a simple process to return to the original model and update it should any unplanned alternatives arise.

Critique:

Can decision trees really handle the large amounts of data that Sapp claims?

How accurate can assigning probabilities be when such probabilities are related to intelligence problems?

Source:

Sapp, E. G. (1974, Winter). Decision Trees. Intelligence Studies, pp. 45-57 (declassified).

Using Decision Trees for Predicting Student Academic Performance

Using Decision Trees for Predicting Student Academic Performance


Introduction:
In this study, authors Anupama Kumar and Dr. Vijayalakshmi use decision trees to predict student academic performance. Data will be used to help tutors make better decisions regarding student success and academic needs.

Summary:
The authors look to find a way use a combined method of decision trees and rule mining to predict student performance in upcoming tests, quizzes, etc for the use of tutors to create a better way to teach these students. In this study, the authors use a decision tree algorithm called C4.5 to predict whether students will pass or fail said exam. The outcome of the decision tree analysis was then used in a comparative analysis which stated the prediction helped students who were struggling, and challenged excelling students to reach higher levels of success.

Conclusion:
Decision trees are a successful way to predict student outcome and improve their results based on prior educational data. Decision tree algorithms can be better rated for efficiency based on their accuracy and time taken to derive the tree.

Source:

Kumar, S. Anupama, Vijayalakshmi, Dr., (2009). Efficiency of decision trees in predicting student’s academic performance. Retrieved from http://airccj.org/CSCP/vol1/csit1230.pdf

Decision Trees and their Nodes


Summary

In the paper, Probabilistic Approaches: Scenario Analysis, Decision Trees and Simulations, the author looks at decision trees as an assessment of risk in a sequence. Therefore, the subject in question must pass a series of tests, failure at any point leads to a complete loss of value. The example given is pharmaceutical drugs.


Technique

The decision tree is broken down into distinct categories. Root nodes, decision nodes, event nodes and end nodes. A root node is at the beginning and where the decision maker has a decision choice or an uncertain outcome. An event node represents the possible outcomes on a given decision. You have to figure out the possible outcomes and their likelihood for this node. The decision nodes represent choices that can be made by a decision maker. The end nodes represent the final outcomes of the decision tree.


Strengths

  • By linking actions and choices – decision trees give decision makers a framework and make them think about the consequences.
  • Value of information – having to think through this process gives the decision maker insight on how valuable this information is.
  • Decision trees act as a form of risk management. If the decision doesn’t pass each test – it may be too risky to undertake.
  • They are easy to construct and give definitive answers.

Challenges

  • There is no wiggle room or room to maneuver.
  • Multiple risks are hard to assess at the same time. This is a linear process at each stage.
  • Event nodes require estimates of outcomes. This is subjective in a lot of cases.
  • The use of the decision tree depends entirely on the decision makers willingness to stick to it strictly.

Conclusions

Decision tree analysis is particularly useful when there are discrete outcomes. However, as a process becomes more complicated and the number of outcomes increases, it is harder to use this tool effectively. Its ease of use and ability to make decision makers think about the choices and consequences makes this a tool that can be applied to many different situations.

Bibliography

Probabilistic Approaches: Scenario Analysis, Decision Trees and Simulations. Stern. NYU. Retrieved from; http://people.stern.nyu.edu/adamodar/pdfiles/papers/probabilistic.pd

Constructing a decision tree from data with hierarchical class labels

Introduction:
Decision Tree construction algorithms have usually assumed that the class labels were categorical or Boolean variables, meaning that the algorithms operate under the assumption that the class labels are flat. In real-world applications, however, there are more complex classification scenarios, where the class labels to be predicted are hierarchically related. For example, in a digital library application, a document can be assigned to topics organized into a topic hierarchy; in web site management, a web page can be placed into categories organized as a hierarchical catalog. In both of these cases, the class labels are naturally organized as a hierarchical structure of class labels which defines an abstraction over class labels.

Summary:
The HLC (Hierarchical class Label Classifier) algorithm is designed to construct a DT from data with hierarchal class labels. It follows the standard framework of classical DT induction methods, such as ID3 and C4.5. The HLC process is:

1. Attribute Selection Measure
An attribute selection measure is a heuristic to measure which attribute is the most discriminatory to split a current node. In view of the weakness of the traditional entropy-based method, a new measure is proposed, called hierarchical-entropy value, by modifying the traditional entropy measure. It can help measure the appropriateness of a node with respect to the given class hierarchical tree.

2. Hierarchical-entropy value
The hierarchical-entropy value of a node vb can be denoted as Hentropy (vb), and can be computed with the following equation:

This result indicates that by using the new hierarchical-entropy measure, the appropriateness of a node in a CHT can be properly measured.

3. Hierarchical Information Gain
Next, decide whether an attribute is a good splitting attribute according to how much hierarchical information is gained by splitting the node through the attribute. First define what the hierarchical information gain is, and then use it to develop a method for choosing the best splitting attribute. The hierarchical information gain is used to select a test attribute ar to split the current node vb, and can be denoted as H-info-Gain (ar,vb). Let vx denote a child node obtained by splitting vb through test attribute ar. The hierarchical information gain can then be computed as follows:

The goal is to select the attribute with the highest H-infoGain value, which is chosen as the next test attribute for the current node.

4. Stop criteria
For the labels at the bottom level, let majority be the label with the most data in vb, and let percent (vb, majority) be the percentage of data in vb whose labels are majority. If the following conditions are met, stop growing node vb; otherwise, the node must be expanded further.

Once stopped, assign a concept label to a stop node vb to cover most of the data without losing too much precision. This is accomplished by using the function getLabel, detailed in the next section.

5. Label assignment
A heuristic for assigning a class label when a node matches the stop criteria. The goal is to use the concept label in the class hierarchical tree with the highest accuracy and precision to label the node. Each concept label of a leaf node vb has three vital indices: accuracy, precision, and score.

Applying the function getLabel to the information you can obtain the accuracy, precision, and score values for each concept label of nodes v1 and v2.

6. Evaluation
An empirical study was done to demonstrate that the proposed method can achieve a good result in both predictive accuracy and predictive precision. Two real-world data sets were used in the experiments. Two criteria were used to compare the performances of different algorithms: (1) accuracy, which is the percentage of future data that can be correctly classified into the right concept label, and (2) precision, which is a measure of specificity of the label data can be assigned to.

To highlight the tradeoffs between predictive accuracy and predictive precision of the resulting hierarchical concept labels, and to explore the performance of our proposed method, we chose a well known decision tree system, C4.5, to compare with our method, HLC.

Conclusion:
The HLC algorithm can achieve higher accuracy and better precision simultaneously. By carefully examining the C4.5 performance data level-by-level, it was found that when accuracy is high, precision becomes low, and vice-versa. This is because the traditional decision tree classifier does not consider data with hierarchical class labels, and thus it cannot handle the tradeoffs between accuracy and precision. On the other hand, since the aim of the project was to construct a DT from data with a class hierarchical tree, the ability to optimize the tradeoff between these two criteria is embedded into the design of the algorithm. The results in Fig. 14 indicate that the design achieved this goal.



The HLC algorithm can be used when evaluating hierarchical data as it will likely allow for more accurate and precise analysis.

Source:
Chen, Yen-Liang; Hu, Hsiao-Wei; Tang, Kwei (2009). Constructing a decision tree from data with hierarchical class labels. Expert Systems with Applications. Volume 36, Issue 3, Part 1, Pages 48384847.
http://140.123.102.14:8080/reportSys/file/paper/devil/devil_1_paper.pdf

Monday, March 19, 2012

A Decision Tree System For Finding Genes In DNA


Introduction:
Authors Steven Salzberg, Arthur Delcher, Kenneth Fasman and John Henderson’s article A Decision Tree System for Finding Genes in DNA discusses the use of decision tree system along with Morgan, an integrated system for finding genes in vertebrate DNA sequences as well as its performance on a benchmark database of vertebrate DNA.

Summary:
The authors look to expand on the research on gene finding by combining decision tree classifiers, signal recognition algorithms and dynamic programming. MORGAN, Multi-frame Optimal Rule-Based Gene Analyzer, is highly modular thus allowing improvements in any one aspect of the gene –finding task to be incorporated relatively easily into the system. The framework of their system is a dynamic programming algorithm that can efficiently consider the large number of alternative parses that are possible for any sequence of DNA. The resulting combined system is the first complete gene-finding system based on decision trees, and the experiments described below demonstrate that MORGAN is very accurate at finding genes in vertebrate sequence data.

Sample decision tree for classifying human DNA

The internal nodes of the tree represent feature values that are tested for each subsequence as it is passed to the tree. Subsequences are passed down the tree beginning at the top, where a "yes" result on any test means that an example should be passed down to the left. The features tested in this tree include the donor site score (donor), the sum of the donor and acceptor site scores (d + a), the in-frame hexamer frequency (hex), and Fickett's position asymmetry statistic (asym). The leaf nodes contain class distributions for the two classes "exon" and "pseudo-exon." Each successive node in the tree then represents a decision that is based on those values, until a final classification is reached. The bottom nodes of the tree (its leaf nodes) contain class labels indicating whether the subsequence is an exon or not. In addition, the leaf nodes contain the distributions of examples from all classes in the training set, which MORGAN uses to produce probability estimates.

Conclusion:
An important advantage of using decision trees is that they allow the experimenter to analyze the errors made by the system. The modular nature of MORGAN makes it possible in some cases to determine which components of the system are responsible for certain errors, and this helps to guide future development.

Source:
Salzberg, S., Delcher, A., Fasman, K., & Henderson, J. (1998). A decision tree system for finding genes in dna. Journal of Computational Biology, 5(4), 667-680. Retrieved from http://online.liebertpub.com/doi/pdf/10.1089/cmb.1998.5.667