Monday, March 19, 2012

A Decision Tree System For Finding Genes In DNA


Introduction:
In their article "A Decision Tree System for Finding Genes in DNA," Steven Salzberg, Arthur Delcher, Kenneth Fasman and John Henderson describe MORGAN, an integrated system that uses decision trees to find genes in vertebrate DNA sequences, and report its performance on a benchmark database of vertebrate DNA.

Summary:
The authors build on earlier gene-finding research by combining decision tree classifiers, signal recognition algorithms and dynamic programming. MORGAN (Multi-frame Optimal Rule-Based Gene Analyzer) is highly modular, so improvements in any one aspect of the gene-finding task can be incorporated into the system relatively easily. The framework of the system is a dynamic programming algorithm that can efficiently consider the large number of alternative parses possible for any DNA sequence. The resulting combined system is the first complete gene-finding system based on decision trees, and the authors' experiments demonstrate that MORGAN is very accurate at finding genes in vertebrate sequence data.

Sample decision tree for classifying human DNA

The internal nodes of the tree are tests on feature values computed for each subsequence as it is passed to the tree. Subsequences are passed down the tree beginning at the top, where a "yes" answer to any test means that the example is passed down to the left. Each successive node represents a decision based on those values, until a final classification is reached. The features tested in this tree include the donor site score (donor), the sum of the donor and acceptor site scores (d + a), the in-frame hexamer frequency (hex), and Fickett's position asymmetry statistic (asym). The bottom nodes of the tree (its leaf nodes) contain class labels indicating whether the subsequence is an "exon" or a "pseudo-exon." In addition, each leaf stores the distribution of training examples from each class that reached it, which MORGAN uses to produce probability estimates.
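To make the traversal concrete, here is a minimal Python sketch of how a subsequence's feature values might be passed down such a tree. The tree shape, thresholds, leaf counts, the "value below threshold" form of each test, and the names `Node`, `Leaf` and `classify` are all hypothetical; only the feature names, the "yes goes left" convention, and the use of leaf class counts for probability estimates come from the description above. MORGAN's actual trees are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    # Number of training examples of each class that reached this leaf.
    counts: Dict[str, int]

    def label(self) -> str:
        # Majority class at this leaf.
        return max(self.counts, key=self.counts.get)

    def probability(self, cls: str) -> float:
        # Fraction of the training examples at this leaf that belong to `cls`.
        return self.counts.get(cls, 0) / sum(self.counts.values())


@dataclass
class Node:
    feature: str      # e.g. "donor", "d+a", "hex" or "asym"
    threshold: float  # hypothetical cut-off value
    left: "Tree"      # followed when the test answers "yes"
    right: "Tree"     # followed when the test answers "no"


Tree = Union[Node, Leaf]


def classify(tree: Tree, features: Dict[str, float]) -> Leaf:
    """Pass one subsequence's feature values down the tree until a leaf is reached."""
    while isinstance(tree, Node):
        # Hypothetical test form: "is the feature value below the threshold?"
        tree = tree.left if features[tree.feature] < tree.threshold else tree.right
    return tree


# Hypothetical two-level tree; structure, thresholds and counts are illustrative only.
tree = Node("d+a", 1.5,
            left=Leaf({"exon": 2, "pseudo-exon": 48}),
            right=Node("hex", 0.2,
                       left=Leaf({"exon": 10, "pseudo-exon": 30}),
                       right=Leaf({"exon": 90, "pseudo-exon": 10})))

leaf = classify(tree, {"d+a": 2.1, "hex": 0.7})
print(leaf.label(), leaf.probability("exon"))  # exon 0.9
```

Keeping the raw counts at each leaf, rather than only a label, is what lets the classifier report a probability instead of a bare yes/no.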

Conclusion:
An important advantage of using decision trees is that they allow the experimenter to analyze the errors made by the system. The modular nature of MORGAN makes it possible in some cases to determine which components of the system are responsible for certain errors, and this helps to guide future development.

Source:
Salzberg, S., Delcher, A., Fasman, K., & Henderson, J. (1998). A decision tree system for finding genes in DNA. Journal of Computational Biology, 5(4), 667-680. Retrieved from http://online.liebertpub.com/doi/pdf/10.1089/cmb.1998.5.667

Sunday, March 18, 2012

Decision Management

Introduction:
Tim Thompson's article "Management Accounting - Decision Management" discusses using decision trees to analyze problems that combine decisions with uncertain events whose probabilities are known.

Summary:
In the article, the author discusses how a decision tree can help solve a problem, specifically in situations where the decision maker has no control over what follows in an event sequence. In this type of scenario, the decision maker needs to know the possible outcomes as well as their probabilities.

To demonstrate how a decision tree can be used to solve a problem, the author uses the example of a student deciding how to get to class. One alternative is for the student to take public transportation, which costs £5. The second option is to walk and save the fare. The situation is complicated by adding other factors to the decision-making process: there is a 25 percent chance of rain, and if the student is caught in the rain, the student will have to pay £10 to have their clothes cleaned.


In this example there is one decision (walk or take public transportation) and one event (rain or no rain). If the student takes public transportation, there is a £5 financial impact; if the student walks, there is no immediate financial impact. The decision tree (shown below) includes the key event once for each potential decision.



Expected values can be attached to the event nodes to provide meaningful insight. The expected values for each of the event nodes were calculated as follows:

  • Public Transportation: (£0 x 0.25) + (£0 x 0.75) = £0
  • Walk: (-£10 x 0.25) + (£0 x 0.75) = -£2.50
Using a decision tree, a rational choice can be made. The public transportation option has a £5 fare, and the expected value of its event node is zero, so its total expected cost to the student is £5. The walking option has no fare, but the expected value of its event node is -£2.50 (an expected cleaning cost of £2.50), so its total expected cost is £2.50. Since the total expected cost of walking is lower than that of public transportation, the student should decide to walk.
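The arithmetic above can be checked with a short Python sketch. Costs are entered as negative pounds, and the helper name `expected_value` is illustrative, not something from the article.

```python
P_RAIN = 0.25  # probability of rain, from the example


def expected_value(outcomes):
    """Probability-weighted sum of (probability, value) pairs at an event node."""
    return sum(p * v for p, v in outcomes)


# Costs are entered as negative values (in £).
options = {
    # £5 fare up front; rain adds no extra cost on public transport.
    "public transport": -5 + expected_value([(P_RAIN, 0), (1 - P_RAIN, 0)]),
    # No fare, but a £10 cleaning bill if caught in the rain.
    "walk": 0 + expected_value([(P_RAIN, -10), (1 - P_RAIN, 0)]),
}

# The rational choice is the option with the highest (least negative) value.
best = max(options, key=options.get)
for name, value in options.items():
    print(f"{name}: £{value:.2f}")
print("choose:", best)
```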

The author adds more complexity to this relatively simple model with a third option, in which the student walks and takes an umbrella, along with a new event: the student may lose the umbrella. The author indicates that there is a 10 percent chance of the student losing the umbrella, and if it is lost, the student will incur a £20 cost to replace it. The updated decision tree with the third option is below.

The expected value for the new event was calculated as follows:
  • Lose Umbrella: (-£20 x 0.10) + (£0 x 0.90) = -£2.00
The updated decision tree shows that the rational choice is for the student to carry an umbrella when walking. The total expected cost of walking with an umbrella is £2.00, versus £2.50 for walking without an umbrella and £5.00 for taking public transportation.
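The same fold-back procedure works for any tree built from decision and event nodes. The Python sketch below assumes a simple nested-tuple representation of the tree (an illustrative choice, not the author's notation) and rolls the three-option tree back from its leaves: event nodes take the probability-weighted average of their branches, and the decision node takes the best option. It also assumes, as the £2.00 total implies, that carrying the umbrella removes the rain cost entirely.

```python
def rollback(node):
    """Fold a decision tree back to its expected value.

    A node is either a number (a payoff in £, negative for costs),
    ("chance", [(probability, child), ...]) or ("decision", {option: child, ...}).
    """
    if isinstance(node, (int, float)):
        return float(node)
    kind, children = node
    if kind == "chance":
        # Event node: probability-weighted average of the branches.
        return sum(p * rollback(child) for p, child in children)
    if kind == "decision":
        # Decision node: a rational decision maker takes the best branch.
        return max(rollback(child) for child in children.values())
    raise ValueError(f"unknown node type: {kind!r}")


# Assumption: the umbrella keeps the student dry, so its only risk is
# the 10% chance of losing it and paying £20 to replace it.
commute = ("decision", {
    "public transport": ("chance", [(0.25, -5), (0.75, -5)]),
    "walk": ("chance", [(0.25, -10), (0.75, 0)]),
    "walk with umbrella": ("chance", [(0.10, -20), (0.90, 0)]),
})

values = {name: rollback(child) for name, child in commute[1].items()}
best = max(values, key=values.get)
print(values, "->", best)
```

Adding a further option or event only means adding another branch to the nested structure; `rollback` itself does not change.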

Conclusion:
Decision trees are well suited to repeated decisions and can be applied to a variety of decision-making scenarios. In business, for example, a decision tree can help decide whether to launch a new product, as well as when and where it should be launched.

Source:
Thompson, T. (2007, May). Management Accounting - Decision Management. Financial Management, 41.

Decision Tree Matrix Applied to a City's Water Infrastructure

Introduction:

For those of you who were in Freyn’s competitive class last term, you might find this study amusing. It deals with water-related infrastructure (a topic we are all well versed in, having taken that class) and applies a decision tree matrix methodology to the nature of water-related failures in these systems. I found it quite interesting, if only because I love the idea of applying advanced analytic methodologies (and hopefully one day software) to real-life problems, in this case the aforementioned water infrastructure.

Study Summary:

This study looks at the role that valves, which connect pipes and control the flow of water through them, play in the reliability and security of a water infrastructure system, for example a municipal water grid delivering potable water to a town. The study set out to examine the nature and impact of a failure at certain points in the grid and to enumerate all the possible valve failure combinations and the events likely to follow each one. The study used three approaches to examine this phenomenon: 1) a segment-valve matrix, 2) a simulation, and 3) decision tree analysis. For the sake of this post I’ll focus only on number 3.

The decision tree analysis was highly applicable to this particular experiment because a branch of the decision tree can correspond directly to one progressive adjacent valve failure combination. Thus the number of branches in the tree is equal to the number of valve failure combinations possible from a segment (pipe) failure. The outcomes of the decision tree matrix are the probability-weighted number of customers who will not receive water from the municipal water system.
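The branch-per-combination idea can be sketched in Python on a hypothetical three-segment network. The network, its customer counts, the 10 percent valve failure probability, and the rule that a failed valve extends the outage to the next segment (whose remaining valves must then be closed) are all simplifying assumptions for illustration; the study's own decision tree matrix is not reproduced here. Each root-to-leaf path is one valve failure combination, and the result is the probability-weighted number of customers left without water.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy network (not the study's): segments A - v1 - B - v2 - C.
CUSTOMERS: Dict[str, int] = {"A": 10, "B": 20, "C": 30}
VALVES: Dict[str, Tuple[str, str]] = {"v1": ("A", "B"), "v2": ("B", "C")}
P_FAIL = 0.1  # assumed probability that any valve fails to close


def valves_of(segment: str) -> List[str]:
    """Valves that bound a segment."""
    return [v for v, ends in VALVES.items() if segment in ends]


def expand(isolated: FrozenSet[str], pending: List[str], decided: FrozenSet[str],
           prob: float, leaves: List[Tuple[float, int]]) -> None:
    """Depth-first walk: each root-to-leaf path is one valve failure combination."""
    # Skip valves whose two sides are both already shut off; operating them changes nothing.
    pending = [v for v in pending if not set(VALVES[v]) <= isolated]
    if not pending:
        leaves.append((prob, sum(CUSTOMERS[s] for s in isolated)))
        return
    valve, rest = pending[0], pending[1:]
    decided = decided | {valve}
    # Branch 1: the valve closes, so the outage stops at this valve.
    expand(isolated, rest, decided, prob * (1 - P_FAIL), leaves)
    # Branch 2: the valve fails, so the segment beyond it is also cut off and
    # that segment's remaining valves must be operated in turn.
    beyond = next(s for s in VALVES[valve] if s not in isolated)
    extra = [v for v in valves_of(beyond) if v not in decided and v not in rest]
    expand(isolated | {beyond}, rest + extra, decided, prob * P_FAIL, leaves)


def expected_customers_out(failed: str) -> Tuple[float, int]:
    """Probability-weighted customers without water, and the number of branches."""
    leaves: List[Tuple[float, int]] = []
    expand(frozenset({failed}), valves_of(failed), frozenset(), 1.0, leaves)
    return sum(p * c for p, c in leaves), len(leaves)


for seg in CUSTOMERS:
    ev, branches = expected_customers_out(seg)
    print(f"pipe {seg} breaks: {branches} branches, {ev:.1f} customers expected out")
```

Even this three-segment toy produces a separate branch for every valve that might fail; larger networks multiply the branches quickly, which is the growth the study's enumeration has to cope with.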

The study's key finding was that probability is everything and that the potential for failure is extraordinarily high: the example water pipe network had a total of 849 valve failure combinations even though it had only 8 pipes, 7 nodes, and 9 valves. When the method is applied to a real city, the number of combinations, and with it the potential for failure, increases greatly.

Conclusion Summary:

This study compares the different types of analysis in the chart below:

The authors state that “Decision tree analysis yields the expected values of customers without services exactly…” They add that the decision tree method can be used to evaluate the expected number of customers losing service under the current configuration of a city’s water system. According to the authors, this is highly applicable in practice, since it can be used to establish a regular valve maintenance program that targets the most critical valves, pipes and segments in a city’s water supply system.

Source:

http://web.ebscohost.com.ezproxy.mercyhurst.edu/ehost/pdfviewer/pdfviewer?sid=0e1e2020-68d4-472b-94c8-c79e4dd2ce49%40sessionmgr14&vid=5&hid=21

Thursday, March 15, 2012

Visualizing Decision Trees in Games

Introduction:
This article by Robert Haworth, Sousan Sheida Tagh Bostani and Kamran Sedig describes an experiment in which the researchers designed a basic digital maze game for subjects to play, filled with tile-based decisions that could be represented by a decision tree. The goal was to determine how access to a visualized decision tree affects players' decision-making.

Summary:
In the digital maze program designed for this experiment, the mazes were filled with tiles of varying effects: some forced a player to move one square in the indicated direction, while others might change the color of the player's avatar, enabling it to collect certain objects. Touching maze walls forced players to restart, and the players' goal was to reach the door marking the end of each level. Many levels were designed with clever traps and delays for players to fall into. There was no time limit to rush players, but the game rewarded clearing each level in the fewest possible moves and penalized overly long routes, so careful consideration of each move paid off. In this context, the ability to visualize the options at each decision and their consequences should, in theory, save moves and thus benefit play.

Four variations on the game were part of the test:
1. IT+NK: Interactive tree, no keys - navigation by mouse only (clicking on options in the decision tree).
2. NIT+K: Non-interactive tree, keys - navigation by keyboard input only; tree available for viewing.
3. IT+K: Interactive tree, keys - navigation by mouse or keyboard, by choice; tree available for viewing or navigation.
4. K: Keys only, no tree - navigation through the maze by keyboard input only.

A sample of the decision tree available to players in game variants 1, 2, and 3.
The results of the test were interesting. None of the groups used the tree for decision-making early on, when paths were obvious and decisions straightforward, but those with access to the decision tree relied on it more heavily the more difficult and "trap-filled" the levels became. Trees were used to avoid traps at first, but eventually (with practice), subjects reported using trees to plan out navigation into specific areas or toward specific objectives. While participants tended to dislike navigation through clicking on the tree, they preferred the game variants with access to the tree to the one without it. Most of the participants felt that the game would be too difficult without access to the decision tree, requiring extensive trial and error which would rapidly turn it from a fun challenge into a repetitive chore.

Decision trees are integral to many games, both for players and for designers, shaping how a game functions and how its AI makes decisions. The study ends with recommendations for further research into how decision trees can support players' reasoning without intrusively ruining the gameplay experience.

Source:
Robert Haworth, Sousan Sheida Tagh Bostani, and Kamran Sedig, “Visualizing Decision Trees in Games to Support Children's Analytic Reasoning: Any Negative Effects on Gameplay?,” International Journal of Computer Games Technology, vol. 2010, Article ID 578784, 11 pages, 2010. doi:10.1155/2010/578784 Available online at http://www.hindawi.com/journals/ijcgt/2010/578784/cta/

Summary of Findings (White Team): Multi-Criteria Intelligence Matrix (3 out of 5 Stars)

Note: This post represents the synthesis of the thoughts, procedures and experiences of others as represented in the articles read in advance (see previous posts) and the discussion among the students and instructor during the Advanced Analytic Techniques class at Mercyhurst College on 15 March 2012 regarding MCIM specifically. This technique was evaluated based on its overall validity, simplicity, flexibility and its ability to effectively use unstructured data.

Description: What are Multi-Criteria Intelligence Matrices?
Multi-Criteria Intelligence Matrices (MCIM) is a family of analytical intelligence methods, adapted from Multi-Criteria Decision Making (MCDM), that identifies likely courses of action and screens them against various criteria. MCIM is used to reduce uncertainty about the likely outcomes and decisions of non-self actors. It uses an MCDM table matrix to convert primarily qualitative information into quantitative data.

Strengths:
  • Helps to reduce uncertainty
  • Relevant to many different types of decisions
  • Assists in converting qualitative data into quantitative data
  • Assists in defining key issues in a group setting
  • Helps in analyzing the decision making process of non-self actors  
Weaknesses:
  • Improper weighting of criteria can result in a single contentious data point completely changing the output of a decision matrix.
  • Difficult-to-quantify indicators/criteria can cost time and confuse discussion.
  • Due to limits on time and effort, relevancy is a key factor in determining whether COAs/criteria should be included.
  • Assumes there is a finite number of COAs
  • Highly subjective evaluation in some cases
  • Easily manipulated or skewed.
How-to:
  • Identify question/objective
  • Identify possible courses of action (COAs)
  • Screen COAs
  • Identify criteria by which to rate the COAs
  • Weight criteria by importance (weight serves as a score multiplier)
  • Identify a scoring scale (1 to 3 and 1 to 5 are common)
  • Evaluate each COA against each criterion, entering the appropriate score on the scoring scale
  • If in a group situation, expect discussion/dissension while scoring some criteria.
  • Multiply the score in each cell of the matrix by the weight of its criterion.
  • Enter the sum of each COA's weighted scores in the far-right Totals column.
  • The COA with the highest total score is the most likely/optimal COA for the question or objective under consideration.
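The how-to steps above amount to a weighted-sum matrix. A minimal Python sketch follows; the criteria, weights, COAs and scores are hypothetical placeholders, not the class's actual matrix.

```python
# Hypothetical criteria and weights (weight = score multiplier).
WEIGHTS = {"feasibility": 3, "fit_with_goals": 2, "low_risk": 1}
SCALE = (1, 5)  # scoring scale

# Hypothetical COAs, each scored against each criterion on the 1-5 scale.
MATRIX = {
    "COA 1": {"feasibility": 4, "fit_with_goals": 2, "low_risk": 3},
    "COA 2": {"feasibility": 2, "fit_with_goals": 5, "low_risk": 4},
    "COA 3": {"feasibility": 3, "fit_with_goals": 3, "low_risk": 2},
}


def totals(matrix, weights, scale):
    """Multiply each cell by its criterion's weight and sum across each COA's row."""
    lo, hi = scale
    result = {}
    for coa, row in matrix.items():
        if set(row) != set(weights):
            raise ValueError(f"{coa} must be scored on every criterion")
        if any(not lo <= s <= hi for s in row.values()):
            raise ValueError(f"{coa} has a score outside {scale}")
        result[coa] = sum(weights[c] * s for c, s in row.items())
    return result


# Highest total first: the top row is the most likely/optimal COA.
ranked = sorted(totals(MATRIX, WEIGHTS, SCALE).items(), key=lambda kv: kv[1], reverse=True)
for coa, total in ranked:
    print(coa, total)
```

Here COA 2 wins by a single point over COA 1, which also illustrates the weakness noted above: nudging one weight or one contentious score can change the answer.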
Personal Application of Technique:
For the activity, the class was divided into two teams and given a question, along with a blank MCDM table matrix. The teams were given 15 minutes to complete the matrices.
Q1 - What actions will the Iranian government take in the next 60 days regarding its production of enriched uranium, and protecting its nuclear program?
Q2 - What is the timeline for the US withdrawal from Afghanistan?

Each group was tasked with preliminary discussion/research of the question and then asked to fill in the matrix as a team. The Q2 team (White Team) completed the matrix using only general knowledge of the topic. The criteria were weighted relative to one another, based on group agreement. Through this exercise, the class discovered practical strengths and weaknesses (included above) of applying this method in a team setting.

Summary of Findings (Green Team): Multi-Criteria Intelligence Matrix (3 out of 5 stars)

Note: This post represents the synthesis of the thoughts, procedures and experiences of others as represented in the 12 articles read in advance (see previous posts) and the discussion among the students and instructor during the Advanced Analytic Techniques class at Mercyhurst College in March 2012 regarding Multi-criteria Intelligence Matrices specifically. This technique was evaluated based on its overall validity, simplicity, flexibility and its ability to effectively use unstructured data.

Description:
Multi-criteria Intelligence Matrices is a family of intelligence analysis methods that quantifies data to reduce decision makers' uncertainty about the possible courses of action of external actors. Analysts judge each course of action against a set of criteria to develop a score of its utility. The method translates both qualitative and quantitative data into a matrix in which analysts can weight different criteria to identify the most likely course of action.

Strengths:
  • Ability to quantify any type of data
  • Ability to rank criteria by relative importance
  • Can be broken down to the likely courses of action for specificity
  • Has multiple applications
  • Can use multiple mathematical formulas to improve accuracy
  • Easy to construct and use
  • Helps a team focus on a specific problem

Weaknesses:
  • Reliability depends on the selection of the right criteria
  • Weighting scale can be manipulated to produce biased results
  • Considers only a finite number of actions
  • Need to analyze data from an outside perspective
  • Criteria may not be what is really relevant to the actors
  • Defining scale for measurement and ranking is possibly subjective
  • All entries must be in question form for the process to work properly

How-to:
  1. Define your problem
  2. Establish Criteria necessary to answer problem
  3. Establish a scoring system for each criterion (for example: 1-3 or 1-5), including any weighting assigned to specific criteria.
  4. Brainstorm possible courses of action
  5. Score the utility of each possible course of action against each criterion
  6. Build a matrix with criteria across the columns and courses of action down the rows
  7. Add up the score for each course of action; the highest score is the most likely outcome
Personal Application of Technique:
What actions will the Iranian government take in the next 60 days regarding its production of enriched uranium, and protecting its existing nuclear program?

Constructing the MCIM requires thought about the most likely courses of action (this requires some existing knowledge and/or experience and can introduce bias). The criteria can also be influenced by bias and need to be carefully selected in order to make the MCIM useful. We identified four criteria and potential courses of action. Deciding on the courses of action proved to be difficult. We were allocated 15 minutes to research the topic. Our analysis showed that the most likely outcome would be for Iran to follow the UN guidelines.

Rating: 3 out of 5

Applying the MCDM technique to selecting the right Geographic Information Systems (GIS) software to invest in for a project

Introduction

In this paper, the author Khalid Eldrandaly elaborates on how MCDM is the most effective methodology for Geographic Information System software selection. The author’s overall goal was to provide a framework that assists computer system developers to select the most appropriate GIS software application for their organization.

Summary

In this paper, the author proposes applying the MCDM technique to select the GIS software that best fits an organization’s needs before purchasing or investing in it. According to the author, “a multi criteria decision problem generally involves choosing one of several alternatives based on how well those alternatives rate against a chosen set of structured and weighted criteria.” The author simplifies the GIS software selection process as follows:
  • Brainstorm the problem
  • Build the hierarchy
  • Rate the hierarchy
  • Select the best alternative

The author identifies the analytic hierarchy process (AHP), a type of MCDM, as the most effective method for selecting GIS software. AHP allows both objective and subjective factors to be considered in selecting the best alternative, saves time, and helps build consensus among decision makers. To apply the technique, one should first define the factors and requirements of the project, and then narrow the candidates down to the GIS software packages that meet the most requirements.

According to the author, the hierarchy process is based on three principles: decomposition, comparative judgments, and synthesis of priorities. In developing a hierarchy, the top level is the ultimate goal of the decision at hand. The author uses Figure 1 (below) to illustrate the four-level hierarchical structure of a simplified GIS software selection problem.
Figure 1

Once the goal or requirement is established, it is necessary to identify the factors that influence the choice of GIS software. The author identified five essential evaluation criteria to use for the decision making process:
- Cost: The expenditure associated with the GIS software, including product, license, training, maintenance, software subscription, and support services costs.
- Functionality: The extent to which the software package contains all the features and functions specified in your request for proposal (RFP), which is generated from the organization's needs assessment.
- Reliability
- Usability: The effectiveness, efficiency and satisfaction with which specified users can achieve specified goals in particular environments; it covers understandability, learnability, and operability.
- Vendor: The quality of vendor support and the vendor's characteristics are of major importance in selecting software.

Once the criteria are set, methods for judging the importance of the criteria and scoring the alternatives need to be developed. There are two methods for weighting: direct comparison and pairwise comparison. After weighting the criteria, the author suggests using MCDM software to list the available GIS software packages and then rate each package against each criterion, as shown in Figure 2.

Figure 2
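A sketch of the pairwise-comparison route to criterion weights, followed by AHP's synthesis step, is shown below in Python. The judgments and the two packages' local priorities are hypothetical. Saaty's AHP derives weights from the principal eigenvector of the comparison matrix; this sketch swaps in the row geometric-mean method, a common approximation that gives the same weights when the judgments are perfectly consistent (as these were chosen to be). A real application would also check the consistency ratio, which is omitted here.

```python
import math

CRITERIA = ["cost", "functionality", "reliability", "usability", "vendor"]

# Hypothetical pairwise judgments on Saaty's 1-9 scale: (a, b): v means
# "a is v times as important as b". The reciprocal (b, a) is filled in automatically.
JUDGMENTS = {
    ("functionality", "cost"): 2, ("functionality", "reliability"): 2,
    ("functionality", "usability"): 4, ("functionality", "vendor"): 4,
    ("cost", "reliability"): 1, ("cost", "usability"): 2, ("cost", "vendor"): 2,
    ("reliability", "usability"): 2, ("reliability", "vendor"): 2,
    ("usability", "vendor"): 1,
}


def pairwise_matrix(items, judgments):
    """Build the full reciprocal comparison matrix from the upper-triangle judgments."""
    idx = {name: k for k, name in enumerate(items)}
    m = [[1.0] * len(items) for _ in items]
    for (a, b), v in judgments.items():
        m[idx[a]][idx[b]] = float(v)
        m[idx[b]][idx[a]] = 1.0 / v
    return m


def priorities(m):
    """Row geometric means, normalized to sum to 1 (approximates the eigenvector)."""
    gm = [math.prod(row) ** (1.0 / len(row)) for row in m]
    total = sum(gm)
    return [g / total for g in gm]


weights = dict(zip(CRITERIA, priorities(pairwise_matrix(CRITERIA, JUDGMENTS))))

# Synthesis: hypothetical local priorities of two GIS packages under each criterion.
LOCAL = {
    "Package X": {"cost": 0.6, "functionality": 0.3, "reliability": 0.5, "usability": 0.7, "vendor": 0.4},
    "Package Y": {"cost": 0.4, "functionality": 0.7, "reliability": 0.5, "usability": 0.3, "vendor": 0.6},
}
scores = {pkg: sum(weights[c] * p for c, p in loc.items()) for pkg, loc in LOCAL.items()}
print(weights)
print(scores, "->", max(scores, key=scores.get))
```

`math.prod` requires Python 3.8 or later.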


Source:  
Eldrandaly, K. (2007) – GIS Software Selection; a multi-criteria decision making approach. Retrieved from

Wednesday, March 14, 2012

Effectiveness of Multi-Criteria Intelligence Matrices in Intelligence Analysis

Introduction:

Lindsey Jakubchak examines the use of Multi-Criteria Intelligence Matrices (MCIM), which focuses on likely Courses of Action (COA) to be taken by external organizations. This differs from the conventional form of MCDM, which typically concentrates on an organization’s own COA.

Summary:

MCDM is a well-known method for determining a course of action based on a variety of criteria, goals, and objectives. Many different models and theories exist, and they are applied in a wide variety of enterprises. The author proposes converting MCDM into an intelligence methodology termed Multi-Criteria Intelligence Matrices. Instead of focusing on internal goals and priorities, MCIM is meant to analyze the likely COAs of external organizations while aiding both efficiency and thoroughness.

In order to test MCIM as an intelligence tool, Jakubchak conducted an experiment using two groups, a control group and an experimental group. Both groups were given the same real life intelligence scenario, asking about the future relationship between Russia and OPEC. However, the experimental group was given a 35-minute introduction to MCIM and its use, while the control group was merely given the question. Prior to the test, the control group indicated that they were both more interested in the topic and also more knowledgeable about the relationship between Russia and OPEC.

Interestingly, after the experiment was finished, the experimental group produced a larger number of COAs than the control group, did so in less time, and yet felt more strongly that they had been given adequate time to complete the task. Unfortunately, the issue that the groups were analyzing was not resolved within the timeframe of the project, so the accuracy of the MCIM could not be measured. In addition to being timelier and more efficient, the experimental group also expressed higher analytic confidence in their work and, due to the incorporation of various COAs in their analysis, maintained higher objectivity while incorporating alternative analysis, something that many members of the control group failed to do.

While the accuracy of MCIM remains in question, the author contends that the potential benefits warrant giving it a further look. The experiment indicates a positive influence on efficiency, analytic confidence, and objectivity. Furthermore, the table matrix allows for quick review and adaptation should unforeseen circumstances arise that could potentially alter an organization’s COA.

Source:

Jakubchak, L. N. (2009). The Effectiveness of Multi-Criteria Intelligence Matrices in Intelligence Analysis. Conference Papers -- International Studies Association, 1-27. Retrieved from: https://www.e-education.psu.edu/drupal6/files/The_Effectiveness_of_Multi-Criteria_Intelligence_Matrices_in_Intelligence_Analysis_isa09_proceeding_313553.pdf

Uses and Misuses of Multi-Criteria Decision Analysis (MCDA) in Environmental Decision-Making

Introduction

Even the best methods can be irrelevant if used improperly. In this article, Steele, Carmel, Cross and Wilcox discuss how failing to align the weights of criteria with the scales of performance indicators can skew the results of otherwise scientifically sound Multi-Criteria Decision Analysis (MCDA) methods. Written in the context of environmental decision-making, the article evaluates how sensitive final rankings are to the scales of performance indicators and the choice of criteria weights. The authors place specific emphasis on how this relates to the Analytic Hierarchy Process (AHP), a type of MCDA.

Summary

The authors discuss the importance of correctly scoring the criteria. If two criteria are given the same weight, they should be measured on comparable scales so that they are actually of equal importance. Much emphasis has been placed on sensitivity analysis, which shows how tolerant the final scores are to changes in the weight of each criterion. However, the article points out that, in addition to the weights, the indicators used to measure performance on each criterion have an overlooked but significant influence on the final rankings. In environmental problems, the focus of this case study, it is common for MCDA to include multiple criteria with differing measures of performance. The scaling of a criterion's indicator can easily skew the results: an option's final ranking is sensitive to how widely or narrowly the analyst scales the indicator.

Yet this is only true when weights are held constant. The scaling of performance-scoring indicators must be calibrated with the weighting for the technique to be accurate. According to the article, researchers must “appreciate and make explicit in their methodology the fact that criteria weights, taken on their own, are meaningless.” Instead of conducting further sensitivity analysis of weights and scales independent of each other, researchers would benefit from considering the interplay of these two factors from the beginning. If they do not properly account for the relationship between weighting and scaling, researchers risk increasing the arbitrariness of their analysis. Without clearly defined indicator scales, it is likely that stakeholders, especially those with limited understanding of the method, will have conflicting ideas of the scales’ importance, thereby decreasing the effectiveness of this technique.
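The point that weights are meaningless on their own can be shown with a small Python example. The two options, their raw values, and the indicator ranges are invented for illustration, not taken from the article. The weights stay fixed at 0.5 each; only the range used to rescale the habitat indicator changes, yet with the wide range Option A ranks first and with the narrow range Option B does.

```python
# Hypothetical raw performance of two options (higher is better on both criteria).
RAW = {
    "Option A": {"habitat_ha": 40, "savings_k": 8},
    "Option B": {"habitat_ha": 60, "savings_k": 5},
}
WEIGHTS = {"habitat_ha": 0.5, "savings_k": 0.5}  # held constant throughout


def scores(raw, weights, scales):
    """Weighted sum of indicators rescaled linearly from (worst, best) to (0, 1)."""
    out = {}
    for option, values in raw.items():
        out[option] = sum(
            weights[c] * (x - scales[c][0]) / (scales[c][1] - scales[c][0])
            for c, x in values.items()
        )
    return out


# Same weights, two different analyst choices for the habitat indicator's range.
wide = scores(RAW, WEIGHTS, {"habitat_ha": (0, 100), "savings_k": (0, 10)})
narrow = scores(RAW, WEIGHTS, {"habitat_ha": (40, 60), "savings_k": (0, 10)})
print("wide:", wide, "->", max(wide, key=wide.get))
print("narrow:", narrow, "->", max(narrow, key=narrow.get))
```

Narrowing the range stretches a 20-hectare difference across the whole 0-1 scale, so the same 0.5 weight now buys much more influence for that criterion.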

This problem is intensified by multi-criteria methods that obscure the relationship between weighting and scaling; according to the article, this is the predominant issue with AHP. The authors offer several options for improving how weights are assigned to criteria, including using the SMART multi-criteria model and asking decisionmakers to answer pairwise comparisons to determine the relative importance of each criterion. The authors conclude that stakeholders and decision modelers can overcome these challenges by discussing the importance of calibrating performance indicator scales and by gaining a better understanding of how changes to these scales affect a criterion's ranking.

Conclusion

Overall, the article was useful in highlighting an issue with common applications of MCDM. However, it does not go into much detail about specific techniques for addressing the issue. Still, being aware of the problem would be likely to help decisionmakers and analysts assign more accurate values, and therefore increase the accuracy of the method.

Source

Steele, K., Carmel, Y., Cross, J., & Wilcox, C. (2009). Uses and Misuses of Multi-Criteria Decision Analysis (MCDA) in Environmental Decision-Making. Risk Analysis, 29(1). doi:10.1111/j.1539-6924.2008.01130.x