Saturday, March 16, 2013

Implementing and Integrating Crime Mapping into a Police Intelligence Environment



Summary:

Digitization of police records such as crime records and service calls may provide decision makers with important information regarding crime trends, series, and patterns.  Crime mapping is a technique that coordinates and enhances the analysis of crime data already in the possession of law enforcement agencies.  Crime mapping combines multiple intelligence resources to establish a unified structure that improves data analysis at a low cost.  A geographic information system (GIS) assists in the mapping as well as in the geographical analysis of known crime records and service calls.  GIS relies on geocoding, a method that references incident locations so they can be mapped within a GIS.
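To make the geocoding step concrete, here is a minimal sketch using the geopy library (one of several geocoding options); the incident records, addresses, and field names are invented for illustration, not drawn from the article.

```python
# A minimal geocoding sketch with geopy (pip install geopy).
# The incident records below are hypothetical examples.
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="crime_mapping_demo")

# Hypothetical police records: an incident type and a street address.
incidents = [
    {"id": 1, "type": "burglary", "address": "175 5th Avenue, New York, NY"},
    {"id": 2, "type": "theft", "address": "1 Infinite Loop, Cupertino, CA"},
]

for incident in incidents:
    location = geolocator.geocode(incident["address"])
    if location is not None:
        # Attach coordinates so the incident can be plotted in a GIS layer.
        incident["lat"] = location.latitude
        incident["lon"] = location.longitude

print(incidents)
```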

One disadvantage of crime mapping is that not only does it require a large volume of information in order to be effective, but the information must also be retrieved from police databases.  In addition, the data must be processed into a format that is acceptable to the GIS.  Police databases are often incompatible with GIS software, which poses further difficulties in coordinating the data.

The most common analytical paths of crime mapping include geographical profiling of repeat offenders and mapping of high-volume crime.  Figure 1 depicts the three inputs required for spatial crime analysis: a GIS, crime data, and digital maps.  While serial repeat-offender crimes are analyzed by analysts with specialized training related to a certain crime, high-volume crimes are analyzed by divisional analysts.


Serial crime investigation, or criminal profiling, assumes that offenders commit crimes in areas they are most familiar with but away from their home address.  Hence, crime mapping is useful for determining the likely area of residence of these offenders.  A specialist analyst is needed to perform the analysis due to the sensitive nature of these offenses.  High-volume analysis focuses on local areas or divisions, and the demands of this type of analysis vary depending on local crime rates.  This type of crime mapping is restricted to geographical boundaries.

In the UK, crime mapping has been applied to drug incidents, gang violence, and serial rapist investigations.  While crime mapping can assist law enforcement authorities in formulating crime reduction strategies, it is also prone to implementation issues.  Due to the highly technical nature of crime mapping, police departments may have to employ skilled IT professionals.  Often these professionals do not understand the user requirements associated with crime mapping.  At the time of the article, only 23 of the 52 police forces in the UK had crime mapping capabilities at the divisional level.  Crime mapping is more effective in a decentralized environment where user-led implementation is more prominent.

Critique:

This article is unique in that it focuses on law enforcement intelligence analysis from a former police officer’s perspective.  The author’s familiarity with the subject matter adds credibility to the content presented in this article.  Although the author focuses on UK police departments and how crime mapping is likely to benefit them, crime mapping can be utilized as a universal tool to identify crime trends and series in any part of the world.  Crime occurs everywhere to varying degrees.  Crime mapping has the potential to assist decision makers in formulating strategies to combat crime.  It can also serve as a visual tool for identifying crime trends and patterns.  A police department with an adequate budget and enough experts can easily adopt this method.  However, it is likely to be more beneficial for areas with high crime rates.  Serial crimes are rare in occurrence, and their trends are more easily identified than those of high-volume crime.  I am unfamiliar with the cost of maintaining this type of analytic tool (GIS), but it may not be worth the money to adopt this technique solely to analyze serial crimes.

The author identified a number of advantages and disadvantages to adopting crime mapping as an analytical tool.  Most disadvantages arose from the difficulty of processing data before it is entered into the GIS software and from having to employ IT professionals to maintain that software.  This article is a bit outdated, as it was published in 2000, when crime mapping was still in its early stages of development.  Due to technological advances within the past 13 years, it is likely that the early stages of crime mapping analysis have become more automated, while the later stages are conducted by experts.  Also, in recent years most organizations have maintained IT departments, so it is safe to assume that police departments may not have to create an IT department just to maintain the software.

Although GIS software is capable of analyzing data across jurisdictions, I agree with the author in terms of utilizing crime mapping to organize and analyze data at the local or divisional level.  If GIS is used to analyze crime patterns across a number of jurisdictions, the final output is highly likely to be affected by the amount of communication and the amount of crime data shared between police departments.  For the past few months, I have learned that in the United States there is a lack of intelligence sharing and a lack of communication between agencies in the intelligence community.  Assuming this is also true of law enforcement agencies in the United States, GIS is more beneficial for analyzing local crime data.  However, it is important to emphasize that this is not a limitation of the crime mapping technique, but rather a limitation created by human error.



Source: 
Ratcliffe, J. (2000). Implementing and Integrating Crime Mapping into a Police Intelligence Environment. International Journal of Police Science & Management, 2(4), 313-323. Retrieved from http://www.jratcliffe.net/papers/Ratcliffe (2000) Implementing and integrating crime mapping.pdf

Thursday, March 14, 2013

Summary of Findings (Green Team): Decision Tree Analysis (3.5 out of 5 Stars)


Green Team:

Rating (3.5 out of 5 Stars)

Note: This post represents the synthesis of the thoughts, procedures and experiences of others as represented in the 8 articles read in advance (see previous posts) and the discussion among the students and instructor during the Advanced Analytic Techniques class at Mercyhurst University in March 2013 regarding Decision Tree Analysis specifically. This technique was evaluated based on its overall validity, simplicity, flexibility and its ability to effectively use unstructured data.

Description:
Decision tree analysis is a technique that helps determine the best course of action by analyzing the costs, benefits, and probability of success of possible decisions. The technique produces multiple branching paths, beginning with the root question, followed by possible courses of action with their costs, benefits, and probabilities of success. The technique combines these three items to determine the expected value, or EV. The EV signifies the final product of the process.

Strengths:

  • Can be built in multiple directions
  • Gives the analyst the ability to work through potential future outcomes

Weaknesses:

  • Easy to grow beyond manual computational capacity
  • Limits the innovation of potential products/decisions
  • Relies on others for some numerical information -- it is difficult to determine where the probability numbers originate
  • Doesn’t ferret out incorrect information or information that is intended to be deceptive

How-To:
  1. Decide which direction the analysis will go:
     • Start with a central idea and branch out, or
     • Start with many factors and work towards a central idea
  2. Label items consistently and appropriately
  3. Compute relevant figures to show all options
  4. Select the option with the highest potential benefit, accounting for costs

Personal Application of Technique:
The class collaboratively built a decision tree to identify the product development strategy the hypothetical company Really Big Ideas, Inc. should pursue. The company could choose to develop one of two new products, a motion detector or a fire and smoke detector, or neither product. Three nodes were added from the root of “product development” to demonstrate the company’s options, with the links indicating the cost related to each option. Next, nodes were linked to the applicable areas to demonstrate the potential success or failure of each product, based on a percentage. A mathematical formula (Expected Value = [Potential Revenue * Chance of Success] + [Potential Cost * Chance of Failure]) was used to determine which decision rule would produce the largest profit margins.
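To make the class's formula concrete, the sketch below computes the expected value for each option; the revenue, cost, and probability figures are invented, as the actual numbers from the exercise are not reproduced here.

```python
# A minimal sketch of the expected value formula used in class, with
# hypothetical figures. Costs are entered as negative payoffs.

options = {
    # name: (potential_revenue, potential_cost, chance_of_success)
    "motion detector":         (1_000_000, -250_000, 0.6),
    "fire and smoke detector": (  800_000, -150_000, 0.7),
    "neither product":         (        0,        0, 1.0),
}

evs = {}
for name, (revenue, cost, p_success) in options.items():
    # EV = (revenue * chance of success) + (cost * chance of failure)
    evs[name] = revenue * p_success + cost * (1 - p_success)
    print(f"{name}: EV = {evs[name]:,.0f}")

print("Best option:", max(evs, key=evs.get))
```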

Summary of Findings (White Team): Decision Tree Analysis (3 out of 5 Stars)

Note: This post represents the synthesis of the thoughts, procedures and experiences of others as represented in the 8 articles read in advance (see previous posts) and the discussion among the students and instructor during the Advanced Analytic Techniques class at Mercyhurst College in March 2013 regarding Decision Tree Analysis specifically. This technique was evaluated based on its overall validity, simplicity, flexibility and its ability to effectively use unstructured data.

Description:
A decision tree is a diagram of nodes and branches.  The nodes indicate decision points, chance events, or branch terminals.  The branches correspond to each decision alternative or event outcome connected to a node.  Decision trees can also be utilized as a predictive tool.
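One minimal way to picture this node-and-branch structure in code is sketched below; the class names and fields are invented for illustration.

```python
# A hypothetical representation of a decision tree's nodes and branches.
from dataclasses import dataclass, field

@dataclass
class Branch:
    label: str                   # decision alternative or event outcome
    probability: float = 1.0     # 1.0 on decision branches, <1.0 on chance branches
    payoff: float = 0.0          # cost (negative) or revenue (positive)
    child: "Node | None" = None  # next node, or None at a branch terminal

@dataclass
class Node:
    kind: str                    # "decision", "chance", or "terminal"
    branches: list[Branch] = field(default_factory=list)

# Example: a chance node with success/failure outcomes (invented figures).
launch = Node("chance", [Branch("success", 0.7, 800_000),
                         Branch("failure", 0.3, -150_000)])
```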

Strengths:  
1. Able to graphically display alternative choices for a decision-maker and the probability of certain courses of action.   
2. Able to display different indicators for actions/warning list.
3. Display interrelationships between nodes and branches.
4. Can be useful for reducing uncertainty and giving estimates in terms of probabilistic thinking.

Weaknesses:
1. Can become rather large, with many decision paths; beyond a certain size, the user must rely on a computer to create and evaluate more robust decision trees.
2. Reliance on calculated probabilities and the individual who calculated those probabilities.
3. Lacks the ability to anticipate other information that may eventually become a factor in the decision tree; only displays information known to the creator.
4. Decision trees lack the ability to take into account deceptive information and how it could affect both final outcomes and calculated probabilities within the decision tree.

Step by Step Action:
1. Draw a root node and extend branches for each decision alternative, including the decision to do nothing.
2. Label each branch and include the cost of each decision.
3. Draw a chance node for each decision and include two branches stemming from each node labeling the chance of success and failure.
4. Label the payoff of each success and failure by subtracting the cost from the expected revenue.
5. Label the probability of each success and failure in decimal form, then apply a formula to calculate the expected value (EV); the exact formula will depend on how the decision tree is constructed.
6. The highest expected value indicates the best decision, the one that most reliably reduces uncertainty.

Exercise:
The class conducted a decision tree exercise on two potential products, asking which the decision-maker should choose based on which would be more profitable.  Probabilities of success were given for each product, along with the cost to make each product and the potential economic gain from each.  A decision tree was used to structure the problem, with an expected value calculated at the end highlighting which product would be the best for the company to produce.  Through the flow of the decision tree, the class was able to calculate which product was the more reliable choice to produce using the formula: EV = (Payoff x Prob of Success) + (Cost x Prob of Failure).



Further Information:
This is a very good book chapter on decision trees, giving an excellent description of how decision trees are utilized along with need-to-know information pertaining to them.  A summary and critique of this chapter, written by Ethan Robinson, can be found on this blog.

http://www.public.asu.edu/~kirkwood/DAStuff/decisiontrees/DecisionTreePrimer-1.pdf   

Tuesday, March 12, 2013

Phases vs. Levels Using Decision Trees for Intrusion Detection Systems


Summary:
An article in the International Journal of Computer Science and Information Security by Heba Ezzat Ibrahim, Sherif M. Badr, and Mohamed A. Shaheen compares phase decision trees to level decision trees.  The authors state that decision trees are a useful and commonly used tool for detecting intrusions into computer networks.  Decision trees break the data down into likely attributes and then assign detection-percentage values to the resulting nodes.



For their paper, the authors compared phase and level models.  The phase model is divided into three stages: the first detects whether the incoming data is normal or an attack, the second detects whether it is a DoS, probe, R2L (remote to local), or U2R (user to root) attack, and the third detects the various intruder types from the previous step.  This model differs from the level model in that its steps are sequential.

The level model arranges each stage as a separate process that detects attacks individually and then tries to label them regardless of the completion of the previous step.  Instead of treating detection as a single process (one tree), it treats each phase as a different section (three trees).  This allows for detection of false negatives of network attacks that the phase model may miss in the first step.
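A rough sketch of the two arrangements, using scikit-learn decision trees on placeholder arrays (the KDD'99 preprocessing and feature extraction are omitted, and all variable names are invented):

```python
# Hypothetical sketch contrasting the phase (sequential) and level
# (independent) arrangements, using scikit-learn decision trees.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: X holds connection features; the y_* arrays hold the
# labels for each stage (normal/attack, attack category, specific type).
rng = np.random.default_rng(0)
X = rng.random((200, 10))
y_is_attack = rng.integers(0, 2, 200)    # stage 1: normal vs attack
y_category = rng.integers(0, 4, 200)     # stage 2: DoS / probe / R2L / U2R
y_type = rng.integers(0, 23, 200)        # stage 3: specific attack type

# Phase model: each stage only sees records passed on by the previous one.
stage1 = DecisionTreeClassifier().fit(X, y_is_attack)
flagged = stage1.predict(X) == 1
stage2 = DecisionTreeClassifier().fit(X[flagged], y_category[flagged])

# Level model: each classifier runs independently on all records, so a
# record missed at "stage 1" can still be labeled by the later trees.
level1 = DecisionTreeClassifier().fit(X, y_is_attack)
level2 = DecisionTreeClassifier().fit(X, y_category)
level3 = DecisionTreeClassifier().fit(X, y_type)
```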

The authors found that the phase model detects threats more frequently than the level model.  Additionally, they found that the phase approach classifies new attacks more frequently.  The level model does show more 100% detection rates than the phase model, but on average its percentage rates are not higher.  Not only does a phase model decision tree show better consistency, it also mirrors how real-world attack prevention software processes an incoming threat through logical steps instead of trying all options simultaneously.

Critique:  
The authors successfully explained the process of how a network attack can be detected through the use of decision trees.  I personally do not have any background in computer networks, but I feel it was not too difficult to understand the reasoning for running the two separate models to compare consistency.  This topic does not relate specifically to the intelligence field; however, it does relate to cyber security through computer network defense.

Despite arguing well for certain parts of the paper, I found several issues.  The first is that the authors never clearly define the 23 types of attacks drawn from the second stage.  Without this information, I feel it is difficult to believe their results are accurate when I am unable to tell exactly what they are detecting.  They also do not thoroughly describe the processes through which they run the data, other than stating that the inputs are either new attacks or partitioned data.  Additionally, the authors state that the data set they use (the KDDCUP'99 data set) is the best available but has some inherent problems.  Those problems are never well explained (although they did eliminate duplicate entries); instead, the authors note that other people have used the data set and treat that as justification for using it themselves.

Ibrahim, Heba Ezzat, Sherif M. Badr, and Mohamed A. Shaheen. (2012). Phases vs. Levels using Decision Trees for Intrusion Detection Systems. International Journal of Computer Science and Information Security, 10.8.  Retrieved from http://arxiv.org/ftp/arxiv/papers/1208/1208.5997.pdf  

Decision support system for risk management: A case study

Summary:


Prasanta Kumar Dey uses an analytical hierarchy process (AHP) in conjunction with decision trees to evaluate the risk of a cross-country petroleum pipeline construction project from a quantitative angle. He states that a traditional, informal approach to project management does not assess risk efficiently, and he proposes a more effective method, demonstrated through a particular case.

Dey states that risk management takes three steps: identifying, analyzing, and responding to the risk. He takes the approach of first using an analytical hierarchy process, a multi-criteria decision-making methodology, to quantitatively, and therefore less subjectively, measure both the likelihood of a particular risk and its severity. This allows for what he believes to be a more accurate and less subjective decision tree, because the factors have been given a quantitative measurement.

He then constructs the decision tree using the possible courses of action and subsequent outcomes, including the probability and severity of each. His decision trees focus primarily on monetary risk for the business and the length of time it would take for each decision to come to fruition. The final outcome then allows the decision-maker to respond to any risks present, knowing their likelihood and severity, and to evaluate alternatives.
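As a rough illustration of how AHP priorities could feed a decision tree, the sketch below derives weights from a pairwise comparison matrix via its principal eigenvector; the risk names and judgment values are invented, not taken from Dey's case study.

```python
# Hypothetical AHP sketch: derive risk weights from pairwise comparisons.
import numpy as np

risks = ["construction delay", "cost overrun", "environmental damage"]

# Invented reciprocal judgment matrix: A[i, j] says how much more
# important/likely risk i is judged to be than risk j.
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

# The AHP priority vector is the principal eigenvector, normalized to 1.
eigvals, eigvecs = np.linalg.eig(A)
principal = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
weights = principal / principal.sum()

for risk, w in zip(risks, weights):
    print(f"{risk}: weight {w:.2f}")
# These weights could then stand in for the probabilities and severities
# attached to the branches of the decision tree described above.
```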


Critique:

Although this study evaluated a pipeline construction project, I believe the methodology can, to some extent, be applied to the intelligence field. The main theory behind the article is reducing risk for a decision-maker, which is the essential function of decision trees and is what intelligence analysts strive to achieve. The extent to which they can be useful is questionable, though. An implicit limitation in the study was the operationalization of risks in the analytical hierarchy process. Dey did not explain how he applied quantities to specific risks, saying only that the numbers were produced through brainstorming sessions with professionals who had been in the field for 15 years. It may be less feasible to objectively quantify the likelihood of a revolt in a foreign country, for instance.

An explicit limitation the author stated was that his methodology and technique are limited to smaller projects. However, he does not state what constitutes a large or a small project and therefore leaves that open to interpretation. It seems as though this methodology would have great applicability for business intelligence, because it did prove to decrease monetary risk for the company Dey studied, but it may not be as useful for national security intelligence. Although this article is quite dated, Dey took an interesting route in conducting a decision tree after first conducting an AHP to decrease subjectivity, which I was curious to evaluate. If this particular dual method could be applied to the intelligence field, it would certainly decrease uncertainty in the formulation of the decision tree and could yield interesting results.



Dey, P. K. (2001). Decision support system for risk management: A case study. Management Decision, 39(8), 634-649. Retrieved from http://portal.uni-freiburg.de/empiwifo/lehre-teaching-1/summer-term-09/materials-seminar-in-risk-management/emeraldinsight-com_dey.pdf

A Copulas-Based Approach to Modeling Dependence in Decision Trees

Summary:


Tianyang Wang and James S. Dyer used copula functions as a basis for decision trees; these copulas contained parameters that were demonstrated through probability trees, which are discrete and conditional.

The process of using a dependent decision tree is:
1. Assessment of marginals, dependence, and copula. During this step, the authors state the need to assess the information available as well as determine what type of copula is best suited to the dependence and the uncertainties present. Statistical measures are used to work through the variables and determine the correct copula.
2. Specification of parameters for the underlying copula. In order to use the copula in a dependency structure, parameters need to be established. For an elliptical copula, the parameters can be estimated from the correlation between the original uncertainties.
3. Construction of the transient tree structure for the underlying copula. The resulting product is a probability tree that uses the conditional probabilities that were established.
4. Point-to-point inverse marginal transformation. This step applies the inverse marginal distributions to transform the tree back to the original variables.

From this process, decision trees are drawn out for various outcomes and the decision tree model is applied to various forms of copulas.
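A rough numerical sketch of steps 2 through 4 for a Gaussian (elliptical) copula appears below; the marginal distributions and correlation are invented, and the construction of the transient tree is simplified to quantile buckets.

```python
# Hypothetical sketch: sample a Gaussian copula, apply inverse marginal
# transforms, and discretize into branches of a probability tree.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Step 2: specify the copula parameter (correlation between uncertainties).
rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])

# Draw correlated standard normals and map to uniforms (the copula sample).
z = rng.multivariate_normal(mean=[0, 0], cov=cov, size=10_000)
u = stats.norm.cdf(z)

# Step 4: point-to-point inverse marginal transformation.
# Invented marginals: cost ~ lognormal, duration ~ exponential.
cost = stats.lognorm.ppf(u[:, 0], s=0.5, scale=100)
duration = stats.expon.ppf(u[:, 1], scale=12)

# Step 3 (simplified): bucket each variable into low/medium/high branches,
# giving the conditional branch probabilities of the transient tree.
cost_branch = np.digitize(cost, np.quantile(cost, [1/3, 2/3]))
dur_branch = np.digitize(duration, np.quantile(duration, [1/3, 2/3]))
joint = np.zeros((3, 3))
for c, d in zip(cost_branch, dur_branch):
    joint[c, d] += 1
print(joint / joint.sum())  # joint branch probabilities reflect dependence
```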

Critique:

This study provided an interesting application of decision trees.  Not only did the study illustrate the various potential outcomes, it also broke down the process into specific steps.  While the application was not directly related to the intelligence field, the process of organizing and structuring decision trees carries over.  This process was specifically geared towards copulas and creating a dependent tree, though there is an issue in that a significantly large number of variables can grow from the initial construct.  The base of the tree is set from parameters established through equations.  While this application of a decision tree is effective for its purpose, it is difficult to gain a deep understanding of the concept in relation to non-statistical elements.  Decision trees are useful for gaining a broad understanding of various different outcomes, and they have the potential to be an effective initial step in an analysis of information.

This methodology appears to be useful in a statistical application, though this study does not directly demonstrate the application from an intelligence stand-point.

Wang, T. & Dyer J. S. (2012). A Copulas-Based Approach to Modeling Dependence in Decision Trees. Operations Research, 60(1), 225-42.

A Genetic Algorithm Optimized Decision Tree-SVM based Stock Market Trend Prediction System


Summary:

The ability to predict stock market trends is highly desirable for traders as well as those who study the market as a career.  This research devised a data-mining-based stock market trend prediction system: a genetic algorithm (GA) optimized decision tree-support vector machine (SVM) hybrid designed to predict one-day-ahead trends.  Rather than approaching trend prediction in the traditional sense as a regression problem, a common approach in previous studies, the research treats it as a classification problem and uses a hybrid system able to adapt to changing market conditions.

The study uses historical time series data from the Bombay stock exchange sensitive index (BSE-Sensex) from January 2, 2007 to October 30, 2010.  A comparison is made to an artificial neural network (ANN) based system and a naïve Bayes based system.  The results show that the trend prediction accuracy is highest for the hybrid system: the genetic algorithm optimized decision tree-SVM hybrid outperforms both the artificial neural network and the naïve Bayes based trend prediction systems.

The system, built on four steps, is intended to give an individual accurate insight into whether they should buy, sell, or hold their stocks.  The first step is the computation of technical indices from the historical stock market data.  Second, the most relevant technical indices are selected using a decision tree; these are then used by a support vector classifier to predict the next day's trend.  The final steps involve GA-based optimization of the decision tree and support vector classifier parameters to ensure the most accurate prediction.  The decision tree is deemed one of the most important aspects of the hybrid, as it plays a significant role in the prediction, the overall intent of the study.
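A much-simplified sketch of this flow appears below, assuming scikit-learn and substituting a random parameter search for the paper's genetic algorithm; the matrix of technical indices and the trend labels are placeholders.

```python
# Hypothetical, simplified sketch of the decision tree -> SVM hybrid.
# A random search stands in for the paper's genetic algorithm optimizer.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.random((500, 12))         # placeholder technical indices
y = rng.integers(0, 2, 500)       # placeholder next-day up/down labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 2: use a decision tree to select the most informative indices.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
selected = np.argsort(tree.feature_importances_)[-5:]

# Steps 3-4: tune SVM parameters (the paper uses a GA; this is a stand-in).
best_score, best_params = -1.0, None
for _ in range(30):
    C, gamma = 10 ** rng.uniform(-2, 2), 10 ** rng.uniform(-3, 1)
    svm = SVC(C=C, gamma=gamma).fit(X_tr[:, selected], y_tr)
    score = svm.score(X_te[:, selected], y_te)
    if score > best_score:
        best_score, best_params = score, (C, gamma)

print("best accuracy:", best_score, "with (C, gamma) =", best_params)
```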

Critique:

Large unforeseen events can significantly influence the market in ways that cannot be predicted or accounted for.  Therefore, this method is useful during times of relative world stability, but when a large unforeseen crisis takes place, it is likely to exhibit inaccuracy.  Furthermore, applying the method to more strenuous tests on other data sets in different markets would have increased confidence in the efficiency and usefulness of the system.  Although it outperformed the ANN and naïve Bayes systems on this particular data set, further tests on other data sets would strengthen its validity and usefulness.

Additionally, the passing reference of stock market speculation as a “regression problem” and the chosen stance as a “classification problem” should be elaborated upon for comparison.  The expansion of the issues associated with the “regression problem” approach using previous studies or basic examples would increase the validity of the “classification problem” approach.   A brief comparison would have been especially useful for individuals new to the field, further clarifying the benefits of the new approach.

Furthermore, due to the hybrid nature of the technique, an increased differentiation between each method is necessary before combining the methods to achieve the hybrid.  A more significant breakdown of each technique would have allowed for a more precise understanding of the overall hybrid.  Due to the decision tree's significance in the study, elaboration would have proved especially useful in this area of the methodology.

Source:  

Nair, B. B., Mohandas, V. P., & Sakthivel, N. R. (2010). A Genetic Algorithm Optimized Decision Tree-SVM based Stock Market Trend Prediction System. International Journal on Computer Science & Engineering, 2981-2988.

Monday, March 11, 2013

An Introduction to Decision Tree Analysis

Summary

Craig Kirkwood, a member of the Department of Supply Chain Management at Arizona State University, published a primer on the use of decision tree analysis in 2002. The first chapter provides increasingly complex examples of decision trees and the various factors that can affect analysis.

Kirkwood begins by giving a simple example of a decision tree. This example involves the decision of a company on whether or not it should manufacture a temperature sensor, pressure sensor, or neither. The costs differ for the sensors, as does the potential profit. This is the most basic decision tree that he presents, and the examples become increasingly complex as more factors are added.

After the basic set-up of a decision tree is given, Kirkwood adds the first new "node": chance. What was a relatively simple analysis of costs now includes the probability that a product will sell or not. With the addition of this variable, a new concept is introduced: the expected value. Depending on the situation, the best expected value is either the highest or the lowest number. The expected value allows a decision maker to identify the best and worst decisions with a quick glance at that number.

A second variable that Kirkwood introduces is dependent uncertainties. These further complicate decision trees, as they add a new degree of uncertainty. The example given discusses the uncertainty of a trading company dealing in minerals from a nation that could face trade sanctions from the United States. This leads to the term 'decision tree rollback': calculating the expected values from the endpoints of a decision tree back to the root node, or beginning.
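As a sketch of the rollback idea, the small recursive function below works from the endpoints back to the root of a nested-dictionary tree; the structure and figures are invented rather than taken from Kirkwood's examples.

```python
# Hypothetical sketch of decision tree rollback. Chance nodes average
# their branches by probability; decision nodes take the best branch;
# endpoints carry a payoff.

def rollback(node):
    if "payoff" in node:                       # endpoint
        return node["payoff"]
    if node["kind"] == "chance":               # probability-weighted average
        return sum(p * rollback(child) for p, child in node["branches"])
    if node["kind"] == "decision":             # pick the best alternative,
        return max(rollback(child) + child.get("cost", 0)   # net of its cost
                   for _, child in node["branches"])

tree = {
    "kind": "decision",
    "branches": [
        (None, {"kind": "chance", "cost": -100,
                "branches": [(0.6, {"payoff": 500}), (0.4, {"payoff": 0})]}),
        (None, {"payoff": 50}),                # e.g. do nothing, small sure gain
    ],
}
print(rollback(tree))                          # expected value at the root: 200
```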

Sequential decisions are the last variable that Kirkwood discusses. These further complicate decision trees by adding a second layer (or more) of decisions that must be made after an initial decision. The example Kirkwood uses is a computer company bidding for a government contract. The company must first decide whether it wants to bid for the contract, as there is a cost to creating a prototype to enter the bidding process. The company must then decide how much it will bid. Thirdly, the company has to decide whether or not to use a new, untested manufacturing process that will either save or lose money.

Kirkwood used diagrams of decision trees throughout the chapter. The most complex example (using all of the variables that he discussed) was presented at the end.
Figure 1: Final Decision Tree Example
Critique

Kirkwood gives a very good introduction to decision tree analysis. While his explanation of concepts can be vague, the examples he provides make up for this. He also successfully demonstrates that decision trees can be used for increasingly complex scenarios. Though this is a good introduction to decision trees, there are two main criticisms that should be mentioned.

First, he does not sufficiently explain the formulas used to obtain the expected values on which he bases decisions. While he explains the first equation used to determine the expected value, this was done with only one variable (chance). With each new variable he introduces, the equation gets longer. Though there is no fundamental change in the equation, his discussion could have benefited from an explanation of each equation. As it was, it was difficult to remember how the equation he used came about.

Second, while his examples are very good at demonstrating the different variables that he discusses, they all have the same goal: profit. At no point did he discuss the application of decision trees to situations that did not have the endpoint of profit. Admittedly it would be much more difficult to give examples where the goal was something other than profit, but it would be interesting to see decision tree analysis applied to a problem that was not based on profit and costs.

Source

Kirkwood, C.W. (2002). Decision Trees. Decision Tree Primer, 1-18. Retrieved from  
http://www.public.asu.edu/~kirkwood/DAStuff/decisiontrees/DecisionTreePrimer-1.pdf   

Use of Rehabilitation Decision-Making For Buildings In the Wenchuan Area



Summary:
In 2008, an 8.0 magnitude earthquake occurred in the Wenchuan area of China.  The earthquake left over 88,000 people dead or missing and resulted in economic losses of up to $125.6 billion.  After the earthquake, talks occurred among building owners in the Wenchuan area about whether seismic rehabilitation measures should be pursued.  Most individuals worried that spending the money to upgrade their properties' seismic design would prove futile if another earthquake did not occur.  To present possible decision options to building owners in the Wenchuan area, the authors present a Decision-Making Tree (DMT) model that lays out both the risks and the rewards of each course of action for the building owners.
The DMT model used by the authors consists of five components: decision nodes, decision options, uncertainty nodes, possible outcomes, and end nodes.  Using the DMT model, the authors broke the problem down into three areas on which building owners would want answers: possible damage states and their likelihoods, possible earthquake intensities and probabilities, and rehabilitation options and costs.  The overall goal of the DMT model is to inform building owners which of three options is the most viable: restoring buildings to their original seismic condition, repairing and strengthening seismic resistance, or upgrading seismic resistance to the highest level.
Overall, the use of the DMT model demonstrated that upgrading buildings to the highest seismic resistance was economically justified.  When the authors examined whether seismic rehabilitation would occur, the DMT models showed cost to be the key factor influencing decision-makers.  The authors concluded that the other factors, building vulnerability and the probability of seismic activity, remained constant across regions.  Thus, the DMT model demonstrated that reducing rehabilitation costs made it more likely that building owners would choose to rehabilitate their buildings to higher standards of seismic resistance.

Critique:
The use of DMT modeling as a tool for assessing earthquake probabilities and different options for economic improvement initiatives was a way to reduce uncertainty and unwillingness to improve building quality.  For the purposes of this study, decision-makers needed a willingness to be open to the outcomes of the DMT model despite their prior perceptions.  To make the DMT model more reliable as an analytical technique, the authors could have considered incorporating the building owners' overall attitudes toward improving the seismic stability of their buildings.  Incorporating the building owners' attitudes might offer intriguing insights into the situation in the Wenchuan area as it relates to earthquake probability and citizen perceptions.

However, DMT modeling could be applied to any situation in which a decision-maker considers both the risks and benefits of a proposed action.  DMT models would be especially relevant to analytic endeavors in the intelligence community.  DMT modeling is not only a way to visually display different outcomes of action for a decision-maker, but is also effective at analyzing information in probabilistic terms.  It allows for probabilistic thinking that creates more reliable estimates using words of estimative probability.  Overall, DMT modeling is an effective way to display the options available to a decision-maker and can present both risks and benefits in terms of probability measures.  In the intelligence field, probability measures are the most crucial aspect of intelligence estimates and are what decision-makers look for to reduce their uncertainty about the actions proposed to them.



Source:
Zhang, H., Xing, F., & Liu, J. (2011). Rehabilitation Decision-Making For Buildings In the Wenchuan Area. Construction Management and Economics, 29, 569-578. Retrieved from http://content.ebscohost.com/pdf25_26/pdf/2011/1JM/01Jun11/62872628.pdf?T=P&P=AN&K=62872628&S=R&D=bsh&EbscoContent=dGJyMNHr7ESep644y9fwOLCmr0ueqK5Sr6u4SLGWxWXS&ContentCustomer=dGJyMPGstFGwprVLuePfgeyx44Dt6fIA

Sunday, March 10, 2013

Use of Decision Trees in Preventing Car Accidents


Introduction:
Juan de Oña, Griselda López, and Joaquín Abellán’s article Extracting Decision Rules from Police Accident Reports through Decision Trees explores the use of decision trees to identify the main factors that contribute to the severity of road accidents. The authors’ goal is to find ways to extract decision rules from the decision tree methodology that can be used by road safety analysts to address specific problems that contribute to severe car accidents in order to prevent crashes and in turn fatalities.  

Summary:
According to de Oña et al. (2013), “the term decision trees (DTs) encompasses a series of techniques for extracting processable knowledge, implicit in databases, which is based on artificial intelligence and statistical analysis” (p. 1151). DTs are easy to interpret given their presentation as a graphical hierarchical structure. DTs are particularly useful in studying traffic accidents because the technique allows for the extraction of decision rules of the “if-then” type and also helps observers understand the events leading up to a crash and the variables that determine the severity of the accident.

The authors seek results that can be used for predictive purposes by assessing different algorithms used to build DTs, including Classification and Regression Trees (CART), ID3, and C4.5, and taking the results from the best method. Their assessment is based on four quantifiable evaluations: accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve. They transform the DT structure into if-then or X→Y rules, where X is a set of statuses of several attribute variables, such as accident type and atmospheric condition, and Y is the status of the class variable, the severity of the accident. The authors consider 19 attribute variables related to the driver, road, vehicle, and context, along with two class variables: slightly injured (SI) and killed or seriously injured (KSI).
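As a rough illustration of turning a trained tree into if-then rules, the sketch below fits scikit-learn's CART implementation on placeholder accident data and prints the branch conditions; the attribute names are invented stand-ins for the paper's 19 variables.

```python
# Hypothetical sketch: fit a CART tree and read off if-then decision rules.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
features = ["driver_age", "lighting", "accident_type"]  # invented attributes
X = rng.integers(0, 4, size=(300, 3))
y = rng.integers(0, 2, 300)   # 0 = slightly injured, 1 = killed/seriously injured

cart = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# export_text prints the tree as nested if-then conditions, from which
# X -> Y rules of the kind described above can be read off directly.
print(export_text(cart, feature_names=features))
```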


Figure 1: Example of top nodes of decision tree built with CART.  This model combines the results for the variables Female and Unknown for simplicity.
  
The authors assess that ID3 gives the worst results, while the difference in quality between results from CART and C4.5 is insignificant. For this reason, the authors pinpoint decision rules resulting from the latter two methods. The two models produced significantly different quantities of nodes, 19 and 52 respectively. Through these nodes, the authors identify a combined total of 15 decision rules that meet minimum standards in terms of population, support, and probability. Furthermore, the authors identify the importance of variables in each model, in which 11 incidences of importance overlap between the two models.




Figure 2: Example of same variables in top nodes of decision tree built with C4.5. This model creates individual nodes for each variable, regardless of significance.

Conclusion and Critique:
The use of decision trees, particularly different models of decision trees, allows researchers to identify patterns in the accidents that occurred in Granada, Spain over a 7-year period and determine interactions of variables. Authorities can use these results to prioritize efforts to prevent severe crashes, especially given that most of the rules extracted from the DTs coincide with conventional problems found in the rural highways of developed countries. Specifically, the authors advise experts to address the question of why women, as opposed to men, significantly increase their risk of severity under driving conditions with insufficient light.  

While I agree that the decision rules can be used by authorities, I feel that the authors under-emphasize the utility of doing so. Despite having statistical data to support their findings, the authors did not attempt to estimate the number of severe accidents that could be prevented or the number of variables that could feasibly be addressed by road safety experts. Additionally, the authors explain how the different models of DTs allow for pruning to present simple results; however, they do not address the process or explain why they prune their decision trees the way they do. This would be a valuable addition to the research, particularly for readers with little familiarity with the methodology.

The methodology as a whole is useful for this type of research, supporting evidence of several factors known to contribute to accident severity as identified in previous studies. The various models of DTs allow for appropriate application to various situations, though it is important that researchers assess which is most appropriate for their research before drawing conclusions from the resulting decision rules.

Source:
De Oña, J., López, G., and Abellán J. (2013). Extracting Decision Rules from Police Accident Reports through Decision Trees. Accident Analysis & Prevention, 50(2013), 1151-1160. Retrieved from http://www.sciencedirect.com/science/article/pii/S0001457512003132

Use of Decision Trees for Intelligence Analysis


Summary:


Edwin Greenlaw Sapp's four key categories of intelligence requirements are places, people, organizations, and objects.  Analysts collecting intelligence on these categories assist policy makers in formulating strategic and tactical decisions.  While collecting relevant intelligence to answer a requirement leads to better forecasting, the end results are always subject to some degree of uncertainty.

According to Sapp, a modern-day “information explosion” has caused delays and errors leading to ineffective decision making in the American Intelligence Community.  The amount of time analysts and decision makers have to formulate an accurate decision is short, due to the increased volume of information to be analyzed.  The Greek concept of ‘modeling’ may serve as an effective solution for managing these large volumes of data.  Analysts and decision makers may adopt this concept in order to manage intelligence, leading to more accurate and scientific decisions.

An offshoot of the ‘modeling’ concept, the decision tree is a type of logic diagram that graphically exhibits relationships and logical outcomes for a series of assessments.  Hence, decision trees serve as a method of organizing large volumes of data.  The method not only considers alternate outcomes to a certain scenario, it also indicates the degree of uncertainty associated with adopting a certain outcome.

Critique:


This article argues that decision trees can be an effective management tool for organizing intelligence.  However, the examples presented in the article fail to prove the effectiveness of decision trees for use in the intelligence community.  Given the simplicity of the first two examples, they did not successfully demonstrate the end product of managing a large volume of information.

Sapp also failed to discuss the disadvantages of utilizing decision trees.  He did not consider the outcomes of incomplete decision trees, where certain nodes of the tree may lack information.  While complete decision trees can enhance the intelligence community's forecasting capabilities, incomplete decision trees can lead to inaccurate assessments and increase uncertainty.  In addition, drawing a decision tree for a large volume of data can be a lengthy and complex process, leading to difficulties in interpretation.  Furthermore, decision trees are susceptible to even the smallest change in data.  Given the complexity of decision trees created from large volumes of intelligence, a small change in one node may compromise the entire tree.  In this light, decision trees are a better fit for managing a small volume of information.

          

Source:
Sapp, E. G. Decision Trees. Studies in Intelligence, 18(4). CIA Historical Review Program. Retrieved from https://www.cia.gov/library/center-for-the-study-of-intelligence/kent-csi/vol18no4/html/v18i4a03p_0001.htm

Friday, May 25, 2012

Safety Net Game/Simulation

As my last post for the course, I want to share the fruits of my final project - a free, print-and-play card game designed to teach the basics about online security as it pertains to email, social media, and mobile devices: Safety Net.

Safety Net is a single-player card game pitting the player, armed with a wide array of technological defenses, against a nefarious “Hacker” bent on breaking the player’s security down and making off with valuable, sensitive information. The deck is stacked against you! Will the Hacker win? Or will you be able to protect yourself and your sensitive data by quickly building up an unbreakable Safety Net?

Each card also includes flavor text describing the relevant defensive option or potential threat. This game offers a quick, entertaining way to introduce someone to the basics of online security. Perfect for students, casual computer users, and anyone curious about the risks and defensive options involved in online security. Give it a try, or give a copy to someone you know!

Safety Net Page on BoardGameGeek:
http://boardgamegeek.com/boardgame/124049/safety-net

Download link:
http://boardgamegeek.com/file/download/8exgyz8ial/SN_Safety_Net_Complete_Game_Bundle_v1.rar

Tuesday, May 22, 2012

Addressing Bias

While browsing ted.com today, I found an interesting talk addressing bias on a cognitive, neuroscientific level. It covers the optimism bias, which I think is related to some of the most important biases we talked about in class. Tali Sharot studies the optimism bias in London and has identified the centers of the brain that control optimism and pessimism in humans. Although optimism is directly linked to a better quality of life, in many cases it can distort the analysis that we are counted on to deliver objectively. I thought that this could start a good discussion: should we as analysts pursue medical or technological methods of reducing bias? Or is this something that takes away our essential humanity? For more information, watch the full talk here (it's about 18 minutes long).