Tuesday, April 23, 2013

VAC Views - May 2008

This edition of VAC Views (May 2008), produced by the Visual Analytics Center, covers a wide range of topics related, but not limited, to integrating visual analytics to enhance learning.  The group previously released these documents biannually, covering many different kinds of topics related to visual analytics.

Summary:
For this document I chose to highlight one article of particular interest to me, which I feel is the most relevant to this class: Active Products: Composition and Dissemination of Smart Analytic Reports.  This report states that one of the biggest problems for analysts is using visualization effectively as part of the communication process with decision makers, which makes clear that not all decision makers are the same.  The author breaks the process down into production ("the act of composing reports"), presentation ("the outward form these reports take"), and dissemination ("how the content of these reports is shared across an organization").  The group proposes a new product it calls a 'smart report'.  This means the reports must be made with the idea that they can be taken apart and put back together for anyone who may need to use them effectively.  The data must be well sourced, with clearly stated operations the analyst used to acquire the data and process it into intelligence.  Confidence must also be clearly stated and updated as necessary.  Third, the reports must have interactive capabilities, such as a live visual component that engages the reader.  This can put the reader in the analyst's shoes and allow them to see what the analyst sees.  Lastly, the products must be customizable for as many groups as necessary; length and classification, among other categories, are examples of criteria these reports must be able to alter.


One specific tool the National Visualization and Analytics Center (NVAC) proposes is a snippet widget that allows the writer of a report to collect information at the time of research, whether evidence or other important material, along with a record of how the information was discovered.  These snippets are then combined into the report to make a coherent argument or story.  Instead of having to retrace and reanalyze the information, the snippet widget frees up time that can be spent organizing and refining the report.  Additionally, the reports generated from the snippet widget and the refining process can be placed into style sheets, which allow the document to be easily transferred between different forms of reports, including websites, analytic reports, or even blogs.  These can also be edited by others (if the writer wishes) to improve the document or adjust information.  This eliminates the time spent tailoring each document to a specific person in a specific style.
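The style-sheet idea can be sketched in a few lines of Python. This is purely illustrative: the snippet fields, the example claim, and the two rendering formats are my own invention, not the NVAC design.

```python
# A toy "snippet" and two output styles, sketching the style-sheet
# idea: one sourced piece of content, rendered for different report
# formats without re-doing the research. All data here is invented.
snippet = {
    "claim": "Shipping volume through the port rose 40% in Q1.",
    "source": "Port authority monthly bulletin",
    "confidence": "moderate",
}

def render_blog(s):
    """Casual one-liner for a blog or website."""
    return f"{s['claim']} (source: {s['source']})"

def render_report(s):
    """Formal layout for an analytic report, with confidence stated."""
    return (f"Assessment ({s['confidence']} confidence): {s['claim']}\n"
            f"  Sourcing: {s['source']}")

print(render_blog(snippet))
print(render_report(snippet))
```

Because both renderers draw from the same sourced snippet, changing the claim or its confidence updates every output form at once, which is the time saving the article describes.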

Critique:
I really liked reading the articles in this edition, as well as skimming other editions.  I think many of the products and articles they put forth are easily understandable and make great points about how visual analytics is improving.  This article in particular speaks to what we hope to do here at Mercyhurst University (or are at least trying to do): making a report that a decision maker wants to read, and doing it in a timely manner.  I think this is an excellent idea and could really improve the report-writing and publication process.  Additionally, this report briefly mentions the four criteria for creating these new reports.  The third requirement, making products interactive, really emphasizes what I feel is the future of report writing.  Decision makers are not going to want just to read a report; they are going to want to see visually how different factors interact, or perhaps even receive the document as a video news bulletin.  A tool that helps integrate all of these specifications in a timely fashion is an excellent research endeavor to tackle.

One of the drawbacks mentioned is that a research community must be established that is willing to help improve documents when requested.  I agree that this is a drawback; however, a document that is easily convertible and accurately sourced without much additional effort is a huge value in itself.  To be able to write a report and never have to create your own presentation for it, but instead just tweak an already-made report, would save a lot of time.

Lastly, when I first began to read this article I figured it would focus heavily on how to make charts and figures look more appealing.  That is also something that needs work; however, a report that can convert itself into a different type of report while maintaining citations is a bigger time saver than adding different graphs.

Source:
May II, R. (2008). VAC Views - May 2008. Retrieved from http://readthis.pnl.gov/marketsource/readthis/B3065_not_print_quality.pdf

Intelligent Visualization and Information Presentation for Civil Crisis Management

Andrienko & Andrienko's paper "Intelligent Visualisation [sic] and Information Presentation for Civil Crisis Management" describes research conducted as part of the EU-funded project OASIS to develop methods for effective visualization support for situation analysis and management during crises, namely through the "Situation Manager" software module. The authors state that the major goals of the research are "to reduce the information load of the analyst, decision maker, or information recipient without omission of anything important and to ensure quick and accurate comprehending of the information" (Andrienko & Andrienko, 2007, p. 889).

Summary:
The research behind "Situation Manager" aims to create a generic crisis management system to support response and rescue operations in the event of large-scale disasters. Like intelligence products, visual analytic products should be delivered in a timely fashion, while the information is relevant, and presented in a way that is easily understood and can therefore be used by decision-makers. The authors suggest intelligent visualization requires reducing the information load on the recipient, display choices and designs that ensure quick and accurate recognition of meaning, and accounting for the characteristics of the medium used to view the information.

The authors suggest that intelligent visualization serves two purposes: supporting the work of an analyst, planner, or decision maker, and building an information presentation to send to a specific recipient.  It is unclear whether the authors see these purposes as mutually exclusive, though that seems unlikely. Developing intelligent visualization requires expert knowledge of the emergency management domain ontology, the generic roles involved in emergency situations and their information needs, and techniques and methods to manipulate and organize different types of data. The research uses literature on crisis management as a source of domain-specific knowledge, and literature on data analysis, graphic design, and geographic visualization as domain-independent knowledge, to create an expert system.

"Situation Manager," still in its early stages at the time of this article's publication, is a scalable vector graphics presentation that is interactive and incorporates emergency management expert information and a problem-specific user interface. The system takes information about the crisis situation and the territory affected by it, essentially answering the what and where questions. The where aspect includes an impact zone, which can be supplied or self-created. The software module then identifies the hazardous agents involved in the situation based on preexisting knowledge, along with potential secondary hazardous events caused by the first crisis event. This helps to locate objects or resources in the impact zone that need to be saved or protected to prevent further damage, including an estimate of the number of people living in the zone based on census data. If the object is people, Situation Manager detects whether people in the impact zone can escape danger without outside intervention. The module displays objects and dangers by degree of criticality through the sizes of the symbols used on the maps, with larger symbols representing more critical objects at the current time.
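The symbol-sizing idea can be sketched in a few lines of Python. The criticality scores, pixel ranges, and example objects below are hypothetical, not values from the article.

```python
import math

def symbol_radius(criticality, min_r=4.0, max_r=20.0):
    """Map a criticality score in [0, 1] to a map-symbol radius.

    Scaling by the square root keeps the symbol *area* roughly
    proportional to criticality, which readers judge more accurately
    than raw radius. The pixel bounds are arbitrary choices.
    """
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must be in [0, 1]")
    return min_r + (max_r - min_r) * math.sqrt(criticality)

# Hypothetical objects in an impact zone with criticality scores.
objects = {"hospital": 0.9, "school": 0.6, "warehouse": 0.2}
sizes = {name: symbol_radius(c) for name, c in objects.items()}
```

With this mapping the hospital draws as the largest symbol and the warehouse as the smallest, which is the at-a-glance prioritization the module aims for.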



Critique:  
This research is particularly applicable to the intelligence field considering the blatant similarities between what the authors conclude are requirements for intelligent visualizations and the requirements we at Mercyhurst accept as necessary for an analytic/intelligence product. When reading the article I could specifically relate it to the work a classmate is doing to identify the potential for nuclear exposure caused by earthquakes in Iran.

The article itself is only a description of research being conducted, rather than a methodological report of the research's findings and methods. While the article suggests different sources of information used to create the expert system for crisis-management visualization, it would strengthen the presentation of the research to formally address all areas of data collection. Still, the authors do a good job of explaining the research behind the "Situation Manager" in general terms and use good examples to communicate the use of such a tool. Though the module was not complete at the time of the article's publication, the authors provide a roadmap for implementation upon completion and suggest likely real-world applications.

It is interesting to me that the article does not address why visualization is a useful communication tool for decision-makers in general, never even referring to the extensive extant literature on the subject. Further, because the module was not complete at the time of publication, there is no record of successful implementation or user feedback. Further analysis of this type of visual analysis and data visualization is necessary to strengthen the argument for the Situation Manager module.

Source:
Andrienko, N., and Andrienko, G. (2007). Intelligent Visualisation [sic] and Information Presentation for Civil Crisis Management. Transactions in GIS 11(6), 889-909. doi:10.1111/j.1467-9671.2007.01078.x

Provenance of Intelligence Analysis Using Visual Analytics



Summary:
Intelligence agencies often retrieve information from multiple sources to anticipate and counter terrorist attacks. Difficulties in organizing this information often arise when dealing with large quantities of dynamically changing data. Interactive visual tools allow for effective analysis of data and faster insight. Since the end of the Cold War, centralized terrorist organizations have been replaced by decentralized terrorist cells that continually reinvent threats. In this context, visual analytics is extremely useful for identifying the behaviors of terrorist groups. For example, the advantages of using visual tools to study the behavior of a network of individuals include identifying key players in the complex network, identifying emerging themes, indicating geo-spatial relationships, and showing correlations across multiple parameters.

During intelligence analysis, raw information processed through multiple stages is transformed into actionable intelligence. According to the authors, processing a large amount of raw material requires the use of visual tools to track the transformation. Both evolving and completed analyses should involve a review of the analytic process, including tracking the sources of data and their reliability, reviewing background knowledge of the scenario in question, and examining the analysts' assumptions. Such traceable information, known as provenance information, is valuable when assessing the plausibility of conclusions. A framework of permission management can also be applied to visual tools to manage transparency among analysts with differing security clearances. The authors' approach to traceability includes three categories: the data level, analysis products, and reasoning products. The framework is based on products present in any analytic workflow. However, a number of challenges exist in tracing provenance during intelligence analysis. To reduce these challenges, the authors suggest another framework, called the provenance reasoning workspace, which comprises three spaces: a data space, a computation space, and a reasoning space.
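As a rough sketch of what a provenance trail might look like in practice, the following Python puts steps from the three spaces (data, computation, reasoning) into one traceable list. The field names, reliability grades, and example entries are my own invention, not the authors' specification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """One traceable step in an analytic workflow (illustrative only)."""
    space: str                 # "data", "computation", or "reasoning"
    description: str
    source: str
    reliability: str           # e.g. a letter-grade source reliability
    assumptions: list = field(default_factory=list)
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

# A hypothetical three-step trail from raw data to a conclusion.
trail = [
    ProvenanceRecord("data", "Field reports, batch 12",
                     source="liaison service", reliability="B"),
    ProvenanceRecord("computation", "Entity extraction over batch 12",
                     source="NLP pipeline", reliability="C",
                     assumptions=["name transliteration is consistent"]),
    ProvenanceRecord("reasoning", "Cell is planning to relocate",
                     source="analyst judgment", reliability="C",
                     assumptions=["batch 12 is not disinformation"]),
]

# Reviewing a conclusion means walking its trail back to raw data
# and surfacing every assumption made along the way.
all_assumptions = [a for rec in trail for a in rec.assumptions]
```

Updating a record when new evidence arrives (the tedium noted in the critique below) would mean editing one entry and re-checking every later step that depends on it.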

Critique:
Although this article was short, it was not an easy read. It had a few sentence fragments and run-on sentences, and I had to read some sentences several times to understand the content. The authors' focus was on using visual tools to keep track of the process from collecting information to analyzing intelligence, as opposed to using them to enhance the presentation of the final product to decision makers. Most of the article focused on the advantages of using visual tools to trace the transformation of information into intelligence, followed by the frameworks. The information in the article needed more than two pages to be explained adequately. The authors' framework for tracing the transformation of information into intelligence is not very descriptive; an example or a methodology section testing the frameworks might have shown their applicability.

This reminds me of the article I summarized about decision trees. Decision trees may not trace the process of gathering and analyzing intelligence, but they are a tool that can visually organize data for the analyst's use rather than the decision maker's. The authors' frameworks are prone to challenges similar to those of incorporating decision trees into the intelligence process. Like decision trees, and contrary to the authors' view, the provenance frameworks may not be able to handle complex scenarios. The approach seems quite tedious and requires the use of two different frameworks; organizing data at each phase will require multiple tools, and analysts must update each step when incorporating new evidence. I'm unsure of the benefit of tracing the process of creating intelligence from beginning to end in a visual manner.
 
Source:
Wong, W., Xu, K., & Attfield, S. (n.d.). Provenance for intelligence analysis using visual analytics. 
         Retrieved from http://eprints.mdx.ac.uk/8415/1/intelligence-analysis-provenance1.pdf

Interactive Dynamics for Visual Analysis

Summary:
In the article Interactive Dynamics for Visual Analysis, Jeffrey Heer and Ben Shneiderman (2012) provide a taxonomic guide for analysts, researchers, and other professionals creating visual analysis tools. They discuss the usefulness of visualizing data for comprehension, noting that "by mapping data attributes to visual properties, ... visualization designers leverage perceptual skills to help users discern and interpret patterns." The authors also stress the importance of ensuring that the visuals are appropriate and intelligible for the consumer.

The authors describe three dynamics for visual analysis, each of which includes examples of task types or steps that fit its description. The three dynamics are Data and View Specification, View Manipulation, and Process and Provenance. Data and View Specification involves determining which data are to be shown and visualized, with programs such as Microsoft Excel. It then involves filtering the data, which shifts the focus among different data subsets to isolate specific categories of values. Sorting the data can surface trends and clusters and organize data according to a unit of analysis. The following image shows a more complex form of a matrix-based visualization of a social network.

The first matrix plot shows a social network with people sorted alphabetically. The second plot shows a reordering by node degree, resulting in more visible structure, and the third plot is permuted by network connectivity, revealing underlying clusters of communities. The final step is to derive new attributes from existing values when the input data are insufficient.
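The degree-based reordering behind the second plot can be sketched on a toy adjacency matrix. The network below is invented purely for illustration.

```python
# Toy undirected network as an adjacency matrix, with names sorted
# alphabetically (mirroring the first plot described above).
names = ["ann", "bob", "cam", "dee", "eli"]
adj = [
    [0, 1, 1, 1, 0],   # ann
    [1, 0, 0, 0, 0],   # bob
    [1, 0, 0, 1, 0],   # cam
    [1, 0, 1, 0, 1],   # dee
    [0, 0, 0, 1, 0],   # eli
]

def reorder_by_degree(names, adj):
    """Permute rows and columns so high-degree nodes come first,
    the reordering that makes block structure easier to see."""
    degree = [sum(row) for row in adj]
    order = sorted(range(len(names)), key=lambda i: -degree[i])
    new_names = [names[i] for i in order]
    new_adj = [[adj[i][j] for j in order] for i in order]
    return new_names, new_adj

new_names, new_adj = reorder_by_degree(names, adj)
print(new_names)  # highest-degree people first
```

The same permutation is applied to rows and columns together, so every edge in the original matrix is preserved; only the visual arrangement changes.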

The second dynamic is View Manipulation, which consists of highlighting patterns, investigating hypotheses, and revealing additional details. Selection allows for pointing to an item of interest, for example by dragging along axes to "create interactive selections that highlight automobiles with low weight and high mileage." Navigating is determined by where the analyst begins, such as in a crime map that depicts crime activity by time and region. Coordinating allows the analyst to see multiple coordinated views at once, which can facilitate comparison; this can be done in histograms, maps, or network diagrams. The following image shows a complex patchwork of interlinked tables, plots and maps used to analyze election outcomes in Michigan.


The image shows a combination of tables, plots and maps. The final step, organization, involves arranging visualization views, legends and controls for more simplified viewing.

The final dynamic is Process and Provenance which involves the actual interpretation of data. Recording involves chronicling and visualizing analysts' interaction histories in both a chronological and sequential fashion. Annotation includes recording, organizing and communicating insights gained during analysis. Sharing involves the accumulation of multiple analyses and interpretations derived from several people and the dissemination of results. Guiding is the final step and includes developing new strategies to guide newcomers.


Critique:
This article was very effective in showing many ways of visualizing data and in conveying the importance of visualization for comprehension and the facilitation of analysis. The inclusion of well-known applications such as Google searches and crime maps, along with many examples, helped the authors explain the taxonomy to readers who lack extensive experience with specialized software or the field. It was also important that the authors distinguished visualizations that may be visually intriguing but have little real-world application.

Some of the features explained within each dynamic are directly applicable to the intelligence field, or to anyone conducting data analysis, such as the investigation of financial markets or terrorist networks. I can foresee this field becoming a new avenue for more effective communication between analysts and decision-makers. Analysts often learn that decision-makers prefer concise, clear, and preferably visual information but may not know an effective way to convey it. This article provides a basic but helpful overview of how to go about visualizing data.


Source:
Heer, J & Shneiderman, B. (2012). Interactive Dynamics for Visual Analysis. Communications of the ACM, 55(4), 45-54. doi:10.1145/2133806.2133821

Applied Visual Analytics for Economic Decision-Making

Summary:

Savikhin et al. (2008) apply visual analytics to improve individuals' economic decision-making skills.  The authors investigated the application of visual analytics to two common problems in economics: the winner's curse and the loser's curse.  The winner's curse occurs when an individual tends to overpay for an item or service; either the individual is worse off for buying the product or service, or the value of the asset is less than the bidder perceived.  The loser's curse occurs when an individual bids below the profit-maximizing level for an asset, or a competing entity wins the bid.  The main problem is that decision makers are unable to see the potential for a business strategy that maximizes profit, with most unable to consider all the information that could guide these decisions.  Thus, the authors apply visual analytics to improve decision-making in both winner's-curse and loser's-curse situations.  Savikhin et al.'s (2008) hypothesis was that subjects who received the interactive visual analytics treatment would bid closer to the profit-maximizing decision than those who saw simple visual or tabular displays.

Savikhin et al. (2008) ran six treatment groups: three for winner's-curse scenarios and three for loser's-curse scenarios.  The three visual aids the participants used to help with their decision-making were an interactive visual analytic model, a simple visual, and a tabular display.  Each subject in the experiment acted independently of the others.  All were given the scenario of being a decision maker who had to decide how much to bid for a company, with a possible range of values for each bid.  Bidding decisions were made in a computer-generated program that randomly decided the value of the company and displayed the three different types of graphics.  Over the course of the experiment the participants switched among the three types of visual aids and based their bid values on their interpretation of what the visuals portrayed.  With each of the three visual representations, individuals were given 30 opportunities to bid on various companies.

Overall, subjects given the interactive visual analytics treatment learned the best bid/optimal solution more often than those given the simple visual or tabular representation of the bidding information.  Moreover, for both the winner's-curse and loser's-curse groups, subjects in the interactive visual analytics periods outperformed subjects given the other visual treatments, and these results were statistically significant.  Repeated use of the interactive visual analytics model allowed participants to learn from past bidding decisions and make more nearly optimal bids than participants who received the other two treatments.  It is also important to note that even a simple visual aid supported more effective decision making than tabular displays.
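The winner's curse itself is easy to reproduce in a quick Monte Carlo sketch. The value ranges, noise level, and number of bidders below are hypothetical, not the paper's experimental design.

```python
import random

random.seed(42)

def simulate_auctions(n_auctions=10000, n_bidders=5, noise=10.0):
    """Monte Carlo illustration of the winner's curse: in a
    common-value auction, each bidder sees only a noisy, unbiased
    signal of the true value, yet the *winning* (highest) signal
    tends to overstate it. All parameters are made up."""
    overestimates = 0
    for _ in range(n_auctions):
        true_value = random.uniform(50, 150)
        signals = [true_value + random.uniform(-noise, noise)
                   for _ in range(n_bidders)]
        if max(signals) > true_value:   # winner's estimate too high
            overestimates += 1
    return overestimates / n_auctions

rate = simulate_auctions()
# With 5 unbiased signals, each exceeds the true value with
# probability 1/2, so the highest exceeds it in about
# 1 - 0.5**5, roughly 97% of auctions.
```

So even rational bidders with unbiased information systematically overpay if they bid their own signal, which is exactly the trap the visual treatments in the study help subjects learn to avoid.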

Critique: 

I found this study useful because it provides a way to help individuals in the business realm make more efficient decisions by analyzing their situation with interactive visual aids.  Importantly, the study suggests that showing information visually can help overcome cognitive biases in decision-making and improve learning.  It would be interesting for a future study to examine why interactive visuals seem to engage our thinking more than simple visuals do.  One limit of this study was its small sample size, so it would be worth repeating it with a much larger sample to replicate the results.  Another limitation was that the authors looked only at bidding patterns in winner's-curse and loser's-curse scenarios, not at other economic conditions.  Even though these scenarios come up often in the business environment, it would be interesting to see which other business decision-making scenarios interactive visual analytics could improve.  I would hypothesize that interactive visual analytics could be applied to many areas of business, especially for individuals who learn more effectively visually.



Source: Savikhin, A.,Maciejewski, R., & Ebert, D.S. (2008). Applied visual analytics for economic decision-making. IEEE Symposium on Visual Analytics Science and Technology, 107-114.  Retrieved from https://www.bioinformatics.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Applied%20Visual%20Analytics%20for%20Economic%20Decision-Making.pdf.

Monday, April 22, 2013

Empirical Studies of Information Visualization: A Meta-Analysis


Summary:

The study Empirical Studies of Information Visualization: A Meta-Analysis by Chaomei Chen and Yue Yu provides a meta-analysis of a variety of empirical studies of information visualization.  The intent of the research is to capture the theories and practices in empirical examinations of information visualization.  The analysis focuses on three areas of information visualization: users, tasks, and tools.  As a meta-analysis, the article provides a simplified description and displays the underlying relations in the large amount of convoluted, contradictory, and confusing information often found in the literature.

The article first provides an overview of the meta-analytical method and the selection of studies used, then presents a subjective review of the studies, followed by identification of the most commonly used hypotheses, independent variables, and dependent variables, and finally the results of the study.  The research includes experimental studies with independent variables related to one of the three contextual variables (users, tasks, and tools).  The two types of dependent variables used are accuracy and efficiency measures.

The study's results come in two parts, looking at both users and tools.  Each section compares the empirical findings of individual studies, synthesized in terms of effect sizes and significance levels.  The study found that users with strong cognitive abilities benefit significantly more from visual-spatial interfaces than those with weaker cognitive abilities, and that users with stronger cognitive abilities perform more efficiently while using visualization. Additionally, the study showed that visual-spatial information-retrieval interfaces enable users to perform better than traditional retrieval interfaces, and that users of visualization interfaces in information retrieval perform more efficiently than those using a non-visualization interface.  The following are the major, all-encompassing conclusions of the study:
                1. Empirical studies of information visualization are diverse, and applying meta-analysis methods is difficult.
                2. Future studies would benefit from systematically investigating individual differences, including a variety of cognitive abilities.
                3. When users displayed the same level of cognitive abilities, they tended to perform better with simpler visual-spatial interfaces.
                4. The combined effect size of visualization is not statistically significant; a larger homogeneous sample of studies is necessary for conclusive results.
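As a sketch of what "combining effect sizes" means in practice, here is a minimal fixed-effect meta-analysis in Python. The study numbers are made up for illustration, not Chen and Yu's data.

```python
# Fixed-effect meta-analysis: combine per-study effect sizes by
# weighting each with the inverse of its variance, so precise
# studies count more. All numbers below are invented.
studies = [
    {"effect": 0.42, "variance": 0.04},
    {"effect": 0.10, "variance": 0.09},
    {"effect": 0.31, "variance": 0.02},
]

weights = [1.0 / s["variance"] for s in studies]
combined = sum(w * s["effect"]
               for w, s in zip(weights, studies)) / sum(weights)
se = (1.0 / sum(weights)) ** 0.5

# A 95% confidence interval of combined +/- 1.96 * se that crosses
# zero would mean the pooled effect is not statistically significant,
# which is the situation conclusion 4 above describes.
```

This also illustrates conclusion 1: the pooling is only meaningful when the studies measure comparable effects, which is hard when the empirical studies are as diverse as the authors found.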

Critique:

This meta-analysis is especially helpful given the increasing amount of literature on the topic of visualization.  Although its findings were not significant, it provides a very effective start at an overview of the current literature on visualization.  As technology progresses this study will be one to build upon.
Considering that the study came out when visualization was in its infancy, it is understandable that few articles were available.  However, if the authors had slightly expanded their criteria, the study would have provided a better overview of visualization overall.  Furthermore, only five studies tested the effects of visualization on accuracy; to strengthen the argument, the authors should have considered increasing the number of articles used in this category.  Additionally, only three studies tested the efficiency of visualization, and the same critique applies: more sources would have greatly benefited the analysis as well as the argument overall.  Again, by expanding the criteria for including an article as well as the categories tested, more articles might have been available.

Overall, this study provides a very useful synthesis of visualization and its benefits.  Expanding upon the approach by conducting a similar study on visualization today would provide an interesting comparison and an overview of the field's progression.

Source:

Chen, C. & Yu, Y. (2000). Empirical studies of information visualization: A meta-analysis. International Journal of Human-Computer Studies, 53(5), 851-866. Retrieved from http://www.sciencedirect.com/science/article/pii/S1071581900904221

Geographical Information Systems–Based Marketing Decisions: Effects of Alternative Visualizations on Decision Quality

Ana-Marija Ozimec, Martin Natter, and Thomas Reutterer conducted a study to examine the effectiveness of different quantitative symbolization methods on maps. The researchers wanted to know which type of visualization method was most effective with decision makers.

The symbolization styles they examined were size-based circles and bars, value shadings, pure symbolizations (circles, bars, and shadings), and combined symbolizations (shadings and distortion). To measure the effectiveness of these styles, the researchers examined decision accuracy, decision confidence, decision efficiency, and perceived ease of task. Below is a chart of their findings.

The researchers found that circles were the most effective visualization method for decision makers: across all four measures of effectiveness, circle symbolization performed best. Among the other results, combined shadings and distortion ranked second in decision accuracy but lowest in decision confidence, while value shadings ranked lowest in decision efficiency and shadings and bars ranked lowest in perceived ease of task.

Critique 

This study has direct implications for intelligence analysts. It found that circle symbolization of quantitative information is the most effective symbolization tool for decision makers. As intelligence analysts, it is our job to keep improving communication between ourselves and our decision makers, and the results of this study can help us do that. Since GIS data can be used in all types of intelligence work (national security, law enforcement, and competitive), the study can be applied across all of those fields as well.

The only criticism I have is that circles as a form of quantitative symbolization may not be applicable to all types of scenarios or quantitative data. Though at the moment I cannot think of one, there might be situations where circles are not the best form of visualization.

Source: Ozimec, A., Natter, M., & Reutterer, T. (2010). Geographical Information Systems–Based Marketing Decisions: Effects of Alternative Visualizations on Decision Quality. Journal of Marketing, 74(6), 94-110. Retrieved from http://ehis.ebscohost.com/eds/pdfviewer/pdfviewer?sid=8fe30abf-7ab8-4068-be2c-26bac62fd9d7%40sessionmgr4&vid=2&hid=4#

Sunday, April 21, 2013

Investigative Visual Analysis of Global Terrorism

Summary:
This article applies visual analytics to global terrorism.  The authors, Wang, Miller, Smarick, Ribarsky, and Chang, applied visual methods to an existing database of information on terrorist organizations: the Global Terrorism Database (GTD), which contains information on both domestic and international terrorist organizations.  The authors applied their visual analytic system to this database to look at the five W's (who, what, where, when and why) of terrorist organizations in a manner that is easier for decision makers to understand.

Prior to this tool there were typically two groups of visual analytics: social network analysis and geo-temporal visualizations. The system implemented by the authors attempts to combine the two.  The tool has a number of layers that can be activated to examine various elements of terrorist organizations; for example, there are layers that show the locations of attacks, which can be drilled into to see the specifics of what took place at each location. Through the different layers various elements can be visually depicted, which makes the underlying data easier to understand.

The authors indicated that there are three types of individuals who typically visit the GTD website: the general public, investigative analysts, and terrorism experts.  Through the visual analytics tool, individuals with varying levels of exposure to the subject matter can gain a significant understanding of the material.  When the system was demonstrated to individuals in various organizations, they were all interested in applying the method to their own fields.

Critique:
One element the authors identified in their conclusion was that certain features could be enhanced over time.  For example, there were instances of over-plotting of data and geographic lines, which made the display difficult to read.  Overall, this method appears extremely useful to apply to existing data.  Not only does it analyze various elements, but it also increases the ease of communicating with decision makers and decreases the ambiguity that may be present in large data sets.  This method is certainly something that should be incorporated when possible and enhances the dissemination of information.

Wang, X., Miller, E., Smarick, K., Ribarsky, W., & Chang, R. (2008). Investigative visual analysis of global terrorism. Computer Graphics Forum, 27(3), 919-26. Retrieved from: http://ehis.ebscohost.com/ehost/pdfviewer/pdfviewer?sid=74559524-5a41-4427-a79e-c5e026f76e72%40sessionmgr110&vid=3&hid=17

Thursday, April 18, 2013

Summary of Findings (Green Team): Bayesian Analysis (3.75 out of 5 Stars)



Note: This post represents the synthesis of the thoughts, procedures and experiences of others as represented in the 8 articles read in advance (see previous posts) and the discussion among the students and instructor during the Advanced Analytic Techniques class at Mercyhurst University in April 2013 regarding Bayesian Analysis specifically. This technique was evaluated based on its overall validity, simplicity, flexibility and its ability to effectively use unstructured data.

Description:
Bayesian analysis, built on the theorem developed by Thomas Bayes, is a statistical approach that combines prior knowledge with updated information, separating the Bayesian approach from frequentist statistics. Bayesian analysis makes the analysis process iterative, allowing more information to be added as it is learned or deemed relevant. According to Hubbard (2010), Bayes' theorem is a relationship of probabilities and conditional probabilities, or the chance of something given a certain condition (pp. 178-179). Bayesian analysis allows analysts to calculate probabilities or make estimates in terms of certain base assumptions as well as new developments. This is especially important in the intelligence field, as analysts should be able to incorporate new signals or indicators into their analysis to update estimates, further reducing their levels of uncertainty.   
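Written out, the relationship Hubbard describes is Bayes' theorem in its simplest form:

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}
```

Here H is the hypothesis, E is the new evidence, P(H) is the prior probability, and P(H | E) is the updated (posterior) probability; this is the quantity the iterative process recomputes each time new information arrives.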

Strengths:
  • Most helpful in situations in which your initial estimate is at either end of the spectrum
  • Helps analysts to combat emotional/irrational estimates
  • Particularly effective when new evidence is weighed against the probability that existed before its addition
  • Has the potential to be applied across multiple disciplines, as evidenced more frequently in recent academic journals
Weaknesses:
  • Complex, difficult to learn without a statistics background
  • The volume of evidence makes it difficult to continuously evaluate the information
  • No set way to decide what constitutes evidence
  • Unable to know what is important at the moment, particularly with real-world situations
    • Assumes that each piece of evidence is worth the same weight
  • Is less useful in intelligence applications when probabilities are initially determined to be 50/50

How-To:
  1. Start with a rudimentary estimation for the likelihood of an event occurring
  2. Take evidence and apply/use probabilities for potential outcomes
  3. Apply Bayes' theorem, where P(H) is the prior probability of the hypothesis, P(E) is the probability of the evidence, and P(E|H) is the probability of observing the evidence given that the hypothesis is true.
  4. Do the math to find the probability percentage.
  5. This process can be repeated as additional information is given/discovered in order to improve the probability estimate.

Personal Application of Technique:
The class was tasked with finding out the likelihood that an individual, Bob, drove to work based on the fact that he was late. Bob can choose between three different ways to get to work: car, bus, or commuter train. If he takes his car, there is a 50% chance that he will be late. The bus has a 20% chance of making him late while the train has a 1% chance of making him late.

The first question that the class was given was to find the likelihood that Bob drove to work, assuming that Bob chose evenly among the three options. Using Bayesian analysis, the class came up with a 70.4% probability that Bob drove to work that day.

The second question added new information about Bob’s normal transportation habits. Bob’s co-worker knew that Bob almost always takes the commuter train, never takes the bus, and takes the car 10% of the time. With this new information, the class was able to update the probability that Bob drove to work to 84.75%.
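Both parts of the exercise can be checked with a short script; the likelihoods are taken from the post, and the co-worker's 90% train figure is an assumption standing in for "almost always":

```python
# Sketch of the in-class Bob exercise using Bayes' theorem.
def posterior_car(priors, likelihoods):
    """P(car | late) = P(late|car)P(car) / sum over modes of P(late|m)P(m)."""
    total = sum(priors[m] * likelihoods[m] for m in priors)
    return priors["car"] * likelihoods["car"] / total

likelihoods = {"car": 0.50, "bus": 0.20, "train": 0.01}

# Part A: Bob chooses evenly among the three modes.
part_a = posterior_car({"car": 1/3, "bus": 1/3, "train": 1/3}, likelihoods)

# Part B: the co-worker says Bob drives 10% of the time, never takes
# the bus, and almost always takes the train (90% assumed here).
part_b = posterior_car({"car": 0.10, "bus": 0.0, "train": 0.90}, likelihoods)

print(round(part_a, 3))  # 0.704
print(round(part_b, 4))  # 0.8475
```

The jump from 70.4% to 84.75% shows how a single new piece of evidence (the co-worker's knowledge of Bob's habits) reshapes the posterior.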

This exercise demonstrated that Bayesian analysis is not always the most direct route to a probability; in this particular case, the boss could simply ask Bob how he got to work that morning. As individuals not pursuing degrees in mathematics, the students also showed significant hesitation in applying the formula and presenting the findings.

Rating:  3.75 out of 5 stars
Note: The analysts feel this methodology has very strong benefits and is widely applicable; however, it is relatively weak in its application to the intelligence community, particularly because its utility depends on numerous factors.  
For Further Information:
Hubbard, D. W. (2010). How to measure anything: Second edition. Hoboken, NJ: John Wiley & Sons, Inc.

Summary of Findings (White Team): Bayesian Theory ( 4.5 out of 5 Stars)

Note: This post represents the synthesis of the thoughts, procedures and experiences of others as represented in the 8 articles read in advance (see previous posts) and the discussion among the students and instructor during the Advanced Analytic Techniques class at Mercyhurst University in April 2013 regarding Bayesian Probability Theory specifically. This technique was evaluated based on its overall validity, simplicity, flexibility and its ability to effectively use unstructured data.


Description:
According to Hubbard (2010), Bayesian theory is a relationship of probabilities and “conditional” probability. Bayesian analysis allows new information to be incorporated as it is learned, decreasing uncertainty in the hypothesis, and it allows a quantitative value to be applied to intelligence questions (ISBA, 2009).

Strengths:
  1. Applicable to the intelligence field because it gives a probability of likelihood rather than statistical significance, such as in frequentist statistics.
  2. Allows for updates to probabilities with addition of new information.
  3. Helps move analysts out of the confirmation-bias mindset by prompting questions such as: what would I expect to observe if X were true, and what would I expect to see if X were false?
  4. Effective when odds are very high or very low.
  5. Is more intuitive than traditional frequentist statistics.

Weaknesses:
  1. Less helpful when initial probability is close to 50/50.
  2. A piece of evidence that is not accounted for may turn out to be the most important.
  3. It is difficult to decide how much weight to put on each new piece of evidence.
  4. Sometimes difficult to see the real-world application and relevance.
  5. Applying Bayesian theory to real situations can be time consuming.
  6. Can sometimes be fairly complex and difficult to understand.

Step by Step Action:
  1. Find a topic for which you want to estimate a probability.
  2. Assign probabilities to each hypothesis.
  3. Update the equation by adding new hypotheses when new evidence is available.
  4. Calculate the likelihood a certain scenario will occur based on the probabilities of each hypothesis.
  5. For the numerator, multiply the prior probability of the hypothesis being tested by the probability of the evidence given that hypothesis.
  6. For the denominator, multiply each hypothesis's prior probability by the probability of the evidence given that hypothesis, and sum these products.
  7. Divide the numerator by the denominator to find the final probability.

Exercise:
In class we conducted a simple Bayesian theory problem to determine the likelihood that an employee who showed up to work late came by car. The general form of Bayes' theorem was used throughout. There were two parts to the exercise: Part A used the boss's probabilities that Bob would come late to work by car, bus, or train to determine the probability that Bob came to work late by car, while Part B analyzed the same situation using a co-worker's probabilities. Calculating both parts showed how the probability of Bob coming to work late by car changed as different evidence was placed into the general form of the Bayesian equation. In terms of real-world applicability, the exercise demonstrated that applying Bayesian theory to real-life scenarios may be difficult to initiate or may not seem like the best approach; simply asking Bob what mode of transportation he took to work would be much easier. However, the exercise does demonstrate how Bayesian theory can reduce the uncertainty of a question as more evidence is added, which is essential to the work conducted by intelligence analysts. Overall, Bayesian theory can provide a much more reliable estimate for the analyst.
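The general form of Bayes' theorem used in the exercise, for hypotheses H1, ..., Hn and evidence E, corresponds to the numerator and denominator described in steps 5 and 6 above:

```latex
P(H_i \mid E) = \frac{P(E \mid H_i)\,P(H_i)}{\sum_{j=1}^{n} P(E \mid H_j)\,P(H_j)}
```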





Tuesday, April 16, 2013

Is It Safe To Go Out Yet? Statistical Inference in a Zombie Outbreak Model

Summary: 
The authors, Calderhead, Girolami, and Higham (2010), wrote a paper dealing with the potential outcomes of a zombie outbreak, using Bayes' theorem to support their conclusions.  Since there has never been a zombie outbreak, it is logical to use Bayesian theory to account for unknown data that can only be estimated.  Estimates can be made regarding a zombie outbreak, and in turn these estimates can be combined to yield likely outcomes.

Applying Bayesian theory to zombie outbreaks starts with a logical probability.  In the case of this paper, the authors state that over one day the far extremes of probability are that no human turns into a zombie and that all humans are converted into zombies.  This probability (the prior) is then updated as new data is used (such as different numbers of days); the resulting posterior distribution becomes the new prior and the process is repeated.

The authors then state that many questions can be answered by successfully finding a likely distribution for human to zombie conversion rates.  Such questions include how many soldiers should be mobilized, the scale of quarantine needed, and whether or not it is alright to leave a hiding spot given the number of zombie sightings during a particular time span.  The authors also emphasize that since the rate of change from human to zombie is likely to not be constant, the beta (conversion coefficient) should be a range and not a singular number.

One of the model comparisons the authors make is between two models: one assumes zombies can attack alone, while the other, following a circulated rumor, assumes zombies only travel in pairs.  The authors then seek to disprove the second model through Bayes factors (posterior odds = Bayes factor * prior odds), in which the statistical evidence for the first model is weighed against the second.  The authors find that the first model with the least amount of noise (introduced as Gaussian distributed noise) is most likely, meaning the experimental data deviated least from the expected curve.  Adding additional noise negatively impacted the Bayes factor (which shows how strong the evidence was against the second model).
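The odds-form update the authors rely on (posterior odds = Bayes factor × prior odds) is simple to sketch; the numbers below are purely illustrative, not taken from the paper:

```python
def update_odds(prior_odds, bayes_factor):
    """Posterior odds in favor of model 1 over model 2."""
    return bayes_factor * prior_odds

def odds_to_prob(odds):
    """Convert odds in favor of a model to a probability."""
    return odds / (1 + odds)

# Illustrative values only: equal prior odds, and evidence favoring the
# lone-zombie model 5-to-1 over the pairs-only model.
posterior_odds = update_odds(prior_odds=1.0, bayes_factor=5.0)
print(round(odds_to_prob(posterior_odds), 3))  # 0.833
```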


The authors also used Bayesian theory to answer the question of whether or not it is safe to leave a hiding spot based on the number of zombies spotted in the past few days.  The authors use two types of analysis for this: one with no observations of previous days' zombie sightings and one with five daily totals of zombie sightings.  The stated zombie sightings for the second case were 123, 127, 104, 92, and 74.  The left column of the figure below shows Bayesian factors applied to the first model mentioned above and the potential outcomes.  The right column shows this process with a second layer of data (the five days of observations), which greatly reduces the uncertainty regarding potential zombie totals for the next 45 days.  Thus, by incorporating these observations through Bayesian theory, uncertainty can be greatly reduced and the chance of surviving longer during a zombie outbreak is much higher.



Critique:
I found this article fairly complex to read, having no prior experience with Bayesian theory.  However, I really appreciate the application of Bayesian theory to a zombie outbreak.  Although the topic is (most likely) fantastical, the article is well constructed and thoughtful.  Honestly, the topic caught my eye, and I doubt I would have tried as hard as I did to understand Bayesian theory had it been on a drier subject.

One issue I had with this article is that it was clearly meant for someone with previous experience with Bayesian theory.  At times the authors referenced aspects of Bayesian theory without defining them; for example, rather than defining Bayes factors, they referenced another article.  For the average reader this does not make the topic any easier to understand.  Additionally, the basic form of Bayes' theorem is not long or difficult to write out, and including it would have saved me the time spent looking it up to double-check my understanding.

This article is not directly related to intelligence, except that if a zombie outbreak ever occurred, it would help intelligence analysts extend their lifespans.  However, this model could be applied in medicine to the spread of infectious diseases when the transmission rate is unknown; instead of zombies and humans, there would be infected and healthy individuals.

Source:
Calderhead, B., Girolami, M., & Higham, D. (2010). Is it safe to go out yet? Statistical inference in a zombie outbreak model. University of Strathclyde, United Kingdom. Retrieved from http://www.strath.ac.uk/media/departments/mathematics/researchreports/2010/6zombierep.pdf

Bayesian Inference Analysis of the Uncertainty Linked to the Evaluation of Potential Flood Damage in Urban Areas

Summary:
Fontanazza, Freni, and Notaro explain that flood impact on highly urbanized areas can be severe and has the potential to increase with the effects of climate change; thus, decision-makers prefer reduced uncertainty when planning flood mitigation and prevention. Bayesian analysis is beneficial here because uncertainty exists both in the physical processes that must be simulated in hydraulic models and in the limited data available for model calibration. Additionally, there are sometimes measurement errors in the depth-damage curves, which can affect the data.

In this article, the authors applied Bayesian probability analysis to a case study of Palermo, Italy to determine whether uncertainty decreases with the addition of data. Bayesian analysis has two benefits: "parameter estimation and uncertainty analysis" in both hydraulic model parameters and the depth-damage curve coefficients. They create a mathematical probability model using Bayesian analysis including values in the equation for "the uncertainty of a generic model parameter", "observed values" and a "likelihood function."

The authors split the historical data into three sections, that from January 1994 to April 1999, from May 1999 to January 2003 and from February 2003 to December 2008, to determine whether uncertainty would decrease with each subsequent addition of a data group. The land use in the Palermo case study was identified as mostly for residential dwellings with 88 percent of the area being impervious. The following three images show the reduction in uncertainty once more data became available, demonstrating that Bayesian probability analysis did in fact reduce uncertainty. By the addition of only the second set of data (in the second image), the reduction in uncertainty was about 40%, without a reduction in reliability.
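The study's central claim, that posterior uncertainty shrinks as each batch of data is added, can be illustrated with a toy conjugate (Beta-Binomial) update; the batch sizes below are invented for illustration and have nothing to do with the Palermo data:

```python
import math

def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) posterior distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

a, b = 1.0, 1.0  # flat prior before any data
sds = []
for successes, failures in [(6, 4), (11, 9), (13, 7)]:  # three data batches
    a += successes  # conjugate update: add observed counts to the
    b += failures   # Beta parameters
    sds.append(beta_sd(a, b))

# Posterior uncertainty decreases monotonically with each batch.
assert sds[0] > sds[1] > sds[2]
print([round(s, 3) for s in sds])  # [0.137, 0.086, 0.067]
```

This mirrors, in miniature, the three historical periods the authors feed in sequentially.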





Critique:
There were some limitations to Bayesian analysis, such as its reliance on an initial hypothesis, which can often be subjective, and the risk that the approach loses objectivity if the parameter distribution is not based on physical observations. Nevertheless, I noticed many advantages to the methodology. The authors successfully demonstrated its effectiveness with a case study, showing with real historical data that a significant reduction in uncertainty was possible. They also accounted for the aforementioned limitations with additional probabilistic analyses of the parameter choices to ensure that they did not skew the results.

The interest in reducing uncertainty for a decision-maker seems to be the same for any profession. I would be curious to see how this could be applicable to a study of crime mapping in which it is determined whether a decrease in uncertainty actually does occur with an increase in data. This could perhaps be applied to the "Newton-Swoope Buffers" in ATAC Workshop that are intended to determine the location of an offender's home or business. These buffers change with each additional piece of information, seemingly because they are becoming more accurate with more data. A Bayesian probability analysis could be applied to this tool to determine its effectiveness and additionally, application to law enforcement intelligence.

Source:
Fontanazza, C.M., Freni, G., & Notaro, V. (2012). Bayesian inference analysis of the uncertainty linked to the evaluation of potential flood damage in urban areas. Water Science and Technology, 1669-1677. doi: 10.2166/wst.2012.359

Fusion of Intelligence Information: A Bayesian Approach

Elizabeth Paté-Cornell presents a classical probabilistic Bayesian model that she believes can be utilized by the intelligence community to aid in the fusion of intelligence information. The need for such fusion became apparent in the wake of September 11, 2001, and the author suggests that the probability of impending attacks can be estimated through Bayesian analysis. Her two major arguments for the use of the Bayesian model in the IC, particularly for terrorist attacks, are that it allows for the computation of the posterior probability of an event given the probability of the event prior to observing signals, and that it accounts for the quality of the signals through the probabilities of false positives and false negatives.

Summary:
The author begins by discussing the problems associated with fusing information within the US intelligence community, namely the difficulties of ensuring internal communications and of merging the content of multiple signals, some sharper than others, some dependent on or independent of others. This research claims that Bayesian analysis can help solve the latter difficulty, explained in terms of identifying the probability of an impending terrorist attack. It should be noted the author does not claim the model will better detect impending terrorist attacks, but rather that it can increase the probability that an attack plan is foiled by guiding "clear thinking at a time when the amount of information is large and confusing and intuitions can be seriously misleading" (Paté-Cornell, 2002, p. 454).

The elements of Paté-Cornell's Bayesian model can be explained through the following notations:
 
Namely, the event of interest throughout the article is an impending terrorist attack. Through the model, the author presents a formula that addresses both the prior probability of the event occurring before reading signals, such as intercepted telephone conversations, and the quality of those signals. The formula, as it appears in the following figure, considers what alternatives to the event of interest could occur in conjunction with the signal, a very important consideration in the intelligence field.


Additionally, the formulas the author presents address the chances that the signals observed are false positives, or that some signal has been missed (a false negative), and how these affect the probability of a future terrorist attack. The probability of a false positive can be calculated by considering the prior probability of the impending attack without the signals, in conjunction with the rate at which the signal occurs during normal operation when the event does not occur. She explains that her definition of false positives and its application in Bayesian analysis is most useful to the intelligence community because of its consideration of the prior probability of the event, especially considering how drastically that prior probability has increased post-September 11.
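A minimal sketch of this false-positive calculation, with hypothetical rates (none of these numbers appear in the article):

```python
def posterior_given_signal(prior, p_signal_given_event, p_signal_given_no_event):
    """P(event | signal) via Bayes' theorem, accounting for false positives."""
    numerator = p_signal_given_event * prior
    denominator = numerator + p_signal_given_no_event * (1 - prior)
    return numerator / denominator

# Hypothetical numbers: a signal seen in 80% of real attack run-ups,
# but also 5% of the time during normal activity (false positives).
low_prior = posterior_given_signal(0.01, 0.80, 0.05)   # rare-attack baseline
high_prior = posterior_given_signal(0.10, 0.80, 0.05)  # elevated prior

print(round(low_prior, 3))   # 0.139
print(round(high_prior, 3))  # 0.64
```

The contrast between the two results illustrates the author's point: the same signal is far more alarming when the prior probability of attack is already elevated.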

Estimating the prior probability of an impending attack can be treated as a combination of the enemy's intention to attack, the effective planning of that attack (i.e., the perpetrators' ability to coordinate a plan and avoid detection), and the successful implementation of the plan on a given day (i.e., the perpetrators' ability to carry out the plan and evade the target's safeguards). The author argues that identifying these probabilities is itself of use to the intelligence community, given the chance to reduce the probability of an attack attempt through measures targeting these areas (e.g., cutting the flow of funds or increasing security).

Critique:
The research applies the Bayesian model using hypothetical numerical illustrations for the interpretation and fusion of intelligence information, and could be strengthened through real-life numerical examples, however sensitive in nature. Additionally, the author switches between examples across the formulas, sometimes relating them back to the overarching theme of terrorist attacks and other times relying on the unrelated example of testing chemicals for poison. This back-and-forth detracts from the overall readability of the research and does not aid the application of the model to the intelligence community. The author uses good examples of potential signals used in intelligence but does not carry them throughout the research.

The author further admits some limitations of the research. First, the model assumes that both the event and the signals are black and white: either they occur or they do not, which is not always the case, particularly in the intelligence community. Further, the research assumes that the likelihood of false signals, whether positive or negative, remains the same over time, also unlikely in the intelligence field. Finally, many of the sources of data for such a model are difficult to accurately quantify, including the frequency of past observations, reliability data for sensors or links, and expert opinions. For instance, how can we accurately, and quantitatively, determine the reliability of human intelligence?

Overall, the research is very interesting and provides insight into the intelligence community and process. Admittedly, the approach addresses only the second half of the information-fusion problem, doing nothing for internal communication within the intelligence community; however, any reduction in uncertainty, particularly through objective means, improves the success rate of thwarting terrorist attack plans, or other such problems addressed by the intelligence community.

Source: 
Paté-Cornell, E. (2002). Fusion of Intelligence Information: A Bayesian Approach. Risk Analysis: An International Journal, 22(3), 445-454.

The Deterrent Effect of Arrest in Incidents of Domestic Violence: A Bayesian Analysis of Four Field Experiments

Summary:
The authors of this study, Berk, Campbell, Klap, and Western (1992), looked at a number of different studies conducted following a study of the Minneapolis Police Department.  The initial study looked at police responses to misdemeanor domestic assaults.  Three response options were available to the police officers, and these measures were supposed to be assigned randomly: (1) arrest the suspect, (2) remove the suspect from the premises for 24 hours, or (3) attempt to restore order at that moment.  Through a series of initial and follow-up interviews, it was determined that arrest of the suspect was the most effective way to reduce further violence.  Based on these results, police departments were encouraged to arrest suspects as soon as possible in domestic assault cases.  In addition, the National Institute of Justice funded six replications of the Minneapolis experiment across the United States.

The authors took results from the initial Minneapolis study as well as the six subsequent studies, applied Bayesian analysis, and attempted to determine whether either of two theories applied: labeling theory or social control theory.  They combined Bayesian analysis with meta-analysis to attempt to replicate the original study and its results, using the subsequent studies as different levels of the Bayesian analysis.

The findings of this analysis determined that there was no generalizable approach to effectively reducing further violence in domestic assault incidents.  Berk et al. determined that there were "good" and "bad" risks, and that the different positions and relations individuals held in society determined the effectiveness of arrests.  Individuals who did not feel constrained by their social standing, or by social controls, are seen as "bad" risks: they are likely to reoffend, since they are not as deterred.

This study concluded that social control elements, such as familial ties, relationships, and public perception, are only indicators, not actual measures of attachment.  Therefore, no generalizable finding applies to offenders across the United States, or even to offenders in the same region over time, nor is there a single statement that applies to a site's past, present, and future offenders.

Critique:
The application of Bayesian analysis here was interesting, since it not only examined a statistical element but also included a meta-analysis to attempt to identify the method most effective at curbing domestic violence.

The study did note that the detailed steps for the Bayesian analysis were located in another document, which made it slightly difficult to understand the larger picture, including the specific elements that went into the analysis.  The overall findings from the analysis are presented and analyzed in a manner that is coherent to individuals outside the field.  That being said, it would have been beneficial to include a more detailed numerical application of Bayesian analysis in this study rather than only a written description.

Berk, R., Campbell, A., Klap, R., & Western, B. (1992). The deterrent effect of arrests in incidents of domestic violence: A Bayesian Analysis of four field experiments. American Sociological Review, 57(5), 698-708. Retrieved from http://www.jstor.org/stable/10.2307/2095923