Advanced Analytic Techniques: analytic method

Friday, September 8, 2017

Gap Analysis: An Innovative Look at Gateway Courses and Student Retention

Summary and Critique by Keith Robinson Jr.

Summary

Authors William Bloemer, Scott Day, and Karen Swan of University of Illinois--Springfield challenge the notion that identifying gateway courses, those with large numbers of students who fail or withdraw, and focusing faculty attention on them is an effective use of limited resources. The authors recognize that these gateway courses do not have the same impact on all students, yet previous approaches to quantifying and identifying the best "fix" tend to assume that they do. Students come from diverse backgrounds and have different learning needs. The authors argue that the effectiveness of a course should be judged in the context of the students enrolled in it; "it is unreasonable to expect all courses to serve students equally well." They therefore searched for measures that identify problem gateway courses while taking that context into account.

The data were pulled from all undergraduate degree-seeking students enrolled in a small, Midwestern public university over a four-year period. An end-of-term grade of D or F, or a Withdrawal from a course, indicated that the student failed to complete the course successfully. The authors measured persistence, defined as enrollment in the next regularly scheduled term or graduation, for two reasons: a student who takes a break in enrollment is much less likely to graduate, because most such students simply do not return; and connecting individual courses to persistence requires a short-term measure.

Students were classified according to type and stage in their academic life cycle. Student types were as follows: Native Freshmen, Honors Freshmen, On-ground Transfers, and Online Transfers. Stages in the academic life cycle were the first term (critical for transfer students), end of the first year/second semester (freshmen), and second and third year. Anything beyond the third year was considered the last stage. The research used a binary logistic regression to predict the probability that a student would post a D, F, or Withdraw grade in any specific course the institution offered over the four-year period. The predictor variables were Student Type, point in the Student Life Cycle, prior cumulative GPA, and the fraction of prior courses in which the student had already received a D, F, or W. The predicted D, F, and W rates were used as a benchmark against actual course performance, and the difference (Gap) between them was calculated.
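The gap calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the per-student DFW probabilities have already been produced by some fitted logistic regression, it takes the gap as actual rate minus predicted rate (the sign convention is an assumption), and the course names and numbers are hypothetical.

```python
from collections import defaultdict

def course_gaps(enrollments):
    """Compute actual DFW rate, predicted DFW rate, and their gap per course.

    enrollments: iterable of (course, predicted_dfw_probability, got_dfw).
    The predicted probability is assumed to come from a model fitted elsewhere.
    """
    # For each course: [number of enrollments, sum of predictions, DFW count]
    totals = defaultdict(lambda: [0, 0.0, 0])
    for course, predicted, got_dfw in enrollments:
        entry = totals[course]
        entry[0] += 1
        entry[1] += predicted
        entry[2] += 1 if got_dfw else 0

    result = {}
    for course, (n, predicted_sum, dfw_count) in totals.items():
        actual_rate = dfw_count / n
        predicted_rate = predicted_sum / n
        # Assumed convention: positive gap = course does worse than expected.
        result[course] = {
            "actual": actual_rate,
            "predicted": predicted_rate,
            "gap": actual_rate - predicted_rate,
        }
    return result

# Hypothetical enrollments: both courses have a 50% actual DFW rate,
# but very different predicted rates given who enrolled.
sample = [
    ("MATH 101", 0.40, True),
    ("MATH 101", 0.30, False),
    ("MATH 101", 0.50, True),
    ("MATH 101", 0.40, False),
    ("HIST 200", 0.10, True),
    ("HIST 200", 0.10, False),
    ("HIST 200", 0.10, True),
    ("HIST 200", 0.10, False),
]
gaps = course_gaps(sample)
for course, stats in gaps.items():
    print(course, stats)
```

In this made-up data the two courses look identical by raw DFW rate, yet the second has a much larger gap, which is the kind of course the summary says would otherwise escape attention.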

The results of the study showed that ranking courses by actual DFW rates and by predicted DFW rates painted a similar picture; however, significant differences were noted as well. Some courses with high DFW rates also had high predicted rates, and some courses performed better than expected. There were also courses whose DFW rates were not high enough to attract attention but were much higher than expected given the student population. And there were courses with extremely low DFW rates, near zero, despite substantial predicted rates.

In conclusion, while gateway courses with high D, F, and Withdrawal rates are the first to receive the attention of retention efforts, at times this attention is misdirected and even harmful. It may be necessary to look beyond "problem" courses and identify those whose relative success with a specific student type or problematic student population can be replicated. Identifying problem courses is beneficial; it is equally beneficial to identify the student types, and where they are in their academic careers, on which to base expectations.

Critique

The study is clearly limited to the undergraduate population of a small, Midwestern public university over a four-year period; it is not indicative of all students across the US. In the analysis of D, F, and W grades received, there appeared to be students with high GPAs and high W rates, presumably withdrawing to protect their GPAs. That skews the data. It is also nearly impossible to truly quantify problem courses. While a course may prove difficult, a poor result could just as easily be due to lack of student effort. Additionally, the authors recognize that approaches other than the one used in the research may be more appropriate for other academic institutions, depending on local factors that affect student success (or failure). And finally, while the fastest, most cost-effective solution to high DFW rates may lie in academic advising, such as preventing particular types of students from attempting courses known to be difficult for students at their current stage of the academic life cycle, that may come across as discriminatory.

Source:
Gap Analysis: An Innovative Look at Gateway Courses and Student Retention

Friday, September 1, 2017

Using Subjective Logic to Evaluate Competing Hypotheses

Summary and Critique by Claude Bingham
Original research by Simon Pope, Audun Jøsang of the CRC for Enterprise Distributed Systems Technology.

Summary
Intelligence analysis is difficult to define and even harder to evaluate in terms of quality. To analyze and improve upon the Analysis of Competing Hypotheses (ACH) process, one of the CIA's primary analysis methodologies, the researchers at the CRC for Enterprise Distributed Systems Technology turned to Subjective Logic. This addition to ACH is intended to serve as a starting point for reasoning through hypotheses and their outputs. One of the major claimed benefits of the ACH-SL variant is that it removes the step that requires evaluating evidence biases.

The CRC researchers give an extensive overview of the basic steps of the ACH process, highlighting important steps that can affect the validity of the analysis' outcome. The Steps to ACH are listed below:

1.) Identify the possible hypotheses to be considered. Use a group of analysts with different perspectives to brainstorm the possibilities.
2.) Make a list of significant evidence and arguments for and against each hypothesis.
3.) Prepare a matrix with hypotheses across the top and evidence down the side. Analyze the "diagnosticity" of the evidence and arguments, that is, identify which items are most helpful in judging the relative likelihood of the hypotheses.
4.) Refine the matrix. Reconsider the hypotheses and delete evidence and arguments that have no diagnostic value.
5.) Draw tentative conclusions about the relative likelihood of each hypothesis. Proceed by trying to disprove the hypotheses rather than prove them.
6.) Analyze how sensitive your conclusion is to a few critical items of evidence. Consider the consequences for your analysis if that evidence were wrong, misleading, or subject to a different interpretation.
7.) Report conclusions. Discuss the relative likelihood of all the hypotheses, not just the most likely one.
8.) Identify possible milestones for future observation that may indicate events are taking a different course than expected.

Step 5 is seen as particularly important. In this step, an analyst must make a judgment about individual evidence; does it support or refute the hypotheses, does it have counterfactuals that negate its importance? These questions are important because the ACH methodology does not expressly require them to be asked for the methodology to function. Not doing so can result in too narrow a framing of the impact of a piece of evidence.

The researchers also examined ACH-Counter Deception (ACH-CD), a variant that looks at ways to counteract possible deception in evidence. The key point of mentioning this variant is that it lessens the chance that positively confirming evidence can lead to reasoning errors. False positives are likely when taking positively confirming evidence at face value. The researchers of this study contend that the ACH-CD has a minor flaw in that it requires the analyst to decide between evidence OR its counterfactual rather than making a subjective judgment call and evaluating both.

Finally, the researchers lay out how the ACH-Subjective Logic (ACH-SL) methodology is different and what the steps entail.

1.) (ACH Step 1)
2.) (ACH Step 2)
3.) Prepare a model consisting of:

   *A set of exhaustive and exclusive hypotheses, where one and only one must be true.
   *A set of items of evidence that are relevant to one or more hypotheses; are influences that have a causal influence on one or more hypotheses; or would disconfirm one or more hypotheses.

4.) Consider the evidence with respect to the hypotheses:

   *For each hypothesis and item of evidence, assess its base rate.
   *Should the evidence be treated as causal or derivative? Decide and record for each item of evidence or evidence/hypothesis pair.
   *Make judgments for causal evidence as to the likelihood of each hypothesis if the evidence were true and if the evidence were false.
   *Make judgments for derivative evidence as to the likelihood that the evidence will be true if the hypothesis were true, and if the hypothesis were false.
   *From the judgments provided, compute the diagnosticity for each item of evidence.

5.) Measure the evidence itself and decide the likelihood that the evidence is true. Supply the measured evidence as input into the constructed model, and use the Subjective Logic calculus to compute the overall likelihood of each hypothesis.
6.) Analyze how sensitive the conclusion is to a few critical items of evidence. Changes in the value of evidence with high diagnosticity will alter the calculated likelihoods more than evidence with low diagnosticity. Consider the consequences for your analysis if that evidence were wrong, misleading, or subject to a different interpretation.
7.) (ACH Step 7)
8.) (ACH Step 8)

Because Subjective Logic is a formal calculus, ACH-SL requires analyst-supplied values for the subjective judgments on each hypothesis and compares those values to base rates, which are ideally established empirically. Intelligence hypotheses are often one-off events, so an empirical base rate frequently does not exist; in that case the base rate is supplied by a formula based on the number of competing hypotheses. The evidence goes through a similar process of comparing base rates to the subjectively assessed values.
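A small Python sketch may help make the role of the base rate concrete. In Subjective Logic a binomial opinion holds belief, disbelief, and uncertainty values that sum to 1, plus a base rate, and its probability expectation is belief + base rate x uncertainty. The class and function names below are illustrative, the numbers are hypothetical, and the 1/k default for k hypotheses is stated here as an assumption about the "formula based on the number of competing hypotheses" mentioned above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Opinion:
    """A binomial Subjective Logic opinion about a single hypothesis."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float

    def __post_init__(self):
        total = self.belief + self.disbelief + self.uncertainty
        if abs(total - 1.0) > 1e-9:
            raise ValueError("belief + disbelief + uncertainty must equal 1")

    def expected_probability(self):
        # Uncertain mass is distributed according to the base rate.
        return self.belief + self.base_rate * self.uncertainty

def default_base_rate(num_hypotheses):
    """Assumed default when no empirical base rate exists: 1/k."""
    return 1.0 / num_hypotheses

# Four competing hypotheses, no evidence yet: the opinion is fully uncertain,
# so the expectation collapses to the base rate.
vacuous = Opinion(0.0, 0.0, 1.0, default_base_rate(4))

# Hypothetical analyst judgment after weighing some evidence.
informed = Opinion(0.6, 0.1, 0.3, default_base_rate(4))

print(vacuous.expected_probability())
print(informed.expected_probability())
```

The point of the sketch is only that analyst uncertainty is kept explicit rather than folded into a single number; the full ACH-SL operators for deduction and abduction are not shown.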

Traditional ACH uses deductive reasoning through causal links between evidence and hypotheses, while ACH-CD uses abductive reasoning about derivative evidence, that is, evidence that is not directly related to the hypothesis. Each method breaks down outside its own domain: deductive reasoning about derivative evidence is difficult because the evidence is not causally related to the hypothesis, and abductive reasoning about causal evidence is difficult for the reverse reason. In the latter case, the evidence is more closely related to why a given hypothesis is possible, and the reasoning breaks down over what is and is not possible.

Finally, the researchers introduce the calculus behind Subjective Logic and explain how the Bayesian data points of systems and subjective value judgments of analysts can be combined to form probabilistic conclusions.

The researchers conclude by stating not only that ACH-SL works but also that the Distributed Systems Technology Centre has created a technology framework based upon it, called ShEBA.

Critique

This study is very well thought out and takes a broad approach to explaining the path ACH methodology has taken to get to its current iteration. While the math is dense and difficult to understand critically at a novice level, it shows a more open-minded approach. It gives analysts the chance to make the semantic rationalizations and subjective valuations they already apply to evidence quantifiable and less arbitrary. It gives more shape and order to the resulting Words of Estimative Probability.

Wednesday, May 6, 2009

Summary Of Findings: Bayesian Analysis (4 out of 5 Stars)

Note: This post represents the synthesis of the thoughts, procedures and experiences of others as represented in the 12 articles read in advance of (see previous posts) and the discussion among the students and instructor during the Advanced Analytic Techniques class at Mercyhurst College on 6 MAY 2009 regarding Bayesian Analysis specifically. This technique was evaluated based on its overall validity, simplicity, flexibility and its ability to effectively use unstructured data.

Description:
Bayesian analysis is a method that uses Bayesian statistics to assess the likelihood of an event happening in light of new evidence. It generates an estimate, and its use in intelligence analysis allows the uncertainty of the traditional intelligence data set to be understood in a scientifically valid manner.

Strengths:

*can limit analyst biases by reducing the undue weight given to evidence simply because it is new or vivid

*forces the analyst to reassess evidence and consider alternative possibilities

*adheres to rigid mathematical formulas

*provides a numerical likelihood

*provides audit trail and ability to reproduce results

Weaknesses:

*Probabilities are based largely on subjectivity

*Susceptible to biases

*Highly complex problems require heavy computations

*Can be mathematically complex

*Not always useful as a stand-alone method (works well in tandem with methods like Delphi); may require SMEs for determining probability distributions

*Some reliance on ambiguous validities

*"Negative evidence"--absence of positive evidence

How-To:

This method loosely follows the guidance suggested by a line of research into the use of natural frequencies in teaching and explaining Bayes to beginners.

1.) Create a 2x2 matrix. Label the quadrants with the respective information that creates true positive, false negative, false positive, and true negative quadrants.

2.) Combine the given information: the base line (for example, 100 out of 1,000) and the new information (for example, a new document that is 90% credible saying that war is imminent). This means that your true positives and false negatives must sum to 100, and your false positives and true negatives must sum to 900.

3.) To calculate the true positive quadrant, take 90% of the 100 from the base line (which equals 90).

4.) To calculate the false negative quadrant, take the numerator of the base line (100) and subtract the true positive quadrant (90), creating an answer of 10.

5.) To calculate the true negative quadrant, take 90% of your non-war cases (900), equalling 810.

6.) To calculate the false positives, subtract the sum of the three quadrants known from the total number of cases (1,000), which equals 90.

7.) To calculate the new probability, divide the true positives (90) by the new total of positive cases (90+90=180), which equals 50%.

The 50% means that, given the new document, there is a 50% probability that countries X and Y will go to war, up from the 10% base line.
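The 2x2 matrix steps above can be sketched in Python. This is a minimal illustration under one stated assumption, which the steps themselves also make: the source's credibility (90%) is applied both to the cases where war occurs and to the cases where it does not. The posterior is taken here as true positives divided by all positive cases. The function and key names are hypothetical.

```python
def natural_frequency_posterior(total_cases, base_cases, reliability):
    """Fill in the 2x2 matrix with natural frequencies and return the posterior.

    total_cases: size of the reference population (e.g. 1,000).
    base_cases:  cases where the event occurs (the base line, e.g. 100).
    reliability: assumed to be both the true positive and true negative rate.
    """
    true_positive = reliability * base_cases
    false_negative = base_cases - true_positive
    non_cases = total_cases - base_cases
    true_negative = reliability * non_cases
    false_positive = non_cases - true_negative
    # Of all the cases that look positive, how many really are?
    posterior = true_positive / (true_positive + false_positive)
    return {
        "true_positive": true_positive,
        "false_negative": false_negative,
        "false_positive": false_positive,
        "true_negative": true_negative,
        "posterior": posterior,
    }

# The war example: base line of 100 in 1,000, document judged 90% credible.
war = natural_frequency_posterior(1000, 100, 0.90)
for name, value in war.items():
    print(name, round(value, 3))
```

Working in counts rather than percentages keeps every quadrant visible, which is the reason the natural-frequency format is favored for teaching beginners.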

Experience:

To understand the basic mathematical principles behind Bayes, the class worked through some sample problems. One of the problems was based on a medical test with an 80% accuracy rate for a cancer with a 2% affliction rate in the general population. The class applied this to a sample population of 1,000 cases. We established a matrix and assessed the true positive, false positive, false negative, and true negative quantities (16, 196, 4, and 784 respectively). We plugged these numbers into the appropriate matrix fields. We then divided the number of true positives (16) by the number of positive tests (212: the 16 true positives plus the 196 false positives). The result was that about 7.5% of those who test positive actually have the cancer, well above the 2% base rate but far below what the test's 80% accuracy might suggest! This problem actually reflects the figures for a breast cancer test from around two decades ago!

Note: see the matrix for a synopsis of another of the problems we worked through (a piece of evidence emerging suggesting a cause for war).

The class also used a Bayesian application to assess the likelihood we would contract swine flu. We started with two initial hypotheses, that we would contract swine flu or that we would not, and assigned an initial probability to each hypothesis (the latter >5%). We then added weighted evidence, which shifted the probability of each hypothesis away from its base rate. After all the evidence was entered, the class assessed the likelihood of contracting swine flu.
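The post does not say how the application weighted each piece of evidence, so the following is only a sketch of one common way to do sequential Bayesian updating: convert the prior to odds, multiply by a likelihood ratio for each item of evidence, and convert back. The prior and the likelihood ratios below are hypothetical, and treating the items as independent is an assumption.

```python
def update_with_evidence(prior, likelihood_ratios):
    """Sequentially update a probability using the odds form of Bayes' rule.

    prior: initial probability of the hypothesis (strictly between 0 and 1).
    likelihood_ratios: for each item of evidence,
        P(evidence | hypothesis) / P(evidence | not hypothesis).
    Items are assumed independent given the hypothesis.
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio  # ratio > 1 supports the hypothesis, < 1 undercuts it
    return odds / (1.0 + odds)

# Hypothetical inputs: a 5% prior and three items of evidence, two supporting
# the hypothesis and one undercutting it.
posterior = update_with_evidence(0.05, [2.0, 0.5, 3.0])
print(round(posterior, 4))
```

An item with a likelihood ratio of exactly 1 leaves the estimate unchanged, which corresponds to evidence with no diagnostic value.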

Tuesday, May 5, 2009

Using Search Engine Optimization For Intelligence Analysis

SEO Analysis Now - A Site For Using SEO in Intelligence Analysis
Rated: 4 Stars out of 5

As a final requirement of this course, I had to research and report findings on an analytical technique of my choosing. In addition, I had to test and apply the technique in order to reveal its true strengths and shortcomings, as well as come up with a how-to guide for using the technique. For my project, I chose to conduct a Search Engine Optimization (SEO) Analysis on the Websites of two popular coffee shops, Starbucks (which is well established) and Caribou Coffee (which is slowly gaining popularity), and to reflect on how this process can be applied to Competitive Intelligence (CI), to Law Enforcement Intelligence (LEI), or to the National Security sector. I chose SEO not only because it is an emerging analytical process that can be useful to intelligence analysts, but also because I found it quite intriguing and fun.

SEO provides a way to gain insight into a Website’s audience.

If we know who an audience is (age, gender, ethnicity, education, affluence, etc.), how they behave (what other Websites, or types of Websites, they visit), from where they log on, when they visit, and what other sites direct their traffic (as well as what their general interests are), then we can make assessments and predictions about how to either promote their behavior (steering them toward a particular site, which is useful for CI and marketing purposes) or counter that behavior (keeping them away from sites, which is useful for LEI and national security purposes).

For more detailed information about how SEO can be used for Intel analysis, please check out my project:

SEO Analysis Now - A Site For Using SEO in Intelligence Analysis

(The image below is a geographical search index comparison of online users searching for Caribou Coffee [left] and Starbucks [right]. Images provided by Google Insights for Search)

Wednesday, April 22, 2009

Summary Of Findings: Game Theory (4 out of 5 Stars)

Note: This post represents the synthesis of the thoughts, procedures and experiences of others as represented in the 12 articles read in advance of (see previous posts) and the discussion among the students and instructor during the Advanced Analytic Techniques class at Mercyhurst College on 22 APR 2009 regarding Game Theory specifically. This technique was evaluated based on its overall validity, simplicity, flexibility and its ability to effectively use unstructured data.

Description:

Game theory is a method based on applied mathematics and economic theory. It can be useful when attempting to analyze (and ultimately predict) the strategic interactions between two or more actors and the way in which their actions influence future decisions. Game theory assumes that all actors are rational, and can be influenced by various individuals and factors. Games typically involve five common elements: players, strategies, rules, outcomes, and payoffs.

Strengths:

-Visual step-by-step trail to a conclusion/estimate

-Ability to quantify variables in play

-Emphasis on mathematics and scientific method

-Applicable to multiple fields (economics, conflict, etc.)

-90% rate of success according to BDM

Weaknesses:

-assumes rational actors

-assumes actors will adjust their actions based on the actions of other actors

-not clearly differentiated from role-playing, simulations, and/or decision trees

-very mathematically based (can be intimidating)

-difficult to quantify options, strategies, and motivations

-may not be a valid method to produce an accurate estimation (see Game Theory, Simulated Interaction, and Unaided Judgment For Forecasting Decisions in Conflict: Further Evidence)

-In real-world applications, identifying all of the key players and outcomes can be difficult

How-To:

Game Theory varies in complexity and in application; however, each application has the following in common:

*Establish the players and the complexity of the game being played, so as to understand the rules which govern the players and the game.

*Identify the possible outcomes for the choices the players can make (although this is particularly difficult as not all decisions can be predicted)

*Establish measurable values for predicted outcomes.

*Eliminate dominated strategies and employ dominant strategies. Repeat this step until a clear, singular strategy emerges or equilibrium is reached between the players.

*Employ selected strategy.
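The elimination step in the list above can be sketched in Python for a two-player game. This is a minimal illustration that removes only strictly dominated pure strategies; the function name is hypothetical, and the payoff values are the conventional textbook Prisoner's Dilemma numbers, assumed here for the example.

```python
def eliminate_dominated(row_payoffs, col_payoffs):
    """Iteratively remove strictly dominated pure strategies.

    row_payoffs[r][c]: payoff to the row player when row plays r, column plays c.
    col_payoffs[r][c]: payoff to the column player for the same outcome.
    Returns the indices of the surviving row and column strategies.
    """
    rows = list(range(len(row_payoffs)))
    cols = list(range(len(row_payoffs[0])))
    changed = True
    while changed:
        changed = False
        for r in rows:
            # r is dominated if some other row is better against every column.
            if any(all(row_payoffs[o][c] > row_payoffs[r][c] for c in cols)
                   for o in rows if o != r):
                rows.remove(r)
                changed = True
                break
        for c in cols:
            # c is dominated if some other column is better against every row.
            if any(all(col_payoffs[r][o] > col_payoffs[r][c] for r in rows)
                   for o in cols if o != c):
                cols.remove(c)
                changed = True
                break
    return rows, cols

# Prisoner's Dilemma with assumed payoffs; index 0 = cooperate, 1 = defect.
row_payoffs = [[3, 0],
               [5, 1]]
col_payoffs = [[3, 5],
               [0, 1]]
surviving = eliminate_dominated(row_payoffs, col_payoffs)
print(surviving)
```

For this game a single strategy survives for each player. Many games have no strictly dominated strategies at all, in which case the function returns every strategy and an equilibrium has to be found another way.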

Experience:

As a class, we visited www.gametheory.net and played the repeatable version of Prisoner's Dilemma under the "Interactive Materials" tab. Each student played the game on their own computer. Our objective as we played against the five "personalities" was to identify the particular strategies employed by the computer (in addition to scoring the most utility points). Some of the strategies employed by the computer's personalities included "tit-for-tat" and "tit-for-two-tats."
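For readers unfamiliar with the two strategies named above, here is a small Python sketch of a repeated Prisoner's Dilemma. It is not the code behind the site the class used; the payoff values are the conventional textbook ones and are an assumption, as are the function names.

```python
# Assumed payoffs: (points to player A, points to player B).
# "C" = cooperate, "D" = defect.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(own_history, opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"

def tit_for_two_tats(own_history, opponent_history):
    """Defect only after the opponent has defected twice in a row."""
    return "D" if opponent_history[-2:] == ["D", "D"] else "C"

def always_defect(own_history, opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds):
    """Play the repeated game and return the total score for each player."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        points_a, points_b = PAYOFFS[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, always_defect, 5))
print(play(tit_for_two_tats, always_defect, 5))
print(play(tit_for_tat, tit_for_tat, 5))
```

Against a constant defector, tit-for-two-tats gives away one more round than tit-for-tat before retaliating, which is the kind of behavioral signature a player can use to tell the personalities apart.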
