Tuesday, April 17, 2012

Bayesian Analysis of Genetic Differentiation Between Populations


In their 2003 article, Corander, Waldmann, and Sillanpaa introduce a Bayesian method for estimating hidden population substructure from molecular genetic markers and geographical sampling information. The technique is intended to be a more flexible and robust way to detect genetic differentiation and genetic drift between populations, especially when migration and mutation rates are low. Most studies still rely on Wright’s F-statistics (1965); this article is one of several recent ones that propose alternative methods of analysis. The Bayesian analysis here is built around the geographical sampling information, and every possible grouping of the sampled populations is considered equally likely a priori.
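To make the equal prior weighting of groupings concrete, here is a minimal Python sketch that lists every way a handful of sampled populations could be grouped and gives each grouping the same prior probability. The function name set_partitions and the population labels are made up for illustration; this is only a reading of the summary above, not the authors' code.

```python
def set_partitions(items):
    """Yield every way to split `items` into non-empty, unlabeled groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing group in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give it a new group of its own.
        yield [[first]] + smaller


# Hypothetical labels for three sampled populations.
pops = ["A", "B", "C"]
partitions = list(set_partitions(pops))

# Equal prior weight for every grouping (an assumption taken from the summary).
prior = 1.0 / len(partitions)

for groups in partitions:
    print(groups, prior)
```

Three populations give only 5 possible groupings, but the count (a Bell number) grows quickly: for 12 sampled populations it is already 4,213,597.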


The method starts with samples gathered from distinct populations defined by geographic separation (for example, the same tree species growing in two separate river basins would start as two populations). Next, allele counts are collected at a number of unlinked molecular marker loci, each with a specific number of distinguishable alleles. At this stage, the markers are assumed to be neutral and mutation rates low. The number of populations with differing allele frequencies is treated as a parameter whose upper bound is given directly by the sampling design. These counts are then plugged into a Bayesian formula to estimate allele frequencies and their densities and, by extension, how much genetic drift and substructure exist in a given population.
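As a rough illustration of what such a Bayesian formula can look like, the sketch below scores one candidate grouping of populations by pooling allele counts inside each group and integrating the unknown allele frequencies out under a symmetric Dirichlet prior. The prior (alpha = 1), the function names, and the toy counts are all assumptions made for this example, so it should not be read as the exact formula from the article.

```python
from math import lgamma


def log_marginal(counts, alpha=1.0):
    """Log probability of the observed alleles at one locus with the allele
    frequencies integrated out under a symmetric Dirichlet(alpha) prior.
    The multinomial coefficient is omitted (alleles are treated as an
    ordered sequence), which does not affect comparisons between groupings."""
    k = len(counts)
    n = sum(counts)
    out = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        out += lgamma(alpha + c) - lgamma(alpha)
    return out


def log_marginal_partition(pop_counts, partition, alpha=1.0):
    """pop_counts maps a population name to a list of loci, each locus being
    a list of allele counts. partition is a list of groups of names;
    populations in the same group are assumed to share allele frequencies."""
    total = 0.0
    for group in partition:
        n_loci = len(pop_counts[group[0]])
        for j in range(n_loci):
            # Pool the counts of every population in the group, allele by allele.
            pooled = [sum(col) for col in zip(*(pop_counts[p][j] for p in group))]
            total += log_marginal(pooled, alpha)
    return total


# Hypothetical counts: one locus with two alleles, two river basins.
same = {"basin_1": [[10, 10]], "basin_2": [[10, 10]]}
diff = {"basin_1": [[20, 0]], "basin_2": [[0, 20]]}
merged = [["basin_1", "basin_2"]]
split = [["basin_1"], ["basin_2"]]

print(log_marginal_partition(same, merged), log_marginal_partition(same, split))
print(log_marginal_partition(diff, merged), log_marginal_partition(diff, split))
```

With identical counts the merged grouping gets the higher score, and with opposite counts the split grouping does, which is the behavior one would expect from a method that only separates populations when the data call for it.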

This method has an advantage over other statistical methods, which require conditioning on a known population structure. That conditioning either leaves the uncertainty about the structure unaccounted for or relies on resampling methods that may generate biased estimates. The Bayesian method instead accounts for the uncertainty related to an unknown population structure.

Corander et al. applied their method to a real data set from the Moroccan argan tree. Because this population had been studied before, with 12 specific sub-populations characterized by a number of different allele frequencies, it provided a baseline for judging the effectiveness of the method. The estimates came out slightly lower than in prior studies but were still considered accurate. The lower estimates arise because the method accounts for the possibility that certain populations are equal. When compared to a geographic map, the results also showed that genetic distances generally coincide with geographical distances, although a few pairs of genetically similar populations are very distant geographically.

The group also used simulated data, including a population lacking substructure. When there was no evidence against similarity, the sub-populations tended to remain equal, effectively merging into one larger population with consistent allele frequencies. In examples with an underlying structure, especially one divided into two sub-populations, the empirical power to detect the correct structure increased with the sample size.
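The link between sample size and power can be reproduced in a toy simulation. The sketch below is not the simulation design from the article: it assumes a single two-allele locus, two populations, a uniform Dirichlet prior, and a simple rule that "detects" a split whenever the two-population model scores higher than the merged one. All names and settings are hypothetical.

```python
import random
from math import lgamma


def log_marginal(counts, alpha=1.0):
    """Dirichlet-multinomial log score for the allele counts at one locus."""
    k, n = len(counts), sum(counts)
    out = lgamma(k * alpha) - lgamma(k * alpha + n)
    return out + sum(lgamma(alpha + c) - lgamma(alpha) for c in counts)


def detects_split(n, p1, p2, rng):
    """Draw n alleles from each population and report whether the
    two-population model scores higher than the merged model."""
    a = sum(rng.random() < p1 for _ in range(n))
    b = sum(rng.random() < p2 for _ in range(n))
    split = log_marginal([a, n - a]) + log_marginal([b, n - b])
    merged = log_marginal([a + b, 2 * n - a - b])
    return split > merged


def power(n, p1, p2, reps=500, seed=0):
    """Fraction of simulated data sets in which a split is detected."""
    rng = random.Random(seed)
    return sum(detects_split(n, p1, p2, rng) for _ in range(reps)) / reps


# True structure: allele frequencies 0.3 versus 0.7.
for n in (10, 50, 200):
    print(n, power(n, 0.3, 0.7))

# No structure: both populations share frequency 0.5.
print(50, power(50, 0.5, 0.5))
```

In this toy setting the detection rate rises toward 1 as n grows when the frequencies really differ, and stays low when they do not, which mirrors the pattern described above.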

For cases where the number of potential populations is relatively large, Markov chain Monte Carlo (MCMC) estimation can be used. The model can also be modified to handle more general settings. The group further proposes that the method could use other genetic measures and markers if alleles are too numerous or too few.
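For readers who have not seen MCMC applied to this kind of problem, here is a minimal Metropolis sampler over groupings of populations. It is a sketch under assumptions, not the sampler from the article: it uses a uniform Dirichlet prior on allele frequencies, an equal prior weight for every grouping, a simple "move one population to another label" proposal, and made-up allele counts.

```python
import random
from collections import Counter
from math import exp, lgamma


def log_marginal(counts, alpha=1.0):
    """Dirichlet-multinomial log score for the allele counts at one locus."""
    k, n = len(counts), sum(counts)
    out = lgamma(k * alpha) - lgamma(k * alpha + n)
    return out + sum(lgamma(alpha + c) - lgamma(alpha) for c in counts)


def groups_of(labels):
    """Turn a label vector into a sorted tuple of groups of population indices."""
    groups = {}
    for pop, lab in enumerate(labels):
        groups.setdefault(lab, []).append(pop)
    return tuple(sorted(tuple(g) for g in groups.values()))


def log_target(labels, data):
    """Unnormalized log posterior of a label vector."""
    groups = groups_of(labels)
    total = 0.0
    for members in groups:
        for locus in range(len(data[0])):
            pooled = [sum(c) for c in zip(*(data[p][locus] for p in members))]
            total += log_marginal(pooled)
    # A grouping with k groups can be written as n!/(n-k)! label vectors.
    # Dividing by that count (the constant n! is dropped) keeps the prior
    # uniform over groupings rather than over label vectors.
    return total + lgamma(len(labels) - len(groups) + 1)


def sample_partitions(data, steps=3000, seed=0):
    """Metropolis sampling; returns how often each grouping was visited."""
    rng = random.Random(seed)
    n = len(data)
    labels = list(range(n))  # start with every population on its own
    current = log_target(labels, data)
    visits = Counter()
    for _ in range(steps):
        proposal = labels[:]
        proposal[rng.randrange(n)] = rng.randrange(n)  # symmetric proposal
        candidate = log_target(proposal, data)
        if rng.random() < exp(min(0.0, candidate - current)):
            labels, current = proposal, candidate
        visits[groups_of(labels)] += 1
    return visits


# Hypothetical data: 4 populations, 3 loci, 2 alleles per locus.
data = [
    [[30, 0], [30, 0], [30, 0]],
    [[29, 1], [29, 1], [29, 1]],
    [[0, 30], [0, 30], [0, 30]],
    [[1, 29], [1, 29], [1, 29]],
]
visits = sample_partitions(data)
print(visits.most_common(3))
```

The visit counts approximate the posterior probability of each grouping, so the sampler never has to enumerate every possible grouping, which is what makes the approach usable when many populations are sampled.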


While the method is not conclusive and cannot cover every situation, it does protect against exaggerated interpretations of differences caused by random fluctuations in allele frequencies over generations. In addition, using Markov chain Monte Carlo estimation, the method could also project genetic and population trends into the future if they are left uninterrupted. The largest advantage of the method, however, is that the number of populations is treated as an unknown parameter, which avoids potential anchoring bias and the labeling problems that occur with high levels of gene flow. In other words, the method allows analysts to assess the differences and genetic drift between parts of a population more accurately, producing a more accurate picture of genetic and geographic substructure.


Corander, J., Waldmann, P., & Sillanpaa, M. (2003). Bayesian analysis of genetic differentiation between populations. Genetics, 163(1), 367-374. The Genetics Society of America.
Retrieved from: http://www.genetics.org/content/163/1/367.full.pdf

1 comment:

  1. this sounds interesting, but incredibly specific. I've heard of applications of monte carlo simulations, but reading your article, it sounds more difficult than i previously thought.