1. Description of the PDF File
This document is a set of lecture notes on Population Genetics designed for a university-level module (G14TBS). It serves as a theoretical and mathematical introduction to the study of genetic variation within populations. The notes progress from a brief history of genetics (Mendel, Darwin, Molecular) to the core principles of population genetics, specifically the Hardy-Weinberg Law (HWL). It provides detailed mathematical derivations of the law, methods for estimating allele frequencies (including Fisher’s Approximate Variance Formula and the EM Algorithm), and statistical tests for detecting deviations from equilibrium. The course emphasizes problem-based learning, moving from simple 2-allele models (e.g., albinism, moth coloration) to complex multi-allele scenarios (e.g., ABO blood groups) and eventually touches on forces that disrupt equilibrium like genetic drift (Wright-Fisher model) and selection.
2. Key Points, Headings, Topics, and Questions
Heading 1: Introduction & History
Topic: Foundations of Genetics
Key Points:
Classical Genetics: Mendel’s laws (Segregation, Independent Assortment) and the concept of discrete genes/alleles.
Molecular Genetics: Discovery of DNA as the genetic material, its double-helix structure (Watson & Crick, 1953), and the genetic code.
Evolution: In Darwin’s theory, natural selection acts on the variation provided by mutations and Mendelian inheritance.
Glossary Key Terms: Allele, Genotype, Phenotype, Haploid/Diploid, Locus, Linkage.
Study Questions:
What is the difference between a genotype and a phenotype?
Explain Mendel’s Law of Segregation.
Heading 2: Hardy-Weinberg Equilibrium (HWE)
Topic: The Fundamental Law of Population Genetics
Key Points:
Definition: In the absence of evolutionary forces (mutation, migration, selection, non-random mating), allele and genotype frequencies remain constant from generation to generation.
Assumptions: Random mating, infinite population size, no mutation/migration/selection.
The HWL Equation: For two alleles ($A$ and $a$), if $p$ = freq($A$) and $q$ = freq($a$), then genotype frequencies are $p^2$, $2pq$, $q^2$.
Significance: It serves as a "null hypothesis." Deviations indicate that evolutionary forces are acting on the population.
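The key points above can be checked numerically. The following is a minimal sketch, not taken from the notes: the function names and the value p = 0.6 are illustrative assumptions. It computes the Hardy-Weinberg genotype frequencies and confirms that the allele frequency among the gametes of that generation is unchanged.

```python
def hardy_weinberg(p):
    """Return (freq_AA, freq_Aa, freq_aa) when allele A has frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def allele_frequency_in_gametes(freq_AA, freq_Aa):
    """Frequency of A among gametes: every AA gamete plus half of Aa gametes."""
    return freq_AA + 0.5 * freq_Aa


p = 0.6  # illustrative value, not from the notes
freq_AA, freq_Aa, freq_aa = hardy_weinberg(p)
p_next = allele_frequency_in_gametes(freq_AA, freq_Aa)

print("genotype frequencies:", freq_AA, freq_Aa, freq_aa)
print("allele frequency in the next generation:", p_next)
```

The three frequencies sum to 1, and `p_next` equals `p` up to floating-point rounding, which is the "no change without forces" statement in miniature.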
Study Questions:
Why is HWL considered a "zero-force law"?
If the frequency of allele $A$ is $0.7$, what are the frequencies of genotypes $AA$, $Aa$, and $aa$?
Heading 3: Estimating Allele Frequencies
Topic: Estimation Methods & Statistics
Key Points:
Dominant Phenotypes: Recessive individuals ($aa$) are observable, but dominant homozygotes ($AA$) and heterozygotes ($Aa$) look the same.
Sampling: We count recessive individuals ($R$) and total sample size ($N$).
Point Estimate: $\hat{q} = \sqrt{R/N}$, because under HWE the expected proportion of recessive individuals is $q^2$.
Fisher’s Variance Formula: $\mathrm{Var}(\hat{q}) \approx \frac{1}{4N}\left(1 - \frac{R}{N}\right)$. Measures uncertainty in our estimate.
Confidence Intervals: Allow us to determine if two populations have significantly different allele frequencies.
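The estimation steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function name is hypothetical, the sample (13 recessives out of 100) is borrowed from the moth example used elsewhere in these notes, and the interval uses a normal approximation with z = 1.96, which the notes do not specify.

```python
import math


def estimate_recessive_frequency(R, N, z=1.96):
    """Estimate q from R recessive individuals out of N, assuming HWE.

    Returns (q_hat, variance, (ci_low, ci_high)). The interval is a
    normal-approximation interval (an assumption, not from the notes).
    """
    if N <= 0 or not 0 <= R <= N:
        raise ValueError("need N > 0 and 0 <= R <= N")
    q_hat = math.sqrt(R / N)              # point estimate: sqrt(R/N)
    variance = (1.0 - R / N) / (4.0 * N)  # Fisher's approximate variance
    std_error = math.sqrt(variance)
    return q_hat, variance, (q_hat - z * std_error, q_hat + z * std_error)


q_hat, variance, ci = estimate_recessive_frequency(R=13, N=100)
print("q_hat:", round(q_hat, 4))
print("variance:", variance)
print("approximate 95% interval:", tuple(round(x, 4) for x in ci))
```

Two populations whose intervals do not overlap are candidates for a genuine difference in allele frequency. For very small samples the approximation can be poor, and the interval may extend outside [0, 1].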
Study Questions:
How do we estimate the frequency of a recessive allele if we only observe phenotypes?
What does Fisher’s variance formula help us calculate?
Heading 4: The EM Algorithm
Topic: Maximum Likelihood Estimation (MLE)
Key Points:
Concept: An iterative algorithm to estimate parameters ($\theta$) when data are incomplete or missing (e.g., missing $AA$ and $Aa$ counts).
Steps:
E-step (Expectation): Estimate the missing data ($n_{AA}$, $n_{Aa}$) given the current parameter estimate ($q^{(m)}$).
M-step (Maximization): Re-estimate the parameter ($q^{(m+1)}$) that maximizes the likelihood given the completed data.
Convergence: Repeat until values stabilize.
Application (Albinism): If only recessives ($n_{aa}$) and total ($n_d$) are known, the algorithm iterates to find $q$.
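The E-step and M-step can be made concrete for the dominant/recessive case. The sketch below is an illustration, not the notes' own implementation: the function name, starting value, tolerance, and the counts (13 recessive, 87 dominant-phenotype individuals) are assumptions chosen for the example. Here `n_dom` denotes the number of individuals showing the dominant phenotype, whose split into $AA$ and $Aa$ is the missing data.

```python
import math


def em_recessive_frequency(n_aa, n_dom, q0=0.5, tol=1e-10, max_iter=1000):
    """EM estimate of q = freq(a) when AA and Aa cannot be told apart.

    n_aa  : observed recessive individuals
    n_dom : observed dominant-phenotype individuals (AA or Aa, unknown split)
    Returns (q, number_of_iterations_used).
    """
    n = n_aa + n_dom
    q = q0
    for iteration in range(1, max_iter + 1):
        # E-step: P(Aa | dominant phenotype) = 2pq / (p^2 + 2pq) = 2q / (1 + q)
        expected_n_Aa = n_dom * 2.0 * q / (1.0 + q)
        # M-step: count a alleles (2 per aa, 1 per Aa) out of 2n gene copies
        q_new = (2.0 * n_aa + expected_n_Aa) / (2.0 * n)
        if abs(q_new - q) < tol:
            return q_new, iteration
        q = q_new
    return q, max_iter


q_em, n_iter = em_recessive_frequency(n_aa=13, n_dom=87)
print("EM estimate of q:", round(q_em, 6), "after", n_iter, "iterations")
print("closed-form sqrt(n_aa / n):", round(math.sqrt(13 / 100), 6))
```

In this simple case the iteration converges to the closed-form estimate $\sqrt{n_{aa}/n}$, so EM is not strictly needed; its value shows in multi-allele problems such as ABO blood groups, where no closed form is available.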
Study Questions:
What does "EM" stand for?
Why is the EM algorithm useful in population genetics?
Heading 5: Testing for HWE
Topic: Statistical Goodness of Fit
Key Points:
Null Hypothesis ($H_0$): The population is in Hardy-Weinberg Equilibrium.
Likelihood Ratio Test (LRT): $\Lambda = 2\log\left(L(\hat{\theta})/L(\hat{\theta}_0)\right)$. Compares the fit of the observed data under the full model vs. the restricted (HWE) model.
Pearson’s Chi-Squared: $X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$. Used for large samples to test for significant deviation.
Degrees of Freedom: Difference in the number of free parameters between the two models.
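Both tests can be sketched for the case where all three genotypes are observable. This is an illustrative sketch under assumptions: the function name and the genotype counts are hypothetical. With two alleles, the full model has 2 free parameters and the HWE model has 1, giving 1 degree of freedom, and for 1 degree of freedom the chi-squared tail probability can be written as `erfc(sqrt(x / 2))`.

```python
import math


def hwe_tests(n_AA, n_Aa, n_aa):
    """Pearson chi-squared and likelihood ratio tests of HWE (1 df)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # MLE of freq(A) under HWE
    if not 0.0 < p < 1.0:
        raise ValueError("both alleles must be present in the sample")
    q = 1.0 - p
    observed = [n_AA, n_Aa, n_aa]
    expected = [n * p * p, 2 * n * p * q, n * q * q]

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 2 * log(L_full / L_HWE) = 2 * sum O_i * log(O_i / E_i); empty cells add 0
    lrt = 2.0 * sum(o * math.log(o / e)
                    for o, e in zip(observed, expected) if o > 0)
    lrt = max(lrt, 0.0)  # guard against tiny negative rounding error

    def p_value_1df(x):
        return math.erfc(math.sqrt(x / 2.0))

    return {"chi2": chi2, "chi2_p": p_value_1df(chi2),
            "lrt": lrt, "lrt_p": p_value_1df(lrt)}


result = hwe_tests(n_AA=298, n_Aa=489, n_aa=213)  # hypothetical counts
for name, value in result.items():
    print(name, round(value, 4))
```

For large samples the two statistics are usually close, and a p-value below the chosen significance level (often 0.05) leads to rejecting $H_0$.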
Study Questions:
What is the purpose of a Likelihood Ratio Test?
How do you determine the degrees of freedom for the chi-squared test?
Heading 6: Genetic Drift & Mutation
Topic: Wright-Fisher Model
Key Points:
Genetic Drift: Random changes in allele frequencies due to sampling error in finite populations. Stronger in small populations.
Wright-Fisher Model:
Assumptions: Constant population size ($2N$ gene copies, i.e. $N$ diploid individuals), non-overlapping generations, random mating.
States: $X_t$ = number of $A$ alleles at time $t$.
Absorbing States: Fixation ($X = 2N$) and Loss ($X = 0$).
Probability of Fixation: The chance that any specific allele will eventually become fixed in the population is equal to its initial frequency.
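The fixation result can be checked by simulation. The sketch below is an illustration, not part of the notes: the function names, the population size ($2N = 20$), the starting count of 5 copies, the number of replicates, and the seed are all assumptions. Each generation is formed by binomial sampling, i.e. $2N$ independent draws with replacement from the current gene pool.

```python
import random


def simulate_until_absorbed(x0, two_n, rng):
    """Run one Wright-Fisher trajectory until X reaches 0 or two_n."""
    x = x0
    while 0 < x < two_n:
        freq = x / two_n
        # Binomial(two_n, freq) draw: each new gene copy is A with prob. freq
        x = sum(1 for _ in range(two_n) if rng.random() < freq)
    return x


def estimate_fixation_probability(x0, two_n, n_reps=2000, seed=1):
    """Fraction of replicate populations in which allele A becomes fixed."""
    rng = random.Random(seed)  # local generator, so runs are reproducible
    fixed = sum(1 for _ in range(n_reps)
                if simulate_until_absorbed(x0, two_n, rng) == two_n)
    return fixed / n_reps


estimate = estimate_fixation_probability(x0=5, two_n=20)
print("simulated fixation probability:", estimate)
print("initial frequency:", 5 / 20)
```

With a few thousand replicates the simulated value should sit close to the initial frequency of 0.25, with the remaining gap due to Monte Carlo error.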
Study Questions:
What is the main difference between genetic drift and natural selection in terms of directionality?
In the Wright-Fisher model, what does it mean for an allele to be in an "absorbing state"?
3. Easy Explanation (Simplified Concepts)
The "Bank Account" Analogy (Hardy-Weinberg)
Imagine a bank account representing a gene.
Alleles ($p$ and $q$): These are the types of coins (Penny and Quarter) in the bank.
Genotype Frequencies ($p^2$, $2pq$, $q^2$): This is how the coins are distributed (pairs of Pennies, mixed pairs, pairs of Quarters).
The Law: If no one deposits or withdraws money (No Evolutionary Forces), the ratio of coins stays exactly the same forever, regardless of how much money is in the bank.
Why do we count moths (Estimation)?
Imagine a population of moths in which 87% are black (the dominant color). You want to know the frequency of the "white" allele (recessive).
Since you can't tell the difference between a heterozygous moth (carrying one white allele) and a homozygous dominant moth (two black alleles), you can't just count alleles directly.
You have to calculate: If 13 out of 100 moths are white, the frequency of the white allele is $\sqrt{0.13} \approx 0.36$.
The EM Algorithm (Iterative Fixing)
Imagine you have a puzzle with missing pieces.
Guess: You guess what the missing pieces look like ($q^{(0)}$).
Check: You see if your guess makes the picture look consistent.
Adjust: You slightly change your guess to make the picture even more consistent.
Repeat: You keep guessing and adjusting until the picture is perfect and doesn't change anymore. This is "Convergence."
Genetic Drift: The Coin Flip
Imagine you have a jar with 10 black marbles and 10 white marbles ($2N = 20$).
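To fill the next generation's jar, you pick a marble at random, note its color, put it back, and add a marble of that color to a new jar. You do this 20 times (Wright-Fisher model).
By chance, the new jar might hold 12 black and 8 white marbles. The proportions have shifted even though nothing favored either color.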
If you keep doing this for generations, eventually, you might end up with a jar of only white marbles (Fixation) or only black marbles (Loss).
This is Genetic Drift: The luck of the draw changes the population, even if the marbles are equally good at surviving.
4. Presentation Structure
Slide 1: Title Slide
Title: Population Genetics (G14TBS Part II)
Lecturer: Dr. Richard Wilkinson
Module Focus: Introduction, Hardy-Weinberg Equilibrium, Estimation, and Genetic Drift.
Slide 2: Course Introduction
Goal: Problem-based learning to understand genetic variation and evolution.
Key Textbooks: Gillespie, Hartl, Ewens, Holsinger.
Methodology: Mathematical derivations + Statistical applications.
Slide 3: A Brief History of Genetics
Classical: Mendel (Segregation, Independent Assortment).
Molecular: Discovery of DNA/RNA/Proteins.
Key Definitions: Gene, Allele, Genotype, Phenotype, Chromosome.
Slide 4: Hardy-Weinberg Law
Concept: Stability of allele frequencies in the absence of forces.
The Equation: $p^2 + 2pq + q^2 = 1$.
Assumptions: Large population, random mating, no mutation/migration/selection.
Significance: The "Null Hypothesis" of population genetics.
Slide 5: Estimating Allele Frequencies (Moths)
Problem: Dominant phenotypes hide recessive genotypes.
Solution: Observe Recessives ($R$), Total ($N$) → $\hat{q} = \sqrt{R/N}$.
Example: Industrial Melanism (87% black moths).
Slide 6: Estimation Statistics (Fisher’s Variance)
Formula: $\mathrm{Var}(\hat{q}) \approx \frac{1}{4N}\left(1 - \frac{R}{N}\right)$.
Purpose: To quantify uncertainty/standard error of our estimate.
Application: Comparing genetic variation between populations.
Slide 7: The EM Algorithm
Scenario: Missing Data ($N_{AA}$, $N_{Aa}$ unknown).
Logic:
Estimate missing counts (E-step) based on current parameter estimate.
Maximize Likelihood (M-step) to update parameter.
Outcome: Converges to the most likely allele frequency.
Slide 8: Testing for HWE
Null Hypothesis ($H_0$): Population is in Hardy-Weinberg Equilibrium.
Statistical Tests:
Likelihood Ratio Test (General).
Pearson’s Chi-Squared (Goodness of fit).
Decision: Reject $H_0$ if the test statistic is too high (indicating evolutionary forces).
Slide 9: Genetic Drift (Wright-Fisher Model)
Definition: Random changes in allele frequencies due to finite population size.
The Model:
Binomial sampling of alleles for the next generation.
Absorbing States: Fixation ($2N$) and Loss ($0$).
Key Result: Probability of fixation = initial frequency.
Slide 10: Summary
HWE provides a baseline to detect evolutionary forces.
Estimation methods (Fisher/EM) handle real-world data limitations.
Drift explains random evolutionary changes in small populations.