Heritability: The Interplay of Genetics and Environment, with a Python Implementation
Heritability in genetics refers to the proportion of variation in a trait, within a population, that is due to genetic differences among individuals. Key findings on the heritability of intelligence include:
- Intelligence is highly heritable, meaning that a large proportion of the variation in intelligence can be attributed to genetic factors.
- The heritability of intelligence increases with age: genetic factors explain a relatively small share of the variation in childhood and a much larger share in the late teens and adulthood.
- Intelligence is a polygenic trait, meaning that it is influenced by many genes, with at least 500 genes involved.
- Recent genome-wide association studies have identified inherited genome sequence differences that account for 20% of the 50% heritability of intelligence.
- The genetic influences on intelligence are ongoing areas of research, and it is likely that intelligence involves many genes that each make only a small contribution to a person's intelligence.
- Environmental factors, such as home environment, parenting, education, availability of learning resources, healthcare, and nutrition, also play a significant role in intelligence.
- Although IQ differences between individuals have been shown to have a large hereditary component, it does not follow that disparities in IQ between groups have a genetic basis.
In short, intelligence is highly heritable and polygenic, but it is also shaped by environmental factors, and the variants identified so far by genome-wide association studies account for only part of its heritability.
Broad-sense and Narrow-sense Heritability
Broad-sense heritability and narrow-sense heritability are two types of heritability that help us understand the contribution of genetic factors to the variation in a trait. Here's a breakdown of these two concepts:
- Broad-sense heritability (H^2): This type of heritability is the proportion of phenotypic variance that can be explained by all genetic differences among individuals. It is the ratio of total genetic variance (VG) to total phenotypic variance (VP). Broad-sense heritability takes into account both additive and non-additive genetic effects, such as dominance and epistasis.
- Narrow-sense heritability (h^2): Narrow-sense heritability is the proportion of phenotypic variation that is due to additive genetic effects only. It is the ratio of additive genetic variance (VA) to total phenotypic variance (VP). Additive genetic effects refer to the contribution of individual genes or alleles that have a linear and independent influence on the trait.
The distinction between broad-sense and narrow-sense heritability is important because:
- Narrow-sense heritability is more relevant in animal and plant selection programs, as response to artificial and natural selection depends on additive genetic variance.
- Resemblance between relatives is mostly driven by additive genetic variance.
- Broad-sense heritability provides a more general understanding of the contribution of genetics to a trait, while narrow-sense heritability focuses specifically on the additive genetic effects.
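As a small worked example of the two ratios, the sketch below computes H^2 and h^2 from a set of variance components. The numbers are made up for illustration, and the decomposition VG = VA + VD + VI with VP = VG + VE assumes no genotype-environment covariance or interaction.

```python
# Hypothetical variance components (arbitrary units); not taken from any study
V_A = 4.0   # additive genetic variance
V_D = 1.0   # dominance variance
V_I = 0.5   # epistatic (interaction) variance
V_E = 4.5   # environmental variance

V_G = V_A + V_D + V_I   # total genetic variance
V_P = V_G + V_E         # total phenotypic variance

H2 = V_G / V_P   # broad-sense heritability: all genetic variance
h2 = V_A / V_P   # narrow-sense heritability: additive variance only

print(f"H^2 = {H2:.2f}, h^2 = {h2:.2f}")  # H^2 = 0.55, h^2 = 0.40
```

Because VA is one component of VG, h^2 can never exceed H^2; the gap between them reflects the non-additive variance.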
Additive and dominant/recessive genetic models
Additive and dominant/recessive genetic models are two ways to describe the genetic basis of traits. Here are the differences between these two models:
Additive genetic model:
- The effect of each allele is independent of the other alleles at the same locus.
- The total effect of all alleles is the sum of their individual effects.
- Narrow-sense heritability is related to the additive genetic model, as it captures the proportion of phenotypic variation that is due to additive genetic values only.
- The additive model estimates the average effect at a single locus by linear regression of phenotypes on allele counts (i.e., genotypes).
- The additive model may give inaccurate estimates of the average effect when dominance is present, as it falsely assumes that the residuals are independent and identically distributed.
Dominant/recessive genetic model:
- The effect of one allele masks the effect of the other allele at the same locus.
- Dominant alleles are expressed in the phenotype even if only one copy is present, while recessive alleles are only expressed if two copies are present.
- The dominant model assumes that having one or more copies of the dominant allele increases risk compared to the recessive allele, while the recessive model assumes that two copies of the recessive allele are required to alter the risk.
- The dominant/recessive model is commonly used in genome-wide association studies to study the association between genetic variants and phenotypes.
- The additive model is underpowered when the true mode of inheritance is recessive.
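To make the models concrete, the sketch below recodes genotypes under additive, dominant, and recessive assumptions and fits the additive model by least-squares regression of phenotype on allele count. The genotypes and phenotypes are simulated with made-up parameters, and the function names are illustrative rather than taken from any library.

```python
import numpy as np

def encode_genotypes(allele_counts, model):
    """Recode counts of the effect allele (0, 1, 2) under a genetic model."""
    g = np.asarray(allele_counts)
    if model == "additive":
        return g.astype(float)          # each copy adds the same amount
    if model == "dominant":
        return (g >= 1).astype(float)   # one copy is enough
    if model == "recessive":
        return (g == 2).astype(float)   # two copies are required
    raise ValueError(f"unknown model: {model}")

def average_effect(allele_counts, phenotype):
    """Slope of the least-squares regression of phenotype on allele count."""
    x = encode_genotypes(allele_counts, "additive")
    y = np.asarray(phenotype, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope)

# Simulated example: allele frequency 0.3, true per-allele effect 0.5 (hypothetical)
rng = np.random.default_rng(0)
counts = rng.binomial(2, 0.3, size=2000)
true_effect = 0.5
phenotype = true_effect * counts + rng.normal(0.0, 1.0, size=counts.size)

print(encode_genotypes([0, 1, 2], "dominant"))   # [0. 1. 1.]
print(encode_genotypes([0, 1, 2], "recessive"))  # [0. 0. 1.]
print(f"estimated average effect: {average_effect(counts, phenotype):.2f}")
```

Swapping the coding passed to the regression is all it takes to test a dominant or recessive hypothesis instead of the additive one.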
Identical by State (IBS) and Identical by Descent (IBD)
Identical by state (IBS) and identical by descent (IBD) are terms used in genetics to describe similarities between nucleotide sequences of DNA. Here are the definitions of these terms:
- Identical by state (IBS): Two alleles or two segments of DNA are identical by state if they have the same nucleotide sequence, regardless of whether they were inherited from a common ancestor or not. IBS on its own says nothing about genealogy: an IBS segment may or may not trace back to a recent common ancestor.
- Identical by descent (IBD): Two alleles or two segments of DNA are identical by descent if they have the same nucleotide sequence and were inherited from a common ancestor without any intervening recombination. IBD segments are genealogically relevant and share a recent common ancestor.
In genetic genealogy, IBS is generally used to describe segments which are not identical by descent and therefore do not share a recent common ancestor. On the other hand, IBD is used to describe a matching segment of DNA shared by two or more people that has been inherited from a common ancestor without any intervening recombination.
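For unphased biallelic SNP genotypes coded as allele counts (0, 1, or 2), the number of alleles two individuals share by state at a marker is 2 - |g1 - g2|. The sketch below applies that to two made-up genotype vectors; the function names are illustrative. It measures IBS only: deciding whether the sharing is also IBD needs further evidence, such as the length of the shared segment.

```python
import numpy as np

def ibs_per_marker(geno1, geno2):
    """Alleles shared identical by state (0, 1, or 2) at each marker."""
    g1 = np.asarray(geno1)
    g2 = np.asarray(geno2)
    return 2 - np.abs(g1 - g2)

def mean_ibs_proportion(geno1, geno2):
    """Fraction of alleles shared by state across all markers."""
    return ibs_per_marker(geno1, geno2).sum() / (2 * len(geno1))

# Made-up genotypes at six markers
person_a = [0, 1, 2, 2, 1, 0]
person_b = [0, 2, 0, 2, 1, 1]

print(ibs_per_marker(person_a, person_b))       # [2 1 0 2 2 1]
print(mean_ibs_proportion(person_a, person_b))  # 8 of 12 alleles shared
```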
Narrow-sense Heritability and Identical by Descent
Identical by descent (IBD) regression in families is a method to estimate the heritability of complex traits using identity-by-descent information. It involves fitting a linear mixed model of genetic relatedness between close and distant relatives to jointly estimate variance components that correspond to heritability explained by genome-wide common genetic variation and variance explained by uncaptured genetic variation, the sum representing total narrow-sense heritability.
The link between the two concepts is as follows. Narrow-sense heritability captures the proportion of phenotypic variance that is due to additive genetic values only. In family data it is estimated by comparing how closely pairs of relatives agree in the phenotype with the correlation expected from their genetic relatedness. IBD-based methods sharpen this comparison by using the IBD sharing actually observed between each pair rather than the value expected from the pedigree.
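One classical way to turn "phenotypic agreement versus relatedness" into a number is Haseman-Elston regression: standardize the phenotype, then regress the products of pairs of phenotypes on the corresponding pairwise relatedness values; the slope estimates h^2. Note that this is a simpler, swapped-in technique, not the linear mixed model described above. The data are simulated with made-up parameters, and a marker-based relatedness matrix stands in for real IBD sharing.

```python
import numpy as np

def haseman_elston(relatedness, phenotype):
    """Slope of phenotype cross-products on pairwise relatedness (off-diagonal pairs)."""
    y = np.asarray(phenotype, dtype=float)
    z = (y - y.mean()) / y.std()
    rows, cols = np.triu_indices(len(z), k=1)   # each pair once, diagonal excluded
    products = z[rows] * z[cols]
    sharing = np.asarray(relatedness, dtype=float)[rows, cols]
    sharing_centered = sharing - sharing.mean()
    return float((sharing_centered @ products) / (sharing_centered @ sharing_centered))

# Simulated data: n individuals, m markers, true h^2 = 0.5 (all values hypothetical)
rng = np.random.default_rng(1)
n, m, true_h2 = 500, 200, 0.5
freqs = rng.uniform(0.1, 0.9, size=m)
genotypes = rng.binomial(2, freqs, size=(n, m)).astype(float)
standardized = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
relatedness = standardized @ standardized.T / m
effects = rng.normal(0.0, np.sqrt(true_h2 / m), size=m)
phenotype = standardized @ effects + rng.normal(0.0, np.sqrt(1 - true_h2), size=n)

h2_hat = haseman_elston(relatedness, phenotype)
print(f"Haseman-Elston estimate of h^2: {h2_hat:.2f}")
```

The estimate is noisy for samples of this size, which is one reason likelihood-based mixed models are usually preferred in practice.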
How is Narrow-sense Heritability Estimated Using Identity-by-Descent Information
Narrow-sense heritability can be estimated from identity-by-descent (IBD) information by measuring the proportion of the genome shared IBD between each pair of individuals in a sample, collecting those proportions in a genetic relationship matrix (GRM), and using the GRM to estimate heritability.
Here is a step-by-step overview of the process:
1. IBD segment identification: The first step is to identify IBD segments, which are regions of the genome that are identical by descent from a common ancestor. This can be done using established IBD estimation methods.
2. Estimation of IBD proportions: Once the IBD segments are identified, the next step is to estimate the proportions of the genome that are shared IBD between all pairs of individuals in the sample. This can be done using SNP data and IBD estimation methods.
3. Construction of the IBD-GRM: The estimated proportions of the genome shared IBD are used to construct a genetic relationship matrix (GRM), where the elements of the matrix represent the estimated proportions of IBD between pairs of individuals.
4. Estimation of narrow-sense heritability: The constructed IBD-GRM is then used to estimate narrow-sense heritability. This can be done by fitting a linear mixed model of genetic relatedness between close and distant relatives to jointly estimate variance components that correspond to heritability explained by genome-wide common genetic variation and variance explained by uncaptured genetic variation, the sum representing total narrow-sense heritability.
By leveraging IBD information and using a GRM to estimate heritability, researchers can potentially estimate the full narrow-sense heritability without the confounding effects shared within families that can bias estimates when close relatives are used, and without the downward bias in estimation when causal variants are rare or poorly tagged by SNPs.
Methods for Detecting Identity-by-Descent Segments in Genetic Data
There are several methods for detecting identity-by-descent (IBD) segments in genetic data. Here are some common methods:
1. Haplotype-based methods: Haplotype-based methods use phased genotype data to identify IBD segments by comparing haplotypes between individuals. These methods can be computationally intensive but can provide high accuracy in detecting IBD segments.
2. Genotype-based methods: Genotype-based methods use unphased genotype data to identify IBD segments by comparing allele sharing between individuals. These methods are less computationally intensive than haplotype-based methods but may be less accurate in detecting IBD segments.
3. PBLAS method: The PBLAS (positional Burrows-Wheeler transform-based linkage analysis) method is a fast and simple method for detecting IBD segments in large-scale data. This method combines a compressed representation of genotype data, the positional Burrows-Wheeler transform, and multi-threaded execution to produce very fast analysis times.
4. Identity-by-state (IBS) methods: IBS methods use genotype data to identify segments that are identical by state, which may or may not be identical by descent. These methods are less powerful than IBD methods but can be useful in certain situations, such as when IBD information is not available.
5. Admixture-based methods: Admixture-based methods use IBD information to identify genetic variants associated with disease risk in admixed populations. These methods can be used to reveal evidence of differential disease risk by genetic ancestry.
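As a rough illustration of the genotype-based idea, the sketch below scans two unphased genotype vectors for long runs of markers with no opposite homozygotes (0 versus 2). Two individuals who share at least one allele IBD across a region cannot be opposite homozygotes there, barring genotyping error, so such runs are candidates for IBD. This is a deliberately simplified heuristic with hypothetical names and thresholds, not the algorithm of any particular tool.

```python
import numpy as np

def candidate_ibd_runs(geno1, geno2, min_markers):
    """Return (start, end) runs, end exclusive, that contain no opposite homozygotes."""
    g1 = np.asarray(geno1)
    g2 = np.asarray(geno2)
    compatible = np.abs(g1 - g2) < 2   # False only for 0 versus 2
    # Pad with False so runs touching either end are closed properly
    padded = np.concatenate(([False], compatible, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.where(edges == 1)[0]   # first marker of each compatible run
    ends = np.where(edges == -1)[0]    # one past the last marker of each run
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_markers]

# Made-up genotypes at ten markers
geno_a = [0, 1, 2, 2, 1, 0, 0, 2, 1, 1]
geno_b = [2, 1, 1, 2, 0, 0, 1, 0, 1, 2]

print(candidate_ibd_runs(geno_a, geno_b, min_markers=4))  # [(1, 7)]
```

Real tools measure segment length in centimorgans rather than marker counts and allow for genotyping errors, both of which this sketch ignores.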
Advantages and Limitations of Using Identity-by-Descent Information to Estimate Narrow-sense Heritability
Using identity-by-descent (IBD) information to estimate narrow-sense heritability has several advantages over traditional methods based on family or twin studies. Here are some of the advantages:
- Unbiased estimates: IBD-based approaches can produce unbiased estimates of narrow-sense heritability, even when causal variants are rare or poorly tagged by SNPs.
- Less confounding effects: IBD-based approaches can potentially estimate the full narrow-sense heritability without the confounding effects shared within families that can bias estimates when close relatives are used.
- Leverages large datasets: IBD-based approaches can leverage large genetic datasets to infer the contribution of very rare variants lost using other methods.
- Flexible: IBD-based approaches can be applied to a wide range of complex traits and populations.
The approach also has limitations:
- Population stratification: Population stratification can lead to strong biases in IBD-based heritability estimates.
- Environmental effects: Non-genetic familial environmental effects shared across generations can also lead to biases in IBD-based heritability estimates.
- Low precision: The precision of IBD-based heritability estimates can be low, especially when causal variants are rare.
Using IBD information to estimate narrow-sense heritability has several advantages, including unbiased estimates, less confounding effects, and the ability to leverage large datasets. However, there are also some limitations and challenges associated with this approach, such as population stratification, environmental effects, and low precision.
Multiple Sclerosis and Heritability
Identity-by-descent (IBD) information can be used to estimate narrow-sense heritability of complex traits, including multiple sclerosis (MS). Here are some examples of how IBD information has been used to study MS:
- IBD mapping to detect rare variants conferring susceptibility to MS: IBD mapping has been used to detect rare variants in genome-wide association study (GWAS) datasets that may contribute to the heritability of MS. By identifying segments shared IBD using the PBLA approach, researchers can narrow down the region of interest for sequencing priority and potentially identify rare variants that contribute to MS susceptibility.
- IBD mapping in a Scandinavian MS cohort: IBD mapping has been used to map chromosomal regions carrying rare gene variants contributing to the risk of MS in a Scandinavian MS cohort. By identifying segments shared IBD, researchers were able to identify potential regions of the genome that may harbor rare variants contributing to MS susceptibility.
- Admixture mapping to reveal differential MS risk by genetic ancestry: Admixture mapping, a method that leverages IBD information to identify genetic variants associated with disease risk in admixed populations, has been used to reveal evidence of differential MS risk by genetic ancestry. By characterizing the ancestry of MS-associated alleles using RFMix, researchers were able to estimate their local ancestry and identify potential genetic risk factors for MS in different ancestral populations.
In conclusion, IBD information can be used to estimate narrow-sense heritability of complex traits, including MS. By leveraging IBD information, researchers can potentially identify rare variants that contribute to MS susceptibility and identify potential genetic risk factors for MS in different ancestral populations.
Python Implementation of IBD Regression for Narrow-sense Heritability
This section translates the process into Python using built-in functions, NumPy, and SciPy. It assumes that phased haplotypes are available as NumPy arrays with one allele code per marker.
Here's how each step could be approached programmatically:
1. IBD Segment Identification:
- The find_ibd_segments function compares two haplotypes marker by marker and keeps contiguous runs of matching alleles that meet a minimum length criterion. Strictly speaking, such runs are IBS; treating long runs as IBD is a simplification that the length threshold makes more plausible.
- In a practical setting, find_ibd_segments would be applied to every pair of haplotypes in a dataset containing multiple individuals.
2. Estimation of IBD Proportions:
- Take the IBD segments for each pair and calculate the proportion of the genome shared. This involves summing the lengths of the IBD segments and dividing by the total length of the genome considered.
3. Construction of the IBD-GRM:
- Using the IBD proportions, construct a matrix where each element (i, j) is the proportion of the genome shared IBD between individuals i and j. This matrix is symmetric and can be constructed efficiently using NumPy arrays.
4. Estimation of Narrow-Sense Heritability:
- Fit a linear mixed model using the IBD-GRM as the covariance structure of a random effect to partition variance components. This is typically done using specialized statistical software packages that can handle mixed models, like GCTA, but a simple version can also be implemented in Python using optimization libraries like scipy.optimize.
import numpy as np
from scipy.optimize import minimize
# Step 1: find runs of matching alleles between two haplotypes
def find_ibd_segments(hap1, hap2, min_match_length):
    """Return (start, end) index pairs, end exclusive, for runs of matching alleles."""
    matches = np.asarray(hap1) == np.asarray(hap2)
    # Pad with False so runs touching either end of the haplotype are detected
    padded = np.concatenate(([False], matches, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.where(edges == 1)[0]   # first index of each matching run
    ends = np.where(edges == -1)[0]    # one past the last index of each run
    ibd_segments = [(int(start), int(end))
                    for start, end in zip(starts, ends)
                    if end - start >= min_match_length]
    return ibd_segments
# Step 2: compute the IBD proportion for every pair of haplotypes
def process_all_pairs(all_haplotypes, min_match_length):
    num_individuals = len(all_haplotypes)
    ibd_proportions = np.zeros((num_individuals, num_individuals))
    for i in range(num_individuals):
        for j in range(i + 1, num_individuals):  # To avoid duplicate comparisons
            hap1 = all_haplotypes[i]
            hap2 = all_haplotypes[j]
            ibd_segments = find_ibd_segments(hap1, hap2, min_match_length)
            # Sum the lengths of all IBD segments for the pair
            total_ibd_length = sum(end - start for start, end in ibd_segments)
            # Calculate the proportion of the genome that is IBD
            ibd_proportion = total_ibd_length / len(hap1)
            ibd_proportions[i, j] = ibd_proportions[j, i] = ibd_proportion
    return ibd_proportions
This process_all_pairs function iterates over all pairs of individuals, computes the IBD segments using find_ibd_segments, and then calculates the IBD proportions. These proportions are then used to construct the GRM, which in turn feeds the linear mixed model that estimates heritability.
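# Step 3: Construction of the IBD-GRM
def construct_ibd_grm(ibd_proportions, num_individuals):
    # ibd_proportions is the symmetric matrix returned by process_all_pairs
    grm = np.array(ibd_proportions, dtype=float)
    if grm.shape != (num_individuals, num_individuals):
        raise ValueError("ibd_proportions must be num_individuals x num_individuals")
    # Assumption: each haplotype is fully IBD with itself, so the diagonal is 1
    np.fill_diagonal(grm, 1.0)
    return grm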
Estimating narrow-sense heritability with a linear mixed model can be quite complex, especially when it comes to defining the likelihood function for optimization. The function below is a conceptual framework for a simple two-component model; for real analyses, dedicated software such as GCTA or a mixed-model library such as limix is a better fit.
The estimate_heritability function defines an inner mixed_model_likelihood function and minimizes it with scipy.optimize.minimize. It is a simplified sketch, not a complete implementation, as a real one would be much more complex:
# Step 4: Estimation of narrow-sense heritability
def estimate_heritability(grm, phenotype_data):
    # Accept lists as well as arrays, and center the phenotype (intercept-only model)
    y = np.asarray(phenotype_data, dtype=float)
    y = y - y.mean()
    grm = np.asarray(grm, dtype=float)
    # Define the number of individuals
    n = y.shape[0]
    # Define the mixed model likelihood function
    def mixed_model_likelihood(parameters):
        # Extract variance components from parameters
        sigma_g2, sigma_e2 = parameters
        # Phenotypic covariance: genetic part scaled by the GRM plus independent error
        covariance_matrix = sigma_g2 * grm + sigma_e2 * np.eye(n)
        # slogdet is numerically safer than log(det(...)) for larger matrices
        sign, log_det = np.linalg.slogdet(covariance_matrix)
        if sign <= 0:
            return 1e12  # Return a large number if the matrix is not positive definite
        try:
            quadratic_form = y @ np.linalg.solve(covariance_matrix, y)
        except np.linalg.LinAlgError:
            return 1e12  # Return a large number in case of a singular matrix
        # Calculate the log-likelihood
        log_likelihood = -0.5 * (n * np.log(2 * np.pi) + log_det + quadratic_form)
        # Return the negative log-likelihood for minimization
        return -log_likelihood
    # Initial guess: half of the phenotypic variance for genetic and half for error
    initial_guess = [np.var(y) * 0.5] * 2
    # Bounds keep the genetic variance non-negative and the error variance strictly positive
    bounds = [(0, None), (1e-8, None)]
    # Minimize the negative log-likelihood
    result = minimize(mixed_model_likelihood, initial_guess, bounds=bounds, method="L-BFGS-B")
    # The proportion of variance explained by the genetic variance component is the heritability
    sigma_g2, sigma_e2 = result.x
    heritability_estimate = sigma_g2 / (sigma_g2 + sigma_e2)
    return heritability_estimate
# You would call this function with the GRM and the phenotype data:
# heritability_estimate = estimate_heritability(grm, phenotype_data)
# Example usage:
# Each haplotype is a NumPy array of allele codes, one entry per marker.
# For demonstration purposes, these are arbitrary alleles at 10 loci
# in a hypothetical dataset of haplotypes for 4 individuals.
haplotypes_dataset = [
    np.array([0, 1, 1, 0, 2, 2, 1, 0, 1, 2]),
    np.array([0, 1, 0, 0, 2, 2, 1, 1, 1, 2]),
    np.array([1, 1, 1, 0, 2, 1, 0, 0, 1, 2]),
    np.array([0, 1, 1, 1, 2, 0, 1, 0, 1, 1])
]
# Define the minimum matching length for an IBD segment
min_matching_length = 3  # This is an arbitrary threshold for this example
# Process all pairs to get the IBD proportions matrix
ibd_proportions_matrix = process_all_pairs(haplotypes_dataset, min_matching_length)
num_individuals = len(haplotypes_dataset)  # Number of individuals in the study
grm = construct_ibd_grm(ibd_proportions_matrix, num_individuals)
phenotype_data = np.array([0.0, 0.0, 1.0, 1.0])  # Load or define your phenotype data here
heritability_estimate = estimate_heritability(grm, phenotype_data)
# With only 4 individuals the estimate is not meaningful; this only shows how the pieces fit together
print(f"Estimated narrow-sense heritability: {heritability_estimate:.3f}")
This pseudocode outlines how to structure the likelihood function for a simple variance components model. The mixed_model_likelihood function computes the negative log-likelihood of the mixed model given the genetic variance (sigma_g2) and the error variance (sigma_e2). It uses the genetic relationship matrix (GRM) to construct the covariance matrix for the random genetic effects.
In reality, to run this model, you'd likely need to use a specialized library or software that is capable of handling the optimization of such complex likelihood functions, especially when working with large datasets typical in genetic studies.
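One common way dedicated tools keep this tractable is to eigendecompose the GRM once, so that each likelihood evaluation needs no matrix inversion. The sketch below applies that idea to the same two-component model, profiling out the total variance and optimizing over h^2 directly. It is an illustration under simplifying assumptions (mean-centered phenotype, no covariates, maximum likelihood rather than REML); the function name estimate_h2_spectral and the simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def estimate_h2_spectral(grm, phenotype):
    """Maximum-likelihood h^2 for y ~ N(0, sigma_p2 * (h2 * GRM + (1 - h2) * I))."""
    y = np.asarray(phenotype, dtype=float)
    y = y - y.mean()
    n = y.size
    # One-off eigendecomposition of the symmetric GRM
    eigenvalues, eigenvectors = np.linalg.eigh(np.asarray(grm, dtype=float))
    rotated = eigenvectors.T @ y   # phenotype expressed in the eigenbasis

    def negative_log_likelihood(h2):
        # Variance along each eigenvector, up to the total variance sigma_p2
        d = h2 * eigenvalues + (1.0 - h2)
        if np.any(d <= 0):
            return 1e12   # outside the region where the covariance is valid
        sigma_p2 = np.mean(rotated**2 / d)   # closed-form maximizer for a given h2
        return 0.5 * (n * np.log(2 * np.pi * sigma_p2) + np.sum(np.log(d)) + n)

    result = minimize_scalar(negative_log_likelihood, bounds=(0.0, 0.999), method="bounded")
    return float(result.x)

# Simulated check with a marker-based relatedness matrix (all parameters hypothetical)
rng = np.random.default_rng(2)
n, m, true_h2 = 400, 200, 0.5
genotypes = rng.binomial(2, 0.5, size=(n, m)).astype(float)
standardized = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
grm = standardized @ standardized.T / m
effects = rng.normal(0.0, np.sqrt(true_h2 / m), size=m)
phenotype = standardized @ effects + rng.normal(0.0, np.sqrt(1 - true_h2), size=n)

h2_spectral = estimate_h2_spectral(grm, phenotype)
print(f"Spectral ML estimate of h^2: {h2_spectral:.2f}")
```

After the decomposition, each likelihood evaluation costs time linear in the number of individuals, which is what makes repeated optimization feasible on larger samples.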