
Exploring the Causal Pathways with Python: Implementing Mendelian Randomization Methods in Research


Introduction:


Mendelian Randomization (MR) is a research method that uses genetic variation to investigate causal relationships between modifiable risk factors and diseases. It is based on the principles of Mendelian inheritance, which state that each person randomly inherits one of the two versions of every gene from their parents, and that different genetic variants are inherited independently. Because these genetic differences are assigned at random and fixed at conception, they mimic the random assignment of behaviors, environments, or other exposures in Randomized Controlled Trials (RCTs), allowing researchers to look for links between genetic differences and health outcomes that are largely free from confounding by other factors.

Use Cases of Mendelian Randomization in Genetics


Mendelian Randomization has been used in various fields to study the causal effects of modifiable risk factors on diseases. Some of its use cases include:
--Studying the effects of toxins: Mendelian Randomization studies can look at genetic differences that affect how people's bodies break down a toxin that is thought to be linked to a disease, providing evidence about the causal influence of the toxin on the disease.
--Assessing the impact of lifestyle factors: Researchers can use Mendelian Randomization to investigate the causal effects of lifestyle factors such as smoking, alcohol consumption, and physical activity on various health outcomes.
--Evaluating the efficacy of drugs: Mendelian Randomization can be used to estimate the effects of drug targets on disease outcomes, providing evidence for the potential efficacy of new drugs.
--Investigating the relationship between biomarkers and diseases: Researchers can use Mendelian Randomization to examine the causal effects of biomarkers on diseases, helping to identify potential therapeutic targets.

Interpreting Mendelian Randomization Studies


Interpreting Mendelian Randomization studies can be challenging, and there are two key aspects to consider:
--Evaluating the plausibility of the underlying assumptions: Mendelian Randomization relies on three core assumptions: relevance (the genetic variant is robustly associated with the risk factor), independence (the variant is not associated with confounders of the risk factor-outcome relationship), and the exclusion restriction (the variant affects the outcome only through the risk factor, i.e., there is no horizontal pleiotropy, where a variant influences the outcome through a separate pathway). Assessing the plausibility of these assumptions is crucial for interpreting the results of Mendelian Randomization studies.
--Understanding the results: Mendelian Randomization studies provide estimates of the causal effects of risk factors on diseases. These estimates are often presented as the change in the outcome per unit change in the risk factor (e.g., the change in the odds of disease per one-unit increase in body mass index). Understanding the magnitude and uncertainty of these estimates is essential for interpreting the results.
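To make the second point concrete, a log-odds causal estimate can be converted into an odds ratio with a confidence interval. A minimal sketch with illustrative numbers (the estimate and standard error below are made up, not from any real study):

```python
import math

def log_odds_to_or(beta, se, z=1.96):
    """Convert a log-odds causal estimate and its standard error
    into an odds ratio with a 95% confidence interval."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# Hypothetical MR estimate: 0.25 increase in log-odds of disease
# per one-unit increase in BMI, with standard error 0.10
or_point, or_low, or_high = log_odds_to_or(0.25, 0.10)
print(f"OR = {or_point:.2f} (95% CI {or_low:.2f}-{or_high:.2f})")
# → OR = 1.28 (95% CI 1.06-1.56)
```

A confidence interval that excludes 1 indicates an effect distinguishable from the null at that significance level.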
Mendelian Randomization (MR) has been used in various public health studies to investigate the causal effects of modifiable risk factors on diseases. Some examples of these studies include:
--Studying the effects of smoking on lung cancer: A Mendelian Randomization study examined the causal relationship between smoking and lung cancer by using genetic variants associated with smoking behavior as instrumental variables. The study found strong evidence supporting a causal link between smoking and lung cancer.
--Investigating the impact of alcohol consumption on cardiovascular health: Researchers used Mendelian Randomization to estimate the causal effects of alcohol consumption on various cardiovascular outcomes, such as blood pressure, heart rate, and risk of coronary heart disease. The study found that moderate alcohol consumption was not associated with a lower risk of cardiovascular disease, challenging the notion of a protective effect of alcohol on heart health.
--Assessing the relationship between obesity and type 2 diabetes: A Mendelian Randomization study examined the causal relationship between obesity and type 2 diabetes by using genetic variants associated with body mass index (BMI) as instrumental variables. The study found strong evidence supporting a causal link between obesity and an increased risk of type 2 diabetes.
--Evaluating the impact of physical activity on mental health: Researchers used Mendelian Randomization to investigate the causal effects of physical activity on various mental health outcomes, such as depression, anxiety, and schizophrenia. The study found that higher levels of physical activity were causally associated with a lower risk of depression.
These examples demonstrate how Mendelian Randomization can provide valuable insights into the causal relationships between modifiable risk factors and diseases in public health research. However, interpreting Mendelian Randomization studies can be challenging, and researchers need to carefully consider the underlying assumptions and limitations of the method.

Mendelian Randomization Studies in Multiple Sclerosis


Mendelian randomization (MR) studies have been conducted to examine the association between possible risk factors and multiple sclerosis (MS). MR is an analytical approach that uses genetic variants as instrumental variables to strengthen the causal inference of an exposure-outcome association by minimizing confounding and reverse causality. Here are some key findings from MR studies on MS:
--Low vitamin D levels and high adult body mass index (BMI) are among the most consistently supported risk factors for MS.
--Peripheral leucocyte and lymphocyte counts have been found to be positively associated with the risk for MS.
--Genetically low 25-hydroxyvitamin D (25OHD) levels are associated with a high susceptibility to MS.
--Proteins FCRL3, TYMP, and AHSG in plasma have a protective effect on MS, while MMEL1 in cerebrospinal fluid (CSF) increases the risk of MS.
These findings provide insights into the potential causal effects of various genetic and non-genetic risk factors on MS development. MR analysis allows for more reliable and accurate causal conclusions compared to observational studies, which are prone to confounding and reverse causation.

Limitations of Mendelian Randomization Studies


Mendelian Randomization (MR) studies have several limitations that researchers need to consider when interpreting their results. These limitations include:
--Assumptions: MR relies on several assumptions, such as the genetic variant being strongly associated with the exposure, the genetic variant not being associated with confounders, and the genetic variant only influencing the outcome through the exposure of interest. These assumptions are not always met, and violations can lead to biased estimates.
--Statistical power: MR studies may have limited statistical power, especially when using weak instrumental variables or investigating rare outcomes. This can result in imprecise estimates and difficulties in detecting causal effects.
--Instrumental variable validity: The genetic variants used as instrumental variables should be valid and specifically associated with the exposure of interest. However, genetic variants may also be associated with other traits, leading to potential pleiotropy and bias in the estimates.
--Linkage disequilibrium and genetic heterogeneity: The presence of linkage disequilibrium, where genetic variants are inherited together, and genetic heterogeneity, where different genetic variants influence the exposure or outcome, can complicate MR analysis and affect the validity of the results.
--Availability of genetic variants: MR can only examine associations for which there are functional genetic variants or markers linked to the exposure of interest. If suitable genetic variants are not available, MR may not be applicable.
--Single measurement of exposure: The assumptions of MR are more likely to hold when there is a complete and accurate characterization of the exposure. However, if the exposure is measured only once or with error, the validity of the results may be compromised.
Despite these limitations, MR studies can still provide valuable insights into causal relationships between exposures and outcomes, especially when other study designs, such as randomized controlled trials, are not feasible or ethical. Researchers should carefully consider the assumptions and limitations of MR and complement the findings with other evidence to strengthen the causal inference.
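One practical check on the statistical-power and weak-instrument limitations above is the per-variant F-statistic, commonly approximated from summary statistics as (beta/se)²; a mean F below the conventional rule-of-thumb threshold of 10 signals weak instruments. A minimal sketch with illustrative numbers:

```python
import numpy as np

def instrument_strength(beta_exposure, se_exposure, threshold=10):
    """Approximate per-SNP F-statistics as (beta/se)^2 and flag
    whether the mean F clears the weak-instrument threshold."""
    f_stats = (np.asarray(beta_exposure) / np.asarray(se_exposure)) ** 2
    return f_stats, f_stats.mean() >= threshold

# Illustrative SNP-exposure associations
beta = np.array([0.10, 0.20, 0.30])
se = np.array([0.05, 0.06, 0.07])
f_stats, strong = instrument_strength(beta, se)
print("F per SNP:", np.round(f_stats, 1), "mean F OK:", strong)
```

This is only a screening heuristic; a full analysis would also consider the sample size and variance explained by the instruments.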

Pleiotropy in Mendelian Randomization Studies


Pleiotropy, the phenomenon of a single genetic variant influencing multiple traits, is a significant limitation in Mendelian Randomization (MR) studies. Pleiotropy can lead to biased estimates and affect the validity of causal inferences. Here are some key points about the pleiotropic limitations of MR studies:
--Unprovable assumption: MR relies on the assumption that apparent pleiotropic associations are mediated by the exposure of interest (vertical pleiotropy) and not due to genetic variants influencing the two traits through independent pathways (horizontal pleiotropy). However, this assumption is unprovable, and the presence of horizontal pleiotropy can lead to incorrect causal inferences.
--Challenging to manage: Assessing and managing pleiotropy is a critical task in MR analysis. Researchers have developed various methods to address pleiotropy, such as MR-Egger, weighted median, and MR-PRESSO. These methods aim to detect and correct for pleiotropic effects, but their effectiveness may vary depending on the specific study context.
--Heterogeneous pleiotropy: Pleiotropic effects can be heterogeneous, meaning that different genetic variants may have different pleiotropic effects on the exposure and outcome. This can complicate the interpretation of MR results and make it challenging to identify the true causal relationship.
--Alternative approaches: In some cases, researchers have explored the use of horizontal pleiotropy to search for causal pathways within the MR framework. This approach, known as "Mendelian randomization with horizontal pleiotropy," aims to leverage the pleiotropic effects of genetic variants to gain insights into the underlying biology.
--Sensitivity analysis: To address the potential impact of pleiotropy on their results, researchers can perform sensitivity analyses, such as leave-one-out analysis or MR-PRESSO. These analyses help assess the robustness of the findings to the presence of pleiotropy and provide insights into the potential bias introduced by pleiotropic effects.
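To illustrate the first of the methods mentioned above, MR-Egger fits a weighted regression of SNP-outcome effects on SNP-exposure effects with an unconstrained intercept; an intercept that differs from zero suggests directional horizontal pleiotropy. A minimal sketch using plain numpy (illustrative inputs; a production analysis would also report standard errors for both coefficients):

```python
import numpy as np

def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Weighted MR-Egger regression: the slope is the causal estimate,
    the intercept indicates directional pleiotropy."""
    w = 1 / np.asarray(se_outcome, dtype=float) ** 2
    X = np.column_stack([np.ones(len(beta_exposure)), beta_exposure])
    # Weighted least squares via the normal equations
    WX = X * w[:, None]
    coef = np.linalg.solve(X.T @ WX, WX.T @ np.asarray(beta_outcome, dtype=float))
    intercept, slope = coef
    return intercept, slope

beta_exp = np.array([0.1, 0.2, 0.3, 0.4])
beta_out = np.array([0.15, 0.32, 0.44, 0.61])
se_out = np.array([0.05, 0.05, 0.05, 0.05])
intercept, slope = mr_egger(beta_exp, beta_out, se_out)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
# intercept≈0.005, slope≈1.500 for these made-up inputs
```

An intercept near zero, as here, is consistent with no directional pleiotropy; the slope is then interpreted as the pleiotropy-adjusted causal estimate.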

Python Implementation of Mendelian Randomization Methods


1. Two-Sample MR


  
    import numpy as np
    import scipy.stats as stats
    
    def two_sample_mr(beta_exposure, se_exposure, beta_outcome, se_outcome):
        """
        Two-sample Mendelian Randomization using the inverse-variance weighted (IVW) method.
        
        :param beta_exposure: Effect size estimates for the exposure
        :param se_exposure: Standard errors for the exposure effect sizes
        :param beta_outcome: Effect size estimates for the outcome
        :param se_outcome: Standard errors for the outcome effect sizes
        
        :return: MR estimate and its standard error
        """
        # Standard IVW weights are the inverse variances of the outcome effects
        weights = 1 / (se_outcome ** 2)
        mr_estimate = np.sum(weights * beta_outcome * beta_exposure) / np.sum(weights * beta_exposure ** 2)
        mr_se = np.sqrt(1 / np.sum(weights * beta_exposure ** 2))
        return mr_estimate, mr_se
    
    # Example usage:
    beta_exposure = np.array([0.1, 0.2, 0.3])
    se_exposure = np.array([0.05, 0.06, 0.07])
    beta_outcome = np.array([0.4, 0.5, 0.6])
    se_outcome = np.array([0.08, 0.09, 0.1])
    
    mr_est, mr_se = two_sample_mr(beta_exposure, se_exposure, beta_outcome, se_outcome)
    print("MR Estimate:", mr_est, "SE:", mr_se)
  
  

2. Median-Based MR


    
    def median_based_mr(beta_exposure, beta_outcome):
        """
        Median-based Mendelian Randomization.
        
        :param beta_exposure: Effect size estimates for the exposure
        :param beta_outcome: Effect size estimates for the outcome
        
        :return: MR estimate
        """
        mr_estimates = beta_outcome / beta_exposure
        median_mr_estimate = np.median(mr_estimates)
        return median_mr_estimate
    
    # Example usage:
    mr_est = median_based_mr(beta_exposure, beta_outcome)
    print("MR Estimate:", mr_est)
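A more robust variant of the simple median above is the weighted median estimator, which weights each SNP's ratio estimate by a precision term and remains consistent when up to half of the weight comes from invalid instruments. A minimal sketch (the interpolation scheme below is one common choice, not the only one):

```python
import numpy as np

def weighted_median_mr(beta_exposure, beta_outcome, se_outcome):
    """Weighted median MR: the 50th percentile of the per-SNP ratio
    estimates, weighted by the inverse variance of each ratio."""
    ratios = np.asarray(beta_outcome) / np.asarray(beta_exposure)
    weights = (np.asarray(beta_exposure) / np.asarray(se_outcome)) ** 2
    order = np.argsort(ratios)
    ratios, weights = ratios[order], weights[order]
    # Cumulative weight fractions, centred on each interval
    cum = (np.cumsum(weights) - 0.5 * weights) / weights.sum()
    return np.interp(0.5, cum, ratios)

# Reusing the illustrative arrays from the examples above
beta_exp = np.array([0.1, 0.2, 0.3])
beta_out = np.array([0.4, 0.5, 0.6])
se_out = np.array([0.08, 0.09, 0.1])
print("Weighted median MR estimate:", weighted_median_mr(beta_exp, beta_out, se_out))
```

The standard error of this estimator is usually obtained by bootstrapping, which is omitted here for brevity.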
    
    

MR-PRESSO:


MR-PRESSO (Mendelian Randomization Pleiotropy RESidual Sum and Outlier) is designed to detect and correct for horizontal pleiotropy. Implementing MR-PRESSO from scratch is complex, but I'll provide a basic idea of how outlier detection might be done.

      
        import numpy as np
        from numpy.linalg import eig, inv
        
        # Define the matrix power function using eigen decomposition
        def matrix_power_eig(x, n):
            values, vectors = eig(x)
            # General (non-symmetric) case: invert the eigenvector matrix;
            # vectors.T is only correct when x is symmetric
            return vectors @ np.diag(values**n) @ inv(vectors)
        
        # Define the getRSS_LOO function
        def getRSS_LOO(BetaOutcome, BetaExposure, data, returnIV):
            dataW = data[[BetaOutcome] + BetaExposure].multiply(np.sqrt(data["Weights"]), axis="index")
            X = dataW[BetaExposure].values
            Y = dataW[BetaOutcome].values.reshape(-1, 1)
            
            CausalEstimate_LOO = np.array([
                inv(X[:i].T @ X[:i] + X[i+1:].T @ X[i+1:]) @ (X[:i].T @ Y[:i] + X[i+1:].T @ Y[i+1:])
                for i in range(len(dataW))
            ]).squeeze()
            
            if len(BetaExposure) == 1:
                RSS = np.sum((Y.flatten() - CausalEstimate_LOO * X.flatten())**2)
            else:
                # CausalEstimate_LOO and X are both (n, k); elementwise product
                # summed over exposures gives the per-SNP prediction
                RSS = np.sum((Y.flatten() - np.sum(CausalEstimate_LOO * X, axis=1))**2)
            
            if returnIV:
                return RSS, CausalEstimate_LOO
            return RSS
        
        # Example of usage:
        # BetaOutcome = 'outcome_variable_name'
        # BetaExposure = ['exposure_variable_1', 'exposure_variable_2', ...]
        # data = pd.DataFrame(data) # Assuming data is already a pandas DataFrame with appropriate columns and weights
        # returnIV = True or False depending on whether you want the IVs returned
        
        # Call the function
        # rss_loo = getRSS_LOO(BetaOutcome, BetaExposure, data, returnIV)
      
      

- The data parameter is expected to be a pandas DataFrame containing the outcome, exposure, and weight columns.
- The BetaOutcome parameter should be the name of the outcome variable column as a string.
- The BetaExposure parameter should be a list of the names of the exposure variable columns.
- When returnIV is set to True, the function returns a tuple whose first element is the RSS and whose second element is the array of LOO causal estimates.
- Ensure that your input data does not contain any missing (NaN) values, as this code does not handle missing data. If your data might have missing values, add preprocessing steps to handle them appropriately.
Keep in mind that the above code does not implement any optimizations for performance, and the LOO process can be quite slow for large datasets since it's using a for loop in Python. Depending on your dataset size, you might want to consider more efficient approaches or parallelization.
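For the single-exposure case, the leave-one-out estimates above can be vectorized: removing point i just subtracts that point's contribution from the global weighted sums. A sketch of this idea for a no-intercept regression (assumes the weighted arrays have already been formed, as in dataW above):

```python
import numpy as np

def loo_slopes_vectorized(x, y):
    """Leave-one-out slopes for a no-intercept least-squares fit:
    slope_i = (sum(x*y) - x_i*y_i) / (sum(x^2) - x_i^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxy = np.sum(x * y)
    sxx = np.sum(x * x)
    return (sxy - x * y) / (sxx - x * x)

# Illustrative (already weighted) exposure and outcome arrays
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])
print(loo_slopes_vectorized(x, y))
```

This replaces the O(n) Python loop of matrix inversions with a few array operations, which matters when the number of instruments is large.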

    
    import numpy as np
    import pandas as pd
    from statsmodels.api import WLS
    from statsmodels.tools import add_constant

    def get_random_data(BetaOutcome, BetaExposure, SdOutcome, SdExposure, data):
        def mod_IVW(i, data, BetaOutcome, BetaExposure):
            X = add_constant(data.drop(i)[BetaExposure])
            Y = data.drop(i)[BetaOutcome]
            weights = data.drop(i)['Weights']
            # Use WLS here: statsmodels' OLS does not accept a weights argument
            model = WLS(Y, X, weights=weights)
            return model.fit()
    
        # Fit the models excluding each data point (LOO - Leave-One-Out approach)
        mod_IVW_list = [mod_IVW(i, data, BetaOutcome, BetaExposure) for i in data.index]
    
        # Generate random data
        random_exposures = np.random.normal(data[BetaExposure].values, SdExposure, data[BetaExposure].shape)
        random_outcomes = np.array([
            # np.insert prepends 1.0 for the intercept term before predicting
            np.random.normal(mod.predict(np.insert(row, 0, 1.0)).item(), SdOutcome[i])
            for i, (mod, row) in enumerate(zip(mod_IVW_list, random_exposures))
        ])
        # Combine the random exposures, outcomes, and weights into a DataFrame
        random_data_combined = np.hstack([random_exposures, random_outcomes[:, np.newaxis], data['Weights'].values[:, np.newaxis]])
        columns = BetaExposure + [BetaOutcome, 'Weights']
        dataRandom = pd.DataFrame(random_data_combined, columns=columns)
        
        return dataRandom
    
    

- Each mod_IVW in mod_IVW_list is fitted without the ith data point (LOO - Leave-One-Out approach).
- np.random.normal is used to create random exposures based on the original exposures and their standard deviations, and random outcomes based on the model's predictions and the outcome's standard deviation.
- I've assumed SdExposure is an array-like object with the same length as BetaExposure, providing standard deviations for each exposure variable.
- For each random outcome, mod.predict is used with the newly generated exposures (with a constant added for the intercept) to predict the outcome values, to which noise is then added based on the outcome's standard deviation.
- The random outcomes are combined with the random exposures and the weights into a new DataFrame dataRandom.
Make sure that **`SdOutcome`** and **`SdExposure`** are specified correctly and correspond to the correct standard deviations for the data being simulated. Also, the **`BetaExposure`** should match the column names of the exposure variables in your data.

    
        import numpy as np
        import pandas as pd

        # Assuming 'data' is a pandas DataFrame and the other variables are lists or strings specifying column names.
        
        # 0 - Transforming the data + checking number of observations
        data = data[[BetaOutcome] + BetaExposure + [SdOutcome] + SdExposure].dropna()
        
        # Ensuring the sign of BetaExposure's first variable is positive for all rows
        sign_correction = np.sign(data[BetaExposure[0]])
        for col in [BetaOutcome] + BetaExposure:
            data[col] *= sign_correction
        
        # Creating a new column for weights
        data['Weights'] = 1 / data[SdOutcome] ** 2
        
        # Checking the number of observations
        if len(data) <= len(BetaExposure) + 2:
            raise ValueError("Not enough instrumental variables")
        
        if len(data) < NbDistribution:
            raise ValueError("Not enough elements to compute empirical P-values, increase NbDistribution")
    
    

1. Selects the relevant columns (assuming BetaOutcome, BetaExposure, SdOutcome, SdExposure are lists or strings with column names).
2. Removes any rows with missing data
3. Adjusts the signs of the BetaOutcome and BetaExposure columns based on the sign of the first BetaExposure variable.
4. Creates a 'Weights' column based on the inverse square of SdOutcome.
5. Checks if there are enough rows (observations) in the data compared to the number of instrumental variables.
6. Checks if there are enough rows to compute empirical P-values, given the NbDistribution.
This code assumes that data is a pandas DataFrame and the other parameters (BetaOutcome, BetaExposure, SdOutcome, SdExposure) are appropriately defined before this snippet is run. Adjust the column names accordingly to match your actual data.
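As a toy illustration of step 3, the sign correction flips each row so that the first exposure beta is positive, multiplying the outcome beta and all exposure betas of that row by the same ±1 (the column names here are made up):

```python
import numpy as np
import pandas as pd

# Toy summary statistics: two SNPs, the second with a negative exposure beta
df = pd.DataFrame({
    'beta_exp': [0.2, -0.3],
    'beta_out': [0.1, -0.2],
})
sign_correction = np.sign(df['beta_exp'])
for col in ['beta_out', 'beta_exp']:
    df[col] *= sign_correction
print(df)  # every row now has beta_exp >= 0
```

Flipping both betas by the same sign preserves each SNP's ratio estimate while giving the exposure effects a consistent orientation.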

    
        # Then you call the function with your data, using the getRSS_LOO
        # defined earlier (signature: outcome name, exposure names, data, returnIV)
        RSSobs = getRSS_LOO(
            BetaOutcome,
            BetaExposure,
            data,
            returnIV=OUTLIERtest
        )
    

The call above uses the getRSS_LOO function defined earlier: it performs a series of leave-one-out regressions, excluding one instrumental variable each time, and sums the squared residuals. Replace BetaOutcome and BetaExposure with the actual column names from your data, and pass the DataFrame produced in the preprocessing step above. When OUTLIERtest is True, RSSobs is a tuple whose first element is the RSS and whose second element is the LOO causal estimates.

    
        import numpy as np

        # Generate the null distribution of expected RSS values
        random_data = [get_random_data(BetaOutcome, BetaExposure, SdOutcome, SdExposure, data)
                       for _ in range(NbDistribution)]
        rss_exp = [getRSS_LOO(BetaOutcome, BetaExposure, rd, returnIV=OUTLIERtest)
                   for rd in random_data]

        # Calculate the empirical p-value
        if OUTLIERtest:
            # getRSS_LOO returns (RSS, LOO estimates) when returnIV is True
            rss_obs = RSSobs[0]
            p_value = sum(rss[0] > rss_obs for rss in rss_exp) / NbDistribution
        else:
            rss_obs = RSSobs
            p_value = sum(rss > rss_obs for rss in rss_exp) / NbDistribution
        GlobalTest = {'RSSobs': rss_obs, 'Pvalue': p_value}
    

- NbDistribution is the number of random datasets to generate.
- OUTLIERtest is a boolean indicating whether outlier testing should be performed.
- get_random_data (defined earlier) generates a random dataset under the null hypothesis of no pleiotropy.
- getRSS_LOO (defined earlier) calculates the residual sum of squares, optionally returning the LOO causal estimates needed for outlier testing.
- random_data is the list of datasets generated by get_random_data.
- rss_exp is the list of RSS values computed from each randomly generated dataset.
- p_value is the proportion of expected RSS values that exceed the observed RSS.
- GlobalTest is a dictionary storing the observed RSS and the empirical p-value.
This code assumes that getRSS_LOO returns a tuple whose first element is the RSS value when returnIV is True; adjust the indexing if your implementation differs.

        
            import pandas as pd

            if GlobalTest['Pvalue'] < SignifThreshold and OUTLIERtest:
                outlier_tests = []

                for snv in range(len(data)):
                    # Stack the snv-th row of every random dataset (one row per replicate)
                    random_snp = pd.concat([mat.iloc[snv] for mat in random_data], axis=1).T
                    if len(BetaExposure) == 1:
                        diff = data.iloc[snv][BetaOutcome] - data.iloc[snv][BetaExposure[0]] * RSSobs[1][snv]
                        exp = random_snp[BetaOutcome] - random_snp[BetaExposure[0]] * RSSobs[1][snv]
                    else:
                        diff = data.iloc[snv][BetaOutcome] - sum(data.iloc[snv][BetaExposure] * RSSobs[1][:, snv])
                        exp = random_snp[BetaOutcome] - (random_snp[BetaExposure] * RSSobs[1][:, snv]).sum(axis=1)

                    pval = (exp**2 > diff**2).sum() / len(random_data)
                    outlier_tests.append({'RSSobs': diff**2, 'Pvalue': pval})

                outlier_tests_df = pd.DataFrame(outlier_tests)
                outlier_tests_df.index = data.index
                # Bonferroni correction for testing every SNP
                outlier_tests_df['Pvalue'] = outlier_tests_df['Pvalue'].apply(lambda p: min(p * len(data), 1))
            else:
                OUTLIERtest = False
        
    

- We loop over each row (representing an instrumental variable) in the data DataFrame to calculate the outlier test for that IV.
- For each IV, we calculate the observed difference and the expected difference based on the random data generated.
- The p-value for each IV is calculated and then stored in a list outlier_tests.
- After looping through all IVs, we convert the list of dictionaries to a pandas DataFrame outlier_tests_df.
- We apply a Bonferroni correction to the p-values to account for multiple testing.
- If the GlobalTest p-value is not below the significance threshold or if OUTLIERtest is not True, we set OUTLIERtest to False to indicate that outlier testing will not be performed.
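The Bonferroni correction used in the outlier test is simply each p-value multiplied by the number of tests, capped at 1:

```python
def bonferroni(p_values):
    """Bonferroni-adjust p-values: multiply by the number of tests, cap at 1."""
    n = len(p_values)
    return [min(p * n, 1.0) for p in p_values]

# Three illustrative p-values
print(bonferroni([0.001, 0.02, 0.5]))
```

This is the most conservative multiple-testing correction; with many instruments, less conservative alternatives (e.g., Benjamini-Hochberg) are sometimes preferred.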

    
        import numpy as np
        import statsmodels.api as sm
        
        # Assuming data is a pandas DataFrame
        # Assuming BetaOutcome, BetaExposure, Weights, and other variables are properly defined
        # The OUTLIERtest variable indicates whether outlier testing has been done
        # The refOutlier variable is a list of indices of significant outliers identified
        
        def get_random_bias(beta_outcome, beta_exposure, data, ref_outlier, weights):
            # Bootstrap: re-sample the non-outlier SNPs with replacement
            # (sampling without replacement would return the same set every time)
            candidates = list(set(range(len(data))) - set(ref_outlier))
            indices = np.random.choice(candidates, replace=True, size=len(candidates))
            data_random = data.iloc[indices]
            mod_random = sm.WLS(data_random[beta_outcome], data_random[beta_exposure],
                                weights=data_random[weights]).fit()
            return mod_random.params
        
        if DISTORTIONtest and OUTLIERtest:
            mod_all = sm.WLS(data[BetaOutcome], data[BetaExposure], weights=data[Weights]).fit()
        
            if len(refOutlier) > 0 and len(refOutlier) < len(data):
                bias_exp = [get_random_bias(BetaOutcome, BetaExposure, data, refOutlier, Weights) for _ in range(NbDistribution)]
                bias_exp = np.vstack(bias_exp)
        
                data_no_outliers = data.drop(refOutlier)
                weights_no_outliers = data_no_outliers[Weights]
                mod_no_outliers = sm.WLS(data_no_outliers[BetaOutcome], data_no_outliers[BetaExposure], weights=weights_no_outliers).fit()
                
                bias_obs = (mod_all.params - mod_no_outliers.params) / np.abs(mod_no_outliers.params)
                # Broadcast the (k,) observed parameters against the (NbDistribution, k) matrix
                bias_exp = (mod_all.params.values - bias_exp) / np.abs(bias_exp)
                bias_test = {
                    'Outliers Indices': refOutlier,
                    'Distortion Coefficient': 100 * bias_obs,
                    'Pvalue': np.sum(np.abs(bias_exp) > np.abs(bias_obs), axis=0) / NbDistribution
                }
            else:
                message = "All SNPs considered as outliers" if len(refOutlier) == len(data) else "No significant outliers"
                bias_test = {
                    'Outliers Indices': message,
                    'Distortion Coefficient': np.nan,
                    'Pvalue': np.nan
                }
    

- We use statsmodels.api.WLS for weighted least squares regression, which is the equivalent of R's lm function with a weights argument.
- Instead of a formula, we pass the outcome and exposure directly as arrays to WLS. This requires that BetaOutcome and BetaExposure are appropriately defined as arrays or DataFrame columns.
- We use comprehension to generate the expected biases based on random data.
- Bias is calculated both observed (bias_obs) and expected (bias_exp), and the distortion coefficient is reported as a percentage.
- We calculate the p-value by comparing the absolute values of observed and expected biases.
- The refOutlier variable is expected to be a list of indices for the outliers determined from the previous steps.

    
        import numpy as np
        import pandas as pd

        # Assuming that the following variables have been previously defined and calculated:
        # GlobalTest, OutlierTest, BiasTest, SignifThreshold, NbDistribution, data
        # and that mod_all and potentially mod_no_outliers have been fitted.
        
        # Helper function to format P-values; data_length is accepted for
        # signature parity with the calls below but is not currently used
        def format_p_value(p_value, nb_distribution, data_length):
            return f"<{1 / nb_distribution}" if p_value == 0 else p_value
        
        # Format Global Test P-values
        GlobalTest['Pvalue'] = format_p_value(GlobalTest['Pvalue'], NbDistribution, len(data))
        
        results = {'Global Test': GlobalTest}
        
        # Format Outlier Test results if applicable
        if OUTLIERtest:
            OutlierTest['Pvalue'] = [format_p_value(p, NbDistribution, len(data)) for p in OutlierTest['Pvalue']]
            if DISTORTIONtest:
                BiasTest['Pvalue'] = format_p_value(BiasTest['Pvalue'], NbDistribution, len(data))
                results['Distortion Test'] = BiasTest
            results['Outlier Test'] = OutlierTest
        
            # Warning if precision is insufficient
            if len(data) / NbDistribution > SignifThreshold:
                print(f"Warning: Outlier test unstable. The significance threshold of {SignifThreshold} for the outlier test is not achievable with only {NbDistribution} simulations for the null distribution. The current precision is <{len(data) / NbDistribution}. Increase NbDistribution.")
        
        # Prepare MR results
        original_mr = pd.DataFrame({
            "Exposure": BetaExposure,
            "MR Analysis": "Raw",
            "Causal Estimate": mod_all.params,
            "Sd": mod_all.bse,
            "T-stat": mod_all.tvalues,
            "P-value": mod_all.pvalues
        })
        
        if 'mod_no_outliers' in locals():
            outlier_corrected_mr = pd.DataFrame({
                "Exposure": BetaExposure,
                "MR Analysis": "Outlier-corrected",
                "Causal Estimate": mod_no_outliers.params,
                "Sd": mod_no_outliers.bse,
                "T-stat": mod_no_outliers.tvalues,
                "P-value": mod_no_outliers.pvalues
            })
        else:
            print("Warning: No outliers were identified, therefore the results for the outlier-corrected MR are set to NA")
            outlier_corrected_mr = pd.DataFrame({
                "Exposure": BetaExposure,
                "MR Analysis": "Outlier-corrected",
                "Causal Estimate": [np.nan] * len(BetaExposure),
                "Sd": [np.nan] * len(BetaExposure),
                "T-stat": [np.nan] * len(BetaExposure),
                "P-value": [np.nan] * len(BetaExposure)
            })
        
        # Combine MR results
        mr_results = pd.concat([original_mr, outlier_corrected_mr], ignore_index=True)
        
        # Combine all results into a final dictionary
        final_results = {
            "Main MR results": mr_results,
            "MR-PRESSO results": results
        }
        
        # Replace 'return(res)' with 'return final_results' if this is within a function.
        final_results
    

- P-values are formatted using a helper function to handle cases where they are zero.
- Outlier and distortion tests are included in the results if they were performed.
- The MR results for both the original analysis and the outlier-corrected analysis are presented in a pandas DataFrame for easy viewing.
- The final results are returned as a dictionary, which can be converted to JSON if needed or directly used within Python.
You should ensure that variables like GlobalTest, OutlierTest, and BiasTest are dictionaries with appropriate key-value pairs according to the structure you have set up in previous steps. Also, replace BetaExposure with the actual names or identifiers for your exposures.