Winner's Curse and weak instrument bias in MR • mr.simss

Winner’s Curse in MR

What is Winner’s Curse bias?

As stated in Forde et al. (2023), the estimated effect size of a genetic variant generally tends to be smaller in a replication study than in the GWAS that discovered the variant-trait association. This observation is due to the phenomenon known as Winner’s Curse. In the context of a single association study, the term Winner’s Curse describes how the estimated association strengths of the genetic variants deemed most significant are very likely to be exaggerated compared with their true underlying values. A more detailed description of the Winner’s Curse phenomenon, especially with respect to genetic association studies, can be found here.

Winner’s Curse bias can have many practical consequences, most notably affecting techniques which are reliant on variant-trait association estimates obtained from GWASs. Thus, as Mendelian Randomization (MR) is a statistical framework which uses genetic variants as instrumental variables (IVs) to estimate the magnitude of the causal effect of an exposure on an outcome, Winner’s Curse bias can prove problematic in the MR setting.

How and why does Winner’s Curse bias affect MR causal effect estimates?

In order to correctly determine the causal effect of an exposure on an outcome, an MR study requires certain essential IV conditions to be met. The first IV condition, namely instrument relevance, states that the genetic variants used as instruments must be robustly associated with the modifiable exposure (Von Hinke et al. (2016)). Thus, to ensure that this condition is satisfied, MR practitioners typically choose genetic variants for use in the study if statistical evidence suggests that the variants are strongly associated with the exposure. In a summary-level MR framework, the most common practice is therefore to select genetic instruments according to their exposure GWAS summary data and an imposed selection criterion, typically retaining genetic variants with \(p\)-values less than \(5 \times 10^{-8}\) in the exposure GWAS.
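
The selection step described above can be sketched in a few lines of Python. This is only a minimal illustration: the variant names, effect sizes and standard errors are hypothetical, and the function names `two_sided_p_value` and `select_instruments` are introduced here and do not belong to any package.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # selection criterion quoted in the text


def two_sided_p_value(beta, se):
    """Two-sided p-value for a Wald z-test of beta = 0."""
    z = beta / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def select_instruments(exposure_gwas, threshold=GENOME_WIDE_THRESHOLD):
    """Return the variants whose exposure p-value is below the threshold."""
    return [
        row["variant"]
        for row in exposure_gwas
        if two_sided_p_value(row["beta"], row["se"]) < threshold
    ]


# Hypothetical exposure GWAS summary statistics (illustrative values only).
exposure_gwas = [
    {"variant": "rs_a", "beta": 0.060, "se": 0.010},   # z = 6.0
    {"variant": "rs_b", "beta": 0.030, "se": 0.010},   # z = 3.0
    {"variant": "rs_c", "beta": -0.080, "se": 0.012},  # z is about -6.7
]

selected = select_instruments(exposure_gwas)
print(selected)
```

Only variants whose absolute z-score exceeds roughly 5.45 pass the threshold, so estimates that happen to be large by chance are more likely to be selected.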

Winner’s Curse arises in an MR investigation when there is sample overlap between the GWAS used to select instruments and the GWASs used to estimate the instrument-exposure and instrument-outcome associations (Jiang et al. (2022)), i.e. when some individuals contribute to both selection and estimation.

Firstly, if the discovery study, i.e. the GWAS used to choose suitable genetic instruments, also supplies the variant-exposure association estimates used in the MR analysis, then due to Winner’s Curse, the associations of the genetic instruments with the exposure are likely to be overestimated. The overall MR causal effect estimate is subsequently deflated. Many common MR methods combine the Wald ratio estimates of multiple variants, each being the ratio of the estimated variant-outcome association to the estimated variant-exposure association, to provide an overall causal effect estimate. Therefore, if Winner’s Curse bias exists in the denominators of the Wald ratio estimates, i.e. the estimated genetic associations with the exposure, each variant-specific ratio estimate will be biased towards the null, and so too will the final MR estimate.
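
The argument can be written out in symbols. The notation below is introduced here for illustration and is not taken from the source: \(\hat{\beta}_{X_j}\) and \(\hat{\beta}_{Y_j}\) denote the estimated associations of variant \(j\) with the exposure and the outcome, \(\sigma_{Y_j}\) is the standard error of \(\hat{\beta}_{Y_j}\), and \(c_j > 1\) is a hypothetical factor by which selection exaggerates the exposure estimate. The Wald ratio estimate for variant \(j\), and the inverse-variance weighted (IVW) estimate as one common way of combining the ratios, are

\[
\hat{\theta}_j = \frac{\hat{\beta}_{Y_j}}{\hat{\beta}_{X_j}},
\qquad
\hat{\theta}_{IVW} = \frac{\sum_j w_j \hat{\theta}_j}{\sum_j w_j},
\qquad
w_j = \frac{\hat{\beta}_{X_j}^2}{\sigma_{Y_j}^2}.
\]

If the exposure estimate is inflated to \(c_j \hat{\beta}_{X_j}\) while the outcome estimate is unaffected, the ratio becomes

\[
\frac{\hat{\beta}_{Y_j}}{c_j \hat{\beta}_{X_j}} = \frac{\hat{\theta}_j}{c_j},
\]

which is smaller in magnitude than \(\hat{\theta}_j\), i.e. closer to the null.
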
Even though the severity of Winner’s Curse bias will generally be greatest for the associations with the trait used for variant selection, i.e. the exposure, associations with another trait that is correlated with the exposure can also be affected. As the exposure and the outcome tend to be associated due to confounding, this means that if the same set of individuals is used in the discovery study and in the GWAS used to obtain the variant-outcome association estimates, these estimates will also likely be exaggerated. In this instance, in which the variant-outcome association estimates used in the MR analysis are biased due to Winner’s Curse, the resulting MR estimate will likely be inflated.
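
A small simulation can make the first of these two effects concrete. The sketch below is a toy model under stated assumptions, not the procedure of any cited paper: every variant has the same hypothetical true exposure effect, estimation errors are normal, the same exposure sample is used for selection and estimation, and the outcome estimates come from an independent sample. All numerical values are illustrative.

```python
import random
from statistics import NormalDist, mean

random.seed(1)

TRUE_BX = 0.02      # assumed true variant-exposure effect (same for all variants)
SE_X = 0.005        # assumed standard error of the exposure estimates
SE_Y = 0.005        # assumed standard error of the outcome estimates
THETA = 0.5         # assumed true causal effect
N_VARIANTS = 20000  # number of simulated variants
Z_CUTOFF = NormalDist().inv_cdf(1 - 5e-8 / 2)  # |z| needed for p < 5e-8

selected_bx, selected_by = [], []
for _ in range(N_VARIANTS):
    bx_hat = random.gauss(TRUE_BX, SE_X)          # discovery exposure estimate
    by_hat = random.gauss(THETA * TRUE_BX, SE_Y)  # independent outcome estimate
    # Selection and exposure estimation use the same estimate, bx_hat.
    if abs(bx_hat / SE_X) > Z_CUTOFF:
        selected_bx.append(bx_hat)
        selected_by.append(by_hat)

# Average exposure estimate among the selected variants (true value is TRUE_BX).
mean_selected_bx = mean(selected_bx)

# IVW estimate with equal outcome standard errors: sum(bx * by) / sum(bx^2).
ivw_estimate = (
    sum(bx * by for bx, by in zip(selected_bx, selected_by))
    / sum(bx * bx for bx in selected_bx)
)

print(len(selected_bx), mean_selected_bx, ivw_estimate)
```

In this toy setting the selected exposure estimates are on average larger than the true effect, and the combined causal estimate falls below the assumed true value of 0.5.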

Empirical evidence for the impact of Winner’s Curse on MR estimates

Motivated by the fact that the effect of Winner’s Curse on MR studies remained unclear, Jiang et al. (2022) performed an empirical investigation into the impact of Winner’s Curse on MR estimates, by considering the effect of BMI on coronary artery disease risk. In their work, Jiang et al. (2022) noted that in practice, Winner’s Curse is most consequential for MR when it impacts genetic association estimates with the outcome. In this case, the resulting inflation in the causal effect estimate can potentially lead to a false positive finding. However, the magnitude of bias induced by Winner’s Curse in estimated variant-outcome associations was observed to be much lower than that in the variant-exposure associations. Despite this, the potential of Winner’s Curse to substantially bias MR estimates was clearly demonstrated. In fact, it has also been shown separately that Winner’s Curse can greatly magnify the extent of weak instrument bias in MR analyses (Sadreev et al. (2021)).

How is Winner’s Curse bias typically avoided in summary-level MR studies?

Several methods have been published which aim to correct variant-trait association estimates for Winner’s Curse bias, as described in Forde et al. (2023). Unfortunately, in practice, such approaches have rarely been incorporated into MR investigations in the literature. In fact, the most commonly recognised solution to Winner’s Curse in summary-level MR analyses is simply the use of a ‘three-sample’ design, in which three independent non-overlapping data sets are used to select instrument variants, estimate variant-exposure associations and estimate variant-outcome associations (Jiang et al. (2022)). For instance, Zhao et al. (2020) suggest that Winner’s Curse bias should be dealt with ‘by requiring use of an independent dataset for IV selection’. However, it can be difficult to obtain three large non-overlapping GWAS data sets that are sufficiently similar with respect to participant characteristics. In addition, splitting a large exposure data set in two may be undesirable, as it reduces the power both to detect suitable instruments and to estimate their associations with the exposure.
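
The cost of splitting an exposure data set can be illustrated with a rough power calculation. The sketch below assumes that the standard error of an association estimate scales with one over the square root of the sample size and that the estimate is normally distributed; the effect size and standard error are hypothetical values, and the function names are introduced here for illustration only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CUTOFF = STD_NORMAL.inv_cdf(1 - 5e-8 / 2)  # |z| needed for p < 5e-8


def detection_power(true_beta, se):
    """Probability that a variant passes the genome-wide threshold."""
    mu = true_beta / se  # expected z-score
    return (1 - STD_NORMAL.cdf(Z_CUTOFF - mu)) + STD_NORMAL.cdf(-Z_CUTOFF - mu)


def se_after_split(se_full, fraction):
    """Standard error when only `fraction` of the sample is used,
    assuming the standard error scales with 1 / sqrt(sample size)."""
    return se_full / math.sqrt(fraction)


# Hypothetical variant: true effect 0.03, standard error 0.005 in the full sample.
true_beta, se_full = 0.03, 0.005
se_half = se_after_split(se_full, 0.5)

power_full = detection_power(true_beta, se_full)
power_half = detection_power(true_beta, se_half)
print(power_full, power_half)
```

Under these assumptions, halving the sample multiplies the standard error by the square root of two, and the chance of detecting this particular variant drops sharply.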

Weak Instrument Bias in MR

What is weak instrument bias?

A weak instrument is a valid IV that perfectly satisfies the core IV conditions, but only explains a small fraction of variation in the exposure and thus, the strength of the statistical association between this instrument and the exposure is seen as ‘weak’ (Burgess and Thompson (2011)). For such an instrument, it is likely that chance associations of the instrument with confounding factors are responsible for a sizeable proportion of the observed instrument-exposure association. If genetic variants that are weakly associated with the exposure are used as instruments in an MR analysis, this will result in a non-trivial form of bias, namely weak instrument bias, being introduced into the causal effect estimation.
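
As a rough numerical illustration of what ‘a small fraction of variation’ can mean, the sketch below computes the approximate variance explained by a single variant and the corresponding F-statistic. The F-statistic and the threshold of 10 are a widely used convention for gauging instrument strength that the text above does not itself discuss. The formula for the variance explained assumes a standardized exposure and Hardy-Weinberg equilibrium, and all input values are hypothetical.

```python
def variance_explained(beta, maf):
    """Approximate R^2 for one variant, assuming a standardized exposure
    (variance 1) and Hardy-Weinberg equilibrium."""
    return 2 * maf * (1 - maf) * beta ** 2


def f_statistic(r_squared, n):
    """F-statistic for a single instrument in a sample of size n."""
    return r_squared * (n - 2) / (1 - r_squared)


# Hypothetical variant: per-allele effect 0.02 SD, minor allele frequency 0.3.
r2 = variance_explained(beta=0.02, maf=0.3)

f_small_study = f_statistic(r2, n=10_000)
f_large_study = f_statistic(r2, n=500_000)
print(r2, f_small_study, f_large_study)
```

The same variant is a weak instrument in the smaller hypothetical study and a reasonably strong one in the larger study, since instrument strength depends on the sample size as well as on the variance explained.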

How does weak instrument bias impact MR causal effect estimates?

The bias induced in the MR causal effect estimate by the use of weak instruments varies in magnitude and direction according to the extent of overlap between the exposure and outcome sample sets (Burgess et al. (2016)).

In a one-sample MR setting, in which a single fully overlapping sample is used to estimate the genetic associations with the exposure and the outcome, these chance correlations with confounders have a related impact on association estimation with respect to both the exposure and outcome. The incorporation of weak instruments thus biases the causal effect estimate towards the confounded observational exposure-outcome association (Burgess and Thompson (2011)), likely increasing the risk of Type I error (Stock and Yogo (2002)).
In contrast, for a two-sample MR analysis, in which variant-exposure and variant-outcome associations are estimated using two non-overlapping samples, the chance confounder correlations influence these associations independently. Weak instrument bias then acts in the direction of the null (Pierce and Burgess (2013)).
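
The two directions of bias can be reproduced in a toy summary-level simulation. This sketch rests on assumptions that are not taken from the cited papers: a fixed set of equally weak instruments with no selection step (so Winner’s Curse plays no role), unit standard errors, and estimation errors for the exposure and outcome associations that are correlated under full overlap (standing in for confounding) and independent under no overlap. All parameter values are hypothetical.

```python
import math
import random
from statistics import mean

random.seed(2)

THETA = 0.3         # assumed true causal effect
RHO = 0.8           # assumed correlation of estimation errors under full overlap
TRUE_BX = 2.0       # true variant-exposure effect in standard-error units (weak)
N_VARIANTS = 200    # instruments per simulated analysis
N_REPLICATES = 200  # number of simulated analyses


def ivw_estimate(error_correlation):
    """Equal-weight IVW estimate from one simulated set of summary statistics."""
    numerator = 0.0
    denominator = 0.0
    for _ in range(N_VARIANTS):
        e_x = random.gauss(0, 1)
        # e_y has unit variance and the requested correlation with e_x.
        e_y = (error_correlation * e_x
               + math.sqrt(1 - error_correlation ** 2) * random.gauss(0, 1))
        bx_hat = TRUE_BX + e_x
        by_hat = THETA * TRUE_BX + e_y
        numerator += bx_hat * by_hat
        denominator += bx_hat ** 2
    return numerator / denominator


# Full overlap: errors are correlated. No overlap: errors are independent.
one_sample = mean(ivw_estimate(RHO) for _ in range(N_REPLICATES))
two_sample = mean(ivw_estimate(0.0) for _ in range(N_REPLICATES))
print(one_sample, two_sample)
```

In this setting the average one-sample estimate lies above the assumed true effect, in the direction of the positively correlated errors, while the average two-sample estimate is pulled towards zero.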

As the former of these two versions of bias can greatly increase the risk of Type I error, many investigators favour two-sample MR analyses to avoid such an error. In addition, the majority of summary-level MR methods have been designed under the assumption that the variant-exposure and variant-outcome association estimates have been produced by independent GWASs. However, as highlighted by Burgess et al. (2016), for a given population of interest, the largest outcome and exposure GWASs have generally been carried out by a large GWAS consortium and thus, there may be a substantial number of individuals common to both GWAS data sets. Given this, taking a two-sample approach and restricting analyses to GWASs with zero overlap often leads to inefficient resource usage and reduced statistical power.

\(\star\) Note: In a summary-level MR study in which the samples are only partially overlapping, the weak instrument bias introduced will be some form of compromise between the two aforementioned extremes. Interestingly, Burgess et al. (2016) demonstrated that this bias is in fact linearly related to the fraction of sample overlap between the two data sets used to perform the exposure and outcome GWASs.
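
A simple working model shows how such a linear relationship can arise. The model and notation below are assumptions made here for illustration and are not the derivation given by Burgess et al. (2016). Suppose that for each of \(K\) variants \(\hat{\beta}_{X_j} = \beta_{X_j} + \epsilon_{X_j}\) and \(\hat{\beta}_{Y_j} = \theta \beta_{X_j} + \epsilon_{Y_j}\), where the errors have variances \(\sigma_X^2\) and \(\sigma_Y^2\) and correlation \(\pi \rho\), with \(\pi\) the fraction of sample overlap and \(\rho\) the error correlation under full overlap. For the equal-weight IVW estimate \(\hat{\theta} = \sum_j \hat{\beta}_{X_j} \hat{\beta}_{Y_j} / \sum_j \hat{\beta}_{X_j}^2\), replacing the numerator and denominator by their expectations gives

\[
\hat{\theta} \approx \frac{\theta \sum_j \beta_{X_j}^2 + K \pi \rho \sigma_X \sigma_Y}{\sum_j \beta_{X_j}^2 + K \sigma_X^2},
\qquad
\hat{\theta} - \theta \approx \frac{K \sigma_X \left( \pi \rho \sigma_Y - \theta \sigma_X \right)}{\sum_j \beta_{X_j}^2 + K \sigma_X^2}.
\]

The approximate bias is linear in \(\pi\). At \(\pi = 0\) it has the opposite sign to \(\theta\), i.e. it acts towards the null, and as \(\pi\) increases it moves in the direction determined by \(\rho\).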

Summary

The various ways in which both Winner’s Curse and weak instrument bias can impact the MR causal effect estimate are summarized in the image below. The coloured circles, i.e. circle A, circle B and circle C, represent three sufficiently similar large independent samples in which association studies have been performed.

For instance, the first row describes a setting in which the association study performed on sample A is used to select the genetic instruments for the MR study and to estimate both variant-exposure and variant-outcome associations. Here, as the same study is used to select variants and obtain variant-exposure association estimates, these variant-exposure association estimates will suffer from Winner’s Curse bias. Similarly, as the same study is also used to select variants and obtain variant-outcome association estimates, the variant-outcome associations will also be overestimated due to Winner’s Curse bias. This means that as a result of Winner’s Curse in both sets of association estimates, the final MR causal effect estimate will be subject to both inflation and deflation, likely of differing magnitudes. In addition, as the same study is used to estimate variant-exposure and variant-outcome associations, the weak instrument bias will act in the direction of the confounded observational exposure-outcome association.
The final row depicts the ‘three-sample’ design scenario, in which sample A is used to select the genetic instruments for the MR study, sample B provides the variant-exposure association estimates and the variant-outcome associations are estimated using sample C. As there is no sample overlap between the data set used for selection purposes and the data sets used for estimation, neither the variant-exposure association estimates nor the variant-outcome association estimates suffer from Winner’s Curse bias. Furthermore, as the sample used to estimate the variant-exposure associations and the sample used to estimate the variant-outcome associations are non-overlapping and independent, the weak instrument bias will be towards the null.

[Figure: Winner’s Curse and weak instrument bias in MR]