MSE_minimizer combines the association estimates from a discovery GWAS and a replication GWAS into a single combined estimate for each SNP that is significant in the discovery GWAS. The approach is inspired by the method described in Ferguson et al. (2017).
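For reference, an MSE-minimizing combination of the kind proposed by Ferguson et al. (2017) can be sketched as follows. The sketch assumes that the discovery and replication estimates are independent, that the replication estimate is unbiased, and that \(B\) is the bias of the discovery estimate. Here \(\sigma_{\text{disc}}\) and \(\sigma_{\text{rep}}\) play the role of the se columns. It illustrates the general idea and is not a statement of the exact formula coded in MSE_minimizer.

```latex
\hat\beta_{\text{joint}} = \omega\,\hat\beta_{\text{rep}} + (1-\omega)\,\hat\beta_{\text{disc}},
\qquad
\mathrm{MSE}(\omega) = \omega^{2}\sigma_{\text{rep}}^{2} + (1-\omega)^{2}\left(\sigma_{\text{disc}}^{2} + B^{2}\right)

\frac{d\,\mathrm{MSE}}{d\omega} = 2\omega\,\sigma_{\text{rep}}^{2} - 2(1-\omega)\left(\sigma_{\text{disc}}^{2} + B^{2}\right) = 0
\;\Longrightarrow\;
\omega = \frac{\sigma_{\text{disc}}^{2} + B^{2}}{\sigma_{\text{disc}}^{2} + \sigma_{\text{rep}}^{2} + B^{2}}
```

In practice \(B\) is unknown and must be estimated from the data. The spline argument controls how this estimate is obtained.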

MSE_minimizer(summary_disc, summary_rep, alpha = 5e-08, spline = TRUE)

Arguments

summary_disc

A data frame containing summary statistics from the discovery GWAS. It must have three columns named rsid, beta and se, in that order, and columns beta and se must contain numerical values. Each row must correspond to a unique SNP, identified by rsid. The data frame must contain at least 5 SNPs, because fitting the smoothing spline runs into problems with fewer.

summary_rep

A data frame containing summary statistics from the replication GWAS. It must have three columns named rsid, beta and se, in that order, and all columns must contain numerical values. Each row must correspond to a unique SNP, identified by the numerical value rsid. SNPs must be in exactly the same order as in summary_disc, i.e. summary_rep$rsid must be identical to summary_disc$rsid.
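The input requirements above can be expressed as a quick pre-check. The helper check_summary_inputs below is hypothetical and is not part of winnerscurse. It is a minimal sketch that checks column names and order, numeric beta and se columns, unique rsid values, the 5-SNP minimum and matching rsid order. It does not check that rsid itself is numeric.

```r
# Hypothetical helper (not part of winnerscurse): checks the documented
# requirements for summary_disc and summary_rep before calling MSE_minimizer.
check_summary_inputs <- function(summary_disc, summary_rep) {
  required <- c("rsid", "beta", "se")
  for (d in list(summary_disc, summary_rep)) {
    # exactly three columns named rsid, beta and se, in that order
    if (!identical(colnames(d), required)) {
      stop("columns must be named rsid, beta and se, in that order")
    }
    if (!is.numeric(d$beta) || !is.numeric(d$se)) {
      stop("columns beta and se must be numeric")
    }
    # each SNP may appear only once
    if (anyDuplicated(d$rsid) > 0) {
      stop("each rsid must appear only once")
    }
  }
  if (nrow(summary_disc) < 5) {
    stop("at least 5 SNPs are required")
  }
  # replication SNPs must be listed in the same order as discovery SNPs
  if (!identical(summary_rep$rsid, summary_disc$rsid)) {
    stop("summary_rep$rsid must be identical to summary_disc$rsid")
  }
  invisible(TRUE)
}

# Example with toy data (hypothetical values)
set.seed(1)
n <- 10
disc <- data.frame(rsid = 1:n, beta = rnorm(n), se = rep(0.1, n))
repl <- data.frame(rsid = 1:n, beta = rnorm(n), se = rep(0.1, n))
check_summary_inputs(disc, repl)
```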

alpha

A numerical value specifying the genome-wide significance threshold for the discovery GWAS. The default is 5e-8.
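The significance filter can be sketched as below, assuming for illustration that two-sided \(p\)-values are computed from \(z\) = beta/se under a normal approximation. significant_snps is a hypothetical helper. Under this assumption the default alpha = 5e-8 corresponds to a cut-off on \(|z|\) of about 5.45.

```r
# Hypothetical helper (assumption: two-sided normal p-values from z = beta / se).
significant_snps <- function(summary_disc, alpha = 5e-8) {
  z <- summary_disc$beta / summary_disc$se
  p <- 2 * stats::pnorm(abs(z), lower.tail = FALSE)
  # keep only the rows whose p-value falls below alpha
  summary_disc[p < alpha, , drop = FALSE]
}

# The |z| cut-off equivalent to the default threshold
stats::qnorm(5e-8 / 2, lower.tail = FALSE)
```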

spline

A logical value determining whether a cubic smoothing spline is used. When spline=FALSE, the value of \(B\) in the formula detailed in Ferguson et al. (2017) is simply computed as B=summary_disc$beta - summary_rep$beta for each SNP. When spline=TRUE, a cubic smoothing spline is fitted to B=summary_disc$beta - summary_rep$beta as a function of z=summary_disc$beta/summary_disc$se, and the spline's predicted values are used for \(B\).
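The two ways of obtaining \(B\) described above can be sketched with stats::smooth.spline, which fits a cubic smoothing spline. estimate_B is a hypothetical helper, and the spline is fitted at its default settings (smoothing parameter chosen by generalized cross-validation). The package may configure its spline differently.

```r
# Hypothetical helper: raw or spline-smoothed estimate of B for every SNP.
# Assumption: stats::smooth.spline with default settings.
estimate_B <- function(summary_disc, summary_rep, spline = TRUE) {
  # raw difference between discovery and replication estimates
  B <- summary_disc$beta - summary_rep$beta
  if (spline) {
    z <- summary_disc$beta / summary_disc$se
    # regress B on z with a cubic smoothing spline
    fit <- stats::smooth.spline(x = z, y = B)
    # replace the raw differences with the fitted values at each SNP's z
    B <- stats::predict(fit, x = z)$y
  }
  B
}
```

Note that smooth.spline stops with an error when there are fewer than four unique x values, which is one reason a minimum number of SNPs is needed.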

Value

A data frame containing the summary statistics and combined association estimates for only those SNPs deemed significant in the discovery GWAS at the specified threshold alpha, i.e. SNPs with \(p\)-values less than alpha. The input summary statistics occupy the first five columns: in addition to rsid, columns beta_disc and se_disc contain the discovery GWAS statistics, and columns beta_rep and se_rep hold the replication GWAS statistics. The new combined estimate for each SNP is given in the final column, beta_joint. Rows are ordered by significance in the discovery GWAS, so the most significant SNP, i.e. the SNP with the largest absolute \(z\)-statistic, appears in the first row. If no SNPs are significant in the discovery GWAS, MSE_minimizer simply returns a data frame combining the two input data sets.
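Putting the pieces together, the output described above could be produced along the following lines. mse_combine_sketch is a hypothetical function, not the winnerscurse source. It assumes the MSE-minimizing weight \(\omega = (\sigma_{\text{disc}}^2 + B^2)/(\sigma_{\text{disc}}^2 + \sigma_{\text{rep}}^2 + B^2)\) on the replication estimate, two-sided normal \(p\)-values, and, when no SNP is significant, returns the five input columns as its reading of the documented fallback.

```r
# Hypothetical end-to-end sketch (not the winnerscurse implementation).
mse_combine_sketch <- function(summary_disc, summary_rep, alpha = 5e-8, spline = TRUE) {
  z <- summary_disc$beta / summary_disc$se
  B <- summary_disc$beta - summary_rep$beta
  if (spline) {
    # smooth B as a function of z using all SNPs
    fit <- stats::smooth.spline(x = z, y = B)
    B <- stats::predict(fit, x = z)$y
  }
  out <- data.frame(
    rsid = summary_disc$rsid,
    beta_disc = summary_disc$beta, se_disc = summary_disc$se,
    beta_rep = summary_rep$beta, se_rep = summary_rep$se
  )
  # assumed MSE-minimizing weight given to the replication estimate
  w <- (out$se_disc^2 + B^2) / (out$se_disc^2 + out$se_rep^2 + B^2)
  out$beta_joint <- w * out$beta_rep + (1 - w) * out$beta_disc
  p <- 2 * stats::pnorm(abs(z), lower.tail = FALSE)
  keep <- which(p < alpha)
  # assumption: with no significant SNPs, return the combined input columns
  if (length(keep) == 0) return(out[, 1:5])
  # most significant SNP (largest |z|) first
  keep <- keep[order(abs(z[keep]), decreasing = TRUE)]
  out[keep, , drop = FALSE]
}

# Example with simulated toy data (hypothetical values)
set.seed(7)
n <- 200
true_beta <- c(rep(0.2, 10), rep(0, n - 10))
disc <- data.frame(rsid = 1:n, beta = true_beta + rnorm(n, 0, 0.02), se = rep(0.02, n))
repl <- data.frame(rsid = 1:n, beta = true_beta + rnorm(n, 0, 0.02), se = rep(0.02, n))
head(mse_combine_sketch(disc, repl))
```

Because the weight lies between 0 and 1, each beta_joint falls between beta_disc and beta_rep. A larger estimated \(|B|\) shifts more weight onto the replication estimate.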

References

Ferguson, J., Alvarez-Iglesias, A., Newell, J., Hinde, J., & O'Donnell, M. (2017). Joint incorporation of randomised and observational evidence in estimating treatment effects. Statistical Methods in Medical Research, 28(1), 235-247. doi:10.1177/0962280217720854

See also

https://amandaforde.github.io/winnerscurse/articles/discovery_replication.html for an illustration of MSE_minimizer applied to toy data sets, and for further details on how the combined SNP-trait association estimates for significant SNPs are computed.