MSE_minimizer is a function which implements an approach that combines
the association estimates obtained from discovery and replication GWASs to
form a new combined estimate for each SNP. The method used by this function
is inspired by that detailed in Ferguson et al. (2017).
MSE_minimizer(summary_disc, summary_rep, alpha = 5e-08, spline = TRUE)
summary_disc: A data frame containing summary statistics from the
discovery GWAS. It must have three columns with column names
rsid, beta and se, respectively, and columns
beta and se must contain numerical values. Each row must
correspond to a unique SNP, identified by rsid. The function
requires at least 5 SNPs, as any fewer will cause problems when the
smoothing spline is fitted.
summary_rep: A data frame containing summary statistics from the
replication GWAS. It must have three columns with column names
rsid, beta and se, respectively, and all columns must
contain numerical values. Each row must correspond to a unique SNP,
identified by the numerical value rsid. SNPs must be ordered in the
exact same manner as those in summary_disc, i.e.
summary_rep$rsid must be equivalent to summary_disc$rsid.
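The input requirements described above can be checked before calling the function. The following is a minimal sketch in base R using made-up toy values; check_inputs is a hypothetical helper written for illustration and is not part of the package.

```r
# Toy inputs; all values are invented for illustration only.
summary_disc <- data.frame(
  rsid = 1:6,
  beta = c(0.30, -0.25, 0.10, 0.05, -0.40, 0.02),
  se   = c(0.05, 0.04, 0.05, 0.05, 0.06, 0.05)
)
summary_rep <- data.frame(
  rsid = 1:6,
  beta = c(0.22, -0.20, 0.02, 0.06, -0.31, -0.01),
  se   = c(0.05, 0.04, 0.05, 0.05, 0.06, 0.05)
)

# Hypothetical helper (not part of the package): stops with an error
# if any of the documented input requirements is violated.
check_inputs <- function(disc, repl) {
  cols <- c("rsid", "beta", "se")
  stopifnot(
    identical(names(disc), cols),
    identical(names(repl), cols),
    is.numeric(disc$beta), is.numeric(disc$se),
    is.numeric(repl$beta), is.numeric(repl$se),
    anyDuplicated(disc$rsid) == 0,      # each row is a unique SNP
    nrow(disc) >= 5,                    # needed for the smoothing spline
    identical(disc$rsid, repl$rsid)     # same SNPs in the same order
  )
  invisible(TRUE)
}

check_inputs(summary_disc, summary_rep)
```

If the replication data are ordered differently, reordering them with match(summary_disc$rsid, summary_rep$rsid) before the call is one way to satisfy the ordering requirement.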
alpha: A numerical value which specifies the desired genome-wide
significance threshold for the discovery GWAS. The default is given as
5e-8.
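To make the role of alpha concrete, the sketch below shows how a significance threshold on the p-value scale corresponds to a threshold on the absolute z-statistic. It assumes two-sided p-values from a normal approximation to z = beta / se, which is the usual convention but is an assumption here rather than something stated on this page; the toy values are invented.

```r
alpha <- 5e-8

# Toy discovery estimates (invented values).
beta <- c(0.30, -0.25, 0.10)
se   <- c(0.05, 0.04, 0.05)

z <- beta / se                      # z-statistics: 6, -6.25, 2
p <- 2 * pnorm(-abs(z))             # assumed two-sided p-values

# Equivalent cut-off on |z|; roughly 5.45 for alpha = 5e-8.
z_threshold <- qnorm(alpha / 2, lower.tail = FALSE)

significant <- p < alpha            # same as abs(z) > z_threshold
```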
spline: A logical value which determines whether or not a cubic
smoothing spline is used. When spline=FALSE, the value for
\(B\) in the formula detailed in the aforementioned paper is simply
calculated as B=summary_disc$beta - summary_rep$beta for each SNP.
Alternatively, when spline=TRUE, the differences
summary_disc$beta - summary_rep$beta are regressed on
z=summary_disc$beta/summary_disc$se using a cubic smoothing spline,
and the values predicted by this spline are then used for \(B\).
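The two ways of obtaining \(B\) can be sketched in base R as follows. The data are simulated, and the use of stats::smooth.spline with its default smoothing settings is an assumption made for illustration; the smoothing parameters used inside the function may differ.

```r
set.seed(1)
n <- 50

# Simulated toy summary statistics (invented for illustration).
se_disc   <- rep(0.05, n)
beta_disc <- rnorm(n, mean = 0, sd = 0.2)
beta_rep  <- 0.8 * beta_disc + rnorm(n, mean = 0, sd = 0.05)

z <- beta_disc / se_disc

# spline = FALSE: B is the raw difference for each SNP.
B_raw <- beta_disc - beta_rep

# spline = TRUE: regress the raw differences on z with a cubic
# smoothing spline and use the fitted values as B.
fit      <- smooth.spline(x = z, y = B_raw)
B_smooth <- predict(fit, x = z)$y
```

Smoothing pools information across SNPs with similar z-statistics, so B_smooth is typically less noisy than B_raw for any single SNP.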
A data frame with summary statistics and adjusted association
estimates for only those SNPs which have been deemed significant in the
discovery GWAS according to the specified threshold, alpha, i.e.
SNPs with \(p\)-values less than alpha. The inputted summary data
occupies the first five columns, in which the columns beta_disc and
se_disc contain the statistics from the discovery GWAS and columns
beta_rep and se_rep hold the replication GWAS statistics. The
new combined estimate for each SNP is contained in the final column,
namely beta_joint. The SNPs are ordered in this data frame
according to their significance, with the most significant SNP, i.e. the
SNP with the largest absolute \(z\)-statistic, located in the first
row of the data frame. If no SNPs are detected as significant in the
discovery GWAS, MSE_minimizer merely returns a data frame which
combines the two inputted data sets.
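The shape of the returned data frame can be illustrated with a small base R sketch. combine_sketch is a hypothetical stand-in, not the package source. The weight used to form beta_joint is an assumed inverse-MSE form written for illustration; the exact formula should be taken from Ferguson et al. (2017) and the package vignette. Two-sided normal p-values and the spline=FALSE variant of \(B\) are also assumptions of this sketch.

```r
# Toy inputs (invented values).
summary_disc <- data.frame(
  rsid = 1:6,
  beta = c(0.30, -0.25, 0.10, 0.05, -0.40, 0.02),
  se   = c(0.05, 0.04, 0.05, 0.05, 0.06, 0.05)
)
summary_rep <- data.frame(
  rsid = 1:6,
  beta = c(0.22, -0.20, 0.02, 0.06, -0.31, -0.01),
  se   = c(0.05, 0.04, 0.05, 0.05, 0.06, 0.05)
)

# Hypothetical sketch of the output layout; not the package implementation.
combine_sketch <- function(summary_disc, summary_rep, alpha = 5e-8) {
  z <- summary_disc$beta / summary_disc$se
  p <- 2 * pnorm(-abs(z))                  # assumed two-sided p-values
  out <- data.frame(
    rsid      = summary_disc$rsid,
    beta_disc = summary_disc$beta, se_disc = summary_disc$se,
    beta_rep  = summary_rep$beta,  se_rep  = summary_rep$se
  )
  keep <- p < alpha
  if (!any(keep)) return(out)              # no significant SNPs: inputs only

  B <- out$beta_disc - out$beta_rep        # spline = FALSE variant of B
  # ASSUMED weight: inverse variance of the replication estimate against
  # the inverse of (variance + squared bias) of the discovery estimate.
  w <- (1 / out$se_rep^2) /
    (1 / out$se_rep^2 + 1 / (out$se_disc^2 + B^2))
  out$beta_joint <- w * out$beta_rep + (1 - w) * out$beta_disc

  out <- out[keep, ]
  out <- out[order(abs(z[keep]), decreasing = TRUE), ]  # most significant first
  rownames(out) <- NULL
  out
}

res <- combine_sketch(summary_disc, summary_rep)
```

With these toy values, three SNPs pass the default threshold, and each beta_joint lies between the corresponding beta_rep and beta_disc because the assumed weight is between 0 and 1.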
Ferguson, J., Alvarez-Iglesias, A., Newell, J., Hinde, J., & O'Donnell, M. (2017). Joint incorporation of randomised and observational evidence in estimating treatment effects. Statistical Methods in Medical Research, 28(1), 235-247. doi:10.1177/0962280217720854
See https://amandaforde.github.io/winnerscurse/articles/discovery_replication.html
for an illustration of the use of MSE_minimizer with toy data sets and
further information regarding computation of the combined SNP-trait
association estimates for significant SNPs.