MSE_minimizer.Rd
MSE_minimizer
implements an approach that combines the association estimates obtained from the discovery and replication GWASs into a new combined estimate for each SNP. The method used by this function is inspired by that described in Ferguson et al. (2017).
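The derivation below is a minimal sketch of why a weighted combination can reduce error. It rests on three assumptions: the discovery and replication estimates are independent, the replication estimate is unbiased, and \(B\) denotes the bias of the discovery estimate. Here \(\sigma_{\text{disc}}\) and \(\sigma_{\text{rep}}\) correspond to the se columns. This is the standard MSE-minimizing weight for combining a biased and an unbiased estimator. It is not a claim about the exact formula used in the paper or in the package.

```latex
\hat\beta_{\text{joint}} = w\,\hat\beta_{\text{disc}} + (1-w)\,\hat\beta_{\text{rep}}
\qquad
\mathrm{MSE}(w) = w^2\left(\sigma_{\text{disc}}^2 + B^2\right) + (1-w)^2\,\sigma_{\text{rep}}^2
\\[4pt]
\frac{d\,\mathrm{MSE}}{dw} = 2w\left(\sigma_{\text{disc}}^2 + B^2\right) - 2(1-w)\,\sigma_{\text{rep}}^2 = 0
\;\Longrightarrow\;
w = \frac{\sigma_{\text{rep}}^2}{\sigma_{\text{disc}}^2 + \sigma_{\text{rep}}^2 + B^2}
```

A larger estimated bias \(B\) shifts weight away from the discovery estimate and toward the replication estimate.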
MSE_minimizer(summary_disc, summary_rep, alpha = 5e-08, spline = TRUE)
summary_disc: A data frame containing summary statistics from the discovery GWAS. It must have three columns named rsid, beta and se, in that order, and columns beta and se must contain numerical values. Each row must correspond to a unique SNP, identified by rsid. The function requires at least 5 SNPs, because fewer will cause problems when fitting the smoothing spline.
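The requirements above can be checked before calling MSE_minimizer. The helper `check_summary_disc` below is hypothetical and is not part of the package. It is a minimal sketch that tests the column names and order, the numeric beta and se columns, unique rsid values and the 5-SNP minimum. The toy data are simulated for illustration only.

```r
# Hypothetical helper, not part of the package: checks the input
# requirements listed above for a discovery summary data frame.
check_summary_disc <- function(summary) {
  if (!is.data.frame(summary)) stop("input must be a data frame")
  if (!identical(colnames(summary), c("rsid", "beta", "se"))) {
    stop("columns must be named rsid, beta and se, in that order")
  }
  if (!is.numeric(summary$beta) || !is.numeric(summary$se)) {
    stop("columns beta and se must be numeric")
  }
  if (anyDuplicated(summary$rsid) > 0) stop("each rsid must be unique")
  if (nrow(summary) < 5) stop("at least 5 SNPs are required")
  invisible(TRUE)
}

# Toy data simulated for illustration only
set.seed(1)
summary_disc <- data.frame(rsid = 1:10, beta = rnorm(10, 0, 0.05),
                           se = rep(0.01, 10))
check_summary_disc(summary_disc)               # passes silently
try(check_summary_disc(summary_disc[1:3, ]))   # error: too few SNPs
```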
summary_rep: A data frame containing summary statistics from the replication GWAS. It must have three columns named rsid, beta and se, in that order, and all columns must contain numerical values. Each row must correspond to a unique SNP, identified by the numerical value rsid. SNPs must be ordered in exactly the same way as in summary_disc, i.e. summary_rep$rsid must be identical to summary_disc$rsid.
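When the replication data are stored in a different order, the rows can be aligned with base R's `match()`. The sketch below reorders summary_rep so that its rsid column follows summary_disc$rsid. It also stops if any discovery SNP is missing from the replication data. The toy data are simulated for illustration only, and the replication beta values are arbitrary.

```r
# Toy data simulated for illustration only (numeric rsid values)
set.seed(2)
summary_disc <- data.frame(rsid = 1:6, beta = rnorm(6, 0, 0.05),
                           se = rep(0.01, 6))
# Replication data for the same SNPs, but stored in a different order
summary_rep <- data.frame(rsid = c(4L, 1L, 6L, 2L, 5L, 3L),
                          beta = rnorm(6, 0, 0.05), se = rep(0.012, 6))

# Reorder summary_rep so that its rows follow summary_disc$rsid
idx <- match(summary_disc$rsid, summary_rep$rsid)
if (anyNA(idx)) stop("some discovery SNPs are missing from summary_rep")
summary_rep <- summary_rep[idx, ]
rownames(summary_rep) <- NULL

identical(summary_rep$rsid, summary_disc$rsid)   # TRUE
```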
alpha: A numerical value specifying the desired genome-wide significance threshold for the discovery GWAS. The default is 5e-8.
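The threshold on the absolute \(z\)-statistic that corresponds to alpha can be computed with `qnorm()`. The sketch below assumes two-sided \(p\)-values obtained from z = beta/se under a normal approximation. Under that assumption, comparing \(p\)-values with alpha and comparing |z| with the threshold give the same decision.

```r
alpha <- 5e-8
# |z| threshold matching a two-sided p-value of alpha (normal approximation)
z_threshold <- qnorm(1 - alpha / 2)
z_threshold  # roughly 5.45

# The same decision made from p-values: p = 2 * pnorm(-|z|)
z <- c(3, 5.5, -6.2)
p <- 2 * pnorm(-abs(z))
p < alpha               # FALSE TRUE TRUE
abs(z) > z_threshold    # FALSE TRUE TRUE
```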
spline: A logical value determining whether a cubic smoothing spline is used. When spline=FALSE, the value of \(B\) in the formula given in Ferguson et al. (2017) is simply calculated as B=summary_disc$beta - summary_rep$beta for each SNP. When spline=TRUE, B=summary_disc$beta - summary_rep$beta is regressed on z=summary_disc$beta/summary_disc$se using a cubic smoothing spline, and the values predicted by this spline are used for \(B\).
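The two options for \(B\) can be sketched with base R's `stats::smooth.spline()`. With no smoothing parameter supplied, this function fits a cubic smoothing spline and chooses the smoothing level by generalized cross-validation. Whether MSE_minimizer uses these same defaults is an assumption here. The toy data are simulated for illustration only.

```r
# Toy data simulated for illustration only
set.seed(3)
n <- 200
se_disc <- rep(0.02, n)
se_rep <- rep(0.02, n)
beta_true <- rnorm(n, 0, 0.05)
beta_disc <- beta_true + rnorm(n, 0, se_disc)
beta_rep <- beta_true + rnorm(n, 0, se_rep)

z <- beta_disc / se_disc
B_raw <- beta_disc - beta_rep          # B when spline = FALSE

# spline = TRUE: regress B on z with a cubic smoothing spline
# (smoothing parameter chosen by smooth.spline's default GCV)
fit <- smooth.spline(z, B_raw)
B_spline <- predict(fit, z)$y          # predicted B at each SNP's z

summary(B_raw)
summary(B_spline)                      # smoother, less variable
```

Smoothing pools information across SNPs with similar \(z\). This gives a less noisy estimate of the bias than the per-SNP difference alone.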
A data frame containing the summary statistics and combined association estimates of only those SNPs deemed significant in the discovery GWAS at the specified threshold alpha, i.e. SNPs with \(p\)-values less than alpha. The input summary data occupy the first five columns: columns beta_disc and se_disc contain the statistics from the discovery GWAS, and columns beta_rep and se_rep hold the replication GWAS statistics. The new combined estimate for each SNP is contained in the final column, beta_joint. SNPs are ordered by significance, so the most significant SNP, i.e. the SNP with the largest absolute \(z\)-statistic, is in the first row of the data frame. If no SNPs are significant in the discovery GWAS, MSE_minimizer simply returns a data frame that combines the two input data sets.
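The output described above can be sketched as a short function. `mse_combine_sketch` is a hypothetical re-implementation written for illustration. It is not the package source. It assumes an rsid column placed first, a two-sided \(z\) threshold, and the MSE-minimizing weight \(w = \text{se}_{\text{rep}}^2 / (\text{se}_{\text{disc}}^2 + \text{se}_{\text{rep}}^2 + B^2)\) applied to the discovery estimate. The toy data are simulated for illustration only.

```r
# Hypothetical re-implementation for illustration only; not the package source.
# Assumes the weight w = se_rep^2 / (se_disc^2 + se_rep^2 + B^2).
mse_combine_sketch <- function(summary_disc, summary_rep, alpha = 5e-8,
                               spline = TRUE) {
  stopifnot(identical(summary_rep$rsid, summary_disc$rsid))
  out <- data.frame(rsid = summary_disc$rsid,
                    beta_disc = summary_disc$beta, se_disc = summary_disc$se,
                    beta_rep = summary_rep$beta, se_rep = summary_rep$se)
  z <- out$beta_disc / out$se_disc
  B <- out$beta_disc - out$beta_rep
  if (spline) B <- predict(smooth.spline(z, B), z)$y  # smoothed B
  sig <- abs(z) > qnorm(1 - alpha / 2)                # two-sided test
  if (!any(sig)) return(out)                          # nothing significant
  w <- out$se_rep^2 / (out$se_disc^2 + out$se_rep^2 + B^2)
  out$beta_joint <- w * out$beta_disc + (1 - w) * out$beta_rep
  out <- out[sig, ]
  out <- out[order(abs(z[sig]), decreasing = TRUE), ]  # most significant first
  rownames(out) <- NULL
  out
}

# Toy data simulated for illustration only
set.seed(4)
n <- 1000
beta_true <- rnorm(n, 0, 0.03)
se <- rep(0.005, n)
summary_disc <- data.frame(rsid = seq_len(n),
                           beta = beta_true + rnorm(n, 0, se), se = se)
summary_rep <- data.frame(rsid = seq_len(n),
                          beta = beta_true + rnorm(n, 0, se), se = se)
res <- mse_combine_sketch(summary_disc, summary_rep)
head(res)
```

The weight always lies between 0 and 1, so each beta_joint value falls between the corresponding discovery and replication estimates.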
Ferguson, J., Alvarez-Iglesias, A., Newell, J., Hinde, J., & O'Donnell, M. (2017). Joint incorporation of randomised and observational evidence in estimating treatment effects. Statistical Methods in Medical Research, 28(1), 235-247. doi:10.1177/0962280217720854
See https://amandaforde.github.io/winnerscurse/articles/discovery_replication.html for an illustration of the use of MSE_minimizer with toy data sets and for further information on computing the combined SNP-trait association estimates for significant SNPs.