condlike_rep.Rd
condlike_rep
is a function which attempts to produce less biased
SNP-trait association estimates for SNPs deemed significant in the discovery
GWAS, using summary statistics from both discovery and replication GWASs. The
function computes three new association estimates for each SNP in a manner
based closely on the method described in
Zhong and
Prentice (2008). It also returns confidence intervals for each new
association estimate, if desired by the user.
condlike_rep(
summary_disc,
summary_rep,
alpha = 5e-08,
conf_interval = FALSE,
conf_level = 0.95
)
A data frame containing summary statistics from the
discovery GWAS. It must have three columns with column names
rsid
, beta
and se
, respectively, and columns
beta
and se
must contain numerical values. Each row must
correspond to a unique SNP, identified by rsid
.
A data frame containing summary statistics from the
replication GWAS. It must have three columns with column names
rsid
, beta
and se
, respectively, and all columns must
contain numerical values. Each row must correspond to a unique SNP,
identified by the numerical value rsid
. SNPs must be ordered in the
exact same manner as those in summary_disc
, i.e.
summary_rep$rsid
must be equivalent to summary_disc$rsid
.
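The input requirements above can be checked before the function is called. The following is a minimal sketch in base R, assuming toy values; the helper `check_summary_pair()` is hypothetical and not part of the package. Numeric `rsid` values are used so that the requirements stated for both data frames are met.

```r
# Toy summary statistics (values are illustrative only).
summary_disc <- data.frame(
  rsid = 1:4,
  beta = c(0.30, -0.25, 0.05, 0.12),
  se   = c(0.05, 0.04, 0.05, 0.06)
)
summary_rep <- data.frame(
  rsid = 1:4,
  beta = c(0.22, -0.20, 0.01, 0.10),
  se   = c(0.06, 0.05, 0.06, 0.07)
)

# Hypothetical helper (not part of the package): returns TRUE when the
# documented input requirements hold and stops with an error otherwise.
check_summary_pair <- function(disc, repl) {
  for (d in list(disc, repl)) {
    stopifnot(
      is.data.frame(d),
      all(c("rsid", "beta", "se") %in% names(d)),
      is.numeric(d$beta), is.numeric(d$se),
      !anyDuplicated(d$rsid)          # each row is a unique SNP
    )
  }
  # SNPs must appear in exactly the same order in both data sets.
  stopifnot(identical(disc$rsid, repl$rsid))
  TRUE
}

check_summary_pair(summary_disc, summary_rep)
```

If the replication rows are in a different order, the last check fails, which mirrors the requirement that summary_rep$rsid be equivalent to summary_disc$rsid.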
A numerical value which specifies the desired genome-wide
significance threshold for the discovery GWAS. The default is given as
5e-8
.
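To make the threshold concrete, the sketch below converts alpha into a cut-off on the absolute \(z\)-statistic. It assumes two-sided \(p\)-values derived from a normal \(z\)-statistic, beta/se; this is an assumption made for illustration and not a statement about the internals of the function. The toy values are likewise illustrative.

```r
alpha <- 5e-8

# |z| cut-off equivalent to a two-sided p-value threshold of alpha
# (assumes normally distributed z-statistics).
z_cutoff <- qnorm(1 - alpha / 2)

# Toy discovery statistics (illustrative values only).
beta <- c(0.30, -0.25, 0.05)
se   <- c(0.05, 0.04, 0.05)
z <- beta / se
p <- 2 * pnorm(-abs(z))

# Under this assumption, p < alpha and |z| > z_cutoff select the same SNPs.
significant <- p < alpha
```

With the default alpha the cut-off is roughly 5.45, so only SNPs with very large discovery \(z\)-statistics are retained.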
A logical value which determines whether or not
confidence intervals for each form of adjusted association estimate are also
to be computed and returned. The default is conf_interval=FALSE
.
A numerical value between 0 and 1 which specifies the
confidence level of the intervals to be computed. The default setting is 0.95
which results in the calculation of a 95% confidence interval for the
adjusted association estimate for each SNP.
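As an illustration of how the confidence level governs interval width, the sketch below computes the normal quantile for a given level and a plain Wald-type interval. It only demonstrates the role of conf_level; the intervals returned for the adjusted estimates follow the referenced method and need not take this simple form. The estimate and standard error are made-up values, not package output.

```r
conf_level <- 0.95

# Normal quantile for a two-sided interval at the chosen level
# (about 1.96 for a 95% level).
q <- qnorm(1 - (1 - conf_level) / 2)

# Illustrative estimate and standard error (not package output).
beta_hat <- 0.20
se_hat   <- 0.05

# Plain Wald-type interval, shown only to illustrate the effect of conf_level.
wald_ci <- c(lower = beta_hat - q * se_hat,
             upper = beta_hat + q * se_hat)
```

Raising conf_level increases `q` and therefore widens the interval.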
A data frame with summary statistics and adjusted association
estimates of only those SNPs which have been deemed significant in the
discovery GWAS according to the specified threshold, alpha
, i.e.
SNPs with \(p\)-values less than alpha
. The input summary data
occupy the first five columns, in which the columns beta_disc
and
se_disc
contain the statistics from the discovery GWAS and columns
beta_rep
and se_rep
hold the replication GWAS statistics. For
the default setting of conf_interval=FALSE
, the new adjusted
association estimates for each SNP, as defined in the aforementioned paper,
are contained in the next three columns, namely beta_com
,
beta_MLE
and beta_MSE
. For the case when
conf_interval=TRUE
, the lower and upper boundaries of each
confidence interval for each form of adjusted estimate are included in the
data frame as well as the adjusted estimates for each SNP. The SNPs are
ordered in this data frame by significance, with the most
significant SNP, i.e. the SNP with the largest absolute \(z\)-statistic,
located in the first row of the data frame. If no SNPs are detected as
significant in the discovery GWAS, condlike_rep
simply returns a
data frame which combines the two input data sets.
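The row selection and ordering described above can be sketched in base R. The layout of the first five columns follows the description; the two-sided normal \(p\)-value is an assumption made for illustration, the toy values are invented, and the adjusted estimates are deliberately left out because their computation is not reproduced here.

```r
# Toy inputs (illustrative values only).
summary_disc <- data.frame(rsid = 1:4,
                           beta = c(0.30, -0.25, 0.05, 0.33),
                           se   = c(0.05, 0.04, 0.05, 0.05))
summary_rep  <- data.frame(rsid = 1:4,
                           beta = c(0.22, -0.20, 0.01, 0.25),
                           se   = c(0.06, 0.05, 0.06, 0.06))
alpha <- 5e-8

# First five columns: rsid, then discovery and replication statistics.
combined <- data.frame(rsid      = summary_disc$rsid,
                       beta_disc = summary_disc$beta,
                       se_disc   = summary_disc$se,
                       beta_rep  = summary_rep$beta,
                       se_rep    = summary_rep$se)

# Discovery z-statistics and significance (assumes two-sided normal p-values).
z_disc <- combined$beta_disc / combined$se_disc
keep   <- 2 * pnorm(-abs(z_disc)) < alpha

# Keep significant SNPs only, most significant (largest |z|) first.
sig <- combined[keep, ][order(-abs(z_disc[keep])), ]
```

In this toy case the third SNP is dropped and the remaining rows are ordered by decreasing absolute discovery \(z\)-statistic.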
Zhong, H., & Prentice, R. L. (2008). Bias-reduced estimators and confidence intervals for odds ratios in genome-wide association studies. Biostatistics (Oxford, England), 9(4), 621–634. doi:10.1093/biostatistics/kxn001
https://amandaforde.github.io/winnerscurse/articles/discovery_replication.html
for illustration of the use of condlike_rep
with toy data sets and
further information regarding computation of the adjusted SNP-trait
association estimates and their corresponding confidence intervals for
significant SNPs.
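A call using the signature shown in the usage might look like the following minimal sketch. The package name `winnerscurse` is taken from the URL above and is an assumption here, as are the toy values; the call is guarded so that the code still runs when the package is not installed.

```r
# Toy summary statistics (illustrative values only), with SNPs in the
# same order in both data frames.
summary_disc <- data.frame(rsid = 1:4,
                           beta = c(0.30, -0.25, 0.05, 0.33),
                           se   = c(0.05, 0.04, 0.05, 0.05))
summary_rep  <- data.frame(rsid = 1:4,
                           beta = c(0.22, -0.20, 0.01, 0.25),
                           se   = c(0.06, 0.05, 0.06, 0.06))

out <- NULL
# Assumption: the function is exported by a package named "winnerscurse".
if (requireNamespace("winnerscurse", quietly = TRUE)) {
  out <- winnerscurse::condlike_rep(summary_disc, summary_rep, alpha = 5e-8)
  print(out)
}
```

Setting conf_interval = TRUE in the same call would additionally request the confidence interval boundaries described above.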