Conditional likelihood method for use with discovery and replication GWASs

condlike_rep is a function which attempts to produce less biased SNP-trait association estimates for SNPs deemed significant in the discovery GWAS, using summary statistics from both the discovery and replication GWASs. The function computes three new association estimates for each significant SNP, following closely the method described in Zhong and Prentice (2008). If requested by the user, it also returns confidence intervals for each new association estimate.

condlike_rep(
  summary_disc,
  summary_rep,
  alpha = 5e-08,
  conf_interval = FALSE,
  conf_level = 0.95
)

Arguments

summary_disc: A data frame containing summary statistics from the discovery GWAS. It must have three columns with column names rsid, beta and se, respectively, and columns beta and se must contain numerical values. Each row must correspond to a unique SNP, identified by rsid.
summary_rep: A data frame containing summary statistics from the replication GWAS. It must have three columns with column names rsid, beta and se, respectively, and all columns must contain numerical values. Each row must correspond to a unique SNP, identified by the numerical value rsid. SNPs must appear in exactly the same order as in summary_disc, i.e. summary_rep$rsid must be identical to summary_disc$rsid.
alpha: A numerical value which specifies the desired genome-wide significance threshold for the discovery GWAS. The default is 5e-8.
conf_interval: A logical value which determines whether or not confidence intervals for each form of adjusted association estimate are also to be computed and returned. The default is conf_interval=FALSE.
conf_level: A numerical value between 0 and 1 which specifies the confidence level of the intervals to be computed. The default setting is 0.95, which results in the calculation of a 95% confidence interval for the adjusted association estimate for each SNP.
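
The input requirements above can be illustrated with a small sketch. The data below are invented toy values, not real GWAS results, and the numeric rsid values are an assumption chosen to satisfy the stated requirements for both data frames. The call to condlike_rep is only attempted if the function is already available in the session.

```r
# Toy summary statistics; all values are made up for illustration only.
n_snps <- 5
rsid <- seq_len(n_snps)  # hypothetical numeric SNP identifiers

summary_disc <- data.frame(
  rsid = rsid,
  beta = c(0.30, -0.02, 0.25, 0.01, -0.28),
  se   = c(0.04, 0.05, 0.04, 0.05, 0.04)
)
summary_rep <- data.frame(
  rsid = rsid,
  beta = c(0.21, 0.01, 0.18, -0.03, -0.20),
  se   = c(0.05, 0.05, 0.05, 0.05, 0.05)
)

# Checks mirroring the documented input requirements.
stopifnot(
  identical(names(summary_disc), c("rsid", "beta", "se")),
  identical(names(summary_rep), c("rsid", "beta", "se")),
  is.numeric(summary_disc$beta), is.numeric(summary_disc$se),
  is.numeric(summary_rep$beta), is.numeric(summary_rep$se),
  !anyDuplicated(summary_disc$rsid),
  identical(summary_disc$rsid, summary_rep$rsid)
)

# Call the function only if it has been loaded into the session.
if (exists("condlike_rep", mode = "function")) {
  out <- condlike_rep(summary_disc, summary_rep,
                      alpha = 5e-8,
                      conf_interval = TRUE,
                      conf_level = 0.95)
  print(out)
}
```

Running the checks before the call makes an ordering mismatch between the two data sets fail early, rather than inside the function.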

Value

A data frame with summary statistics and adjusted association estimates for only those SNPs which have been deemed significant in the discovery GWAS according to the specified threshold, alpha, i.e. SNPs with $p$-values less than alpha. The inputted summary data occupies the first five columns, in which the columns beta_disc and se_disc contain the statistics from the discovery GWAS and columns beta_rep and se_rep hold the replication GWAS statistics. For the default setting of conf_interval=FALSE, the new adjusted association estimates for each SNP, as defined in the aforementioned paper, are contained in the next three columns, namely beta_com, beta_MLE and beta_MSE. When conf_interval=TRUE, the lower and upper boundaries of the confidence interval for each form of adjusted estimate are included in the data frame alongside the adjusted estimates for each SNP. The SNPs are ordered in this data frame by significance, with the most significant SNP, i.e. the SNP with the largest absolute $z$-statistic, located in the first row. If no SNPs are detected as significant in the discovery GWAS, condlike_rep merely returns a data frame which combines the two inputted data sets.
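
The selection and ordering of significant SNPs described above can be sketched as follows. This is an illustration, not the function's internal code: it assumes the $p$-value is the two-sided normal $p$-value of z = beta/se, and the data are invented toy values.

```r
# Toy discovery statistics; values are made up for illustration only.
summary_disc <- data.frame(
  rsid = 1:5,
  beta = c(0.30, -0.02, 0.25, 0.01, -0.28),
  se   = c(0.04, 0.05, 0.04, 0.05, 0.04)
)
alpha <- 5e-8

# z-statistic and (assumed) two-sided normal p-value for each SNP.
z <- summary_disc$beta / summary_disc$se
p <- 2 * pnorm(-abs(z))

# Keep SNPs with p < alpha, then order by decreasing absolute z-statistic.
keep <- p < alpha
sig <- summary_disc[keep, ]
sig <- sig[order(-abs(z[keep])), ]

# Under the same assumption, the threshold on the |z| scale is:
z_threshold <- qnorm(1 - alpha / 2)

print(sig)          # rsid order: 1, 5, 3
print(z_threshold)  # approximately 5.45
```

With the default alpha of 5e-8, a SNP is therefore retained only if its absolute $z$-statistic exceeds roughly 5.45.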

References

Zhong, H., & Prentice, R. L. (2008). Bias-reduced estimators and confidence intervals for odds ratios in genome-wide association studies. Biostatistics (Oxford, England), 9(4), 621-634. doi:10.1093/biostatistics/kxn001
