sim_stats simulates summary statistics for a set of independent SNPs from a discovery GWAS and, optionally, a replication GWAS. It lets the user create toy data sets on which to explore the implementation of the Winner's Curse correction methods.

sim_stats(
  nsnp = 10^6,
  h2 = 0.4,
  prop_effect = 0.01,
  nid = 50000,
  rep = FALSE,
  rep_nid = 50000
)

Arguments

nsnp

A numerical value which specifies the total number of SNPs that the user wishes to simulate summary statistics for. The default is 1,000,000 SNPs, i.e. nsnp=10^6.

h2

A numerical value between 0 and 1 which represents the desired heritability of the trait of interest, i.e. the proportion of the trait's variance explained by all SNPs. The default is a moderate heritability value of 0.4, h2=0.4.

prop_effect

A numerical value between 0 and 1 which determines the trait's polygenicity, the fraction of the total number of SNPs which are truly associated with the trait. The default setting is prop_effect = 0.01.

nid

A numerical value which specifies the number of individuals in the discovery GWAS. This value defaults to 50,000 individuals, nid=50000.

rep

A logical value which allows the user to state whether they would also like to simulate summary statistics for a replication GWAS based on the same parameters and true effect sizes. The default setting is rep=FALSE.

rep_nid

A numerical value which specifies the number of individuals in the replication GWAS. Similar to nid, this value defaults to 50,000 individuals, rep_nid=50000.
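To make the interplay of these arguments concrete, the following base R sketch simulates data of the same general shape. It is not the package's implementation of sim_stats: the helper name sketch_sim is hypothetical, and it assumes standardised genotypes and trait, so that the squared true effects sum to h2 and each standard error is approximately 1/sqrt(sample size).

```r
# Hypothetical helper for illustration only; NOT the source code of sim_stats().
sketch_sim <- function(nsnp = 1000, h2 = 0.4, prop_effect = 0.01,
                       nid = 50000, rep = FALSE, rep_nid = 50000) {
  # Number of SNPs that are truly associated with the trait
  n_effect <- round(nsnp * prop_effect)

  # Assumption: standardised genotypes and trait, so the squared true effects
  # of the associated SNPs are rescaled to sum to the heritability h2.
  raw <- stats::rnorm(n_effect)
  true_beta <- c(raw * sqrt(h2 / sum(raw^2)), numeric(nsnp - n_effect))

  # Assumption: the standard error depends only on sample size, se = 1/sqrt(n).
  simulate_gwas <- function(n) {
    se <- rep_len(1 / sqrt(n), nsnp)
    data.frame(rsid = seq_len(nsnp),
               beta = stats::rnorm(nsnp, mean = true_beta, sd = se),
               se   = se)
  }

  list(true = data.frame(rsid = seq_len(nsnp), true_beta = true_beta),
       disc = simulate_gwas(nid),
       rep  = if (rep) simulate_gwas(rep_nid) else NULL)
}

set.seed(1)  # arbitrary seed, used only to make the sketch reproducible
toy <- sketch_sim(nsnp = 1000, h2 = 0.4, prop_effect = 0.01, rep = TRUE)
sum(toy$true$true_beta != 0)   # number of truly associated SNPs
sum(toy$true$true_beta^2)      # variance explained under the stated assumption
```

Under these assumptions, increasing nid or rep_nid shrinks the standard errors, while prop_effect controls how thinly h2 is spread across the associated SNPs.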

Value

A list with three components, true, disc and rep. The first two are data frames, as is rep when a replication GWAS has been requested.

true has two columns: rsid, which contains an identification number for each SNP, and true_beta, which is each SNP's simulated true association value. disc has three columns representing the summary statistics one would obtain in a discovery GWAS: for each SNP, its ID number, its estimated effect size, beta, and the associated standard error, se. Similarly, if the rep argument has been set to TRUE, the data frame rep has three columns representing the summary statistics one would obtain in a replication GWAS. In this data frame, just as in disc, the values of beta have been simulated using the true association values, true_beta, and the standard errors reflect the chosen sample size. If rep=FALSE, NULL is returned for this third element.
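A short usage sketch of inspecting the returned list is given below. It assumes that sim_stats is available from an installed package named winnerscurse, and it uses a much smaller nsnp than the default so that it runs quickly; the seed is arbitrary.

```r
# Assumption: sim_stats() is provided by the installed package 'winnerscurse'.
if (requireNamespace("winnerscurse", quietly = TRUE)) {
  set.seed(1998)  # arbitrary seed, used only for reproducibility

  # Simulate a discovery and a replication GWAS for 10,000 SNPs
  sims <- winnerscurse::sim_stats(nsnp = 10^4, h2 = 0.4, prop_effect = 0.01,
                                  nid = 50000, rep = TRUE, rep_nid = 50000)

  names(sims)       # components of the returned list
  head(sims$true)   # SNP IDs and simulated true effects
  head(sims$disc)   # discovery summary statistics
  head(sims$rep)    # replication summary statistics (NULL if rep = FALSE)
}
```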