sim_stats simulates summary statistics for a set of independent SNPs, for a
discovery GWAS and, optionally, a replication GWAS. It allows the user to
create toy datasets with which they can explore the implementation of
Winner's Curse correction methods.

sim_stats(
  nsnp = 10^6,
  h2 = 0.4,
  prop_effect = 0.01,
  nid = 50000,
  rep = FALSE,
  rep_nid = 50000
)

nsnp: A numerical value which specifies the total number of SNPs for which
summary statistics are simulated. The default is 1,000,000 SNPs,
i.e. nsnp=10^6.
h2: A numerical value between 0 and 1 which represents the desired
heritability of the trait of interest, i.e. the proportion of the
variance in the trait that is explained by all SNPs. The default is a moderate
heritability value of 0.4, h2=0.4.
prop_effect: A numerical value between 0 and 1 which determines the
trait's polygenicity, i.e. the fraction of the total number of SNPs which are
truly associated with the trait. The default setting is prop_effect=0.01.
nid: A numerical value which specifies the number of individuals in the
discovery GWAS. This value defaults to 50,000 individuals, nid=50000.
rep: A logical value which states whether summary statistics should also be
simulated for a replication GWAS based on the same parameters and true
effect sizes. The default setting is rep=FALSE.
rep_nid: A numerical value which specifies the number of individuals in the
replication GWAS. Similar to nid, this value defaults to 50,000 individuals,
rep_nid=50000.

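The arguments above could drive a simulation along the following lines. This is a minimal base R sketch, not the function's actual implementation: the helper name toy_sim_stats, the assumption of standardised genotypes (so that the squared true effects sum to h2), and the approximation se = 1/sqrt(n) are all illustrative choices that do not come from this page.

```r
# Hypothetical helper: a base-R sketch of the idea, NOT the real sim_stats().
toy_sim_stats <- function(nsnp = 10^4, h2 = 0.4, prop_effect = 0.01,
                          nid = 50000, rep = FALSE, rep_nid = 50000) {
  # Number of SNPs that are truly associated with the trait
  n_effect <- round(nsnp * prop_effect)

  # Assumption: genotypes are standardised, so a SNP with effect beta explains
  # beta^2 of the trait variance. Rescale so the causal SNPs jointly explain h2.
  raw <- rnorm(n_effect)
  true_beta <- c(raw * sqrt(h2 / sum(raw^2)), numeric(nsnp - n_effect))
  true <- data.frame(rsid = seq_len(nsnp), true_beta = true_beta)

  # Assumption: the standard error of each estimate is roughly 1 / sqrt(n)
  simulate_gwas <- function(n) {
    se <- rep_len(1 / sqrt(n), nsnp)
    data.frame(rsid = seq_len(nsnp),
               beta = rnorm(nsnp, mean = true_beta, sd = se),
               se = se)
  }

  list(true = true,
       disc = simulate_gwas(nid),
       rep = if (rep) simulate_gwas(rep_nid) else NULL)
}

set.seed(1)
sims <- toy_sim_stats(nsnp = 10^4, rep = TRUE)
str(sims, max.level = 1)
```

Because the discovery and replication estimates are drawn around the same true_beta values, the two simulated studies differ only in their sampling noise, which is what makes such toy data convenient for studying Winner's Curse.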
A list containing three components, true, disc and rep. The first element,
true, is a data frame with two columns: rsid, which contains an identification
number for each SNP, and true_beta, which is each SNP's simulated true
association value. disc is a data frame with three columns representing the
summary statistics one would obtain in a discovery GWAS: for each SNP, its ID
number, its estimated effect size, beta, and the associated standard error, se.
Similarly, if the rep argument has been set to TRUE, rep is a data frame with
three columns representing the summary statistics one would obtain in a
replication GWAS. As with disc, the values for beta have been simulated using
the true association values, true_beta, and the standard errors reflect the
chosen sample size. If rep=FALSE, the third element is NULL.
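
A typical use of the returned list is to select SNPs that are significant in the discovery GWAS and compare their estimates with the replication and true values. The sketch below assumes that sim_stats is exported by an installed package named winnerscurse and that the rows of true, disc and rep are in the same SNP order; neither assumption is stated on this page. The block is skipped if the package is not available.

```r
# Assumption: sim_stats() comes from an installed package called 'winnerscurse'.
if (requireNamespace("winnerscurse", quietly = TRUE)) {
  set.seed(2024)
  sims <- winnerscurse::sim_stats(nsnp = 10^4, h2 = 0.4, prop_effect = 0.01,
                                  nid = 50000, rep = TRUE, rep_nid = 50000)

  # SNPs passing the conventional genome-wide threshold (p < 5e-8, two-sided)
  z <- sims$disc$beta / sims$disc$se
  sig <- which(abs(z) > qnorm(1 - 5e-8 / 2))

  # Assumption: rows are aligned by SNP across the three data frames
  comparison <- data.frame(
    rsid      = sims$disc$rsid[sig],
    disc_beta = sims$disc$beta[sig],
    rep_beta  = sims$rep$beta[sig],
    true_beta = sims$true$true_beta[sig]
  )
  print(head(comparison))
}
```

Among the selected SNPs, discovery estimates would be expected to be larger in magnitude, on average, than the true effects, while the replication estimates are not subject to the same selection.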