sim_stats.Rd
sim_stats is a function that simulates summary statistics for a set of independent SNPs for both discovery and replication GWASs. It allows the user to create toy datasets with which to explore the implementation of the Winner's Curse correction methods.
sim_stats(
nsnp = 10^6,
h2 = 0.4,
prop_effect = 0.01,
nid = 50000,
rep = FALSE,
rep_nid = 50000
)
nsnp: A numerical value which specifies the total number of SNPs for which the user wishes to simulate summary statistics. The default is 1,000,000 SNPs, i.e. nsnp=10^6.
h2: A numerical value between 0 and 1 which represents the desired heritability of the trait of interest, in other words the total variance in the trait explained by all SNPs. The default is a moderate heritability value of 0.4, i.e. h2=0.4.
prop_effect: A numerical value between 0 and 1 which determines the trait's polygenicity, i.e. the fraction of the total number of SNPs which are truly associated with the trait. The default setting is prop_effect=0.01.
nid: A numerical value which specifies the number of individuals in the discovery GWAS. This value defaults to 50,000 individuals, i.e. nid=50000.
rep: A logical value which states whether the user would also like to simulate summary statistics for a replication GWAS based on the same parameters and true effect sizes. The default setting is rep=FALSE.
rep_nid: A numerical value which specifies the number of individuals in the replication GWAS. Similar to nid, this value defaults to 50,000 individuals, i.e. rep_nid=50000.
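The arguments above interact in a simple way: nsnp * prop_effect SNPs carry a non-zero true effect, h2 fixes how much trait variance those effects explain together, and nid controls the size of the standard errors. The base R sketch below illustrates one way such a simulation could be set up. It is not the implementation of sim_stats: the helper name toy_sim, the uniform minor allele frequencies, the normally distributed raw effects and the standard error approximation are all assumptions made for illustration.

```r
# Hypothetical helper, NOT the implementation of sim_stats().
# Assumptions: standardised trait (variance 1), independent SNPs,
# minor allele frequencies drawn uniformly from [0.01, 0.5].
toy_sim <- function(nsnp = 1000, h2 = 0.4, prop_effect = 0.01, nid = 50000) {
  n_effect <- round(nsnp * prop_effect)   # number of truly associated SNPs
  maf <- runif(nsnp, 0.01, 0.5)
  var_g <- 2 * maf * (1 - maf)            # genotype variance under Hardy-Weinberg

  # Draw raw effects for the associated SNPs, then rescale them so that
  # the variance they explain sums to h2.
  raw <- rnorm(n_effect)
  scale <- sqrt(h2 / sum(var_g[seq_len(n_effect)] * raw^2))
  true_beta <- c(raw * scale, rep(0, nsnp - n_effect))

  # Approximate standard error of a marginal regression estimate;
  # it shrinks as the sample size nid grows.
  se <- sqrt((1 - var_g * true_beta^2) / (nid * var_g))

  data.frame(rsid = seq_len(nsnp),
             true_beta = true_beta,
             beta = rnorm(nsnp, mean = true_beta, sd = se),
             se = se)
}

set.seed(1)
sim <- toy_sim()
sum(sim$true_beta != 0)   # count of SNPs with a non-zero true effect
```

A smaller nsnp than the documented default is used here only so that the sketch runs quickly.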
A list containing three components, true, disc and rep, in the form of data frames. The first element, true, has two columns: rsid, which contains an identification number for each SNP, and true_beta, which is each SNP's simulated true association value. disc has three columns representing the summary statistics one would obtain in a discovery GWAS. For each SNP, this data frame contains its ID number, its estimated effect size, beta, and the associated standard error, se.
Similarly, if the rep argument of the function has been set to TRUE, the data frame rep has three columns representing the summary statistics one would obtain in a replication GWAS. In this data frame, just as in disc, the values for beta have been simulated using the true association values, true_beta, and the standard errors reflect the chosen sample size. If rep=FALSE, NULL is returned for this third element.
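As a quick illustration of the structure described above, the sketch below builds a small stand-in list by hand and shows how its elements might be accessed. The object out and its values are made up for illustration and are not output from sim_stats. The element names and the columns true_beta, beta and se come from the description above; naming the ID column of disc rsid as well is an assumption.

```r
# Stand-in for the returned list; all values are illustrative only.
set.seed(2)
true_df <- data.frame(rsid = 1:5, true_beta = c(0.05, -0.03, 0, 0, 0))
disc_df <- data.frame(rsid = 1:5,   # ID column name assumed to be rsid
                      beta = rnorm(5, mean = true_df$true_beta, sd = 0.01),
                      se = rep(0.01, 5))
out <- list(true = true_df, disc = disc_df, rep = NULL)  # as if rep = FALSE

names(out)         # "true" "disc" "rep"
is.null(out$rep)   # TRUE: no replication GWAS was requested

# Join estimated and true effects by SNP ID so they can be compared,
# and add a z-statistic for each SNP.
comparison <- merge(out$disc, out$true, by = "rsid")
comparison$z <- comparison$beta / comparison$se
head(comparison)
```

Checking is.null(out$rep) before using the third element avoids errors in code that has to handle both rep=TRUE and rep=FALSE.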