mr_simss is the main function for MR-SimSS, a method that uses simulated
sample splitting to alleviate Winner's Curse bias in MR causal effect
estimates. It also accounts for sample overlap between the exposure and
outcome GWASs. It takes GWAS summary statistics as input and works in
combination with existing MR methods, such as IVW and MR-RAPS.
mr_simss(
  data,
  subset = FALSE,
  sub.cut = 0.05,
  est.lambda = TRUE,
  n.exposure = 1,
  n.outcome = 1,
  n.overlap = 1,
  cor.xy = 0,
  n.iter = 1000,
  splits = 2,
  pi = 0.5,
  pi2 = 0.5,
  threshold = 5e-08,
  mr_method = "mr_ivw",
  parallel = TRUE,
  n.cores = NULL,
  lambda.thresh = 0.5
)
data
A data frame supplied by the user containing summary statistics from the
exposure and outcome GWASs. It must have at least five columns, named SNP,
beta.exposure, beta.outcome, se.exposure and se.outcome. Each row must
correspond to a unique SNP, identified by SNP.
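As a rough illustration of the input format described above, the following base R sketch builds a small made-up data frame and checks it against the stated requirements. The helper check_mr_data and all simulated values are hypothetical and are not part of the package.

```r
# Toy summary statistics; all values are simulated for illustration only.
set.seed(1)
n_snps <- 100
toy <- data.frame(
  SNP = paste0("rs", seq_len(n_snps)),
  beta.exposure = rnorm(n_snps, mean = 0, sd = 0.02),
  beta.outcome = rnorm(n_snps, mean = 0, sd = 0.02),
  se.exposure = rep(0.01, n_snps),
  se.outcome = rep(0.01, n_snps)
)

# Hypothetical helper (not part of the package): returns TRUE if the five
# required columns are present and every SNP identifier is unique.
check_mr_data <- function(data) {
  required <- c("SNP", "beta.exposure", "beta.outcome",
                "se.exposure", "se.outcome")
  all(required %in% names(data)) && !anyDuplicated(data$SNP)
}

check_mr_data(toy)
```

Extra columns are harmless under this check, which matches the "at least five columns" wording.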
subset
A logical that lets the user run the method on either the original complete
set of SNPs or a subset of SNPs in order to reduce computation time. The
default setting is subset=FALSE.
sub.cut
A numerical value required if subset=TRUE. It ensures that, for a single
iteration of the method, the number of instruments selected from the full
set of SNPs and the number selected from the subset are equal with
probability at least 1-sub.cut.
est.lambda
A logical that specifies whether the function est_lambda should be used to
obtain an estimate of lambda, a term describing the correlation between the
SNP-outcome and SNP-exposure effect sizes. This correlation is affected by
the number of overlapping samples between the two GWASs and by the
correlation between the exposure and the outcome. It is therefore
recommended to use est_lambda if the fraction of overlap and the correlation
between exposure and outcome are unknown. The default setting is
est.lambda=TRUE.
n.exposure
A numerical value equal to the number of individuals in the exposure GWAS.
It should be specified by the user if est.lambda=FALSE. The default setting
is n.exposure=1.
n.outcome
A numerical value equal to the number of individuals in the outcome GWAS.
It should be specified by the user if est.lambda=FALSE. The default setting
is n.outcome=1.
n.overlap
A numerical value equal to the number of individuals in both the exposure
and outcome GWASs. It should be specified by the user if est.lambda=FALSE.
The default setting is n.overlap=1. The function requires that this value is
less than or equal to the minimum of n.exposure and n.outcome.
cor.xy
A numerical value equal to the observed correlation between the exposure and
the outcome. This value must be between -1 and 1. It should be specified by
the user if est.lambda=FALSE. The default setting is cor.xy=0. If this value
is unknown, the user is encouraged to use the function est_lambda.
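To make the role of the sample size and correlation arguments more concrete, the sketch below applies the stated constraints and then evaluates a commonly used approximation for the correlation induced by sample overlap, n.overlap * cor.xy / sqrt(n.exposure * n.outcome). Whether mr_simss uses exactly this expression when est.lambda=FALSE is an assumption here, and lambda_from_overlap is a hypothetical helper, not a package function.

```r
# Hypothetical helper (not part of the package). Assumes lambda can be
# approximated as n.overlap * cor.xy / sqrt(n.exposure * n.outcome).
lambda_from_overlap <- function(n.exposure, n.outcome, n.overlap, cor.xy) {
  stopifnot(
    n.overlap <= min(n.exposure, n.outcome),  # constraint stated for n.overlap
    cor.xy >= -1, cor.xy <= 1                 # constraint stated for cor.xy
  )
  (n.overlap * cor.xy) / sqrt(n.exposure * n.outcome)
}

# Full overlap: the approximation reduces to cor.xy.
lambda_full <- lambda_from_overlap(200000, 200000, 200000, 0.3)

# No overlap: the approximation is zero whatever the value of cor.xy.
lambda_none <- lambda_from_overlap(200000, 300000, 0, 0.3)
```

Under this approximation, lambda shrinks towards zero as the overlap fraction or the exposure-outcome correlation decreases.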
n.iter
A numerical value that specifies the number of iterations of the method,
i.e. the number of times sample splits are randomly simulated. The default
setting is n.iter=1000.
splits
A numerical value, either 2 or 3, indicating whether splits of 2 or 3 should
be simulated. Splits of 2 are recommended when there is no overlap between
the two GWASs, while splits of 3 are recommended in the presence of overlap,
especially full overlap. The default setting is splits=2.
pi
A numerical value that determines the fraction of the first split in both
the 2 and 3 split approaches. This is the fraction used for SNP selection.
This value must be between 0 and 1. The default setting is pi=0.5.
pi2
A numerical value that determines the fraction of the second split in the
3 split approach. This value must be between 0 and 1. The default setting is
pi2=0.5.
threshold
A numerical value that specifies the threshold used to select instrument
SNPs for MR at each iteration. This value must be between 0 and 1. The
default setting is threshold=5e-8.
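The idea behind a single simulated 2-way split can be sketched in base R for one set of GWAS estimates. This is only an illustration under simplifying assumptions: no sample overlap, a single trait, normally distributed estimates whose variance scales inversely with sample size, and a threshold applied to two-sided p-values. The package handles the exposure and outcome jointly, so its implementation may differ. The input values are made up, and pi_frac stands in for the pi argument to avoid masking R's built-in constant.

```r
set.seed(42)
beta <- c(0.080, 0.010, -0.060, 0.002)  # made-up full-sample estimates
se <- rep(0.010, length(beta))          # made-up standard errors
pi_frac <- 0.5                          # fraction used for SNP selection
threshold <- 5e-8                       # assumed to be a p-value threshold

# Simulate the estimate in the first fraction, conditional on the
# full-sample estimate: mean beta, variance se^2 * (1 - pi) / pi.
beta.1 <- beta + rnorm(length(beta), mean = 0,
                       sd = se * sqrt((1 - pi_frac) / pi_frac))
se.1 <- se / sqrt(pi_frac)

# The estimate in the remaining fraction follows from the identity
# beta = pi * beta.1 + (1 - pi) * beta.2.
beta.2 <- (beta - pi_frac * beta.1) / (1 - pi_frac)
se.2 <- se / sqrt(1 - pi_frac)

# Select instruments using the first fraction only.
p.1 <- 2 * pnorm(-abs(beta.1 / se.1))
selected <- p.1 < threshold
```

Because selection uses only the first fraction, estimates from the second fraction are not subject to the same selection, which is the motivation for sample splitting as a remedy for Winner's Curse.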
mr_method
A string that specifies the MR method that MR-SimSS works in combination
with. Any method listed in TwoSampleMR::mr_method_list()$obj can be used.
However, it is currently advised that the user chooses "mr_ivw" or
"mr_raps". The default setting is mr_method="mr_ivw".
parallel
A logical value that specifies whether the function should run in parallel
or in series. The default setting is parallel=TRUE. It is advisable to use
this default, especially when n.iter is large.
n.cores
A numerical value that determines how many cores will be used if
parallel=TRUE. This value should be supplied by the user if they wish to use
fewer cores than parallel::detectCores()-1. The default setting is
n.cores=NULL.
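One plausible way to resolve the number of cores from this argument is sketched below. The helper resolve_cores is hypothetical, and capping a user-supplied value at parallel::detectCores()-1 is an assumption made for the sketch rather than documented behaviour of mr_simss.

```r
# Hypothetical helper (not part of the package).
resolve_cores <- function(n.cores = NULL) {
  detected <- parallel::detectCores()  # can return NA on some platforms
  upper <- if (is.na(detected)) 1L else max(1L, detected - 1L)
  if (is.null(n.cores)) {
    upper                                       # default: all but one core
  } else {
    max(1L, min(as.integer(n.cores), upper))    # assumed cap at the default
  }
}

resolve_cores()    # machine-dependent
resolve_cores(1)   # always a single core
```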
lambda.thresh
A value used when estimating lambda to obtain a subset of SNPs whose
absolute z-statistics for both the exposure and outcome GWASs are less than
this value. The method then assumes that the true SNP-outcome and
SNP-exposure effect sizes of each SNP in this subset are both approximately
0. The default setting is lambda.thresh=0.5.
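The subsetting step described above can be sketched in base R as follows. Only the selection of SNPs with small z-statistics is shown; how est_lambda turns this subset into an estimate of lambda is not reproduced here. The toy data are simulated and purely illustrative.

```r
# Simulated toy data; all values are made up for illustration.
set.seed(7)
n_snps <- 5000
toy <- data.frame(
  SNP = paste0("rs", seq_len(n_snps)),
  beta.exposure = rnorm(n_snps, sd = 0.01),
  beta.outcome = rnorm(n_snps, sd = 0.01),
  se.exposure = 0.01,
  se.outcome = 0.01
)

lambda.thresh <- 0.5

# z-statistics for both GWASs
z.exposure <- toy$beta.exposure / toy$se.exposure
z.outcome <- toy$beta.outcome / toy$se.outcome

# Keep SNPs whose absolute z-statistics are both below the threshold
keep <- abs(z.exposure) < lambda.thresh & abs(z.outcome) < lambda.thresh
null_like <- toy[keep, ]
nrow(null_like)
```

Since this subset is truncated on both z-statistics, the raw correlation computed within it is generally attenuated, so an estimator of lambda based on it would need to account for the truncation.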
A list containing two elements, summary and results. summary is a data frame
with one row which contains b, the estimated causal effect of exposure on
outcome obtained using the MR-SimSS method, se, the associated standard
error of this estimate, and pval, the corresponding p-value. It also
contains the MR method used, the average number of instrument SNPs used in
each iteration and the number of iterations used. results is a data frame
which contains the output from each iteration. It is in a similar style to
the output of the function mr from the TwoSampleMR R package.
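A minimal usage sketch follows, showing how the two elements of the returned list might be inspected. It assumes that the package is installed under the name mr.simss (inferred from the documentation URL) together with its TwoSampleMR dependency. The toy data are simulated, and the settings n.iter=100 and parallel=FALSE are chosen only to keep the run short.

```r
# Simulated toy data: 100 SNPs with a non-zero exposure effect and an assumed
# causal effect of 0.3; all values are made up for illustration.
set.seed(123)
n_snps <- 2000
true.bx <- c(rep(0.08, 100), rep(0, n_snps - 100))
toy <- data.frame(
  SNP = paste0("rs", seq_len(n_snps)),
  beta.exposure = true.bx + rnorm(n_snps, sd = 0.005),
  beta.outcome = 0.3 * true.bx + rnorm(n_snps, sd = 0.005),
  se.exposure = 0.005,
  se.outcome = 0.005
)

# Only runs if the package is available (package name is an assumption).
if (requireNamespace("mr.simss", quietly = TRUE)) {
  out <- mr.simss::mr_simss(toy, n.iter = 100, parallel = FALSE)
  print(out$summary)        # one-row data frame: b, se, pval, ...
  print(head(out$results))  # per-iteration output
}
```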
See https://amandaforde.github.io/mr.simss/articles/perform-MR-SimSS.html
for an illustration of the use of mr_simss with a toy data set and further
information regarding this MR method.