vignettes/SRAT.Rmd
The Sequence Robust Association Test (SRAT) is a fully rank-based, flexible approach to test for association between a set of genetic variants and an outcome, while accounting for within-family correlation and adjusting for covariates. SRAT was developed specifically for genetic association studies that involve family data, such as genome-wide association studies (GWAS) and next-generation sequencing studies (NGSS).
SRAT offers several advantages over traditional methods:
- Robustness: As a fully rank-based method, SRAT is robust to outliers and non-normality in the outcome distribution.
- Flexibility in correlation structure: SRAT allows for unknown correlation structures within families.
- Improved type I error control: SRAT provides better protection against type I error rate inflation compared to existing methods.
- Enhanced power: For settings with skewed outcome distributions, SRAT can be more powerful than parametric approaches.
- Special case: SRAT includes the well-known Wilcoxon rank sum test as a special case.
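The robustness point and the Wilcoxon special case can be illustrated with base R alone. The sketch below is not SRAT itself: it assumes simulated, independent observations (no family structure), and the names g, y and y_out are hypothetical. It shows that a rank-based test depends on the outcome only through its ranks, so pushing the largest value far out leaves the Wilcoxon rank sum p-value unchanged while the t-test p-value moves.

```r
# Illustration only: base R, independent samples, simulated data.
set.seed(1)
g <- rep(c(0, 1), each = 10)          # hypothetical carrier indicator (0 = non-carrier)
y <- rnorm(20) + 0.8 * g              # simulated continuous outcome

# Turn the largest observation into an extreme outlier; its rank stays the same.
y_out <- y
y_out[which.max(y)] <- max(y) + 100

# Rank-based test: uses only the ranks of the outcome.
p_wilcox     <- wilcox.test(y ~ g)$p.value
p_wilcox_out <- wilcox.test(y_out ~ g)$p.value

# Parametric comparison: uses the raw values, so the outlier matters.
p_t     <- t.test(y ~ g)$p.value
p_t_out <- t.test(y_out ~ g)$p.value

print(c(wilcoxon = p_wilcox, wilcoxon_outlier = p_wilcox_out))
print(c(t_test = p_t, t_test_outlier = p_t_out))
```

The two Wilcoxon p-values agree because the ranks of y and y_out are identical; the two t-test p-values differ because the outlier changes both the group mean and the variance estimate.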
For demonstration, we’ll use the example data from the SKAT package, which contains:
- Z: A genotype matrix containing genetic variants
- X: A covariate matrix (e.g., age, gender)
- y.c: A continuous phenotype variable
Let’s first run the analysis using SKAT for comparison:
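# Load SKAT and its bundled example data (a list holding Z, X and y.c)
library(SKAT)
data(SKAT.example)
attach(SKAT.example)
# Run SKAT analysis with flat weights (the Beta(1, 1) density is constant)
obj.skat <- SKAT_Null_Model(y.c ~ X, out_type = "C")
skat_pval <- SKAT(Z, obj.skat, weights.beta = c(1, 1))$p.value
print(paste("SKAT p-value:", skat_pval))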
Now, let’s run the same analysis using SRAT:
SRAT operates in a two-step process:
- Null model fitting (srat.null()): This function fits a null model that accounts for the covariates while ignoring the genetic variants. It returns an object containing elements needed for the subsequent testing.
- Association testing (srat.test()): This function tests for the association between the genetic variants and the outcome, conditional on the null model.
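The general "fit the null once, then test" pattern described above can be sketched in base R. This is a simplified stand-in under stated assumptions, not the SRAT statistic: observations are simulated and independent (no family correlation), the outcome is replaced by its ranks, the variants are collapsed into a hypothetical burden score, and the test is an ordinary nested-model F-test. All object names (X_sim, Z_sim, y_sim, fit_null, fit_alt) are made up for the illustration.

```r
# Illustration only: generic two-step workflow, NOT the SRAT test statistic.
set.seed(42)
n <- 200   # simulated individuals (treated as independent here)
m <- 10    # simulated variants

X_sim <- cbind(age = rnorm(n, 50, 10), sex = rbinom(n, 1, 0.5))  # covariates
Z_sim <- matrix(rbinom(n * m, 2, 0.1), nrow = n, ncol = m)       # genotypes coded 0/1/2
y_sim <- 0.02 * X_sim[, "age"] + rexp(n)                         # skewed outcome, no genetic effect

# Step 1: null model with covariates only, fitted to the ranks of the outcome.
r_sim <- rank(y_sim)
fit_null <- lm(r_sim ~ X_sim)

# Step 2: add the genetic component and compare against the stored null fit.
burden <- rowSums(Z_sim)                      # simple unweighted burden score
fit_alt <- lm(r_sim ~ X_sim + burden)
cmp <- anova(fit_null, fit_alt)               # nested-model F-test
p_value <- cmp[["Pr(>F)"]][2]

print(p_value)
```

Separating the two steps means the covariate adjustment is computed once and can be reused for every variant set that is tested against the same outcome.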
SRAT allows for weighting of genetic variants, which can improve power when certain variants are more likely to be causal:
# Create weights (example: upweight rare variants)
weights <- rep(1, ncol(Z))
maf <- colMeans(Z, na.rm = TRUE) / 2  # frequency of the counted allele; na.rm skips missing genotypes
rare_variants <- which(maf < 0.05)
weights[rare_variants] <- 2
# Apply weights in SRAT
obj.srat <- srat.null(y.c, X)
weighted_result <- srat.test(Z, obj.srat, w.sqrt = sqrt(weights))
print(paste("SRAT p-value with weights:", weighted_result))
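The two-level weights above are one choice among many. As an alternative, the sketch below builds smooth frequency-based weights from a Beta density, so that rarer variants receive larger weights. It is an illustration on simulated genotypes: Z_sim, true_freq and the Beta(1, 25) shape parameters are assumptions chosen for the example, and whether the density should act as the weight or as its square root depends on the convention of the method being used. The last line follows the original example in passing the square root.

```r
# Illustration only: frequency-based weights on simulated genotypes.
set.seed(7)
n <- 500
true_freq <- c(0.01, 0.02, 0.03, 0.05, 0.10, 0.20, 0.30, 0.40)  # assumed allele frequencies
m <- length(true_freq)

# n x m genotype matrix coded 0/1/2, one column per variant.
Z_sim <- sapply(true_freq, function(p) rbinom(n, 2, p))

# Minor allele frequency: fold the counted-allele frequency at 0.5.
freq <- colMeans(Z_sim, na.rm = TRUE) / 2
maf <- pmin(freq, 1 - freq)

# Beta(1, 25) density decreases in MAF, so rare variants are upweighted.
w <- dbeta(maf, 1, 25)
w.sqrt <- sqrt(w)   # square-root form, as in the w.sqrt argument used above

print(round(cbind(maf = maf, weight = w), 3))
```

With real data, a vector built this way from Z could take the place of sqrt(weights) in the srat.test() call shown earlier.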
The p-value returned by SRAT indicates the statistical significance of the association between the set of genetic variants and the phenotype. A smaller p-value suggests stronger evidence against the null hypothesis of no association.
SRAT has been shown to be particularly advantageous when:
- The outcome distribution is skewed or contains outliers
- Family correlation structures are complex
- The sample size is relatively small