chrXRa software pioneers the measurement of demethylated CpGs in X-linked genes, providing a tangible metric for assessing defective X chromosome inactivation (XCI) and X chromosome reactivation (X-Ra). Our software precisely delineates the fraction of demethylated CpGs within X-linked genes, serving as a robust indicator of XCI control.
Download the chrXRa software by filling out the form at https://xra.isglobal.org/request. Save the binary file in a
local directory (e.g. /path/to/directory) and install the
package locally. On Windows, run
setwd("/path/to/directory")
install.packages("chrXRa_1.0.zip", repos = NULL, type = "win.binary")
and load as usual
library(chrXRa)
We now illustrate how to compute the XRa level of each woman using data from the GEO study with accession number GSE226206. XRa is a female biomarker that gives the individual level of chromosome X reactivation from methylation data. GSE226206 is a longitudinal study of COVID-19 patients who were in the intensive care unit (ICU). To compute XRa for each woman in the study, we first obtain the phenotype, methylation, and CpG annotation data
library(GEOquery)
#retrieve data
gsm <- getGEO("GSE226206", AnnotGPL =TRUE)[[1]]
#obtain phenotypes
pheno <- pData(phenoData(gsm))
#obtain methylation levels
met <- exprs(gsm)
#obtain annotation
annot <- fData(gsm)
From the phenotype data we select the women and from the annotation data we select the CpGs that belong to the chromosome X
#choose female data
pheno$sub <- pheno$`subject id:ch1`
selsub <- pheno$`Sex:ch1` == "female"
#choose phenotype and methylation for women
phenofemale <- pheno[selsub, ]
metfemale <- as.matrix(met[,selsub])
#choose CpGs only in chrX
sel <- annot$CHR == "X"
metfemale <- as.matrix(metfemale[sel, ])
metfemale <- t(metfemale)
We only use methylation data from the chromosome X of females to compute their XRa levels. Note that selecting chromosome X CpGs before XRa computation greatly improves performance.
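The subsetting above relies on the methylation matrix, the phenotype table, and the annotation table being aligned by sample and by CpG. The following is a minimal sketch of that check on toy objects; `met_toy`, `pheno_toy`, and `annot_toy` are hypothetical stand-ins, not the GEO data. It also uses `%in%` rather than `==`, on the assumption that some annotation entries may be missing, since `==` would then return `NA` and create all-`NA` rows.

```r
# Toy stand-ins for met, pheno and annot (hypothetical values)
set.seed(1)
met_toy <- matrix(runif(12), nrow = 4,
                  dimnames = list(paste0("cg", 1:4), paste0("GSM", 1:3)))
pheno_toy <- data.frame(sex = c("female", "male", "female"),
                        row.names = paste0("GSM", 1:3))
annot_toy <- data.frame(CHR = c("X", "1", NA, "X"),
                        row.names = paste0("cg", 1:4))

# Samples and CpGs must be in the same order before logical subsetting
stopifnot(identical(colnames(met_toy), rownames(pheno_toy)),
          identical(rownames(met_toy), rownames(annot_toy)))

# %in% returns FALSE (not NA) for missing annotation entries
is_female <- pheno_toy$sex %in% "female"
on_chrX <- annot_toy$CHR %in% "X"

# Keep chrX CpGs of females, then put samples in rows
x_toy <- t(met_toy[on_chrX, is_female, drop = FALSE])
dim(x_toy)
```

Using `drop = FALSE` keeps the result a matrix even if only one sample or one CpG is selected.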
XRa is computed specifically from the CpGs of genes that are assumed
to be under inactivation. It gives the level of reactivation as the
fraction of these CpGs that show complete demethylation. To check that
everything is fine, we first plot the methylation levels of genes under
chromosome X inactivation (XCI) and those that escape from it. The
function IGlevels extracts the CpG values of genes under
XCI and EGlevels those of genes that escape. The plot of both
distributions should show that the CpGs under XCI rarely have complete
demethylation (\(< 0.2\)), and more
frequently have intermediate or complete methylation. CpGs from
genes that escape should only have complete demethylation or complete
methylation.
#compute the CpG levels of inactive genes
ig <- IGlevels(metfemale, colnames(metfemale))
#compute the CpG levels of escapee genes
eg <- EGlevels(metfemale, colnames(metfemale))
#compare their distributions
plot(density(ig, na.rm = TRUE),
lwd = 3, ylim = c(0,5),
main = " ", xlab = "CpG methylation levels")
lines(density(eg, na.rm = TRUE), lwd = 3, lty = 2)
legend("topright",
legend = c("CpGs levels in inactive genes",
"CpGs levels in escapees"),
lwd = c(3, 3),
lty = c(1, 2))
abline(v = 0.2)
Now, we compute the level of XRa for each individual. That is the
fraction of CpGs under XCI with values lower than 0.2. We use the
function XRa, and plot its distribution across women
#compute XRa
phenofemale$XRa <- XRa(metfemale, colnames(metfemale))
plot(density(phenofemale$XRa, na.rm = TRUE),
xlim = c(0.02, 0.20),
main = "" , xlab="X-Ra")
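To make the definition concrete, the fraction of CpGs below 0.2 can be written in a few lines of base R. This is only an illustrative sketch: `frac_below` and `beta_toy` are hypothetical names, and the helper is not the package's XRa function, which, according to the text above, restricts the computation to CpGs of genes under XCI.

```r
# Hypothetical helper: per-sample fraction of CpGs below a threshold.
# Rows are samples and columns are CpGs, as in metfemale.
frac_below <- function(beta, threshold = 0.2) {
  rowMeans(beta < threshold, na.rm = TRUE)
}

# Toy methylation values for two women and four CpGs (made-up numbers)
beta_toy <- matrix(c(0.05, 0.50, 0.90, 0.10,
                     0.60, 0.85, NA,   0.15),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("w1", "w2"), paste0("cg", 1:4)))

frac_below(beta_toy)
# w1: 2 of 4 CpGs below 0.2 (0.5); w2: 1 of 3 non-missing CpGs (about 0.333)
```

Missing values are dropped from both numerator and denominator here; whether the package treats them the same way is not stated in the text.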
We can use the values of XRa to study its association with phenotypes and clinical outcomes. In this study, patients were followed during COVID-19 infection at four time points. The first time point was admission to the ICU and time point 3 corresponds to one day before release from hospital. Methylation was obtained from blood samples at baseline and follow-ups.
This data set allows us to study the evolution of XRa over time. We first organize the data
phenofemale$time <- as.numeric(phenofemale$`time point:ch1`)
oo <- order(phenofemale$sub)
opheno <- phenofemale[oo, ]
For time points 3 and 4 we can ask whether the XRa measures of each subject are consistent. In particular, we can test whether XRa is reliable using an intra-class correlation coefficient (ICC). We organize the data so that each row is an individual and the first and second columns are the test and retest measurements of that subject
library(irr)
data <- matrix(opheno$XRa[opheno$time%in%c(3,4)],
ncol = 2, byrow = TRUE)
colnames(data) <- c("time3", "time4")
data
## time3 time4
## [1,] 0.11809781 0.09420780
## [2,] 0.07279693 0.07572684
## [3,] 0.13004282 0.13432499
## [4,] 0.13973405 0.12328150
## [5,] 0.06130268 0.05183683
## [6,] 0.10818120 0.09105251
## [7,] 0.11990083 0.11020960
## [8,] 0.13950868 0.13973405
## [9,] 0.07595222 0.07009240
## [10,] 0.10344828 0.08789723
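The matrix() call above assumes that every subject has exactly one measurement at time 3 and one at time 4, stored in that order. When this is not guaranteed, the pairing can be made explicit. The sketch below does so with base R on a toy data frame; `toy` and `paired` are hypothetical names and the values are made up.

```r
# Toy long-format data: s3 lacks time point 4, and rows are unordered
toy <- data.frame(sub  = c("s2", "s1", "s1", "s2", "s3", "s1"),
                  time = c(4, 3, 4, 3, 3, 1),
                  XRa  = c(0.08, 0.11, 0.10, 0.07, 0.12, 0.13))

# Keep the two time points of interest
tt <- toy[toy$time %in% c(3, 4), ]

# Cross-tabulate subject by time; absent combinations become NA
paired <- tapply(tt$XRa, list(tt$sub, tt$time), function(v) v[1])

# Drop subjects that do not have both measurements
paired <- paired[complete.cases(paired), , drop = FALSE]
colnames(paired) <- paste0("time", colnames(paired))
paired
```

Each row of `paired` is then guaranteed to hold the time 3 and time 4 values of the same subject.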
icc(data, model = "twoway", type = "consistency", unit = "single")
## Single Score Intraclass Correlation
##
## Model: twoway
## Type : consistency
##
## Subjects = 10
## Raters = 2
## ICC(C,1) = 0.945
##
## F-Test, H0: r0 = 0 ; H1: r0 > 0
## F(9,9) = 35.5 , p = 5.72e-06
##
## 95%-Confidence Interval for ICC Population Values:
## 0.796 < ICC < 0.986
An ICC of 0.945 shows that the variation over time is much smaller than the variation between subjects and that, in particular, those who score high on XRa at time 3 will very likely score high at time 4. Therefore, XRa is a reliable biomarker that can distinguish between subjects.
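As a cross-check, the single-measure consistency ICC can be recomputed from the two-way ANOVA mean squares as ICC(C,1) = (MSR - MSE) / (MSR + (k - 1) MSE), where MSR is the between-subject mean square, MSE the residual mean square, and k the number of time points. The sketch below uses only base R and the ten pairs printed above; the object names are illustrative.

```r
# XRa at time points 3 and 4, copied from the printed matrix
x <- matrix(c(0.11809781, 0.09420780,
              0.07279693, 0.07572684,
              0.13004282, 0.13432499,
              0.13973405, 0.12328150,
              0.06130268, 0.05183683,
              0.10818120, 0.09105251,
              0.11990083, 0.11020960,
              0.13950868, 0.13973405,
              0.07595222, 0.07009240,
              0.10344828, 0.08789723),
            ncol = 2, byrow = TRUE)
n <- nrow(x)  # subjects
k <- ncol(x)  # time points ("raters")

# Two-way ANOVA sums of squares without replication
grand <- mean(x)
ss_subjects <- k * sum((rowMeans(x) - grand)^2)
ss_times <- n * sum((colMeans(x) - grand)^2)
ss_error <- sum((x - grand)^2) - ss_subjects - ss_times

msr <- ss_subjects / (n - 1)
mse <- ss_error / ((n - 1) * (k - 1))

f_stat <- msr / mse
icc_c1 <- (msr - mse) / (msr + (k - 1) * mse)
round(c(F = f_stat, ICC = icc_c1), 3)
# F is about 35.5 and ICC about 0.945, in line with the irr output
```

Because the consistency type removes the mean difference between time points, a uniform shift of XRa from time 3 to time 4 would not lower this coefficient.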
While XRa variation may be small over a period of COVID-19 infection, we can use the other time points to see whether we can detect a significant variation due to time. We can fit a mixed-effects model with a random effect on subject and study whether the individual trajectories have a clear pattern, given by the fixed effect of time during the study, that is, COVID-19 progression.
library(lmerTest)
opheno$age <- as.numeric(opheno$`age:ch1`)
mod <- lmerTest::lmer(XRa ~ time + age + (1 |sub),
data = opheno)
summary(mod)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: XRa ~ time + age + (1 | sub)
## Data: opheno
##
## REML criterion at convergence: -179.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.13789 -0.47842 -0.00463 0.55390 1.49439
##
## Random effects:
## Groups Name Variance Std.Dev.
## sub (Intercept) 7.183e-04 0.02680
## Residual 7.798e-05 0.00883
## Number of obs: 37, groups: sub, 10
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.097e-01 2.821e-02 8.258e+00 3.889 0.00434 **
## time -3.032e-03 1.378e-03 2.607e+01 -2.200 0.03688 *
## age 4.409e-05 4.473e-04 7.982e+00 0.099 0.92391
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) time
## time -0.120
## age -0.944 -0.010
We see that there is a significant effect of time on XRa levels (estimate -0.003 per time point, p = 0.037), indicating that the biomarker may be sensitive to transient states of an individual's health, given by the stage of the infection. We can plot the individual trajectories and appreciate a slow decline in XRa through time.
plot(XRa ~ time, data = opheno, pch = 16)
for(ss in unique(opheno$sub))
lines(XRa ~ time,
data = opheno[opheno$sub == ss,], col = ss)
We can also see that trajectories between time points 3 and 4 are quite stable and do not criss-cross much between subjects, indicating high reliability of the biomarker.
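The variance components reported by the mixed model give a complementary view of reliability. The short sketch below plugs the printed estimates into the usual ratio of between-subject variance to total variance; the numbers are copied from the summary above and the object names are illustrative. This model-based ratio is conditional on time and age and uses all four time points, so it need not match the ICC(C,1) computed from time points 3 and 4 alone.

```r
# Variance components copied from the lmer summary
var_subject <- 7.183e-04   # random intercept (between subjects)
var_residual <- 7.798e-05  # residual (within subject)

# Share of the variance attributable to stable subject differences
icc_model <- var_subject / (var_subject + var_residual)
round(icc_model, 3)  # 0.902

# Fixed effect of time: expected change from time point 1 to 4
time_slope <- -3.032e-03
total_change <- time_slope * (4 - 1)
round(total_change, 4)  # -0.0091
```

Under these estimates, roughly 90% of the variance in XRa not explained by time and age lies between subjects, while the expected decline over the whole follow-up is about 0.009.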