chrXRa software pioneers the measurement of demethylated CpGs in X-linked genes, providing a tangible metric for assessing defective X chromosome inactivation (XCI) and X chromosome reactivation (X-Ra). The software computes the fraction of demethylated CpGs within X-linked genes, which serves as a robust indicator of XCI control.

Notably, XRa, the measure computed by chrXRa, is a gonosome-based biomarker specifically designed for the unique genetic landscape of women. Its focus on the X chromosome directly addresses gender-specific health concerns, emphasizing the importance of gender-sensitive approaches in advancing precision medicine.
chrXRa sheds light on the associations between X chromosome inactivation and critical aspects of women's health and disease, including aging, infertility, response to infections, and cancer risk across diverse sites.

General inquiries

Associate professor

Juan R. Gonzalez

juanr.gonzalez@isglobal.org

Statistician

Alejandro Cáceres

alejandro.caceres@isglobal.org

Licensing

Innovation department

ISGlobal

innovation@isglobal.org
chrXRa quick example

Download and install

Download the chrXRa software by filling out the form at https://xra.isglobal.org/request. Save the binary file in a local directory (e.g., /path/to/directory) and install the package locally. On Windows, run:

setwd("/path/to/directory")

install.packages("chrXRa_1.0.zip", repos = NULL, type = "win.binary")

and load it as usual:

library(chrXRa)
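The quick example below also relies on GEOquery (distributed through Bioconductor), irr and lmerTest. A minimal base-R sketch that lists which of these packages are not yet installed; the helper name `missing_pkgs` is hypothetical and not part of chrXRa:

```r
# Packages used later in this example (GEOquery is distributed via Bioconductor)
needed <- c("GEOquery", "irr", "lmerTest")

# Hypothetical helper: return the subset of `pkgs` whose namespace cannot be loaded
missing_pkgs <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

todo <- missing_pkgs(needed)
print(todo)
```

Missing CRAN packages can then be installed with install.packages(), and GEOquery with BiocManager::install("GEOquery").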

Compute XRa from GEO data

We now illustrate how to compute the XRa level of each woman using data from the GEO study with accession number GSE226206. XRa is a female biomarker that gives the individual level of X chromosome reactivation from methylation data. GSE226206 is a longitudinal study of COVID-19 patients admitted to the intensive care unit (ICU). To compute XRa for each woman in the study, we first obtain the phenotype, methylation, and CpG annotation data:

library(GEOquery)

#retrieve data
gsm <- getGEO("GSE226206", AnnotGPL = TRUE)[[1]]

#obtain phenotypes
pheno <- pData(phenoData(gsm))

#obtain methylation levels
met <- exprs(gsm)

#obtain annotation
annot <- fData(gsm)

From the phenotype data we select the women, and from the annotation data we select the CpGs on chromosome X:

#choose female data
pheno$sub <- pheno$`subject id:ch1`
selsub <- pheno$`Sex:ch1` == "female"

#choose phenotype and methylation for women
phenofemale <- pheno[selsub, ]
metfemale <- as.matrix(met[, selsub])

#choose CpGs only in chrX (which() drops CpGs with a missing chromosome)
sel <- which(annot$CHR == "X")
metfemale <- as.matrix(metfemale[sel, ])

#transpose: women in rows, CpGs in columns
metfemale <- t(metfemale)
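To make the expected orientation explicit, here is a toy sketch of the same selection steps on a small made-up matrix; all object names and values are hypothetical. It also checks that the methylation rows and the annotation rows refer to the same CpGs, and uses which() so that CpGs with a missing chromosome annotation are dropped rather than turned into rows of NA:

```r
# Toy data (hypothetical): 5 CpGs x 3 samples, CpGs in rows as returned by exprs()
met <- matrix(c(0.10, 0.85, 0.50, 0.90, 0.05,   # sample S1
                0.15, 0.80, 0.45, 0.95, 0.60,   # sample S2
                0.12, 0.88, 0.55, 0.92, 0.08),  # sample S3
              nrow = 5,
              dimnames = list(paste0("cg0", 1:5), c("S1", "S2", "S3")))
annot <- data.frame(CHR = c("X", "1", "X", NA, "X"),
                    row.names = rownames(met))
sex <- c(S1 = "female", S2 = "male", S3 = "female")

# Methylation rows must match annotation rows, and columns must match samples
stopifnot(identical(rownames(met), rownames(annot)))
stopifnot(identical(names(sex), colnames(met)))

# which() ignores the CpG whose chromosome is NA
sel_cpg <- which(annot$CHR == "X")
sel_sub <- sex == "female"

# Keep X-linked CpGs of women and transpose: women in rows, CpGs in columns
x <- t(met[sel_cpg, sel_sub, drop = FALSE])
dim(x)   # 2 3
```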

We use only methylation data from chromosome X of women to compute their XRa levels. Selecting the chromosome X CpGs before computing XRa greatly improves performance.

XRa is computed from the CpGs of genes assumed to be under X chromosome inactivation (XCI). It measures reactivation as the fraction of these CpGs that are completely demethylated.

To check that the data behave as expected, we first plot the methylation levels of CpGs in genes under XCI and in genes that escape it. The function IGlevels extracts the CpG values of genes under XCI, and EGlevels those of genes that escape. The plot should show that CpGs under XCI are rarely completely demethylated (\(< 0.2\)) and more often have intermediate or complete methylation, whereas CpGs from escape genes should be concentrated at complete demethylation and complete methylation.

#compute the CpG levels of inactive genes
ig <- IGlevels(metfemale, colnames(metfemale))

#compute the CpG levels of escape genes
eg <- EGlevels(metfemale, colnames(metfemale))

#compare their distributions
plot(density(ig, na.rm = TRUE),  
     lwd = 3, ylim = c(0,5), 
     main = " ", xlab = "CpG methylation levels")

lines(density(eg, na.rm = TRUE), lwd = 3, lty = 2)

legend("topright", 
       legend = c("CpGs levels in inactive genes",
                  "CpGs levels in escapees"),
       lwd = c(3, 3), 
       lty = c(1, 2)) 

abline(v = 0.2)  # threshold for complete demethylation
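The density plot can be complemented with a numeric summary. A minimal base-R sketch with a hypothetical helper, meth_classes, that reports the proportion of values in each methylation class; the 0.2 cut-off is the one used in the text, while the 0.8 cut-off for complete methylation is our assumption. With the objects above it would be called as meth_classes(ig) and meth_classes(eg):

```r
# Hypothetical helper: proportion of CpG values in three methylation classes.
# The 0.2 cut-off comes from the text; the 0.8 upper cut-off is an assumption.
meth_classes <- function(x, low = 0.2, high = 0.8) {
  x <- x[!is.na(x)]                       # also flattens a matrix to a vector
  cls <- cut(x, breaks = c(-Inf, low, high, Inf),
             labels = c("demethylated", "intermediate", "methylated"),
             right = FALSE)               # intervals [-Inf, 0.2), [0.2, 0.8), [0.8, Inf)
  prop.table(table(cls))
}

# Toy example (made-up values)
r <- meth_classes(c(0.05, 0.45, 0.50, 0.90, NA))
r   # demethylated 0.25, intermediate 0.50, methylated 0.25
```

For CpGs under XCI we would expect a small "demethylated" share; for escape genes, most values in the two extreme classes.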

Now we compute the XRa level of each individual, that is, the fraction of CpGs under XCI with methylation values lower than 0.2. We use the function XRa and plot its distribution across women:

#compute XRa
phenofemale$XRa <- XRa(metfemale, colnames(metfemale))

plot(density(phenofemale$XRa, na.rm = TRUE), 
     xlim = c(0.02, 0.20), 
     main = "", xlab = "X-Ra")
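For intuition, the definition of XRa can be sketched in a few lines of base R. This is not chrXRa's implementation: it assumes you already have a character vector of CpGs in genes under XCI (chrXRa uses its own internal gene list, which is not reproduced here), and the function name xra_sketch is hypothetical:

```r
# Hypothetical re-implementation of the XRa definition, not chrXRa's code:
# for each woman (row), the fraction of XCI CpGs with methylation below `thr`.
xra_sketch <- function(x, xci_cpgs, thr = 0.2) {
  cols <- intersect(colnames(x), xci_cpgs)   # XCI CpGs present in the data
  if (length(cols) == 0) stop("no XCI CpGs found in colnames(x)")
  vals <- x[, cols, drop = FALSE]
  # rowMeans of a logical matrix = proportion of TRUE values per woman
  rowMeans(vals < thr, na.rm = TRUE)
}

# Toy example: 2 women x 4 CpGs; each line below is one CpG (column)
x <- matrix(c(0.10, 0.50,    # cg01
              0.45, 0.15,    # cg02
              0.60, 0.12,    # cg03
              0.05, 0.02),   # cg04, treated here as a CpG of an escape gene
            nrow = 2,
            dimnames = list(c("W1", "W2"), paste0("cg0", 1:4)))
xra_sketch(x, xci_cpgs = c("cg01", "cg02", "cg03"))   # W1 0.333, W2 0.667
```

Because the escape-gene CpG cg04 is excluded, its low values do not inflate the result.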

XRa progression in time

We can use XRa values to study associations with phenotypes and clinical outcomes. In this study, patients were followed at four time points during COVID-19 infection. The first time point was admission to the ICU, and time point 3 corresponds to one day before discharge from hospital. Methylation was measured in blood samples at baseline and at the follow-up visits.

This data set allows us to study the evolution of XRa over time. We first organize the data:

phenofemale$time <- as.numeric(phenofemale$`time point:ch1`)

#sort by subject and, within subject, by time point
oo <- order(phenofemale$sub, phenofemale$time)
opheno <- phenofemale[oo, ]
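A test-retest matrix built with matrix(..., byrow = TRUE) is correct only if the rows are ordered by subject and, within subject, by time, and if every woman has both time points 3 and 4. A minimal sketch of a more defensive alternative that pairs the two measurements by subject identifier; the helper paired_timepoints and the toy data are hypothetical:

```r
# Hypothetical helper: one row per subject measured at both time points t1 and t2
paired_timepoints <- function(df, t1 = 3, t2 = 4) {
  a <- df[which(df$time == t1), c("sub", "XRa")]
  b <- df[which(df$time == t2), c("sub", "XRa")]
  # merge() keeps only subjects present at both time points, matched by id
  m <- merge(a, b, by = "sub", suffixes = c(".t1", ".t2"))
  out <- as.matrix(m[, c("XRa.t1", "XRa.t2")])
  rownames(out) <- m$sub
  colnames(out) <- paste0("time", c(t1, t2))
  out
}

# Toy data: rows are unsorted and subject B has no time point 4
toy <- data.frame(sub  = c("B", "A", "A", "C", "C", "B"),
                  time = c(3, 4, 3, 3, 4, 2),
                  XRa  = c(0.08, 0.11, 0.12, 0.07, 0.06, 0.09))
paired_timepoints(toy)
```

With the objects of this example, paired_timepoints(opheno) would give a matrix that can be passed to irr::icc() in the same way.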

For time points 3 and 4 we can ask whether XRa measurements are consistent within subjects. In particular, we can assess the reliability of XRa with an intraclass correlation coefficient (ICC). We arrange the data so that each row is an individual and the two columns hold her test and retest measurements (time points 3 and 4):

library(irr)

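#one row per woman: XRa at time points 3 and 4
#(assumes opheno is sorted by subject and time, and each woman has both)
data <- matrix(opheno$XRa[opheno$time %in% c(3, 4)], 
               ncol = 2, byrow = TRUE)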

colnames(data) <- c("time3", "time4")
data
##            time3      time4
##  [1,] 0.11809781 0.09420780
##  [2,] 0.07279693 0.07572684
##  [3,] 0.13004282 0.13432499
##  [4,] 0.13973405 0.12328150
##  [5,] 0.06130268 0.05183683
##  [6,] 0.10818120 0.09105251
##  [7,] 0.11990083 0.11020960
##  [8,] 0.13950868 0.13973405
##  [9,] 0.07595222 0.07009240
## [10,] 0.10344828 0.08789723
icc(data, model = "twoway", type = "consistency", unit = "single")
##  Single Score Intraclass Correlation
## 
##    Model: twoway 
##    Type : consistency 
## 
##    Subjects = 10 
##      Raters = 2 
##    ICC(C,1) = 0.945
## 
##  F-Test, H0: r0 = 0 ; H1: r0 > 0 
##      F(9,9) = 35.5 , p = 5.72e-06 
## 
##  95%-Confidence Interval for ICC Population Values:
##   0.796 < ICC < 0.986

An ICC of 0.945 (95% CI 0.796 to 0.986) shows that the variation between time points 3 and 4 is much smaller than the variation between subjects: women with high XRa at time point 3 are very likely to have high XRa at time point 4. Therefore, in this sample, XRa is a reliable biomarker that can distinguish between subjects.
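The consistency ICC for a single measurement can be computed by hand from two-way ANOVA mean squares, \(\mathrm{ICC}(C,1) = (MS_R - MS_E) / (MS_R + (k-1) MS_E)\), where \(MS_R\) is the between-subject mean square, \(MS_E\) the residual mean square and \(k\) the number of time points. A base-R sketch using the values printed in the data matrix above; it should reproduce ICC(C,1) = 0.945 and F(9,9) = 35.5 reported by irr::icc() up to rounding:

```r
# XRa values at time points 3 and 4, copied from the printed `data` matrix
x3 <- c(0.11809781, 0.07279693, 0.13004282, 0.13973405, 0.06130268,
        0.10818120, 0.11990083, 0.13950868, 0.07595222, 0.10344828)
x4 <- c(0.09420780, 0.07572684, 0.13432499, 0.12328150, 0.05183683,
        0.09105251, 0.11020960, 0.13973405, 0.07009240, 0.08789723)

# Two-way consistency ICC for a single measurement, ICC(C,1)
icc_c1 <- function(y) {
  n <- nrow(y); k <- ncol(y)
  g <- mean(y)
  ss_subj  <- k * sum((rowMeans(y) - g)^2)   # between subjects
  ss_time  <- n * sum((colMeans(y) - g)^2)   # between time points
  ss_err   <- sum((y - g)^2) - ss_subj - ss_time
  ms_subj  <- ss_subj / (n - 1)
  ms_err   <- ss_err / ((n - 1) * (k - 1))
  list(icc = (ms_subj - ms_err) / (ms_subj + (k - 1) * ms_err),
       F   = ms_subj / ms_err)
}

res <- icc_c1(cbind(x3, x4))
res$icc   # about 0.945
res$F     # about 35.5
```

With two time points the formula reduces to \((F - 1)/(F + 1)\), so a large F directly translates into an ICC close to 1.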

Although XRa varies little over the course of COVID-19 infection, we can use all four time points to test whether there is a significant change over time. We fit a linear mixed-effects model with a random intercept for each subject, adjusting for age, and test whether the individual trajectories follow a common pattern, given by the fixed effect of time during the study, that is, COVID-19 progression.
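In equation form, the model fitted below (lmer formula XRa ~ time + age + (1 | sub)) is a standard random-intercept model, where \(i\) indexes women, \(j\) indexes the samples of woman \(i\), and \(u_i\) is her random intercept:

\[
\text{XRa}_{ij} = \beta_0 + \beta_1\,\text{time}_{ij} + \beta_2\,\text{age}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
\]

Here \(\beta_1\) is the fixed effect of time, and \(\sigma_u^2\) and \(\sigma^2\) correspond to the sub (Intercept) and Residual variances reported by summary(mod).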

library(lmerTest)

opheno$age <- as.numeric(opheno$`age:ch1`)

mod <- lmerTest::lmer(XRa ~ time + age + (1 | sub), 
                      data = opheno)

summary(mod)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: XRa ~ time + age + (1 | sub)
##    Data: opheno
## 
## REML criterion at convergence: -179.9
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.13789 -0.47842 -0.00463  0.55390  1.49439 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  sub      (Intercept) 7.183e-04 0.02680 
##  Residual             7.798e-05 0.00883 
## Number of obs: 37, groups:  sub, 10
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)   
## (Intercept)  1.097e-01  2.821e-02  8.258e+00   3.889  0.00434 **
## time        -3.032e-03  1.378e-03  2.607e+01  -2.200  0.03688 * 
## age          4.409e-05  4.473e-04  7.982e+00   0.099  0.92391   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr) time  
## time -0.120       
## age  -0.944 -0.010

Time has a significant effect on XRa (estimate -0.0030 per time point, p = 0.037), suggesting that the biomarker may be sensitive to transient changes in an individual's health, such as the stage of the infection. Plotting the individual trajectories shows a slow decline in XRa over time:
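To put the time effect in perspective, a small worked calculation using only the estimates printed by summary(mod) above. It assumes, as the model does, a linear effect of time, and that time points are coded 1 to 4 (an assumption about the GEO annotation):

```r
# Estimates copied from summary(mod)
b_time  <- -3.032e-03   # fixed effect of time (per time point)
var_sub <-  7.183e-04   # random-intercept variance (sub)
var_res <-  7.798e-05   # residual variance

# Expected change in XRa from time point 1 to time point 4 (three steps, assuming 1..4 coding)
delta <- 3 * b_time
delta                          # -0.009096

# Size of that change relative to the between-woman standard deviation
abs(delta) / sqrt(var_sub)     # about 0.34

# Share of the remaining variance due to differences between women
var_sub / (var_sub + var_res)  # about 0.90
```

The decline over the whole follow-up is about a third of a between-woman standard deviation, and differences between women still account for about 90% of the variance after adjusting for time and age.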

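plot(XRa ~ time, data = opheno, pch = 16)

#one line per woman, coloured by her position in the subject list
subs <- unique(opheno$sub)
for (ss in subs) {
  d <- opheno[opheno$sub == ss, ]
  d <- d[order(d$time), ]   # draw each trajectory in time order
  lines(XRa ~ time, data = d, col = match(ss, subs))
}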

We can also see that the trajectories between time points 3 and 4 are quite stable and rarely cross between subjects, consistent with the high reliability of the biomarker.
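The visual impression that trajectories rarely cross between time points 3 and 4 can be quantified by counting the pairs of women whose ordering by XRa swaps between the two time points; concordant minus discordant pairs, divided by the number of pairs, is Kendall's tau. A base-R sketch on the values printed in the data matrix above:

```r
# XRa values at time points 3 and 4, copied from the printed `data` matrix
x3 <- c(0.11809781, 0.07279693, 0.13004282, 0.13973405, 0.06130268,
        0.10818120, 0.11990083, 0.13950868, 0.07595222, 0.10344828)
x4 <- c(0.09420780, 0.07572684, 0.13432499, 0.12328150, 0.05183683,
        0.09105251, 0.11020960, 0.13973405, 0.07009240, 0.08789723)

# Sign of every pairwise difference at each time point
d3 <- sign(outer(x3, x3, "-"))
d4 <- sign(outer(x4, x4, "-"))

# A crossing is a pair of women whose order differs between time 3 and time 4
crossings <- sum((d3 * d4)[upper.tri(d3)] < 0)
n_pairs <- choose(length(x3), 2)
crossings                          # 3
n_pairs                            # 45
cor(x3, x4, method = "kendall")    # (45 - 2 * 3) / 45 = 0.867
```

Only 3 of the 45 pairs of women swap order, in line with the high ICC.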