At TruDiagnostic, we believe in holding ourselves, and the larger community of leaders and researchers in our field, to the highest standards of science, transparency, and thoughtful engagement. In our Expert Dialogues content, we share insights, responses, and additional context on the latest buzz and conversations taking place in the field of epigenetics.
Varun B. Dwaraka, Natalia Carreras-Gallo, Hannah Went, Ryan Smith
First, before we get started, we want to acknowledge the difficulty in creating excellent biological age algorithms. The definition of biological age is complex and highly debatable. However, it is important to talk about why we are searching for this surrogate. As age is the biggest risk factor for disease and death, we know age is important. However, we also know that people age differently, and therefore we need a tool to measure this process biochemically. This tool, hopefully, will allow us to predict the trajectory of aging so we can find interventions that work to reverse risk without having to wait decades to analyze impact and results.
So with that in mind, let's discuss some of Dr. Attia’s recent summary and report.
▐ How does methylation relate to age?
Dr. Attia starts his article by talking about how methylation changes with age. He gives a great overview of some of the first, very important clocks, such as Dr. Steve Horvath’s 2013 DNAm Age clock, which was trained on multiple tissues to predict chronological age. He also mentions the connection between these early clocks and age-related trends in methylation.
However, as a critique, he also mentions that “the predictive value of this ‘biological clock’ was poor in certain tissue types including breast, endometrium, skeletal, and cardiac tissues, which may indicate that other factors such as hormones, high tissue turnover, and the recruitment of satellite or stem cells may all be factors that influence a tissue’s DNA methylation ‘age’.”
We think that this critique is absolutely correct and highlights one of the biggest issues with first-generation, or chronologically trained, clocks. The signal they measure is optimized for chronological age, which provides limited information. In fact, it can even give us WRONG information.
In a 2020 publication in eLife, Dr. Jamie Justice from Wake Forest (now of the longevity XPrize) asked the question, “Are these DNAm clocks ready for clinical trial application in geroscience?” The short answer was no. According to the criteria in the study, no clock, and in particular none of these first-generation clocks, was able to satisfy the last requirement: “responsive to interventions that beneficially affect the biology of aging.”
In a recent publication in Nature Aging, researchers from Columbia University evaluated these clocks in a randomized controlled trial in which n = 220 adults without obesity were randomized to 25% caloric restriction (CR) or an ad libitum control diet for 2 years. They said the following:
“We found that CALERIE intervention slowed the pace of aging, as measured by the DunedinPACE DNAm algorithm, but did not lead to significant changes in biological age estimates measured by various DNAm clocks including PhenoAge and GrimAge. Treatment effect sizes were small. Nevertheless, modest slowing of the pace of aging can have profound effects on population health. The finding that CR modified DunedinPACE in a randomized controlled trial supports the geroscience hypothesis, building on evidence from small and uncontrolled studies and contrasting with reports that biological aging may not be modifiable.”
Thus, if we were to reflect back on the criteria established by Jamie Justice, only one of the clocks would be able to fit the requirements of a DNAm biomarker.
This same trend was also shown more recently in an analysis of senolytic interventions by Dr. Edwin Lee and TruDiagnostic. In their longitudinal trial of Dasatinib and Quercetin, first-generation clocks showed increased ages, while 2nd and 3rd generation clocks showed minor decreases that did not reach significance. The takeaway here is that 1st generation clocks have significant limitations and are probably not effective at tracking changes in biological age.
▐ Next-generation tests show vast advancements.
This brings us to the next topic, which is the clocks. Second generation clocks go beyond training to predict chronological age and instead are trained on biological information. The main clock used for Dr. Attia’s overview was PhenoAge which is one of the first 2nd generation clocks.
Everything he says about PhenoAge is true. However, this clock was published in 2018. That was over five years ago, and much more advanced clocks have been developed since.
You can see a good graphical representation of this below.
These newer 2nd and 3rd generation clocks are simply much better for a few reasons. For instance, it has been shown that a one standard deviation increase in the Horvath clock was associated with a 2% increase in mortality risk, while a one standard deviation increase in the 3rd generation DunedinPACE (trained against longitudinal changes in biological phenotypes) was associated with a 64% increase (Belsky et al. 2022, eLife). Similarly, GrimAge and DunedinPACE have shown robust associations with mortality, independent of genetic influences, when assessed in older populations (Fohr et al. 2023, JoG Part A).
Even compared with PhenoAge, DunedinPACE comes out ahead: the hazard ratios are 1.57 for DunedinPACE and 1.14 for PhenoAge, and DunedinPACE has a higher hazard ratio in every other comparison as well.
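To put per-standard-deviation hazard ratios like these on a common footing, the short Python sketch below converts a hazard ratio into a percent increase in hazard and extrapolates it to a difference of several standard deviations. It assumes a Cox-style model in which the log hazard is linear in the clock value, so the effect compounds multiplicatively. That assumption, the function names, and the reading of the 2% and 64% figures as hazard ratios of 1.02 and 1.64 are ours for illustration, not taken from the cited papers.

```python
def percent_increase(hazard_ratio):
    """Percent increase in hazard implied by a hazard ratio (1.64 -> about 64%)."""
    return (hazard_ratio - 1.0) * 100.0


def relative_hazard(hazard_ratio_per_sd, n_sd):
    """Relative hazard for a difference of n_sd standard deviations.

    Assumes the log hazard is linear in the clock value, so a per-SD
    hazard ratio compounds multiplicatively: HR ** n_sd.
    """
    return hazard_ratio_per_sd ** n_sd


# Per-SD hazard ratios as quoted in the text (1.02 and 1.64 are inferred
# from the quoted 2% and 64% increases).
quoted = {
    "Horvath": 1.02,
    "DunedinPACE": 1.64,
    "PhenoAge": 1.14,
    "DunedinPACE (PhenoAge comparison)": 1.57,
}

for clock, hr in quoted.items():
    print(f"{clock}: +{percent_increase(hr):.0f}% per SD, "
          f"x{relative_hazard(hr, 2):.2f} at 2 SD")
```

The point of the extrapolation is only that small per-SD differences between clocks widen quickly for people who sit far from the average.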
Even with that said, we still have newer clocks. Most notable are SystemsAge, which was developed at Yale, and OMICm Age, which was developed by Harvard and TruDiagnostic. Together, these new clocks offer even more features, such as the ability to differentiate between heterogeneous types of aging, and show even stronger associations with disease outcomes.
Lastly, recent analyses have shown associations between these new clocks and independent frailty markers, suggesting that their outputs have clinical importance. A recent report explored the importance of DNA methylation clocks, specifically tools estimating epigenetic age, in the context of spinal deformity surgery (Safaee, Dwaraka, et al. 2023, JNS Spine). Notably, second generation (PC PhenoAge and PC GrimAge) and third generation (DunedinPACE) clocks offered age estimation and have the potential to improve risk assessment for surgical procedures. The results highlight significant associations between epigenetic clocks and markers of frailty and disability. Moreover, patients facing postoperative complications exhibited shorter telomere lengths, suggesting a potential role for aging biomarkers in surgical risk assessment. Integrating biological age into existing risk assessment tools could enhance accuracy and provide valuable insights for a broad range of stakeholders, including patients, surgeons, and healthcare decision-makers.
▐ Methylation is affected by more than just age.
Dr. Attia also helps explain that methylation is affected by many different types of behaviors. The smoking examples are great and actually show that DNAm can be used as a biomarker for many other features, such as smoking status. In fact, studies have shown that DNAm prediction of smoking status can capture more health-related risk than self-reported smoking status.
However, Dr. Attia mentions this as a fault and not a feature. That is because he asks, “do we need to control in the analysis for all features which might change DNAm status?” We certainly believe that the answer is no, as long as the biological clocks are trained on enough biological data.
In fact, this was the reason that OMICm Age was created. Unfortunately, aging is extremely complex. In 2020, there were nine recognized hallmarks of aging. Today, we have already expanded to 14-15 recognized hallmarks of aging, depending on who you ask. This is because, as we study aging more, we are obtaining more resolution on the features which change with age. Thus, in order to capture as much information as possible in a single aging model, Dr. Jessica Lasky-Su at Harvard decided to measure as much information as possible on a select cohort. This included 2,000+ proteins (compared to 88 measured in GrimAge), 1,400+ metabolites, and 61 clinical variables, all of which were put into a single epigenetic age model. In this way, a much fuller picture of measured human biological changes could be incorporated into a single model.
In the figures below, you can see that OMICm Age has higher odds and hazard ratios of disease than any other clock. Additionally, it is predictive of death within 10 years with 90% accuracy, while chronological age is only accurate 75% of the time.
▐ Discerning signal from biological noise.
Noise in biological age clocks, both biological and technical, is another topic that Dr. Attia brings into consideration. This is for good reason, as both have been large limitations of the clocks. However, once again, new techniques and clocks have improved significantly on what Dr. Attia describes in his post.
First, let's discuss the improvements in discerning signals from biological noise. As Dr. Attia mentions, “in order to use biological clocks as a tool for testing geroprotective molecules, it’s also important to understand how day-to-day behaviors may cause acute fluctuations in predicted DNA methylation age on a relatively short timescale, as this fluctuation creates noise that may mask any potential positive or negative changes associated with various interventions.”
We agree and think that studies investigating these changes will tell us a lot about biological clocks. For instance, see this paper from Jesse Poganik and Vadim Gladyshev at Harvard, which shows that biological age can increase rapidly with certain stressors such as pregnancy, viral illness, or trauma. However, the increase is transient.
While some might speculate that the transient rise is something which detracts from biological clocks, we believe this argument is flawed. Almost all biomarkers work by quantifying individuals at a single point in time and using that information to predict a trajectory for a certain outcome (risk). That is also what a biological age clock is meant to do.
Like any biomarker, biological clocks show an outcome trajectory, i.e., they predict the risk of certain aging outcomes. Under the assumption that current behaviors are chronic, a clock measured during a transient rise is still doing what it is meant to do.
To put this in a classical medical example, think of hs-CRP. CRP is an acute-phase protein whose levels can change by large magnitudes in a short time, up to 1,000-fold within a day or two of an acute stimulus. However, this is not a bad thing for clinicians who use CRP. It still provides information at the point of the draw and can be informative when assessing a patient's risk.
In fact, if we take this investigation a little further, we can actually see that DNAm-based biomarkers might offer even more exciting features for biological signals in the context of CRP. For instance, Dr. Ricardo Marioni, an epigenetic researcher from the University of Edinburgh, created DNAm-based predictors of proteins. When he analyzed CRP, he found that DNAm predictors of CRP might actually be more clinically beneficial than classical CRP measurements.
Not only did DNAm prediction of CRP (DNAmCRP) show an association with age which wasn’t present with classical CRP, it also showed higher correlations with multiple diseases, as you can see in the table below.
In another paper, Dr. Horvath mentioned the following: “Another potential benefit of using DNAm-based biomarkers instead of plasma biomarkers is that the DNAm-based biomarkers are representing a longer average estimate of the biomarker concentration and are not as affected by day-to-day variations that could bias the results.”
Thus, if we are worried about transient variation in DNAm clocks giving us wrong information at a single point in time, incorporating traditional plasma biomarkers through epigenetics might improve this. It might even show that these biological clocks can act like HbA1C, that is, as three-month running averages of current aging trajectories.
In fact, OMICm Age and GrimAge have both used this approach. They have incorporated DNAm predictors of other clinical variables to bring biological signals into these clocks. GrimAge did this initially with 8 plasma proteins, and GrimAge2 more recently added DNAmHbA1C and DNAmCRP. The outcome was that GrimAge2 has stronger hazard ratios for disease than the previous version.
The other study Dr. Attia mentions to reference biological noise is the study led by Dr. Kara Fitzgerald on lifestyle interventions. However, he fails to mention that this published study only used a 2013 first generation clock which has high technical variation and captures signals associated with chronological age. It did not use or report data on 2nd or 3rd generation clocks. In fact, when this data was run on the 2nd generation clocks, there was no statistically significant change (results unpublished).
In this section on biological noise, Dr. Attia also mentions a Japanese study in which two healthy men in their 30s had their epigenetic age analyzed 24 times over three months. The result was that the predicted PhenoAge fluctuated by 8-12 years, with daily changes of 5-6 years. Dr. Attia’s direct quote is that “This is a profound amount of biological noise!” However, this is the wrong conclusion. It is unclear whether the changes here are biological at all, because they could be due to technical noise: the original PhenoAge, and all clocks prior to 2021, had very poor technical precision (Higgins-Chen et al. 2022).
When we talk about the precision and accuracy of these epigenetic clocks, we have to define some important terms. The accuracy of the first generation clocks is easy to determine because we are able to see how close the prediction is to someone’s current age. However, for biological clocks, we don’t have a defining outcome to compare against. Thus, generally, the accuracy of a biological clock is determined by how effective it is at predicting aging outcomes.
That is why we look at the hazard ratios and odds ratios of these clocks for disease outcomes and time until death. Technical precision, on the other hand, is relatively easy to gauge. We do this with the intraclass correlation coefficient, or ICC.
In the context of statistics and research, the Intraclass Correlation Coefficient (ICC) is a measure used to assess the reliability or consistency of measurements made by different observers or methods. In layman's terms, ICC helps us understand how much agreement or similarity there is between different sets of measurements. The ICC value ranges from 0 to 1, where:
- **0 indicates no agreement:** The measurements are completely inconsistent or unreliable.
- **1 indicates perfect agreement:** The measurements are entirely consistent or reliable.
For instance, if we measure a blood sample’s biological age separately 10 times, how much error could we see in the prediction of biological age?
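As a rough illustration of how such a repeat-measurement experiment turns into an ICC value, here is a minimal Python sketch of a one-way random-effects ICC. The choice of this particular ICC form and the replicate data are assumptions made for the example; published clock papers may use other ICC variants and real replicate data.

```python
def icc_oneway(measurements):
    """One-way random-effects ICC for repeated measurements.

    measurements: list of rows, one row per sample, each row holding the
    same number (k >= 2) of repeated clock readings for that sample.
    """
    n = len(measurements)
    k = len(measurements[0])
    grand_mean = sum(sum(row) for row in measurements) / (n * k)
    row_means = [sum(row) / k for row in measurements]

    # Between-sample mean square: how much samples differ from each other.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-sample mean square: how much replicates of one sample disagree.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(measurements, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical data: four blood samples, each run three times.
true_ages = [30.0, 45.0, 60.0, 75.0]
precise_clock = [[a - 0.2, a, a + 0.2] for a in true_ages]
noisy_clock = [[a - 6.0, a, a + 6.0] for a in true_ages]

print(f"precise clock ICC: {icc_oneway(precise_clock):.4f}")
print(f"noisy clock ICC:   {icc_oneway(noisy_clock):.4f}")
```

In this made-up data the noisy clock, whose replicates swing by 6 years in either direction, still scores roughly 0.91 because the four samples span 45 years of age. An ICC always has to be read against how different the samples are from one another, which is why values that look high can still hide several years of replicate error.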
This has been a huge problem with clocks in the past. This variance is shown by the ICC values below, which were published in this study by Daniel Belsky from Columbia. You can see that we could have over 20% variance for the clock used in the Japanese study.
However, this changed with this article by Albert Higgins-Chen and Morgan Levine from Yale. They developed a computational solution that uses principal components to improve precision. When it was applied to previous clocks, all of them achieved ICC values above 0.95.
Additionally, in a recent PNAS paper, these PC clocks were shown to have similar relationships to disease outcomes as the earlier clocks. The DunedinPACE didn’t need this correction for precision. Now, in general, the clocks are much more precise. However, you also need to know the clock's level of precision. Otherwise, when you retest you could be reading into the lab's error instead of your own aging.
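One common way to turn an ICC into a retest threshold is through the standard error of measurement, SEM = SD × sqrt(1 − ICC), and the minimal detectable change at roughly 95% confidence, MDC95 = 1.96 × sqrt(2) × SEM. The sketch below applies these standard reliability formulas. The 5-year population standard deviation and the ICC values are hypothetical inputs chosen for illustration, not figures from the studies discussed here.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC): typical size of pure measurement error."""
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change(sd, icc, z=1.96):
    """Smallest test-retest difference unlikely (at ~95%) to be noise alone.

    The sqrt(2) factor accounts for error in both the first and second test.
    """
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


# Hypothetical input: clock values with a 5-year SD across the population.
POPULATION_SD_YEARS = 5.0
for icc in (0.80, 0.95, 0.99):
    mdc = minimal_detectable_change(POPULATION_SD_YEARS, icc)
    print(f"ICC {icc:.2f}: changes under {mdc:.1f} years may be lab noise")
```

Under these assumptions, a retest difference smaller than the printed threshold cannot be distinguished from the lab's own error, which is the practical reason to ask for a clock's ICC before interpreting a change.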
Would we see the same amount of variation in the study if these newer methods were used? We would suspect not.
It is also important to mention that this study used the 450k array, which has been discontinued since 2018. It has fewer and less reliable probes. This platform has since been upgraded to the EPIC850k array, on which most newer clocks were trained. Now, we even have the EPICv2 array, validated in this paper, from which even more poorly performing probes were removed. In addition, it must be noted that DunedinPACE further removed noisy probes during its development by focusing only on probes that showed high reproducibility (Sugden et al. 2020).
Dr. Attia also mentions that the Japanese study illustrates another source of biological noise in epigenetic clocks: variations across different types of cells and tissues. This is absolutely another important point of focus for anyone doing epigenetic clock research, as it has been a noticeable problem. For instance, a recent paper from Alan Tomusiak and Eric Verdin from the Buck Institute showed that “human naïve CD8+ T cells, which decrease in humans during aging, exhibit an epigenetic age 15–20 years younger than effector memory CD8+ T cells isolated from the same individual.”
This is a problem because if someone uses, for instance, caffeine, they could change their cell proportions, and that is probably not really impacting their healthspan or lifespan. However, this issue has been mitigated with two techniques. The first is building clocks that are independent of immune cell subset. This was the aim of the Buck Institute paper above, in which they developed the INTRINClock.
This clock was trained by identifying CpGs that behave consistently across cell types and combining them into a single model. We used this clock in our study on senolytics and it showed no differences from the other clocks.
The other way that it has been solved is with immune deconvolution metrics.
As every cell differentiates from a pluripotent stem cell, it receives its identity from unique methylation signatures. We can identify these signatures in a sample and estimate the relative proportion of each signal, and therefore of each cell type. By incorporating these predictions into the epigenetic clocks, we can control for this variation. In fact, this recent paper from Andrew Teschendorff and TruDiagnostic showed that these deconvolution methods have a high degree of accuracy, with high correlations to flow cytometry and RNA-Seq.
By incorporating these estimates, you can control for immune cell changes in the biological age output and reduce their effect as a confounding variable. This has been done in clocks like GrimAge and OMICm Age, but was not done in the PhenoAge clock used in the study. Also, as our immune deconvolution features get more advanced, we can start to control for more cell types. OMICm Age is the only clock to incorporate a 12 cell type immune deconvolution method.
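The deconvolution idea described above can be sketched in a few lines of Python: a bulk sample's methylation is modeled as a weighted mix of per-cell-type reference signatures, and the weights are recovered by least squares. This is a deliberately simplified illustration and not the method used in the cited paper or in any specific clock. The reference values, the three cell types, and the clip-and-renormalize step are all assumptions made for the example.

```python
def solve_linear(a, b):
    """Solve a small square system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mix(reference, fractions):
    """Bulk methylation at each CpG for a given mix of cell-type fractions."""
    return [sum(r * f for r, f in zip(row, fractions)) for row in reference]


def estimate_cell_fractions(reference, sample):
    """Least-squares estimate of cell-type fractions.

    reference[i][j]: methylation of CpG i in pure cell type j.
    sample[i]: methylation of CpG i in the bulk (mixed) sample.
    Negative estimates are clipped to zero and the result is rescaled to
    sum to 1 (a simple heuristic, not a full constrained solver).
    """
    n_types = len(reference[0])
    ata = [[sum(row[j] * row[k] for row in reference) for k in range(n_types)]
           for j in range(n_types)]
    atb = [sum(row[j] * s for row, s in zip(reference, sample))
           for j in range(n_types)]
    clipped = [max(0.0, w) for w in solve_linear(ata, atb)]
    total = sum(clipped)
    return [w / total for w in clipped]


# Hypothetical reference: 5 CpGs (rows) x 3 cell types (columns).
REFERENCE = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.5],
    [0.3, 0.7, 0.6],
]

bulk = mix(REFERENCE, [0.6, 0.3, 0.1])
print([round(f, 3) for f in estimate_cell_fractions(REFERENCE, bulk)])
```

Once fractions like these are estimated for each sample, they can be included as covariates when a clock is trained or interpreted, which is the sense in which cell composition is "controlled for."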
Thus, to recap on Dr. Attia’s claims on biological noise, most of these have been addressed! The examples used for Dr. Attia’s claims have used outdated methods.
▐ Discerning signal from technical noise.
Dr. Attia also tries to address the technical noise. However, he fails to mention that the ICCs are now extremely precise and in most cases way above the minimum ICC values of CLIA labs for traditional diagnostics.
What he does mention is that DNAm array probes might provide region-specific analysis and not high resolution on individual CpGs. However, as mentioned earlier, this has been improved with new array methods. In addition, just because it is not the best doesn’t mean it isn’t clinically actionable.
Everyone knows WGS provides more accurate information on genetic sequences; however, it is also far more expensive. Performance is an easy critique to make, but it ignores the research that already shows associations with health outcomes, and it doesn’t seem to consider the cost to the end user or the healthcare system. The longevity field is already criticized as being for the super rich; setting a standard that only highly precise sequencing can be used would make that worse.
Additionally, with WGBS you still have technical issues. You have to sequence to at least 100x coverage to match the resolution of arrays. That makes it even more expensive and cost prohibitive. Early-phase liquid biopsy companies like Grail used epigenetic arrays as well, but this was never a critique of their technology.
▐ Missing links for lifespan prediction.
Dr. Attia wraps up his critiques with the largest point. That is “is the ability of an aging clock to predict remaining years of life better than chronological age can?”
As shown above, the answer is definitely yes. In the OMICm Age preprint, OMICm Age was predictive of death within 10 years with 90% accuracy, while chronological age was only accurate 75% of the time. Additionally, PhenoAge, EMRAge, and GrimAge all outperformed the accuracy of chronological age.
Dr. Attia uses an example of this in the following statement: “What does an epigenetic age of 112 mean to a person who is 105 years old? They should already be dead? Conversely, what does an epigenetic age of 70 mean for a 105-year-old?”
This example is a logical fallacy which places a heavy burden on these clocks. Mortality predictors have existed for a long time before epigenetic clocks. In fact, many risk scores in medicine aim to predict relative and absolute risk. Biological age clocks are no different. They are aiming to predict and quantify the risk of age. Just because there is an individual outlier doesn’t mean the risk model is ineffective or not clinically useful. BMI is still used in the classical medical system and in life insurance models when we know that it doesn’t account for fat mass or muscle mass. The output of age is a concept used to show the trajectory of the biggest risk factor of almost every chronic disease and death.
Biological clocks do respond to interventions we know positively affect biological aging (caloric restriction). They are extremely precise, with many newer clocks showing less than 0.5% variation. They are starting to incorporate multiple features of biology in 2nd and 3rd generation clocks, which are trained to predict metabolites, proteins, and clinical values. They are controlling for immune cell characteristics to reduce cell type variation. Lastly, they do show the ability to predict death better than chronological age and even blood-based biomarker methods.