Health insights: how to listen to the biochemical pathways in your microbiome

In a previous blog, we discussed why it is more important to know what your microbes are doing (their biochemical function) than just their names or whether they are present. We mentioned that Viome can detect more than 80,000 active microbial genes (aka functions) in a typical person’s microbiome. Collectively, these microbial functions eventually produce small molecules (called metabolites) that directly or indirectly influence the host. Amidst this complex collection of biochemical activities that is your microbiome, how do we make sense of what specifically may impact your health or illness? How does it all add up to the health scores you see in your Viome app? Read on to find out.

Think of a large public market full of people, perhaps a flea market, shopping hall, or bazaar.  When you listen from a distance, it sounds chaotic, with a lot of random noise.  But as you approach smaller groups of people, cues help you to interpret what is happening. You can more clearly hear what people are saying, feel the dynamics of their interactions, pick up details from their conversations, and appreciate the outcomes of their transactions.  It’s the same with the microbiome – the large collection of biochemical activities may seem chaotic from afar, but when you get closer, you can see the molecules interacting with each other in identifiable patterns and pathways, much like the patterns of transactions that occur in the market. These identifiable pathways produce molecules that impact our health and illness. So let’s get closer and listen to these patterns.

Making sense of microbial interactions

Fortunately for us, thousands of researchers before us have studied the many coordinated activities of biochemical pathways in all kinds of organisms, including microbes. Some of these functional pathways are very well understood and well documented, while others are still being discovered at labs around the world. One example pathway employs several dozen gene products, which interact with each other to biochemically transform carbohydrates from the food you eat into beneficial metabolites, such as short-chain fatty acids like butyrate. Hundreds of these well-understood pathways are documented in one of the most respected knowledge repositories, the Kyoto Encyclopedia of Genes and Genomes, or KEGG [1]. Knowledge repositories like KEGG organize domain research into interpretable pathway maps, and although these repositories offer limited annotations on the role(s) of each molecular function, they are invaluable for making sense of pathways. These resources help us to access and leverage much of the world’s collective knowledge in this scientific domain.

So how does Viome make this collective knowledge work for you? After identifying all the individual genes, functions, and microbes expressed in your microbiome, there are a multitude of ways we could organize your data, such as providing you with a simple count for each of these components (i.e., active microbial transcripts). Our earlier blog discusses these issues in depth, including why Viome’s approach focuses on helping you to know what your microbiome is doing (biochemically). So what if we instead evaluated the coordinated activities across the microbiome and quantified the activity of entire biochemical pathways? For example, if we could quantify the coordinated activity of the several dozen transcripts that together produce butyrate, that would be a very good proxy for the end-product metabolite, butyrate.

[Figure: Meaningful data points cube]

That is precisely what we do at Viome: starting from a carefully curated set of transcripts (functional components) that participate in a particular biochemical pathway, we combine the individual activity levels of each transcript into a combined activity level for the entire biochemical pathway. This combined quantification is then normalized into a score between 0 and 100 to make it easier to understand and interpret. This pathway score essentially provides insight into the potential health impact of that biochemical pathway. Several such interrelated biochemical pathways are then combined further into a higher-level health insight score.
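To make the idea concrete, here is a minimal sketch of combining transcript activity levels into one pathway value and mapping it onto a 0 to 100 scale. The post does not say how the normalization is done, so the percentile-rank approach, the function names, and all the numbers below are assumptions for illustration only, not Viome's actual method.

```python
from bisect import bisect_right


def combined_activity(transcript_levels, weights=None):
    """Weighted sum of per-transcript activity levels for one sample.

    With no weights given, every transcript counts equally (an assumption).
    """
    if weights is None:
        weights = {name: 1.0 for name in transcript_levels}
    return sum(weights[name] * level for name, level in transcript_levels.items())


def normalize_to_cohort(raw_value, cohort_raw_values):
    """Map a raw pathway value to 0-100 as its percentile rank in a cohort."""
    ordered = sorted(cohort_raw_values)
    # Number of cohort members whose raw value is <= this sample's value.
    rank = bisect_right(ordered, raw_value)
    return 100.0 * rank / len(ordered)


# Hypothetical raw pathway activities for a tiny reference cohort.
cohort = [2.0, 3.5, 4.0, 5.5, 7.0]
# Hypothetical transcript activity levels for one sample.
sample = {"gene_a": 1.5, "gene_b": 2.0, "gene_c": 1.0}

raw = combined_activity(sample)            # 1.5 + 2.0 + 1.0 = 4.5
score = normalize_to_cohort(raw, cohort)   # 3 of 5 cohort values <= 4.5 -> 60.0
print(raw, score)
```

A percentile rank is only one way to get a bounded score; any monotonic mapping onto 0 to 100 would fit the description in the text equally well.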

Viome provides a suite of health insight scores that together provide a holistic view of an individual’s health status. Each score represents a key molecular mechanism, selected based on our understanding of the underlying biochemical activities. As an example, consider the Oral Inflammatory Pathways score, which assesses microbial signals that correlate with inflammation of the oral cavity. During the score design process, we examine the relevant scientific and clinical domain, such as the activities associated with microbial colonization, consumption of salivary proteins, and destruction of host tissues in the mouth, and identify the microbial features associated with these phenomena, as shown in the left part of the figure below [2]. These selected features are then combined algorithmically, with weights determined from the first principal component (PC1) of a Principal Component Analysis (PCA). The final score, when applied to a large cohort, ideally exhibits a Gaussian-like distribution, as shown in the figure below.

[Figure: Score example, Oral Inflammatory Pathways]

Isolating biological signals of interest

Let us further break down and reconstruct the underlying principles for development of Viome pathway scores and health insights.  In this section we will focus on step 1, shown in the figures above, which is to “curate a set of molecular features that best model the biological signal of interest.”

Pre-step: Cohort development.  Scores are developed in the context of a general cohort of individuals, each of whom has provided all necessary biological samples – stool, blood, and saliva – at the same time. This cohort represents the adult Viome population and is carefully curated using stringent criteria, with consideration of distributions like birth sex, age, BMI, and dietary breakdown. Additional criteria serve to diminish biases related to sequencing depth or data sparseness. 

Domain exploration. We survey knowledge sources as well as both scientific and clinical literature in order to identify an exhaustive set of features based on their known functions or correlations to relevant metadata. With each iteration, we home in on an initial set of features as well as metadata which we can utilize in downstream processing.

Metadata curation. Numerous kinds of metadata are available through consumer-submitted information in the Viome App, such as health conditions, medications, supplements, clinically validated questionnaires, and other survey questions. Our computational tools facilitate a scalable normalization process, which allows us to leverage any kind of metadata to conduct a variety of mini case/control studies during the iterative score development process. The output of metadata curation includes all relevant metadata and molecular data.

Signal definition. We employ two primary analyses to drive the iterative score design process: 1) an array of case/control analyses; and 2) raw data analyses. The output of signal definition is an initial, very minimal feature space that appropriately stratifies the reference cohort. This minimal feature space is the foundation we build upon for the final feature selection.

Feature selection. The final feature set is defined through a highly iterative curation process, which utilizes all of the features and metadata identified through the above steps. Iteration continues until the score metrics reach a maximum in the case/control analyses, stratifying the cohort so that the selected metadata signals are consistent; this includes checking correlations with richness and sequencing depth. If the metrics are unsatisfactory, the iterative process continues, though occasionally the score design process may reset and begin from the start. All of these iterative steps can be visualized as follows.

[Figure: Score concept to score definition]

Of course, each of these steps (domain exploration, metadata curation, signal definition, and feature selection) is highly involved, and cannot be done without sophisticated computational tools that facilitate and expedite the detailed work. Our MetaData Labeler (MDL) tool transforms raw cohort metadata into curated labels for downstream automation. Our Target Data Pipeline (TDP) outputs a list of candidate features which account for the largest variance across a single case/control scenario. Our Pan-TDP tool visualizes the variance of specific features across multiple case/control scenarios. Our Score Generation Tool (SGT) is our standard tool for generating scores, and it creates simple visualizations, distributions, correlations, loadings and heatmaps. Finally, our Score Iterator Tool (SIT) iterates over long lists of features and visualizes SGT results across multiple case/control scenarios.
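As a rough illustration of what ranking candidate features in a single case/control scenario can look like, the sketch below orders features by a standardized mean difference (Cohen's d) between cases and controls. This is not the TDP tool itself; the choice of effect size, the function names, and the data are all assumptions made for the example.

```python
import math
from statistics import mean, variance


def cohens_d(case_values, control_values):
    """Standardized mean difference between two groups (pooled variance)."""
    n1, n2 = len(case_values), len(control_values)
    pooled_var = ((n1 - 1) * variance(case_values)
                  + (n2 - 1) * variance(control_values)) / (n1 + n2 - 2)
    if pooled_var == 0:
        return 0.0  # no spread in either group; treat as no measurable effect
    return (mean(case_values) - mean(control_values)) / math.sqrt(pooled_var)


def rank_features(expression, labels):
    """Rank features by absolute effect size between 'case' and 'control'.

    expression: {feature_name: [value for each sample]}
    labels:     ["case" or "control" for each sample], same order as values
    """
    ranked = []
    for feature, values in expression.items():
        cases = [v for v, lab in zip(values, labels) if lab == "case"]
        controls = [v for v, lab in zip(values, labels) if lab == "control"]
        ranked.append((feature, cohens_d(cases, controls)))
    return sorted(ranked, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical expression values for two features across six samples.
labels = ["case", "case", "case", "control", "control", "control"]
expression = {
    "feat_y": [2.0, 3.0, 4.0, 2.0, 3.0, 4.0],  # identical in both groups
    "feat_x": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],  # clearly higher in cases
}
ranking = rank_features(expression, labels)
print(ranking)
```

Here `feat_x` ranks first because its case and control means differ by four pooled standard deviations, while `feat_y` shows no difference at all.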

Once the design steps are finished and a set of features is defined, the complete feature set is algorithmically combined. We aggregate the expression of the feature set using a weighted sum, to capture the activity of the overall mechanism. Weights could be determined in many ways; our current method is Principal Component Analysis (PCA), because it can rapidly determine each feature’s contribution to the total variance of expression across a cohort. A full description of this algorithm is beyond the scope of this blog, but may be covered in a future one.
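For readers who want a feel for the weighted sum, here is a small, dependency-free sketch that derives PC1 loadings from a cohort matrix and uses them as feature weights. It is an illustration under assumptions, not Viome's algorithm: the post does not say whether features are centered or scaled, power iteration is used here simply as a compact way to find the first principal component, and the cohort values and function names are hypothetical.

```python
import math


def pc1_weights(matrix, iterations=500):
    """Return (PC1 loadings, feature means) for rows=samples, columns=features."""
    n_samples, n_features = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_samples for j in range(n_features)]
    centered = [[row[j] - means[j] for j in range(n_features)] for row in matrix]
    # Sample covariance matrix of the features.
    cov = [[sum(r[i] * r[j] for r in centered) / (n_samples - 1)
            for j in range(n_features)] for i in range(n_features)]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to the eigenvector with the largest eigenvalue, i.e. PC1.
    # (It fails if the start vector is exactly orthogonal to PC1.)
    vec = [float(j + 1) for j in range(n_features)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(n_features))
               for i in range(n_features)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            raise ValueError("power iteration collapsed: no variance along start vector")
        vec = [v / norm for v in nxt]
    # PCA signs are arbitrary; pick the orientation with a non-negative sum.
    if sum(vec) < 0:
        vec = [-v for v in vec]
    return vec, means


def weighted_score(sample, weights, means):
    """Weighted sum of mean-centered feature values (projection onto PC1)."""
    return sum(w * (x - m) for w, x, m in zip(weights, sample, means))


# Hypothetical cohort: 4 samples x 3 features (expression levels).
cohort = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 7.8, 0.5],
]
weights, means = pc1_weights(cohort)
raw_scores = [weighted_score(row, weights, means) for row in cohort]
print(weights)
print(raw_scores)
```

Features that vary together across the cohort receive large loadings of the same sign, while a nearly constant feature (the third column here) contributes very little to the combined value.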

So what does a pathway score really mean?

As we’ve said before, each score is a measure of the activity of biological pathways, functions, or systems of interest. A biological pathway is a set of molecular interactions in your body which have a particular function that could determine your health status. Using domain-driven and data-driven analysis of our metatranscriptomic data, our goal is to capture a biological signal that correlates with meaningful health and disease phenotypes.

So for example, oral inflammatory pathways should be correlated with oral health conditions associated with inflammation. An increased Oral Inflammatory Pathways score reflects increased activity of transcripts related to inflammation of the gums. Periodontopathic bacteria, typically present at low abundance, exert inflammatory effects which may disrupt the oral microbiota, overwhelm the host’s immune response, and result in the development of oral diseases [3]. For example, the three species that comprise the “red complex” (Porphyromonas gingivalis, Treponema denticola, and Tannerella forsythia) are acknowledged as the most important pathogens in various oral health issues [4]. All three are usually found together, suggesting a cooperative role in tissue inflammation and disease. The literature also suggests that the presence of oral inflammation is associated with systemic conditions such as metabolic syndrome [5].

[Figure: Oral Inflammatory Pathways Score]

Based on these factors, we show here the association between the Oral Inflammatory Pathways score and the self-reported oral health issues (n=192) and metabolic health issues (n=152), within the Viome customer population.
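One simple way to put a number on this kind of association is the probability that a randomly chosen person who reports the condition has a higher score than a randomly chosen person who does not (equivalent to the area under the ROC curve). The sketch below shows that calculation; it is not necessarily the statistic behind the figure, and the scores are made up for illustration.

```python
def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control.

    Ties count as half a win. 0.5 means no separation, 1.0 means every
    case scores above every control.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical pathway scores (0-100) for people with and without a
# self-reported oral health issue.
with_issue = [70, 65, 80]
without_issue = [50, 65, 40]

# 8 clear wins plus one tie (65 vs 65) out of 9 pairs -> 8.5 / 9
separation = auc(with_issue, without_issue)
print(round(separation, 3))
```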

From pathway scores to aggregate health insights

Throughout this blog, we have used Oral Inflammatory Pathways as a specific illustrative example of a molecular pathway. This is one of dozens of molecular pathways that Viome evaluates to determine your health status. Once we quantify multiple pathways, we aggregate them into “functional scores”. For example, we combine the Oral Inflammatory Pathways score with the Oral Mucin Degradation Pathway score to quantify the Gum Health functional score. Similarly, we combine the scores for Cavity Promoting Pathways and Cavity Promoting Microbes to determine the Dental Health functional score, and the Oral Sulfide Production Pathways score and Oral Polyamine Production Pathway score to determine the Breath Odor functional score. Finally, the Gum Health score, the Dental Health score, and the Breath Odor score are aggregated into the top-level Oral Health integrative score, which provides an overall assessment.
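The hierarchy described above can be sketched as a small tree that is aggregated from the bottom up. The score names come from this post, but the aggregation function is not specified here, so the plain average used below is only a placeholder assumption, and the pathway values are hypothetical.

```python
# Top-level score -> functional scores -> pathway scores (names from the post).
ORAL_HEALTH = {
    "Gum Health": [
        "Oral Inflammatory Pathways",
        "Oral Mucin Degradation Pathway",
    ],
    "Dental Health": [
        "Cavity Promoting Pathways",
        "Cavity Promoting Microbes",
    ],
    "Breath Odor": [
        "Oral Sulfide Production Pathways",
        "Oral Polyamine Production Pathway",
    ],
}


def average(values):
    """Placeholder aggregation: the real combining function is not public."""
    return sum(values) / len(values)


def aggregate(node, pathway_scores, combine=average):
    """Recursively combine scores: strings are leaves, dicts/lists are groups."""
    if isinstance(node, str):
        return pathway_scores[node]
    children = node.values() if isinstance(node, dict) else node
    return combine([aggregate(child, pathway_scores, combine) for child in children])


# Hypothetical pathway scores for one person.
pathway_scores = {
    "Oral Inflammatory Pathways": 40,
    "Oral Mucin Degradation Pathway": 60,
    "Cavity Promoting Pathways": 70,
    "Cavity Promoting Microbes": 50,
    "Oral Sulfide Production Pathways": 50,
    "Oral Polyamine Production Pathway": 30,
}

functional = {name: aggregate(group, pathway_scores)
              for name, group in ORAL_HEALTH.items()}
oral_health = aggregate(ORAL_HEALTH, pathway_scores)
print(functional)     # Gum Health 50.0, Dental Health 60.0, Breath Odor 40.0
print(oral_health)    # 50.0
```

Passing a different `combine` function (for example a weighted mean or a minimum) changes the aggregation rule without touching the tree itself.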

The figure below illustrates this aggregation. In this example, this person’s Oral Health score of 53 is deemed “not optimal” since one of its components, Breath Odor, is significantly not optimal, due to their Oral Polyamine Production Pathway score and an average Oral Sulfide Production Pathway score. Note, however, that this person’s Gum Health score is average, since it combines a not-optimal Oral Inflammatory Pathways score with a good Oral Mucin Degradation Pathways score. All of these scores are available today in your Viome App from your saliva sample!

(The specific mathematical functions used for the aggregation of scores, represented by the symbol “⊕” in the figure, are beyond the scope of this blog.)

[Figure: Health functional pathway scores]

In conclusion

This blog has covered the following key points:

  • A Viome molecular score is a measure of the activity of biological pathways, functions or systems

  • A score is built by curating molecular features, combining them algorithmically, and normalizing them to a reference cohort

  • Curating molecular features is a sophisticated process of domain exploration, metadata curation, signal definition, and feature selection

  • Since each score is associated with a specific biological function, it is correlated with a specific health area like oral health or metabolic health

  • Pathway scores are aggregated into functional scores, which are further aggregated into health insight scores in your Viome App

There are many other topics related to scores that we will write about in the future: for example, how a pathway score changes over time, or how scores not only tell you about your health but also form the basis of your recommendations. Please stay tuned!


1. M. Kanehisa and S. Goto, Nucleic Acids Res 28, 27 (2000).
2. N. S. Jakubovics, S. D. Goodman, L. Mashburn‐Warren, G. P. Stafford, and F. Cieplik, Periodontol. 2000 86, 32 (2021).
3. P. M. Bartold and T. E. Van Dyke, Periodontol. 2000 62, 203 (2013).
4. C. Bodet, F. Chandad, and D. Grenier, Pathol. Biol. 55, 154 (2007).
5. F. Q. Bui, C. L. C. Almeida-da-Silva, B. Huynh, A. Trinh, J. Liu, J. Woodward, H. Asadi, and D. M. Ojcius, Biomed. J. 42, 27 (2019).