Summary: | Linking microbiome composition obtained from metagenomic or 16S rRNA sequencing to various factors is challenging. The compositional approach to such data is well described: the isometric log-ratio (ILR) transform provides correct treatment of relative abundances. Most existing compositional methods differ in the particular choice of the transform. Although this choice does not influence the predictions of a model, it determines the subset of balances between groups of microbial taxa that is subsequently used to interpret shifts in composition. We propose a method that interprets these shifts independently of the initial choice of ILR coordinates by finding the nearest single-balance shift. We describe the application of the method to regression, classification, and principal balance analysis of compositional data. Analytical treatment and cross-validation show that the approach provides the least-squares estimate of a single-balance shift associated with a factor, with possible adjustment for covariates. For classification and principal balance analysis, the nearest balance method provides results comparable to those of other compositional tools. Its advantages are the absence of assumptions about the number of taxa included in the balance and its low computational cost. The method is implemented in the R package NearestBalance. IMPORTANCE: The method proposed here extends the range of compositional methods that provide interpretation of classical statistical tools applied to data converted to ILR coordinates. It provides a strictly optimal solution in several special cases. The approach is universally applicable to compositional data of any nature, including microbiome data sets. © 2022 Odintsova et al.
|
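To make the notion of a balance between two groups of taxa concrete, the following is a minimal sketch in Python using the standard definition of a balance (a scaled log-ratio of the geometric means of the two groups). It is not the API of the NearestBalance R package and it does not implement the nearest balance search itself; the function names `geometric_mean` and `balance`, the taxon names, and the abundances are hypothetical and chosen only for illustration.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def balance(composition, numerator, denominator):
    """Balance between two disjoint groups of parts.

    Standard definition: sqrt(r*s / (r+s)) * ln(g(numerator) / g(denominator)),
    where r and s are the group sizes and g is the geometric mean.
    Assumes all abundances are strictly positive; zeros would need to be
    replaced beforehand, which this sketch does not do.
    """
    r, s = len(numerator), len(denominator)
    g_num = geometric_mean([composition[t] for t in numerator])
    g_den = geometric_mean([composition[t] for t in denominator])
    return math.sqrt(r * s / (r + s)) * math.log(g_num / g_den)

# Hypothetical relative abundances for one sample (illustrative values only).
sample = {
    "Bacteroides": 0.40,
    "Prevotella": 0.10,
    "Faecalibacterium": 0.30,
    "Roseburia": 0.20,
}
group_a = ["Bacteroides", "Prevotella"]
group_b = ["Faecalibacterium", "Roseburia"]

b = balance(sample, group_a, group_b)

# Rescaling the whole sample (e.g. relative abundances to read counts)
# leaves the balance unchanged, because only ratios between parts matter.
counts = {taxon: 1000 * share for taxon, share in sample.items()}
b_counts = balance(counts, group_a, group_b)

print(b, b_counts)
```

The second call illustrates why log-ratio coordinates suit relative abundances: the balance depends only on ratios between taxa, so the total of the sample does not affect it.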