Summary: | Abstract Objectives Improvements in bioinformatics applications for the enzyme identification of biochemical reactions, enzyme classifications, mining for specific inhibitors and pathfinding require the accurate computational detection of reaction similarity. We provide a set of substrate-product pairs, clustered by reactions that share similar chemical transformation patterns, for which accuracy was calculated, comparing this set with manually curated data sets. Data description The data were analyzed by a new method that naturally split each reaction into compound pairs and loner compounds, which we called architectures (Vazquez-Hernandez et al. in BMC Syst Biol 12:63, 2018). The data include a set of 7491 curated reactions from the KEGG-Ligand data set. The data are presented in two formats, a string format and a tree structure, both of which reflect the splitting process and the final architectures of each reaction. We are also reporting sets of reactions that show similar splitting patterns naturally grouped into clusters of tree structures. The compound pairs in each cluster were compared with the reactant pairs proposed by the KEGG-RCLASS data set, and a match precision value is also provided. These data were collected with the aim of providing research with a confident set of reactant pairs that is useful for selecting between alternative substrate-product pairs predicted by pathfinders.
|