Summary: | Nonribosomal peptides are a large and diverse group of bioactive natural compounds produced by microorganisms. Because they are synthesized by large assembly lines called NonRibosomal Peptide Synthetases (NRPSs), they often display complex structures with cycles and branches. Moreover, they often contain nonproteinogenic or modified monomers, such as the D-monomers produced by epimerization. Here we investigate sequence specificities of the condensation (C) and epimerization (E) domains of NRPSs that can be used to predict the likely isomeric state (D or L) of each monomer in a putative peptide. We show that C- and E-domains can be divided into two sub-regions called Up-Seq and Down-Seq. The Up-Seq region corresponds to an InterPro domain (IPR001242) and is shared by C- and E-domains. The Down-Seq region is specific to the enzymatic activity of the domain. Amino-acid signatures (represented as sequence logos) previously described for complete C- and E-domains have been restricted to the Down-Seq region and strengthened using additional sequences. Moreover, a new Down-Seq signature has been found for the Ct-domains, which occur in fungi and are responsible for terminal cyclization of the peptides. The identification of these signatures has been included in a workflow named Florine, designed to predict nonribosomal peptides from NRPS sequence analyses. In some cases, the prediction of the isomeric state is guided by genus-specific rules. Applied to a Pseudomonas genome, Florine allowed us to determine the type of pyoverdin produced, update the structure of syringafactin and identify novel putative products.
|