Summary: Decoding the genomes of organisms spanning all taxa provides the foundation for extensive, large-scale studies of biological molecules such as RNA, proteins and carbohydrates. The high-throughput studies made possible by these genome sequences necessitate new analytical methods for interpreting large sets of results. The work herein focuses on the development of a novel clustering method for the analysis of protein array results and examines its application to integrated interaction data sets. Sets of proteins that interact with a molecule of interest were clustered according to their functional similarity. The simUI distance metric from the BioConductor statistical analysis package was used to measure the similarity of two proteins based on the overlap of their Gene Ontology annotation graphs. Clusters were identified by partitioning around medoids and interpreted using the summary label provided by the Gene Ontology annotation of each medoid. The utility of the method was tested on two published yeast protein array data sets and shown to support interpretations of the data that yield novel biological hypotheses. We performed a protein array screen using LNX1, an E3 ubiquitin ligase and PDZ domain-containing protein. Combining these results with previously published LNX1 interactors produced a set of 220 proteins, which was clustered according to Gene Ontology annotation. From the clustering results, 14 proteins were selected for subsequent examination by co-immunoprecipitation, of which 8 were confirmed as LNX1 interactors. Recognition of 6 of these proteins by specific LNX1 PDZ domains was confirmed by fusion-protein pull-downs. This work supports the role of LNX1 as a signalling scaffold. Interpreting the protein array results with our novel clustering method facilitated the identification of candidate molecules for subsequent experimental analysis. Our analytical method thus enables the identification of biologically relevant molecules within a large data set, making it an essential component of complex, high-throughput experimentation.
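As a concrete illustration, the sketch below shows how such a pipeline could be assembled in R with BioConductor, assuming the GOstats package for the simUI metric and the cluster package for partitioning around medoids. The gene identifiers, the yeast annotation package name ("org.Sc.sgd.db"), the choice of ontology and the makeGOGraph arguments are illustrative assumptions, not the thesis's actual code or data; argument names should be checked against the installed GOstats version.

    ## Minimal sketch, assuming GOstats and cluster are installed.
    library(GOstats)
    library(cluster)

    ## Hypothetical yeast ORF identifiers standing in for an interactor set.
    genes <- c("YAL001C", "YBR001C", "YCL002W")

    ## Induced GO graph for each protein's Biological Process annotations;
    ## the chip/annotation package is an assumed mapping source.
    go.graphs <- lapply(genes, function(g)
        makeGOGraph(g, Ontology = "BP", chip = "org.Sc.sgd.db"))

    ## Pairwise simUI similarity (size of the intersection over the size of
    ## the union of the two induced GO graphs), converted to a distance.
    n <- length(genes)
    d <- matrix(0, n, n, dimnames = list(genes, genes))
    for (i in seq_len(n))
        for (j in seq_len(n))
            d[i, j] <- 1 - simUI(go.graphs[[i]], go.graphs[[j]])

    ## Partition around medoids on the precomputed distance matrix; each
    ## cluster would then be summarised by its medoid's GO annotation.
    fit <- pam(as.dist(d), k = 2, diss = TRUE)
    fit$medoids      # representative proteins labelling each cluster
    fit$clustering   # cluster assignment for every protein

In practice k, the number of clusters, would be chosen by inspecting cluster quality (for example, average silhouette widths reported by pam) rather than fixed in advance as in this toy example.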