Summary: | Thesis (MSc)--Stellenbosch University, 2014. === ENGLISH ABSTRACT: This thesis explores the use of networks as a means to visualise, interpret and
mine MS-based proteomics data.
A network-based approach was applied to a quantitative, cross-species LC-MS/MS
dataset derived from two yeast species, namely Saccharomyces cerevisiae
strain VIN13 and Saccharomyces paradoxus strain RO88.
In order to identify and quantify proteins from the mass spectra, a workflow
consisting of both custom-built and existing programs was assembled. Networks
which place the identified proteins in several biological contexts were
then constructed. The contexts included sequence similarity to other proteins,
ontological descriptions, protein-protein interactions, metabolic pathways and
cellular location.
The contextual, network-based representations of the proteins proved effective
for identifying trends and patterns in the data that may otherwise have
been obscured. Moreover, by bringing the experimentally derived data together
with multiple extant biological resources, the networks presented the
data in a manner that better reflects the interconnected biological system
from which the samples were derived. Both existing and new hypotheses concerning
proteins related to the yeast cell wall and proteins of putative oenological
potential were investigated in light of their differential expression between
the two yeast species. The cell wall proteins examined included GGP1 and SCW4,
and the proteins of putative oenological potential included haze protection
factor proteins such as HPF2. Differences in capacity for maloethanolic
fermentation between the two strains were also investigated in light
of the protein data. The network-based representations also allowed new hypotheses
to be formed around proteins that were identified in the dataset, but
were of unknown function. === AFRIKAANS ABSTRACT (English translation): This study explores the use of networks to visualise,
interpret and mine proteomic data.
A network-based approach was followed to analyse a quantitative LC-MS/MS
dataset originating from two yeast species, namely Saccharomyces
cerevisiae strain VIN13 and Saccharomyces paradoxus strain RO88.
The mass spectra were processed with existing and custom-written computer programs,
assembled into a workflow for identifying and quantifying the proteins concerned.
These proteins were then linked to existing biological databases
to place them in biological context. The contextualised proteins
were then used to build biological networks from the data. The
contexts considered included, among others, the localisation of cellular activities,
ontological descriptions, similarities in amino acid sequences and interactions
with known proteins, as well as associations and links with metabolic pathways.
This contextual, network-based representation of the proteins concerned
effectively revealed clear trends and patterns in the data that would otherwise
not have been noticeable. In addition, combining the experimental
data with existing biological resources lent a better perspective to the data
analysis. Both existing and new hypotheses regarding yeast cell wall proteins and
proteins with possible oenological potential were investigated in light of their
differential expression in the two yeast species. Examples investigated include cell wall proteins such as GGP1 and SCW4, as well as the haze protection factor protein HPF2. Differences in capacity for maloethanolic fermentation were also
found. The network-based representations also gave rise to the
formulation of new hypotheses regarding proteins in the dataset whose functions
are currently unknown.
|