Density-Based Clustering of High-Dimensional DNA Fingerprints for Library-Dependent Microbial Source Tracking

As part of an ongoing multidisciplinary effort at California Polytechnic State University, biologists and computer scientists have developed a new Library-Dependent Microbial Source Tracking method for identifying the host animals causing fecal contamination in local water sources. The Cal Poly Libr...

Bibliographic Details
Main Author: Johnson, Eric
Format: Others
Published: DigitalCommons@CalPoly 2015
Online Access: https://digitalcommons.calpoly.edu/theses/1511
https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=2651&context=theses
id ndltd-CALPOLY-oai-digitalcommons.calpoly.edu-theses-2651
record_format oai_dc
spelling ndltd-CALPOLY-oai-digitalcommons.calpoly.edu-theses-2651 2019-10-24T15:16:55Z
2015-12-01T08:00:00Z text application/pdf https://digitalcommons.calpoly.edu/theses/1511 https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=2651&context=theses Master's Theses and Project Reports DigitalCommons@CalPoly
collection NDLTD
format Others
sources NDLTD
description As part of an ongoing multidisciplinary effort at California Polytechnic State University, biologists and computer scientists have developed a new Library-Dependent Microbial Source Tracking method for identifying the host animals that cause fecal contamination in local water sources. The Cal Poly Library of Pyroprints (CPLOP) is a database that stores representations of E. coli from fecal samples of known hosts, obtained through Pyroprinting, a novel method developed by the biologists. The research group considers E. coli samples whose Pyroprints match above a certain threshold to be part of the same bacterial strain. If an environmental sample from an unknown host animal matches one of the strains in CPLOP, then the host of the unknown sample is likely the same species as one of the hosts in which the strain was previously found. The computer science technique for finding groups of related data (i.e., strains) in a data set is called clustering. In this thesis, we evaluate the use of density-based clustering for identifying strains in CPLOP. Density-based clustering finds clusters of points that each have a minimum number of other points within a given radius. We contribute a clustering algorithm, based on the original DBSCAN algorithm, that removes points from the search space after they have been seen once. We also present a new method for comparing Pyroprints that is algebraically related to the current method. The new method has mathematical properties that make it possible to use Pyroprints in a spatial index we designed especially for Pyroprints, which the DBSCAN algorithm can use to speed up clustering.
author Johnson, Eric
title Density-Based Clustering of High-Dimensional DNA Fingerprints for Library-Dependent Microbial Source Tracking
publisher DigitalCommons@CalPoly
publishDate 2015
url https://digitalcommons.calpoly.edu/theses/1511
https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=2651&context=theses
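
The abstract describes density-based clustering as finding points that have a minimum number of other points within a given radius. A minimal sketch of textbook DBSCAN in Python may make that idea concrete. It is not the thesis's modified algorithm, its Pyroprint comparison method, or its spatial index: the Euclidean distance, the brute-force neighbor search, and the names `dbscan`, `region_query`, `eps`, and `min_pts` are all assumptions chosen for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN sketch (assumed Euclidean distance, brute-force search).

    A point is a core point when its eps-neighborhood, counting the point
    itself, holds at least min_pts points. Returns one label per point.
    """
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point, keep growing
                queue.extend(j_neighbors)
    return labels


# Hypothetical 2-D data: two tight groups and one isolated point.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(sample, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Each call to `region_query` scans every point, so this sketch does quadratic work overall. That repeated neighborhood search is the step a spatial index, such as the one the abstract mentions, is meant to speed up.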