Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning.
Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may increasingly misclassify images as the signal-to-noise ratio (SNR) of the image data decreases, yet they demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization for a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement in ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.
Main Authors: | Jiayi Wu, Yong-Bei Ma, Charles Congdon, Bevin Brett, Shuobing Chen, Yaofang Xu, Qi Ouyang, Youdong Mao |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2017-01-01 |
Series: | PLoS ONE |
Volume (Issue): | 12 (8), article e0182130 |
ISSN: | 1932-6203 |
DOI: | 10.1371/journal.pone.0182130 |
Online Access: | http://europepmc.org/articles/PMC5546606?pdf=render |
Record ID: | doaj-098e06f2fba349da8fd80c7ffac4ff0e |
Collection: | DOAJ |