Accurate wisdom of the crowd from unsupervised dimension reduction

Bibliographic Details
Main Authors: Lingfei Wang, Tom Michoel
Format: Article
Language: English
Published: The Royal Society 2019-07-01
Series: Royal Society Open Science
Online Access: https://royalsocietypublishing.org/doi/pdf/10.1098/rsos.181806
Description
Summary: Wisdom of the crowd, the collective intelligence from responses of multiple human or machine individuals to the same questions, can be more accurate than each individual and improve social decision-making and prediction accuracy. Crowd wisdom estimates each individual’s error level and minimizes the overall error in the crowd consensus. However, with problem-specific models mostly concerning binary (yes/no) predictions, crowd wisdom remains overlooked in biomedical disciplines. Here we show, in real-world examples of transcription factor target prediction and skin cancer diagnosis, and with simulated data, that the crowd wisdom problem is analogous to one-dimensional unsupervised dimension reduction in machine learning. This provides a natural class of generalized, accurate and mature crowd wisdom solutions, such as PCA and Isomap, that can handle binary and also continuous responses, like confidence levels. They even outperform supervised-learning-based collective intelligence that is calibrated on historical performance of individuals, e.g. random forest. This study unifies crowd wisdom and unsupervised dimension reduction, and extends its applications to continuous data. As the scales of data acquisition and processing rapidly increase, especially in high-throughput sequencing and imaging, crowd wisdom can provide accurate predictions by combining multiple datasets and/or analytical methods.
ISSN:2054-5703
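
Illustrative note: the analogy stated in the summary, crowd consensus as one-dimensional unsupervised dimension reduction, can be sketched in a few lines. The example below is not the authors' implementation. It is a minimal sketch that assumes the first principal component of the standardized responses serves as the consensus, computed by power iteration on the correlation matrix. The function names (`crowd_consensus`, `pearson`), the noise levels, and the simulated data are all hypothetical and chosen only for illustration. The sign of the component is fixed by assuming that most individuals agree with the truth more often than not.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def crowd_consensus(responses, iterations=200):
    """responses[i][j] is individual j's answer to question i.
    Returns (weights, consensus): the first principal component of the
    standardized responses and the projection of each question onto it."""
    n, m = len(responses), len(responses[0])
    # Standardize each individual's responses (zero mean, unit variance).
    z = []
    for j in range(m):
        col = [row[j] for row in responses]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / (n - 1)) ** 0.5
        z.append([(v - mean) / sd for v in col])
    # Correlation matrix between individuals.
    corr = [[sum(a * b for a, b in zip(z[j], z[k])) / (n - 1)
             for k in range(m)] for j in range(m)]
    # Power iteration converges to the leading eigenvector.
    w = [1.0 / m ** 0.5] * m
    for _ in range(iterations):
        v = [sum(corr[j][k] * w[k] for k in range(m)) for j in range(m)]
        norm = sum(x * x for x in v) ** 0.5
        w = [x / norm for x in v]
    # Assumption: the majority is informative, so weights should sum positive.
    if sum(w) < 0:
        w = [-x for x in w]
    consensus = [sum(w[j] * z[j][i] for j in range(m)) for i in range(n)]
    return w, consensus

# Hypothetical simulation: five individuals with different error levels
# give continuous answers to 1000 questions with a hidden true value.
random.seed(0)
noise_levels = [0.8, 1.0, 1.2, 1.5, 2.0]
truth = [random.gauss(0, 1) for _ in range(1000)]
responses = [[t + s * random.gauss(0, 1) for s in noise_levels] for t in truth]

weights, consensus = crowd_consensus(responses)
individual_corr = [pearson([row[j] for row in responses], truth)
                   for j in range(len(noise_levels))]
consensus_corr = pearson(consensus, truth)

print("weights:", [round(w, 3) for w in weights])
print("individual correlations with truth:", [round(c, 3) for c in individual_corr])
print("consensus correlation with truth:", round(consensus_corr, 3))
```

No truth labels are used when computing the weights; the hidden values only score the result afterwards. In this simulated setting the less noisy individuals should receive larger weights. Standardizing each individual first is a deliberate choice in the sketch: without it, the noisiest individual could dominate the leading component simply by having the largest variance.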