Summary: We examined the
trade-off between the cost of response redundancy and the gain in output
quality on the popular crowdsourcing platform Mechanical Turk, as a partial
replication of Kosinski et al. (2012), who demonstrated a significant
improvement in performance by aggregating multiple responses through majority
vote. We submitted single items from a validated intelligence test as Human
Intelligence Tasks (HITs) and aggregated the responses from “virtual groups”
consisting of 1 to 24 workers. While the original study relied on resampling
from a relatively small number of responses across a range of experimental
conditions, we randomly and independently sampled from a large number of HITs,
focusing only on the main effect of group size. We found that, on average, a
group of six MTurkers has a collective IQ one standard deviation above the mean
for the general population, thus demonstrating a “wisdom of the crowd” effect.
The relationship between group size and collective IQ was characterised by
diminishing returns, suggesting that moderately sized groups provide the best return
on investment. We also analysed the performance of a smaller subset of workers who
had each completed all 60 test items, allowing for a direct comparison between
a group’s collective IQ and the individual IQ of its members. This demonstrated
that randomly selected groups collectively equalled the performance of the
best-performing individual within the group. Our findings support the idea that
substantial intellectual capacity can be gained through crowdsourcing,
provided that moderate redundancy is built into the task request.
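The aggregation scheme summarised above (majority vote over "virtual groups", accuracy as a function of group size, and a group-versus-best-member comparison) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' analysis code. The function names `majority_vote`, `virtual_group_accuracy`, `group_vs_best_member` and `score_to_iq` are hypothetical. Ties are assumed to be broken uniformly at random. The IQ conversion assumes the usual scale (mean 100, SD 15) and a reference mean and SD (`norm_mean`, `norm_sd`) taken from the test's norms, which this summary does not report. The demo data are simulated with an assumed individual accuracy of 0.6 and are not results from the study.

```python
import random
from collections import Counter


def majority_vote(answers, rng):
    """Most frequent answer; ties broken uniformly at random (an assumption)."""
    counts = Counter(answers)
    top = max(counts.values())
    tied = sorted(a for a, c in counts.items() if c == top)
    return rng.choice(tied)


def virtual_group_accuracy(responses, key, group_size, n_groups, rng):
    """Proportion of items answered correctly by randomly sampled virtual groups.

    responses: dict item -> list of individual answers (one per HIT)
    key: dict item -> correct answer
    """
    correct = trials = 0
    for item, answers in responses.items():
        for _ in range(n_groups):
            # Draw group_size distinct responses to this item, then aggregate them.
            group = rng.sample(answers, group_size)
            correct += majority_vote(group, rng) == key[item]
            trials += 1
    return correct / trials


def group_vs_best_member(answer_matrix, key, rng):
    """Compare a fixed group's majority-vote score with its best member's score.

    answer_matrix: one list of answers per worker, each covering every item.
    key: list of correct answers in the same item order.
    """
    individual = [sum(a == k for a, k in zip(row, key)) for row in answer_matrix]
    group = sum(
        majority_vote([row[i] for row in answer_matrix], rng) == key[i]
        for i in range(len(key))
    )
    return group, max(individual)


def score_to_iq(score, norm_mean, norm_sd):
    """Map a raw score to the IQ scale, assuming normal reference norms."""
    return 100 + 15 * (score - norm_mean) / norm_sd


if __name__ == "__main__":
    rng = random.Random(42)
    options = ["A", "B", "C", "D"]
    key = {f"item{i}": rng.choice(options) for i in range(60)}
    # Hypothetical pool: 100 responses per item, each correct with probability 0.6.
    responses = {}
    for item, correct in key.items():
        wrong = [o for o in options if o != correct]
        responses[item] = [
            correct if rng.random() < 0.6 else rng.choice(wrong) for _ in range(100)
        ]
    for k in (1, 3, 6, 12, 24):
        acc = virtual_group_accuracy(responses, key, k, n_groups=200, rng=rng)
        print(f"group size {k:2d}: proportion correct {acc:.3f}")
```

Printing proportion correct against group size is one way to see the diminishing returns described above, although with simulated data the shape of the curve reflects only the assumed individual accuracy. Random tie-breaking matters most for small even-sized groups, where ties are common.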