Bayesian Bias Mitigation for Crowdsourcing

Wednesday, May 9, 2012 - 1:30pm - 2:30pm
Keller 3-180
Fabian Wauthier (University of California, Berkeley)
Biased labelers are a systemic problem in crowdsourcing, and a
comprehensive toolbox for handling their responses is still being
developed. A typical crowdsourcing application can be divided into
three steps: data collection, data curation, and learning. At present,
these steps are often treated separately. We present Bayesian Bias
Mitigation for Crowdsourcing (BBMC), a Bayesian model that unifies all
three. Most data curation methods account for the effects of
labeler bias by modeling all labels as coming from a single latent
truth. Our model captures the sources of bias by describing
labelers as influenced by shared random effects. This approach can
account for more complex bias patterns that arise in ambiguous or hard
labeling tasks and allows us to merge data curation and learning into
a single computation. Active learning integrates data collection with
learning, but is commonly considered infeasible with Gibbs sampling
inference. We propose a general approximation strategy for Markov
chains to efficiently quantify the effect of a perturbation on the
stationary distribution and specialize this approach to allow active
learning with Gibbs sampling in our model. Experiments show that BBMC
outperforms many common heuristics when a useful consensus labeling
cannot be estimated.
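The abstract refers to quantifying "the effect of a perturbation on the stationary distribution" of a Markov chain. The sketch below makes that phrase concrete for a small finite chain. It solves exactly for the stationary distributions of a transition matrix P and a perturbed matrix P' and compares them in total-variation distance. This is only a brute-force reference point, not the approximation strategy proposed in the talk. The function names `stationary_distribution` and `perturbation_effect` are hypothetical, and so is the choice of total variation as the measure. One way to read the active-learning use case, which is our interpretation, is to treat P as a Gibbs sampler's transition kernel and P' as the kernel after a candidate new label is added.

```python
import numpy as np

def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Stationary distribution pi of a row-stochastic matrix P, i.e. pi @ P == pi.

    Assumes the chain is irreducible, so pi is unique.
    """
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the constraint sum(pi) = 1;
    # the overdetermined system is consistent, so least squares returns the exact solution.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def perturbation_effect(P: np.ndarray, P_perturbed: np.ndarray) -> float:
    """Total-variation distance between the stationary distributions of two chains."""
    pi = stationary_distribution(P)
    pi_new = stationary_distribution(P_perturbed)
    return 0.5 * float(np.abs(pi - pi_new).sum())

# Hypothetical two-state example: the perturbation raises one transition probability.
P = np.array([[0.9, 0.1], [0.5, 0.5]])
P_new = np.array([[0.8, 0.2], [0.5, 0.5]])
print(stationary_distribution(P))    # analytically [5/6, 1/6]
print(perturbation_effect(P, P_new)) # analytically 5/42, about 0.119
```

An exact solve like this needs the full transition matrix. For a Gibbs sampler over continuous model parameters, that matrix cannot be enumerated, which is why an efficient approximation is needed there.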