The goal of the proposed research is to develop proactive approaches for handling systemic biases in machine learning datasets by addressing the following question: Can we build crowdsourcing systems that are robust to the subjective biases of human crowd-workers, so that datasets derived from such systems are bias-free (and thereby lead to unbiased ML models)? In other words, we propose to build robust crowdsourcing systems that generate bias-free datasets at the time of creation. As a result, ML algorithms trained on such datasets in the future would not need to correct for bias during training. More specifically, we propose to achieve this goal by designing optimal ways of incentivizing (or dissuading) human crowd-workers so that it is in their best interest (from a utilitarian perspective) not to let their personal biases affect their annotations, thereby resulting in bias-free data.
We propose to develop a general-purpose framework for the comprehensive study of robust crowdsourcing systems (RCS), from which we can derive a suite of theoretically sound algorithms that ensure robust outputs from such crowdsourcing systems in a variety of settings. Our algorithms and models are intended to operate with boundedly rational crowd-workers and to remain robust to a variety of real-world uncertainties. Specifically, we propose to conduct research on two distinct methodologies for developing RCS, each of which has its own research challenges. Depending on the application domain, one of these approaches may be more suitable than the other, and we intend to develop algorithmic solutions for fundamental problems in each of them.
First, we propose to develop game-theoretic RCS by modeling the interaction between job requesters (leaders) and crowd-workers (followers) on crowdsourcing platforms as a Stackelberg game. We intend to provide novel game formulations for these problems, define appropriate notions of equilibria in these games, and provide efficient algorithms for computing such equilibria. Second, we propose to develop RCS based on simultaneous-move games. Finally, we propose to evaluate the effectiveness of our RCS via theoretical characterization, simulation experiments, and human subject experiments on real-world crowdsourcing platforms such as Amazon Mechanical Turk (AMT) and Figure Eight. The insights obtained from our evaluation will guide the development of more practical and effective RCS.
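To make the leader-follower structure of the Stackelberg formulation concrete, the following is a minimal illustrative sketch, not the formulation proposed here. It assumes a single requester who commits to a bonus paid per correct label, and a single worker who then chooses between a low-effort "biased" annotation strategy and a costlier "careful" one. All names (`WorkerAction`, `LABEL_VALUE`, `solve_stackelberg`) and all numbers (accuracies, costs, label value, bonus grid) are hypothetical placeholders chosen only for illustration. The sketch enumerates the leader's commitments, computes the follower's best response to each, and breaks follower ties in the leader's favor, which is the usual strong Stackelberg convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerAction:
    name: str
    accuracy: float  # probability a label is correct (hypothetical value)
    cost: float      # worker's effort cost per label (hypothetical value)


# Hypothetical follower action set: annotate with or without personal bias.
ACTIONS = [
    WorkerAction("biased", accuracy=0.6, cost=0.0),
    WorkerAction("careful", accuracy=0.9, cost=0.2),
]

# Hypothetical value to the requester of one correct label.
LABEL_VALUE = 3.0


def worker_utility(action: WorkerAction, bonus: float) -> float:
    """Expected bonus earned minus effort cost."""
    return action.accuracy * bonus - action.cost


def requester_utility(action: WorkerAction, bonus: float) -> float:
    """Expected label value minus expected bonus paid."""
    return action.accuracy * (LABEL_VALUE - bonus)


def best_response(bonus: float) -> WorkerAction:
    """Follower's best response; ties are broken in the leader's favor."""
    return max(
        ACTIONS,
        key=lambda a: (worker_utility(a, bonus), requester_utility(a, bonus)),
    )


def solve_stackelberg(bonus_grid):
    """Enumerate leader commitments and keep the one with highest utility."""
    best = None
    for bonus in bonus_grid:
        action = best_response(bonus)
        value = requester_utility(action, bonus)
        if best is None or value > best[2]:
            best = (bonus, action, value)
    return best


if __name__ == "__main__":
    grid = [i / 10 for i in range(11)]  # candidate bonuses 0.0, 0.1, ..., 1.0
    bonus, action, value = solve_stackelberg(grid)
    print(f"bonus={bonus:.1f} worker_action={action.name} requester_utility={value:.2f}")
```

Under these made-up numbers, the worker prefers the careful strategy only once the bonus exceeds 2/3, so the requester's best commitment on this grid is the smallest bonus above that threshold. A realistic RCS would need richer action spaces, boundedly rational worker models, and uncertainty about worker types, none of which this brute-force enumeration attempts to capture.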