Recovering Labels from Crowdsourced Data

An Optimal and Polynomial Time Method

Introduction

Workers are given binary tasks, to each of which they must give a response: YES or NO.

Examples

  • Image Classification: “Does this image contain a dog?”
  • Text Moderation: “Is this comment toxic or offensive?”
  • Sentiment Analysis: “Does this review express positive sentiment?”

Questions

Three questions:

  • recover the true labels?
  • rank the workers?
  • estimate their abilities?

Main Question: How can we accurately recover the labels?

This Talk

  1. Introducing the non-parametric isotonic model and the main result
  2. Presenting two existing algorithms
  3. The Iterative Spectral Voting (ISV) algorithm and key insights

The Isotonic Model

Illustration

  • Binary tasks with labels in \(\{\color{green}{-1},\color{blue}{+1}\}\)
  • Unknown vector \(x^*\) of labels with \(d=9\) tasks \(x^*= (\color{blue}{+1},\color{blue}{+1},\color{green}{-1},\color{blue}{+1},\color{green}{-1},\color{blue}{+1},\color{blue}{+1},\color{blue}{+1},\color{blue}{+1})\)

\[ Y=\left(\begin{array}{ccccccccc} \color{green}{-1} & \color{blue}{+1} & \color{green}{-1} & \color{blue}{+1} & \color{blue}{+1} & \color{green}{-1} & \color{green}{-1} & \color{blue}{+1} & \color{blue}{+1} \\ \color{blue}{+1} & \color{blue}{+1} & \color{blue}{+1} & \color{blue}{+1} & \color{green}{-1} & \color{blue}{+1} & \color{blue}{+1} & \color{green}{-1} & \color{blue}{+1} \\ \color{blue}{+1} & \color{green}{-1} & \color{green}{-1} & \color{blue}{+1} & \color{green}{-1} & \color{blue}{+1} & \color{green}{-1} & \color{blue}{+1} & \color{blue}{+1} \\ \color{green}{-1} & \color{blue}{+1} & \color{blue}{+1} & \color{blue}{+1} & \color{green}{-1} & \color{blue}{+1} & \color{blue}{+1} & \color{blue}{+1} & \color{blue}{+1} \end{array}\right) \]

\[ Y=\left(\begin{array}{ccccccccc} \color{red}{0} & \color{blue}{+1} & \color{green}{-1} & \color{blue}{+1} & \color{red}{0} & \color{green}{-1} & \color{green}{-1} & \color{blue}{+1} & \color{blue}{+1} \\ \color{red}{0} & \color{blue}{+1} & \color{red}{0} & \color{blue}{+1} & \color{green}{-1} & \color{red}{0} & \color{blue}{+1} & \color{green}{-1} & \color{blue}{+1} \\ \color{blue}{+1} & \color{green}{-1} & \color{green}{-1} & \color{blue}{+1} & \color{green}{-1} & \color{blue}{+1} & \color{red}{0} & \color{blue}{+1} & \color{blue}{+1} \\ \color{green}{-1} & \color{blue}{+1} & \color{blue}{+1} & \color{red}{0} & \color{green}{-1} & \color{blue}{+1} & \color{red}{0} & \color{red}{0} & \color{red}{0} \end{array}\right) \]

  • Rate of observations: \(\color{red}{\lambda= 0.72}\)

Observation Model

We observe

\[Y = B\odot(M\mathrm{diag}(x^*) + E) \in \mathbb R^{n \times d}\]

where \(\odot\) is the Hadamard product.

  • \(x^*\in \{-1, 1\}^d\) is the vector of unknown labels
  • \(E\) is a matrix of independent sub-Gaussian noise
  • \(M \in [0,1]^{n \times d}\) is the unknown ability matrix
  • \(B\) is a Bernoulli \(\mathcal B(\lambda)\) “mask” matrix, modelling partial observations
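As a toy illustration, the observation model can be simulated directly. This is a minimal sketch with arbitrary parameter choices (noise scale, ability range) that are not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 4, 9, 0.7                    # workers, tasks, observation rate (toy values)

x_star = rng.choice([-1, 1], size=d)     # unknown label vector
M = rng.uniform(0.3, 1.0, size=(n, d))   # ability matrix in [0, 1]
E = 0.1 * rng.standard_normal((n, d))    # sub-Gaussian (here Gaussian) noise
B = rng.binomial(1, lam, size=(n, d))    # Bernoulli(lambda) mask

Y = B * (M @ np.diag(x_star) + E)        # observed response matrix
```

Unobserved entries of \(Y\) are exactly the zeros of the mask \(B\).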

Shape Constraints

  • \(M \in [0,1]^{n \times d}\) represents the ability matrix of worker/task pairs
  • We assume that \(M\) is isotonic up to a permutation \(\pi^*\) of its rows, that is

\[ M_{\pi^*}=\left(\begin{array}{cccc} \color{#000099}{0.9} & \color{#000099}{0.8} & \color{#000099}{0.9} & \color{#000099}{1\phantom{.0}} \\ \color{#4B0082}{0.8} & \color{#4B0082}{0.7} & \color{#4B0082}{0.9} & \color{#4B0082}{0.8} \\ \color{#990066}{0.6} & \color{#990066}{0.7} & \color{#990066}{0.7} & \color{#990066}{0.6} \\ \color{#CC0000}{0.5} & \color{#CC0000}{0.7} & \color{#CC0000}{0.5} & \color{#CC0000}{0.6} \end{array}\right) \]

\[ M_{\phantom{{\pi^*}}}=\left(\begin{array}{cccc} \color{#990066}{0.6} & \color{#990066}{0.7} & \color{#990066}{0.7} & \color{#990066}{0.6} \\ \color{#CC0000}{0.5} & \color{#CC0000}{0.7} & \color{#CC0000}{0.5} & \color{#CC0000}{0.6} \\ \color{#000099}{0.9} & \color{#000099}{0.8} & \color{#000099}{0.9} & \color{#000099}{1\phantom{.0}} \\ \color{#4B0082}{0.8} & \color{#4B0082}{0.7} & \color{#4B0082}{0.9} & \color{#4B0082}{0.8} \end{array}\right) \]
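A quick sketch of what the shape constraint means on the example above, reading "isotonic" as: after the row permutation, every column is nonincreasing (better workers sit in the upper rows). The helper name `is_isotonic` is ours:

```python
import numpy as np

def is_isotonic(M):
    """True if every column of M is nonincreasing down the rows."""
    return bool(np.all(np.diff(M, axis=0) <= 0))

M_pi = np.array([[0.9, 0.8, 0.9, 1.0],   # M_{pi*}: rows sorted by ability
                 [0.8, 0.7, 0.9, 0.8],
                 [0.6, 0.7, 0.7, 0.6],
                 [0.5, 0.7, 0.5, 0.6]])
M = M_pi[[2, 3, 0, 1]]                   # M: same rows, shuffled

is_isotonic(M_pi), is_isotonic(M)        # (True, False)
```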

Square Norm Loss vs Hamming Loss

Hamming Loss: \[ \sum_{k=1}^d \mathbf 1\{\hat x_k \neq x_k^*\}\]

Square Norm Loss: \[ \|M \mathrm{diag}(x^* \neq \hat x)\|_F^2\]

  • If workers are bad (\(M \approx 0\)), the square norm loss is small but the Hamming loss can be large (\(\approx d\))
  • The square norm loss evaluates the quality of \(\hat x\) rather than the performance of the workers
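A numerical version of these bullet points: with nearly useless workers, two labelling mistakes cost a full 2 in Hamming loss but almost nothing in square norm loss. The numbers below are purely illustrative:

```python
import numpy as np

x_star = np.array([+1, +1, -1, +1])
x_hat  = np.array([+1, -1, -1, -1])         # two mistakes
M = np.full((3, 4), 0.05)                   # bad workers: M ~ 0

hamming = int(np.sum(x_hat != x_star))
mistakes = (x_hat != x_star).astype(float)  # plays the role of diag(x* != x_hat)
square_norm = np.linalg.norm(M @ np.diag(mistakes), "fro") ** 2

hamming, round(square_norm, 4)              # (2, 0.015)
```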

Crowdsourcing Problems

Setting

\[ Y = B\odot(M \mathrm{diag}(x^*) + E)\] Shape constraint: \(\exists \pi^*\) s.t. \(M_{\pi^*} \in [0,1]^{n \times d}\) is isotonic

Objectives

Recovering the labels (\(x^*\))

Ranking the workers (\(\pi^*\))

Estimating their abilities (\(M\))

Losses

\(\|M \mathrm{diag}(x^* \neq \hat x)\|_F^2\)

\(\|M_{\pi^*} - M_{\hat \pi}\|_F^2\)

\(\|M - \hat M\|_F^2\)

Crowdsourcing Problems

Setting

\[ Y = B\odot(M \mathrm{diag}(x^*) + E)\] Shape constraint: \(\exists \pi^*\) s.t. \(M_{\pi^*} \in [0,1]^{n \times d}\) is isotonic

Objectives

Recovering the labels (\(x^*\))

Ranking the workers (\(\pi^*\))

Estimating their abilities (\(M\))

Risks

\(\color{purple}{\mathbb E}[\|M \mathrm{diag}(x^* \neq \hat x)\|_F^2]\)

\(\color{purple}{\mathbb E}[\|M_{\pi^*} - M_{\hat \pi}\|_F^2]\)

\(\color{purple}{\mathbb E}[\|M - \hat M\|_F^2]\)

Minimax Risk for Recovering Labels

\[\mathcal R^*_{\mathrm{reco}}(n,d,\lambda)=\min_{\hat x}\max_{M,\pi^*, x^*}{\mathbb E}[\|M \mathrm{diag}(x^* \neq \hat x)\|_F^2]\]

  • minimize over all estimators \(\hat x\)
  • maximize the square norm loss over all \(M\), \(\pi^*\), \(x^*\) such that \(M_{\pi^*}\) is isotonic

Main Results

Theorem

If \(\tfrac{1}{\lambda} \leq n \leq d\), there exists a polynomial time method \(\hat x\) achieving \(\mathcal R^*_{\mathrm{reco}}\) up to polylog factors, i.e. \[ \max_{M,\pi^*, x^*}{\mathbb E}[\|M \mathrm{diag}(x^* \neq \hat x)\|_F^2] \lesssim \mathcal R^*_{\mathrm{reco}}(n,d,\lambda) \enspace , \] where the maximum ranges over all \(M\in[0,1]^{n\times d}\), \(\pi^*\) and \(x^* \in \{-1,1\}^d\) such that \(M_{\pi^*}\) is isotonic.

Moreover, up to polylogs,

\[ \mathcal R^*_{\mathrm{reco}}(n,d,\lambda) \asymp \frac{d}{\lambda} \enspace . \]

Already Existing Methods

Majority Voting

\(\hat x^{(maj)}_k = \mathrm{sign} \left( \sum_{i=1}^n Y_{ik} \right)\)

Maximum risk of \(\hat x^{(maj)}\):

\[\max_{M,\pi^*, x^*}\mathbb E\|M\mathrm{diag}(\hat x^{(maj)} \neq x^*)\|_F^2 \asymp \tfrac{d \sqrt{n}}{\lambda}\]
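Majority voting is one line; the sketch below treats unobserved entries as 0, so they simply drop out of the column sums:

```python
import numpy as np

def majority_vote(Y):
    # sign of each column sum; a tie (sum == 0) yields 0 (abstain)
    return np.sign(Y.sum(axis=0))

Y = np.array([[-1, +1, -1],
              [+1, +1,  0],   # 0 = unobserved entry
              [+1, -1, -1]])
majority_vote(Y)              # array([ 1.,  1., -1.])
```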

OBI-WAN (Shah et al., 2020)

Idea: The population term \((M\mathrm{diag}(x^*))(M\mathrm{diag}(x^*))^T= MM^T\) does not depend on \(x^*\)

PCA Step:

\[\hat v = \underset{\|v\|=1}{\mathrm{argmax}}\|v^T Y\|^2\]

Result on Obi-Wan Method

Theorem 1 (Shah et al., 2020)

In the submodel where \(\mathrm{rk}(M)= 1\), \(\hat x^{(\mathrm{Obi-Wan})}\) achieves the minimax risk up to polylogs: \[\mathbb E\|M\mathrm{diag}(\hat x^{\text{Obi-Wan}} \neq x^*)\|_F^2 \lesssim \frac{d}{\lambda} \quad \textbf{(minimax)}\]

Theorem 2 (Shah et al., 2020)

In the model where \(M_{\pi^*\eta^*}\) is bi-isotonic, up to polylogs: \[\mathbb E\|M\mathrm{diag}(\hat x^{\text{Obi-Wan}} \neq x^*)\|_F^2 \lesssim \frac{d\sqrt{n}}{\lambda} \quad \textbf{(not minimax)}\]

Iterative Spectral Voting

PCA Step

\[\hat v = \underset{\|v\|=1}{\mathrm{argmax}}\|v^T Y\|^2\]

Main idea: Since \(M\) is isotonic, \(M\) is close to low rank

\[\|MM^T\|_{\mathrm{op}} \gtrsim \|M\|_F^2\]

Voting Step

Define the weighted vote vector (on second sample)

\[\hat w = \hat v^T Y\]

Define the estimated label as

\[\hat x_k^{(1)} = \mathrm{sign}(\hat w_k)\mathbf{1}\bigg\{|\hat w_k| \gg \sqrt{\sum_{i=1}^n \hat v_i B_{ik}}\bigg\}\]

Iterate these two Steps

Keep certain labels: if \(\hat x^{(t)}_k\neq 0\), set \(\hat x^{(t+1)}_k= \hat x^{(t)}_k\).

Restrict \(Y\) to the columns \(k\) with uncertain labels, i.e. \(\hat x^{(t)}_k=0\)

Repeat PCA Step + Voting Step on \(Y \mathrm{diag}(\hat x^{(t)} = 0)\) a polylogarithmic number of times.

Output last estimator \(\hat x^{(T)}\)
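The whole loop fits in a few lines. This is a schematic sketch only: it skips the sample splitting between the PCA and voting steps, uses \(|\hat v_i|\) inside the threshold, fixes an arbitrary constant \(c\) in place of the "\(\gg\)", and ignores the global sign ambiguity of the top singular vector, which a real implementation must resolve (e.g. by comparing with a majority vote):

```python
import numpy as np

def pca_voting_step(Y, B, c=2.0):
    """One PCA + voting step; c is a placeholder threshold constant."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    v = U[:, 0]                          # argmax_{||v||=1} ||v^T Y||^2
    w = v @ Y                            # weighted vote for each task
    scale = np.sqrt(np.abs(v) @ B)       # rough noise level of each vote
    return np.sign(w) * (np.abs(w) > c * scale)   # values in {-1, 0, +1}

def isv(Y, B, T=10):
    x_hat = np.zeros(Y.shape[1])
    for _ in range(T):
        undecided = x_hat == 0           # labels already decided are kept
        if not undecided.any():
            break
        # rerun both steps on the columns whose label is still uncertain
        x_hat[undecided] = pca_voting_step(Y[:, undecided], B[:, undecided])
    return x_hat
```

On easy instances (e.g. noiseless rank-one \(M\), full observations) a single pass already decides every label, up to the global sign.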


Ranking and Estimating Abilities

  1. Use ISV to recover some labels \(\hat x\)
  2. Restrict to columns \(k\) such that \(\hat x_k \in \{-1, 1\}\)
  3. Use methods from Pilliat et al. (2024) to estimate \(\pi^*\) and \(M\)

Conclusion

  1. Three connected problems: recovering labels, ranking workers and estimating their abilities
  2. The non-parametric isotonic model is very flexible for crowdsourcing problems
  3. No computational-statistical gap for recovering labels, ranking workers, or estimating abilities