3-minute Pitch: Retrieval Guided Contrastive Learning for Hateful Memes Detection

[898 words, 3-minute read]

Hateful memes are captioned images promoting hostility towards specific social groups. Most hateful meme detection systems are logistic classifiers built on the embedding space of a pre-trained vision-language model (e.g., CLIP). However, we find that under these embedding spaces, hateful and benign memes that differ only in subtle but important details (e.g., Figure 1) are located in close proximity. This often results in misclassification. In our recent work, we introduce Retrieval-Guided Contrastive Learning (RGCL) to learn an embedding space that better separates hateful and benign memes. RGCL achieves state-of-the-art performance on the HatefulMemes and HarMeme datasets.


Left: Hateful. Right: Benign. Changing one word flips the label, yet these two memes have high cosine similarity under the CLIP embedding space.

Retrieval-Guided Contrastive Learning

We introduce the Retrieval-Guided Contrastive Loss (RGCL) to pull same-label samples closer together and push opposite-label samples further apart in the embedding space. Computing the RGCL loss for each sample in a batch involves three types of examples:

  1. Pseudo-gold positive example: a same-label sample retrieved from the training set that has a high similarity score under the current embedding space. This example pulls same-label memes with similar semantic meanings closer together in the embedding space.
  2. Hard negative examples: opposite-label samples from the training set that have high similarity scores under the current embedding space. These examples explicitly separate opposite-label samples that are hard to distinguish under the current embedding space.
  3. In-batch negative examples: opposite-label samples in the same batch, as commonly used in contrastive learning.
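The retrieval step behind examples 1 and 2 can be sketched as follows. This is a minimal numpy sketch, not the paper's implementation: the function name `retrieve_examples`, the use of cosine similarity, and the `n_hard` parameter are illustrative assumptions.

```python
import numpy as np

def retrieve_examples(anchor, embeddings, labels, anchor_label, n_hard=2):
    """Retrieve the pseudo-gold positive and hard negatives for one anchor.

    anchor: (d,) embedding of the anchor meme.
    embeddings: (N, d) embeddings of the training set.
    labels: (N,) binary labels {0, 1} of the training set.
    Returns (positive_index, hard_negative_indices).
    """
    # Cosine similarity between the anchor and every training sample.
    sims = embeddings @ anchor / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(anchor) + 1e-8
    )
    same = labels == anchor_label
    # Pseudo-gold positive: the most similar same-label sample.
    pos_idx = np.flatnonzero(same)[np.argmax(sims[same])]
    # Hard negatives: the most similar opposite-label samples.
    neg_candidates = np.flatnonzero(~same)
    hard_negs = neg_candidates[np.argsort(-sims[neg_candidates])[:n_hard]]
    return pos_idx, hard_negs
```

In practice, retrieval over a large training set would use an approximate nearest-neighbour index rather than the brute-force scan shown here.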

Given the sample \(\mathbf{g}_{i}\), the set of negative examples \(\mathbf{G}_{i}^{-}\), and the pseudo-gold positive example \(\mathbf{g}_{i}^{+}\), the RGCL loss is:

\[
\begin{align}
\mathcal{L}_{i}^{RGCL} &= L(\mathbf{g}_{i},\mathbf{g}_{i}^{+},\mathbf{G}_{i}^{-}) \newline
&= - \log \frac{ e^{\textrm{sim}(\mathbf{g}_{i},\mathbf{g}_{i}^{+})}}{ e^{\textrm{sim}(\mathbf{g}_{i},\mathbf{g}_{i}^{+})} + \sum_{\mathbf{g}\in\mathbf{G}_{i}^{-}} e^{\textrm{sim}(\mathbf{g}_i,\mathbf{g})}}.
\end{align}
\]
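The loss above can be computed per anchor as in the following numpy sketch. Assumptions are flagged: \(\textrm{sim}(\cdot,\cdot)\) is taken to be cosine similarity, and the optional `temperature` argument is an illustrative addition not shown in the equation.

```python
import numpy as np

def rgcl_loss(g_i, g_pos, g_negs, temperature=1.0):
    """RGCL loss for a single anchor.

    g_i: (d,) anchor embedding.
    g_pos: (d,) pseudo-gold positive embedding.
    g_negs: (M, d) negative embeddings (hard + in-batch negatives).
    """
    def cos(a, b):
        # Cosine similarity, assumed here as the sim(.,.) in the equation.
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    pos = np.exp(cos(g_i, g_pos) / temperature)
    negs = np.exp(np.array([cos(g_i, g) / temperature for g in g_negs]))
    # Softmax-style contrastive loss: positive vs. positive + all negatives.
    return -np.log(pos / (pos + negs.sum()))
```

The loss decreases as the anchor moves closer to its pseudo-gold positive and further from its negatives.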

We attach an MLP layer after the last layer of the vision-language encoder. To train the logistic classifier and the MLP, we optimise the joint loss of RGCL and cross-entropy (CE). Our approach is compatible with any vision-language encoder.


\[ \mathcal{L} = \mathcal{L}^{RGCL} + \mathcal{L}^{CE} \]
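A minimal numpy sketch of this joint objective, assuming a sigmoid logistic head with binary labels; the function name and the per-batch shapes are illustrative.

```python
import numpy as np

def joint_loss(rgcl_losses, logits, labels):
    """Joint objective: mean per-sample RGCL loss plus binary cross-entropy.

    rgcl_losses: (B,) per-sample RGCL losses for the batch.
    logits: (B,) raw logistic-classifier outputs.
    labels: (B,) binary labels {0, 1}.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    ce = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
    return np.mean(rgcl_losses) + ce
```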

Harnessing the embedding space: Retrieval-based KNN classifier

We show that RGCL indeed induces desirable structures in the embedding space by testing the performance of a K-Nearest-Neighbour (KNN) majority voting classifier. Note that KNN majority voting only performs well if the distance between two samples under the embedding space reflects how they differ in hatefulness, which is the learning goal of RGCL. In addition to demonstrating the effectiveness of RGCL, the KNN majority voting classifier also allows developers to update the hateful memes detection system by simply adding new examples to a retrieval vector database without retraining — a desirable feature for real services in the constantly evolving landscape of hateful memes on the Internet.

Here's how the KNN majority voting classifier works. For a test meme \(t\), we retrieve the \(K\) memes located closest to it in the embedding space from the retrieval vector database \(\mathbf{G}\). We record the retrieved memes' labels \(y_k\) and their similarity scores \(s_k=\text{sim}(g_k, g_t)\) with the test meme \(t\), where \(g_t\) is the embedding vector of the test meme \(t\).
We perform similarity-weighted majority voting to obtain the prediction:


\[
\hat{y}'_t = \sigma(\sum_{k=1}^K\bar{y}_k \cdot s_k),
\]


where \(\sigma(\cdot)\) is the sigmoid function and


\[
\bar{y}_k:=
\begin{cases}
1 &\text{if } y_k= 1\newline
-1 &\text{if } y_k=0
\end{cases}.
\]
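The similarity-weighted vote above can be sketched in numpy as follows; `knn_predict`, the cosine-similarity choice, and the 0.5 decision threshold are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def knn_predict(g_t, db_embeddings, db_labels, k=3):
    """Similarity-weighted KNN vote, following the equations above.

    g_t: (d,) embedding of the test meme.
    db_embeddings: (N, d) retrieval vector database.
    db_labels: (N,) binary labels {0, 1}.
    Returns (predicted_label, score).
    """
    sims = db_embeddings @ g_t / (
        np.linalg.norm(db_embeddings, axis=1) * np.linalg.norm(g_t)
    )
    top = np.argsort(-sims)[:k]                       # K nearest neighbours
    y_bar = np.where(db_labels[top] == 1, 1.0, -1.0)  # map {0,1} -> {-1,+1}
    score = 1.0 / (1.0 + np.exp(-(y_bar * sims[top]).sum()))  # sigmoid
    return int(score > 0.5), score
```

Updating the detector then amounts to appending new `(embedding, label)` rows to the database, with no retraining.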

Experiment and Results

We evaluate RGCL on the HatefulMemes dataset and the Harmful Meme (HarMeme) dataset. Our system obtains an AUC of 86.7% and an accuracy of 78.8% on HatefulMemes, and an AUC of 91.8% and an accuracy of 87.0% on HarMeme, outperforming state-of-the-art systems such as Flamingo and HateCLIPper by a large margin.

Under the RGCL-trained embedding space, the simple KNN classifier outperforms zero-shot large multi-modal models such as LLaVA and Flamingo-80B on the HatefulMemes and HarMeme datasets. We also demonstrate that the KNN classifier retains competitive performance when classifying HarMeme examples by retrieving K-nearest neighbours from HatefulMemes. This shows that basing classification on similar examples is highly effective for hateful meme detection and that the RGCL-trained encoder generalizes beyond its training dataset.

Conclusion

We introduced Retrieval-Guided Contrastive Learning to enhance any vision-language encoder in distinguishing confounding memes. Our approach uses a novel auxiliary loss over retrieved examples and significantly improves contextual understanding. Achieving an AUC of 86.7% on the HatefulMemes dataset, our system outperforms prior state-of-the-art models, including the 200-times-larger Flamingo-80B. Our approach also achieves state-of-the-art results on the HarMeme dataset, demonstrating its usefulness across diverse meme domains.

Make sure to check out the post from the first author, Jingbiao Mei, for more technical details. Also, don't miss his lovely photographs: https://jingbiao.me/archive/?tag=Photography!