Multimodal Sexism Detection in Memes

Mar 2026

Finished

Project for the EXIST 2026 Lab competition (sEXism Identification in Social neTworks), Task 3: sexism detection in TikTok videos. The model is multimodal: XLM-RoBERTa encodes the text inputs (OCR text, speech transcription, and Qwen-generated reasoning), two CrossModalAttention branches process the physiological signals (one for EEG, one for eye tracking plus heart rate, ET+HR), and QwenGatedFusion integrates the video embeddings from Qwen3-VL 8B. Training runs in two phases: first with the backbone frozen, so that only the gate, the branches, and the classifier are trained, then a full fine-tune with layer-wise discriminative learning rates. The loss (SoftLabelLoss) is a KL divergence against the distribution of annotator labels rather than against binary hard labels. Training also uses text dropout, early stopping, and cosine-annealing learning-rate scheduling. The dataset contains about 2,500 videos with multi-subject physiological annotations.
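To illustrate the soft-label idea, here is a minimal PyTorch sketch of a KL-divergence loss over annotator distributions. It is not the project's actual SoftLabelLoss: the class name `SoftLabelKLLoss`, the helper `votes_to_distribution`, and the example vote counts are all hypothetical, and the two-class setup (not sexist / sexist) is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftLabelKLLoss(nn.Module):
    """Hypothetical sketch: KL divergence between the annotator label
    distribution (target) and the model's predicted distribution."""

    def forward(self, logits: torch.Tensor, soft_targets: torch.Tensor) -> torch.Tensor:
        # F.kl_div expects log-probabilities as input and probabilities as target.
        log_probs = F.log_softmax(logits, dim=-1)
        # "batchmean" sums over classes and averages over the batch.
        return F.kl_div(log_probs, soft_targets, reduction="batchmean")


def votes_to_distribution(votes: torch.Tensor) -> torch.Tensor:
    """Turn per-class annotator vote counts into a probability distribution."""
    return votes / votes.sum(dim=-1, keepdim=True)


# Made-up batch: 3 videos, 6 annotators each, classes = [not sexist, sexist].
votes = torch.tensor([[6.0, 0.0], [3.0, 3.0], [1.0, 5.0]])
targets = votes_to_distribution(votes)

# Zero logits correspond to a uniform prediction over the two classes.
logits = torch.zeros(3, 2, requires_grad=True)
loss = SoftLabelKLLoss()(logits, targets)
loss.backward()
```

Compared with hard labels, a 3-to-3 annotator split here produces a target of (0.5, 0.5) instead of forcing an arbitrary class, so disagreement among annotators is kept as training signal.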

Technologies
AI

HuggingFace

Jupyter

NumPy

OpenCV

Pandas

Python

PyTorch

scikit-learn