Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models

doi:10.1145/3821418

ISSN: 2157-6904

Abdulkadir Erol, Trilok Padhi, Agnik Saha, Mehmet Emin Aktas, Ugur Kursuncu

The rapid advancement of Large Vision-Language Models (LVLMs) has expanded their capabilities across applications ranging from content creation to productivity enhancement. Despite their innovative potential, LVLMs exhibit vulnerabilities, especially in generating potentially toxic or unsafe responses. Malicious actors can exploit these vulnerabilities to propagate toxic content using strategically crafted prompts, without fine-tuning or compute-intensive procedures. Despite ongoing red-teaming efforts to identify and mitigate these risks, the exploration of LVLM vulnerabilities remains nascent and has yet to be addressed systematically. This study systematically examines the vulnerabilities of open- and closed-weight LVLMs, including LLaVA, InstructBLIP, Fuyu, Qwen, DeepSeek, Gemini, GPT, and Grok, using adversarial prompting strategies informed by social theories to simulate real-world social manipulation tactics. Our findings show that (i) toxicity and insult are the most prevalent behaviors, with mean scores of 19.32% and 12.36%, respectively; (ii) Gemini-2.0-Flash, LLaVA-v1.6-Vicuna-13B, and Grok-2-Vision-1212 are the most vulnerable models, with toxic response rates reaching 46.93%, 23.81%, and 17.98%, respectively, and insult response rates reaching 47.94%, 14.62%, and 12.27%, respectively; (iii) prompting strategies incorporating dark humor and multimodal toxic prompt completion significantly elevate these vulnerabilities. Despite extensive safety alignment efforts, models still generate content with varying degrees of toxicity when prompted with adversarial inputs, highlighting the urgent need for enhanced safety mechanisms and robust guardrails in LVLM development.
