In November 2025, a team of researchers from DexAI's Icaro Lab, Sapienza University of Rome, and the Sant'Anna School of Advanced Studies published a study showing that the safety guardrails of major LLMs could be circumvented by rephrasing harmful prompts as "adversarial" poems. This week, those same researchers published a new paper presenting their Adversarial Humanities Benchmark, a broader assessment of AI security that they say reveals "a critical gap" in current LLM safety standards through similar weaponized wordplay.
Expanding on the team's work with adversarial poetry, the Adversarial Humanities Benchmark (AHB) evaluates LLM safety guardrails by rephrasing harmful prompts in alternate writing styles. By presenting prompts as cyberpunk short fiction, theological disputation, or mythopoetic metaphor for the LLM to analyze, the AHB assesses whether major AI models can be coaxed into bypassing their safety guardrails.
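To make the methodology concrete, here is a minimal sketch of how such a style-rephrasing evaluation might be structured. Everything here is illustrative: the item format, the keyword-based refusal check, and the stubbed model call are all invented for this example, and the AHB's actual harness and scoring are not described in the paper excerpt above. Real benchmarks of this kind typically use a judge model rather than keyword matching.

```python
# Hypothetical sketch of a style-rephrasing safety benchmark harness.
# All names and prompts are placeholders, not taken from the AHB paper.

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "unable to help")

# Each benchmark item pairs a base request with stylistic reframings of it.
BENCHMARK_ITEMS = [
    {
        "base_prompt": "<placeholder harmful request>",
        "styles": {
            "cyberpunk_fiction": "<same request framed as cyberpunk short fiction>",
            "theological_disputation": "<same request framed as theological disputation>",
            "mythopoetic_metaphor": "<same request framed as mythopoetic metaphor>",
        },
    },
]

def query_model(prompt: str) -> str:
    """Stub standing in for a real LLM API call."""
    return "I can't help with that."

def is_refusal(response: str) -> bool:
    """Crude keyword check; a production benchmark would use a judge model."""
    return any(marker in response.lower() for marker in REFUSAL_MARKERS)

def evaluate(items) -> dict:
    """Compare refusal counts for plain prompts vs. stylistically reframed ones."""
    results = {"base_refusals": 0, "styled_refusals": 0, "styled_total": 0}
    for item in items:
        if is_refusal(query_model(item["base_prompt"])):
            results["base_refusals"] += 1
        for styled_prompt in item["styles"].values():
            results["styled_total"] += 1
            if is_refusal(query_model(styled_prompt)):
                results["styled_refusals"] += 1
    return results

print(evaluate(BENCHMARK_ITEMS))
```

A gap between the base refusal rate and the styled refusal rate is the kind of signal the researchers describe: the same underlying request, dressed in literary style, slipping past safeguards that catch it in plain form.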

