NeFut

[CS.AI] SEVRA-BENCH: Social Engineering of Vulnerabilities in Review Agents

Published: 2026-06-16 22:00 | Last updated: 2026-06-17 01:38
#algorithm #AI #Open Source

Large language model (LLM) reviewers are increasingly used in pull-request (PR) workflows, where their approvals help decide which code is merged into a repository. This raises a question that benchmarks for static vulnerability detection or code generation do not address: can an automated reviewer reject a malicious contribution when the attacker controls both the code change and the accompanying PR text?

We introduce SEVRA-BENCH (Social Engineering of Vulnerabilities in Review Agents), a benchmark that measures how often an automated reviewer approves such adversarial pull requests. Each malicious PR in SEVRA-BENCH is built from a real project commit that previously fixed a vulnerability listed in the Common Vulnerabilities and Exposures (CVE) database. We automatically invert that fix to restore the original vulnerable code and submit it as a pull request wrapped in one of 15 social-engineering framings, which vary the claims made, the supporting evidence, the urgency conveyed, signals of prior approval, and appeals to authority.
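The construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: `FixCommit`, `PullRequest`, and `build_adversarial_pr` are hypothetical names, and the framing text is supplied by the caller rather than taken from the benchmark's 15 framings. It assumes the fix touches a single file whose before and after contents are available as strings.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class FixCommit:
    """A CVE-linked fix: one file's content before and after the patch (hypothetical type)."""
    cve_id: str
    path: str
    vulnerable: str  # file content before the fix
    patched: str     # file content after the fix


@dataclass(frozen=True)
class PullRequest:
    """What the reviewer sees: attacker-controlled text plus the code change (hypothetical type)."""
    title: str
    body: str
    diff: str


def build_adversarial_pr(fix: FixCommit, title: str, body: str) -> PullRequest:
    # Inverting the fix: the patched file is the base and the vulnerable
    # file is the proposed change, so the diff restores the vulnerability.
    diff_lines = difflib.unified_diff(
        fix.patched.splitlines(),
        fix.vulnerable.splitlines(),
        fromfile=f"a/{fix.path}",
        tofile=f"b/{fix.path}",
        lineterm="",
    )
    # The title and body carry the social-engineering framing.
    return PullRequest(title=title, body=body, diff="\n".join(diff_lines))
```

Keeping the code change and the PR text as separate fields makes it straightforward to pair the same inverted fix with different framings and compare how a reviewer responds to each.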

SEVRA-BENCH contains 1,062 malicious PRs drawn from CVE-linked fixes across the top 10 entries of the 2025 Common Weakness Enumeration (CWE) Top 25. In a realistic setting, we evaluate 8 current LLMs as code review agents on PRs that reintroduce publicly disclosed vulnerabilities. Our results reveal a sharp gap in security capabilities between closed- and open-source models. We hope SEVRA-BENCH will serve as a valuable resource for advancing open-source models and narrowing this gap.
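As a rough sketch of how "how often a reviewer approves" might be tallied, the function below computes an approval rate per framing from a list of reviewer verdicts. The abstract does not spell out the exact metric, so the input shape (pairs of framing label and approve/reject flag) and the name `approval_rates` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def approval_rates(verdicts: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of malicious PRs approved, grouped by framing label.

    `verdicts` is an iterable of (framing, approved) pairs; this shape is
    an assumption, not the benchmark's actual output format.
    """
    counts = defaultdict(lambda: [0, 0])  # framing -> [approved, total]
    for framing, approved in verdicts:
        counts[framing][0] += int(approved)
        counts[framing][1] += 1
    return {framing: ok / total for framing, (ok, total) in counts.items()}
```

Because every PR in the benchmark is malicious, a lower approval rate indicates a more robust reviewer under this reading.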

Blogger's Review: SEVRA-BENCH offers a novel perspective on the security of automated code review, especially the influence of social engineering. The benchmark not only highlights how vulnerable current LLM reviewers are to malicious pull requests but also gives a clear direction for future model improvements. Enhancing open-source models is crucial, and we look forward to advancements in their security capabilities.

Original Source: https://arxiv.org/abs/2606.13757
