[AI Frontier] Cybersecurity Experts Unhappy with Anthropic's Fable Restrictions

Anthropic released its latest model, Fable, on Tuesday, billing it as a public, limited version of its powerful and much-hyped cybersecurity model Mythos. However, not everyone is happy with the restrictions, and a number of cybersecurity researchers and professionals have aired complaints online.

Valentina “Chompie” Palmiotti, a well-known security researcher at IBM X-Force, stated, “[Fable] rejects any request that could be tangentially cyber related. Even innocuous tasks like reading a blog post.” When a prompt triggers its guardrails, Fable pauses the chat and says that its “safety measures flagged this message for cybersecurity or biology topics.”

The guardrails were put in place to limit the risk that Fable could be used to develop malware or compromise software, a long-standing concern within Anthropic. The restrictions on biology come from a similar concern around developing biological weapons. When the AI giant released Mythos in April, it restricted the model to a limited number of companies and organizations in what it called Project Glasswing, an effort to deploy the model to secure critical software and infrastructure. Last week, Anthropic expanded access to Mythos to hundreds of organizations in 15 countries.

Despite the good intentions, many cybersecurity experts are still put off by the haphazard nature of the restrictions. Matt Suiche, a cybersecurity veteran, told TechCrunch, “If you ask it to write secure code, it assumes it is cybersecurity-related work instead of software engineering best practices, and you get downgraded.” Fable is programmed to fall back to Claude Opus 4.8 if it hits a guardrail. “It seems to be keyword-based, so anything in the lexical field of ‘cybersecurity’ triggers the guardrails.” Another researcher griped on X that “even asking for a code review” triggers Fable’s guardrails.

Apart from the guardrails inside its models, Anthropic requires cybersecurity professionals to apply to its Cyber Verification Program. Approved applicants face fewer limitations when using Claude for cybersecurity work. OpenAI has a similar program called Trusted Access for Cyber.

Blogger's Review: The strict guardrails Anthropic shipped with Fable have sparked widespread debate, highlighting the delicate balance between AI safety and usability. Guardrails are essential to prevent misuse, but excessive restrictions may undermine the practicality of AI. As the technology evolves, it is hoped that Anthropic will gradually refine these limitations to promote the healthy development of AI in the cybersecurity domain.
