What Changed
Cloudflare released findings from Project Glasswing showing that Mythos and other security-focused LLMs were tested on live code in critical infrastructure, and clarified what must be fixed in operations and governance before these models can be scaled.
Why It Matters
Security teams can treat Mythos-style assistants as high-risk helpers that still need explicit human-in-the-loop controls, which guards against over-automating critical infrastructure tasks before safeguards are proven. Because the report is based on testing against live operational code, it exposes practical deployment risks: incomplete correctness, unsafe recommendations, and ambiguous escalation boundaries. Before scaling up, monitor false-fail and false-pass behavior, recommendation reliability under real prompts, and whether operational controls consistently block dangerous autonomous actions.
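As a rough illustration of the monitoring and gating ideas above, the sketch below computes false-pass and false-fail rates from human-labeled review outcomes and blocks high-risk actions that lack human approval. It is not taken from the Cloudflare report. The names `Review`, `error_rates`, `HIGH_RISK_ACTIONS`, and `gate` are hypothetical, as are the action names, and it assumes "false-pass" means the assistant missed a real problem while "false-fail" means it flagged clean code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Review:
    """One assistant verdict paired with a human-confirmed ground truth."""
    model_flagged: bool      # True if the assistant reported a problem ("fail")
    truly_vulnerable: bool   # True if human review confirmed a real problem


def error_rates(reviews: Iterable[Review]) -> dict:
    """Return false-pass and false-fail rates (assumed definitions, see lead-in)."""
    reviews = list(reviews)
    vulnerable = sum(1 for r in reviews if r.truly_vulnerable)
    clean = len(reviews) - vulnerable
    # False pass: a real problem the assistant did not flag.
    false_pass = sum(1 for r in reviews if r.truly_vulnerable and not r.model_flagged)
    # False fail: clean code the assistant flagged anyway.
    false_fail = sum(1 for r in reviews if r.model_flagged and not r.truly_vulnerable)
    return {
        "false_pass_rate": false_pass / vulnerable if vulnerable else 0.0,
        "false_fail_rate": false_fail / clean if clean else 0.0,
    }


# Hypothetical set of actions that must never run without a named human approver.
HIGH_RISK_ACTIONS = frozenset({"deploy", "rollback", "modify_firewall"})


def gate(action: str, approved_by: Optional[str] = None) -> str:
    """Block high-risk actions unless a human approver is recorded."""
    if action in HIGH_RISK_ACTIONS and not approved_by:
        return "blocked"
    return "allowed"
```

In practice the ground-truth labels would come from human reviewers, and the gate would sit in front of whatever system executes the assistant's recommendations. Tracking both rates over time, rather than a single accuracy figure, keeps missed problems and noisy alerts visible separately.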