2026-05-13
> Microsoft's multi-agent AI system tops Anthropic's Mythos on cybersecurity benchmark
Microsoft's new MDASH (multi-model agentic scanning harness) scored 88.45% on the CyberGym cybersecurity benchmark, surpassing single-model systems including Anthropic's Mythos and OpenAI's GPT-5.5. It runs more than 100 specialized AI agents across multiple models in a staged pipeline that finds, debates and proves vulnerabilities with proof-of-concept exploits. Microsoft used MDASH to disclose 16 new Windows vulnerabilities, including four critical remote code execution flaws fixed in May's Patch Tuesday.