Testing your own government and finance software against a powerful AI model is either responsible security work or a quiet admission that we’ve built critical systems that can’t handle the world we’re walking into. I lean toward “responsible”… but only if the people running this don’t treat it like a checkbox exercise.
Based on what’s been shared publicly, the Indian government is testing sensitive, public-facing financial and government applications for vulnerabilities using Anthropic’s next-generation Mythos AI model. Big Indian tech firms like Infosys and Tata Consultancy Services are involved. One specific focus is building patches for widely used systems, including Infosys’s Finacle banking software.
That’s the fact pattern. The judgment is the uncomfortable part: this is what it looks like when governments start treating AI not just as a productivity tool but as an attacker with infinite patience.
Because the scary thing about advanced AI isn’t that it’s “smart.” It’s that it can try again and again and again. It can generate variations. It can test edge cases. It can write convincing messages. It can hunt for the one weird configuration mistake nobody documented. If you’re defending public systems that millions of people rely on, the old pace of security work—slow audits, slow patch cycles, slow procurement—starts to look like a liability.
And public-facing systems are the worst place to be fragile. They’re the front door. If the front door is weak, it doesn’t matter how strong the safe is.
Involving major vendors cuts both ways. On the good side, it’s practical. The people who built and maintain the software are the ones who can realistically patch it. If Finacle is widely used, then hardening it isn’t just “helping a company.” It’s reducing risk for a huge slice of the banking ecosystem. That’s a public good, whether we like that dependency or not.
On the bad side, the incentives get messy fast. Vendors don’t love public narratives about their systems being vulnerable. Governments don’t love admitting their critical infrastructure has cracks. So the natural temptation is to keep this vague, keep it quiet, patch what’s easy, and declare victory. That’s how you end up with “we tested against AI” as a press line, instead of building real muscle for a world where AI-driven attacks get cheaper every month.
What’s actually at stake here isn’t some abstract cyber risk. It’s normal people having their lives interrupted.
Imagine you’re running payroll at a small company and you can’t get bank transfers out because something upstream got hit and systems are down. Imagine you’re trying to get a government service that only works through one portal, and that portal is suddenly unreliable or compromised. Imagine a bank employee gets a perfectly believable email that looks like it came from an internal team, clicks the link, and now someone has a foothold. None of that requires movie-style hacking. It requires volume, realism, and persistence—exactly what AI can provide.
There’s also a bigger, less comfortable consequence: once a government starts testing systems against a specific AI model, it signals a shift in how it sees the threat. The threat is no longer “a few highly skilled attackers.” It’s “lots of attackers who can rent capability.” That changes the baseline. Defense can’t just be “keep out the best.” It has to be “withstand the average person with good tools.”
This could go right. Done well, this kind of testing forces long-overdue upgrades: better input validation, better authentication flows, better monitoring, faster patching, tighter access controls. Not glamorous stuff, but it’s the difference between a leak that gets caught in minutes and a slow bleed that lasts for months.
But it can also go wrong in a very predictable way: focusing on the cool part (AI) and ignoring the boring part (operations). You can patch code and still lose to weak processes. If help desks can be tricked, if passwords are reused, if access isn’t segmented, if logs aren’t watched, then “AI vulnerability testing” becomes theater. Attackers don’t need a perfect exploit if they can talk their way into the building.
I also don’t love the idea that we might end up in a model-specific mindset—like “we hardened against Mythos, so we’re good.” That’s not how this works. If one model can find a weakness, another model can too. And if the goal becomes keeping up with model capabilities, defenders will always feel behind. The real goal should be raising the floor: making whole categories of failure harder, no matter which tool is used.
To be fair, there’s an alternative view that deserves respect: maybe publicizing AI-focused security testing increases fear more than safety. Maybe it makes people think the systems are already broken. Maybe it gives attackers ideas. Maybe the smarter move is to quietly improve defenses without making AI the headline. I get that.
Still, I’d rather see a government admit the threat is changing than pretend the old playbook is fine. The only version of this I don’t respect is the one where the testing happens, the patches ship, and the deeper habits stay the same—slow response, shallow accountability, and a reliance on vendors to magically keep everything safe.
If India is serious about this, the test isn’t whether Mythos can break something in a lab. The test is whether the government and its partners can fix things fast, keep fixing them, and build a culture where “public-facing” doesn’t mean “publicly vulnerable.”
So here’s the real debate: should governments treat AI-driven security threats as a reason to centralize more control and testing at the top, or as a reason to decentralize responsibility so every agency and vendor is forced to build stronger daily security habits?