Harvard-Perplexity Study: AI Agents Work 26 Minutes vs 33 Seconds

Author: Andrew
Published in: AI

The “AI agents are just faster search” story is already falling apart. And that’s exactly why I don’t fully trust the victory lap.

Because if an agent can sit there and work on something for 26 minutes without you touching it, that’s not search. That’s delegation. And delegation changes who has power, who gets blamed, and how work actually gets done.

Based on what’s been shared publicly, a study from Harvard and Perplexity compared AI agents to traditional search. The headline number is blunt: agents averaged 26 minutes of autonomous work per session. Search averaged 33 seconds. Not “a little better.” Not “a bit more helpful.” A different shape of tool.

They also report cost and time differences that sound almost rude. AI agents completed tasks in 36 minutes at a cost of $0.16 per step. Human-assisted search took 269 minutes and $2.05 (the figures as quoted don’t say whether that is also per step, so treat the cost comparison as rough). If those numbers hold up outside the study, it’s not just a productivity boost. It’s a budgeting event. It’s a staffing conversation people don’t want to have out loud.
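For anyone who wants to sanity-check the headline ratios, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above; the variable names are mine, and treating the $2.05 figure as directly comparable to the $0.16 per-step figure is an assumption, since the units aren’t spelled out.

```python
# Figures as quoted in this post; not independently verified.
AGENT_SESSION_SECONDS = 26 * 60   # 26 minutes of autonomous work per session
SEARCH_SESSION_SECONDS = 33       # 33 seconds per search session

AGENT_TASK_MINUTES = 36           # agent task completion time
SEARCH_TASK_MINUTES = 269         # human-assisted search completion time

AGENT_COST_USD = 0.16             # quoted "per step"
SEARCH_COST_USD = 2.05            # ASSUMPTION: comparable unit, not stated


def ratio(larger: float, smaller: float) -> float:
    """Return how many times bigger `larger` is than `smaller`."""
    return larger / smaller


session_ratio = ratio(AGENT_SESSION_SECONDS, SEARCH_SESSION_SECONDS)
task_ratio = ratio(SEARCH_TASK_MINUTES, AGENT_TASK_MINUTES)
cost_ratio = ratio(SEARCH_COST_USD, AGENT_COST_USD)

print(f"Session length, agent vs search: {session_ratio:.1f}x")
print(f"Task time, search vs agent:      {task_ratio:.1f}x")
print(f"Cost, search vs agent:           {cost_ratio:.1f}x")
```

On these rounded numbers, the session-length gap works out to roughly 47x, the task-time gap to about 7.5x, and the cost gap to about 12.8x, if the cost units really are comparable.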

But the part that should make you sit up isn’t even the speed. It’s the scope. The study says 23% of queries involved work users didn’t even send to traditional search. That’s the quiet tell. People treat search like a library; they treat agents like an assistant. They ask for outcomes, not links. And once you start asking for outcomes, you stop seeing all the little choices along the way.

That’s where my skepticism kicks in.

If you’ve ever managed someone—or been managed—you know autonomy is not automatically good. Autonomy is a trade. You give up control to get time back. Sometimes it’s worth it. Sometimes you spend that saved time cleaning up a mess you didn’t see coming.

Imagine you’re in marketing and you need a competitor summary before a meeting. Search is annoying but honest: you’ll skim, you’ll judge, you’ll notice what looks off. An agent will happily produce something polished in one go. If it’s wrong, it won’t look wrong. It’ll look “confident.” Now the meeting is happening and you’re speaking from the agent’s summary. The moment someone challenges a claim, you can’t defend it, because you didn’t do the work. You approved it.

Or say you’re a founder trying to answer customer emails at night. An agent can draft replies, spot patterns, maybe even propose a fix you didn’t notice. That’s exciting. It’s also dangerous, because customer trust is built on tiny details: tone, accuracy, promises you can keep. When an agent is doing 26 minutes of “work,” what it’s really doing is making 26 minutes of decisions. If you don’t review carefully, you’re outsourcing your judgment, not just your typing.

And yes, you can say, “Just review it.” But review is a skill. Review takes time. Review also gets lazy when the output looks good. The better the agent gets at sounding right, the more we rely on it being right. That’s the trap.

The cost numbers are going to push people into that trap. When something is cheaper per step than a human process, the pressure is obvious: use it more, review less, ship faster. The winners are the people who can turn work into agent-friendly tasks and move quickly. The losers are everyone downstream when a fast, cheap, wrong decision hits reality.

There’s another tension I can’t ignore. The finding that 23% of queries covered work users didn’t send to traditional search sounds like “new value.” It also sounds like “people don’t know what to ask for.” Agents can fill in blanks. That’s great when the blanks are real work you forgot. It’s bad when the blanks are assumptions the agent invents. Search forces you to see the source material. Agents can blur the line between “found” and “made up,” especially when you’re tired, rushed, or not an expert.

To be fair, I can see the upside clearly. There are whole categories of work that are just time theft: gathering info, comparing options, turning messy notes into a plan, setting up checklists, drafting first versions. If an agent really can take 36 minutes to do what used to take 269 minutes with human-assisted search, that can pull people out of busywork. It can give small teams the kind of leverage only big teams used to have. It can help someone who isn’t great at writing get to a decent first draft. It can help a nurse manager, a teacher, a contractor—people who don’t have time—get organized without hiring extra help.

But the more we let agents do “autonomous work,” the more we have to admit what work really is: responsibility. Not effort. Not time. Responsibility.

If your agent makes a mistake, you still own it. If your agent misses a key constraint, you still pay. If your agent nudges you toward a decision you wouldn’t have made, you still live with it. The real question isn’t whether agents are roughly 47 times “more efficient” than search, which is what 26 minutes against 33 seconds works out to. The real question is whether we’re ready for a world where the default mode is not “look it up,” but “let it handle it.”

So here’s what I actually want to know: when AI agents become normal at work, what should count as “good enough” human review before we trust the output and move on?

Frequently asked questions

What is AI agent governance?

AI agent governance is the set of policies, controls, and monitoring systems that ensure autonomous AI agents behave safely, comply with regulations, and remain auditable. It covers decision logging, policy enforcement, access controls, and incident response for AI systems that act on behalf of a business.

Does the EU AI Act apply to my company?

The EU AI Act applies to any organisation that develops, deploys, or uses AI systems in the EU, regardless of where the company is headquartered. High-risk AI systems face strict obligations starting 2 August 2026, including risk management, data governance, transparency, human oversight, and conformity assessments.

How do I test an AI agent for security vulnerabilities?

AI agent security testing evaluates agents for prompt injection, data exfiltration, policy bypass, jailbreaks, and compliance violations. Talan.tech's Talantir platform runs 500+ automated test scenarios across 11 categories and produces a certified security score with remediation guidance.

Where should I start with AI governance?

Start with a free AI Readiness Assessment to benchmark your current maturity across 10 dimensions (strategy, data, security, compliance, operations, and more). The assessment takes about 15 minutes and produces a prioritised roadmap you can act on immediately.
