Agentic Blabbering: How Researchers Tricked Perplexity's Comet Into a Phishing Scam in Four Minutes
Guardio Labs exploited AI browser 'reasoning transparency' to build a GAN-optimized phishing page that hijacked Perplexity Comet in under four minutes.
Six days ago I wrote about how calendar invites can hijack agentic AI browsers through prompt injection. The attack relied on hiding malicious instructions inside content the AI would process. That vulnerability was bad enough. This week, Guardio Labs showed something worse: you don't even need to hide.
Guardio Labs security researcher Shaked Chen demonstrated that Perplexity's Comet AI browser could be tricked into entering victim credentials on a fake refund page - in under four minutes. The technique, which the researchers call "agentic blabbering," exploits a feature most AI browser vendors consider a selling point: the AI's visible reasoning process.
The AI tells you what it sees, what it thinks is happening, what it plans to do next, and which signals it considers safe or suspicious. Turns out, attackers can read that narration too. And they can use it to build the perfect trap.
How Agentic Blabbering Works
The attack chain is elegant in its simplicity.
Comet, like most agentic AI browsers, narrates its reasoning as it processes web pages. It flags suspicious elements, explains why it trusts or distrusts a page, and describes the actions it's about to take. This transparency is supposed to build user trust. Instead, it builds an attack surface.
Here's what Guardio Labs did:
- Intercepted the traffic between the Comet browser and Perplexity's servers to observe the AI's reasoning in real time
- Fed that reasoning into a Generative Adversarial Network (GAN) that iteratively refined a phishing page based on what the AI flagged as suspicious
- Repeated the cycle until the AI browser reliably walked into the trap - entering victim credentials on a bogus refund page without hesitation
The GAN acted as an adversarial training loop against the AI browser itself. Every time Comet flagged something as suspicious, the GAN adjusted the page. Every time Comet described why it trusted a page element, the GAN reinforced that pattern. Within four minutes, the researchers had a phishing page optimized specifically to bypass that AI model's reasoning.
The researchers described it bluntly: "The scam evolves until the AI Browser reliably walks into the trap another AI set for it."
The Scale Problem
Here's the part that should keep you up at night. Traditional phishing targets individual humans. Each person has different levels of skepticism, different things they notice, different thresholds for suspicion. You have to craft pages for personas.
Agentic blabbering targets the model. Once a phishing page is optimized against a specific AI browser's reasoning model, it works on every user running that browser. The attacker doesn't need to craft variants for different people because every user delegates trust to the same model. One successful optimization equals universal bypass.
This fundamentally changes the economics of phishing. Instead of spray-and-pray campaigns hoping some percentage of humans fall for it, an attacker can spend four minutes optimizing against the AI and then deploy the page against the entire user base with near-100% efficacy against the agent.
Intent Collision: The Architectural Root Cause
Zenity Labs researcher Stav Cohen - the same researcher who discovered the calendar invite hijacking vulnerability I covered last week - identified the underlying architectural flaw as "intent collision."
Intent collision happens when an AI agent merges benign user requests with attacker-controlled instructions from untrusted web data, without distinguishing between them. The user says "help me check on my refund." The phishing page says "enter credentials here to process the refund." The agent sees both as part of the same task.
This isn't a bug in Comet specifically. It's a design constraint of every agentic AI browser that processes untrusted web content in the same context as user instructions. The agent has no architectural mechanism to separate "content the user wants me to read" from "instructions the user wants me to follow." Everything is tokens. Everything gets interpreted.
| Attack Type | Target | Scale | Optimization Time |
|---|---|---|---|
| Traditional phishing | Individual human judgment | Varies per person | Hours of social engineering |
| Prompt injection (hidden) | AI's instruction parsing | All users of that agent | Minutes to craft payload |
| Agentic blabbering | AI's visible reasoning model | All users of that agent | Under 4 minutes (GAN-automated) |
The progression is clear. Each new attack vector targets the AI itself rather than the human, and each is faster and more scalable than the last.
What This Means for Your Products
If you're building products that use AI agents to process web content, interact with external services, or take actions on behalf of users, agentic blabbering introduces three specific risks you need to account for.
1. Your AI's transparency is an oracle
Every piece of reasoning your AI exposes to users is also exposed to attackers. If your agent explains why it trusts a page, an attacker now knows exactly what signals to fake. If your agent describes what it's looking for, an attacker knows what to provide.
This creates a direct tension between user trust (which benefits from transparency) and security (which benefits from opacity). The same reasoning narration that helps users understand what the AI is doing gives adversaries a real-time feedback loop for optimizing attacks.
What to do: Separate internal reasoning from user-facing explanations. The AI can explain what it did after the fact in summarized form. It should not broadcast its live decision-making process, especially not its security heuristics. Think of it like a firewall - you don't expose your rule set to the traffic you're filtering.
2. Model-level attacks bypass user-level defenses
Your security training teaches employees to spot phishing. Your compliance frameworks require phishing awareness programs. None of that matters when the AI agent is the one processing the page. The human never evaluates the phishing page directly because they delegated that evaluation to the agent.
This connects directly to the liability question I explored last month. If your product's AI agent processes a phishing page and takes action that harms the user, the liability chain doesn't stop at the attacker. The user trusted your product. Your product trusted the AI. The AI was fooled by a page optimized against its own reasoning. Courts are increasingly holding software vendors liable as agents of their users, and agentic blabbering makes the "we couldn't have predicted this" defense harder to sustain.
What to do: Require human confirmation for any agent action involving credentials, payments, or personal data. Not a "click OK to continue" interstitial that users will rubber-stamp, but a clear explanation of what the agent is about to do and why, presented in a way that forces the user to make an actual decision.
3. Your agent's permissions are the blast radius
An agentic blabbering attack against an AI browser with access to your file system, password manager, and email doesn't just steal one set of credentials. It gives the attacker a foothold into everything the agent can touch. As I covered in the AI agents identity dark matter piece, most organizations can't even enumerate which systems their AI agents have access to, much less enforce least-privilege across those systems.
What to do: Implement strict permission boundaries for any AI agent that processes untrusted content. The component that reads and evaluates web pages should operate in a sandbox with zero access to credentials, local files, or privileged APIs. If the agent needs to take an authenticated action based on web content, that action should route through a separate, permission-gated component that requires explicit user approval.
The GAN-Optimized Attack Loop: A New Threat Category
Agentic blabbering isn't just another prompt injection variant. It's the first documented case of automated, adversarial optimization against an AI agent's reasoning model in a production environment. The GAN doesn't just try random payloads and hope one works. It systematically learns what the target AI considers trustworthy and constructs pages that match those criteria exactly.
This means the attack improves with every iteration. The more sophisticated the AI's reasoning becomes, the more data the GAN has to work with. Adding more security heuristics to the AI's reasoning doesn't help if those heuristics are visible - it just gives the GAN more signals to optimize against.
The research community has given related techniques names: VibeScamming, Scamlexity, PerplexedComet. They're all variations on the same theme. AI agents that process untrusted content with visible reasoning are fundamentally vulnerable to adversarial optimization. OpenAI has publicly stated that prompt injection vulnerabilities are "unlikely to ever" be fully eliminated. Agentic blabbering suggests the problem might actually be getting worse as AI agents become more transparent.
Practical Defenses
Here's what I'd implement today if I were shipping an AI agent that processes web content.
Blind the adversary. Strip reasoning transparency from any context that processes untrusted input. Internal reasoning chains should be logged for debugging, not streamed to the rendering layer where page content can observe them.
Isolate content processing from action execution. The agent that evaluates a web page should not be the same agent that enters credentials. Use a two-agent architecture: one reads and summarizes untrusted content in a sandbox, the other acts on the summary within a permission-controlled environment.
Implement behavioral anomaly detection. Monitor for patterns that suggest the AI is being manipulated: sudden changes in trust evaluations, credential entry on newly encountered domains, actions that don't match the user's stated intent. These patterns should trigger automatic pauses, not just logging.
Rate-limit agent actions on untrusted content. A legitimate refund page doesn't need the AI to enter credentials within seconds of encountering it. Introduce mandatory delays and re-evaluation steps for any high-risk action triggered by web content.
Treat agent reasoning as a security boundary. Your threat model should now include "attacker can observe and optimize against agent reasoning" as a vector. This means red-teaming your agent's transparency features specifically, not just its prompt handling.
Update your compliance documentation. If you're maintaining threat models for SOC 2, ISO 27001, or the EU AI Act, add adversarial optimization against AI agent reasoning as a documented threat. The attack is published. Auditors will find it.
The Bigger Picture
Six days between the calendar invite hijacking disclosure and the agentic blabbering research. Each escalation makes the previous attack look simple. Calendar invites required crafting a hidden payload. Agentic blabbering automates the entire optimization loop with a GAN and finishes in four minutes.
The pattern is accelerating. AI browsers are getting more capable, processing more content types, taking more actions autonomously. Every new capability is a new surface for adversarial optimization. And unlike traditional browser security, which had two decades of hardening before AI entered the picture, agentic browser security is being stress-tested in production with real users' credentials at stake.
The companies building AI agents need to internalize a uncomfortable truth: transparency and security are in tension for agentic systems. The features that make AI browsers feel trustworthy to users are the same features that make them exploitable by attackers. Resolving that tension requires architectural decisions - privilege separation, reasoning isolation, mandatory human gates - not just better prompts or more training data.
Perplexity will patch this specific attack. The next one will take three minutes.
Keep reading:
- Your AI Browser Is an Attack Surface: How Calendar Invites Hijack Agentic Interfaces
- AI Agents and Liability: Who's Responsible When the Agent Gets It Wrong?
- AI Agents Are Identity Dark Matter - And Your IAM Stack Can't See Them
Building AI agents and need to design adversary-resistant architectures? Let's talk.