Cloudflare Accuses Perplexity of Bypassing Bot Blocks; AI Firm Denies Claims

Cloudflare accuses Perplexity of evading site blocks for web scraping; Perplexity denies wrongdoing, calling it a misunderstanding of AI.
A fresh controversy has erupted in the tech world as Cloudflare, a leading internet infrastructure provider, has accused AI startup Perplexity of engaging in deceptive web scraping practices. The company alleges that Perplexity accessed content from websites that had explicitly restricted such activity, drawing attention once again to the murky boundaries of AI, data access, and internet ethics.
In a blog post published Monday, Cloudflare claimed it detected Perplexity scraping content from sites that had added rules to their robots.txt files to block such bots. According to Cloudflare, the AI firm circumvented these blocks by disguising its crawler’s identity, including by changing its user-agent string and rotating through multiple IP addresses to evade detection.
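For readers unfamiliar with the mechanism at issue, the sketch below shows roughly how a well-behaved crawler consults robots.txt before fetching a page, using Python's standard urllib.robotparser module. The robots.txt contents, the example.com URL, the "PerplexityBot" token, and the browser-style user-agent string are illustrative assumptions, not details taken from Cloudflare's post. The point is only that robots.txt rules are matched against whatever user-agent a client declares, so a rule aimed at a named bot does not apply to a request that presents itself as an ordinary browser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site; these rules are illustrative only.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # placeholder URL

    # A crawler that declares its own name is matched by the bot-specific rule.
    declared = "PerplexityBot"

    # An assumed browser-style string; it does not contain the bot's token,
    # so only the wildcard rule applies to it.
    browser_like = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    )

    print("declared crawler allowed:", is_allowed(ROBOTS_TXT, declared, url))
    print("browser-like agent allowed:", is_allowed(ROBOTS_TXT, browser_like, url))
```

Because robots.txt is advisory and relies on clients identifying themselves honestly, network providers typically pair it with other signals, such as the machine learning and traffic analysis Cloudflare says it used.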
“This activity was observed across tens of thousands of domains and millions of requests per day,” the blog post stated. Cloudflare said it relied on a combination of machine learning tools and traffic analysis to pinpoint Perplexity as the source of the behavior. It added that some of the requests impersonated legitimate browsers, including Google Chrome on macOS.
Cloudflare said the scraping came to its attention after several of its clients reported suspicious traffic coming from Perplexity, despite efforts to block it. In response, Cloudflare has now removed Perplexity’s bots from its list of verified crawlers and introduced additional measures to prevent similar activity in the future.
Perplexity has strongly denied the accusations, pushing back in a detailed rebuttal. The AI startup dismissed the claims as a “sales pitch,” arguing that Cloudflare’s blog post reflects a fundamental misunderstanding of how AI assistants function.
“When Perplexity fetches a webpage, it’s because a user asked a specific question,” the company stated. It emphasized that its AI platform does not engage in traditional web crawling or mass data harvesting. Instead, it claimed its system only retrieves real-time information when prompted by user queries and does not store or use that content to train its AI models.
Further defending itself, Perplexity said that Cloudflare had wrongly attributed some of the automated traffic to its systems. It pointed to a third-party service, BrowserBase, suggesting that only a minor portion of the requests in question originated from there. “This is a basic traffic analysis failure,” Perplexity argued, accusing Cloudflare of presenting misleading data and diagrams.
The dispute comes at a time when the lines between helpful AI tools and unauthorized bots are increasingly blurred. As more AI applications rely on real-time data, concerns are growing among website operators over how their content is accessed and used.
While Cloudflare has yet to issue a follow-up to Perplexity’s rebuttal, the clash has already fueled broader discussions about ethical web scraping, AI transparency, and the urgent need for standardized guidelines on digital content access.
With both companies standing firm on their positions, this incident may become a touchstone case in the ongoing struggle between open web advocates and those demanding tighter content controls in the AI era.


















