Cloudflare Rolls Out Content Signals Policy to Control AI Use of Website Content

Cloudflare’s new Content Signals Policy gives website owners control over how AI systems, including Google, access and use content.
Cloudflare has unveiled a groundbreaking initiative aimed at giving website owners more control over how their content is accessed and used by artificial intelligence systems, including Google’s AI. The tech company’s new Content Signals Policy addresses longstanding concerns from publishers who claim that Google has used their content without authorization to train AI models and provide answers across the web. Critics have also argued that their work has been monetized by large platforms without fair compensation or proper credit.
Designed to empower publishers, the Content Signals Policy enables website operators to specify clear boundaries for AI crawlers. Cloudflare, whose network handles traffic for roughly 20 percent of the web, will apply the new framework automatically to the more than 3.8 million domains that use its managed robots.txt service.
At its core, the policy is an enhancement of robots.txt, the plain-text file, defined by the longstanding Robots Exclusion Protocol, that tells web crawlers which parts of a site they may access or index. While robots.txt has historically been used to control search engine indexing, Cloudflare’s policy extends its vocabulary specifically to AI systems, so that content owners can distinguish between traditional search indexing and AI-driven usage.
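For context, a conventional robots.txt can only grant or deny access per crawler, not per purpose. The sketch below is illustrative (GPTBot is OpenAI’s crawler; the paths are made up):

```
# Conventional robots.txt: rules attach to crawler identity, not to use
User-agent: *
Disallow: /private/

# Blocking a known AI crawler outright is the only available lever
User-agent: GPTBot
Disallow: /
```

Under this scheme, a publisher that wants to remain visible in Google Search has no way to opt out of AI uses served by the same crawler, which is exactly the gap the new signals target.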
While the policy affects multiple AI companies, Cloudflare has pointed out that Google’s practices are a particular focus. Unlike some AI firms such as OpenAI, which run separate crawlers for search and AI tasks, Google uses a single crawler, Googlebot, both to index pages for Search and to feed its AI Overviews. Cloudflare CEO Matthew Prince criticized this approach, saying it gives Google an “unfair advantage.”
“Every AI answer engine should have to play by the same rules. Google combines its crawler for search with its AI answer engines, which gives them a unique and unfair advantage. We are making clear that there are now different rules for search and AI answer engines,” Prince told Business Insider.
The new policy introduces three distinct signals to the robots.txt file:
- search: Controls whether content can appear in traditional search results, including links and short snippets.
- ai-input: Controls whether content can be supplied as input to AI-generated summaries or answers.
- ai-train: Controls whether content may be used to train or fine-tune AI models.
These signals allow website owners to communicate their preferences clearly to AI companies, defining whether their material can be used for AI responses or model training. By default, sites enrolled in Cloudflare’s managed robots.txt program will continue to allow search indexing but block AI training.
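In the format Cloudflare has described, each signal is written as a Content-Signal line alongside the familiar directives in robots.txt. A minimal sketch of the default configuration mentioned above, with search allowed, AI training blocked, and ai-input deliberately omitted to express no preference:

```
# Content signals state how fetched content may be used;
# Allow/Disallow still control what may be crawled.
User-agent: *
Content-Signal: search=yes, ai-train=no
Allow: /
```

Because the signals are usage preferences rather than access controls, a compliant crawler may still fetch pages that carry them; the standard directives continue to govern crawling itself.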
Cloudflare also noted that the new signals could carry legal weight: the managed robots.txt frames them as conditions of accessing the site, potentially exposing AI companies that ignore them to contractual claims. By setting these boundaries, the company hopes to level the playing field and ensure fair treatment for content creators in the rapidly evolving AI ecosystem.

