OpenAI has its own search robot called GPTBot

Last updated December 5th, 2023 23:49

Just as the internet is crawled by Google, Yahoo, or the Russian Yandex, GPTBot has now joined the family of search robots. OpenAI released it for a simple reason: it will index web content, and the company will use that information to develop and train new models in the GPT family. OpenAI runs GPTBot on the servers of Microsoft, with which it collaborates closely. As with other robots, you can influence the behavior of GPTBot using a robots.txt file.

You can tell that the OpenAI robot has visited your website from this entry in the server log:

User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +
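
As a rough illustration of how you might spot these visits, here is a minimal Python sketch that looks for the GPTBot token in log lines. The function names and the sample log line are hypothetical, and the user-agent string in the sample is shortened. A user agent can also be spoofed, so treat a match as a hint rather than proof.

```python
def is_gptbot(text: str) -> bool:
    """Return True if the text contains the GPTBot token (case-insensitive)."""
    return "gptbot" in text.lower()


def count_gptbot_hits(log_lines) -> int:
    """Count the log lines that mention the GPTBot token anywhere."""
    return sum(1 for line in log_lines if is_gptbot(line))


# Hypothetical access-log lines; the real format depends on your web server.
sample_log = [
    '203.0.113.5 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (X11; Linux x86_64)"',
]
print(count_gptbot_hits(sample_log))
```

In practice you would pass the lines of your real access log, for example an open file object, to count_gptbot_hits.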

As the company itself states, it will use the indexed content to train new models, and the robot is configured to filter what it collects. Thanks to these filters, the robot avoids websites with paywalled content and does not index text that violates OpenAI’s policies.

You can influence its behavior using standard directives in the robots.txt file, which are the same as those for any other search robot.

How to prevent GPTBot from indexing content

If your website does not have a robots.txt file yet, create one on your computer and upload it to the root directory of your website. To block GPTBot, insert these directives into the file:

User-agent: GPTBot
Disallow: /

If you want the robot to index only part of your website’s content, use a slightly different entry that allows one part and forbids another:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/

In the example above, you tell the robot that it may explore the directory named directory-1 and index its content. The directory named directory-2, however, is forbidden, so the robot should neither access it nor index its content. Note that no robot is obligated to follow these rules. They are merely instructions on how it should behave and where it may and may not go. Whether they are respected depends entirely on the company that develops and operates the robot.
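
Before uploading a robots.txt file, you may want to check how its rules are read. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper name gptbot_may_fetch and the example.com URLs are assumptions made for illustration, and Python's parser may interpret edge cases differently from GPTBot itself.

```python
from urllib.robotparser import RobotFileParser

# The directives from the example above.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_may_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets GPTBot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


print(gptbot_may_fetch(ROBOTS_TXT, "https://example.com/directory-1/page.html"))
print(gptbot_may_fetch(ROBOTS_TXT, "https://example.com/directory-2/page.html"))
```

The first call should report that the URL is allowed and the second that it is not. Passing the two-line blocking file shown earlier instead makes every URL off limits for GPTBot.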

From which IP addresses can GPTBot visit you?

OpenAI publishes on its website a list of the IP ranges of the servers from which the robot operates.


The IP addresses belong to the Microsoft Azure range. OpenAI operates the robot there thanks to its collaboration with Microsoft, which not only provides the infrastructure but also uses GPT-4 in its Bing search engine.
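
Because a user-agent string can be forged, you may also want to check whether a visitor's address falls inside the published ranges. Below is a minimal sketch using Python's standard ipaddress module. The range in EXAMPLE_RANGES is a placeholder from the reserved documentation block 192.0.2.0/24, not a real GPTBot range, and the function name is hypothetical. Substitute the ranges from OpenAI's current list.

```python
import ipaddress

# Placeholder only: 192.0.2.0/24 is reserved for documentation (TEST-NET-1).
# Replace it with the ranges OpenAI currently publishes.
EXAMPLE_RANGES = ["192.0.2.0/24"]


def ip_in_ranges(ip: str, cidr_ranges) -> bool:
    """Return True if the IP address lies within any of the CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


print(ip_in_ranges("192.0.2.17", EXAMPLE_RANGES))    # inside the placeholder range
print(ip_in_ranges("198.51.100.1", EXAMPLE_RANGES))  # outside it
```

Combining this check with the user-agent token gives a more reliable signal than either one alone.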

