
New research that inspected ChatGPT's network traffic has uncovered internal labels and the specific data providers the artificial intelligence model uses for information retrieval.
The analysis provides a first look at the mechanisms ChatGPT employs to select its sources, challenging common assumptions about its underlying data acquisition processes, according to the researchers.
The study found that ChatGPT assigns an internal label, ‘result_source,’ to each web result, which can take one of four values: serp, labrador, bright, and oxylabs. Researchers Mark Williams-Cook, Metehan and Suganthan Mohanadasan conducted the analysis.
The ‘bright’ label directly corresponds to Bright Data, while ‘oxylabs’ refers to Oxylabs, indicating that ChatGPT uses these third-party data providers for some of its information, the researchers reported.
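To make the labeling concrete, here is a minimal Python sketch of how such labels could be tallied once the metadata has been captured. The field name `result_source`, its four values, and the two provider names come from the study. The payload shape (a `results` list of objects with a `url` key) and the sample data are hypothetical, since the article does not describe the actual JSON structure.

```python
import json
from collections import Counter

# The four values the researchers reported for `result_source`.
KNOWN_SOURCES = {"serp", "labrador", "bright", "oxylabs"}

# Provider mapping stated in the study; the other two labels are left unmapped.
PROVIDERS = {"bright": "Bright Data", "oxylabs": "Oxylabs"}

# Hypothetical captured payload: the real JSON structure is not described
# in the article, so this shape is an assumption for illustration only.
SAMPLE = json.dumps({
    "results": [
        {"url": "https://example.com/a", "result_source": "serp"},
        {"url": "https://example.com/b", "result_source": "bright"},
        {"url": "https://example.com/c", "result_source": "oxylabs"},
        {"url": "https://example.com/d", "result_source": "serp"},
    ]
})


def tally_result_sources(raw_json: str) -> Counter:
    """Count how often each result_source label appears in a payload."""
    payload = json.loads(raw_json)
    counts = Counter()
    for result in payload.get("results", []):
        label = result.get("result_source")
        # Group anything outside the four reported values under "unknown".
        counts[label if label in KNOWN_SOURCES else "unknown"] += 1
    return counts


counts = tally_result_sources(SAMPLE)
for label, n in counts.most_common():
    print(label, n, PROVIDERS.get(label, ""))
```

Counts produced this way inherit the limits of the sample they come from, so they are best read as directional.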
The methodology involved inspecting the browser's HTTP traffic after TLS decryption, which exposed the JSON metadata attached to ChatGPT's responses. This approach differed from traditional packet sniffing, which sees only the encrypted traffic, the researchers said.
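One common way to work with decrypted browser traffic is to export it from the browser's developer tools as a HAR file and pull out the JSON response bodies. The sketch below assumes that workflow; the article does not say which tooling the researchers used, and the example HAR fragment is hand-written for illustration.

```python
import base64
import json


def json_bodies_from_har(har: dict) -> list:
    """Return parsed JSON response bodies found in a HAR capture."""
    bodies = []
    for entry in har.get("log", {}).get("entries", []):
        content = entry.get("response", {}).get("content", {})
        # Only consider responses whose MIME type mentions JSON.
        if "json" not in content.get("mimeType", ""):
            continue
        text = content.get("text", "")
        # HAR files may store bodies base64-encoded.
        if content.get("encoding") == "base64":
            text = base64.b64decode(text).decode("utf-8", errors="replace")
        try:
            bodies.append(json.loads(text))
        except json.JSONDecodeError:
            # Skip truncated or malformed bodies rather than failing.
            continue
    return bodies


# Hand-written HAR fragment (illustrative only, not real captured traffic).
EXAMPLE_HAR = {
    "log": {
        "entries": [
            {"response": {"content": {
                "mimeType": "application/json",
                "text": '{"result_source": "serp"}',
            }}},
            {"response": {"content": {
                "mimeType": "text/html",
                "text": "<html></html>",
            }}},
        ]
    }
}

bodies = json_bodies_from_har(EXAMPLE_HAR)
print(bodies)
```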
Researchers also identified six ‘turn_use_case’ values, which categorize different types of user queries, according to the study. This internal categorization helps dictate how ChatGPT processes information requests.
Text-based queries often bypassed direct web searches, while more complex ‘Thinking’ queries triggered numerous sub-queries, including site-specific searches and price verifications, the analysis found.
The frequency observations, such as percentages and rankings, should be treated as directional only because they come from a small sample focused on software-as-a-service (SaaS) and tech-related queries, the researchers reported. The structural findings regarding internal labels and source types, however, were deemed high-confidence.
Initial data capture was complicated by ChatGPT’s answer streaming over long-lived connections, and automated Chrome browsers used for the study were blocked by Cloudflare, the researchers noted.
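Streaming complicates capture because a response arrives as many partial chunks rather than one complete body, and a JSON object can be split across chunk boundaries. The sketch below shows one way to buffer chunks and recover whole events. It assumes an event-stream style format with `data:` lines carrying JSON and a hypothetical `delta` field; the article does not describe the actual wire format.

```python
import json


def iter_events(chunks):
    """Yield JSON objects from text chunks, buffering partial lines."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Process only complete lines; keep the unfinished tail buffered.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            line = line.strip()
            if not line.startswith("data:"):
                continue  # blank separators and other fields are ignored
            data = line[len("data:"):].strip()
            try:
                yield json.loads(data)
            except json.JSONDecodeError:
                continue  # skip payloads that are not valid JSON


# Hypothetical chunks: note that both JSON objects are split mid-way.
CHUNKS = [
    'data: {"delta": "Hel',
    'lo"}\n\ndata: {"del',
    'ta": " world"}\n\n',
]

events = list(iter_events(CHUNKS))
text = "".join(event["delta"] for event in events)
print(text)
```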
OpenAI, the developer of ChatGPT, did not immediately respond to requests for comment on the findings.
Source: Search Engine Journal
Written by Angela Palumbo
Angela Palumbo, Senior Editor at Rabbit Rank since 2023, holds a bachelor's in communications. She focuses on fact-checking and simplifying complex topics while also leading strategy for the news department.