September 14, 2025

General Studies Paper 3

Context

  • A group of news media organisations, including The New York Times, Reuters, CNN and the Australian Broadcasting Corporation, recently shut off OpenAI’s ability to access their content.

About

  • The development comes in the wake of reports that The New York Times is planning to sue the artificial intelligence (AI) research company for copyright violations, which would represent a considerable escalation in tensions between media companies and the leading creator of generative AI solutions.

OpenAI

  • The company is best known for creating ‘ChatGPT’, which is an AI conversational chatbot.
  • Users can ask questions on just about anything, and ChatGPT will respond with answers, stories and essays that are often, though not always, accurate.
  • It can even help programmers write software code. The hype around ChatGPT — specifically, the breathtaking advancements in the field of AI required to create it — has propelled OpenAI into becoming a $30 billion company.

The face-off between news outlets and OpenAI

  • Software products like ChatGPT are based on what AI researchers call ‘large language models’ (LLMs).
  • These models require enormous amounts of information to train their systems.
  • For chatbots or digital assistants to understand the questions that humans throw at them, they need to study human language patterns.
  • Tech companies that work on LLMs, such as Google, Meta or OpenAI, are secretive about what kind of training data they use.
  • But it’s clear that online content found across the Internet, such as social media posts, news articles, Wikipedia and e-books, forms a significant part of the dataset used to train ChatGPT and other similar products.
  • This data is put together by scraping it off the Internet. Tech companies use software called ‘crawlers’ to scan web pages, hoover up content and put it together in a dataset that can be used to train their LLMs.
  • This is what news outlets took a stand against last week when The New York Times and others blocked a web crawler known as GPTBot, which OpenAI uses to scrape data.
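
The article does not say how the outlets blocked the crawler. A common mechanism, assumed here, is a robots.txt file that names the crawler's user agent and disallows it. The sketch below uses Python's standard urllib.robotparser to show how a well-behaved crawler would read such a file. The robots.txt contents, the news.example.com URL and the is_allowed helper are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site that blocks only the crawler
# identifying itself as "GPTBot" and leaves every other crawler alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative article URL (not a real page).
ARTICLE_URL = "https://news.example.com/world/some-article"

gptbot_allowed = is_allowed("GPTBot", ARTICLE_URL)
googlebot_allowed = is_allowed("Googlebot", ARTICLE_URL)

print("GPTBot allowed:", gptbot_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

Note that robots.txt is a voluntary convention: it only works if the crawler chooses to check it before fetching a page, which is why a search engine's crawler can remain welcome while another crawler is turned away.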

Other media companies

  • Search engines like Google or Bing also use web crawlers to index websites and present relevant results when users search for topics.
  • The key difference is that search engines offer news outlets a mutually beneficial relationship.
  • Google, for instance, takes a snippet of a news article (a headline, a blurb and perhaps a couple of sentences) and reproduces it to make its search results useful. And while Google profits from that content, it also directs a significant amount of user traffic to news websites.
  • OpenAI, on the other hand, provides no benefit, monetary or otherwise, to news companies. It simply collects publicly available data and uses it for the company’s own purposes.
  • But it’s also true that some news outlets probably view ChatGPT as a potential competitor that will profit off their journalism.

Way forward

  • Tech gurus like to argue that the value of online content only exists in the aggregate.
  • In other words, ChatGPT could still exist as a high-quality product without CNN’s reporting. But if all media publications across the world refused to provide access to OpenAI, it’s likely that the final product would be of lower quality.
  • And, of course, if every single creator of online content turned down OpenAI, then ChatGPT would almost certainly not exist.

 

© 2025 Civilstap Himachal Design & Development