News media versus OpenAI’s ChatGPT

August 29, 2023

General Studies Paper 3

Context

A group of news media organisations, including The New York Times, Reuters, CNN and the Australian Broadcasting Corporation, recently shut off OpenAI’s ability to access their content.

About

The development follows reports that The New York Times is planning to sue the artificial intelligence (AI) research company over alleged copyright violations, a move that would mark a considerable escalation in tensions between media companies and the leading creator of generative AI tools.

OpenAI

The company is best known for creating ‘ChatGPT’, which is an AI conversational chatbot.
Users can ask questions on almost any topic, and ChatGPT will respond with reasonably accurate answers, stories and essays.
It can even help programmers write software code. The hype around ChatGPT, and specifically the breathtaking advancements in AI required to create it, has propelled OpenAI into becoming a $30 billion company.

The face-off between news outlets and OpenAI

Software products like ChatGPT are based on what AI researchers call ‘large language models’ (LLMs).
These models require enormous amounts of information to train their systems.
For chatbots or digital assistants to understand the questions that humans throw at them, they need to study human language patterns.
Tech companies that work on LLMs, such as Google, Meta and OpenAI, are secretive about the training data they use.
But it’s clear that online content found across the Internet, such as social media posts, news articles, Wikipedia and e-books, forms a significant part of the dataset used to train ChatGPT and similar products.
This data is put together by scraping it off the Internet. Tech companies use software called ‘crawlers’ to scan web pages, hoover up content and put it together in a dataset that can be used to train their LLMs.
This is what news outlets took a stand against last week, when The New York Times and others blocked GPTBot, the web crawler that OpenAI uses to scrape data.
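
The article does not say how the outlets blocked the crawler. A common mechanism, assumed here, is a robots.txt file that tells a named crawler which pages it may visit. The minimal Python sketch below shows how such a rule could be checked. The robots.txt content and the news.example.com address are hypothetical, and the function name is_allowed is introduced only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site (an assumption for
# illustration, not copied from any real outlet). It bars the crawler that
# identifies itself as "GPTBot" and leaves every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://news.example.com/world/story.html"  # hypothetical URL
    for agent in ("GPTBot", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, article))
```

Note that robots.txt is a voluntary convention: it works only if the crawler chooses to honour it, so a rule like this is a request rather than a technical barrier.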

Other media companies

Search engines like Google or Bing also use web crawlers to index websites and present relevant results when users search for topics.
The key difference is that search engines offer news outlets a mutually beneficial relationship.
Google, for instance, takes a snippet of a news article (a headline, a blurb and perhaps a couple of sentences) and reproduces it to make its search results useful. And while Google profits from that content, it also directs a significant amount of user traffic to news websites.
OpenAI, on the other hand, provides no benefit, monetary or otherwise, to news companies. It simply collects publicly available data and uses it for the company’s own purposes.
But it’s also true that some news outlets probably view ChatGPT as a potential competitor that will profit off their journalism.

Way forward

Tech gurus like to argue that the value of online content only exists in the aggregate.
In other words, ChatGPT could still exist as a high-quality product without CNN’s reporting. But if all media publications across the world refused to provide access to OpenAI, the final product would likely be of lower quality.
And, of course, if every single creator of online content turned down OpenAI, then ChatGPT would almost certainly not exist.