Media Companies Increasingly Block AI From Training On Their Content

Oct 26, 2023

There’s a major new challenge emerging for publishers of all sizes and sectors, and that’s how to protect their content from artificial intelligence. Two concerns in particular are top of mind.

The first is plagiarism and protection of IP. That’s the easier of the two issues. Stealing intellectual property is bad, and creators have an urgent imperative to protect their content—their journalism, their photos, their art, their scripts, their writing.

The other issue is trickier, and it deals with instances when AI accesses publishers’ content not to plagiarize it, but to train AI software.

And that’s where it becomes interesting. The whole point of AI is to automate information and data gathering and produce accurate results that replicate the ways in which humans interact. This basic principle applies to all kinds of uses: journalism, data analysis, marketing, customer service, and many more.

To be effective, AI programs have to scan vast amounts of information for facts, context, and cues that teach them to recognize and respond fluently to human queries, the Washington Post noted on October 20. This includes the stories that journalists at media companies produce. “But as the quest to develop cutting-edge AI models has grown increasingly frenzied, newspaper publishers and other data owners are demanding a share of the potentially massive market for generative AI, which is projected to reach to $1.3 trillion by 2032, according to Bloomberg Intelligence,” the Post said.

Since August, at least 535 news organizations, including the Post and the New York Times, have installed a blocker to prevent their stories from being surfaced and used in this way, the Post continued.

At issue, not surprisingly, is money. Given that huge projected market, publishers are looking to get paid for use of their content, and some of them are now in talks with OpenAI, the creator of ChatGPT, about compensation. Earlier this year, OpenAI agreed to license content from the Associated Press to help train its AI models.

Tech companies have been reluctant to pay for access to content. As generative AI has developed over the last 12 months, they’ve relied on sources of information in the public domain. But that’s changing. They’re beginning to realize that what comes out is contingent on what goes in, and if that doesn’t include vast amounts of high-quality journalism, AI will have a much more difficult path to that projected $1.3 trillion market by 2032.