Recent research has revealed that four major artificial intelligence (AI) chatbots are failing to accurately summarise news stories, often introducing significant errors and distortions. The findings, derived from an investigation conducted by a leading news organisation, raise concerns about the reliability of AI-generated content and its potential impact on public discourse.
Investigation into AI accuracy
The study examined the performance of OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini, and Perplexity AI in summarising content from a major news website. The chatbots were presented with 100 news stories and asked to generate summaries. Their responses were then evaluated by journalists with subject matter expertise.
The results were alarming: 51% of the AI-generated answers were found to contain significant issues, and 19% of those that directly cited the news organisation’s content introduced factual errors, including incorrect figures, wrong dates, and misquoted sources.
In response to the findings, a spokesperson for OpenAI stated: “We support publishers and creators by helping 300 million weekly ChatGPT users discover quality content through summaries, quotes, clear links, and attribution.” The companies behind the other chatbots have yet to comment.
Distorted headlines and false claims
The investigation uncovered several notable errors in AI-generated news summaries, including:
- Gemini falsely stating that the NHS does not recommend vaping as a smoking cessation aid.
- ChatGPT and Copilot misreporting that Rishi Sunak and Nicola Sturgeon were still in office after they had stepped down.
- Perplexity AI misquoting the news organisation’s Middle East coverage, claiming that Iran initially showed “restraint” and describing Israel’s actions as “aggressive.”
Of the four AI tools assessed, Microsoft’s Copilot and Google’s Gemini exhibited the most serious issues. OpenAI’s ChatGPT and Perplexity, which counts Amazon founder Jeff Bezos among its backers, produced fewer serious inaccuracies, though neither was error-free.
A call for accountability
The CEO of the news organisation, Deborah Turness, voiced concerns over the implications of AI-generated misinformation. She warned that the unchecked proliferation of AI news summaries could pose a real threat in a time of global uncertainty.
“We live in troubled times, and how long will it be before an AI-distorted headline causes significant real-world harm?” she asked.
Turness called for AI developers to work collaboratively with media organisations to ensure greater accuracy and reliability in news reporting. She urged companies to “pull back” their AI-generated news summaries, citing Apple’s decision to suspend its Apple Intelligence news summaries after similar concerns were raised.
Transparency and content control
During the study, the news organisation temporarily allowed the AI chatbots to access its content for evaluation purposes; under normal circumstances, it blocks AI tools from accessing its content.
The Programme Director for Generative AI, Pete Archer, stressed the importance of maintaining editorial control over journalistic content. He argued that AI companies must be more transparent about how their models process and present news, as well as the extent of errors they introduce.
“Publishers should have control over whether and how their content is used, and AI companies should be transparent about the scale and scope of errors and inaccuracies they produce,” Archer stated.
The report also highlighted the AI models’ struggle to differentiate between fact and opinion, their tendency to editorialise, and their frequent omission of crucial contextual details.
Ongoing efforts to improve AI reliability
AI firms have acknowledged the challenges of ensuring accuracy in AI-generated news content. OpenAI has stated that it is working to improve citation accuracy and to respect publisher preferences. Among other measures, it lets publishers control how their content appears in search results through the web-standard robots.txt file, which tells OpenAI’s crawlers which pages they may access.
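To make the mechanism concrete, the sketch below shows how a publisher might use robots.txt to set different rules for different OpenAI crawlers. The user-agent names GPTBot (used to gather training data) and OAI-SearchBot (used for search) come from OpenAI’s crawler documentation; publishers should verify the current names before relying on them, so treat this as an illustrative example rather than a definitive configuration.

```
# robots.txt served at the site root, e.g. https://example.com/robots.txt
# (example.com is a placeholder domain)

# Block OpenAI's training-data crawler from the entire site
User-agent: GPTBot
Disallow: /

# Allow OpenAI's search crawler, so articles can still be found,
# linked, and attributed in search results
User-agent: OAI-SearchBot
Allow: /
```

Because the rules are set per user agent, a publisher can opt out of model training while remaining visible in AI-powered search, or block both; this is the kind of control over whether and how content is used that Archer argues publishers should retain.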
Despite these assurances, concerns persist over the role AI may play in shaping public understanding of current affairs. As AI-powered tools become increasingly integrated into digital news consumption, the demand for stringent oversight and ethical considerations in their deployment continues to grow.
The study’s findings serve as a stark reminder of the limitations of AI chatbots in handling news content accurately. While AI presents vast opportunities for information dissemination, its potential to misinform remains a pressing issue that both technology firms and news organisations must address together.