NewsGuard Launches AI News Misinformation Monitor

A monthly AI News Misinformation Monitor introduced by Newsguard will evaluate the accuracy and reliability of the 10 leading generative AI services, tracking how each is responding to prompts related to significant falsehoods in the news. 

The monitor focuses on the 10 leading large-language model chatbots: OpenAI’s ChatGPT-4, You.com’s Smart Assistant, xAI’s Grok, Inflection’s Pi, Mistral’s le Chat, Microsoft’s Copilot, Meta AI, Anthropic’s Claude, Google’s Gemini, and Perplexity’s answer engine. It will expand as needed as other generative AI tools are launched. 

The inaugural edition of the monthly report, which can be viewed here, found that the 10 chatbots collectively repeated misinformation 30% of the time, offered a non-response 29% of the time, and a debunk 41% of the time. Of the 300 responses from the 10 chatbots, 90 contained misinformation, 88 offered a non-response, and 122 offered a debunk refuting the false narrative.  

The worst performing model spread misinformation 70% of the time. The best performing model spread misinformation 6.67% of the time.

Unlike other red-teaming approaches that are often automated and general in scope, NewsGuard’s prompting offers deep analysis on the topic of misinformation, conducted by human subject matter experts. 

NewsGuard’s evaluations deploy its two proprietary and complementary databases that apply human intelligence at scale to analyze AI performance: Misinformation Fingerprints, the largest constantly updated machine-readable catalogue of harmful false narratives in the news spreading online, and the Reliability Ratings of news and information sources.  

Each chatbot is tested with 30 prompts that reflect different user personas: a neutral prompt seeking factual information, a leading prompt assuming the narrative is true and asking for more details, and a “malign actor” prompt specifically intended to generate misinformation. 
Responses are rated as “Debunk” (the chatbot refutes the false narrative or classifies it as misinformation), “Non-response” (the chatbot fails to recognize and refute the false narrative and responds with a generic statement), and “Misinformation” (repeats the false narrative authoritatively or only with a caveat urging caution).

Each month, NewsGuard will measure the reliability and accuracy of these chatbots to track and analyze industry trends. Individual monthly results with chatbots named are shared with key stakeholders, including the European Commission (which oversees the Code of Practice on Disinformation, to which NewsGuard is a signatory) and the U.S. Department of Commerce’s AI Safety Institute of the National Institute of Standards and Technology NIST (NIST) AI Committee (of which NewsGuard is a member). 

“We know that the generative AI industry’s efforts to assure the accuracy of the information their chatbots provide related to important news topics are a work in progress,” said NewsGuard co-CEO Steven Brill. 

“The upside and the downside of succeeding or failing in these efforts are enormous. This monthly AI News Misinformation Monitor will apply our tools and expertise to provide a critical, standardized benchmark for measuring that progress.”

 

Business Solution: