Details matter. Here’s a detail about AI that matters: there’s a difference between hallucinations and misinformation.
Seems subtle, I know, but the distinction might help you understand important things as our world becomes all AI, all the time.
Here’s an example of what happens when the distinction is not understood.
In February 2023, Google was scrambling to respond to the waves of publicity for ChatGPT, which had been unveiled three months earlier. The company was in “code red” mode, caught napping without a consumer-ready AI product to ship. It hurried to put together an ad introducing Bard, its first experimental AI chatbot.
A couple of days later, just hours before a chaotic public launch event, it was reported that the Bard answer quoted in the ad contained a factual mistake. Bard stated that the James Webb Space Telescope “took the very first pictures of a planet outside of our own solar system.” It was wrong: the first photo of an exoplanet had been taken 16 years earlier.
Journalists and the online echo chamber went nuts about AI hallucinations.
Reports covering the error explained that AI chatbots have a tendency to hallucinate and create false information. An AI professor called them “bullshit generators.” The Bard mistake was used in the media as a shorthand way to explain that AI chatbots make things up – more precisely, they fill gaps with statistically likely text, regardless of accuracy.
The introduction of ChatGPT had already sparked conversations about AI hallucinations, but overnight the Bard error turned an academic discussion into a mainstream global news crisis.
Hallucinations have consequences. Shares of Alphabet (Google’s parent company) slid by eight percent following the debacle, wiping more than 100 billion dollars off Google’s market value. Google is still dealing with the reputational blow from that one disastrous day.
Keep the Google Bard mistake in the back of your mind. We’ll come back to it.
Hallucinations
AI chatbots are designed to always provide an answer. They almost never say, “I don’t know.”
AI models don’t look up answers in an index like a traditional web search. Instead, they predict the next most statistically probable word in a sequence to create plausible-sounding content.
If their word predictions don’t include facts, they create fluent lies out of thin air.
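To make that concrete, here is a minimal sketch of next-word prediction using an open model (GPT-2 via the Hugging Face transformers library, chosen purely for illustration; it is not what any commercial chatbot actually runs). The model assigns a probability to every possible next token, and the most statistically plausible continuations win, whether or not they are true.

```python
# Minimal sketch of next-token prediction with an open model (GPT-2 via
# Hugging Face transformers). Illustrates the mechanism only; commercial
# chatbots are far larger and layer many safeguards on top of this.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The first picture of a planet outside our solar system was taken by"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits      # scores for every vocabulary token at every position

next_token_logits = logits[0, -1]        # scores for whatever word comes next
probs = torch.softmax(next_token_logits, dim=-1)
top = torch.topk(probs, k=5)

# The model ranks continuations by statistical plausibility, not by truth.
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx.item())!r}: {p.item():.3f}")
```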
There are more than 500 examples of lawyers submitting briefs that cite 100% AI-fabricated cases. Lawyers all over the world have been sanctioned or censured, even suspended, for AI hallucinations.
Air Canada’s support chatbot invented a refund policy that didn’t exist. A tribunal held the company liable, ruling that Air Canada was responsible for what its AI made up.
In the financial industry, JPMorgan Chase, Wells Fargo, and Goldman Sachs banned ChatGPT-style tools internally earlier this year because of the risk of hallucinations – plausible but imaginary announcements or stock prices or company metrics.
Scientific and research papers are increasingly littered with references to nonexistent source materials – coherent titles, real-sounding author names, all invented by eager-to-please AIs.
On May 22, the MAHA Commission released its highly publicized report on childhood diseases, a 78-page screed citing more than 500 studies and other sources. The report was a “case study in generative AI red flags.” It contained bogus citations and the titles of papers that don’t exist.
A true AI hallucination is not grounded in any specific source material used to train the AI model. The chatbot bullshits because it predicts that a fact should be there, even though that data point doesn’t exist in its training data.
The major tech companies are fully aware that hallucinations are, shall we say, a bit of a problem. The engineers responsible for the general purpose chatbots – ChatGPT, Google Gemini, and the rest – are working nonstop on a variety of efforts to lower the risk of hallucinations. Every day, responses are more likely to be grounded in the results of a web search for the latest verified data, backed by more advanced filtering and human feedback. Yup, humans are in the AI loop: teams of contractors rank AI answers on factual correctness, and that feedback loops back into training the AI.
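As a rough illustration of what that grounding looks like, here is a retrieval-augmented sketch in Python. The `search_web` helper is hypothetical (a stand-in for whatever search backend a vendor actually uses), and the OpenAI client and model name are illustrative assumptions, not a description of how any particular chatbot is built. The point is simply that the model is told to answer from retrieved sources and to admit when they don’t cover the question.

```python
# Sketch of retrieval-augmented answering: ground the response in sources.
# `search_web` is a hypothetical helper standing in for a real search API;
# the OpenAI client and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def search_web(query: str, k: int = 3) -> list[str]:
    """Hypothetical search helper -- replace with a real search API."""
    raise NotImplementedError


def grounded_answer(question: str) -> str:
    # Retrieve a few sources and number them so the model can cite them.
    sources = search_web(question)
    context = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer ONLY from the numbered sources below. "
                    "Cite them like [1]. If the sources do not contain "
                    "the answer, say you don't know.\n\n" + context
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```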
But AI responses still contain a thrilling number of lies. Watch out for hallucinations! Your company’s stock price, your law license, your reputation may depend on it.
Misinformation
Remember the Bard error? There’s a twist.
The mistake that Google Bard made was not a hallucination. One hundred billion dollars of market value gone, poof!, based on reporting that was itself wrong.
Bard said the James Webb Space Telescope “took the very first pictures of a planet outside of our own solar system.”
Bard’s answer came from a NASA blog post.
Five months earlier, NASA had posted an article online about the JWST, announcing that “for the first time, astronomers have used NASA’s James Webb Space Telescope to take a direct image of a planet outside our solar system.”
After the hubbub from the Bard mistake had died down, NASA said, “Umm, our article might not have been phrased very precisely, we were talking about planet HIP 65426 b, we know it wasn’t the first picture of any exoplanet, we’re NASA after all, we were just proud of our telescope.”
In other words, the AI response was based on bad information – an ambiguous press release from a reliable source. That’s different from just making things up – and it’s a bigger problem, one that requires different solutions from the tech companies.
Early computer scientists in the 1960s came up with the phrase “garbage in, garbage out.” (Or, as Tom Lehrer put it so memorably: “Life is like a sewer: what you get out of it depends on what you put into it.”)
If an AI is trained on flawed, incomplete, or poor-quality data (“garbage”), the output will be similarly flawed (“garbage”).
Imagine that a fake health claim is posted on social media. Traditional media covers it as a “controversy.” When ChatGPT is asked a question later, it finds the coverage of the “controversy” in its training data and presents the fake claim as an authoritative fact, stripped of the context that it was disputed and false.
The internet is bulging at the seams with false or misleading information. Meanwhile, sources of verified information are being locked behind paywalls or withheld from the tech companies so they won’t be used as AI training data. The odds are shifting: AI responses are more likely to be false or misleading even when the AI is not hallucinating. The AI is doing the best it can, but the online world sucks and the AI can’t overcome that.
But wait, you say. Even when the internet fills up with AI slop, surely ChatGPT ought to be able to find objective reality and answer questions with facts, right?
There isn’t as much agreement about “facts” as you might think. Remember the complaint by Stephen Colbert’s conservative persona that “Reality has a well-known liberal bias.” There’s an example online of what that means.
Elon Musk’s AI, Grok, is presented as a general purpose chatbot competing with ChatGPT and Gemini. The New York Times found that Grok was instructed to “not shy away from making claims which are politically incorrect, as long as they are well substantiated.”
But Grok has been primarily trained on data from X/Twitter – highly polarized, unfiltered, and frequently full of crap. It believes that comments by Nazis and shitposters on X/Twitter are “well substantiated.”
So when Grok said that the biggest threat to Western civilization was low fertility rates, it was not hallucinating, it was repeating false information it had been trained on.
A harsh spotlight was turned on X/Twitter over the weekend when a site change accidentally revealed that many Trumpian accounts with millions of followers are being run out of India, Bangladesh, and Nigeria. Ryan Broderick reports: “There is an entire universe of scammers and puppet accounts pretending to be American Trump supporters — and men’s rights activists and trad wives — online because it’s an easy way to farm engagement.” Those are the “well substantiated” sources used to train Grok AI.
Elon Musk is a special case, sure, openly tilting the scales to make Grok a partisan source of information gathered from the world’s crackpots. Last week Grok said Musk is fitter than LeBron James and smarter than da Vinci, a rare hat trick that combined hallucination, misinformation drawn from Grok’s “well substantiated” sources, and malignant narcissism. Really, what is wrong with these people?
But ChatGPT and the other general purpose chatbots will be dealing with some version of the same problem with misinformation. The internet is increasingly full of slop. The AIs will be consulting the slop and regurgitating it when they answer our questions. That’s a more slippery and difficult problem than pure hallucinations.
A London-based think tank released a study a few days ago about the “Pravda network,” a pro-Kremlin network flooding the internet with pro-Russia disinformation – up to 23,000 articles per day at last count and steadily increasing. Security experts believe Russia is trying to push large amounts of pro-Russia content into the training datasets of AI models like ChatGPT and Gemini. As reported in The Guardian: “The ISD cautioned that by linking to articles in the network, the websites were inadvertently increasing the likelihood of search engines and large language models (LLMs) surfacing the pages, even in cases where the linking sites were disputing the Pravda network as a source.”
And perhaps financial considerations will distort answers from the general purpose chatbots. OpenAI, Microsoft, Google, and the other giant tech companies are working hard to make their chatbots reliable but they haven’t figured out yet how to monetize them. Maybe their financial incentives will lead them to prioritize answers that lead to their advertisers, or that assist their political lobbying, or that support another CEO’s odd obsessions.
All of this underscores why I think the real money will be made by companies offering specialized AIs trained on verified data that can be sold to businesses. Thomson Reuters is offering CoCounsel to law firms, trained on massive databases of verified US statutes and case law. Norm AI is an automated compliance officer, trained to check corporate documents against government regulations. BloombergGPT is a financial AI trained on 40 years of financial documents and business filings. Abridge is a medical AI trained on clinical documentation.
ChatGPT may turn out to be a pioneer – and a distraction. Limited-scope, grounded AIs are the real future of AI development.
I’m amazed anew every day by what can be done today with general purpose AIs like ChatGPT and Gemini. Use AI! Just remember: Your AI chatbot might lie to you. And it might give you information gathered from people who are lying online.