AI Hallucinations

Shamino · May 20, 2025, 5:12pm

Given recent discussions and criticism of “AI”, this article seems of particular interest:

AI Responses May Include Mistakes | OS/2 Museum

Quick summary: The author was doing a Google search for an IBM PS/2 model 280. The search was a typo (there never was a model 280), but 90% of the time, Google’s “AI” presents completely bogus nonsense about this non-existent computer model. Only 10% of the time does it provide the right answer, that there is no such model and the question is probably in error.

Dan_W · May 20, 2025, 7:05pm

This past week I asked the LLM built-into my development environment, Gemini, and multiple versions of ChatGPT to write a small program, given a set of criteria and sample input data. Two of them summarized in English what I was asking for, and even accurately told me what the correct result using my sample data should be.

But when it came to writing the code they all failed, producing code which, when run, would give the incorrect results. Telling them their answer was incorrect would lead to statements like, “Yes, you’re right, my code fails to take X into account. Here’s a version that works,” and, of course, the new code didn’t work either. I finally had to come up with my own algorithm.

On the plus side, my job as a programmer is safe (for now.)

ace · May 20, 2025, 9:42pm

ChatGPT has no problem with this search, which tracks with my testing of it and other answer engines. I lack the background to judge the accuracy of the rest of the response.

The IBM PS/2 Model 280 does not appear in IBM’s official documentation or recognized model lists, suggesting it may be a mislabeling or a rare, possibly regional variant.

IBM’s PS/2 line, introduced in 1987, encompassed various models, including desktops, towers, all-in-ones, portables, laptops, and notebooks. The numbering convention typically followed a pattern:Nostalgia Nerd+3Wikipedia+3Wikipedia+3

Model 30: Entry-level systems with Intel 8086 or 80286 CPUs.

Model 50/60: Mid-range systems featuring Micro Channel Architecture (MCA).

Model 80: High-end systems equipped with Intel 80386 CPUs.Amedeo Valoroso+3Wikipedia+3Wikipedia+3

The absence of a Model 280 in these listings indicates it was not part of the standard PS/2 lineup. It’s possible that “Model 280” refers to a misidentified Model 80, which was IBM’s flagship PS/2 system featuring a 20 MHz Intel 80386 processor and MCA bus architecture. The Model 80 was known for its expandability and performance, making it a popular choice for business applications in the late 1980s.

If you have specific details or markings from the system in question, providing those could help in accurately identifying the model and its specifications.

Shamino · May 20, 2025, 10:50pm

The rest of the response is correct, but incomplete. The models 30/50/60/80 were first-generation PS/2s. There were many later models as well that didn’t really fit that numbering convention.

See also List of IBM PS/2 models - Wikipedia .

And just for kicks, I tried the same Google search. It claimed that the “Model 280” is the same as a “Model 30 286” (a real system, but has never been called a “Model 280”), and then provides a big rundown of information about the “Model 30 286”.

IsaacBalbin · May 20, 2025, 11:34pm

I needed the services of a specialist plumber. I asked for three companies near me who did this type of work including their contact details. I was given a list of three companies which do not exist including bogus web pages and phone numbers! I kid you not.

natpoor · May 21, 2025, 2:12pm

Yes, part of the problem is that LLMs and chatbots are now being presented as search engines, when that is not quite what they are. They are text-generation algorithms, designed to make sentences that seem like they were written by a human (I think part of this motivation is the Turing Test). So it’s not intelligence at all, it’s just a really complex algorithm with a massive databank of texts of all sorts (and yes, the copyright issue). Truth has nothing to do with making sentences that are grammatically correct. I think another part of the problem is that, when we converse with something, we think there’s an intelligence there – this is very good when talking with other people, or probably even with our pets, but not the best when we’re text-chatting with a server farm (i.e., a chatbot or LLM). Science fiction has had a field day with this kind of thing (the movie “Her” is just so great in this regard but there are other great examples), but there are real examples like the ELIZA program (yes, sorry, all caps there) from the 1960s, a basic therapist-like program that would mostly mirror back what the user typed but in the form of a question. Some people had very strong emotional responses to ELIZA, leading to the term “Eliza effect”.
The term “hallucination” is problematic and disingenuous, when a human hallucinates, it’s an error of process (the magic mushrooms or whatever have changed how your brain works) and that leads to an error of fact, but when an LLM hallucinates, it’s working perfectly, no error of process, it’s just an error of fact. LLMs make human-like sentences, grammatically correct. Nothing about facts there. And if your database is full of problematic statements, your output can be as well (garbage in, garbage out).
I do think some of the AI companies are trying to add factuality to their AI / LLMs, but the specifics are not in my wheelhouse.
There is a lot more that could be said, and many smart people have indeed said a lot about the problems of AI. (I am tempted to write “AI” in quotes since it isn’t intelligence, it’s just a big algorithm, and doesn’t work like a human brain – we don’t need millions of pieces of text in order to figure out how to write sentences, for instance.)

ace · May 21, 2025, 2:46pm

Well, it’s not what some of them are. An increasing number of chatbots are searching the current Web to inform their responses, as I wrote in

natpoor · May 21, 2025, 3:31pm

I think that still has the garbage in, garbage out problem though. (I mean if the AI just uses Wikipedia, then it’s really just human-created content.) As some people have pointed out, the joke is that the internet will become a lot of AIs using AI-created content to make content (although I think there was an example or two [or more] of chatbots talking at each other in Twitter).

ShermanWilcox · May 21, 2025, 8:47pm

Here’s my experience with ChatGPT as a search or knowledge engine.

I was writing a scholarly article. I wanted to start with some quote – I won’t bother to say what, I just had a general idea what I wanted the quote to say, and because I know the field I was sure somebody, probably several people, had said things like what I wanted. So I asked ChatGPT.

Sure enough, I got a few answers that fit the bill. So I asked for the source. ChatGPT promptly provided the author, publication date, book title, and page number. I know this author, and in fact I have the book. Before I used it in my article I checked. The quote was not on the page ChatGPT said. So I searched the entire book for anything resembling that quote. Nothing!

So ChatGPT gave me perfectly correct information about the book, but a totally made up quote. I didn’t realize this until 2-3 days later. I asked ChatGPT, and the response was, “Sometimes I synthesize information.” Synthesize? You just fabricated a quote, and fabricated the page number.

Simon · May 21, 2025, 9:50pm

I experienced the exact same thing with ChatGPT, Llama, and Claude.

The query was a simple one about when (if at all) the CA vehicle code allows MV drivers to cross double yellow lines to pass cyclists if they otherwise could not ensure the mandatory 3’ separation.

All three immediately started out to describe in expansive lists exactly when this was allowed. When I challenged them to show me the section of the CVC that states this, they gave me the correct CVC section, but of course that section actually had nothing to say about when crossing double yellow would be allowed (spoiler: unless you’re an emergency vehicle or there is a road obstruction it never is). When I told these three models that the CVC said nothing of the sort, they either adjusted their answers accordingly and thanked me for my corrections or they set out to BS me with something like “but it can be inferred…” again without any real citation.

Take home message: a citation from these LLMs means nothing. It could be just as made up as the rest of their lengthy answers (their makers apparently believe more is more). Everything needs to be followed up and checked down to the letter.

raykloss · May 21, 2025, 10:08pm

One of the reasons I tend toward Perplexity for AI questions. They give links to the articles quoted so that I can go right to the article (unless it is behind a paywall) and read it myself to make sure. I don’t use it very often, but for me, that is a plus.

josehill · May 22, 2025, 3:26am

Same here.

I haven’t measured exactly how often the links that Perplexity provides actually support Perplexity’s assertions, but I would guess it is only a little better than 50% of the time. I always check the links.

Often, Perplexity will provide a link to an article that, from its title, might reasonably be expected to include the desired information, but if you read the linked article, the article does not actually contain the information. A category example would be Perplexity saying that a product supports a particular function, supplying a link to the product’s technical specifications as justification, but then the function is not mentioned at all in the specifications.

That said, I would also guess that Perplexity has a somewhat higher chance of providing the “right” link to answer your question on its first attempt than the 2025 version of Google’s search engine has of providing it in the top five links it presents to you. Again, I don’t have hard data on that; it’s just my sense of what I have been seeing lately.

Shamino · May 22, 2025, 2:50pm

A recent comic (in a fictional future universe where actual AI is real) talking about AI hallucination:

For those who can’t see the image, or if the link breaks in the future:

S.S.D.D. Comic. January 24 2025:

The term “AI hallucination” has been around for a very long time, but its meaning has changed over the years. In the beginning, it was pathetic predictive models fed on bad data they had the nerve to call “intelligence.”

Today, a hallucinating AI is a lot more worrisome than some primitive search engine telling you to eat rocks. Now it’s a breakdown of the AI’s core that is not detected by the AI itself. With pointers jumping to random parts of its memory, it becomes confused. It will see things that aren’t there or remembers things that never happened. It can start to panic when things stop making sense or worse, not question it at all!

I shouldn’t have to tell you how dangerous that could be!

– Dr. Sharky addressing AI students at MIT.

Note that parts of this comic are not safe for work. You’ve been warned.

josehill · May 22, 2025, 6:53pm

Coincidentally:

“Only five of the 15 titles on the list are real.”

Halfsmoke · May 22, 2025, 8:43pm

I regard generative AI hallucinations, given the state of information on today’s Internet, to be essentially neutral as long as a gAI indicates its sources. Social media, blogs, Wikpedia, and most discussion sites contain a lot of misinformation and false information as well. In other words, “Let the searcher beware”.

ace · May 22, 2025, 9:15pm

Was this before the Web search capability was in place? I’m curious because when I asked ChatGPT what I thought your question might have been, it seems to have gotten the answer right, with a variety of useful citations.

Simon · May 22, 2025, 9:46pm

I’m not sure I can say. All I recall is that this was based on GPT-4o mini. It was one of the engines we can choose to use through our campus licensing deals.

ace · May 23, 2025, 2:29pm

I use ChatGPT 4o for most things, but every now and then, I’ve had to push it to do a Web search rather than rely on training data. It’s quite tricky to figure out which model you like best, but forcing Web searches is important for anything that relates to real-world data.

mHm · May 24, 2025, 12:39pm

The problem with ‘new’ terminology is older, probably as old when they started to call a drawn rectangle on a PC screen a "window’, or certain cars claim they have an autopilot when in reality it’s anything but. AI’s errors can be obvious and hilarious, but what about the cases where the answers are off or wrong and the human can’t detect this?

Shamino · May 24, 2025, 8:47pm

Or how we still talk about “dialing” a phone.

Or where a collection of songs is called an “album” - going back to the days of 78’s, where collections of records (with a single song on each side) would be packaged in binders similar to photo albums.

Or how automotive engines are still measured in “horse power”.

Terminology changes all the time, gaining and losing meanings to suit the context of the surrounding culture. Nothing wrong or even unusual about that.