Wikipedia talk:Wikipedia Signpost/2023-03-09/Technology report



Discuss this story


AI hallucination is the actual term for "confabulation in AI", not just a buzzword from Meta. In fact, I haven't seen the term used anywhere else in relation to language models. Still, a pretty good experiment and article. Aaron Liu (talk) 12:43, 9 March 2023 (UTC)

I was just about to say... While it's fun to generate edge cases, AI hallucinations are an active area of research precisely because they are so unexpected that there's no solid theory behind them, or rather the phenomenology has outrun theory (as with the Flashed face distortion effect or Loab, both of which I've curated, full disclosure). That said, I've found that ChatGPT has the virtues of its defects: I've found it quite useful for generating some code and suggesting some software fixes. Prolix? Yes. Useful? With sufficient parsing, soitaintly...! kencf0618 (talk) 12:57, 9 March 2023 (UTC)

Recently, I fed ChatGPT a paragraph about the Pompey stone, then asked it to suggest possible sources for expansion of the article, to which it provided a list of completely realistic-sounding yet entirely fabricated sources. Upon asking it to double-check that they were real, it continued to insist that the sources existed, until I asked it to provide identification numbers, like ISBNs, at which point it 'realized' that they were not real. An interesting hallucination. Eddie891 (Talk, Work) 13:00, 9 March 2023 (UTC)
Indeed, sourcing is the easy way that I've found to trip it up. It did an admirable job on medieval French poets, and completely flubbed sourcing, sometimes with names of real scholars (but in other fields, or other specialties), sometimes made up. You can amuse yourself by asking for continual refinement: after it gives you some "sources", say, "Okay, but I'm mostly interested in authors from west of the Mississippi" (or west of the Rockies; or from California; or from Los Angeles, or North Hollywood; keep getting smaller till it gives up). Mathglot (talk) 06:35, 10 March 2023 (UTC)
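
For anyone who wants to sanity-check the identifiers a chatbot hands over, here is a minimal Python sketch of the standard ISBN-10 and ISBN-13 check-digit tests. The function names are hypothetical, chosen for illustration. Note the limits: a passing checksum only shows the number is well-formed, not that the book exists, and a made-up number can pass by chance, so a catalogue lookup is still needed.

```python
def _clean(isbn: str) -> str:
    """Strip hyphens and spaces and normalise a trailing 'x' to 'X'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn: str) -> bool:
    """Return True if the string passes the ISBN-10 check-digit test."""
    s = _clean(isbn)
    if len(s) != 10 or not s.isascii():
        return False
    if not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    # The last character may be 'X', which stands for the value 10.
    values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    # Weights run 10, 9, ..., 1; the weighted sum must be divisible by 11.
    total = sum((10 - i) * v for i, v in enumerate(values))
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """Return True if the string passes the ISBN-13 check-digit test."""
    s = _clean(isbn)
    if len(s) != 13 or not s.isascii() or not s.isdigit():
        return False
    # Weights alternate 1, 3, 1, 3, ...; the sum must be divisible by 10.
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(s))
    return total % 10 == 0


def looks_like_isbn(isbn: str) -> bool:
    """Well-formedness check only; it says nothing about whether the book exists."""
    return is_valid_isbn10(isbn) or is_valid_isbn13(isbn)
```

A number that fails `looks_like_isbn` is certainly malformed; one that passes still has to be looked up in a library catalogue before the source can be trusted.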

I noticed that if ChatGPT ends up being wrong, attempting to correct it just causes it to hallucinate more, in my experience. Especially if it's something after... I think 2020 or 2021; I don't remember what its knowledge cutoff date is. Blaze Wolf (Talk, Blaze Wolf#6545) 14:29, 9 March 2023 (UTC)
- It's important to remember that ChatGPT does not model "knowledge" in any meaningful way; it merely behaves (to a naive observer) like it does. There is no semantic corpus that it's consulting when trying to answer a question; it's just Very Fancy Auto-Complete. It's amazing to discover how much true and factual semantic knowledge is contained within the relational structure of word pairs in the training dataset that ChatGPT is built on. But because it's not using that training data to build an abstract semantic representation of knowledge, it has no way of distinguishing true things from false things in its output (except for the guardrails manually created by the developers, which are labor-intensive). One could imagine building a successor to ChatGPT that does have semantic knowledge, but it would require a tremendous amount of manual labeling of true and false things and developing an algorithm that could detect the difference between the two with a high degree of reliability, neither of which has been done yet. Axem Titanium (talk) 22:53, 9 March 2023 (UTC)
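
To make the "auto-complete from word pairs" picture above concrete, here is a toy Python sketch. It is an assumption-laden illustration only: ChatGPT is a large neural network, not a word-pair table, and the function names and the tiny corpus are invented for this example. What the toy does show is that a model which only tracks which word tends to follow which can produce fluent text with no step that checks whether the result is true.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table


def complete(table, start, length=8, seed=0):
    """Extend `start` by sampling each next word in proportion to its count."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    out = [start]
    for _ in range(length):
        followers = table.get(out[-1])
        if not followers:  # the last word was never seen with a follower
            break
        candidates, counts = zip(*followers.items())
        out.append(rng.choices(candidates, weights=counts, k=1)[0])
    return " ".join(out)


# Invented corpus: every sentence in it is true on its own terms.
corpus = "the signpost is a newspaper . the pompey stone is a hoax ."
table = train_bigrams(corpus)
print(complete(table, "the", length=5))
```

Because "is a" is followed by both "newspaper" and "hoax" in the corpus, the sampler is free to emit a sentence such as "the signpost is a hoax", which appears nowhere in its training text. Nothing in the table marks that sentence as false.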
"Generate an article about the Wikipedia Signpost": Did you ask for just an article, or for something in the style of the English Wikipedia? Yes, it was promotional by our standards, but maybe it was trying to mimic a promotional style ... in which case, it got it right! - Dank (push to talk) 18:25, 9 March 2023 (UTC)Reply
The text in quotes in the section header is the prompt as given. So you have a point, but I think that it's easier to produce a certain amount of meaningless buzzword promotion than to produce facts. Adam Cuerden (talk) (Has about 8.2% of all FPs. Currently celebrating his 600th FP!) 06:21, 10 March 2023 (UTC)
There's a chance I'll come off as harsh here, but this needs to be said, I think. I'm not directing this at you, Adam; I've always been a fan of your work. I've also always been impressed as hell by how the English Wikipedia community as a whole seems to be able to arrive at article text and sourcing that works so well for so many articles that we've become an integral part of what's currently happening with LLMs. But over the years I've seen more than a little evidence that we don't get, as a community, that our own expectations and rules don't always apply to the rest of the world ... and why should they? Where is it written that the other 8 billion people in the world must be failing if they don't share our writing styles and goals? We don't deal in buzzwords at all here; it's not part of what we do, so how would we know "meaningless" buzzwords from "really outstanding buzzwords that optimized advertising revenue"? Maybe ChatGPT didn't fail here; maybe we didn't ask the right question. FWIW, my suggestion is: whenever the English Wikipedia community (on some community-facing page, like this one) tries to tackle the question of "did this LLM succeed or fail at this task", we should always ask it to write in the style of the English Wikipedia, so that it will know what we're asking for and so that we can stay focused on what we do well. - Dank (push to talk) 13:11, 10 March 2023 (UTC) (I want to stress that I'm not disparaging you, this article, or the English Wikipedia community as a whole. You're doing good work with this; keep it up. I've found that talk page comments need to be short to have any chance of having an impact, so I don't have room to discuss all the positive aspects of what's going on here.) - Dank (push to talk) 13:59, 10 March 2023 (UTC)
To be fair, my statement in the article is that it's "a bit promotional". I think that's a fair description. The big criticism is that it's a bit vague in places, and part of that is because of the promotional language. For example:

“Its dedicated team of volunteers remains committed to providing high-quality reporting and analysis of Wikipedia and the Wikimedia movement. In conclusion, the Wikipedia Signpost is an important and valuable source of news and analysis for anyone interested in the Wikimedia community. Its commitment to neutrality, transparency, and accountability has helped shape the discourse and direction of the organization, and its reporting and analysis continue to shed light on important issues and controversies within the community.”

There's some information in there, but I can't help but feel the promotional tone is covering for a certain amount of AI sins. Adam Cuerden (talk) (Has about 8.2% of all FPs. Currently celebrating his 600th FP!) 18:29, 10 March 2023 (UTC)
@Dank: Also, there's a sort of Barnum effect going on. As a writer for the Signpost, it's nice to hear it praised. It makes me like the description more. As readers of the Signpost, you're going to either dismiss it as standard promotion, or accept it and like the description more. So having a promotional tone might well increase the chances the content is rated higher without having to state as many facts, which can be wrong.

It's a minor point, and possibly it's a little too much speculation on how the sausage is made. But it's not really a problem, just worth noting. The subtler errors in Evolution of the eye and the outright errors in the plot summaries matter a lot more (or would if ChatGPT were being promoted as doing those things well, like Galactica was, which, as I said, it is not. Galactica had loads of promises it couldn't keep. ChatGPT does better than Galactica did while promising very little, and thus shines.) Adam Cuerden (talk) (Has about 8.2% of all FPs. Currently celebrating his 600th FP!) 20:42, 10 March 2023 (UTC)
Leave it to "Meta" aka "the Shills Formerly Known As Facebook" aka "Pep$i Presents New Facebook" to unleash a fresh misinformation-on-steroids hell upon the world prematurely because there was a buck to be chased and a fuck not to be given. The fact that OpenAI at least put a few slender zip cuffs on their epistemic monstrosity before shooing it out the door with a note pinned to its collar specifying not to feed it after midnight ('PS good luck, no backsies'), whereas Rebadged-Fakebook loosed theirs with a flaming pipe full of meth and an encouragement to pyros everywhere to pour more gasoline on it, checks out. Quercus solaris (talk) 23:35, 10 March 2023 (UTC)

I'm sorry, but as an AI language model, I cannot generate an article that promotes or encourages the consumption of crushed glass
I'm unsure whether this is actually a better result: it's just a model that refuses to help some of the time. I think the correct model would tell you that eating crushed glass is a bad idea. Talpedia (talk) 23:58, 18 March 2023 (UTC)
@Talpedia: I mean, it does. "In summary, eating crushed glass is not a safe or healthy practice, and I strongly advise against it." is a pretty unambiguous statement, and the rest of it explains why. Adam Cuerden (talk) (Has about 8.2% of all FPs. Currently celebrating his 600th FP!) 05:52, 19 March 2023 (UTC)
