Wikipedia talk:Wikipedia Signpost/2024-09-26/Recent research

Discuss this story

If ChatGPT weren't prone to hallucinating, this would not have happened. TarnishedPath (talk) 04:10, 27 September 2024 (UTC)

What does this comment have to do with this Signpost article? Regards, HaeB (talk) 07:05, 27 September 2024 (UTC)

How would one go about having ChatGPT write an article? Would one give it a template, a topic, and the sources, then it writes the page for you? JoJo Eumerus mobile (main talk) 17:31, 28 September 2024 (UTC)

It's not 2022 any more ;) One takeaway from the review here should be that custom LLM systems like WikiCrow or STORM seem much better at writing Wikipedia-like articles than ChatGPT by itself. To quote from my review of STORM last month:

The use of external references [in the STORM project] is motivated by the (by now well-established) observation that relying on the "parametric knowledge" contained in the LLM itself "is limited by a lack of details and hallucinations [...], particularly in addressing long-tail topics". ChatGPT and other state-of-the-art AI chatbots struggle with requests to create a Wikipedia article. (As Wikipedians have found in various experiments – see also the Signpost's November 2022 coverage of attempts to write Wikipediesque articles using LLMs – this may result e.g. in articles that look good superficially but contain lots of factually wrong statements supported by hallucinated citations, i.e. references to web pages or other publications that do not exist.) The [STORM] authors note that "current strategies [to address such shortcomings of LLMs in general] often involve retrieval-augmented generation (RAG), which circles back to the problem of researching the topic in the pre-writing stage, as much information cannot be surfaced through simple topic searches." They cite existing "human learning theories" about the importance of "asking effective questions". This task in turn is likewise challenging for LLMs ("we find that they typically produce basic 'What', 'When', and 'Where' questions [...] which often only address surface-level facts about the topic".) This motivates the authors' more elaborated design [...]

Now, you said you are thinking of giving ChatGPT the sources already (i.e. you would take care of the retrieval step yourself, instead of relying on ChatGPT's parametric knowledge or its by-now built-in web browsing feature). That might avoid the problem of ChatGPT hallucinating citations (coming up with nonexistent references). But you might e.g. run into the problem of context length: the text of all the sources may not fit into the prompt, which is one reason why RAG systems use chunking. Still, you could try, and maybe take some inspiration from the prompts that the WikiCrow and STORM authors used (see link in this review for WikiCrow's prompts).
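To illustrate what chunking means here, a minimal Python sketch follows. It is not how WikiCrow or STORM actually do it. The function name `chunk_text` and the parameters `max_words` and `overlap` are made up for this example, and counting whitespace-separated words is only a crude stand-in for the token counts that real context limits are measured in.

```python
def chunk_text(text, max_words=200, overlap=20):
    """Split a source text into overlapping word-based chunks.

    Hypothetical helper: `max_words` and `overlap` are illustrative
    values, and words only approximate an LLM's tokens.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk consists purely of overlap.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    source = " ".join(f"w{i}" for i in range(450))
    for i, chunk in enumerate(chunk_text(source)):
        print(i, len(chunk.split()), chunk.split()[0])
```

Each chunk could then go into a separate prompt, or only the chunks most relevant to a given section could be selected, instead of pasting every source in full. The overlap is there so that a sentence straddling a chunk boundary still appears whole in at least one chunk.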
Also be aware that both WikiCrow and STORM are currently more expensive to run than a single ChatGPT query (with average costs per article of about 84 cents for STORM and $5.50 for WikiCrow, although these figures may already be outdated given recent drops in LLM API costs).
Regards, HaeB (talk) 05:16, 29 September 2024 (UTC) (Tilman)