Chatbot to help editors improve articles

After selecting text, the control panel on the right is used to give instructions. The responses by the AI model are presented in the chat panel on the left.

I wrote a user script called WikiChatbot. It works by selecting text in an article and then clicking one of the buttons on the right to enquire about the selected text. It includes many functions. For example, it can summarize and copyedit the selected text, explain it, and provide examples. The chat panel can also be used to ask specific questions about the selected text or the topic in general. The script uses the AI model GPT 3.5. It requires an API key from OpenAI. New OpenAI accounts can use it freely for the first 3 months with certain limitations. For a more detailed description of all these issues and examples of how the script can be used, see the documentation at User:Phlsph7/WikiChatbot.

I was hoping to get some feedback on the script in general and how it may be improved. I tried to follow WP:LLM in writing the documentation of the chatbot. It would be helpful if someone could take a look to ensure that it is understandable and that the limitations and dangers are properly presented. I also added some examples of how to use edit summaries to declare LLM usage. These suggestions should be checked. Feel free to edit the documentation page directly for any minor issues. I'm also not sure how difficult it is to follow the instructions so it would be great if someone could try to set up the script, use it, and explain which steps were confusing. My OpenAI account is already older than 3 months so I was not able to verify the claims about the free period and how severe the limitations are. If someone has a younger account or is willing to open a new account to try it, that would be helpful.

Other feedback on the idea in general, on its problems, or on new features to implement is also welcome. Phlsph7 (talk) 12:45, 12 July 2023 (UTC)

I meant to reply to this sooner. This is awesome, and I'm interested in this (and related ideas) for writing / reading with ML. I'll try to have a play and give you some feedback soon. Talpedia 10:18, 17 July 2023 (UTC)
Related: see also m:ChatGPT plugin. Mathglot (talk) 07:22, 18 July 2023 (UTC)
Whilst I rather like the ability of this nifty little script to do certain things, I do have some criticism. These functions strike me as extremely risky, to the point that they should probably be disabled:
  • "is it true?" - ChatGPT likely uses Wikipedia as a source, and in any case, we want verifiability, not truth. I feel quite strongly, based on several other reasons too, that this function should be disabled and never see the light of day again.
  • "is it biased?" - ChatGPT lacks the ability to truly identify anything more than glaring "the brutal savages attacked the defenceless colonist family" level bias (i.e. something that any reasonably aware human should spot very quickly indeed). Best left to humans.
  • "is this source reliable?" - Same as the first one, this has so much potential to go wrong that it just shouldn't exist. Sure it might tell you that Breitbart or a self-published source isn't reliable, but it may also suggest that a bad source is reliable, or at least not unreliable.
I don't think that any amount of warnings would prevent misuse or abuse of these functions, since there will always be irresponsible and incompetent people who ignore all the warnings and carry on anyway. By not giving them access to these functions, it will limit the damage that these people would cause. Doing so should not be a loss to someone who is using the tool responsibly, as the output generated by these functions would have to be checked so completely that you might as well just do it without asking the bot.
The doc page also needs a big, obvious warning bar at the top, before anything else, making it clear that use of the tool should be with considerable caution.
The doc page also doesn't comment much on the specific suitability of the bot for various tasks, as it is much more likely to stuff up when using certain functions. It should mention this, and also how it may produce incorrect responses for the different tasks. It also doesn't mention that ChatGPT doesn't give wikified responses, so wikilinks and any other formatting (bold, italics, etc.) must be added manually. The "Write new article outline" function also seems to suggest unencyclopaedic styles, with a formal "conclusion", which Wikipedia articles do not have.
Also, you will need to address the issue of WP:ENGVAR, as ChatGPT uses American English, even if the input is in a different variety of English. Mako001 (C)  (T)  🇺🇦 01:14, 23 July 2023 (UTC)
You can ask it to return wikified responses, and it will do so with a reasonably good success rate. -- Zache (talk) 03:03, 23 July 2023 (UTC)
@Mako001 and Zache: Thanks for all the helpful ideas. I removed the buttons. I gave a short explanation at Wikipedia:Village_pump_(miscellaneous)#Feedback_on_user_script_chatbot and I'll focus here on the issues with the documentation. I implemented the warning banner and added a paragraph on the limitations of the different functions. That's a good point about the English variant being American, so I mentioned that as well. I also explained that the response text needs to be wikified before it can be used in the article.
Adding a function to wikify the text directly is an interesting idea. I'll experiment a little with that. The problem is just that the script is not aware of the existing wikitext. So if asked to wikify a paragraph that already contains wikilinks, it would ignore those links. This could be confusing to editors who only want to add more links. Phlsph7 (talk) 09:12, 23 July 2023 (UTC)
For summaries, translations, etc., I set it up so that wikitext rather than plain text was given to ChatGPT as input. However, the problem here is how to get the wikitext from the page in the first place. -- Zache (talk) 09:48, 23 July 2023 (UTC)
In principle, you can already do that with the current script. To do so, go to the edit page, select the wikitext in the text area, and click one of the buttons or enter your command in the chat panel of the script. I got it to add wikilinks to existing wikitext, and a translation was also possible. However, it seems to have problems with reference tags and kept removing them, even when I told it explicitly not to. I tried it for the sections Harry_Frankfurt#Personhood and Extended_modal_realism#Background, both with the same issue. Maybe this can be avoided with the right prompt. Phlsph7 (talk) 12:09, 23 July 2023 (UTC)
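One possible workaround for the reference-tag problem described above might be to swap each ref tag for a short placeholder before the wikitext goes to the model and to swap it back afterwards. The following is only a minimal Python sketch of that idea and is not part of WikiChatbot (which is a user script). The placeholder format @@REF0@@ and the function names protect_refs and restore_refs are made up for illustration. The model may still drop or move placeholders, which is why the second function reports any that went missing.

```python
import re

# Matches self-closing <ref ... /> tags and paired <ref ...>...</ref> tags.
# The word boundary keeps <references /> from matching.
REF_PATTERN = re.compile(
    r"<ref\b[^>]*?/>|<ref\b[^>]*>.*?</ref>",
    re.DOTALL | re.IGNORECASE,
)


def protect_refs(wikitext):
    """Replace every ref tag with a numbered placeholder.

    Returns the protected text and the list of original tags.
    The @@REFn@@ token format is an arbitrary, hypothetical choice.
    """
    refs = []

    def stash(match):
        refs.append(match.group(0))
        return f"@@REF{len(refs) - 1}@@"

    return REF_PATTERN.sub(stash, wikitext), refs


def restore_refs(text, refs):
    """Put the original ref tags back in place of their placeholders.

    Returns the restored text and a list of tags whose placeholder
    no longer appears in the text (for example, dropped by the model).
    """
    missing = []
    for index, ref in enumerate(refs):
        token = f"@@REF{index}@@"
        if token in text:
            text = text.replace(token, ref)
        else:
            missing.append(ref)
    return text, missing
```

The protected text would be sent to the model in place of the raw wikitext, and restore_refs would be applied to the response. Any tags returned in the missing list would still have to be re-added by hand.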
Thanks for setting this up. I've recently had success drafting new Wikipedia articles by feeding the text of up to 5 RS into GPT4-32k through openrouter.com/playground and simply asking it to draft the article. It does a decent job with the right prompt. You can see an example at Harrison Floyd. I'll leave more details on the talk page of User:Phlsph7/WikiChatbot, but I wanted to post here for other interested parties to join the discussion. Nowa (talk) 00:02, 20 September 2023 (UTC)
Thanks for the information. I've responded to you at Talk:Harrison_Floyd#Initial_content_summarized_from_references_using_GPT4 so that we don't have several separate discussions about the same issue. Phlsph7 (talk) 07:44, 20 September 2023 (UTC)
Ran into a brick wall I thought might be helpful to know about. I've been working on the bios of people associated with Spiritual_warfare#Spiritual_Mapping_&_the_Charismatic_movement. GPT 4 and LLama refused to read the RS, claiming that it was "abusive". I can see from their point of view why that is, but nonetheless, RS is RS, so I just read it manually. Between that and the challenges of avoiding copyvios, I'm a bit sour on the utility of LLMs for assisting in writing new articles. It's just easier to do it manually. Having said that, the Bing chatbot does have some utility in finding RS relative to Google. Much less crap. Nowa (talk) 00:35, 9 October 2023 (UTC)

If we're going to allow LLM editing, this is a great tool to guide editors to the specific use cases that have community approval (even if those use cases are few to none at this point). I found it to be straightforward and easy to use. –dlthewave 16:06, 23 July 2023 (UTC)

There is no policy or guideline disallowing the use of LLM or other machine learning tools. No need for any approval unless that changes. MarioGom (talk) 17:29, 11 February 2024 (UTC)

  You are invited to join the discussion at Wikipedia talk:WikiProject AI Cleanup § Proposal: adopting WP:LLM as this WikiProject's WP:ADVICEPAGE. QueenofHearts 21:37, 9 January 2024 (UTC)

Someone implemented a full-on article generator, and Anthropic gave it an award

Blog post description, GitHub repo, based on this Stanford work, which also has a repo, and a live working demo.

Have people noticed those kinds of articles? The outline structure is more distinctive and flamboyant than we usually see from human editors. 141.239.252.245 (talk) 05:43, 26 April 2024 (UTC)

Just saw this. Taking a look now. Thanks for posting. Nowa (talk) 12:07, 9 November 2024 (UTC)
I took a look at the tool and, in its current state, it is unsuitable for drafting Wikipedia articles. The big problem is that the text in the drafted articles is not necessarily supported by the cited references. Here is an example. See my comments at the bottom of the article. Another problem is that the LLM engages in wp:synth, where it takes two facts stated in the references and extrapolates a conclusion from them. You can see an example of synth here. Again, scroll to my comments at the bottom. Nowa (talk) 23:24, 10 November 2024 (UTC)
Using the app a bit more, it's actually a helpful "search engine". The references are relevant to the topic of concern. So it might be useful for getting references for a Wikipedia article. Here is an example. Nowa (talk) 21:48, 11 November 2024 (UTC)

Opening sentence

There seems to be something wrong with the opening sentence. I can't parse it:

"While large language models (often known as "chatbots") are very useful, machine-generated text (like human-generated) often contains errors, is useless, whilst seeming accurate."

Am I missing something? Or does it need editing? AndyJones (talk) 12:35, 28 August 2024 (UTC)

Yeah, it was made not-very-sensical in Special:Diff/1230068490:
  • Before: While [[large language model]]s (colloquially termed "AI chatbots" in some contexts) can be very useful, machine-generated text (much like human-generated text) can contain errors or flaws, or be outright useless.
  • After: While [[large language model]]s (often known as "chatbots") are very useful, machine-generated text (like human-generated) often contains errors, is useless, whilst seeming accurate.
Alalch E. 14:36, 28 August 2024 (UTC)
I have reverted the above-mentioned change. —Alalch E. 21:37, 29 August 2024 (UTC)

Ban it. Full stop.

I was shocked and horrified to see a banner on an article announcing that it may contain "hallucinations" due to use of LLM content. A more accurate word for that template would be "lies", "fabrications", or "misinformation". So I would like that change to be made. But even more so, I firmly believe that using LLM-generated content goes directly against the ethos of the Wikipedia project and as such, that material needs to be completely, unambiguously banned from addition. Where can we go about enacting such a policy? Matt Gies (talk) 15:08, 2 November 2024 (UTC)

Wikipedia:Village pump (policy) is the place to do it. Make sure you read the previous discussions about this topic first, or you'll get nowhere. The main ones are linked at the top of this talk page. Tercer (talk) 15:39, 2 November 2024 (UTC)

Using Claude 3.5 to summarize reference content for Wikipedia

I've been evaluating some of the newer, improved LLMs to see if they do a better job of summarizing reference content for Wikipedia without undue copyright infringement. Claude 3.5 seems to do a reasonably good job. See Feral_pig#Canada. I fed the content of a National Geographic article into Claude and asked it to summarize the content in Wikipedia format, including the citation.

To check for copyvio, I used Microsoft Word's "compare" function to see how the wiki draft compared to the original reference. I posted the results of the comparison on my Google Drive here.

As far as I can tell:

  • The wiki draft content looks faithful to the reference.
  • There are no hallucinations.
  • The longest extracted phrase from the original text was "...throughout western and central Canada, from British Columbia to Manitoba.."
  • There is no wp:synth.

Did I miss anything? Is there a more challenging task we should give it? Nowa (talk) 12:38, 21 November 2024 (UTC)
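The "longest extracted phrase" check above could also be scripted rather than read off a Word comparison. The following is a minimal Python sketch using the standard library's difflib. It is an illustration only and not the method used for the comparison above. The function names and the word-splitting rule are assumptions made for this example.

```python
import re
from difflib import SequenceMatcher


def split_words(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[\w'’-]+", text.lower())


def longest_shared_phrase(source, draft):
    """Return the longest run of consecutive words found in both texts."""
    source_words = split_words(source)
    draft_words = split_words(draft)
    # autojunk=False stops difflib from ignoring very common words
    # in longer texts.
    matcher = SequenceMatcher(None, source_words, draft_words, autojunk=False)
    match = matcher.find_longest_match(
        0, len(source_words), 0, len(draft_words)
    )
    return " ".join(source_words[match.a:match.a + match.size])


source = "some Canadian farmers imported wild boars from Europe to raise for meat"
draft = "Canadian farmers began importing wild boars from Europe in the late 1980s"
print(longest_shared_phrase(source, draft))  # prints "wild boars from europe"
```

A check like this only measures verbatim overlap. It says nothing about close paraphrasing, where the wording changes but the structure of the source is kept, so it can at best supplement a manual read-through.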

Simply put, I see no value in using these systems to "summarize" a citation. — The Hand That Feeds You:Bite 17:00, 21 November 2024 (UTC)
Seems awfully like close paraphrasing anyway, when you change
  • "In the late 1980s and early 1990s, some Canadian farmers imported wild boars from Europe to raise for meat. But as wild boars are wont to do, some of them escaped, either digging under fences or barreling through them. Others were set free once the boar meat market cooled. At first, it didn’t seem like a big problem; many thought they couldn’t survive Canada’s long winters."

into

  • "Canadian farmers began importing wild boars from Europe in the late 1980s and early 1990s for meat production. Some of these animals escaped from farms by digging under or breaking through fences, while others were intentionally released when the boar meat market declined. Initially dismissed as a minor concern due to assumptions about their inability to survive Canadian winters" Fram (talk) 17:18, 21 November 2024 (UTC)

@Fram Interesting. I thought I would see how the paraphrasing in a human-drafted new section based on a reference compares to that in an LLM-drafted new section based on the same reference. I went back to when the section "Attacks on Humans" was introduced into the feral pig article here. I then asked Claude to draft a two-sentence section based on the same NYT reference. Here's how the human-drafted and Claude-drafted sections compare:

NYT reference excerpts:
  • Christine Rollins, 59, was feet away from the front door when she was killed. It was only the fifth documented fatal wild hog attack in the country since 1825...
  • it just a few steps away from the front door...where she worked as a caretaker for an older couple.
  • she was attacked by a herd of feral hogs...
  • There were about 100 documented attacks by feral hogs on humans in the United States between 1825 and 2012

Human-drafted section:
  • Feral pigs in the living in the United States have been known to attack and fatally injure human beings. A 59 year old Texas woman named Christine Rollins was attacked and killed only a few feet away from the front door of her workplace by a herd of feral pigs in the Texas town of Anahuac, which is 50 miles east of Houston. This incident was the fifth documented fatal feral pig attack in the United States since 1825.

Claude-drafted section:
  • While feral hog attacks on humans are rare, with only 100 documented attacks in the United States between 1825 and 2012, they can be fatal as demonstrated by the 2019 death of Christine Rollins in Anahuac, Texas, who was killed by a herd of feral hogs outside a home where she worked as a caretaker. This was only the fifth documented fatal wild hog attack in the United States since 1825.

I agree that the Claude-drafted section is a closer paraphrase of the reference than the human-drafted section.

Any other comparisons worth noting? Nowa (talk) 16:29, 22 November 2024 (UTC)