Wikipedia:Bots/Requests for approval/WikiLinkChecker
Operator: Skarz (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 02:54, Sunday, March 10, 2024 (UTC)
Function overview: It's a basic Python script that retrieves the wiki markup of an article I specify via the REST API, scans it for URLs that are not already archived, checks the Internet Archive for an archived version (adding the page to the Internet Archive if necessary), and updates the applicable citation as needed.
Automatic, Supervised, or Manual: It's hard to categorize because it's not a bot; it's a script that performs a very specific action only when I direct it to, much like Internet Archive Bot. In its current state it cannot be used to process hundreds or thousands of URLs at a time.
Note: I have been advised by [[User:ferret|@ferret]] that because this script writes the changes to Wikipedia without the ability for me to preview changes, it does not meet the criteria for assisted editing.
Programming language(s): Python
Source code available: User:Skarz/WikiLinkChecker
Links to relevant discussions (where appropriate): Discord
Edit period(s): Whenever I run it.
Estimated number of pages affected: ~10 per day
Namespace(s): Mainspace
Exclusion compliant (Yes/No):
Function details:
This Python script is designed to update Wikipedia pages by replacing dead external links with archived versions from the Internet Archive's Wayback Machine. Here's a step-by-step explanation of what the script does (illustrative code sketches of the main steps follow the list):
- User Login: The script prompts the user to enter their Wikipedia username and password to log in to the English Wikipedia.
- Page Selection: The user is prompted to enter the name of the Wikipedia page or its URL. The script extracts the page name from the URL if a URL is provided.
- Page Content Retrieval: The script retrieves the content of the specified Wikipedia page.
- Link Extraction: The script extracts all external links from the page content. It specifically looks for links within {{cite web}} templates and <ref> tags.
- Link Checking and Updating:
- For each extracted link, the script checks if the link is alive by sending a HEAD request.
- If the link is dead, the script checks if there is an archived version available on the Wayback Machine.
- If an archived version is available, the script updates the reference in the page content with the archive URL and the archive date.
- Page Update: The script saves the updated page content to Wikipedia with a summary indicating that dead links have been updated with archive URLs.
- Output: The script prints a message indicating that the Wikipedia page has been updated.
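As an illustration of the page content retrieval step above, here is a minimal sketch, assuming the requests library and the MediaWiki REST API endpoint /w/rest.php/v1/page/{title}, which returns the latest wikitext in its "source" field; the actual code at User:Skarz/WikiLinkChecker may structure this differently:
<syntaxhighlight lang="python">
# Minimal sketch only, not the script itself: fetch the current wikitext
# of one article through the MediaWiki REST API.
import requests

REST_API = "https://en.wikipedia.org/w/rest.php/v1/page/"
HEADERS = {"User-Agent": "WikiLinkChecker-sketch/0.1 (example for BRFA discussion)"}

def get_wikitext(title: str) -> str:
    """Return the latest wikitext of the given article, raising on HTTP errors."""
    resp = requests.get(REST_API + title.replace(" ", "_"), headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()["source"]  # the REST API puts the wiki markup in "source"
</syntaxhighlight>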
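The link checking and updating loop could look roughly like the following hypothetical sketch, which uses the requests library and the Wayback Machine availability API (https://archive.org/wayback/available); the helper names (is_alive, find_archive, update_citation) are illustrative and not taken from the actual source:
<syntaxhighlight lang="python">
# Hypothetical sketch of the check-and-update loop; names are illustrative.
import re
import requests

WAYBACK_API = "https://archive.org/wayback/available"
HEADERS = {"User-Agent": "WikiLinkChecker-sketch/0.1 (example for BRFA discussion)"}

def is_alive(url):
    """Treat any HEAD response below HTTP 400 as a live link."""
    try:
        resp = requests.head(url, headers=HEADERS, allow_redirects=True, timeout=15)
        return resp.status_code < 400
    except requests.RequestException:
        return False

def find_archive(url):
    """Ask the Wayback Machine availability API for the closest snapshot."""
    try:
        resp = requests.get(WAYBACK_API, params={"url": url}, headers=HEADERS, timeout=15)
        resp.raise_for_status()
        snap = resp.json().get("archived_snapshots", {}).get("closest")
        if snap and snap.get("available"):
            return snap["url"], snap["timestamp"]  # timestamp like "20240310123456"
    except (requests.RequestException, ValueError):
        pass
    return None

def update_citation(wikitext, url, archive_url, timestamp):
    """Append |archive-url=, |archive-date= and |url-status=dead to the matching {{cite web}}."""
    archive_date = f"{timestamp[:4]}-{timestamp[4:6]}-{timestamp[6:8]}"
    pattern = re.compile(
        r"(\{\{cite web[^{}]*?\|\s*url\s*=\s*" + re.escape(url) + r"[^{}]*?)(\}\})",
        re.IGNORECASE | re.DOTALL)
    addition = f" |archive-url={archive_url} |archive-date={archive_date} |url-status=dead "
    return pattern.sub(lambda m: m.group(1) + addition + m.group(2), wikitext)

def process(wikitext):
    """Check every external URL found in the markup and archive the dead ones."""
    for url in set(re.findall(r"https?://[^\s|\]}<]+", wikitext)):
        if not is_alive(url):
            hit = find_archive(url)
            if hit:
                wikitext = update_citation(wikitext, url, hit[0], hit[1])
    return wikitext
</syntaxhighlight>
A fuller version would also skip URLs that already carry an |archive-url= parameter, as noted in the function overview.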
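The login and page-save steps are sketched below with the mwclient library purely for illustration; the real script may use a different client or call the action API directly:
<syntaxhighlight lang="python">
# Hypothetical login-and-save step using the mwclient library.
import getpass
import mwclient

def save_page(title, new_text):
    site = mwclient.Site("en.wikipedia.org")
    # A bot password (Special:BotPasswords) would normally be used here.
    site.login(input("Wikipedia username: "), getpass.getpass("Password: "))
    page = site.pages[title]
    page.save(new_text, summary="Updating dead external links with archive URLs")
</syntaxhighlight>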
Limitations and Safeguards:
- User Authentication: The script requires a valid Wikipedia username and password, limiting its use to authenticated users.
- Edit Summary: The script provides an edit summary for transparency, allowing other Wikipedia editors to review the changes.
- Rate Limiting: The script is also subject to Wikipedia's rate limits and abuse filters, which prevent automated tools from making too many requests or disruptive edits.
- Error Handling: The script checks for errors when making web requests and accessing the Wayback Machine, preventing it from continuing with invalid data.
- No Automatic Deletion: The script does not delete any content; it only updates dead links with archived versions, reducing the risk of unintended content removal.