Wikipedia:Link rot/URL change requests/Archives/2024/August
This is an archive of past discussions on Wikipedia:Link rot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current main page.
ieee.org
Most of these (search link) are broken and can be replaced.
E.g.
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=933500&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F2%2F20203%2F00933500
can be replaced with
https://ieeexplore.ieee.org/document/933500/
as long as the first link is 404 and the second link resolves as 200.
The proper ID is written in the string arnumber=933500
Jonatan Svensson Glad (talk) 01:33, 19 July 2024 (UTC)
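The rewrite described above can be sketched as follows. This is a minimal illustration, not the bot's actual code: the function name `ieee_document_url` is hypothetical, and the status checks (old link 404, new link 200) are left out because they need live HTTP requests.

```python
from urllib.parse import urlparse, parse_qs


def ieee_document_url(old_url):
    """Build the /document/ URL from the arnumber query parameter, or return None."""
    query = parse_qs(urlparse(old_url).query)
    ids = query.get("arnumber")
    # Only rewrite when a purely numeric arnumber is present.
    if not ids or not ids[0].isdigit():
        return None
    return f"https://ieeexplore.ieee.org/document/{ids[0]}/"


old = ("http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=933500"
       "&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F2%2F20203%2F00933500")
print(ieee_document_url(old))
```

A caller would still need to confirm that the candidate URL resolves before replacing the old link.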
- OK. This is a soft-redirect, thank you for the information. I'll check the entire ieee.org domain to also look for soft-404s, and redirects. WP:LINKROT#Glossary. 8,800 pages. -- GreenC 16:26, 30 July 2024 (UTC)
Found about twenty soft-404 rules.
Enwiki done in two batches:
- Batch 1: Checked 1,000 pages and edited 501 pages. Moved 337 links to a new URL. Added 106 {{dead link}}. Switched 4 |url-status=dead to live. Switched 11 |url-status=live to dead. Added 166 archive URLs (112 Wayback). Changed 1,335 citation metadata fields [bug in program, unsure of the actual number]
- Batch 2: Checked 7,898 pages and edited 3,943 pages. Moved 2,804 links to a new URL. Removed 3 {{dead link}} templates. Added 575 {{dead link}}. Switched 19 |url-status=dead to live. Switched 78 |url-status=live to dead. Added 1,654 archive URLs (1,454 Wayback). Changed 12,927 citation metadata fields [bug in program, unsure of the actual number]
IABot database: Checked ~25,000 links. Modified about 2,500. Changes will propagate to 300+ wikis.
Done -- GreenC 15:28, 31 July 2024 (UTC)
hp.vector.co.jp
https://cohost.org/gosokkyu/post/6918235-heads-up-jp-web-arc
Seems like that web hosting service is shutting down, there are about 31 links in enwiki, there are possibly more at jawiki. Notrealname1234 (talk) 15:06, 22 July 2024 (UTC)
- There are 159 links in jawiki. Notrealname1234 (talk) 15:27, 22 July 2024 (UTC)
Notrealname1234: Thank you for the notification. They are deleting all pages on December 20, 2024. IABot has registered 133 unique URLs across 300+ wikis including jawiki. IABot has been disabled on jawiki since early 2023 and I have no idea when it will return. I can do this on enwiki and update the 133 URLs in IABot, which will save them on jawiki whenever it is enabled. They are still live but I'll treat them as dead. Might be a few weeks (above work ahead). -- GreenC 17:25, 22 July 2024 (UTC)
Done on enwiki and IABot database (133 unique links). -- GreenC 01:20, 1 August 2024 (UTC)
- Thanks! Notrealname1234 (talk) 23:40, 1 August 2024 (UTC)
slate.msn.com
Hello. slate.msn.com doesn't work. These have archived redirects and also working redirects. Here are examples:
- For Raging Cow, changing this to that by removing msn from the URL redirects to the new link here.
- If that doesn't work, I've seen archived redirects. This goes here for Peter Maass. Removing the archive from the URL makes a redirect to this new URL.
- Redirects also exist without msn. This link goes here for Amazon Theater. Removing the archived part redirects to the new URL here.
~300 links. URLs such as fray.state.msn.com or cagle.slate.msn.com would need regular archives. These links also include ones not in Articlespace, such as talk pages. Thanks! MrLinkinPark333 (talk) 18:57, 26 July 2024 (UTC)
- In the third example, this returns a header status 200 and no redirect information, so curl can't see the redirect. It's being redirected by JavaScript. Hopefully an edge case. -- GreenC 23:53, 1 August 2024 (UTC)
- The bot got it right anyway: Special:Diff/1225902007/1238080300 - it followed the logic of the first example and that worked. Same with the second example, it followed the logic of the first example and it worked. -- GreenC 01:04, 2 August 2024 (UTC)
I was able to convert 53 URLs, and not convert 10:
- Albert Gore Sr. ---- http://slate.msn.com/ebooks/Sons%20George%20W.%20Bush%20and%20Al%20Gore.htm
- George W. Bush ---- http://politics.slate.msn.com/Features/bushisms/bushisms.asp
- No Fly List ---- http://slate.msn.com/id/2113157/fr/rss/
- Security theater ---- http://slate.msn.com/id/2113157/fr/rss/
- Godzilla 2000 ---- http://slate.msn.com/default.aspx?id=88714
- Charlie's Angels (2000 film) ---- http://slate.msn.com/default.aspx?id=92656
- List of films featuring giant monsters ---- http://slate.msn.com/default.aspx?id=88714
- Homeobox protein CDX-2 ---- http://slate.msn.com/id/2110670/fr/rss/
- List of monster movies ---- http://slate.msn.com/default.aspx?id=88714
- Done. Checked 103 pages and edited 53 pages. Moved 53 links to a new URL. Removed 1 {{dead link}} template. Switched 41 |url-status=dead to live. Added 2 archive URLs (2 Wayback). Changed 4 citation metadata fields.
This was a twister; if you see anything I missed, let me know (search; it might take time for the search cache to reflect the edits). -- GreenC 00:55, 2 August 2024 (UTC)
- For No Fly List, making the link into here (without /fr/rss/) works as a redirect to there. For Godzilla 2000, making the link into here works as a redirect to there (by removing default.aspx and changing ?id= to /id/). No luck with Albert Gore Sr. The George W. Bush link at Slate doesn't match the article either, so it could be left archived. MrLinkinPark333 (talk) 01:32, 2 August 2024 (UTC)
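The manual fixes suggested in this thread can be sketched as one helper. This is only an illustration under assumptions: the function name `slate_candidate_url` is hypothetical, the target host `slate.com` is inferred from "removing msn from the URL", and whether each candidate actually redirects still has to be checked by hand or by an HTTP request.

```python
import re
from urllib.parse import urlparse, parse_qs


def slate_candidate_url(old_url):
    """Return a candidate slate.com URL for a slate.msn.com link, or None."""
    parts = urlparse(old_url)
    # Subdomains such as politics.slate.msn.com are left alone here.
    if (parts.hostname or "") != "slate.msn.com":
        return None
    query = parse_qs(parts.query)
    if parts.path.lower() == "/default.aspx" and "id" in query:
        # default.aspx?id=N becomes /id/N
        path = f"/id/{query['id'][0]}"
    else:
        # Drop a trailing /fr/rss/ feed suffix if present.
        path = re.sub(r"/fr/rss/?$", "/", parts.path)
    return f"{parts.scheme}://slate.com{path}"


print(slate_candidate_url("http://slate.msn.com/default.aspx?id=88714"))
print(slate_candidate_url("http://slate.msn.com/id/2113157/fr/rss/"))
```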
- OK. If you want to adjust those manually it won't make sense to program and run the bot for these edge cases. -- GreenC 01:44, 2 August 2024 (UTC)
- Fair enough. MrLinkinPark333 (talk) 02:25, 2 August 2024 (UTC)
- @GreenC: The bot now changes perfectly fine refs that were properly waybacked and marked as 'dead'. This is pointless. In fact, I would argue it's worse. See this edit at Pokémon. This is the waybacked page from slate.msn.com. This is the new page from slate.com.
- When I wrote the paragraph in question, I purposely chose the waybacked old page, because the new page is filled with ads and has a very annoying floating, picture-in-picture video that automatically starts playing when the page loads.
- On a positive note, the ad blocker not only blocks this, but also busts through the "You seem to have an ad blocker" message. So the ad blocker does work here. But not everyone has an ad blocker installed. - Manifestation (talk) 09:18, 2 August 2024 (UTC)
- I understand. Yeah this is murky territory because if we are using the Wayback Machine to intentionally bypass a website, that otherwise has live content available, it is undermining traffic to the website, and traffic is why websites exist. In response, there is nothing stopping Slate from making a takedown request at Wayback. The entire domain would be taken down, leaving us with no archives even for legitimately dead links (except archive.today who do not honor most take down requests). This is not hypothetical it is happening more frequently. Anyway, I didn't remove the archive URL, and it can be flipped back to dead status, the bot won't reprocess the domain anytime in the foreseeable future. -- GreenC 13:35, 2 August 2024 (UTC)
businessinsider.com.au
https://www.businessinsider.com.au/coronavirus-us-has-worlds-biggest-outbreak-topping-china-2020-3?r=US&IR=T soft-redirects to https://www.businessinsider.com/coronavirus-us-has-worlds-biggest-outbreak-topping-china-2020-3?r=US&IR=T (with the referral ?r being optional) (from Timeline of the COVID-19 pandemic in the United States (2020)). Simply removing the .au generalizes to all the businessinsider.com.au links that I checked.
Per The Sydney Morning Herald, they "will no longer produce editorial content for Insider/BI and there will not be a BIAUS website", so I think it's safe to assume these links are not gonna come back to this domain.
821 pages GrapesRock (talk) 16:05, 30 July 2024 (UTC)
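The ".au removal" described above could look like the sketch below. The helper name `drop_au` is hypothetical; the code only rewrites the host and keeps the path and the optional `?r=` referral query untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def drop_au(old_url):
    """Rewrite a businessinsider.com.au URL to businessinsider.com, or return None."""
    parts = urlsplit(old_url)
    host = parts.hostname or ""
    if host != "businessinsider.com.au" and not host.endswith(".businessinsider.com.au"):
        return None
    new_host = host[: -len(".au")]  # strip only the trailing ".au"
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))


print(drop_au("https://www.businessinsider.com.au/coronavirus-us-has-worlds-biggest-outbreak-topping-china-2020-3?r=US&IR=T"))
```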
- Checked 823 pages and edited 739 pages. Moved 570 links to a new URL. Removed 3 {{dead link}} templates. Added 9 {{dead link}}. Switched 81 |url-status=dead to live. Switched 38 |url-status=live to dead. Added 214 archive URLs (204 Wayback). Changed 22 citation metadata fields.
msnbc.msn.com
Hello again. Msnbc.msn.com links don't work. Some have redirects that work while others don't. Please note that they redirect to NBC News links. This falls under two categories:
- URLs with IDs
- Changing this to that makes a working redirect for Latrobe Brewing Company.
- Changing this to that redirects to a 404 for Disappearance of Lisa Stebic.
- Sometimes removing parts of the URL will create a valid redirect. For instance, making this Today MSNBC link into that redirects here for Kieron Williamson.
- URLs without IDs: URLs with dates such as this don't work. In this case, it already has an archived URL at 2012 Leap Day tornado outbreak.
~12,500 links. Not all of these are in mainspace. MrLinkinPark333 (talk) 21:10, 28 July 2024 (UTC)
- MrLinkinPark333, for the first two examples, "this" and "that" are the same URL (copy paste typo). I'll need the "that" URL you discovered works. -- GreenC 01:15, 2 August 2024 (UTC)
- Whoops. That is supposed to be here. MrLinkinPark333 (talk) 01:17, 2 August 2024 (UTC)
- OK it's a soft-redirect -> redirect -> destination: for any URL that contains "/id/", extract the ID and convert to "https://www.msnbc.com/id/{id}/". Thus http://today.msnbc.msn.com/id/43584191/ns/today-today_people/t/monaco-palace-releases-guest-list-royal-wedding/ converts to https://www.msnbc.com/id/43584191/, then follow the redirect to https://www.nbcnews.com/id/wbna43584191 -- GreenC 03:56, 2 August 2024 (UTC)
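The first hop of that rule (extract the ID, build the msnbc.com URL) can be sketched like this. The name `msnbc_id_url` is hypothetical, and following the redirect to nbcnews.com is left to an HTTP client and not shown.

```python
import re

# Matches the numeric ID that follows "/id/" anywhere in the URL.
ID_PATTERN = re.compile(r"/id/(\d+)")


def msnbc_id_url(old_url):
    """Return https://www.msnbc.com/id/{id}/ for a URL containing /id/, else None."""
    match = ID_PATTERN.search(old_url)
    if match is None:
        return None  # dated URLs without an ID need an archive instead
    return f"https://www.msnbc.com/id/{match.group(1)}/"


print(msnbc_id_url(
    "http://today.msnbc.msn.com/id/43584191/ns/today-today_people/t/"
    "monaco-palace-releases-guest-list-royal-wedding/"
))
```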
Enwiki:
- Checked 3,616 pages and edited 1,288 pages. Converted 1 template. Moved 725 links to a new URL. Removed 4 {{dead link}} templates. Added 291 {{dead link}}. Switched 661 |url-status=dead to live. Switched 20 |url-status=live to dead. Added 182 archive URLs (132 Wayback). Changed 213 citation metadata fields.
IABot DB:
- About 17,000 links. Updated about 12,500 links which will propagate to 300+ wikis via IABot. -- GreenC 01:51, 3 August 2024 (UTC)
Done
nbcnews.com/id
Hello. NBC News links with /id/ in the URL redirect to new links. For example, this goes here for General Electric. However, this does not always work:
- Keeping only the id number sometimes makes a valid redirect: changing this to that goes to here for Chicken or the egg.
- However, keeping only the id in the URL doesn't always work. Making this into that redirects to a 404 for Legality of euthanasia. The new URL is here and does not match up. I think it would be better to find archived copies for the pages that redirect to 404s, as I can't predict the new URL.
- Also, at times links will give a "Something Went Wrong" error but still work after refreshing the page. This happened to me after changing this to the new URL for David Yalof.
~7250. Any links with /id/wbna can be ignored, as they will already have been fixed by the msnbc request above.
Thanks! MrLinkinPark333 (talk) 00:18, 31 July 2024 (UTC)
- User:MrLinkinPark333, for the "Something Went Wrong", I tried the example and it never loads after repeat refresh. A header check returns "HTTP/1.1 500 Internal Server Error". 500 is a generic error code when no more specific error code is available. I tried with a proxy sock IP (VPN) and it returns 206, which is sort of like saying it's a partial shipment, only one data segment arrived, more typical of large data files or video files. These are weird responses both are rare. The archive version (few days ago) is of a normal news article. I think the conservative solution is treat them as dead for now until NBC works out whatever went wrong. I'll test and see what percentage are like this. -- GreenC 02:18, 3 August 2024 (UTC)
- 24% of the links are "Something Went Wrong". 1,767 out of 7,423 .. the others converted successfully. Retries after hours pause makes no difference. Now the proxy does not work either. I don't have much option but consider them dead links. If this problem lifts in the future it can be reprocessed (note to self: find links in project nbcnewscom.0001-8263 with "grep 'Went Wrong' syslog"). -- GreenC 14:55, 3 August 2024 (UTC)
- I didn't realize so many of them would not work. It makes sense to have archived copies now, even if temporarily. MrLinkinPark333 (talk) 15:59, 3 August 2024 (UTC)
- Enwiki: Checked 8,263 pages and edited 6,637 pages. Converted 1 template. Moved 5,660 links to a new URL. Removed 2 {{dead link}} templates. Added 387 {{dead link}}. Switched 50 |url-status=dead to live. Switched 320 |url-status=live to dead. Added 2,072 archive URLs (1,979 Wayback). Changed 230 citation metadata fields.
Done -- GreenC 16:01, 3 August 2024 (UTC)
onlinelibrary.wiley.com
All links (that I have checked) starting with https://onlinelibrary.wiley.com/store/ seem to be dead. Replacing the start with https://onlinelibrary.wiley.com/doi/ instead seems to make those links work (example).
Perhaps more URLs to Wiley with other paths have died but can be saved by replacing the path (in the above example, /store/) with /doi/. Mind checking? Jonatan Svensson Glad (talk) 23:29, 31 July 2024 (UTC)
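The proposed path swap can be sketched as below. The helper name `wiley_store_to_doi` and the sample URL in the usage line are hypothetical; the thread's real example link is not reproduced here, and the result is unverified because the site blocks automated checks.

```python
from urllib.parse import urlsplit, urlunsplit


def wiley_store_to_doi(old_url):
    """Swap a leading /store/ path for /doi/ on onlinelibrary.wiley.com, else None."""
    parts = urlsplit(old_url)
    if parts.hostname != "onlinelibrary.wiley.com":
        return None
    if not parts.path.startswith("/store/"):
        return None
    new_path = "/doi/" + parts.path[len("/store/"):]
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


# Hypothetical URL, for illustration only.
print(wiley_store_to_doi("https://onlinelibrary.wiley.com/store/10.1000/example/asset/file.pdf"))
```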
Jonatan Svensson Glad, the site uses CloudFlare bot protection. I can't verify if the new URL is live/dead or redirects. Because there are so few, and this seems like it should work, I'll do a blind move. Worst case, I can change them back to /store/. -- GreenC 18:30, 3 August 2024 (UTC)
- Checked 111 pages and edited 111 pages. Moved 123 links to a new URL. Removed 9 {{dead link}} templates. Switched 3 |url-status=dead to live. Added 1 archive URL (1 Wayback).
Done -- GreenC 18:49, 3 August 2024 (UTC)
Jonatan Svensson Glad: On a related note, I spot-checked the edits, and in all cases they were part of citation templates where there was a |doi= parameter that also goes to the same target. Given that these |url= values point to the content via their DOI, cite-template docs advise against including the URL at all. There are about 16k links to wiley.com/doi URLs and some do not have separate DOI fields, so it would be a harder bot task to fix them. DMacks (talk) 19:10, 3 August 2024 (UTC)
- Can Citation bot fix these? I recall it removed URLs when there is a duplicate identifier URL, but it was also controversial in some way, and can't recall how it settled. -- GreenC 19:18, 3 August 2024 (UTC)
- If there is a PMC link (which is open access) or |doi-access=free, then Citation bot removes the URL for some specific domains but not all; I am unsure which domains. This is because the title will use the PMC or free DOI link instead. Jonatan Svensson Glad (talk) 19:39, 4 August 2024 (UTC)
gameinformer.com
https://www.gameinformer.com
& https://gameinformer.mydigitalpublication.com
- Kotaku just highlighted that GameStop killed Game Informer. Looks like the articles are redirecting to the front page farewell message. For example, these two sources are dead (both are archived):
- https://www.gameinformer.com/2020/11/05/dragon-age-4-theory-solas-red-lyrium-and-blight-ambitions (format for website articles)
- https://gameinformer.mydigitalpublication.com/publication/?i=824318 (format for the magazines)
Thanks! Sariel Xilo (talk) 17:55, 2 August 2024 (UTC)
- Just double-checked that magazine example and while it was archived a few times, the magazine doesn't appear to load and just shows a spinning waiting icon. So those might be total dead links if the Internet Archive copies don't work. Sariel Xilo (talk) 18:13, 2 August 2024 (UTC)
- I was about to post this website to here. Notrealname1234 (talk) 21:47, 2 August 2024 (UTC)
- Sariel Xilo, I guess it won't matter for gameinformer.mydigitalpublication.com because there are only 2 pages .. gameinformer.com has over 6,000 pages. -- GreenC 23:09, 2 August 2024 (UTC)
I'm assuming every link in the domain is functionally dead. I'm not verifying that assumption, because they use JavaScript redirects, which I can't detect, thus every page appears to be status 200 (live) which is actually a soft-404 to an end-of-life page. If after the bot is done anyone sees a problem with a link still live but marked dead, I can investigate and redo those links. -- GreenC 01:06, 4 August 2024 (UTC)
- Checked 6,484 pages and edited 4,349 pages. Added 58 {{dead link}}. Switched 3,867 |url-status=live to dead. Added 3,182 archive URLs (3,024 Wayback). Changed 75 citation metadata fields.
- Thanks! Sariel Xilo (talk) 17:16, 4 August 2024 (UTC)
- User:Sariel Xilo, I forgot to load IABot's database with archive URLs. I did set the domain status to "permadead" at iabot.org, but IABot can't discover archive.today links which make up a sizeable portion of available archives. Once finished the Highway Administration site below I'll return to this. There are 3,400 unique URLs. -- GreenC 20:04, 4 August 2024 (UTC)
- Added to IABot database.
Done -- GreenC 14:50, 5 August 2024 (UTC)
fhwa.dot.gov
Links to many, but not all, pages under http://www.fhwa.dot.gov/environment, http://www.fhwa.dot.gov/planning/, and http://www.fhwa.dot.gov/hep10, are dead.
http://www.fhwa.dot.gov/reports/routefinder/ is also a redirect. RajanD100 (talk) 19:30, 3 August 2024 (UTC)
- Well, their 404 page is misconfigured to return status 200 (live), example. I'll need to download every URL and web scrape for key words. This kind of basic problem with website management portends other more difficult ones. There are 3,000 pages (articles) on Wikipedia with this domain. -- GreenC 17:53, 4 August 2024 (UTC)
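The keyword-scraping check mentioned above can be sketched as a small classifier. This is an assumption-laden illustration: the function name `looks_like_soft_404` and the marker phrases are hypothetical, and the real phrases would have to be taken from the site's actual error page.

```python
# Hypothetical marker phrases; replace with text copied from the site's error page.
SOFT_404_MARKERS = ("page not found", "404 error")


def looks_like_soft_404(status, body, markers=SOFT_404_MARKERS):
    """Return True when a 200 response carries error-page wording in its body."""
    if status != 200:
        return False  # real error codes are handled by the normal status check
    text = body.lower()
    return any(marker in text for marker in markers)


print(looks_like_soft_404(200, "<title>Page Not Found</title>"))
```

The body has to be downloaded for every URL, which is why this is slower than a header-only check.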
Enwiki in two batches:
- Batch 1: Checked 1,000 pages and edited 738 pages. Moved 718 links to a new URL. Added 3 {{dead link}}. Switched 12 |url-status=dead to live. Switched 11 |url-status=live to dead. Added 196 archive URLs (191 Wayback). Changed 76 citation metadata fields.
- Batch 2: Checked 2,000 pages and edited 1,579 pages. Moved 1,582 links to a new URL. Added 6 {{dead link}}. Switched 28 |url-status=dead to live. Switched 19 |url-status=live to dead. Added 483 archive URLs (469 Wayback). Changed 179 citation metadata fields.
IABot DB:
- Checked about 2,000 unique URLs and modified about 400 which will propagate to 300+ wikis via IABot.
Done -- GreenC 17:34, 5 August 2024 (UTC)
ts.fi
I noticed that some of the Turun Sanomat URLs result in a 404 error, but they seem to be easily fixable:
- Ostrobothnians: https://www.ts.fi/puheenvuorot/1073936480/Suomen+heimojen+peruspiirteet gives a 404, but if you remove everything after the number ID, including the last / (here, /Suomen+heimojen+peruspiirteet), the URL works again: https://www.ts.fi/puheenvuorot/1073936480
- Night Visions (film festival): http://www.ts.fi/kulttuuri/1073969115/Night+Visions+laajeneekolmipaivaiseksi -> http://www.ts.fi/kulttuuri/1073969115
- Jukka Kalso: https://www.ts.fi/urheilu/1073750270/Soinisen+paa+kestaa -> https://www.ts.fi/urheilu/1073750270
There are approximately a hundred of these: 116 results (probably includes some false positives, i.e. archived URLs). --JAAqqO (talk) 22:49, 4 August 2024 (UTC)
- There is more, for example http://www.ts.fi/uutiset/talous/590113/Artekille+myos+Littoisten+Korhosen+tehdas becomes http://www.ts.fi/uutiset/590113 -- GreenC 17:53, 5 August 2024 (UTC)
- More: https://www.ts.fi/urheilu/jalkapallo/liiga/1207968074/Interin+hyokkaaja+debytoi+liigassa+vanhaa+seuraansa+vastaan --> https://www.ts.fi/urheilu/1207968074
- In one case out of 75, did not work: http://www.ts.fi/mielipiteet/paakirjoitukset/1073950477/Odotettu+fuusio+selkeyttaaSuomen+telakoiden+tilannetta -- GreenC 18:03, 5 August 2024 (UTC)
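Taken together, the examples in this section suggest keeping the first path segment and the numeric ID and dropping everything else. A minimal sketch of that reading follows; the name `ts_fi_short_url` is hypothetical, the rule is inferred from the examples only, and as noted above it does not work for every link.

```python
from urllib.parse import urlsplit, urlunsplit


def ts_fi_short_url(old_url):
    """Keep the first path segment plus the numeric ID; return None if no ID is found."""
    parts = urlsplit(old_url)
    segments = [s for s in parts.path.split("/") if s]
    numeric = [s for s in segments if s.isdigit()]
    if len(segments) < 2 or not numeric or segments[0].isdigit():
        return None
    new_path = f"/{segments[0]}/{numeric[0]}"
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


print(ts_fi_short_url("http://www.ts.fi/uutiset/talous/590113/Artekille+myos+Littoisten+Korhosen+tehdas"))
```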
Enwiki: Checked 452 pages and edited 321 pages. Moved 315 links to a new URL. Added 15 {{dead link}}. Switched 29 |url-status=dead to live. Added 31 archive URLs (23 Wayback). Changed 92 citation metadata fields.
Done -- GreenC 19:07, 5 August 2024 (UTC)
- That was fast, thank you. I checked about 50 affected articles on my watchlist, and all the new ts.fi URLs now work in those articles. However, I noticed one problematic edit, but I believe I found the rest of the erroneous edits, as they all appeared to be URLs with unusual characters (colons, semicolons, question marks, commas): edit #1, #2, #3, #4, #5. I found working URLs for them by checking the edit histories (except for this one that seems to be permanently dead), so everything should be good now. Thanks again. --JAAqqO (talk) 20:52, 5 August 2024 (UTC)
- Ah yes those URLs I came across and intentionally re-routed to the home page because they were redirecting there anyway as soft-404s (WP:LINKROT#Glossary) and they looked like errors anyway. These are in fact soft-redirects, which requires foreknowledge or search and discovery to determine the correct destination. -- GreenC 23:36, 5 August 2024 (UTC)
cdc.gov
CDC recently overhauled their website. Many links now have this interstitial saying the page has moved while linking to the new one. For example: https://www.cdc.gov/niosh/topics/motorvehicle/ -- in defiance of standards, that URL returns a 404 instead of a 301
-- GreenC 18:15, 5 August 2024 (UTC)
On hold - pending how to retrieve the redirect URL. -- GreenC 18:41, 5 August 2024 (UTC)
uk.businessinsider.com
This link from Antony Jenkins doesn't work unless you remove the uk from the url:
- https://uk.businessinsider.com/barclays-antony-jenkins-fintech-startup-10x-future-technologies-core-banking-2016-10 -> https://www.businessinsider.com/barclays-antony-jenkins-fintech-startup-10x-future-technologies-core-banking-2016-10
Bonus Person (talk) 17:09, 8 August 2024 (UTC)
- Enwiki: Checked 653 pages and edited 638 pages. Moved 697 links to a new URL. Added 2 {{dead link}}. Switched 36 |url-status=dead to live. Switched 5 |url-status=live to dead. Added 20 archive URLs (18 Wayback). Changed 5 citation metadata fields.
- IABot: set domain to permadead
Done -- GreenC 04:07, 13 August 2024 (UTC)
cartoonnetwork.com
https://www.cartoonnetwork.com
is dead & now redirects "to a landing page on Max" per Variety. Just under 250 articles use it as a source: 247 results. Sariel Xilo (talk) 16:22, 9 August 2024 (UTC)
- Enwiki: Checked 262 pages and edited 120 pages. Added 1 {{dead link}}. Switched 36 |url-status=live to dead. Added 85 archive URLs (80 Wayback). Changed 60 citation metadata fields.
- IABot: set to permadead
Done -- GreenC 04:46, 13 August 2024 (UTC)
apps.ehsni.gov.uk
Looks like we have a soft-redirect from http://apps.ehsni.gov.uk/ambit/Details.aspx?MonID=8572 to https://apps.communities-ni.gov.uk/NISMR-PUBLIC/Details.aspx?MonID=8572. Checking a smattering of links from List of castles in Ireland this seems to redirect to the proper place consistently (i.e. the few links I've checked, changing "http://apps.ehsni.gov.uk/ambit" to "https://apps.communities-ni.gov.uk/NISMR-PUBLIC" has worked). GrapesRock (talk) 17:49, 25 June 2024 (UTC)
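The prefix swap described above is simple enough to sketch directly. The function name `ehsni_to_communities` is hypothetical, and the mapping has only been checked against the handful of links mentioned in the request.

```python
OLD_PREFIX = "http://apps.ehsni.gov.uk/ambit"
NEW_PREFIX = "https://apps.communities-ni.gov.uk/NISMR-PUBLIC"


def ehsni_to_communities(old_url):
    """Replace the old ambit prefix with the NISMR-PUBLIC prefix, or return None."""
    if not old_url.startswith(OLD_PREFIX):
        return None
    return NEW_PREFIX + old_url[len(OLD_PREFIX):]


print(ehsni_to_communities("http://apps.ehsni.gov.uk/ambit/Details.aspx?MonID=8572"))
```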
- Hi User:GrapesRock: Looks like these exist on 4 pages. Can you repair them? It will be a lot easier than programming a fix. -- GreenC 16:16, 1 July 2024 (UTC)
- Yup, done. For the future, is there any value for posterity in adding posts here for links that only have a smattering of pages or should I just fix 'em? GrapesRock (talk) 16:50, 1 July 2024 (UTC)
- It's hard to say because it depends what work is involved making the fix. I've seen cases where 5 pages can take a long time to figure out manually and better done by bot. To setup the bot, compile, generate a list of target pages, run the bot, check for errors, upload diffs .. it's like 10 or 15 minutes for a small run. If you can do it faster than that manually, go for it. But even for simple cases, if it's more than around 20 pages don't hesitate to ask for bot help. -- GreenC 18:36, 1 July 2024 (UTC)
Done -- GreenC 05:05, 16 August 2024 (UTC)
prweb.com
Hello. Some links on the prweb.com website are now dead. This article from Nancy O'Dell, along with this one from Meryl Streep and this article from Birmingham, all lead to a 404 redirect. 2,952 articles use it as a source. I think we should have the dead links looked at. Lord Sjones23 (talk - contributions) 22:41, 13 August 2024 (UTC)
- Enwiki: Checked 2,993 pages and edited 2,721 pages. Moved 1,274 links to a new URL. Resolved 4 soft-404s. Removed 1 {{dead link}}. Added 89 {{dead link}}. Switched 14 |url-status=dead to live. Switched 131 |url-status=live to dead. Added 1,503 archive URLs (1,386 Wayback). Changed 224 citation metadata.
- IABot DB: Updated about 3,000 unique links which will propagate to 300+ wikis via IABot
Done -- GreenC 03:31, 16 August 2024 (UTC)
smmsport.com
Smmsport.com appears to have been usurped by an online gambling operation masquerading as the original site. Some links, such as [1] and [2], appear to still work and are intact with their original content, while others return 404 errors. But anything linked from the home page is fake. --Paul_012 (talk) 11:09, 29 July 2024 (UTC)
- User:Paul 012: 400+ pages. I'm not seeing gambling pages. Can you find examples? -- GreenC 15:27, 29 July 2024 (UTC)
- They're somewhat insidiously inserted into the first top navigation menu. [3] for example is a link farm advertising gambling sites. --Paul_012 (talk) 15:34, 29 July 2024 (UTC)
- Ahh I see. This is a somewhat unusual case of WP:USURPSOURCE. We probably need an edit filter to prevent editors from adding more links they believe are legitimate but are actually insidious spam (i.e. MediaWiki_talk:Spam-blacklist#Proposed_additions), and the existing links need to be usurped by WaybackMedic (i.e. this URLREQ). As the primary discoverer, can you make the Spam Blacklist request? -- GreenC 16:54, 29 July 2024 (UTC)
- I added it to the usurpation queue for WaybackMedic Special:Diff/1236486118/1237406269 -- GreenC 16:58, 29 July 2024 (UTC)
- Thanks. I'm not sure about blacklisting, as their old articles could still be useful references. Also, upon closer look, it seems the situation looks more like a hijacking rather than usurpation? Checking the Wayback Machine, the last good version of the home page was archived on 2023-08-13, before the site went down and showed a domain for sale notice. It came back on 2024-06-15, appearing mostly the same as it last did, but by the next archival on 2024-07-02 the gambling links had been inserted into the navigation menu, and the articles linked from the home page had been altered to show a date of 23 May 2024. --Paul_012 (talk) 14:27, 30 July 2024 (UTC)
- The spam blacklist prevents adding new links. Since they appear to have legitimate content, the problem is editors unknowingly adding new links into Wikipedia that they found with Google or whatever. It is a classic case of WP:USURPSOURCE. It really needs to be blocked. The old links will be kept and converted to usurped, i.e. changed to archive URLs, and the source URL no longer hot linked. -- GreenC 14:45, 30 July 2024 (UTC)
- Block request: MediaWiki_talk:Spam-blacklist#smmsport.com -- GreenC 15:54, 31 July 2024 (UTC)
Done - Bot Results: Batch #13 -- GreenC 14:30, 26 August 2024 (UTC)
fortblissbugle.com
fortblissbugle.com has been usurped by a gambling website. One example is http://fortblissbugle.com/german-air-force-train-at-fort-bliss/ from Fort Bliss.
While this claims that it has moved to an army website, that website's news archive only goes back to October 24, 2019, a week before the fortblissbugle went offline. Just searching a handful of titles, I can't find anywhere where individual stories are hosted.
46 pages GrapesRock (talk) 18:53, 30 July 2024 (UTC)
- Page says JUDIKING88 at the top. Judi is Indonesian for gambling. Part of the global judi empire. Added to WP:JUDI for later usurpation Special:Diff/1237845244/1238103384 -- GreenC 04:19, 2 August 2024 (UTC)
Done - Bot Results: Batch #13 -- GreenC 14:30, 26 August 2024 (UTC)
emporis.com
Last processed Sept 2022. Many {{dead links}} added. Since then, archive.today added archives that were previously unavailable: Special:Diff/1220029968/1240218179. Re-process cites with dead links (emporis3.auth) -- GreenC 05:38, 14 August 2024 (UTC)
- The domain is technically usurped (ie. Emporis). Has 6,000 pages. Will fix in three steps: 1. add archive URLs on enwiki, as a normal dead domain. 2. Same with IABot DB. 3. Later, usurpify everything in a WP:JUDI batch. -- GreenC 03:50, 16 August 2024 (UTC)
- Step 1: Enwiki: Checked 5,979 pages and edited 1,550 pages. Added 430 {{dead link}}. Switched 265 |url-status=live to dead. Added 1,412 archive URLs (136 Wayback). Changed 1,569 citation metadata.
- Step 2: IABot DB: Checked 24,000 links. Updated 23,520 links (set permadead and added new archive URLs). Changes will propagate to 300+ wikis via IABot.
- Step 3: Enwiki: usurpify via JUDI batch. Done - Bot Results: Batch #13 -- GreenC 14:29, 26 August 2024 (UTC)
caspianenvironment.org
Shows a page which relates to Car finance in Australia! I believe I have found and changed all instances in the Articlespace, but placed here in case not! Big Blue Cray(fish) Twins (talk) 16:05, 15 August 2024 (UTC)
- Thanks Big Blue Cray(fish) Twins: that's a usurped site. I added it to the list Special:Diff/1239726206/1240485275 .. it will get special handling during a future batch job. -- GreenC 16:18, 15 August 2024 (UTC)
Done - Bot Results: Batch #13 -- GreenC 14:29, 26 August 2024 (UTC)
erenow.com
Usurped by gambling (e.g. https://erenow.com/postclassical/the-fears-of-henry-iv-the-life-of-englands-king from Wars of the Roses). For pretty clear cut cases like this, can I just add it to the WP:JUDI list directly?
Only 20 pages. GrapesRock (talk) 15:35, 17 August 2024 (UTC)
- Yes, please! -- GreenC 16:41, 17 August 2024 (UTC)
Done - Bot Results: Batch #13 -- GreenC 14:26, 26 August 2024 (UTC)
www-03.ibm.com
It looks like multiple URLs in this domain soft-404. I'm not sure if there are any that don't. Can some or all of these URLs be marked as dead? I marked www-03
- It might be that the best solution is to treat the entire www-03 as dead. Will take a look. -- GreenC 22:59, 22 August 2024 (UTC)
- Thank you! McYeee (talk) 00:21, 23 August 2024 (UTC)
- I think that the domain might have moved rather than being taken offline. That file is available at public.dhe.ibm.com/software/globalization/gcoc/attachments/CP00850.pdf. I'm not really sure what should be done here. McYeee (talk) 19:41, 23 August 2024 (UTC)
- I think moving all www-03 references to the public.dhe domain should work. Notrealname1234 (talk) 20:34, 23 August 2024 (UTC)
- Note that it's not as simple as replacing www-03 with public.dhe. McYeee (talk) 20:38, 23 August 2024 (UTC)
- Thanks. I'll test for that soft-redirect rule, hunt for ghost redirects, filter for soft-404s, and crunchy-404s (WP:LINKROT#Glossary). IBM.com is notoriously complicated. -- GreenC 04:14, 24 August 2024 (UTC)
- I didn't find a good way to make these live. The one method Notrealname1234 found worked for some of those PDF files ("systems_i_software_globalization_pdf"), but not all. However, that same method is good for the ftp:// links noted in the next section below. Those links are not on the web (FTP protocol with no https access), and for that reason they have no archives available. Converting to https:// will be a big win. -- GreenC 20:02, 25 August 2024 (UTC)
- Enwiki: Checked 724 pages and edited 642 pages. Moved 15 links to a new URL. Added 15 {{dead link}}. Switched 13 |url-status=dead to live. Switched 78 |url-status=live to dead. Added 1,258 archive URLs (1,207 Wayback).
- IABot DB: Checked approx. 2,000 unique URLs. Changes will propagate to 300+ wikis.
Done -- GreenC 00:11, 26 August 2024 (UTC)
ftp:ftp.software.ibm.com
These can be replaced with https://public.dhe.ibm.com so long as the new URL is verified working.
- ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP01101.pdf --> https://public.dhe.ibm.com/software/globalization/gcoc/attachments/CP01101.pdf
120 pages. -- GreenC 19:04, 25 August 2024 (UTC)
- Enwiki: Checked 120 pages and edited 115 pages. Moved 206 links to a new URL. Removed 13 {{dead link}}. Added 69 {{dead link}}. Switched 2 |url-status=dead to live. Added 1 archive URLs (0 Wayback).
Done -- GreenC 01:52, 26 August 2024 (UTC)
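The ftp:// to https:// replacement described in this section can be sketched roughly as below. This is only an illustration under stated assumptions, not the bot's actual code: the function names `ftp_to_https` and `looks_live` are hypothetical, and the liveness check looks only at the HTTP status, so it would not catch soft-404s.

```python
import urllib.request
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "ftp.software.ibm.com"   # host named in this section
NEW_HOST = "public.dhe.ibm.com"     # replacement host named in this section


def ftp_to_https(url):
    """Return the candidate https:// URL, or None if the rule does not apply."""
    parts = urlsplit(url)
    if parts.scheme != "ftp" or parts.hostname != OLD_HOST:
        return None
    # Keep path, query and fragment unchanged; swap only scheme and host.
    return urlunsplit(("https", NEW_HOST, parts.path, parts.query, parts.fragment))


def looks_live(url, timeout=10):
    """Hypothetical check: True only if a HEAD request returns status 200."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # Network errors, HTTP errors and malformed URLs all count as not live.
        return False


if __name__ == "__main__":
    old = "ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP01101.pdf"
    new = ftp_to_https(old)
    # Only move the link if the new URL is verified working.
    print(new, looks_live(new))
```

Keeping the rewrite separate from the network check means the rule can be tested offline, and a link is moved only when both steps succeed.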
articles.cnn.com
This is a mess of a domain where some things redirect and some things don't. I've found some patterns that work at least some of the time.
More generally: http://articles.cnn.com/YYYY-MM-DD/EXT/WORDS.WITH.DOTS_1_WORDS-WITH-DASHES?_s=PM:THING
Goes to: https://www.cnn.com/YYYY/THING/MM/DD/WORDS.WITH.DOTS/index.html
Examples of words with dots and with years:
- http://articles.cnn.com/2001-09-16/us/inv.binladen.denial --> https://edition.cnn.com/2001/US/09/16/inv.binladen.denial/
More generally: http://articles.cnn.com/YYYY-MM-DD/ext/WORDS.WITH.DOTS
Goes to: https://edition.cnn.com/YYYY/EXT/MM/DD/WORDS.WITH.DOTS/
Similarly, you do the same thing if there are words with dashes (you can treat the URL as if it doesn't have anything after the _1_), such as in:
Those were the ones that I could find a somewhat consistent pattern for. Here are two where I couldn't quite, but I think somewhat of a pattern exists.
1467 pages GrapesRock (talk) 17:52, 26 August 2024 (UTC)
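The two "More generally" patterns above could be expressed as a single rewrite rule along the following lines. This is a sketch under assumptions, not a tested bot rule: the name `rewrite_cnn` and the regular expression are hypothetical, anything after `_1_` is simply ignored, and every candidate URL would still need to be checked live before a link is moved, since the patterns only work some of the time.

```python
import re

# articles.cnn.com/YYYY-MM-DD/ext/slug[_1_words-with-dashes][?_s=PM:THING]
_ARTICLE = re.compile(
    r"^https?://articles\.cnn\.com/"
    r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})/"
    r"(?P<ext>[^/?#]+)/"
    r"(?P<slug>[^/?#]+?)"                     # WORDS.WITH.DOTS part
    r"(?:_1_[^/?#]*)?"                        # optional _1_ tail, discarded
    r"(?:\?_s=PM:(?P<thing>[^&#]+))?$"        # optional ?_s=PM:THING
)


def rewrite_cnn(url):
    """Return a candidate new URL, or None if neither pattern applies."""
    m = _ARTICLE.match(url)
    if m is None:
        return None
    y, mo, d, slug = m["y"], m["m"], m["d"], m["slug"]
    if m["thing"]:
        # First pattern: section comes from the PM: value in the query string.
        return f"https://www.cnn.com/{y}/{m['thing']}/{mo}/{d}/{slug}/index.html"
    # Second pattern: section is the upper-cased path segment.
    return f"https://edition.cnn.com/{y}/{m['ext'].upper()}/{mo}/{d}/{slug}/"


if __name__ == "__main__":
    print(rewrite_cnn("http://articles.cnn.com/2001-09-16/us/inv.binladen.denial"))
```

For the example given in the post, the function produces https://edition.cnn.com/2001/US/09/16/inv.binladen.denial/.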
- In "example of words with dashes" .. I went ahead and programmed the rule, but the given example does not work ie.: https://edition.cnn.com/2011/WORLD/09/12/yemen.saleh.power.transfer/ .. hopefully the others will?
- I'll probably skip the Miscellany for now and see what is left when done with the others. -- GreenC 16:44, 27 August 2024 (UTC)
- Hm, http://www.cnn.com/2011/WORLD/meast/09/12/yemen.saleh.power.transfer/index.html is where it is now which isn't helpful for generalization. Either I miscopied (a distinct possibility), or they changed up their domain again. Alas. GrapesRock (talk) 16:53, 27 August 2024 (UTC)
- It's OK the bot will try and skip any not working. Also finding some have a ghost redirect:
- -- GreenC 18:09, 27 August 2024 (UTC)
- GrapesRock, to call this "done" is not accurate because there is probably more that could be done by searching and evaluating. Nevertheless, I'm going to mark it done for now and move on to other projects. If you discover other rules, I can undo the done tag and keep going. This is, as you said initially, a messy domain. Like water from a stone, the "easy" ones are fixed and what remains is pretty difficult. -- GreenC 16:10, 28 August 2024 (UTC)
- Enwiki - Checked 1,469 pages and edited 557 pages. Moved 359 links to a new URL. Resolved 112 ghost redirects. Resolved 1 soft-404s. Removed 4 {{dead link}}. Added 24 {{dead link}}. Switched 245 |url-status=dead to live. Switched 20 |url-status=live to dead. Added 198 archive URLs (117 Wayback). Changed 5 citation metadata.
Done -- GreenC 16:10, 28 August 2024 (UTC)