Wikipedia talk:Controlling search engine indexing
noindex on new, unpatrolled pages
I was asked in the help channel about a page being noindexed, and it turns out (according to Phabricator) that new, unpatrolled pages are noindexed by default. It might be worth putting on this page, but at the very least, I wanted to put it on the talk page so people might be able to find it. --MarkTraceur (talk) 14:02, 25 October 2016 (UTC)
- I've mentioned in the lead; it was already mentioned in a section. fgnievinski (talk) 02:35, 22 September 2023 (UTC)
Possible updates needed
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
- (copied from User_talk:Xaosflux)
- NOINDEX in mainspace
I was reading through the latest NPP RfC and related threads and noticed something I'm apparently confused about. I figure you'd probably know :) Much has been made of the possible dangers of "bad" new articles getting indexed by search engines. Speedy tags apply {{NOINDEX}}, but is it actually the case that new articles are not indexed until patrolled? I don't believe that was true last year (last time I had a reason to notice), but I may be behind. WP:NOINDEX seems to be out of date. I tried to test it by creating an article with my sock, but was lucky enough that it was patrolled quickly. (Yet the article still isn't indexed on google, and others I created today with my main account are... unfortunately I don't have enough articles stored up for any more sampling :) Opabinia regalis (talk) 06:59, 23 November 2016 (UTC)
- @Opabinia regalis: (no)indexing isn't always an exact science. Since October 2016 (cf. phab:T147544), new articles get an HTML meta tag applied
<meta name="robots" content="noindex,nofollow"/>
until they are patrolled (or until 90 days go by; I haven't tested that). This tag is also applied if the article contains a deletion tag such as {{speedy}}. Note that this is similar, but not identical, to the behavior of __NOINDEX__. There is a configuration parameter, $wgExemptFromUserRobotsControl, that prevents the INDEX and NOINDEX magic words from overriding the namespace default (which for Namespace:0 (Article) is set to INDEX). This is to prevent vandals from NOINDEX'ing random pages out of search. The noindex meta tag is merely a request to web crawlers: Google generally honors these, but some search engines may not. Finally, being available for indexing doesn't require or "push" a notice to all of the search providers of the world; it is up to them to fetch and index a page. Sometimes this is fast, sometimes it takes a long time. Hope this helps. — xaosflux Talk 14:06, 23 November 2016 (UTC)
- Yeah, I know it's not a push notification that pops up immediately; it's just very noticeable that I created three articles yesterday on very similar topics, and the two created by this account were indexed immediately, but the one that needed "patrolling" is still absent. Hardly statistically significant, but I hadn't given it much thought because most of my articles are on very obscure topics and they usually pop up in google searches for the title near-instantly. I suppose we'll have to at least update the boilerplate for autopatrolled - the conventional wisdom is that the user right doesn't benefit the holder, but does benefit others by saving them some work; that's clearly no longer true if we assume that people create new articles because they want others to find and read them.
- Anyway, thanks for the phab link, that's what I was looking for. Opabinia regalis (talk) 20:34, 23 November 2016 (UTC)
- Opabinia regalis I can't find the documentation - but I hear that google does follow our new pages feed - but a non-autopatrolled page (autopatrol is included in your sysop group) would have been skipped - so now it would have to wait to get spider-indexed. I'm assuming you are referring to New Jersey polyomavirus. I pulled the source on it, and it is not (now) flagged for noindex. I made a minor edit on it, which may help kickstart indexing on an external site. Please note, none of this behavior has changed due to removing the patrol behavior from autoconfirmed users - non-autopatrolled editors would still have needed someone else to mark their page as patrolled. This indexing behavior is likely different due to the October software update. — xaosflux Talk 20:49, 23 November 2016 (UTC)
- Opabinia regalis FYI - I used google webmaster tools to submit a request to crawl that page now - and now it is the #2 search result: google-result-here. — xaosflux Talk 20:53, 23 November 2016 (UTC)
- p.s. google has massive caches - I got it to show me that result, but when reloading it's not up - their index will take a little time to replicate. — xaosflux Talk 20:57, 23 November 2016 (UTC)
- Ahhh, if google is directly following the new pages feed then that would make sense. Thanks, I see it now! (The other two articles are MW polyomavirus and STL polyomavirus, which were autopatrolled and indexed right away.) Opabinia regalis (talk) 21:26, 23 November 2016 (UTC)
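For anyone wanting to repeat the "pulled the source" check described above, here is a minimal sketch in Python. It only inspects an HTML string for a robots meta tag like the one quoted in this thread; the class and function names (RobotsMetaParser, robots_directives) and the sample markup are illustrative assumptions, not part of MediaWiki, and fetching the page is left out.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html_text):
    """Return the set of robots directives found in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


# Sample markup modelled on the tag quoted in the discussion above.
sample = '<html><head><meta name="robots" content="noindex,nofollow"/></head></html>'
print("noindex" in robots_directives(sample))
```

A page whose source has no robots meta tag yields an empty set, which matches the "not (now) flagged for noindex" observation; whether a crawler has actually indexed the page is a separate question.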
DuckDuckGo Zero-click box showing content of no-index pages
FYI: [1]. Tigraan Click here to contact me 17:50, 4 January 2017 (UTC)
Indexing in user space
I have raised a question here about permitting particular pages in user space to be indexed: Noyster (talk), 20:50, 28 February 2017 (UTC)
On userpages
Suppose we were to put in https://en.wikipedia.org/robots.txt
Disallow: /wiki/User:
Would that serve as a deterrent to people's putting their resumes, etc. in userspace? St. claires fire (talk) 22:29, 12 April 2017 (UTC)
- Would this mean that no page in user space could be indexed? If so, the discussion here has made it abundantly clear that many established editors value the facility to invoke indexing of their user pages and would not accept its removal: Noyster (talk), 08:53, 13 April 2017 (UTC)
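To illustrate the question of scope, here is a small sketch using Python's standard urllib.robotparser. It assumes the proposed rule sits under a wildcard User-agent line (the proposal above does not say which agents it would target), and the page names are made-up examples. It shows how a crawler that follows prefix matching would read the rule; it does not show what any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt fragment containing only the proposed rule.
proposed_rules = [
    "User-agent: *",
    "Disallow: /wiki/User:",
]

parser = RobotFileParser()
parser.parse(proposed_rules)

# Example URLs; the user and page names are illustrative only.
urls = [
    "https://en.wikipedia.org/wiki/User:Example",
    "https://en.wikipedia.org/wiki/User:Example/Draft",
    "https://en.wikipedia.org/wiki/User_talk:Example",
    "https://en.wikipedia.org/wiki/Main_Page",
]

for url in urls:
    allowed = parser.can_fetch("*", url)
    print(url, "allowed" if allowed else "blocked")
```

Because the rule is a path prefix, it covers every user page and subpage at once, with no per-page opt-in, while User_talk: pages fall outside the prefix.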
Automatic article indexing
What was the reasoning behind the 30-day indexing? Is it a holdover from before NPP, as a way to stop new junk being indexed? I would think indexing when an article is patrolled should be sufficient now.
The people who care most about indexing are those who are NOTHERE and creating articles for promotional purposes. Their articles should not be indexed at all. If their stuff manages to fall through the cracks and not get seen at creation, it would in effect be "accepted" automatically after sitting around for a month, which isn't good. With the huge backlog of pages to patrol that we have now, an article not being looked at for 30 days isn't unlikely, and basically the effect is that NPP is made ineffective by there being a big backlog. Apart from promotion, other poor articles will also show up on search engines just by hanging around.
So... Do we really need that 30-day thing? If not, how can we get rid of it?
Yeryry (talk) 16:08, 13 April 2017 (UTC)
- The current wait appears to actually be 90 days. Thompcha (talk) 20:11, 1 April 2021 (UTC)
BLP talkpage Archives are still indexed
Hi. If I understand correctly from Wikipedia:Controlling search engine indexing#Biographies of Living Persons talkpage noindexing, article talkpages should be marked as noindex. However, I've noticed that /Archive pages are still indexed in some search engines (not google) (random example) because they don't have the {{BLP}} template on them. I'm not sure how to suggest fixing this, but hope someone here can determine a simple solution. Thanks, Quiddity (WMF) (talk) 00:35, 4 December 2018 (UTC)
Possible bug
If you view the source of a Wikipedia page while logged in, it does not display the robots meta tag. However, if you view the page without logging in, it shows the robots tag with the nofollow and noindex values. Here is an example where I noticed this bug: Ruslan Baisarov. When I am logged into my account I cannot see the robots tag; however, in the public view the tag is there. The page is also well over 90 days old and was also reviewed. Meeanaya (talk) 06:01, 22 October 2019 (UTC)
Knowledge panels are picking up unindexed articles
FYI: I recently noticed that an article I created that has not yet been patrolled (and thus should not be indexed) has nevertheless shown up as a knowledge panel when I just did a Google search (it's not yet in the search results themselves, though). {{u|Sdkb}} talk 03:10, 18 September 2020 (UTC)
DYK nominations
It seems DYK noms are the latest in a long line of backend pages that are getting indexed despite not being reader-facing. I hope we do a top-to-bottom examination of our indexing practices at some point and stop displaying so many pages that aren't relevant in google searches. {{u|Sdkb}} talk 08:31, 16 May 2021 (UTC)
Build NOINDEX into templates?
Is there a way to build __NOINDEX__ into templates? This is the template under consideration and the issue is that Google has started indexing the draft bios that are tagged with this template; they are all accessible from this page. Of course, we could tag them all individually but come the next election, it would be much safer if we could just build this into the template. So, is that possible? Schwede66 07:14, 27 July 2022 (UTC)