Wikipedia talk:WikiProject Short descriptions/Archive 1
This is an archive of past discussions about Wikipedia:WikiProject Short descriptions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Talk pages
Short descriptions are sufficiently unfamiliar that I think it would be good to automatically have a thread on the talk page when one has been added (at least to start with).
Where a short description template has been added to an article, it should be easy enough to get a bot to subst a template on the talk page, explaining what short templates are, saying how they can be seen, and presenting the text that has been added, making clear that it can be modified. Jheald (talk) 16:53, 16 February 2018 (UTC)
- At best, it could be very useful, and at worst is unlikely to do any harm.· · · Peter (Southwood) (talk): 13:20, 22 February 2018 (UTC)
Monitoring / Dashboard
It would be useful to have a rolling update of how many short templates have been added, perhaps broken out by Wikiproject, and/or by some subject classification based on Wikidata.
It would also be nice to be able to view what descriptions have been added for particular sorts of things, with perhaps the option to sort to show the most recent for a particular facet, and/or show recent diffs.
This perhaps needs a tool to watch the recent changes stream, and keep an off-wiki database of current descriptions, and diff ids for recent changes to them, to allow descriptions for groups of pages to be easily extracted and browsed. Jheald (talk) 17:05, 16 February 2018 (UTC)
- There is a bar showing percentage done, but the percentage is so small that you can't see any green yet.
- I have tried to do similar for a project (on Wikipedia:WikiProject Scuba diving), but haven't worked out how to extract the numbers for the intersection of Category:Articles with short description on mainspace pages and Category:WikiProject SCUBA articles on talkspace pages.
- If you know how to do these things, I encourage you to go ahead. Such tools will be useful. · · · Peter (Southwood) (talk): 13:33, 22 February 2018 (UTC)
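The intersection asked about above could be computed off-wiki once both category listings are in hand. The following is a minimal Python sketch, assuming the two member lists have already been fetched by some other means and that a talk page maps to its article by dropping the "Talk:" prefix; the function names and sample titles are hypothetical, not part of any existing tool.

```python
def articles_with_description_in_project(described_articles, project_talk_pages):
    """Return project articles that already carry a short description.

    described_articles: titles in Category:Articles with short description
    project_talk_pages: titles in the project's talk-page category,
                        e.g. "Talk:Scuba set"
    """
    prefix = "Talk:"
    # Map each talk page back to its subject page in mainspace.
    project_articles = {
        title[len(prefix):]
        for title in project_talk_pages
        if title.startswith(prefix)
    }
    return sorted(project_articles & set(described_articles))


def percent_done(described_articles, project_talk_pages):
    """Share of the project's mainspace articles that have a description."""
    # Count distinct article talk pages only (other talk namespaces are skipped).
    total = len({t for t in project_talk_pages if t.startswith("Talk:")})
    if total == 0:
        return 0.0
    done = len(articles_with_description_in_project(
        described_articles, project_talk_pages))
    return 100.0 * done / total
```

The percentage returned here is the kind of figure a per-project progress bar would need; pages in other talk namespaces (such as "Category talk:") are deliberately ignored in this sketch.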
A suggestion
I applaud this idea, but doubt I will have the time to contribute. A suggestion: work from DAB (including geodis and hndis) and name pages. They vary in quality, but a well-written one-line description on one of those could give you everything that's needed, and so get the numbers up very quickly. Narky Blert (talk) 15:40, 20 February 2018 (UTC)
- This is happening. I am working through disambiguation pages by category. · · · Peter (Southwood) (talk): 13:36, 22 February 2018 (UTC)
- About dabs, but also in general: Is there a reason these are going at the very top of pages (where they are most obtrusive to normal editing, and presumably also most liable to be vandalized) as opposed to at the bottom? Dekimasuよ! 20:45, 22 February 2018 (UTC)
- I think Dekimasu makes a good point there, placing them at the top of pages makes them an easy target for vandals especially ones with low view counts that could be undiscovered for a significant length of time. Kosack (talk) 08:14, 23 February 2018 (UTC)
- One issue is that where they are placed is where they will be made visible on desktop (for people who have the modified CSS to make them visible). If they are at the top, they will appear at the top (corresponding to how they appear on mobile; and closest to the material in the article they are most likely to draw on). If they are at the bottom, that won't be the case.
- The point about vandalism is important though. These are going to be high value / high visibility targets (to mobile readers), but low visibility to editors. The anti-vandal patrollers need to be aware of this, and an adjustment to tools may be needed to prioritise changes for investigation, and perhaps some specific AI support. Jheald (talk) 11:10, 23 February 2018 (UTC)
- Perhaps a tag to alert users of any changes made to the short description would be a good shout. Kosack (talk) 13:34, 23 February 2018 (UTC)
- Why are dab pages being targeted individually and not in the dab templates? Both the general and specific ones could have default descriptions, with an ability to override. IMHO, expecting {{short description}} to be explicitly placed on every page without transclusion seems an unwieldy and naive approach, but maybe this is being discussed on the various pages that address this topic. —Ost (talk) 23:11, 14 March 2018 (UTC)
- This suggestion has been made before. No-one has demonstrated yet that it will work, but that is probably because they are waiting for the magic word to be implemented so experimentation becomes possible. Everything is on hold, waiting for the developers to finish. · · · Peter (Southwood) (talk): 05:20, 15 March 2018 (UTC)
Magic word has been implemented
Someone who is a template editor can have a go at embedding the short description template in the disambiguation template and see if it works. · · · Peter (Southwood) (talk): 19:39, 29 March 2018 (UTC)
Best practice ?
It bothers me that people are now starting to add short descriptions, but there still seems to be no very good guidance as to what they should look like in particular standard cases, ie what we are ideally aiming for. It seems incredibly hard to get discussion going about this -- what short descriptions are considered good or not so good? Why? In the absence of any central steer, different editors are going to have very different ideas as to what to write, how much to include, what to leave out.
For example:
- Aspirin
- Wikidata : "chemical compound used to treat pain, fever, and inflammation"
- Template : "medication"
- Mikhail Yakovlevich Suslin
- Wikidata : "Russian mathematician"
- Template : "mathematician"
- Esbjörn Svensson
- Wikidata : "Swedish musician"
- Template : "Swedish jazz pianist and founder of the jazz group Esbjörn Svensson Trio"
- The Unpleasant Profession of Jonathan Hoag
- Wikidata : "short story by Robert A. Heinlein"
- Template : "short story"
Which of these is better? Why? Jheald (talk) 10:22, 22 February 2018 (UTC)
- I will toss my 2c in to start the ball rolling.
- In all of these cases both short descriptions are acceptable, because they are an accurate representation of the article contents, but also all probably not optimum, though "short story by Robert A. Heinlein" may be as good as we are likely to get, as it is both concise and provides what I would consider the most important information.
- For Aspirin the Wikidata version is better. I would copy it over or try to come up with a better compromise - "medication used to treat pain, fever and inflammation" is a little shorter.
- Russian mathematician is better than just mathematician. Those two words are the core. A little more might be useful, but not necessary.
- I might consider trimming Esbjörn Svensson down, but I think it is more useful than just Swedish musician, but either is better than none and both are accurate representations of the article. I think that like articles, they will be improved as and when editors think of better versions, and as we get more experience with creating them, better guidance will be offered. To this end, I am creating them and occasionally improving them as I go. I expect that most will remain unchanged for years, a few will be argued about interminably and a moderate number will be improved considerably within a reasonably short time. As more editors notice that they are being created, they will think about what they should be and discuss them on the project page and sooner or later we will know what is needed and how to do it for most cases.
- I also think that the fastest way to get more discussion is to boldly add descriptions that you think are good enough or better, and wait for the responses. I use an edit summary "add Wikipedia:Short description" to encourage people to find out what they are for, and hopefully comment and make suggestions. That is one of the ways we do things on Wikipedia, and it often works. There may be a better way, but until someone suggests it, I think this way is a good enough start. Whatever we think of now will probably evolve into something different. Optimisation can be done whenever someone feels the urge, but it is less urgent than getting usable descriptions on pages. As the content of pages develops, the short descriptions may also have to be changed. Specialist gnomes will probably evolve to occupy this ecological niche. Cheers, · · · Peter (Southwood) (talk): 12:55, 22 February 2018 (UTC)
- Hmm. I hope you're right. For me one of the issues with the Wikidata descriptions is that they've almost certainly never had any systematic review, assessment, or quality control. They've been assembled piecemeal, without anyone taking a general overview, and without any comparative critique or discussion about what actually works best and is most suitable for their purpose(s).
Optimisation ... is less urgent than getting usable descriptions on pages.
Well, my hope for this process is that it would be a chance to level-up, a chance to define and quality-control and improve the standard of these short descriptions. I don't see much value in an all-out effort to add short descriptions here, if they're not going to be systematically better or more suitable than what's being served already. Jheald (talk) 11:25, 23 February 2018 (UTC)
- Starting with good enough is better than waiting for perfect, because however much time we spend on it, there will be things that have to change. The trick is in defining what is good enough for now. That is the way Wikipedia has worked in the past, and one of the reasons it does work - it is flexible enough to fix and there is no deadline. The short descriptions will also be like that, even if we spend months trying to optimise the format, as the only way we will find out what is necessary and useful is to experiment by doing it. The ones we get wrong, we fix. The ones that turn out good we leave until someone works out how they can be better. By doing things now and accepting that some will be sub-optimum, it is likely that other users will see ways of improving them and do so. Getting people to notice what is happening is a big part of the job. People will not notice until there are short descriptions appearing on pages, then some will see what needs to be done and do it. Systematic review of 5.5 million short descriptions is not going to happen anytime soon. They will get reviewed in the highly unsystematic Wiki way, and I think they will usually turn out fit for purpose within two or three iterations where they need to be improved. Of course you or someone else could come up with a better plan, but I don't think we need to wait. Cheers, · · · Peter (Southwood) (talk): 14:48, 23 February 2018 (UTC)
- Does no one else have an opinion? · · · Peter (Southwood) (talk): 13:16, 25 February 2018 (UTC)
Potential user rollout query
Hi, an interesting project idea. I'd be willing to help if I'm of use but would like some clarification first. I would look primarily at articles dealing with football which should be a pretty standard description between each page. Taking the first player from Category:English footballers for example, Arthur Aaron (footballer), would adding simply English footballer be an appropriate addition or would it need to be more descriptive? Perhaps English footballer who played for Stockport County? Also, would it actually be useful for me to work through potentially thousands of football articles or could a bot simply add English footballer to every player in that category? Kosack (talk) 10:57, 22 February 2018 (UTC)
- Hi Kosack, English footballer is a lot better than no description. There may be a better description - almost certainly there should be, and your counterexample looks like it would be - but perfect is the enemy of good enough, and good enough is the bar. If anyone prefers a better description, they can just do it at any time; these descriptions are like any other text in that anyone can edit them any time subject to content and behaviour policies. If you have the urge to do fewer but higher quality descriptions, no-one should stop you or complain that you should be doing otherwise. At some 19000 articles, English footballers is a fairly big category and will be a significant step forward (about one third of a percent). You could start with a bot run for the basic description and improve those where your enthusiasm takes you afterwards. Bot runs should go through the standard approval of course. Cheers, · · · Peter (Southwood) (talk): 13:49, 22 February 2018 (UTC)
- For comparison, this link gives current descriptions for 1000 UK footballers from Wikidata:
tinyurl.com/ya8rjg7w
- Mostly not much more informative than "English footballer", but there are a few variants there that might be worth considering.
- One question is whether it would be worth including dates of birth/death where we have them? Jheald (talk) 18:06, 22 February 2018 (UTC)
- As someone who has little knowledge of football and footballers I am mostly happy that there are other people who care about those things who will try to do a good job because it is important to them. Would you find the dates useful? Would clubs be more useful? Do we need to disambiguate in more detail than that they are English footballers? I would ask Kosack who is much more likely to search for a footballer than I am. I would have opinions on engineering, biology, other sciences, diving, occupational safety, science fiction and a few other subjects, and no clue on a whole lot of others. I will worry about them when I run out of things in my own field of interests. Maybe I will have a go at things I know nothing about occasionally, to see what develops, but will not be surprised when people tell me why I am wrong. I actually welcome that because it gets people involved. Some people are article creators, others are article builders, and others are polishers and janitors. That will also happen with short descriptions. Cheers, · · · Peter (Southwood) (talk): 15:08, 23 February 2018 (UTC)
- I looked at your table, and one thing that immediately stood out as probably useful is the position they play at, and whether they went on to be a coach or manager. Even I can see the value of distinguishing between a wing and a goalkeeper. These things could be included immediately, or added later, maybe using a different category, building on what is already there. One is constrained slightly by what is stated in the article. We can't go stating things that are not in the article, unless we add them to the article and reference them first. · · · Peter (Southwood) (talk): 15:21, 23 February 2018 (UTC)
- Dates of birth and death. Why would these be useful? Not obvious to me. · · · Peter (Southwood) (talk): 15:28, 23 February 2018 (UTC)
- I would agree on the dates of birth/death not being the most useful term for footballers if this is looking to aid people in searching. I would say most users would be looking for "Joe Bloggs who played for Liverpool" and wouldn't really know that player's date of birth. The clubs would be an option but could be complicated by players who played significantly for more than one team, I should imagine that could lead to edit wars between users who want their club mentioned. Positions could be the answer as Pbsouthwood mentioned, English footballer who played as a goalkeeper for example perhaps? Kosack (talk) 16:39, 23 February 2018 (UTC)
- Okay, but I do find dates quite useful to give a general idea of when somebody lived -- were they a 19th century footballer? Inter-war years? 1950s? 1980s? Present day? Plus also they help disambiguate if we have two Joe Bloggs -- people may know it's not the 1920s one they want.
- Whenever I've edited biographical entries on disambiguation pages in the past, I have tried to include dates. Also when creating or revising descriptions on Wikidata. But I see that most descriptions there don't have dates. Jheald (talk) 10:24, 25 February 2018 (UTC)
- No harm in correct dates if they do not make the description too long, or push out something more useful. Do what seems best for the article. · · · Peter (Southwood) (talk): 13:03, 25 February 2018 (UTC)
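The bot run suggested earlier in this thread (a basic description for every page in a category, improved by hand afterwards) comes down to one text transformation per page. Below is a minimal Python sketch of that step only, assuming the page wikitext has already been fetched and that saving, category traversal and bot approval are handled elsewhere; the function name is hypothetical. It skips any page that already carries the template, so hand-written descriptions are not overwritten.

```python
import re

# Matches an existing {{short description|...}} template, case-insensitively.
SHORT_DESC_RE = re.compile(r"\{\{\s*short description\s*\|", re.IGNORECASE)


def add_short_description(wikitext, description):
    """Prepend {{short description|...}} unless the page already has one."""
    if SHORT_DESC_RE.search(wikitext):
        return wikitext  # leave existing, possibly hand-written, text alone
    return "{{short description|" + description + "}}\n" + wikitext
```

Applied to the Arthur Aaron (footballer) example with the text "English footballer", this would add a single line at the top of the page, which anyone could later refine to mention a position or club.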
Useful for translation
Wiki NYC is organizing a wiki translation event in April 2018.
As we are looking around for suggested content to translate I thought that this project could curate useful content to recommend for translation.
There would be lots of details to work out both in this WikiProject and in event outreach to make it work, and it probably would not be possible to ready anything for this upcoming event in the next few weeks, but I thought that I would post here to suggest that if this project did get better established then in-person events for new wiki editors could be a way to amplify this project's outcomes and make the content more accessible in more languages. Blue Rasberry (talk) 14:38, 28 February 2018 (UTC)
- Bluerasberry, I don't see how this project is relevant to translation. We do not even know if any other Wikipedias will choose to use short descriptions. Could you explain your ideas? · · · Peter (Southwood) (talk): 20:50, 28 February 2018 (UTC)
- @Pbsouthwood: I apologize for not communicating more clearly. I have been having lots of conversations with others about how to get more content across language Wikipedias and one of the most commonly raised ideas is to curate more concise content for translators to use first.
- One idea for priority in translation is to start with article leads, which could mean 10 sentences. Another idea is to translate just first sentences. Another idea is to translate the description in Wikidata. The documentation currently mentions ENWP short descriptions as being preferable to the short descriptions in Wikidata, and I agree for now, but eventually somehow Wikidata will need to scrape up whatever short descriptions exist anywhere and translate them to other languages.
- Whereas in ENWP these are only short descriptions, in many languages after automated translation short descriptions are likely to be a starting point for establishing new articles in underdeveloped Wikipedias. There are already some efforts to generate Wikipedia articles using only Wikidata content, such as with Haitian Creole Wikipedia, and I expect those experiments will only continue.
- I am not making any request for anyone in this project to do anything differently. I am just thinking about how if this project creates lists of short descriptions, there are places including Wikidata which ask users to post translations of those short descriptions. Blue Rasberry (talk) 18:57, 4 March 2018 (UTC)
- Bluerasberry, That makes more sense, though taken out of context the short description may lose some meaning. They are best seen as an annotation to the title (as in "Title - short description", see Outline of underwater diving for an example of using them this way), and should be translated as such. That should work well enough in the majority of cases. It is taking concision about as far as it can go. What other projects do with our short descriptions is entirely their choice, and as far as I am concerned they are welcome to use them in any way they find useful, and the more they are used the more our work is worth doing. Cheers, · · · Peter (Southwood) (talk): 20:11, 4 March 2018 (UTC)
lower case
I've followed the advice overleaf to display short descriptions in desktop view (Monobook here). I'm surprised to find that the display is in lowercase, except for the 1st letter. At SS Zealandia (1910) I see, "Australian cargo and passenger steamship sunk in the bombing of darwin". How can I get a display that's identical to what's written in the article's template? -- Michael Bednarek (talk) 10:54, 9 April 2018 (UTC)
- Good question. I had not noticed this before. I will take a look at the template code, but not sure I will understand enough to find a problem. Maybe RexxS can shed some light. · · · Peter (Southwood) (talk): 12:34, 9 April 2018 (UTC)
- @Michael Bednarek and Pbsouthwood: When we first started, we were unsure whether or not to capitalise the first word of the short description. What was being used previously came from Wikidata, where the first letter is not capitalised, so I forced the description into sentence case so that either would work. However, as you point out, that causes problems with proper nouns within the description. Now that we can actually see what's written I've removed any change of case, so what you see is what you get. Does that solve the issue for you? --RexxS (talk) 13:01, 9 April 2018 (UTC)
- Yes, it now displays the short description as written in the template. Thank you very much.
WP:BEANS notwithstanding, I notice that the desktop display will observe all Wiki markup, like italics, bold, colouring, line breaks, even the invocation of templates. That of course may open a Pandora's Box. -- Michael Bednarek (talk) 13:39, 9 April 2018 (UTC)
- RexxS, It works fine with your css, we can just capitalise for normal sentence case. I don't see any Wiki markup as Michael Bednarek describes. Michael, are you using the gadget or the css to see the short description? I couldn't get the gadget to work, so don't know what it displays. The template looks much simpler now too, which is probably a good thing. Cheers, · · · Peter (Southwood) (talk): 15:17, 9 April 2018 (UTC)
- @Michael Bednarek and Pbsouthwood: I have a version in the sandbox that cleans out any code from the text and trims it.
{{short description| '''short <p> <big>description</big>''' }} →
{{short description/sandbox| '''short <p> <big>description</big>''' }} →
- That should solve any malicious misuse, but sadly it complicates the template again. Sorry Peter. --RexxS (talk) 16:58, 9 April 2018 (UTC)
- The upside is that it makes vandalism of that kind pointless, the downside is that no-one notices, so they don't get fixed. · · · Peter (Southwood) (talk): 17:14, 9 April 2018 (UTC)
- IMO the defanged (nowiki) version is preferable, but others might think differently. -- Michael Bednarek (talk) 23:51, 9 April 2018 (UTC)
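The two behaviours discussed in this thread can be illustrated outside the template. The Python sketch below is not the template or sandbox code, which is not shown here; it is an assumed equivalent with hypothetical function names. It contrasts forcing sentence case (which damages proper nouns) with capitalising only the first letter, and shows one simple way markup could be stripped and the text trimmed.

```python
import re


def force_sentence_case(text):
    """Lower-case everything, then capitalise the first letter."""
    return text[:1].upper() + text[1:].lower()


def capitalise_first_only(text):
    """Capitalise the first letter and leave the rest as written."""
    return text[:1].upper() + text[1:]


def strip_markup(text):
    """Remove HTML tags and bold/italic quote markup, then trim whitespace."""
    text = re.sub(r"<[^>]*>", "", text)       # HTML tags such as <p>, <big>
    text = re.sub(r"'{2,}", "", text)          # '' italics and ''' bold
    return re.sub(r"\s+", " ", text).strip()   # collapse and trim spaces
```

With these definitions, `force_sentence_case` turns "Darwin" into "darwin" in a description such as the SS Zealandia one, while `capitalise_first_only` leaves it intact. A regex-based cleaner like `strip_markup` handles the bold, `<p>` and `<big>` example above but would not expand or remove nested templates.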
Am I doing it correctly?
I added 9 short descriptions: one for each of the state capitals in New England, and the largest cities in New Hampshire, Maine, and Vermont (the capital and largest city are the same city in Massachusetts and Rhode Island, and the largest city in Connecticut may not stay the largest).
Am I doing it right? Should I continue? HotdogPi 21:43, 28 April 2018 (UTC)
- Yes, HotdogPi, that's great; thank you. Keep it up, every one helps. --RexxS (talk) 00:24, 29 April 2018 (UTC)
cross-posting discussion/progress about disambiguation pages
Template talk:Disambiguation#Edit request for inclusion of short description template
Discussion there is getting wider. I created {{Disambiguation page short description}}, which holds the string that is used on {{Disambiguation}} and others (hopefully!) for easy modification if necessary. (Please someone protect this; it affects/will affect hundreds of thousands of articles.) That new string template is only added to a couple of disambig templates at the moment; I await some sense of agreement with the principle before changing {{Disambiguation}} and others to use it.
Outstanding topics include what value is gained from treating "sets" differently, such as the reversion of "type=disambiguation page" on {{surname}}, which put all of those broadly-dab pages back into Category:Articles with short description. Outriggr (talk) 04:22, 2 May 2018 (UTC)
- Outriggr, it seems more sensible to make this generate the entire short description, i.e. {{short description|Disambiguation page providing links to articles with similar titles|pagetype = Disambiguation page}} Galobtter (pingó mió) 04:42, 2 May 2018 (UTC)
- It is generating the entire short description—but since that same template text has to be applied to dozens of disambig templates, transcluding a dedicated template for the string allows the string to be changed in one place instead of dozens of places. As in
<includeonly>{{short description|{{Disambiguation page short description}}|pagetype = Disambiguation page}}</includeonly>
Outriggr (talk) 04:47, 2 May 2018 (UTC)
- @Outriggr: The template should be {{short description|Disambiguation page providing links to articles with similar titles|pagetype = Disambiguation page}} so that you just have to transclude {{Disambiguation page short description}} instead of that Galobtter (pingó mió) 04:50, 2 May 2018 (UTC)
- OK, I see what you're saying. Sure. Outriggr (talk) 04:53, 2 May 2018 (UTC)
Use of this template
I noticed that the template still displays as a hidden block on the rendered page. This isn't really good for long term strategy, with regard to content reuse/alternative engines etc. The only reason to do it is to make it visible for those who don't want to use the gadget to make it visible. We should really deprecate that practice in my opinion, especially since using this template seems to have become the standard. —TheDJ (talk • contribs) 09:39, 11 May 2018 (UTC)
- TheDJ, Do you have a specific alternative to recommend, and can you explain why it would be better? I don't understand what you are saying well enough to have an opinion yet. · · · Peter (Southwood) (talk): 10:50, 12 May 2018 (UTC)
- @Peter: I think TheDJ means that we add a load of extra gubbins to the magic word inside the template, and that results in hidden text placed in the html of the page.
- @TheDJ: With only 10% done, we're a very long way from anything being "standard", so I would prefer us to retain the current template while we are still adding short descriptions – at least for now, until we can be sure that gadgets to read the api will work for almost everybody (we know the CSS will). As a long-term strategy, I originally envisioned that at some point in the future we could use a bot to convert {{short description|Xyz}} to {{SHORTDESC:Xyz}}, which would be a trivial job, as I'm sure you'll agree. However, there may be sufficient advantages to wrapping in a template (ability to add/remove tracking categories, etc.) that it might be better to retain the template and just cut out the hidden text at some point in the future. We could do that at any time, of course, but I'd want to be sure that we wouldn't hinder the process of adding short descriptions to articles. Any thoughts on how we might gauge that? --RexxS (talk) 12:12, 12 May 2018 (UTC)
- RexxS, I tried inspecting the html of a page with a short description, and there seemed to be very little html obviously associated with the short description, but maybe I am not looking in the right place. · · · Peter (Southwood) (talk): 13:04, 12 May 2018 (UTC)
- {{short description}} has this piece of code: <div class="shortdescription nomobile noexcerpt noprint searchaux" style="display:none">{{{1|}}}{{SHORTDESC:{{{1|}}}}}</div>. {{{1|}}} puts the text of the short description in the html of the page (inside the div block) which TheDJ considers problematic. Galobtter (pingó mió) 13:11, 12 May 2018 (UTC)
- I think I get it. For it to be visible using css it must be in the html of the page, which may be problematic somehow. However the only 100% reliable method of viewing for me is via css from this text, as the gadget and script do not always work, possibly due to slow internet connection. Cheers, · · · Peter (Southwood) (talk): 13:56, 12 May 2018 (UTC)
Communication
For a project of this scale – generating short descriptions for 5.5 million articles – there does seem a need for more widespread awareness (I've added a plug in Community portal), and more communication of what is going on. For example: the project page does include "we are working on making infoboxes generate descriptions automatically", but no more detail than that (until I just added to it). I've been following this recently, but never spotted any announcement that half-a-million placename articles had already had descriptions generated out of {{Infobox settlement}}. Well done! But AFAIK no-one said. It just happened one day. Would be good to know if plans are afoot to do similar with other common types of infobox, then we wouldn't waste our time typing in descriptions for articles carrying those types of infobox. A lot of articles, though, have no infobox and there are editors keen to keep it that way. How about the proposal to generate descriptions out of leads – how's that going?: Noyster (talk), 08:35, 12 May 2018 (UTC)
- Taxoboxes seem like a likely tool to automatically generate a large number of descriptions. I have tried to start a discussion at Template talk:Taxobox, with little apparent interest. I am also manually adding short descriptions to featured articles. What I am learning from this is that it is quite easy to generate a workable but not particularly great short description in many ways, but a good short description can be quite difficult to compose in some cases. A lot of featured articles have a lead sentence which converts to a quite good short description. Not surprisingly, quite a few of these have a Wikidata description that is very similar to the first sentence, others have no Wikidata description at all. This is less often the case for run of the mill articles, where a quick scan of the lead paragraph is often more productive, but less likely to be automateable. Cheers, · · · Peter (Southwood) (talk): 11:01, 12 May 2018 (UTC)
- On the generating from lead sentence; it is doable for sportspersons (which is why I mentioned them), and likely useful as nationality isn't given in sportsperson infoboxes so it is hard to generate good descriptions. I am working on that actually Galobtter (pingó mió) 11:44, 12 May 2018 (UTC)
This discussion tends to bear me out about awareness. Until the project wins wider acceptance, it may be well for those of us adding SDs by manual editing to avoid annoying people via their watchlists, by binding ourselves to the rule imposed on bot operators: do it only as part of an edit that also alters the appearance of the page as rendered (in this case, meaning "as rendered on desktop").: Noyster (talk), 09:51, 27 May 2018 (UTC)
Infobox settlement
Hi all, there's currently a problem with the short description on San Francisco that breaks the page on the iOS app. The short description that's being generated includes a line break, which is causing problems. The short description seen in iOS app search is: "City and County in California in California", and when you open the page, it just shows the lead image and nothing else. Here's a screenshot of what it looks like.
WMF developers are fixing the issue on our end -- our display should be resilient enough to handle accidental line breaks in the description. But as we've been investigating the problem, it's been hard for us to understand what Template:Infobox settlement is doing with the short descriptions.
As far as I can tell, Infobox settlement is pulling information to dynamically create the short description. But I would imagine that the descriptions would use the same format, and they don't. Some examples of the short descriptions I'm currently seeing in the app:
- San Francisco: City and County in California in California
- Los Angeles: Place
- Ann Arbor, Michigan: City in Michigan, United States
- New York City: Megacity in United States
In the Json blob, the San Francisco article says:
- description: "City and County in California in California\n----, United States",
- descriptionsource: "local"
I'm also seeing a mistake in the metropolitan areas:
- New York metropolitan area: Megacity in New York, New Jersey, Connecticut, Pennsylvania, United States
- Chicago metropolitan area: Metropolitan region in , United States
I assume this is pulling information from the infobox itself, so is the infobox for Chicago metropolitan area missing a property that would fill that empty space? It would be great if someone could explain how these descriptions are constructed, so that our devs can help when there's a bug, like the one on the San Francisco page.
Also, when there's an obvious fail like "Los Angeles: Place", what can an editor do to add a better description? I know that Pbsouthwood was concerned a while back about editors not being able to see the short description in the wikitext, so they can edit or update them. Is that still a concern? I looked for anything related to the short description on these pages, and there's no indication of what the description is, or how to fix it. -- DannyH (WMF) (talk) 22:46, 25 May 2018 (UTC)
- @DannyH (WMF): You can set the short description through the settlement_type parameter, and the subdivisions through subdivision_type, subdivision_name etc; basically the data displayed below the image in the infobox on what type of place it is and where it is. The "in California in California" pattern and similar are tracked in Category:Infobox_settlement_pages_with_bad_settlement_type, which currently has 3000 members; we can probably get someone at WT:AWB to go through and fix those. I'll fix the other instances too (at the template) Galobtter (pingó mió) 06:10, 26 May 2018 (UTC)
- The short description can also be edited directly through Infobox settlement, using the recently added short_description = parameter. However, to meet cases of need like those above I think there should be more obvious means of editing these autogenerated SDs, which are being created from increasing numbers of other infoboxes as well. The autogenerated SDs cannot be overridden using the helper script, even though this provides an "Edit" button (discussed here). Nor can they be overridden using the {{Short description}} template unless the template is placed below the infobox, rather than above as recommended (discussed here). I agree that it would be good to have the SD visible in wikitext, and also in preview mode: Noyster (talk), 07:56, 26 May 2018 (UTC)
- @Galobtter:: Thanks for the explanation. I'm seeing some problems with US state capitals -- Trenton, New Jersey, Sacramento, California and Hartford, Connecticut all had the "of New Jersey in New Jersey" construction. I fixed those (using my volunteer account) using the short_description parameter, but I'm sure there are more. I also tried to fix Los Angeles using that parameter, but it's still stubbornly saying "Place". :) -- DannyH (WMF) (talk) 18:12, 29 May 2018 (UTC)
- I really can't figure out why it says place, because the hidden text generated by {{short description}} shows "City in California, United States". Galobtter (pingó mió) 19:09, 29 May 2018 (UTC)
- Fixed Galobtter (pingó mió) 19:21, 29 May 2018 (UTC)
- Oh, was it stuck because there were two infoboxes on the page? -- DannyH (WMF) (talk) 21:22, 29 May 2018 (UTC)
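The line-break problem reported above for San Francisco can be guarded against on the consuming side. The following is only a minimal Python sketch, not the code used by the apps: the function name `clean_description` is hypothetical, and keeping only the first non-empty line is an assumption about what a resilient display would want.

```python
import re


def clean_description(raw):
    """Return the first non-empty line of a description, with whitespace collapsed.

    Hypothetical helper: the policy of dropping everything after the first
    line break is an assumption made for this sketch.
    """
    for line in raw.splitlines():
        collapsed = re.sub(r"\s+", " ", line).strip()
        if collapsed:
            return collapsed
    return ""


# The description value quoted from the JSON blob above.
sample = "City and County in California in California\n----, United States"
print(clean_description(sample))  # City and County in California in California
```

This only hides the stray line break; the doubled "in California in California" still has to be fixed in the infobox parameters.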
Constraint for repeated templates?
Do we have a constraint which would notice if there are two {{short description}} templates, add some red text on the page and possibly add it to a service category?--Ymblanter (talk) 08:22, 16 November 2018 (UTC)
- @Ymblanter:, Nothing that I am aware of. I assume you don't mean the cases where one is embedded in another template to provide a generic short description which is overridden by a manually added short description on the page, which should be quite OK and is typically used in cases where the embedded/automatically generated short description does not work too well, or an editor just thinks a manually crafted one is better. · · · Peter (Southwood) (talk): 18:20, 16 November 2018 (UTC)
- No, I indeed mean just two templates. Whereas the best practice is to put this template on the top of the article, I can still easily add it on top of an article where another template was previously added by another editor on the bottom. Or there could be some mess on the top (hatnotes, protection templates etc), and if it is long enough, it is possible for a good-faith editor not to notice an already existing template. It would be beneficial to introduce constraints which alert the editors, similarly to what happens if two templates with coordinates have been added to the same article.--Ymblanter (talk) 18:32, 16 November 2018 (UTC)
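A check of the kind asked for here could be prototyped off-wiki before anything is built into the template. The Python sketch below is an illustration only: the function names are hypothetical, and it looks only at templates written directly in the page's own wikitext, so descriptions generated inside infoboxes are deliberately not counted.

```python
import re

# Matches the opening of {{short description|...}}, ignoring case and allowing
# an underscore in place of the space. Redirects to the template are not handled.
SHORT_DESC_OPEN = re.compile(r"\{\{\s*short[ _]description\s*\|", re.IGNORECASE)


def count_short_descriptions(wikitext):
    """Count {{short description}} templates typed directly into the wikitext."""
    return len(SHORT_DESC_OPEN.findall(wikitext))


def has_duplicate_short_description(wikitext):
    """True when more than one directly written template is present."""
    return count_short_descriptions(wikitext) > 1


page = (
    "{{short description|Natural number}}\n"
    "Some article text.\n"
    "{{Short description|Number}}\n"
)
print(has_duplicate_short_description(page))  # True
```

Pages flagged this way could then be listed for review or placed in a tracking category.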
Automatically generating short descriptions for species articles
Hello everyone. I have developed a way to procedurally generate short descriptions for the 387,816 articles in Category:Articles with 'species' microformats. My method, while not perfect, generally produces a better short description than the Wikidata description. It operates by copying a snippet from the first sentence of the lede that is suitable as the short description. This method relies on the relatively systematic way articles in this category are written - use caution before expanding this beyond this category.
Pseudocode
The purpose of this is to explain exactly how a program would generate these summaries:
Download the wikitext for an article in Category:Articles with 'species' microformats
Run the regex string (?<=(. a | an ))(.*?)(?=(\.|,| which | known | found | describe|<ref|\(| native | grow| that | within | from | cause)) on the wikitext
Take the first match generated by this regex, ignore/discard the rest.
This produces the basic short description; now we need to clean it up.
Start loop
    Run the regex \[\[[^\]]*\| to identify the left side of piped links.
    If any matches were found, remove the matched text and repeat the loop
End loop
Run the above loop three more times, replacing the regex lines with the lines below to strip out links, bold, and italics
    Run the regex \[
    Run the regex \]
    Run the regex [']{2,}
If the string is "Gram-negative" replace it with "Gram-negative bacteria"
Check whether there's a space in the string
    if there is not
        Add the article to a list for carbon-based intelligence to deal with, then skip the article
Check the length of the string
    if length in characters > 70
        Add the article to a list for carbon-based intelligence to deal with, then skip the article
    else, add {{shortdescription|(the remaining regex match)}} to the article. Include attribution in the edit summary.
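The steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the script used for any run: the function names are hypothetical, the regex groups are written as non-capturing, downloading and editing pages are left out, and the sample sentence is invented for the example rather than copied from an article.

```python
import re

# First-pass regex from the pseudocode. The two lookbehind alternatives are
# both four characters wide, which Python's fixed-width lookbehind requires.
MAIN = re.compile(
    r"(?<=(?:. a | an ))(.*?)"
    r"(?=(?:\.|,| which | known | found | describe|<ref|\(| native | grow"
    r"| that | within | from | cause))"
)
PIPED_LINK_LEFT = re.compile(r"\[\[[^\]]*\|")  # left side of [[target|label]]
MARKUP = (re.compile(r"\["), re.compile(r"\]"), re.compile(r"'{2,}"))
MAX_LENGTH = 70


def remove_all(pattern, text):
    """Delete matches repeatedly until none remain, mirroring the loop."""
    while pattern.search(text):
        text = pattern.sub("", text)
    return text


def generate_description(wikitext):
    """Return a candidate description, or None to leave the page for a human."""
    match = MAIN.search(wikitext)  # only the first match is used
    if match is None:
        return None
    text = remove_all(PIPED_LINK_LEFT, match.group(1))
    for pattern in MARKUP:
        text = remove_all(pattern, text)
    text = text.strip()
    if text == "Gram-negative":
        text = "Gram-negative bacteria"
    if " " not in text or len(text) > MAX_LENGTH:
        return None
    return text


# Invented sample lead sentence, for illustration only.
sample = "'''''Mitrella blanda''''' is a [[species]] of [[sea snail]], a marine mollusc."
print(generate_description(sample))  # species of sea snail
```

Returning None stands in for "add the article to a list for a human"; a real run would also need the edit summary attribution mentioned in the last step.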
How it works
The regex looks for a string that is immediately preceded by "Any character, space, lowercase a, space" or "Space, lowercase a, lowercase n, space". The "any character" is there because the lookbehinds must be of the same length (four characters in this case). It then matches any number of characters (the short description) until the string immediately in front of it is one of several stop codes.
All links and bolding/italics are then stripped out of the short description. If the short description is longer than 70 characters, it is left for a human. If the short description contains no spaces, it's left for a human. Otherwise it is posted at the top of the article in the shortdescription template.
Results
I used Random page in category Articles with 'species' microformats to generate a sample of articles. The article, my procedurally generated short description, and the wikidata description are included.
Article | Procedurally generated description | Wikidata description | Notes | Comments |
---|---|---|---|---|
Profundiconus pacificus | species of sea snail | species of mollusc | A good example of the improvements achievable over a wikidata import. | |
Catocala caesia | moth of the Erebidae family | species of insect | Significant improvement on WD (P) | |
Pterostylis daintreana | species of orchid endemic to eastern Australia | species of plant | endemic should probably be added to the stop codes | I think endemic is acceptable, but you could shorten a bit by substituting "from" for "endemic to" (P) |
Sewa taiwana | moth of the Drepanidae family | species of insect | | |
Lactobacillus pontis | (skipped, added to human list) | species of prokaryote | Algorithm produced "rod-shaped", which gets kicked for a lack of spaces. Bacteria articles are hard on my algorithm. Is there a subcategory I can skip? | |
Ross seal | true seal | species of mammal | Not a big improvement, but not unacceptable. (P) | |
Turner's thick-toed gecko | species of gecko | species of reptile | Not a big improvement as the title already contains "gecko", like previous example, but also not unacceptable (P) | |
Coleophora sylvaticella | moth of the Coleophoridae family | species of insect | | |
Solirubrobacter pauli | mesophilic Gram-positive and aerobic bacterium | (none) | 46 characters, algorithm got lucky here. | Very compact, quite informative. If anything a bit technical, "and" could be left out, reducing length to 42 characters if worth the effort. |
Leucotabanus ambiguus | species of horse flies in the subfamily Tabaninae | species of insect | big improvement on WD, could be improved, but should be good enough. (P) | |
Chersodromus | genus of snakes of the family Colubridae | genus of reptiles | Big improvement on WD (P) | |
Artedius harringtoni | (skipped, added to a list for humans to parse) | species of fish | Algorithm returned "demersal" which is rejected for lack of spaces. | |
Mitrella blanda | species of sea snail | species of mollusc | | |
Givira aregentipuncta | moth in the Cossidae family | species of insect | | |
Medicorophium | genus of amphipod crustaceans | genus of crustaceans | Improvement on WD. Could be improved, but probably good enough (P) | |
Scrophularia ningpoensis | perennial plant of the family Scrophulariaceae | species of plant | significant improvement on WD (P) | |
Anadasmus sororia | moth of the Depressariidae family | species of insect | | |
Hakea flabellifolia | shrub of the genus Hakea | species of plant | | |
Shrew | small mole-like mammal classified in the order Eulipotyphla | family of mammals | 58 characters | "Classified in" can be reduced to "in" takes it down to 48. This may be a generally applicable modification (P) |
Gascoyne's Scarlet | English cultivar of domesticated apple | apple | nice, probably close to optimum. (P) | |
Barred thicklip | species of fish belonging to the wrasse Family | species of fish | Need to switch order in which I strip links and check regex. | "belonging to" could be reduced to "in" (P) |
Moluccan scops owl | owl found in Indonesia | species of owl | | |
Decisions
The things we need to decide:
- Are the short descriptions generated in this way good enough for semiautomatic posting? For automatic posting? Semi-auto posting at one a second is still a 100 hour job.
- Do we bias towards shorter summaries by adding " in ", and " belonging " to the ending criteria?
- What is an acceptable "Fail Rate" wherein the bot posts something in the short description that is inappropriate for the short description?
Moving Forward
To move this forward, we need to do a couple of things:
- Develop a strong consensus that adding these summaries is a good thing, and that the occasional mistakes are worth it.
- Refine this process more. There's still some low-hanging fruit for improvement
- Find a bot operator willing to implement this and make the runs. I could probably do it, but it will be difficult for me, as this would be my first bot.
I'm inviting comments on this now - if it looks good, we can get a consensus for it and I'll start refining it.
Cheers, Tazerdadog (talk) 05:38, 3 June 2018 (UTC)
- Nice! One thing, I was thinking that we could instead automatically generate descriptions from the taxobox, though this may be better since there are some complexities with classification and common names etc (though it could be relatively simple to at least get something like "species of insect" or "genus of beetle"). I too am doing basically this for footballer biographies; I have a Python script that works quite well; this reminds me that I probably should finish that. There is no need to manually strip links; you can use mw:Extension:TextExtracts to get the plain text lead sentence, which also allows you to skip doing regex on anything above or below the lead sentence. Galobtter (pingó mió) 06:39, 3 June 2018 (UTC)
- All the generated descriptions are as good as or better than the Wikidata descriptions, and none of the descriptions generated are unacceptable. Most are not optimal, but most human generated short descriptions will also not be optimal first shot. I would describe this as very promising. Is there a way to do a moderate batch semi-automated that is simple enough for me to do a trial run of a thousand or so for a more representative sample?
- An edit summary asking for a human check and correction if necessary might help optimise. I think the failure rate will be low, and revert rate even lower.· · · Peter (Southwood) (talk): 09:20, 3 June 2018 (UTC)
- There is not currently a bot implementation of this - I did the table above by hand. I will see if I can set up a basic automated script somehow to try to get a bigger sample. With 1000, we can start to see which substitutions are worth it and which are not. We can also examine error rates and look into places where the bot might really screw up. I can also put in a bunch of little optimizations as they come up with the first 1000. Tazerdadog (talk) 09:34, 3 June 2018 (UTC)
- @Tazerdadog: I'm almost done getting 500 descriptions with your regex, will post the table soon. Galobtter (pingó mió) 10:23, 3 June 2018 (UTC)
- Done, Here, @Pbsouthwood: too; I see some problems, some can be fixed, some perhaps not, 95%+ are good so it should at least be useful for doing semi-automatically. The reason I was doing footballer biographies is because they are so regular that one doesn't need any uncontrolled .*? leading to sometimes strange stuff as in there. Galobtter (pingó mió) 10:35, 3 June 2018 (UTC)
- stripping out things in brackets and making it require "is a[n]" instead of just "a[n]" would probably fix a lot of the problems Galobtter (pingó mió) 10:42, 3 June 2018 (UTC)
- First of all, thank you for compiling this. I'm going to spend an hour or so really digging into the results here and seeing what I can do to improve this. I agree with your general assessment on the two big questions though. This is good enough for semi-auto posting with a human in the loop right now, and this is not good enough for full automatic bot posting without a human in the loop right now. I'm going to hack on this a little more, and get an updated regex in an hour or so. The holy grail is fully automatic additions, as that's where we can really make progress towards 2 million. Tazerdadog (talk) 10:50, 3 June 2018 (UTC)
- These look very good to me so I'm reluctant to raise a trivial point. Should the descriptions start with a capital letter? WP:Short descriptions: "Whether it should have an initial capital remains undecided, but is favored at present." Thincat (talk) 11:09, 3 June 2018 (UTC)
- I capitalized it in the table I script generated, Tazerdadog I assume just copied the output without capitalizing but when (hopefully) made into a bot it will likely be capitalized. Galobtter (pingó mió) 11:12, 3 June 2018 (UTC)
- Galobtter is correct, I just copy-pasted. No opinion on the merits of whether the first word should be capitalized. Tazerdadog (talk) 12:23, 3 June 2018 (UTC)
- This is good to see, only we need to ensure that {{short description}} templates added by a human at the top of the article are not overridden by this process – as is already happening with descriptions generated out of infoboxes: Noyster (talk), 11:40, 3 June 2018 (UTC)
- If there are already local short descriptions it shouldn't edit the page; the bot would add {{short description}} at the top of the article (as a side-note, for the infoboxes a fix is in the pipeline, assuming something is happening with phab:T193857) Galobtter (pingó mió) 11:55, 3 June 2018 (UTC)
Ok, I've had some time to mull over @Galobtter:'s list, and I've arrived at the following basic conclusions.
1) Start by pulling the first three sentences, not just the first sentence. We got quite a few hat notes and abbreviations instead of sentences. I believe this can be done by changing a number, if it's a real problem let me know.
2) Remove parentheses and everything inside them before running my regex. Parentheses were mucking things up, and this is cleaner than my kludge of stopping at an open parenthesis. If this is a problem to implement, skip it.
3) Run my updated regex main expression (below). Notable features include adding "is" to ensure more robust starts, two new start codes ("is the", and "is" followed by "one"), and 5 new end codes.
(?<=(.. is a |. is an | is the |.... is (?=one )))(.*?)(?=(\.|,| which | known | found | describe|<ref| whose | native | grow| that | within | from | cause| used | and | with |;))
4) Note that single-word short descriptions are filtered out - if the regex returns something without spaces, it defaults to a skip. We don't need to change the way we've been testing, but everyone should bear this in mind when reading the results.
5) Similarly to the previous point, there's also a filter for maximum length. I currently have it set to skip if my regex produces something longer than 70 characters, but commentary on the appropriate length is very welcome. If you want to pitch in but don't want to program, looking at the test data and telling us where to set that character count would be helpful.
Could you please generate a new sample for us Galobtter?
Cheers. Tazerdadog (talk) 12:13, 3 June 2018 (UTC)
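Points 1 and 2 above (take the first three sentences, and drop parenthesised text before running the regex) could be sketched like this in Python. It is an assumption-laden illustration: the function names are hypothetical, the sentence splitter is a naive split after full stops that abbreviations will fool, and the sample sentence is invented.

```python
import re

# A parenthesised run with no nested parentheses inside it, plus any
# whitespace immediately before the opening parenthesis.
PAREN = re.compile(r"\s*\([^()]*\)")


def strip_parentheses(text):
    """Remove parenthesised text, innermost pairs first, until nothing changes."""
    previous = None
    while previous != text:
        previous = text
        text = PAREN.sub("", text)
    return text


def first_sentences(text, count=3):
    """Naively keep the first `count` sentences, splitting after full stops."""
    parts = re.split(r"(?<=\.)\s+", text)
    return " ".join(parts[:count])


# Invented sample text, for illustration only.
lead = "The Ross seal (Ommatophoca rossii) is a true seal. It lives on pack ice. It is rarely seen. It eats squid."
print(first_sentences(strip_parentheses(lead)))
# The Ross seal is a true seal. It lives on pack ice. It is rarely seen.
```

Stripping parentheses first means the regex no longer needs an open parenthesis among its stop codes.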
- Updated the regex, added a couple of new start conditions, and removed a couple of prominent older ones - this means the algorithm will attempt more short descriptions, that will typically be longer.
(?<=(.. is a |. is an |. are a | are an | is the |.... is (?=one )))(.*?)(?=(\.|,| which | known | first described | describe|<ref| whose | native | grow| that | in which | from | cause| used | with |;))
- Could you generate a new sample with new articles using the first 3 sentences? I'm worried that I'm tuning the regex too hard to the 500 articles we've been using, and want to nip that in the bud. Cheers, Tazerdadog (talk) 14:05, 3 June 2018 (UTC)
- Done Galobtter (pingó mió) 14:44, 3 June 2018 (UTC)
- I'm very impressed with how good these are. Overwhelmingly helpful. Even the grammar is good. Does "Toad" appear because it will later be filtered out? I love "Rarely seen" for the Shelled slug! Thincat (talk) 18:16, 5 June 2018 (UTC)
Sorry for not doing anything on this for the last few days. I have just manually evaluated the first 100 short descriptions generated at User:Tazerdadog/organism descriptions first hundred. The results were: 18 skips, 4 bad descriptions, 5 OK descriptions, and 73 good/great descriptions.
Definitions:
Good/great: Better than wikidata's generic "Species of plant" style constructions
OK: Worse than wikidata, but better than nothing. May contain minorly misleading statements, or minor grammar errors
Bad: Worse than nothing - either very misleading, nonsensical, or containing a glaring grammar error.
Skip: The algorithm hit something it wasn't comfortable with and declined to return a short description. Such pages will either receive the wikidata short description or be reserved for a human.
9 of the 18 skips were for being too long. In my evaluation, most of the descriptions rejected for length didn't have any problems except that they were too long. I'm therefore going to bump up the filter from max 70 chars to max 90 chars. This is an area where non-technically inclined people can help me - sort by character count, and give me your opinion on where I should draw the line for "that's so long that it's worse than no short description at all".
I'm going to add a filter for "Contains exactly one space, and was terminated by a comma" These two word constructions are generally part of a list of traits that I can't parse easily - it's safer to just skip them. If this filter had been in place on this pass, two bad descriptions and one good one would have been skipped.
I'm going to add a filter for short descriptions ending in "ing" Such endings are more likely to be grammatically incorrect. In this run, this filter would have skipped one bad description.
I'm going to add in the two new filters and modify the existing one, and plow through evaluating another 100 short descriptions.
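The filters described above (no spaces, too long, two words ended by a comma, and an "ing" ending) can be written as one small predicate. This Python sketch is illustrative only: the function name and the `terminator` argument, meaning the stop code that ended the regex match, are hypothetical.

```python
MAX_LENGTH = 90  # raised from 70, as proposed above


def should_skip(description, terminator):
    """Return True when a candidate should be left for a human.

    `terminator` is the stop code that ended the regex match, e.g. "," or ".".
    """
    if " " not in description:
        return True  # single-word descriptions are filtered out
    if len(description) > MAX_LENGTH:
        return True  # too long to be a useful short description
    if description.count(" ") == 1 and terminator == ",":
        return True  # two words cut off by a comma: probably a list of traits
    if description.endswith("ing"):
        return True  # such endings are more likely to be ungrammatical
    return False


print(should_skip("species of sea snail", ","))  # False
print(should_skip("rod-shaped", ","))            # True
```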
At this point, I'd like to get a conversation going about the acceptable error rate for fully automatic posting. I think I'm close to that point, but we need a strong consensus that we're there before we go into a BRFA.
My opinion: On the articles that the bot doesn't skip, 80% must be better than wikidata (currently 73/82, 89.0%), and no more than 3 percent can have problems so severe that they're worse than nothing (currently 4/82, 4.9%) Either a 95% confidence interval, or a sample size of 250 is sufficient. (95% CI is pretty much the scientific standard, 250 sample size is the most I'd consign anyone to slog through checking my classifications.) Tazerdadog (talk) 11:23, 8 June 2018 (UTC)
- Commenting to try to poke this along - I need a consensus on how good the descriptions need to be for fully automatic posting. Tazerdadog (talk) 12:35, 12 June 2018 (UTC)
- It would be difficult but I'd prefer a severe problem rate of <1% - 3% would still mean something like 10000 bad descriptions if applied over all organisms Galobtter (pingó mió) 12:39, 12 June 2018 (UTC)
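The rates quoted in this thread can be checked with a few lines of Python. The sketch below uses the Wilson score interval as one common way of putting a 95% confidence interval on a proportion; that choice, and the function name, are assumptions of this example, since the discussion only says "95% confidence interval".

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


attempted = 100 - 18                 # descriptions the algorithm did not skip
better_rate = 73 / attempted         # good/great share, about 89.0%
bad_rate = 4 / attempted             # worse-than-nothing share, about 4.9%
low, high = wilson_interval(4, attempted)

# Bad descriptions expected at a 3% rate across the whole category.
expected_bad = round(0.03 * 387816)

print(f"better: {better_rate:.1%}, bad: {bad_rate:.1%}")
print(f"95% interval for the bad rate: {low:.1%} to {high:.1%}")
print(f"bad descriptions at 3%: {expected_bad}")
```

With only 82 attempted descriptions the interval on the bad rate is wide, which is an argument for the larger sample suggested above.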
Problem cases
- Colma, California - Manually added template not overriding infobox version. Does the infobox template have noreplace? · · · Peter (Southwood) (talk): 20:24, 21 June 2018 (UTC)
- It does now. --RexxS (talk) 00:32, 22 June 2018 (UTC)
Project banner and categories
So you guys may have already noticed, but I created a project banner {{WikiProject Short descriptions}} and a project category Category:WikiProject Short descriptions to make it easier to keep track of relevant pages under this project's scope. I also tweaked most of the categories populated by {{short description}} to standardize them and ensure they weren't orphaned. (BTW, if any of you can figure out a better image that encapsulates the concept of "short descriptions", feel free to tweak the banner...) Cheers. — AfroThundr (u · t · c) 04:08, 3 August 2018 (UTC)
Using an article's short description in a disambiguation page
Can the short description of an article be accessed and used in a disambiguation page somehow? --Gonnym (talk) 13:42, 19 September 2018 (UTC)
- Gonnym, you can do that using {{Template parameter value}}, assuming the short description is specified using {{short description}} Galobtter (pingó mió) 13:49, 19 September 2018 (UTC)
- Could you give me an example of how this should work, as I'm not getting this to work for me. Trying this as an example: {{Template parameter value|14 (number)|short description|1}}. --Gonnym (talk) 14:35, 19 September 2018 (UTC)
- I wasn't clear enough: it only works if the {{short description}} is in the text of the page not if it is automatically generated through an infobox as for 14 (number) Galobtter (pingó mió) 14:51, 19 September 2018 (UTC)
- Oh. So there is currently no way of getting the short description from the infobox? That's a pretty major setback then... --Gonnym (talk) 14:57, 19 September 2018 (UTC)
- There is no reason why a short description can't be added manually to any page where it's needed. As long as the infobox allows manual descriptions to take precedence (as {{infobox number}} now does), things should work as expected. I know it would be nice to have all of this done automatically for us, but the job is a very large one and having the infoboxes do some of the work is a real bonus most of the time. --RexxS (talk) 15:39, 19 September 2018 (UTC)
- But there is really no point at all for a manual description in that scenario if it has the exact same description. That just leads to a sort of WP:CONTENTFORK. From both a coding and an editing perspective that is very bad practice. Anyways, thanks for answering my questions, not your fault it didn't solve my issue. --Gonnym (talk) 15:47, 19 September 2018 (UTC)
- There really is a point in having a manual description if a manual description is needed for another function. There's no reason why the manual description shouldn't be an improvement on the infobox-generated one – see examples of {{infobox settlement}}, and note that I used
{{short description|Natural number, composite number}}
for 14 (number), which improves on the bare "Natural number" that the infobox would supply. Because the manual description replaces, not supplements the infobox-generated one, it really is no sort of CONTENTFORK at all: see the page info for 14 (number)] if you need to be convinced. It is never bad practice to use manual methods when an automatic method doesn't give the required result, and that applies to both the editing and coding perspectives. I'm sorry my suggested solution didn't please you, but it does solve your issue. --RexxS (talk) 17:10, 19 September 2018 (UTC)- I can see how a manual override is helpful if the automatic one is not sufficient, but in my specific scenario, the automatic one should for almost all cases be sufficient (which is Template:Infobox television episode). So for a template that is used on 9000 pages, the solution you gave me which requires me to manually insert the same exact description is just a bad one. From a coding perceptive, duplication of code is never a valid reason. This duplicates code. And from an editing perspective, instead of a WP:CONSISTENT style used for all episodes, chances are slim, that over 9000 manually entered description that will be the same, unless editors keep watch over them every day. So yeah, your solution technically works, but is not a real solution. --Gonnym (talk) 17:20, 19 September 2018 (UTC)
- Gonnym, it is actually possible to get the short description out of such an article. Will have a module up soon. Galobtter (pingó mió) 17:36, 19 September 2018 (UTC)
- Actually don't really even need a module. {{#invoke:string|match|s = {{:13 (number)}}|pattern = {{lessthan}}div class="shortdescription.->(.-)<}} gives Natural number Galobtter (pingó mió) 17:45, 19 September 2018 (UTC)
- There is no reason why a short description can't be added manually to any page where it's needed. As long as the infobox allows manual descriptions to take precedence (as {{infobox number}} now does), things should work as expected. I know it would be nice to have all of this done automatically for us, but the job is a very large one and having the infoboxes do some of the work is a real bonus most of the time. --RexxS (talk) 15:39, 19 September 2018 (UTC)
- Oh. So there is currently no way of getting the short description from the infobox? That's a pretty major setback then... --Gonnym (talk) 14:57, 19 September 2018 (UTC)
- I wasn't clear enough: it only works if the {{short description}} is in the text of the page not if it is automatically generated through an infobox as for 14 (number) Galobtter (pingó mió) 14:51, 19 September 2018 (UTC)
- Could you give me an example of how this should work, as I'm not getting this to work for me. Trying this as an example: {{Template parameter value|14 (number)|short description|1}}. --Gonnym (talk) 14:35, 19 September 2018 (UTC)
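The pattern match quoted in this thread can also be tried outside the wiki. The following is only a minimal Python sketch of the same idea, assuming the rendered page contains a `<div class="shortdescription ...">` wrapper as the quoted pattern implies; the function name and the sample HTML are hypothetical and not taken from any Wikipedia module.

```python
import re
from typing import Optional

# Same idea as the quoted pattern `<div class="shortdescription.->(.-)<`:
# skip to the end of the opening tag, then capture the text up to the next "<".
SHORTDESC_RE = re.compile(r'<div class="shortdescription[^>]*>(.*?)<', re.DOTALL)


def extract_short_description(html: str) -> Optional[str]:
    """Return the short description found in rendered HTML, or None if absent."""
    match = SHORTDESC_RE.search(html)
    if match is None:
        return None
    text = match.group(1).strip()
    return text or None


# Hypothetical sample of how an infobox-generated description might be rendered.
sample = (
    '<div class="shortdescription nomobile noexcerpt" '
    'style="display:none">Natural number</div>'
)
print(extract_short_description(sample))
```

Because this reads the rendered output rather than the wikitext, it would not matter whether the description came from a manual {{short description}} or from an infobox, which is the point made in the thread.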
Confused - Please Help
How do I find articles that need a short description? For example, I came across this article in the Featured section - Aggie Bonfire - and there is no short description template on the source page. Does that mean I can add one on it? Likewise for the page Beer Festival, thanks in advance - Vinvibes (talk)
- Yes, you can add one. At this point, a vast majority of articles do not have a short description. For example, in the last couple of weeks I added short descriptions to the articles on the US states, Canadian provinces, and English counties; most of them did not have any prior to my edits.--Ymblanter (talk) 08:51, 8 December 2018 (UTC)
- Hi, okay - I could do the same for Asian countries and Indian states. Is there a specific search tool that I could use for finding articles pertaining to a specific continent here on Wikipedia? Thanks for your inputs, regards, Vinvibes (talk) 06:49, 9 December 2018 (UTC)
- In fact I started on the list of Asian countries and did Afghanistan, Armenia, Bhutan, Brunei and Cambodia. Can you please check if I am on the right track? Thanks in advance, Vinvibes (talk) 09:47, 9 December 2018 (UTC)
- Your edits are definitely fine, though to my taste the descriptions you added are too detailed. The purpose of the short description is to show the mobile readers what the article is about. If they see Brunei for example they want to know that the article is about the country and not about an eponymous musical group or shopping center or whatever. For this, something like "A country in Asia" should suffice, and if they want to know whether it is landlocked or a monarchy they can go to the article and read it. But this is just my opinion, I have seen users around adding longish descriptions similar to what you have added.--Ymblanter (talk) 10:36, 9 December 2018 (UTC)
- @Vinvibes: To find articles that have something in common, check to see if a category exists – for example Category:Countries in Asia. You can also check the parent, in that case Category:Countries by continent and work your way down to level you want to look at. Shorter descriptions are often better than longer ones, but don't worry too much, because any relevant description is better than none. By the way, if you haven't tried User:Galobtter/Shortdesc helper, it's really worth a look as it makes it easy to modify a description imported from Wikidata or add a new one, as well as showing the short description. --RexxS (talk) 14:13, 9 December 2018 (UTC)
- @Ymblanter:Right-oh, will bear your point in mind for future descriptions. Many thanks, Vinvibes (talk) 15:00, 9 December 2018 (UTC)
- @RexxS:Thanks RexxS for your inputs, definitely helpful for a new user like me to find my way around, Vinvibes (talk) 15:00, 9 December 2018 (UTC)
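For anyone who prefers to list the pages in a category programmatically rather than browsing it, here is a minimal Python sketch that only builds a MediaWiki Action API request URL for the `categorymembers` list. The function name and the default limit are assumptions chosen for illustration; the sketch makes no network call and does not itself check which pages lack a short description.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def category_members_url(category: str, limit: int = 50) -> str:
    """Build an Action API URL that lists the pages in one category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,   # e.g. "Category:Countries in Asia"
        "cmtype": "page",      # pages only, no subcategories or files
        "cmlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(category_members_url("Category:Countries in Asia"))
```

Opening the printed URL in a browser returns the member pages as JSON; each title would still need to be checked by hand, or with the Shortdesc helper mentioned above, to see whether it already has a description.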
Annotated links on disambiguation pages
Editors are beginning to use {{Annotated link}} on disambiguation pages, which is great, but the formatting does not match MOS:DAB guidelines. For example, in Ben Nevis (disambiguation), three of the See also items have been added using {{Annotated link}}, and have the following format:
- Nevis Radio – Community radio station in Fort William, Scotland
According to MOS:DAB, the format should be:
- Nevis Radio, community radio station in Fort William, Scotland
Specifically, a comma instead of a hyphen after the article title, and the following word in lower case (if not a proper name). Is it possible to adjust the template or use parameters to effect these changes? It seems like a small detail, but disambiguation pages are designed to be quickly scanned and that is easier if there is a consistent format. Thanks, Leschnei (talk) 17:37, 27 December 2018 (UTC)
- The comma/hyphen and lower/uppercase changes are easy. The tough one is the article (a/an/the) that's supposed to go after the comma. It will be very hard to impossible to have the template determine which word to use on its own, and get it right every time. It will likely require another parameter, like {{annotated link|Nevis Radio|style=dab|article=a}}. Either that, or a change to MOS:DAB. —swpbT go beyond 18:05, 27 December 2018 (UTC)
- So this:
{{Annotated link/sandbox|Nevis Radio|style=y}}
-> Nevis Radio – Community radio station in Fort William, Scotland
{{Annotated link/sandbox|Nevis Radio|style=y|word=the}}
-> Nevis Radio – Community radio station in Fort William, Scotland
{{Annotated link/sandbox|Nevis Radio|style=y|word=yes}}
-> Nevis Radio – Community radio station in Fort William, Scotland
Parameter names can be changed, just needed something fast. --Gonnym (talk) 18:34, 27 December 2018 (UTC)
- Thanks Swpb and Gonnym, Leschnei (talk) 18:48, 27 December 2018 (UTC)
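The formatting change requested in this thread (comma separator, lower-case first word, optional article) can be sketched in a few lines of Python. This is only an illustration of the logic, not the template's actual code; the function and parameter names (`format_dab_entry`, `article`, `proper_noun`) are hypothetical.

```python
from typing import Optional


def format_dab_entry(
    title: str,
    description: str,
    article: Optional[str] = None,
    proper_noun: bool = False,
) -> str:
    """Format 'Title, description' in the style asked for on disambiguation pages."""
    desc = description.strip()
    if not desc:
        return title
    # Lower-case the first letter unless the description starts with a proper noun.
    if not proper_noun:
        desc = desc[0].lower() + desc[1:]
    # The article (a/an/the) cannot be guessed reliably, so it is passed in by hand.
    if article:
        desc = f"{article} {desc}"
    return f"{title}, {desc}"


print(format_dab_entry("Nevis Radio", "Community radio station in Fort William, Scotland"))
print(format_dab_entry("Nevis Radio", "Community radio station in Fort William, Scotland", article="a"))
```

As swpb notes, the separator and case changes are mechanical, while the article has to come from an editor, which is why it is an explicit argument here.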
Hello, I've nominated this category to be renamed to Category:Articles with short descriptions (plural). Would you please offer input at Wikipedia:Categories for discussion/Log/2018 December 31? Thank you. Nyttend (talk) 02:22, 31 December 2018 (UTC)
Progress with adding short descriptions
We now have about 672,000 articles with a short description input or generated within Wikipedia, against a target of 2 million that WMF has set us before the link to Wikidata descriptions is cut.
Almost 60,000 of these articles have had their SD added manually by editors (including imports from Wikidata using the Shortdesc helper). This activity has picked up in pace over the past six months, and is now adding SDs at the rate of about 500 per day. However, relying on this alone, at the present pace it would take about another seven years to hit the 2 million target.
Meanwhile, over 90% of the SDs so far added have been automatically generated from infoboxes, chiefly {{Infobox settlement}}. But this side of the project appears to have stalled, with no updates since May to the lists above of infoboxes currently used as sources or being worked upon.
No-one appears to have yet tackled the 'biggie', which is deriving SDs from {{Infobox person}} and all its derivatives. Petscan says there are about 891,000 BLPs. Just {{Infobox person}} is used on 321,000 articles, then {{Infobox football biography}} on 161,000, {{Infobox officeholder}} on 136,000, and so on. In view of these large numbers, and recalling that much of the original concern about relying on Wikidata descriptions was focussed on BLP issues, it seems to me that deriving SDs from these infoboxes is an essential step forward. The core of these SDs would presumably be Nationality occupation, or for more specific infoboxes such as {{Infobox architect}} it would just be Nationality architect.
This done, there would still remain plenty of scope for human input in checking, correcting and improving the auto-generated SDs for articles about people: Bhunacat10 (talk), 12:31, 6 January 2019 (UTC)
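The "about another seven years" estimate follows directly from the figures quoted in this post. A small Python sketch of the arithmetic, assuming the manual rate of 500 per day stays constant and that no further infobox-generated descriptions are added (variable names are illustrative only):

```python
TARGET = 2_000_000        # target set by WMF
CURRENT_TOTAL = 672_000   # articles that already have a short description
MANUAL_PER_DAY = 500      # approximate manual additions per day

remaining = TARGET - CURRENT_TOTAL          # descriptions still needed
days_needed = remaining / MANUAL_PER_DAY    # days at the current manual rate
years_needed = days_needed / 365.25

# Combined usage of the three biography infoboxes named in the post.
biography_infobox_uses = 321_000 + 161_000 + 136_000

print(remaining, days_needed, round(years_needed, 1), biography_infobox_uses)
```

On these numbers the three named infoboxes alone would cover a little under half of the remaining gap, assuming each use could yield a description.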
- {{Infobox television episode}} has also implemented an automatic short description. Regarding infobox person or other templates, the issue isn't technical, as I could create those pretty fast, but there needs to be a discussion to get an accepted (and good) style that can work. Also note, that if these templates have many options which complicate them, that isn't an issue, and the code will just be handled in an invoked module from the infobox template. --Gonnym (talk) 13:32, 6 January 2019 (UTC)
- The automatically generated short descriptions do not have to be perfect, they just have to be good enough, as manually added SDs will override them with no difficulty if anyone wants to improve them or if they are already in the article. We should consider them as the first approximation rather than the final product. For BLPs the critical thing is they must not be wrong. Nationality occupation is a pretty bland description, and will seldom be optimum, but it will be good enough for a first approximation, as it is unlikely that anyone will have a strong objection to them because they will comply with BLP policy if the infobox does.
- The other upside of this is that if for any obscure unforeseen reason it does produce large numbers of problematic results, it is easily reverted.
- Yet another upside is that if the first version is OK, and someone comes up with an improvement, it is easy to implement, and thousands of SDs are improved at once.
- If there is a downside other than the sub-optimal part, I am missing it. 890,000 more pushes us a lot closer to the target.· · · Peter Southwood (talk): 13:59, 6 January 2019 (UTC)
- Gonnym, once you have the module to extract SDs, would it be a big issue to refine them if a better style is proposed? · · · Peter Southwood (talk): 14:13, 6 January 2019 (UTC)
- Well, it isn't a "big issue". However, I personally dislike writing code for the sake of having code. I'd rather flesh out something which is pretty good already as a first draft. So taking Infobox officeholder as an example - do we pick the highest/most prestigious occupation (President vs V.P. / King vs Duke)? Do we add the Nth number of that position (23rd president)? Do we add the country? For Infobox film (for which I started a discussion but got no participation), I've noticed 4 different manual SDs being used - any of those can be made the default, but again, the question is which one. Also worth noting that after a style has been decided upon, it is then the turn of the edge cases and missing-parameter fixes. So say we decide that we use for Infobox officeholder the highest position and number. So we have "23rd president of [...]", but no one entered a value for the number, so we need to change the style to "President of [...]" - these things aren't hard to do, but they just take time to understand, so we don't end up with "rd president of" because we don't have a number or country information. --Gonnym (talk) 14:25, 6 January 2019 (UTC)
- Coincidentally, I've just been looking at the occupation/nationality intersection to auto-create categories via c:Template:Wikidata Infobox on Commons, drawing the information from Wikidata. The biggest problem I've come across is the sheer number of items that can be returned, so for Franz Kafka, it gives 33 results (the displayed text is the sort key as we're not actually adding these categories):
- The infobox on the Kafka article only gives two nationalities and three occupations, so we're saved from the worst excesses of data-mad Wikidatans (by not using Wikidata), but even then you have to make a decision on which nationality/occupation from the infobox is to be preferred to use as the short description. My personal recommendation is "sod it, just use the first one of each" and let editors write a manual version if they can improve on that. YMMV --RexxS (talk) 16:22, 6 January 2019 (UTC)
- Ok, so let's work with Kafka as a test case.
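The missing-parameter problem raised in this thread for {{Infobox officeholder}} (ending up with "rd president of" when no number was entered) can be illustrated with a short Python sketch. It is not the infobox's code; the function names and the three inputs (`office`, `number`, `country`) are hypothetical stand-ins for whatever infobox fields would actually be used.

```python
from typing import Optional


def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 23 -> '23rd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def officeholder_description(
    office: Optional[str],
    number: Optional[int] = None,
    country: Optional[str] = None,
) -> Optional[str]:
    """Build a description, dropping any part whose field is missing."""
    office = (office or "").strip()
    if not office:
        return None  # nothing sensible to say; leave it to a manual description
    if number is not None:
        text = f"{ordinal(number)} {office[0].lower()}{office[1:]}"
    else:
        text = office[0].upper() + office[1:]
    if country:
        text = f"{text} of {country}"
    return text


print(officeholder_description("President", 23, "the United States"))
print(officeholder_description("President", None, "the United States"))
print(officeholder_description("President"))
```

Each optional piece is only added when its field is present, so a missing number or country shortens the description instead of leaving a dangling suffix or a trailing "of".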
Kafka test case
- Option 1 - Nationality and occupation, first of each - Austro-Hungarian novelist.
- Option 2 - Nationality and occupation, all - Austro-Hungarian and Czechoslovakian novelist, short story writer and insurance officer.
- Option 3 - decade and occupation (first) - 19th century novelist.
- Option 4 - decade and occupation (all) - 19th century novelist, short story writer and insurance officer.
- Option 5 - Nationality (first) decade and occupation - 19th century Austro-Hungarian novelist.
- Option 6 - Nationality (all) decade and occupation (first) - 19th century Austro-Hungarian and Czechoslovakian novelist.
- Option 7 - Nationality (all) decade and occupation (all) - 19th century Austro-Hungarian and Czechoslovakian novelist, short story writer and insurance officer.
Could be others, but those seem to me the major ones. Agree that the first nationality and occupation is indeed better in this case. Question is, is this always the case in Infobox Person? --Gonnym (talk) 16:37, 6 January 2019 (UTC)
- Just some thoughts (to provoke rather than to advise). When biographical titles are disambiguated this will usually be by occupation, see WP:NCPDAB. Disambiguating items might be omitted from the SD if you regard these as supplementary (as I do). Last (chronological) nationality may be more significant than first (and may most often be listed last). Nationality can be (very) controversial. I think we overdo nationality on WP but dates of birth/death/activity are very helpful and less controversial. All Kafka's work was 20th century though he was born 19th. I see from his categories: Category:19th-century Austrian people, Category:20th-century Austrian writers, Category:20th-century Austrian novelists. Are categories parseable? Thincat (talk) 17:22, 6 January 2019 (UTC)
- PS, I see Category:20th-century novelists declares itself to contain "Novelists active in the 20th century". Wikipedia:Categorization of people may be helpful. Thincat (talk) 17:47, 6 January 2019 (UTC)
- Try and limit the options to fields that are from the infobox, as then I can just use whatever we add there for input. If it comes from categories, I'm not sure how we can (if we can) access that from the infobox template. Also note regarding disambiguation, while you are right that we can omit the occupation if it is listed, the next in-line wouldn't necessarily be any good of an option. So say Kafka (novelist), the next occupation is short story writer, does that really help? (or insurance officer)? Also agree that nationality can be problematic, though if the field has only one value that is less problematic (remember that we can style the SD with if/elses to fit our needs). --Gonnym (talk) 18:16, 6 January 2019 (UTC)
- [ec] I am inclined to agree with RexxS' pithy comment above. This is a fine example of perfect is the enemy of good enough. First of each is good enough. Anyone who chooses to handcraft a work of art can do so in their own time. Good enough but way short of perfect may also encourage people to try to do better themselves, and that may lead to more people doing manual short descriptions in articles where automation has nothing easy to work from. Sweating over the details to try to get slightly better at the cost of a lot of work does not seem a good plan. If getting it better is important to some, let's do it after getting the quantity up first.
- One way to find out if first nationality and occupation are always best is to do it and see what happens. If there are no complaints it is probably near enough. If there are a moderate number, we ask people to fix them. If there is a huge number, we revert. I don't expect to be reverting. Cheers, · · · Peter Southwood (talk): 18:29, 6 January 2019 (UTC)
- Agree and on the whole brief and bland should be the watchwords here. It's a search aid, not a complete and academically watertight definition of a topic. But as Gonnym says "nationality can be problematic", and the nationality as given in the article may have resulted from a vigorous debate as with Nikola Tesla. So preferred options may be:
- Nationality (all); occupation (first)
- Nationality if one word only, else omit; occupation (first)
- : Bhunacat10 (talk), 09:57, 7 January 2019 (UTC)
- I don't think either of these is significantly better or worse than Nationality (first) occupation (first), as a functional description and suggest we leave it to the coder to decide between these three options. Nationality (first) occupation (first) does have the advantage of probably being the easiest to code and the quickest to implement. Whichever of these is chosen, it is a placeholder for a better hand-crafted short description which may be provided tomorrow, in a year's time, or possibly never. · · · Peter Southwood (talk): 17:00, 7 January 2019 (UTC)
- If there was vigorous debate, the handcrafted version is likely to appear sooner rather than later, which is just fine. · · · Peter Southwood (talk): 17:02, 7 January 2019 (UTC)
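The three candidate rules discussed in this thread can be compared side by side with a small Python sketch. It assumes the infobox values have already been split into lists, and it reads "nationality if one word only" as "exactly one nationality value containing no spaces", which is an interpretation rather than something settled above. All function names are hypothetical.

```python
from typing import List, Optional


def join_list(items: List[str]) -> str:
    """Join items as 'a', 'a and b' or 'a, b and c'."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def first_of_each(nationalities: List[str], occupations: List[str]) -> Optional[str]:
    """Nationality (first) occupation (first)."""
    if not occupations:
        return None
    return " ".join(nationalities[:1] + occupations[:1])


def all_nationalities(nationalities: List[str], occupations: List[str]) -> Optional[str]:
    """Nationality (all); occupation (first)."""
    if not occupations:
        return None
    prefix = join_list(nationalities)
    return f"{prefix} {occupations[0]}" if prefix else occupations[0]


def one_word_nationality(nationalities: List[str], occupations: List[str]) -> Optional[str]:
    """Nationality only if it is a single one-word value, else omitted."""
    if not occupations:
        return None
    if len(nationalities) == 1 and " " not in nationalities[0]:
        return f"{nationalities[0]} {occupations[0]}"
    return occupations[0]


# Values as quoted in the Kafka test case above.
kafka_nationalities = ["Austro-Hungarian", "Czechoslovakian"]
kafka_occupations = ["novelist", "short story writer", "insurance officer"]

for rule in (first_of_each, all_nationalities, one_word_nationality):
    print(rule.__name__, "->", rule(kafka_nationalities, kafka_occupations))
```

Whichever rule is chosen, returning nothing when no occupation is available leaves the article for a hand-written description instead of emitting a bare nationality.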
Current BRFA relating to short descriptions
Please see Wikipedia:Bots/Requests for approval/DannyS712 bot. --DannyS712 (talk) 06:29, 7 January 2019 (UTC)
Shortcut to {{short description}}?
Please see the RfD discussion: Bhunacat10 (talk), 11:08, 15 January 2019 (UTC)
Infobox album auto short description
Please see Template talk:Infobox album#Adding short description. Galobtter (pingó mió) 08:05, 20 January 2019 (UTC)
Shortdesc helper into a gadget
I've proposed making shortdesc helper into a gadget here. Galobtter (pingó mió) 08:12, 20 January 2019 (UTC)
- See the finished discussion at Wikipedia:Village pump (technical)/Archive 172#Proposed gadget: Shortdesc helper. Thanks, Willbb234Talk (please {{ping}} me in replies) 13:35, 22 September 2019 (UTC)
"Sub-tasks" section, incl. personal
I noticed a section naming a few sub-tasks including one about scuba diving, which appears to be a personal task (that is already completed). Can any project member add to the list of personal tasks? I am currently focusing on adding short descs. for archaeology projects and would like to include that if it's allowed. YuriNikolai (talk) 23:46, 8 February 2019 (UTC)
- @YuriNikolai: Go ahead and do it. Welcome to the project! --DannyS712 (talk) 23:51, 8 February 2019 (UTC)
- Thanks. It's done! YuriNikolai (talk) 02:13, 9 February 2019 (UTC)