Wikipedia:WikiProject Women in Red/Wikidata redlist guide
This is a WikiProject advice page. It contains the advice or opinions of one or more WikiProjects on Wikipedia or its process, as pertaining to topics within the WikiProject(s) area of interest. This page is not one of Wikipedia's policies or guidelines, as it has not been thoroughly vetted by the community.
If you need help creating or fixing a Wikidata-based redlist, ask at Wikipedia talk:WikiProject Women in Red or wikidata:Wikidata:Request a query.
This Wikidata redlist guide provides step-by-step guidance for creating Women in Red redlists. Although this guide focuses on Women in Red, it may also be useful for creating Wikidata-based lists for other purposes.
Preliminaries
In order to create a Wikidata-based redlist, you will need:
- Basic understanding of template usage, see Help:Transclusion.
- Basic understanding of what Wikidata is.
- A grasp of SPARQL queries, see wikidata:Wikidata:SPARQL tutorial. You can learn even more at wikidata:Wikidata:SPARQL query service/Wikidata Query Help.
You will use the following tools:
- Wikidata Query Service (query.wikidata.org).
- {{Wikidata list}} and {{Wikidata list end}} templates.
Basics
Simple example
Let's start with a trivial Wikidata list. It will have a single entry, for Ada Lovelace, and we'll use the following query:
SELECT ?item WHERE {
?item wdt:P31 wd:Q5 .
?item wdt:P21 wd:Q6581072 .
?item wdt:P735 wd:Q346047 .
?item wdt:P734 wd:Q1260681 .
}
The above query returns every Wikidata item that fulfills these conditions:
- Is a human: instance of (P31) human (Q5).
- Is a female: sex or gender (P21) female (Q6581072).
- Has given name Ada: given name (P735) Ada (Q346047).
- Has family name Byron: family name (P734) Byron (Q1260681).
Make sure you use female (Q6581072), and not female organism (Q43445).
Now that we have a SPARQL query that returns the entries we want, we can create the redlist using {{Wikidata list}} (and remembering to include a {{Wikidata list end}} template):
wikitext
{{Wikidata list
|sparql=SELECT ?item WHERE {
?item wdt:P31 wd:Q5 .
?item wdt:P21 wd:Q6581072 .
?item wdt:P735 wd:Q346047 .
?item wdt:P734 wd:Q1260681 .
}
|columns=label:name,P18,description,P106,P569,P570,P19,P20,item:wikidata item
|links=red
|thumb=40
}}
{{Wikidata list end}}
ListeriaBot will take care of updating it automatically, producing the following output:
result
This list is automatically generated from data in Wikidata and is periodically updated by Listeriabot.
End of auto-generated list.
Notice that the query returns only ?item. Columns in the table it generates are specified in the |columns= parameter of the {{Wikidata list}} template. See Template:Wikidata list for more information on Wikidata list parameters.
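As a rough illustration of how the query and the |columns= parameter fit together, here is a minimal Python sketch that assembles the wikitext for the simple example. The helper name build_wikidata_list and its arguments are hypothetical conveniences for this guide, not part of ListeriaBot or any Wikimedia tool; the parameters it writes (sparql, columns, links, thumb) are the ones used in the example above.

```python
def build_wikidata_list(sparql, columns, links="red", thumb=40):
    """Return wikitext for a {{Wikidata list}} block plus its end marker.

    Hypothetical helper: it only concatenates strings and does not
    validate the query or the column names.
    """
    lines = [
        "{{Wikidata list",
        "|sparql=" + sparql.strip(),
        "|columns=" + ",".join(columns),
        "|links=" + links,
        "|thumb=" + str(thumb),
        "}}",
        "{{Wikidata list end}}",
    ]
    return "\n".join(lines)


# The query from the simple example; it selects only ?item.
QUERY = """SELECT ?item WHERE {
?item wdt:P31 wd:Q5 .
?item wdt:P21 wd:Q6581072 .
?item wdt:P735 wd:Q346047 .
?item wdt:P734 wd:Q1260681 .
}"""

# The table columns are declared separately from the query.
COLUMNS = ["label:name", "P18", "description", "P106", "P569", "P570",
           "P19", "P20", "item:wikidata item"]

wikitext = build_wikidata_list(QUERY, COLUMNS)
print(wikitext)
```

The point of the sketch is the separation of concerns: the query decides which items appear, while the column list decides what is shown for each item.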
Missing articles
In order to list only items without a corresponding article in the English Wikipedia, every redlist needs the following SPARQL fragment:
OPTIONAL { ?w schema:about ?item; schema:isPartOf <https://en.wikipedia.org/>. }
FILTER(!(BOUND(?w)))
You will also see the following equivalent form:
FILTER NOT EXISTS { ?w schema:about ?item; schema:isPartOf <https://en.wikipedia.org/> . }
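To see why the two forms select the same items, the following Python sketch models both on a small in-memory table. Everything here is a toy: the item IDs and sitelinks are invented for illustration, and the functions only imitate the logic of the SPARQL fragments rather than running them.

```python
EN_WIKI = "https://en.wikipedia.org/"

# Hypothetical toy data: item ID -> sites that have a page about the item.
SITELINKS = {
    "Q1001": {"https://en.wikipedia.org/", "https://fr.wikipedia.org/"},
    "Q1002": {"https://de.wikipedia.org/"},
    "Q1003": set(),
}


def missing_via_optional(sitelinks):
    """Imitate OPTIONAL { ... } followed by FILTER(!BOUND(?w))."""
    rows = []
    for item, sites in sitelinks.items():
        # OPTIONAL: bind w if an English Wikipedia page exists, else leave it unbound.
        w = f"enwiki article about {item}" if EN_WIKI in sites else None
        rows.append((item, w))
    # FILTER(!BOUND(?w)): keep only rows where w stayed unbound.
    return [item for item, w in rows if w is None]


def missing_via_not_exists(sitelinks):
    """Imitate FILTER NOT EXISTS { ... }."""
    return [item for item, sites in sitelinks.items() if EN_WIKI not in sites]


print(missing_via_optional(SITELINKS))
print(missing_via_not_exists(SITELINKS))
```

Both functions keep the items that have no English Wikipedia page, including one that has pages on other projects and one that has no sitelinks at all.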
Number of sites
When looking for notable subjects, it is often useful to look at how many Wikimedia projects have a page for a given item. This number can be retrieved with the following SPARQL fragment:
?item wikibase:sitelinks ?linkcount .
Here's a version of the simple example modified to add a column with the link count:
wikitext
{{Wikidata list
|sparql=SELECT ?item ?linkcount WHERE {
?item wdt:P31 wd:Q5 .
?item wdt:P21 wd:Q6581072 .
?item wdt:P735 wd:Q346047 .
?item wdt:P734 wd:Q1260681 .
?item wikibase:sitelinks ?linkcount . # number of site links
}
|columns=label:name,P18,description,P106,P569,P570,P19,P20,item:wikidata item,?linkcount:site links
|links=red
|thumb=40
}}
{{Wikidata list end}}
result
This list is automatically generated from data in Wikidata and is periodically updated by Listeriabot.
End of auto-generated list.
Handling large results
The number of results for a SPARQL query can often be in the thousands or tens of thousands. That is way beyond what we can handle in a wiki redlist, so we need to cut it down. The number of results of a query can be limited by adding a LIMIT clause to the end. For example, LIMIT 1000 limits the results to 1000.
However, if we use LIMIT alone, the results that make it into the list will be arbitrary, and they might not be the most relevant. So it is a good idea to always apply order criteria. A limit with our recommended order follows:
ORDER BY DESC(?linkcount) ASC(?item)
LIMIT 1000
This limits the results to the top 1000 by number of sites. If two items have the same number of sites, the one with the lowest item number takes precedence. This makes the result deterministic, meaning that in the absence of actual data changes, the query will always return the same set of 1000 results. If we didn't do this, the bot would repeatedly remove and add back items in subsequent updates.
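The effect of the tie-break can be sketched in Python with a handful of invented rows. The item IDs and link counts are hypothetical, and the sketch compares IDs as plain strings (all of the same length here, so string and numeric order agree); it makes no claim about how the query service itself compares item IRIs.

```python
import random

# Hypothetical rows: (item ID, number of sitelinks). Three items tie at 7.
ROWS = [("Q2001", 12), ("Q2002", 7), ("Q2003", 7), ("Q2004", 7), ("Q2005", 3)]


def top_n(rows, n):
    """Imitate ORDER BY DESC(?linkcount) ASC(?item) LIMIT n."""
    # Sort by link count descending, then by item ID ascending to break ties.
    ordered = sorted(rows, key=lambda row: (-row[1], row[0]))
    return [item for item, _ in ordered[:n]]


# Feed the same rows in a different order, as a query engine might.
shuffled = ROWS[:]
random.Random(0).shuffle(shuffled)

# The selected set does not depend on the input order.
assert top_n(ROWS, 3) == top_n(shuffled, 3)
print(top_n(ROWS, 3))  # ['Q2001', 'Q2002', 'Q2003']
```

Without the second sort key, which two of the three tied items make the cut would depend on the order in which the rows happen to arrive.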
Occupation
One of the most common criteria for redlists is occupation (P106). Check out current redlists by occupation. We specify one or more occupations as follows:
?item wdt:P106 ?occ .
VALUES ?occ {
wd:Q5468707 # forensic entomologist
wd:Q27645949 # paleoentomologist
wd:Q3055126 # entomologist
}
This will include items where occupation (P106) is either forensic entomologist (Q5468707), paleoentomologist (Q27645949), or entomologist (Q3055126). The comments in the query (e.g. # entomologist) are optional, but they can make the query more readable to humans.
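When a list covers many occupations, the VALUES block can be generated rather than typed by hand. The Python sketch below does this for the three occupations above; the function name occupation_fragment is hypothetical and the helper does nothing beyond string formatting.

```python
def occupation_fragment(occupations):
    """Build the occupation triple and VALUES block from a {QID: label} dict.

    Hypothetical helper: labels are emitted as SPARQL comments only and are
    not checked against Wikidata.
    """
    lines = ["?item wdt:P106 ?occ .", "VALUES ?occ {"]
    for qid, label in occupations.items():
        # A SPARQL comment runs to the end of the line, so each value
        # must stay on its own line.
        lines.append(f"wd:{qid} # {label}")
    lines.append("}")
    return "\n".join(lines)


fragment = occupation_fragment({
    "Q5468707": "forensic entomologist",
    "Q27645949": "paleoentomologist",
    "Q3055126": "entomologist",
})
print(fragment)
```

Because a # comment extends to the end of its line, the line breaks in the generated fragment matter: joining the lines into one would comment out everything after the first label.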
Here's a full example of a redlist of 5 entomologist women (see also the actual Entomologists redlist):
wikitext
{{Wikidata list
|sparql=SELECT DISTINCT ?item ?linkcount WHERE {
?item wdt:P106 ?occ .
VALUES ?occ {
wd:Q5468707 # forensic entomologist
wd:Q27645949 # paleoentomologist
wd:Q3055126 # entomologist
}
?item wdt:P21 wd:Q6581072 .
?item wdt:P31 wd:Q5 .
?item wikibase:sitelinks ?linkcount .
OPTIONAL { ?w schema:about ?item; schema:isPartOf <https://en.wikipedia.org/>. }
FILTER(!(BOUND(?w)))
}
ORDER BY DESC(?linkcount) ASC(?item)
LIMIT 5
|columns=label:name,P18,description,P106,P569,P570,P19,P20,item:wikidata item,?linkcount:site links
|links=red
|thumb=40
}}
{{Wikidata list end}}
result
This list is automatically generated from data in Wikidata and is periodically updated by Listeriabot.
End of auto-generated list.
Country
See our country redlists. A simple approach would be to use the country of citizenship (P27) property. However, Wikidata may be missing the country of citizenship while still having other geographical properties that are good enough for our purposes. So we can use a combination of country of citizenship (P27), country (P17), country of origin (P495), country for sport (P1532), and place of birth (P19). We can do it with the following SPARQL fragment:
VALUES ?country {
wd:Q189 # Iceland
}
{
{ ?item (wdt:P27|wdt:P17|wdt:P495|wdt:P1532) ?country. }
UNION
{ ?item (wdt:P19/wdt:P17) ?country. }
}
This will generate duplicate results in many cases. Use SELECT DISTINCT instead of SELECT to avoid them.
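The duplicates arise because one item can match more than one branch of the pattern. The following Python sketch imitates this with two invented lookup tables, one for country of citizenship (P27) and one for the country of the place of birth (P19 then P17); the item IDs and statements are hypothetical.

```python
# Hypothetical toy statements for two items.
CITIZENSHIP = {"Q3001": ["Q189"], "Q3002": []}                # P27
BIRTHPLACE_COUNTRY = {"Q3001": ["Q189"], "Q3002": ["Q189"]}   # P19 then P17


def match_rows(country):
    """Imitate the UNION: one row per matching statement, per branch."""
    rows = []
    for table in (CITIZENSHIP, BIRTHPLACE_COUNTRY):
        for item, countries in table.items():
            rows += [item for c in countries if c == country]
    return rows


rows = match_rows("Q189")
print(rows)            # Q3001 appears twice: once per matching branch

# SELECT DISTINCT collapses identical rows; dict.fromkeys keeps first-seen order.
distinct = list(dict.fromkeys(rows))
print(distinct)
```

Here the first item matches through both citizenship and birthplace, so it is returned twice until the rows are deduplicated.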
Here's a full example of a redlist of 5 women from Honduras (see also the actual Honduras redlist):
wikitext
{{Wikidata list
|sparql=SELECT DISTINCT ?item ?linkcount WHERE {
VALUES ?country { wd:Q783 }
{
{ ?item (wdt:P27|wdt:P17|wdt:P495|wdt:P1532) ?country. }
UNION
{ ?item (wdt:P19/wdt:P17) ?country. }
}
?item wdt:P21 wd:Q6581072 .
?item wdt:P31 wd:Q5 .
?item wikibase:sitelinks ?linkcount .
OPTIONAL { ?w schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> . }
FILTER(!BOUND(?w))
}
ORDER BY DESC(?linkcount) ASC(?item)
LIMIT 5
|columns=label:name,P18,description,P106,P569,P570,P19,P20,item:wikidata item,?linkcount:site links
|links=red
|thumb=40
}}
{{Wikidata list end}}
result
This list is automatically generated from data in Wikidata and is periodically updated by Listeriabot.
End of auto-generated list.
Troubleshooting
Killed by OS for overloading memory
A list may fail to update because the bot ran out of memory. This is signaled by the error Killed by OS for overloading memory on a manual update. This is a known problem with ListeriaBot, usually caused by many links to large entities. A workaround is to reduce the number of links to geographical entities, for example by removing the place of death (P20) column.
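As a small illustration of the workaround, this Python sketch removes chosen columns from a |columns= value. The function name drop_columns is hypothetical, and which columns are worth dropping for a given list is a judgment call; P20 is used here only because it is the example named above.

```python
def drop_columns(columns, unwanted):
    """Return a |columns= value without the unwanted column keys.

    Hypothetical helper: a column is written as key or key:heading,
    and only the part before the first colon is compared.
    """
    kept = [c for c in columns.split(",") if c.split(":")[0] not in unwanted]
    return ",".join(kept)


COLUMNS = ("label:name,P18,description,P106,P569,P570,P19,P20,"
           "item:wikidata item,?linkcount:site links")

# Drop the place of death (P20) column, as suggested in the workaround.
print(drop_columns(COLUMNS, {"P20"}))
```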