User talk:Magnus Manske/Archive 7

This is an archive of past discussions about User:Magnus Manske. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Facto Post – Issue 8 – 15 January 2018

Metadata on the March

From the days of hard-copy liner notes on music albums, metadata have stood outside a piece or file, while adding to understanding of where it comes from, and some of what needs to be appreciated about its content. In the GLAM sector, the accumulation of accurate metadata for objects is key to the mission of an institution, and its presentation in cataloguing.

Today Wikipedia turns 17, with worlds still to conquer. Zooming out from the individual GLAM object to the ontology in which it is set, one such world becomes apparent: GLAMs use custom ontologies, and those introduce massive incompatibilities. From a recent article by sadads, we quote the observation that "vocabularies needed for many collections, topics and intellectual spaces defy the expectations of the larger professional communities." A job for the encyclopedist, certainly. But the data-minded Wikimedian has the advantages of Wikidata, starting with its multilingual data, and facility with aliases. The controlled vocabulary, sometimes referred to as a "thesaurus" as a term of art, simplifies search: if a "spade" must be called that, rather than "shovel", it is easier to find all spade references. That control comes at a cost.

Zebra crossing/crosswalk, Singapore

Case studies in that article show what can lie ahead. The schema crosswalk, in jargon, is a potential answer to the GLAM Babel of proliferating and expanding vocabularies. Even if you have no interest in Wikidata as such, only in vocabularies V and W, matching both of them to Wikidata produces a "crosswalk": term v in V maps to term w in W whenever v and w both match the same item d in Wikidata.
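
The crosswalk construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two vocabularies, their terms, the dictionary layout and the function name `build_crosswalk` are hypothetical, and the item IDs are placeholders rather than real Wikidata items.

```python
from collections import defaultdict


def build_crosswalk(v_to_wikidata, w_to_wikidata):
    """Map each term of V to the terms of W matched to the same Wikidata item."""
    # Invert W's matches: Wikidata item -> every W term matched to it.
    item_to_w_terms = defaultdict(list)
    for w_term, item in w_to_wikidata.items():
        item_to_w_terms[item].append(w_term)

    crosswalk = {}
    for v_term, item in v_to_wikidata.items():
        # A crosswalk entry exists only when both vocabularies hit the same item.
        if item in item_to_w_terms:
            crosswalk[v_term] = sorted(item_to_w_terms[item])
    return crosswalk


# Hypothetical vocabularies; "Q-A" etc. are placeholders, not real item IDs.
vocab_v = {"spade": "Q-A", "rake": "Q-B", "hoe": "Q-C"}
vocab_w = {"shovel": "Q-A", "garden rake": "Q-B", "trowel": "Q-D"}

crosswalk = build_crosswalk(vocab_v, vocab_w)
```

Terms that are matched in only one vocabulary ("hoe" and "trowel" here) simply drop out, which is why the coverage of the matching to Wikidata decides how complete the crosswalk is.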

For metadata mobility, match to Wikidata. It's apparently that simple: infrastructure requirements have turned out, so far, to be challenges that can be met.

Links

1lib1ref campaign starts today, see m:The Wikipedia Library/1Lib1Ref: also #1lib1ref introductory video by Felix Nartey
Funders should mandate open citations, article 9 January 2018 in Nature by David Shotton
From snowflake to avalanche: Possibilities of using free citation data in libraries, translation from the German original of Annette Klein, Mannheim University Library
outreach:GLAM/Newsletter/December 2017/Contents/WMF GLAM report
Why Mickey Mouse’s 1998 copyright extension probably won't happen again: Copyrights from the 1920s will start expiring next year if Congress doesn't act, Timothy B. Lee, 8 January 2018, Arstechnica

To subscribe to Facto Post go to Wikipedia:Facto Post mailing list. For the ways to unsubscribe, see below.
Editor Charles Matthews, for ContentMine. Please leave feedback for him. Back numbers are here.
Reminder: WikiFactMine pages on Wikidata are at WD:WFM.

If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Wikipedians who opt out of message delivery to your user talk page.
Newsletter delivered by MediaWiki message delivery

MediaWiki message delivery (talk) 12:38, 15 January 2018 (UTC)

Help with researching citations/references on Wikipedia articles

I'm a journalist from Israel covering Wikipedia and a graduate student at the Cohn Institute for the History and Philosophy of Ideas. In the past year I have been working with Rona Aviram, a biologist from the Weizmann Institute, on a research project on Wikipedia articles and their ties to science.

We are looking for a way to work with a WP article's reference list. Specifically, we are looking for a tool that can scrape an article's history to tell us when a specific reference was added to the article.

Some suggested using Wiki Blame to search for when an individual reference was added, but we wanted to know whether there is a way to do it for an entire article. It seems the answer is negative, and I was hoping you could prove otherwise.

We are researching the latency between an academic article's publication and its integration into Wikipedia, and would love some help in finding a technological solution to this question for the group of about 2-4 Wikipedia articles that are our subject.

Hoping to pass your Darwinian spam filter --Omer Benjakob (talk) 13:43, 16 January 2018 (UTC)
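
One possible approach for a whole article, sketched under assumptions: walk through the article's revisions from oldest to newest and record the first revision in which each reference appears. The sketch does not fetch anything; it assumes the revisions have already been retrieved as (timestamp, wikitext) pairs, for example through the MediaWiki API's `action=query` with `prop=revisions`. The function names and the sample data are hypothetical, and the regular expression only handles plain `<ref>...</ref>` tags, so a reworded citation counts as a new reference.

```python
import re

# Matches <ref>...</ref> and <ref name="x">...</ref>, but not the
# self-closing reuse form <ref name="x" /> or the <references /> tag.
REF_PATTERN = re.compile(r"<ref(?:\s[^>/]*)?>(.*?)</ref>", re.DOTALL | re.IGNORECASE)


def extract_refs(wikitext):
    """Return the set of reference bodies in one revision, whitespace-normalised."""
    return {" ".join(body.split()) for body in REF_PATTERN.findall(wikitext)}


def first_seen(revisions):
    """Map each reference body to the timestamp of the first revision containing it.

    `revisions` must be an iterable of (timestamp, wikitext) pairs, oldest first.
    """
    seen = {}
    for timestamp, wikitext in revisions:
        for ref in extract_refs(wikitext):
            # setdefault keeps the earliest timestamp for each reference.
            seen.setdefault(ref, timestamp)
    return seen


# Hypothetical two-revision history, for illustration only.
sample_history = [
    ("2015-01-01T00:00:00Z", "Text.<ref>Smith 2014</ref>"),
    ("2016-06-01T00:00:00Z",
     'Text.<ref>Smith 2014</ref> More.<ref name="j">Jones 2016</ref>'),
]
added = first_seen(sample_history)
```

Comparing each timestamp in `added` with the publication date of the cited paper would then give the latency per reference.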

Deletions by ListeriaBot

Hi! I just added some names to the Women in Red/Spain list and saw that ListeriaBot deleted my contributions from 15 January. Could you help me with that?

I just saw that it had similar problems with the list of Israel.

Regards! --DaddyCell (talk) 10:51, 24 January 2018 (UTC)

On the page you linked to, please read the line on top of the table, the one that is both bold and highlighted in yellow, so no one could possibly miss it... --Magnus Manske (talk) 11:20, 24 January 2018 (UTC)

Move-to-commons assistant

Dear Magnus, I had in mind to start moving pictures from dewiki to Commons with the Move-to-commons assistant, but when I try to authorize the OAuth uploader I get the error message: Error retrieving token: mwoauthdatastore-request-token-not-found (see link). I also tried to do it manually at https://www.mediawiki.org/wiki/Special:OAuthManageMyGrants but got a new error message, which reads in Swedish: "Åtkomst-nyckeln för denna konsument har tagits bort" ("The access key for this consumer has been removed"). Can you kindly advise me on my next step? Dan Koehl (talk) 15:07, 25 January 2018 (UTC)

Next step is to ignore the error message; OAuth should work. --Magnus Manske (talk) 09:46, 26 January 2018 (UTC)

ISCB Wikipedia Competition 2018: entries open!

The International Society for Computational Biology (ISCB) and WikiProject Computational Biology are pleased to call for participants in the 2018 ISCB Wikipedia Competition. The ISCB aims to improve the communication of scientific knowledge to the public at large, and Wikipedia and its sister sites play an increasingly important role in this communication; the ISCB Wikipedia Competition aims to improve the quality of Wikipedia articles relating to computational biology. Entries to the competition are open now; the competition closes on 31 Dec 2018.

For students/trainees: Entry to the competition is open internationally to students and trainees of any level, both as individuals and as groups. Prizes of up to $500 will be awarded to the best contributions as chosen by a judging panel of experts; these will be awarded at the ISMB/ECCB conference in Basel, Switzerland in July 2019. As in previous years, the ISCB encourages competition entries for contributions to Wikipedia in any language, and contributions to Wikidata items.

For teachers/trainers: We encourage you to pass this invitation on to your students, and consider using the competition as part of an in-class assignment.

Further details may be found at: Wikipedia:WikiProject Computational Biology/ISCB competition announcement 2018.

If you wish to opt-out of future mailings from WikiProject Computational Biology, please remove yourself from the mailing list or alternatively to opt-out of all massmessage mailings, you may add Category:Opted-out of message delivery to your user talk page. (Message delivered:MediaWiki message delivery (talk) 12:02, 1 February 2018 (UTC))

A page you started (False Cross) has been reviewed!

Thanks for creating False Cross, Magnus Manske!

Wikipedia editor Dan Koehl just reviewed your page, and wrote this note for you:

Looks good to me! :)

To reply, leave a comment on Dan Koehl's talk page.

Learn more about page curation.

Dan Koehl (talk) 03:19, 3 February 2018 (UTC)

Facto Post – Issue 9 – 5 February 2018

m:Grants:Project/ScienceSource is the new ContentMine proposal: please take a look.

Wikidata as Hub

One way of looking at Wikidata relates it to the semantic web concept, around for about as long as Wikipedia, and realised in dozens of distributed Web institutions. It sees Wikidata as supplying central, encyclopedic coverage of linked structured data, and looks ahead to greater support for "federated queries" that draw together information from all parts of the emerging network of websites.

Another perspective might be likened to a photographic negative of that one: Wikidata as an already-functioning Web hub. Over half of its properties are identifiers on other websites. These are Wikidata's "external links", to use Wikipedia terminology: one type for the DOI of a publication, another for the VIAF page of an author, with thousands more such. Wikidata links out to sites that are not nominally part of the semantic web, effectively drawing them into a larger system. The crosswalk possibilities of the systematic construction of these links were covered in Issue 8.

Wikipedia:External links speaks of them as kept "minimal, meritable, and directly relevant to the article." Here Wikidata finds more of a function. On viaf.org one can type a VIAF author identifier into the search box, and find the author page. The Wikidata Resolver tool, these days including Open Street Map, Scholia etc., allows this kind of lookup. The hub tool by maxlath takes a major step further, allowing both lookup and crosswalk to be encoded in a single URL.
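
The lookup step can be sketched as follows. This is a minimal illustration, not how the Wikidata Resolver or the hub tool is implemented: the function name `resolve` is hypothetical, the two URL patterns (for P356, DOI, and P214, VIAF ID) are written from memory in the `$1` placeholder style used by Wikidata's "formatter URL" property and should be checked against the property pages, and the identifier value in the usage line is a placeholder.

```python
# Assumed URL patterns; "$1" stands for the identifier value.
FORMATTER_URLS = {
    "P356": "https://doi.org/$1",        # DOI
    "P214": "https://viaf.org/viaf/$1",  # VIAF ID
}


def resolve(property_id, value):
    """Turn an identifier value into the URL of the corresponding external page."""
    pattern = FORMATTER_URLS.get(property_id)
    if pattern is None:
        raise KeyError(f"no URL pattern known for {property_id}")
    return pattern.replace("$1", value)


# "12345" is a placeholder value, not a real VIAF identifier.
viaf_url = resolve("P214", "12345")
```

Chaining such a lookup with a match between two identifier properties on the same item is what turns a plain lookup into a crosswalk.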

Links

What galleries, libraries, archives, and museums can teach us about multimedia metadata on Wikimedia Commons, Wikimedia Foundation blogpost, 29 January 2018, by Jonathan Morgan and Sandra Fauconnier
m:The Wikipedia Library/1Lib1Ref/Connect, 2018 institutional participation in the #1lib1ref campaign
Newspeak House queries, created at 3 February 2018 event in London led by Magnus Manske
Cochrane–Wikipedia Initiative, Wikipedia Signpost special report 5 February 2018, by JenOttowa
What is the Last Question?, 5 February 2018

MediaWiki message delivery (talk) 11:50, 5 February 2018 (UTC)

Catscan 2

Hi,

First of all, thank you for your work on the tools in Wikimedia Labs. I need to use CatScan 2 for school work, but I saw this morning that the tool was down. Do you have time to fix it?

Thank you.

— Preceding unsigned comment added by 88.138.241.72 (talk • contribs) 06:54, 11 April 2015 (UTC)

ArbCom Elections 2016: Voting now open!

Hello, Magnus Manske. Voting in the 2016 Arbitration Committee elections is open from Monday, 00:00, 21 November through Sunday, 23:59, 4 December to all unblocked users who have registered an account before Wednesday, 00:00, 28 October 2016 and have made at least 150 mainspace edits before Sunday, 00:00, 1 November 2016.

The Arbitration Committee is the panel of editors responsible for conducting the Wikipedia arbitration process. It has the authority to impose binding solutions to disputes between editors, primarily for serious conduct disputes the community has been unable to resolve. This includes the authority to impose site bans, topic bans, editing restrictions, and other measures needed to maintain our editing environment. The arbitration policy describes the Committee's roles and responsibilities in greater detail.

If you wish to participate in the 2016 election, please review the candidates' statements and submit your choices on the voting page.

— Preceding unsigned comment added by MediaWiki message delivery (talk • contribs) 22:06, 21 November 2016 (UTC)

Facto Post – Issue 2 – 13 July 2017

Editorial: Core models and topics

Wikimedians interest themselves in everything under the sun — and then some. Discussion on "core topics" may, oddly, be a fringe activity, and was popular here a decade ago.

The situation on Wikidata today does resemble the halcyon days of 2006 of the English Wikipedia. The growth is there, and the reliability and stylistic issues are not yet pressing in on the project. Its Berlin conference at the end of October will have five years of achievement to celebrate. Think Wikimania Frankfurt 2005.

Progress must be made, however, on referencing "core facts". This has two parts: replacing "imported from Wikipedia" in referencing by external authorities; and picking out statements, such as dates and family relationships, that must not only be reliable but be seen to be reliable.

In addition, there are many properties on Wikidata lacking a clear data model. An emerging consensus may push to the front key sourcing and biomedical properties as requiring urgent attention. Wikidata's "manual of style" is currently distributed over thousands of discussions. To make it coalesce, work on such a core is needed.

Links

WikiFactMine project pages on Wikidata, including a SPARQL library (in development).
Fatameh tool for adding items on scientific papers to Wikidata, by User: T Arrow. It has made a big recent impact. Offline for maintenance as we go to press, it is expected back soon.
As of July 2017, Zotero has a Wikidata translator. A personal Zotero library acts as an intermediary in managing and storing citation metadata.
GLAM Newsletter June 2017, Wikidata report. This is a good monthly round-up to follow, and welcomes contributions.
Exciting and Impressive! The Initiative for Open Citations (I4OC) was launched in April: Infodocket on the first three months.
Olivia Solon in San Francisco, Why the net neutrality protest matters, opinion piece in The Guardian on 11 July.

Editor Charles Matthews. Please leave feedback for him.

If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Opted-out of message delivery to your user talk page.
Newsletter delivered by MediaWiki message delivery

— Preceding unsigned comment added by MediaWiki message delivery (talk • contribs) 08:12, 13 July 2017 (UTC)

Update on File:FailFest and Learning Pattern Hackathon - Berlin 2015.pdf

Hi @Magnus Manske:, thank you for creating the tool CommonsDelinker. I thought it was an actual person and I left a message on its talk page. I wanted to give you an update on this file, which was removed from our event documentation. I talked to user:Taivo, and he explained which image was not CC-licensed. I removed the image and re-uploaded the file. Since it has the same name, embedding the file on our event page was automatic, so what I did was revert the bot's edit, in order not to have to rewrite the markup. I hope this is OK! Please let me know if you have any questions, and thanks again for all the great work. María (WMF) (talk) 00:20, 3 March 2018 (UTC)

Extra empty lines

Hi Magnus Manske! I noticed these extra empty lines at the end of the tables. Would it be possible to avoid them? Thanks in advance. -- Basilicofresco (msg) 18:23, 5 March 2018 (UTC)

Abrufstatistik (page view statistics)

Hi Magnus Manske, unfortunately this is not working for me at the moment. You can click "Do it" as often as you like; nothing happens. Best regards, --Gyanda (talk) 16:33, 6 March 2018 (UTC)

Quarry

Hello, Magnus. Could you please tell me whether there is any documentation for PagePile? Thank you. IKhitron (talk) 18:15, 1 March 2018 (UTC)

The closest thing I know of is my original blog post. --Magnus Manske (talk) 15:03, 6 March 2018 (UTC)

Thanks a lot. IKhitron (talk) 22:33, 9 March 2018 (UTC)

Facto Post – Issue 10 – 12 March 2018

Milestone for mix'n'match

Around the time in February when Wikidata clicked past item Q50000000, another milestone was reached: the mix'n'match tool uploaded its 1000th dataset. Concisely defined by its author, Magnus Manske, it works "to match entries in external catalogs to Wikidata". The total number of entries is now well into eight figures, and more are constantly being added: a couple of new catalogs each day is normal.

Since the end of 2013, mix'n'match has gradually come to play a significant part in adding statements to Wikidata, particularly in areas with the flavour of digital humanities, though datasets can of course be about practically anything. There is a catalog on skyscrapers, and two on spiders.

These days mix'n'match can be used in numerous modes, from the relaxed gamified click through a catalog looking for matches, with prompts, to the fantastically useful and often demanding search across all catalogs. I'll type that again: you can search 1000+ datasets from the simple box at the top right. The drop-down menu top left offers "creation candidates", Magnus's personal favourite. m:Mix'n'match/Manual for more.

For the Wikidatan, a key point is that these matches, however carried out, add statements to Wikidata if, and naturally only if, there is a Wikidata property associated with the catalog. For everyone, however, the hands-on experience of deciding what is a good match is an education in a scholarly area, biographical catalogs being particularly fraught. Underpinning recent rapid progress is an open infrastructure for scraping and uploading.

Congratulations to Magnus, our data Stakhanovite!

Links

3D printing

Wikipedia goes 3D allowing users to upload .STLs for digital reference, Beau Jackson for 3dprintingindustry.com, February 22 2018
WikiCite report (video)
Formal publication and announcement of ISBN citation dataset, see Twitter post, February 23 2018
Plotting the Course Through Charted Waters, workshop on data visualization literacy from Mikhail Popov, Wikimedia Foundation
Using Wikidata to build an authority list of Holocaust-era ghettos, Nancy Cooey, United States Holocaust Memorial Museum, February 12 2018
Why Should You Learn SPARQL? Wikidata! Mark Longair, blogpost November 29 2017
Back to the future: Does graph database success hang on query language?, George Anadiotis for Big on Data, March 5 2018

MediaWiki message delivery (talk) 12:26, 12 March 2018 (UTC)

ListeriaBot: duplicate items lost; date format

Hi, Magnus. I am using ListeriaBot to generate some lists on rowiki. I have two small issues, best exemplified by ro:Prim-ministru al Danemarcei (Prime Minister of Denmark), where a list is generated after this query:

Is there a way for the bot to format the dates according to a default format connected to the language of the wiki? Or for the user to make the bot pass the value to a template (we have our version of {{Date}} that can format it itself).
Even though the query does not request unique items, the bot seems to be losing some entries that refer to the same item: the bot-generated list does not contain the second mandate of people who were prime ministers more than once, such as Christian Albrecht Bluhme, Carl Christian Hall or even the current Lars Løkke Rasmussen. When run on the query service, the query lists all entries.- Andrei (talk) 13:19, 13 March 2018 (UTC)

delete catalog 1078 & 1079

Hi, I incorrectly imported this catalog: https://tools.wmflabs.org/mix-n-match/#/catalog/1078 Could you please delete it? Thanks. --আফতাব (talk) 19:27, 16 March 2018 (UTC)

Please delete this one too: https://tools.wmflabs.org/mix-n-match/#/catalog/1079 This time I imported it correctly, but then I realised it is better not to include "_" in the titles. Because of the underscores, "Automatically matched" gives me very few results, so working through "Unmatched" would mean a lot more work. Sorry for the trouble. --আফতাব (talk) 21:40, 16 March 2018 (UTC)

Both deactivated. --Magnus Manske (talk) 15:58, 20 March 2018 (UTC)
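
For a re-import, the titles could be cleaned before upload. A minimal sketch, assuming the names came from page titles in which underscores stand for spaces; the function name `clean_name` and the sample title are hypothetical.

```python
def clean_name(raw_title):
    """Replace underscores with spaces and collapse repeated whitespace."""
    return " ".join(raw_title.replace("_", " ").split())


# Hypothetical catalog title, for illustration only.
cleaned = clean_name("Some_Entry__Name ")
```

Only the display name should be cleaned this way; the external ID column is better left untouched so that links back to the source keep working.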

Any chance you could help?

With this question? Wikipedia:Reference_desk/Archives/Computing/2018_March_12#Unique_visitors_to_Wikipedia_and_popularity_by_region. (If you reply here please ping me - thanks). --_{Piotr Konieczny aka Prokonsul Piotrus| reply here} 09:22, 28 March 2018 (UTC)

add-information tool is down

For the past week or so, https://tools.wmflabs.org/add-information/ has not been working, either from the tool itself or the sidebar on Commons. Using Chrome, I get the message "This page isn’t working. tools.wmflabs.org is currently unable to handle this request. HTTP ERROR 500", and using Firefox I get simply a blank screen. Thanks for looking into this. --Animalparty! (talk) 22:14, 20 March 2018 (UTC)

Toolforge keeps silently breaking my tools. Restarted. --Magnus Manske (talk) 22:28, 21 March 2018 (UTC)

See also Commons:User_talk:NeoMeesje#Broken licensing Andy Dingley (talk) 10:03, 28 March 2018 (UTC)

pagepile

Hello again. I like this tool very much. Is there a way to make it even better by adding an "x" link beside each page that, when clicked, removes the page from the pile for good? Thank you. IKhitron (talk) 14:08, 29 March 2018 (UTC)

Please remove mix-n-match catalog 1122 superseded by mix-n-match catalog 1123

Hi Magnus, I love the import feature. Unfortunately, the first import was more of a test, since I didn't want to do larger GNIS imports until I had verified the import format. So I created 1122, but that was a subset of what became 1123.

A bit of clarification on how the form fields map to the display would also have made the first import cleaner. Thanks for making such a great tool! Wolfgang8741 says: If not you, then who? (talk) 21:54, 2 April 2018 (UTC)

1122 deactivated. --Magnus Manske (talk) 08:54, 3 April 2018 (UTC)

NCM on Mix'n'match?

Hey Magnus. The Brazilian government publishes a PD nomenclature list for goods traded within the Mercosul economic bloc. I tried importing that into Mix'n'match, but for some reason it requires URLs in the catalog. What can be done? ~★ nmaia ^d 12:12, 30 March 2018 (UTC)

It requires a URL, but it doesn't check if the URL is valid ;-) --Magnus Manske (talk) 09:01, 3 April 2018 (UTC)

Property of https://tools.wmflabs.org/mix-n-match/#/list/1134 should be P804 GNIS ID

I don't know why (I swear I used the correct property ID), but the GNIS ID Antarctica P590 is being placed instead of GNIS ID P804. I am now seeing the issue on https://tools.wmflabs.org/mix-n-match/#/list/1137 too, even though GNIS ID was correctly applied when the form was submitted.

Would you correct the property for the GNIS Falls USA in the https://tools.wmflabs.org/mix-n-match/#/list/1134 config?

It is already P804, for catalogs 1134-1137. Not sure where you saw P590? Example link?

Apparently I didn't read closely enough... All GNIS imports should be P590, as that is GNIS ID, and the description of GNIS ID says to use P804 for GNIS Antarctica. So all but the GNIS Antarctica properties are incorrect and should be P590. Sorry about that; the description made me flip the properties in my head. Anything I uploaded for GNIS should be using P590.

Also, there is a typo in https://tools.wmflabs.org/mix-n-match/#/catalog/1126: the description should say USGS, not GNIS.

Fixed.

Thanks

Finally, if we had a new export of the data, would it be easier to message you to replace the catalog, or would creating a new one be better? This would apply where datasets are found to be erroneous or have been updated with new content, since these are static imports and not generated.

Mix'n'match is not really designed for continuous updates of existing entries. Invalid entries can be just tagged N/A on Mix'n'match. Everything else, let me know if there is a massive change "upstream". --Magnus Manske (talk) 20:08, 5 April 2018 (UTC)

Thanks. I think the natural next steps would include dynamic updates or updated dumps with additional IDs, but that's just food for thought for now. Apart from not being able to correct these issues on my own, overall it seems to do its job nicely as designed.

Facto Post – Issue 11 – 9 April 2018

The 100 Skins of the Onion

Open Citations Month, with its eminently guessable hashtag, is upon us. We should be utterly grateful that in the past 12 months, so much data on which papers cite which other papers has been made open, and that Wikidata is playing its part in hosting it as "cites" statements. At the time of writing, there are 15.3M Wikidata items that can do that.

Pulling back to look at open access papers in the large, though, there is less reason for celebration. Access in theory does not yet equate to practical access. A recent LSE IMPACT blogpost puts that issue down to "heterogeneity". A useful euphemism to save us from thinking that the whole concept doesn't fall into the realm of the oxymoron.

Some home truths: aggregation is not content management, if it falls short on reusability. The PDF file format is wedded to how humans read documents, not how machines ingest them. The salami-slicer is our friend in the current downloading of open access papers, but for a better metaphor, think about skinning an onion, laboriously, 100 times with diminishing returns. There are of the order of 100 major publisher sites hosting open access papers, and the predominant offer there is still a PDF.

Red onion cross section

From the discoverability angle, Wikidata's bibliographic resources combined with the SPARQL query are superior in principle, by far, to existing keyword searches run over papers. Open access content should be managed into consistent HTML, something that is currently strenuous. The good news, such as it is, would be that much of it is already in XML. The organisational problem of removing further skins from the onion, with sensible prioritisation, is certainly not insuperable. The CORE group (the bloggers in the LSE posting) has some answers, but actually not all that is needed for the text and data mining purposes they highlight. The long tail, or in other words the onion heart when it has become fiddly beyond patience to skin, does call for a pis aller. But the real knack is to do more between the XML and the heart.
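
To make the discoverability point concrete, here is a hedged sketch of a citation query against Wikidata. It only builds the SPARQL text and the request URL for the public query service and sends nothing. P2860 ("cites work") is the property behind the "cites" statements; the function names are hypothetical, and Q42 in the usage lines is merely a well-formed placeholder ID, not a paper.

```python
import re
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"


def citing_papers_query(item_id, limit=20):
    """Build a SPARQL query listing items that cite the given item via P2860."""
    if not re.fullmatch(r"Q[1-9][0-9]*", item_id):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return (
        "SELECT ?paper ?paperLabel WHERE {\n"
        f"  ?paper wdt:P2860 wd:{item_id} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def query_url(item_id, limit=20):
    """Return the GET URL that would run the query and ask for JSON results."""
    params = {"query": citing_papers_query(item_id, limit), "format": "json"}
    return ENDPOINT + "?" + urlencode(params)


# Placeholder item ID, used only to show the shape of the output.
example_query = citing_papers_query("Q42", limit=5)
example_url = query_url("Q42", limit=5)
```

The same pattern extends to filtering by author, journal or subject, which is what a keyword search over PDFs cannot easily offer.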

Links

Crossref as a new source of citation data: A comparison with Web of Science and Scopus, CWTS blogpost 17 January 2018, Nees Jan van Eck, Ludo Waltman, Vincent Larivière, Cassidy Sugimoto
Citations with identifiers in Wikipedia, figshare dataset
Making women more visible online—with Wikidata tools!, Wikimedia blogpost 29 March 2018 by Sandra Fauconnier
Village pump discussion, Turn on mapframe? We’re ready if you are reaches conclusions
The Power of the Wikimedia Movement beyond Wikimedia, Forbes 28 March 2018, Michael Bernick
Tracing stolen bitcoin, blogpost 26 March 2018 by Ross J. Anderson

MediaWiki message delivery (talk) 16:25, 9 April 2018 (UTC)

Apology for possibly bringing down PetScan

I wrote a program which uses the "union" feature of PetScan to get a list of articles in a given set of categories. If the list of categories is long (>200), I query PetScan in small chunks of categories for which I expect not to get a "414 Request URI too long" error. Initially the use case didn't involve many chunks, so as a stopgap for the 414 error I sent the requests for all the chunks without waiting for a response. Everything was fine until the fateful day came.

On 9 April 2018, around 10:00 UTC+5:30, I accidentally ran the sloppy program with a test case that had a very large number of categories, which resulted in more than 255 requests being sent to PetScan without waiting for any response. After some time, I noticed that the program hadn't received any response, so I terminated it. I then learned that PetScan went down temporarily after that incident, which made me feel guilty, as it was due to my laziness. I apologise for my mistake. I immediately fixed the program to wait for a response before sending the next request. I feel bad for not doing that initially.

When I was trying to find a way to contact you, I noticed that you were travelling during that period and had requested that people not break the tools in the meantime. I once again apologise for accidentally bringing down PetScan during that period. Sorry, Kaartic ^{correct me, if i'm wrong} 19:14, 10 April 2018 (UTC)
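
The fix described above (split the categories into chunks, then send one request at a time and wait for each response) might look roughly like this. It is a sketch, not the actual program: `fetch_chunk` is a hypothetical caller-supplied function that performs the real PetScan request for one chunk and returns page titles, and the chunk size and pause are arbitrary assumptions.

```python
import time


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_union(categories, fetch_chunk, chunk_size=50, pause=1.0, sleep=time.sleep):
    """Collect pages for all categories, one blocking request per chunk.

    The next request is only sent after the previous call to `fetch_chunk`
    has returned, with a short pause in between to go easy on the server.
    """
    pages = set()
    chunks = list(chunked(categories, chunk_size))
    for index, chunk in enumerate(chunks):
        pages.update(fetch_chunk(chunk))  # blocks until the response arrives
        if index < len(chunks) - 1:
            sleep(pause)
    return pages
```

Passing `fetch_chunk` in as a parameter keeps the throttling logic testable without touching the live tool.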

Request for Add-information tool

Now that https://tools.wmflabs.org/add-information/ is working again, could there be a command that automatically removes the category Category:Media missing infobox template (if present)? Thanks again, --Animalparty! (talk) 18:30, 11 April 2018 (UTC)

Costa Rica is no longer available in Wikidata Todo

Hello. I have enjoyed using Wikidata Todo, especially for fixing problems related to my own country, Costa Rica. However, I see that this country is no longer available in the tool. Is it possible to have it back? Thank you. Green Mostaza (talk) 23:21, 16 April 2018 (UTC)

Mix'n match : new catalog

Hello Magnus Manske and thanks for your amazing tools! I would like to import a new catalog for Rural chapels in the Provence-Alpes-Côte-d'Azur region (France): P5010, but I get this message: ERROR: Less than three coulmns on row 1: 157;chapelle Notre-Dame-des-Champs de Selonnet;Selonnet, Alpes de Haute-Provence

Thanks a lot for your help, Cheers

Without seeing the data you put into the importer, I can't really comment, but each row needs to have at least three tab-separated columns. --Magnus Manske (talk) 11:56, 27 April 2018 (UTC)
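
As a rough illustration of the requirement above, here is a minimal Python sketch that turns semicolon-separated rows into tab-separated ones and checks that each row has at least three columns. It assumes the uploaded file used semicolons, as the row quoted in the error message suggests; the function name `semicolons_to_tsv` is hypothetical and is not part of Mix'n'match.

```python
import csv
import io


def semicolons_to_tsv(text: str, min_columns: int = 3) -> str:
    """Convert semicolon-separated rows into tab-separated rows.

    Raises ValueError if a non-blank row has fewer than `min_columns` cells.
    """
    lines = []
    reader = csv.reader(io.StringIO(text), delimiter=";")
    for row_number, row in enumerate(reader, start=1):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank lines
        if len(cells) < min_columns:
            raise ValueError(
                f"Row {row_number} has only {len(cells)} column(s)"
            )
        if any("\t" in cell for cell in cells):
            raise ValueError(f"Row {row_number} already contains a tab")
        lines.append("\t".join(cells))
    return ("\n".join(lines) + "\n") if lines else ""


row = "157;chapelle Notre-Dame-des-Champs de Selonnet;Selonnet, Alpes de Haute-Provence"
print(semicolons_to_tsv(row))
```

The converted text could then be pasted into the importer in place of the original rows.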

Hello Magnus Manske, we tried to import our catalog a few days ago: https://tools.wmflabs.org/mix-n-match/#/catalog/1194 But now it says: This catalog appears to be empty, maybe the initial scraping is still running

Thanks a lot! Cheers Territo13 (talk) 15:30, 2 May 2018 (UTC)

New catalog 1214 with error

Hi Magnus. I uploaded a new catalog (FIFA player ID), but it shows the following error:

Everything looks good, now importing 54678 entries into a new catalog 1214!

string(228) "#0 /data/project/mix-n-match/scripts/mixnmatch.php(40): ToolforgeCommon->getSQL(Object(mysqli), 'INSERT INTO ent...', 5) #1 /data/project/mix-n-match/public_html/import.php(242): MixNMatch->getSQL('INSERT INTO ent...') #2 {main}" There was an error running the query [Lock wait timeout exceeded; try restarting transaction] INSERT INTO entry (`catalog`,`ext_id`,`ext_url`,`ext_name`,`ext_desc`,`type`,`random`) VALUES (1214,"Entry ID","http://es.fifa.com/fifa-tournaments/players-coaches/people=Entry+ID","Entry name","Entry description","person",rand())

That first entry is the row of column headers, which was wrongly included in the .tsv file.

On the page https://tools.wmflabs.org/mix-n-match/#/catalog/1214 the catalog appears to be empty.

Regards, Jmmuguerza (talk) 06:31, 26 April 2018 (UTC)

I have deleted the (indeed empty) catalog. This looks like a temporary database issue (outside my control). Please try again. --Magnus Manske (talk) 11:58, 27 April 2018 (UTC)

Thanks Magnus. I tried again twice without luck. The server responded with "504 Gateway Time-out". I see that the catalogs https://tools.wmflabs.org/mix-n-match/#/catalog/1217 and https://tools.wmflabs.org/mix-n-match/#/catalog/1218 were created, but they are empty. Regards, Jmmuguerza (talk) 18:40, 27 April 2018 (UTC)

It appears it worked fine for both, but 1218 ran auto-matching. I have deactivated 1217. Let me know if something is missing. --Magnus Manske (talk) 11:46, 3 May 2018 (UTC)
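
The stray header row mentioned in this thread could be dropped before uploading with something like the Python sketch below. It is only an assumption-laden illustration: the header labels are inferred from the values in the failed INSERT statement ("Entry ID", "Entry name", "Entry description"), and `drop_header_row` is a hypothetical helper, not part of the Mix'n'match importer.

```python
from typing import List, Sequence

# Header labels inferred from the failed INSERT quoted above (an assumption).
ASSUMED_HEADER = ("Entry ID", "Entry name", "Entry description")


def drop_header_row(
    rows: Sequence[Sequence[str]],
    header: Sequence[str] = ASSUMED_HEADER,
) -> List[List[str]]:
    """Return the rows without a leading header row, if one is present."""
    if rows:
        first = tuple(cell.strip() for cell in rows[0][: len(header)])
        if first == tuple(header):
            rows = rows[1:]  # the first row is the header, so skip it
    return [list(row) for row in rows]


def parse_tsv(text: str) -> List[List[str]]:
    """Split tab-separated text into rows of cells, ignoring blank lines."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]
```

Usage would be along the lines of `drop_header_row(parse_tsv(open("catalog.tsv", encoding="utf-8").read()))`, where `catalog.tsv` is a placeholder file name.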

Request to add Tamil in Wikidata Terminator

Hi, could you please add Tamil to Wikidata Terminator? It has one of the most vibrant Wikipedia communities in India. Availability of this data would help us to fix the content gap. Please let us know if we should provide any information to get this done. Thanks. --Ravi (talk) 22:14, 13 May 2018 (UTC)

Thanks for adding Tamil. For your information, the "create Pagepile here" link is not working. --Ravi (talk) 00:59, 15 May 2018 (UTC)

Facto Post – Issue 12 – 28 May 2018

ScienceSource funded

The Wikimedia Foundation announced full funding of the ScienceSource grant proposal from ContentMine on May 18. See the ScienceSource Twitter announcement and 60 second video.

A medical canon?

The proposal includes downloading 30,000 open access papers, aiming (roughly speaking) to create a baseline for medical referencing on Wikipedia. It leaves open the question of how these are to be chosen.

The basic criteria of WP:MEDRS include a concentration on secondary literature. Attention has to be given to the long tail of diseases that receive less current research. The MEDRS guideline supposes that edge cases will have to be handled, and the premature exclusion of publications that would be in those marginal positions would reduce the value of the collection. Prophylaxis misses the point that gate-keeping will be done by an algorithm.

Two well-known but rather different areas where such considerations apply are tropical diseases and alternative medicine. There are also a number of potential downloading troubles, and these were mentioned in Issue 11. There is likely to be a gap, even with the guideline, between conditions taken to be necessary but not sufficient, and conditions sufficient but not necessary, for candidate papers to be included. With around 10,000 recognised medical conditions in standard lists, being comprehensive is demanding. With all of these aspects of the task, ScienceSource will seek community help.

Links

OpenRefine logo, courtesy of Google

d:Wikidata:Lexicographical data, Wikidata's multi-lingual dictionary project gets going
Ordia tool, a basic search interface for Wikidata lexemes and forms
OpenRefine tool 3.0, May update allows wrangling of tabular information into Wikidata
d:Wikidata:WikiProject British Politicians pushes ahead with data modelling and imports
#1Lib1Ref Returns for a Second Time in 2018, IFLA blogpost 25 May 2018, second chance this year to participate in referencing Wikipedia

To subscribe to Facto Post go to Wikipedia:Facto Post mailing list. For the ways to unsubscribe, see below.
Editor Charles Matthews, for ContentMine. Please leave feedback for him. Back numbers are here.
Reminder: WikiFactMine pages on Wikidata are at WD:WFM. ScienceSource pages will be announced there, and in this mass message.

If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Wikipedians who opt out of message delivery to your user talk page.
Newsletter delivered by MediaWiki message delivery

MediaWiki message delivery (talk) 10:16, 28 May 2018 (UTC)

Bug report on TABernacle

Hello Magnus Manske,

I am new around here and I do not know how to properly deal with the problem I noticed: in the (great) tool TABernacle, some accented characters are displayed as raw HTML (the HTML entity name is displayed instead of the character). This can be seen with item Q54437988 and property P31.

Please tell me if there is any way for me to try to fix it without bothering anyone.

Also, it would be cool to be able to define a default language, but that's less important.

Thank you very much again,

Nathann — Preceding unsigned comment added by Nathann.cohen (talk • contribs) 08:22, 30 May 2018 (UTC)

Hi, the escape issue should be fixed now. Language selector later ;-) --Magnus Manske (talk) 18:50, 31 May 2018 (UTC)
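
For readers unfamiliar with the escaping problem discussed above, the general idea of turning entity names back into characters can be shown with Python's standard `html.unescape`. This is only a generic sketch; it says nothing about how TABernacle itself was fixed, and `decode_entities` and the sample label are made up for illustration.

```python
import html


def decode_entities(label: str) -> str:
    """Replace HTML entities such as &eacute; with the characters they name."""
    return html.unescape(label)


# A made-up label with raw entity names, as a user might see it on screen.
raw_label = "Universit&eacute; de Gen&egrave;ve"
print(decode_entities(raw_label))  # prints: Université de Genève
```

Text that has been escaped twice (for example `&amp;eacute;`) needs two passes, which is one common way such raw entity names end up on screen.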

Distributed game

Hello! A short remark on your distributed game "Match new articles with Wikidata items": it would be nice if the game distinguished ordinary articles from disambiguation pages (maybe it does for other languages, but not for Russian?), because there are quite a lot of suggestions to connect an article with a disambiguation page. Wikisaurus (talk) 15:32, 5 June 2018 (UTC)

Restart of Petscan on 2018-06-06 following Cloud VPS server reboots

I poked around a bit and used the process I just documented at https://wikitech.wikimedia.org/w/index.php?title=Nova_Resource:Petscan&diff=1794081&oldid=382814 to get https://petscan.wmflabs.org/ running following the planned server maintenance in Cloud VPS on 2018-06-06. A user had reported it down at phab:T196568. If you would like help figuring out how to make this service start automatically when the virtual machine starts, please make a Phabricator task and ping me (bd808) on it. --BDavis (WMF) (talk) 22:59, 6 June 2018 (UTC)

Women in Red tools and technical support

We are preparing a list of tools and technical support for Women in Red. We have tentatively added your name as you have provided general technical support, including tool developments. Please let me know whether you agree to be listed. You are of course welcome to make any additions or corrections.--Ipigott (talk) 10:18, 8 June 2018 (UTC)

Possible reference loop between VIAF and Wikidata

Hi, Magnus. I don't know how to proceed on this. Please look at the history on wikidata:Q4767509, and at the WP help desk entry on "Wrong DOB" (permalink). We seem to have some sort of reference loop, where one of your bots modifies Wikidata based on VIAF, but VIAF picks up data from Wikidata. I would be grateful if you could check out this specific instance to find out where the "1950" date came from originally, and also see if this is a more general problem. I suspect @Majora: and @PrimeHunter: may also be interested in the answer. Thanks. -Arch dude (talk) 04:08, 24 June 2018 (UTC)

Facto Post – Issue 13 – 29 June 2018

The Editor is Charles Matthews, for ContentMine. Please leave feedback for him, on his User talk page.

To subscribe to Facto Post go to Wikipedia:Facto Post mailing list. For the ways to unsubscribe, see the footer.

Back numbers are here.

Respecting MEDRS

Facto Post enters its second year, with a Cambridge Blue (OK, Aquamarine) background, a new logo, but no Cambridge blues. On-topic for the ScienceSource project is a project page here. It contains some case studies on how the WP:MEDRS guideline, for the referencing of articles at all related to human health, is applied in typical discussions.

Close to home also, a template, called {{medrs}} for short, is used to express dissatisfaction with particular references. Technology can help with patrolling, and this Petscan query finds over 450 articles where there is at least one use of the template. Of course the template is merely suggesting there is a possible issue with the reliability of a reference. Deciding the truth of the allegation is another matter.

This maintenance issue is one example of where ScienceSource aims to help. Where the reference is to a scientific paper, its type of algorithm could give a pass/fail opinion on such references. It could assist patrollers of medical articles, therefore, with the templated references and more generally. There may be more to proper referencing than that, indeed: context, quite what the statement supported by the reference expresses, prominence and weight. For that kind of consideration, case studies can help. But an algorithm might help to clear the backlog.

Evidence pyramid leading up to clinical guidelines, from WP:MEDRS

Links

World Cup scorers bubble chart, by the league in which they play, query run on Wikidata
Timeline of discoveries of natural satellites in the solar system, query run on Wikidata
4800 Welsh portraits added to Wikimedia Commons and Wikidata, National Library of Wales blogpost 27 June 2018, by Jason.nlw
The "deaditors" of Wikipedia, Hay Kranen blogpost, 15 June 2018
Six dimensions of open access, polemical tweet, 17 June 2018

If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Wikipedians who opt out of message delivery to your user talk page.
Newsletter delivered by MediaWiki message delivery

MediaWiki message delivery (talk) 18:19, 29 June 2018 (UTC)

Facto Post – Issue 14 – 21 July 2018

The Editor is Charles Matthews, for ContentMine. Please leave feedback for him, on his User talk page.

To subscribe to Facto Post go to Wikipedia:Facto Post mailing list. For the ways to unsubscribe, see the footer.

Back numbers are here.

Plugging the gaps – Wikimania report

Officially it is "bridging the gaps in knowledge", with Wikimania 2018 in Cape Town paying tribute to the southern African concept of ubuntu to implement it. Besides face-to-face interactions, Wikimedians do need their power sources.

Hackathon mentoring table wiring

Facto Post interviewed Jdforrester, who has attended every Wikimania, and now works as Senior Product Manager for the Wikimedia Foundation. His take on tackling the gaps in the Wikimedia movement is that "if we were an army, we could march in a column and close up all the gaps". In his view, though, that is a faulty metaphor, and it leads to a complete misunderstanding of the movement, its diversity and different aspirations, and the nature of the work as "fighting" to be done in the open sector. There are many fronts, and as an eventualist he feels the gaps experienced both by editors and by users of Wikimedia content are inevitable. He would like to see a greater emphasis on reuse of content, not simply its volume.

If that may not sound like radicalism, the Decolonizing the Internet conference here organized jointly with Whose Knowledge? can redress the picture. It comes with the claim to be "the first ever conference about centering marginalized knowledge online".

Plugbar buildup at the Hackathon

Links

ScienceSource focus list (shortcut WD:SSFL on Wikidata), project to tag a first-pass open access medical bibliography on Wikidata, and also overcome the systematic biases in the medical literature by curation.
Wikimedia Foundation and Kiwix partner to grow offline access to Wikipedia, Wikimedia Foundation blogpost 18 July 2018.
Wikipedia's upcoming Cape Town conference will tackle the issue of diversity, Jamie Matroos, 2 July 2018.
VideoWiki, a video version of Wikipedia.
Search Full-Text within 4M+ Books, by MEK, The Open Library Blog, 14 July 2018
More than 5,000 German scientists have published papers in pseudo-scientific journals, NDR, 19 July 2018.

If you wish to receive no further issues of Facto Post, please remove your name from our mailing list. Alternatively, to opt out of all massmessage mailings, you may add Category:Wikipedians who opt out of message delivery to your user talk page.
Newsletter delivered by MediaWiki message delivery

MediaWiki message delivery (talk) 06:10, 21 July 2018 (UTC)

8th ISCB Wikipedia Competition: entries open!

The International Society for Computational Biology (ISCB) and WikiProject Computational Biology are pleased to call for participants in the 8th ISCB Wikipedia Competition. The ISCB aims to improve the communication of scientific knowledge to the public at large, and Wikipedia plays an increasingly important role in this communication; the ISCB Wikipedia Competition aims to improve the quality of Wikipedia articles relating to computational biology. Entries to the competition are open now; the competition closes on 17 May 2019.

For students/trainees: Entry to the competition is open internationally to students and trainees of any level, both as individuals and as groups. Prizes of up to $500 will be awarded to the best contributions as chosen by a judging panel of experts; these will be awarded at the ISMB/ECCB conference in Basel, Switzerland in July 2019. As in previous years, the ISCB encourages competition entries for contributions to Wikipedia in any language.

For teachers/trainers: We encourage you to pass this invitation on to your students, and consider using the competition as part of an in-class assignment.

Further details may be found at: Wikipedia:WikiProject Computational Biology/8th ISCB Wikipedia competition announcement.

If you wish to opt-out of future mailings from WikiProject Computational Biology, please remove yourself from the mailing list or alternatively to opt-out of all massmessage mailings, you may add Category:Opted-out of message delivery to your user talk page. (Message delivered:MediaWiki message delivery (talk) 17:12, 18 August 2018 (UTC))