User:Daniel Mietchen/Talks/Archiving 2014/Web Harvesting and Archiving with and for the Crowd, Including Bots
Reuse of multimedia files from PubMed Central on Wikimedia Commons
The Open Access Media Importer Bot is a script that crawls the Open Access subset of PubMed Central, a database of biomedical literature, to find video and audio files whose licenses are compatible with reuse on Wikimedia platforms. When it finds such materials, it uploads them to Wikimedia Commons, the media repository shared by all Wikipedias and their sister projects. The digital media collection created by the bot now holds about 15,000 files and is curated by the volunteer community at Wikimedia Commons. In this talk, I will use the bot as an example to highlight the reusability of digital archives and collections, the importance of open licensing, metadata standards, and opportunities for community involvement.
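The bot's core filtering step can be sketched as follows. This is a hypothetical illustration, not the actual bot code: the record layout, the license whitelist, and the function name are assumptions chosen for clarity.

```python
# Hypothetical sketch (not the actual Open Access Media Importer code):
# given a PubMed Central-style article record, keep only supplementary
# audio/video files whose license permits upload to Wikimedia Commons.

# Licenses assumed compatible with Wikimedia Commons (illustrative list).
COMMONS_COMPATIBLE = {"CC0", "CC BY", "CC BY-SA"}

# File extensions treated as audio or video supplements.
MEDIA_EXTENSIONS = (".ogv", ".ogg", ".webm", ".mp4", ".avi", ".wav", ".mp3")

def uploadable_media(article):
    """Return supplementary files that are audio/video under a free license."""
    if article.get("license") not in COMMONS_COMPATIBLE:
        return []
    return [
        name for name in article.get("supplementary_files", [])
        if name.lower().endswith(MEDIA_EXTENSIONS)
    ]

# Example record (values made up for illustration):
example = {
    "pmcid": "PMC2873961",
    "license": "CC BY",
    "supplementary_files": ["video_s1.mp4", "table_s1.xls", "audio_s2.ogg"],
}
print(uploadable_media(example))  # ['video_s1.mp4', 'audio_s2.ogg']
```

In practice the bot also has to parse each article's XML to determine the license in the first place, which is harder than this sketch suggests (see the link below on inconsistent XML as a barrier to reuse).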
Links
- The bot's user page on Wikimedia Commons
- Signalling OA-ness through full-text import to Wikisource
- Inconsistent XML as a Barrier to Reuse of Open Access Content
Outlook
Reference
Note the icons and links complementing the bibliographic information.
- Williams, J. T.; Carpenter, K. E.; Van Tassell, J. L.; Hoetjes, P.; Toller, W.; Etnoyer, P.; Smith, M. (2010). Gratwicke, Brian (ed.). "Biodiversity Assessment of the Fishes of Saba Bank Atoll, Netherlands Antilles". PLoS ONE. 5 (5): e10676. doi:10.1371/journal.pone.0010676. PMC 2873961. PMID 20505760. CC0 · full text · media · metadata
A million first steps: crowdsourcing the creation of metadata
In December 2013, the British Library released a set of more than one million images on Flickr, extracted automatically from scans of Public Domain works in its collection. The metadata available for these images were those pertaining to the scanned work, plus the page number. With the release, the Library hoped to crowdsource the generation of more specific metadata, describing the content of the images rather than their bibliographic location. In this talk, I will review the progress of the initiative over the six months since then, paying special attention to metadata generated through the integration of these images into Wikimedia Commons, and putting the project into the perspective of a range of large-scale releases of media files onto Wikimedia Commons and Wikisource.
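The shape of the crowdsourcing task can be sketched as a simple data-structure merge: each released image starts with only bibliographic metadata inherited from the scanned work, and community tagging layers content descriptions on top. The field names and values below are assumptions for illustration, not the British Library's actual schema.

```python
# Illustrative sketch (hypothetical field names): an image released with
# only bibliographic metadata, augmented with crowdsourced content tags.

def augment(record, tags):
    """Merge crowdsourced content tags into a bibliographic-only record."""
    merged = dict(record)
    merged["content_tags"] = sorted(set(record.get("content_tags", [])) | set(tags))
    return merged

# What the release provided: metadata about the book, not the image.
bibliographic_only = {
    "source_work": "Example Title (1880)",  # hypothetical
    "page": 123,
}

# What crowdsourcing adds: what the image actually depicts.
print(augment(bibliographic_only, ["map", "coastline"]))
```

The point of the sketch is the asymmetry: the "before" record says where an image came from, while only the crowdsourced layer says what it shows.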
Media coverage
- British Library blog
- Flickr
- Wikimedia Commons mirror
- Andrew Gray
- discusses why the release did not go to Wikimedia Commons directly
- Twitter
- https://twitter.com/doctaCynthia/status/412669450832195584
- https://twitter.com/FourRedShoes/status/412411445683290112
- http://isitdown.tumblr.com/post/70301625505/flickr-is-down#.UrCefY0hb-s
- https://twitter.com/IsItDownNow/status/412994972292775937
- https://twitter.com/OffLucasLima/status/412293111574822912
- https://twitter.com/duzovakawoh/status/411806342060339200
- https://twitter.com/ZaynaHamarneh/status/411191850478092288
- https://twitter.com/peter_s_clarke/status/415061062208483328
- https://twitter.com/benosteen/status/413313959245406208
- hashtag #BL1million
- blpublicdomain wiki
- Ars Technica
- The Atlantic
- links to other large image donations
- Wired
- The Guardian
- heise.de
- Daily Mail
- Creative Review
- Spiegel Online
- The Signpost
- Sounds inspired by the images
- Crowdsourcing Comic Art
Similar releases
Wikidata: the database anyone can edit
Wikipedia exists in over 280 languages and has traditionally operated such that the content in each of these languages was curated largely independently. In late 2012, Wikidata was added to the ecosystem of Wikimedia platforms. Much like Wikimedia Commons acts as a common repository for media used across Wikimedia projects, Wikidata acts as such a repository for data. Starting out with data about which Wikipedia articles exist in which languages, the platform is steadily expanding its scope to include other kinds of data and a wider range of properties. While many ontologies have been created with strong community involvement, the Wikidata approach differs in that it is not limited to specific domains and allows anyone to join in, expert or not.
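The language-link data Wikidata started with is exposed through the MediaWiki API's `wbgetentities` module as "sitelinks". A minimal sketch of extracting which Wikipedias cover a topic, using a constructed sample payload (a real response would come from a request like `https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&props=sitelinks&format=json`):

```python
import json

# Sample payload constructed for illustration, mimicking the shape of a
# wbgetentities response with props=sitelinks.
sample_response = json.loads("""
{
  "entities": {
    "Q42": {
      "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Douglas Adams"},
        "dewiki": {"site": "dewiki", "title": "Douglas Adams"},
        "frwiki": {"site": "frwiki", "title": "Douglas Adams"}
      }
    }
  }
}
""")

def wikipedia_languages(response, entity_id):
    """Return language codes of Wikipedias that link to the given entity."""
    sitelinks = response["entities"][entity_id]["sitelinks"]
    # Wikipedia sitelink keys like "enwiki" encode the language code;
    # non-Wikipedia projects such as "commonswiki" are filtered out here.
    return sorted(
        key[:-len("wiki")]
        for key in sitelinks
        if key.endswith("wiki") and key != "commonswiki"
    )

print(wikipedia_languages(sample_response, "Q42"))  # ['de', 'en', 'fr']
```

Keeping these links in one central item, rather than duplicated across 280+ Wikipedias, is what made Wikidata's first use case an immediate maintenance win.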
Links
About
This page belongs to a talk given on May 15, 2014, as part of Archiving 2014, which took place May 13–16, 2014, in Berlin.
Abstracts of the three originally proposed talks that were merged into this one:
- User:Daniel Mietchen/Talks/Archiving 2014/Reuse of multimedia files from PubMed Central on Wikimedia Commons
- User:Daniel Mietchen/Talks/Archiving 2014/A million first steps: crowdsourcing the creation of metadata
- User:Daniel Mietchen/Talks/Archiving 2014/Wikidata: the database anyone can edit
Licensing
Text displayed on this page is available under a Creative Commons CC0 waiver / Public Domain dedication. The licensing of embedded media, code, or templates used to display text here may differ, but all are compatible with the Open Definition as well as Wikipedia's default license, the Creative Commons Attribution-ShareAlike 3.0 License.
Contact
- Institutional
- @EvoMRI on Twitter
- Wikipedia talk page
- Wikipedia email