User:Daniel Mietchen/Talks/Archiving 2014/Web Harvesting and Archiving with and for the Crowd, Including Bots
Reuse of multimedia files from PubMed Central on Wikimedia Commons
The Open Access Media Importer Bot is a script that crawls the Open Access subset of PubMed Central, a database of biomedical literature, to find video and audio files whose licenses are compatible with reuse on Wikimedia platforms. When it finds such materials, it uploads them to Wikimedia Commons, the media repository shared by all Wikipedias and their sister projects. The digital media collection created by the bot now holds about 15,000 files and is curated by the volunteer community at Wikimedia Commons. In this talk, I will use the bot as an example to highlight the reusability of digital archives and collections, the importance of open licensing, metadata standards, and opportunities for community involvement.
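The bot's core filtering step can be sketched as follows. This is a hypothetical illustration, not the actual bot code: the record layout, the license whitelist, and the function name are assumptions chosen for clarity.

```python
# Hypothetical sketch (not the actual Open Access Media Importer code):
# given a PubMed Central-style article record, keep only supplementary
# audio/video files whose license permits upload to Wikimedia Commons.

# Licenses assumed compatible with Wikimedia Commons (illustrative list).
COMMONS_COMPATIBLE = {"CC0", "CC BY", "CC BY-SA"}

# File extensions treated as audio or video supplements.
MEDIA_EXTENSIONS = (".ogv", ".ogg", ".webm", ".mp4", ".avi", ".wav", ".mp3")

def uploadable_media(article):
    """Return supplementary files that are audio/video under a free license."""
    if article.get("license") not in COMMONS_COMPATIBLE:
        return []
    return [
        name for name in article.get("supplementary_files", [])
        if name.lower().endswith(MEDIA_EXTENSIONS)
    ]

# Example record (values made up for illustration):
example = {
    "pmcid": "PMC2873961",
    "license": "CC BY",
    "supplementary_files": ["video_s1.mp4", "table_s1.xls", "audio_s2.ogg"],
}
print(uploadable_media(example))  # ['video_s1.mp4', 'audio_s2.ogg']
```

In practice the bot also has to parse each article's XML to determine the license in the first place, which is harder than this sketch suggests (see the link below on inconsistent XML as a barrier to reuse).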
Links
- The bot's user page on Wikimedia Commons
- Signalling OA-ness through full-text import to Wikisource
- Inconsistent XML as a Barrier to Reuse of Open Access Content
Outlook
Reference
Note the icons and links complementing the bibliographic information.
- Williams, J. T.; Carpenter, K. E.; Van Tassell, J. L.; Hoetjes, P.; Toller, W.; Etnoyer, P.; Smith, M. (2010). Gratwicke, Brian (ed.). "Biodiversity Assessment of the Fishes of Saba Bank Atoll, Netherlands Antilles". PLoS ONE. 5 (5): e10676. doi:10.1371/journal.pone.0010676. PMC 2873961. PMID 20505760. CC0 · full text · media · metadata
A million first steps: crowdsourcing the creation of metadata
In December 2013, the British Library released a set of more than one million images on Flickr, extracted automatically from scans of Public Domain works in its collection. The metadata available for these images were those pertaining to the scanned work, plus the page number. With the release, the Library hoped to crowdsource the generation of more specific metadata, describing the content of the images rather than their bibliographic location. In this talk, I will review the progress of the initiative over the six months since then, paying special attention to metadata generated through the integration of these images into Wikimedia Commons, and putting the project into the perspective of a range of large-scale releases of media files onto Wikimedia Commons and Wikisource.
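The shape of the crowdsourcing task can be sketched as a simple data-structure merge: each released image starts with only bibliographic metadata inherited from the scanned work, and community tagging layers content descriptions on top. The field names and values below are assumptions for illustration, not the British Library's actual schema.

```python
# Illustrative sketch (hypothetical field names): an image released with
# only bibliographic metadata, augmented with crowdsourced content tags.

def augment(record, tags):
    """Merge crowdsourced content tags into a bibliographic-only record."""
    merged = dict(record)
    merged["content_tags"] = sorted(set(record.get("content_tags", [])) | set(tags))
    return merged

# What the release provided: metadata about the book, not the image.
bibliographic_only = {
    "source_work": "Example Title (1880)",  # hypothetical
    "page": 123,
}

# What crowdsourcing adds: what the image actually depicts.
print(augment(bibliographic_only, ["map", "coastline"]))
```

The point of the sketch is the asymmetry: the "before" record says where an image came from, while only the crowdsourced layer says what it shows.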
Media coverage
- British Library blog
- Flickr
- Wikimedia Commons mirror
- Andrew Gray
- discusses why the release did not go to Wikimedia Commons directly
- Twitter
- https://twitter.com/doctaCynthia/status/412669450832195584
- https://twitter.com/FourRedShoes/status/412411445683290112
- http://isitdown.tumblr.com/post/70301625505/flickr-is-down#.UrCefY0hb-s
- https://twitter.com/IsItDownNow/status/412994972292775937
- https://twitter.com/OffLucasLima/status/412293111574822912
- https://twitter.com/duzovakawoh/status/411806342060339200
- https://twitter.com/ZaynaHamarneh/status/411191850478092288
- https://twitter.com/peter_s_clarke/status/415061062208483328
- https://twitter.com/benosteen/status/413313959245406208
- hashtag #BL1million
- blpublicdomain wiki
- Ars Technica
- The Atlantic
- links to other large image donations
- Wired
- The Guardian
- heise.de
- Daily Mail
- Creative Review
- Spiegel Online
- The Signpost
- Sounds inspired by the images
- Crowdsourcing Comic Art
Similar releases
Wikidata: the database anyone can edit
Wikipedia exists in over 280 languages and has traditionally operated such that the content in each of these languages was curated largely independently. In late 2012, Wikidata was added to the ecosystem of Wikimedia platforms. Much like Wikimedia Commons acts as a common repository for media used across Wikimedia projects, Wikidata acts as such a repository for data. Starting out with data about which Wikipedia articles exist in which languages, the platform is steadily expanding its scope to include other kinds of data and a wider range of properties. While many ontologies have been created with strong community involvement, the Wikidata approach differs in that it is not limited to specific domains and allows anyone to join in, expert or not.
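The language-link data Wikidata started with is exposed through the MediaWiki API's `wbgetentities` module as "sitelinks". A minimal sketch of extracting which Wikipedias cover a topic, using a constructed sample payload (a real response would come from a request like `https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q42&props=sitelinks&format=json`):

```python
import json

# Sample payload constructed for illustration, mimicking the shape of a
# wbgetentities response with props=sitelinks.
sample_response = json.loads("""
{
  "entities": {
    "Q42": {
      "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Douglas Adams"},
        "dewiki": {"site": "dewiki", "title": "Douglas Adams"},
        "frwiki": {"site": "frwiki", "title": "Douglas Adams"}
      }
    }
  }
}
""")

def wikipedia_languages(response, entity_id):
    """Return language codes of Wikipedias that link to the given entity."""
    sitelinks = response["entities"][entity_id]["sitelinks"]
    # Wikipedia sitelink keys like "enwiki" encode the language code;
    # non-Wikipedia projects such as "commonswiki" are filtered out here.
    return sorted(
        key[:-len("wiki")]
        for key in sitelinks
        if key.endswith("wiki") and key != "commonswiki"
    )

print(wikipedia_languages(sample_response, "Q42"))  # ['de', 'en', 'fr']
```

Keeping these links in one central item, rather than duplicated across 280+ Wikipedias, is what made Wikidata's first use case an immediate maintenance win.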
Links
About
This page belongs to a talk given on May 15, 2014, as part of Archiving 2014, which took place May 13–16, 2014, in Berlin.
Abstracts of the three originally proposed talks that were merged into this one:
- User:Daniel Mietchen/Talks/Archiving 2014/Reuse of multimedia files from PubMed Central on Wikimedia Commons
- User:Daniel Mietchen/Talks/Archiving 2014/A million first steps: crowdsourcing the creation of metadata
- User:Daniel Mietchen/Talks/Archiving 2014/Wikidata: the database anyone can edit
Licensing
Text displayed on this page is available under a Creative Commons CC0 waiver / Public Domain dedication. The licensing of embedded media, code, or templates used to display text here may differ, but all are compatible with the Open Definition as well as Wikipedia's default license, the Creative Commons Attribution-ShareAlike 3.0 License.
Contact
- Institutional
- @EvoMRI on Twitter
- Wikipedia talk page
- Wikipedia email