User:DutchTreat/Projects/list-maintenance

List Sync Challenge

This challenge was proposed in response to a problem posed at the Wikimedia NYC meeting on 2022-11-30 (see the old revision of Wikipedia:Meetup/NYC/November_2022).

Problem
Correct the listing of colleges at Oberlin Group of Libraries using the fewest keystrokes.
Solution
  • 1. Create two lists, each in its own file:
    • List A: copy the wikitext from the article; and
    • List B: copy the links from the members list.
    • Paste the values into a plain-text editor so that the hyperlinks themselves are not copied.
  • 2. For List A, use the text editor to make a global substitution that removes the extra markup. For example, in Vim use

    :1,$s/^...\([a-zA-Z \.]*\).*$/\1/

  • 3. Sort each list.
  • 4. Diff the two sorted lists.
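
The steps above can also be sketched in Python for anyone working outside a text editor. This is only an illustration, not the method used at the meeting: the sample lines, the `*[[Name]]` wikitext shape, and the function names `strip_markup` and `compare` are assumptions, and a set difference stands in for the sort-and-diff of steps 3 and 4.

```python
import re

# Hypothetical sample input; the real lists come from the article wikitext
# (List A) and the members list (List B).
LIST_A_WIKITEXT = [
    "*[[Harvey Mudd College]]",
    "*[[Oberlin College]]",
    "*[[Pomona College]]",
]
LIST_B_SOURCE = [
    "Oberlin College",
    "University of Puget Sound",
]


def strip_markup(line):
    """Approximate the Vim substitution: skip the first three characters,
    then keep the leading run of letters, spaces, and periods."""
    match = re.match(r"^...([a-zA-Z .]*)", line)
    return match.group(1).strip() if match else line.strip()


def compare(list_a, list_b):
    """Return (names only in List A, names only in List B), each sorted."""
    names_a = {strip_markup(line) for line in list_a} - {""}
    names_b = {line.strip() for line in list_b} - {""}
    return sorted(names_a - names_b), sorted(names_b - names_a)


only_in_a, only_in_b = compare(LIST_A_WIKITEXT, LIST_B_SOURCE)
print("List A - WP only:", only_in_a)
print("List B - source only:", only_in_b)
```

Because the pattern keeps only letters, spaces, and periods, a name containing any other character, such as the ampersand in "Lewis & Clark College", is cut short at that character; such names need to be checked by hand.
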
Results
Differences
List B - Source Only            | List A - WP Only
Lewis & Clark College           | Harvey Mudd College
University of Puget Sound       | Pitzer College
Xavier University of Louisiana  | Pomona College
                                | Scripps College

Note: there were additional differences due to naming. For example, the WP article has two separate entries, "Morehouse College" and "Spelman College", while the source list has one combined entry, "Morehouse/Spelman Colleges (AUC)".
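
One way to keep such naming differences out of the diff is to expand combined entries before comparing. The sketch below is an assumption rather than part of the original procedure: the `ALIASES` table and the `expand_aliases` helper are hypothetical, and only the Morehouse/Spelman mapping comes from the note above.

```python
# Hypothetical alias table: maps a combined source entry to the separate
# names used in the WP article.
ALIASES = {
    "Morehouse/Spelman Colleges (AUC)": ["Morehouse College", "Spelman College"],
}


def expand_aliases(names):
    """Replace each combined entry with its separate names, then sort."""
    expanded = []
    for name in names:
        # Names without an alias are passed through unchanged.
        expanded.extend(ALIASES.get(name, [name]))
    return sorted(expanded)


source_names = ["Oberlin College", "Morehouse/Spelman Colleges (AUC)"]
normalized = expand_aliases(source_names)
print(normalized)
```
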

Discussion

@CmdrDan: Comments? - DutchTreat (talk) 11:12, 1 December 2022 (UTC)