Monkbot task 9 was created to standardize lists in ship article infoboxen and operates primarily on the content of Category:WPSHIPS:Infobox list errors.
In the past, the WP:SHIPS infobox usage guide required unbulleted lists for reasons of limited available space and for aesthetics. Editors used a variety of other methods to create lists in infoboxes. These included <br /> line break HTML tags and the {{br}}, {{plainlist}}, and {{unbulleted list}} templates. Problems with these methods are:
- use of <br /> and {{br}} makes visually 'correct' lists that are not correct for those who use screen readers. See MOS:ACCESS §Vertical lists.
- limitations in Mediawiki:Common.css prevent {{plainlist}} and {{unbulleted list}} from correctly rendering multi-level lists
description
Ship infoboxen are wiki-tables that contain, at a minimum, two and usually more specialized templates that provide formatting and header data for the infobox. The templates are {{infobox ship begin}} (required), {{infobox ship image}}, {{infobox service record}}, {{infobox ship career}}, {{infobox ship characteristics}}, and {{infobox ship class overview}}. For the time being, this task operates only on the last three of these, though it is expected that it will eventually operate on {{infobox service record}} as well.
standardization
The task begins by standardizing the names of the infobox templates to sentence case and, where redirects are used, to their canonical names.
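As a rough illustration of the name standardization, the following minimal sketch applies the same kind of pattern the script uses to a single template name. The class and method names (`TemplateNameDemo`, `NormalizeCareerName`) and the sample wikitext are hypothetical and exist only for this example; the script section remains the authoritative version.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TemplateNameDemo
{
    // Hypothetical helper: rewrite any first-letter capitalisation and
    // spacing of {{infobox ship career}} to the sentence-case name.
    public static string NormalizeCareerName(string wikitext)
    {
        string pattern = @"\{\{\s*[Ii]nfobox\s+[Ss]hip\s+[Cc]areer";
        return Regex.Replace(wikitext, pattern, "{{Infobox ship career");
    }

    public static void Main()
    {
        // Assumed sample input, not taken from a real article.
        string before = "{{ infobox  Ship Career\n|Ship name=Example\n}}";
        Console.WriteLine(NormalizeCareerName(before));
    }
}
```

Only the template name is touched; parameters and their values pass through unchanged.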
setup
To constrain operation of this task to the limited area of the infobox table, task 9 hides certain characters and templates. The first step is to hide equal signs (=) in templates that are neither ship infobox templates nor one of the list templates {{plainlist}} or {{unbulleted list}} (or their redirect aliases) by replacing the equal signs with the text string __3QU4L__. Similarly, the equal sign in <ref <param>=...> tags and the pipes in templates and wikilinks are hidden using __3QU4L__ and __P1P3__ respectively.
Templates that are not infobox or list templates are hidden by replacing the opening and closing curly-brace pairs with __0P3N__ and __CL0S3__ respectively. Finally, list templates are hidden with __0P3N_PL_ and __CL0S3_PL_ for {{plainlist}}, and __0P3N_UB_ and __CL0S3_UB_ for {{unbulleted list}}.
All of this hiding makes subsequent rules simpler.
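The hide-then-restore idea can be sketched on its own for the wikilink case. This is a simplified illustration, not the bot's code: `PipeHiderDemo`, `HideWikilinkPipes`, and `Unhide` are hypothetical names, and only the __P1P3__ placeholder and the wikilink pattern come from the script.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PipeHiderDemo
{
    const string PipeToken = "__P1P3__";

    // Replace pipes inside [[...]] with a placeholder so that later rules
    // can treat every remaining pipe as an infobox parameter separator.
    public static string HideWikilinkPipes(string text)
    {
        string pattern = @"(\[\[[^\|\]]*)\|([^\]]*\]\])";
        // Each pass hides one pipe per link, so loop until none remain.
        while (Regex.IsMatch(text, pattern))
            text = Regex.Replace(text, pattern, "${1}" + PipeToken + "${2}");
        return text;
    }

    // Restore the hidden pipes once all list rules have run.
    public static string Unhide(string text)
    {
        return text.Replace(PipeToken, "|");
    }

    public static void Main()
    {
        string value = "|Ship builder=[[Harland and Wolff|H&W]]";
        string hidden = HideWikilinkPipes(value);
        Console.WriteLine(hidden);
        Console.WriteLine(Unhide(hidden));
    }
}
```

The loop matters for links with several pipes, such as file links, because a single replacement pass hides only the first pipe in each link.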
line-break lists
Line break lists are the most common form of list in ship infoboxen. These lists usually use some form of <br /> but occasionally use {{br}}. The latter are first converted to <br />. Similarly, the various forms <br>...</br>, <BR>...</BR>, </br> etc. are converted to the canonical form <br />, and where more than one of these tags is present in succession, the duplicates are removed.
When the first text in an infobox template parameter is <br />, the tag is removed. Task 9 inserts an asterisk at the start of the parameter value and then replaces each occurrence of <br /> with \n*.
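The line-break conversion can be sketched on a single parameter value. This is a simplified illustration under assumptions: the real script works on the whole article text and uses an intermediate __BR34K__ marker, whereas the hypothetical `BrListDemo.ToUnorderedList` below takes one value string and uses a looser <br> pattern than the script does.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BrListDemo
{
    // Convert a <br />-separated parameter value to * list markup.
    public static string ToUnorderedList(string value)
    {
        // Canonicalise common variants (<br>, <BR>, <br/>, </br>) to <br />.
        value = Regex.Replace(value, @"<\s*/?\s*br\s*/?\s*>", "<br />",
                              RegexOptions.IgnoreCase);
        // Collapse runs of breaks, and the white space around them, to one.
        value = Regex.Replace(value, @"(?:\s*<br />\s*)+", "<br />");
        // A break at the very start of the value is dropped.
        value = Regex.Replace(value, @"^\s*<br />", "");
        if (!value.Contains("<br />"))
            return value;               // not a line-break list; leave as is
        string list = value.TrimStart().Replace("<br />", "\n*");
        // Avoid a doubled asterisk if the value already began with one.
        return list.StartsWith("*") ? list : "*" + list;
    }

    public static void Main()
    {
        string value = "2 × boilers<BR>1 × turbine<br/><br />4 × shafts";
        Console.WriteLine(ToUnorderedList(value));
    }
}
```

A trailing break is not treated specially in this sketch, so it would leave an empty final list item.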
plainlist
Because {{plainlist}} templates were hidden during setup, they are first unhidden by replacing __0P3N_PL_ and __CL0S3_PL_ with {{ and }}. {{plainlist}} supports the named parameters |class=, |style=, and |indent=. These parameters are not supported by unordered lists in ship infoboxen, so they are removed.
Except for white-space, {{plainlist}} templates must be the only text in the parameter value. Any text, even empty html comment tags (<!-- -->), before or after a {{plainlist}} will cause the value to be ignored. When this happens, all subsequent {{plainlist}} templates are also ignored. It is not expected that this limitation will be 'fixed' by this tool.
When infobox parameters hold only {{plainlist}} templates, the template markup (the {{plainlist| and }}) is removed along with any white-space between the parameter's equal sign and the first line of the {{plainlist}} content.
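The "only text in the parameter value" rule can be illustrated with a small sketch that looks at one value in isolation. `PlainlistDemo.Unwrap` is a hypothetical helper; it recognises only the names plainlist and plain list, not the other redirect aliases, and it does not strip |class=, |style=, or |indent=.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlainlistDemo
{
    // Remove {{plainlist| ... }} markup when the template is the only
    // thing in the value and its content begins with an asterisk.
    public static string Unwrap(string value)
    {
        Match m = Regex.Match(value,
            @"^\s*\{\{\s*[Pp]lain\s*list\s*\|\s*(\*[^\}]*)\}\}\s*$");
        // Anything before or after the template makes the match fail,
        // in which case the value is returned untouched.
        return m.Success ? m.Groups[1].Value.TrimEnd() : value;
    }

    public static void Main()
    {
        Console.WriteLine(Unwrap(" {{plainlist|\n* one\n* two\n}}"));
        Console.WriteLine(Unwrap("{{plainlist|* a}}<!-- -->"));
    }
}
```

The second call in `Main` prints its input unchanged, which mirrors the limitation described above: even an empty comment after the template is enough to leave the value alone.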
unbulleted list
Because {{unbulleted list}} templates were hidden during setup, they are first unhidden by replacing __0P3N_UB_ and __CL0S3_UB_ with {{ and }}. {{unbulleted list}} supports the named parameters |class=, |style=, |indent=, |list_style=, |item_style=, and |itemn_style=. These parameters are not supported by unordered lists in ship infoboxen.
Except for white-space, {{unbulleted list}} templates must be the only text in the parameter value. Any text, even empty html comment tags (<!-- -->), before or after an {{unbulleted list}} will cause the value to be ignored. When this happens, all subsequent {{unbulleted list}} templates are also ignored. It is not expected that this limitation will be 'fixed' by this tool.
When infobox parameters hold only {{unbulleted list}} templates, the template markup (the {{unbulleted list| and }}) is removed along with any white-space between the parameter's equal sign and the first parameter of the {{unbulleted list}} template. The individual {{unbulleted list}} parameters are split at the pipes into an array of strings. A new string is constructed from the array by adding * and \n to each array string as it is concatenated to previous strings.
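The split-and-reassemble step can be sketched on one parameter value. `UblDemo.Convert` is a hypothetical helper that recognises only the names unbulleted list and ubl; it also assumes, as the setup step arranges, that pipes inside nested templates and wikilinks have already been hidden, so every remaining pipe separates two list items.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class UblDemo
{
    // Turn {{unbulleted list|a|b|c}} into "*a\n*b\n*c\n".
    public static string Convert(string value)
    {
        Match m = Regex.Match(value,
            @"^\s*\{\{\s*(?:[Uu]nbulleted\s*list|[Uu]bl)\s*\|\s*([^\}]*)\}\}\s*$");
        if (!m.Success)
            return value;               // other text present; leave alone
        StringBuilder list = new StringBuilder();
        foreach (string item in m.Groups[1].Value.Split('|'))
            list.Append("*").Append(item.Trim()).Append("\n");
        return list.ToString();
    }

    public static void Main()
    {
        Console.Write(Convert("{{ubl|2 × boilers | 1 × turbine}}"));
    }
}
```

As in the script, every item, including the last, is followed by a newline.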
miscellaneous cleanup
Items in lists within ship infoboxen often take the form
*<digit> × <thing>
sometimes with or without &nbsp; on either side of ×; sometimes an x or &times; is used in place of ×. Task 9 rewrites these as <digit> × <thing> with ordinary spaces around the ×. Non-breaking spaces are not required at the start of a list item.
Some lists in ship infoboxen prefix the item text with • (Bullet, U+2022) or · (Interpunct, U+00B7), with or without surrounding spaces. When these are found, they are removed.
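The cleanup rules can be sketched for a single list item. This is an approximation rather than the script's exact rule set: `ItemCleanupDemo.CleanItem` is a hypothetical helper, it treats the literal text &nbsp; and &times; as the entities mentioned above (an assumption), and it only rewrites a letter x that directly follows the count or is followed by a space, so ordinary words beginning with x are left alone.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ItemCleanupDemo
{
    public static string CleanItem(string item)
    {
        // Drop a stray bullet or interpunct that follows the * markup.
        item = Regex.Replace(item, @"^(\*\s*)[•·]\s*", "${1}");
        // <count>×<thing>, with any mix of spaces and &nbsp; around the sign.
        item = Regex.Replace(item,
            @"^(\*\s*\d+)(?:\s|&nbsp;)*(?:×|&times;)(?:\s|&nbsp;)*", "${1} × ");
        // <count> x <thing>: the x must be followed by a space.
        item = Regex.Replace(item,
            @"^(\*\s*\d+)(?:\s|&nbsp;)*x(?:\s|&nbsp;)+", "${1} × ");
        // <count>x<thing>: the x must directly follow the count.
        item = Regex.Replace(item, @"^(\*\s*\d+)x(?=\S)", "${1} × ");
        return item;
    }

    public static void Main()
    {
        Console.WriteLine(CleanItem("*• 2 x boilers"));
        Console.WriteLine(CleanItem("* 2&nbsp;×&nbsp;boilers"));
        Console.WriteLine(CleanItem("*2 xenon lamps"));
    }
}
```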
restoration
All of the above tasks being completed, task 9 unhides pipes (__P1P3__), equals (__3QU4L__), and template open (__0P3N__) and close (__CL0S3__). It then assembles a summary text to be used as an edit summary. If no lists were converted, it leaves Skip set to true and abandons the edit.
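The summary and skip logic can be sketched separately from the text processing. `SummaryDemo.BuildSummary` is a hypothetical helper; the three flags correspond to the script's br_list, plainlist, and ublist variables, and the summary strings are copied from the script.

```csharp
using System;
using System.Collections.Generic;

public static class SummaryDemo
{
    // Build the edit summary; skip is true when nothing was converted.
    public static string BuildSummary(bool brList, bool plainlist, bool ublist,
                                      out bool skip)
    {
        List<string> kinds = new List<string>();
        if (brList) kinds.Add("line-break");
        if (plainlist) kinds.Add("plain");
        if (ublist) kinds.Add("unbulleted");

        skip = kinds.Count == 0;
        if (skip)
            return "no list conversions";
        return "[[User:Monkbot/Task 9: Ship infobox lists|Monkbot task 9]]: convert "
             + string.Join(", ", kinds)
             + " list(s) to unordered list(s) in ship infobox templates;";
    }

    public static void Main()
    {
        bool skip;
        Console.WriteLine(BuildSummary(true, false, true, out skip));
        Console.WriteLine(skip);
    }
}
```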
script
// This script converts various list forms to generic unordered lists (* markup).
// The AWB list is What transcludes page and the page is Template:Infobox ship begin
// or Category:WPSHIPS:Infobox list errors
public string ProcessArticle(string ArticleText, string ArticleTitle, int wikiNamespace, out string Summary, out bool Skip)
{
Skip = true; // start as true; set to false just before the task ends if one or more lists were converted
// Skip = false; // for debugging
Summary = "";
string IS_INFOBOX_SHIP_BEGIN = @"(\{\{\s*[Ii]nfobox\s+[Ss]hip\s+[Bb]egin)";
string IS_INFOBOX_SHIP_IMAGE = @"(\{\{\s*[Ii]nfobox\s+[Ss]hip\s+[Ii]mage)";
string IS_INFOBOX_SHIP_CAREER = @"(\{\{\s*[Ii]nfobox\s+[Ss]hip\s+[Cc]areer)";
string IS_INFOBOX_SHIP_CHARACTERISTICS = @"(\{\{\s*[Ii]nfobox\s+[Ss]hip\s+[Cc]haracteristics)";
string IS_INFOBOX_SHIP_CLASS_OVERVIEW = @"(\{\{\s*[Ii]nfobox\s+[Ss]hip\s+[Cc]lass\s+[Oo]verview)";
string IS_INFOBOX_SERVICE_RECORD = @"(\{\{\s*(?:[Ii]nfobox\s+[Ss]ervice\s+[Rr]ecord|[Ss]ervice\s+[Rr]ecord))";
string IS_UNBULLETED_LIST = @"(?:[Uu]nbulleted\s*list|[Uu]bl|[Uu]blist|[Uu]bt|[Uu]nbullet|[Vv]unblist)";
string IS_PLAINLIST = @"(?:[Pp]lain\s*list|[Bb]ulletless list|PL|Startplainlist)";
string IS_INFOBOX_SHIP; // USE THIS AFTER INFOBOX SERVICE RECORD IS UPDATED
if (Regex.Match (ArticleText, IS_INFOBOX_SERVICE_RECORD + @"[^\|\}]*\|\s*is_ship\s*=\s*yes").Success)
IS_INFOBOX_SHIP = @"Infobox\s+(?:ship\s+(?:begin|career|characteristics|class\s+overview)|service\s+record)"; // don't do {{infobox service record}} until it is updated
else
IS_INFOBOX_SHIP = @"Infobox\s+ship\s+(?:begin|career|characteristics|class\s+overview)";
string IS_INFOBOX_SHIP_OR_LISTS = @"(?:" + IS_INFOBOX_SHIP + @"|" + IS_PLAINLIST + @"|" + IS_UNBULLETED_LIST + @")";
string pattern;
bool br_list=false;
bool plainlist=false;
bool ublist=false;
//---------------------------< I N F O B O X T E M P L A T E N A M E S >----------------------------------
// normalize infobox template names since we're mucking about in ship infoboxen, might as well do this
ArticleText = Regex.Replace(ArticleText, IS_INFOBOX_SHIP_BEGIN, "{{Infobox ship begin");
ArticleText = Regex.Replace(ArticleText, IS_INFOBOX_SHIP_IMAGE, "{{Infobox ship image");
ArticleText = Regex.Replace(ArticleText, IS_INFOBOX_SHIP_CAREER, "{{Infobox ship career");
ArticleText = Regex.Replace(ArticleText, IS_INFOBOX_SHIP_CHARACTERISTICS, "{{Infobox ship characteristics");
ArticleText = Regex.Replace(ArticleText, IS_INFOBOX_SHIP_CLASS_OVERVIEW, "{{Infobox ship class overview");
ArticleText = Regex.Replace(ArticleText, IS_INFOBOX_SERVICE_RECORD, "{{Infobox service record");
//---------------------------< H I D E >----------------------------------------------------------------------
// HIDE TEMPLATES: find templates that are not {{infobox ship ...}}, {{plainlist}}, or {{unbulleted list}};
// replace the equal signs in those templates with __3QU4L__
pattern = @"(\{\{(?!\s*" + IS_INFOBOX_SHIP_OR_LISTS + @")[^\{\}]*)=([^\}]*\}\})";
while (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1__3QU4L__$2");
}
// replace the equal sign in <ref ...=...> tags with __3QU4L__ (making this rule generic is problematic)
pattern = @"(\<\s*ref[^=\>]*)=([^\|\}\>]*\>)";
while (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1__3QU4L__$2");
}
// replace the pipes in templates with __P1P3__
pattern = @"(\{\{(?!\s*" + IS_INFOBOX_SHIP_OR_LISTS + @")[^\{\}]*)\|([^\}]*\}\})";
while (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1__P1P3__$2");
}
// replace the pipes in wikilinks with __P1P3__
pattern = @"(\[\[[^\|\]]*)\|([^\]]*\]\])";
while (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "$1__P1P3__$2");
}
// replace the opening {{ with __0P3N__ and the closing }} with __CL0S3__
while (Regex.Match (ArticleText, @"\{\{(?!\s*" + IS_INFOBOX_SHIP_OR_LISTS + @")([^\{\}]*)\}\}").Success)
{
ArticleText = Regex.Replace(ArticleText, @"\{\{(?!\s*" + IS_INFOBOX_SHIP_OR_LISTS + @")([^\{\}]*)\}\}", "__0P3N__$1__CL0S3__");
}
// Hide {{plainlist}} replace the opening {{ with __0P3N_PL_ and the closing }} with __CL0S3_PL_
// do this so that {{plainlist}} closing }} doesn't hide stuff that follows
pattern = @"\{\{\s*(" + IS_PLAINLIST + @"[^\}]*)\}\}";
while (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "__0P3N_PL_$1__CL0S3_PL_");
}
// Hide {{unbulleted list}} replace the opening {{ with __0P3N_UB_ and the closing }} with __CL0S3_UB_
// do this so that {{unbulleted list}} closing }} doesn't hide stuff that follows
pattern = @"\{\{\s*(" + IS_UNBULLETED_LIST + @"[^\}]*)\}\}";
while (Regex.Match (ArticleText, pattern).Success)
{
ArticleText = Regex.Replace(ArticleText, pattern, "__0P3N_UB_$1__CL0S3_UB_");
}
//---------------------------< { { B R } } >------------------------------------------------------------------
// replace {{br}} with <br /> in ship info box templates
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*)__0P3N__\s*[Bb][Rr]\s*__CL0S3__";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is {{br}}
ArticleText = Regex.Replace(ArticleText, pattern, "$1<br />");
//---------------------------< < B R > >----------------------------------------------------------------------
// replace <br> variants with <br /> in ship info box templates
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*)(?:\<\s*[Bb][Rr]\s*\>|\<\s*[Bb][Rr]/\s*\>)";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is non-standard forms of <br /\>
ArticleText = Regex.Replace(ArticleText, pattern, "$1<br />");
// sometimes there are multiple <br /> tags in a row; remove all but one
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\<br /\>)\s*\<br /\>";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is <br /\><br /\>
ArticleText = Regex.Replace(ArticleText, pattern, "$1");
// now replace all remaining <space><br /><space> with __BR34K__; this should remove all newlines
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*?)\s*\<br /\>\s*";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is a <br />
ArticleText = Regex.Replace(ArticleText, pattern, "$1__BR34K__");
//---------------------------< < B R > T O L I S T >------------------------------------------------------
// convert <br /> lists in ship info box templates to * unordered lists
// <br /> lists that contain {{plainlist}} or {{unbulleted list}} templates are converted but the internal
// list templates are not.
// remove a __BR34K__ tag at the beginning of a list (|Ship <parameter> =__BR34K__<value> ... becomes |Ship <parameter> =<value> ...)
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\|\s*[^\|\}]*=)__BR34K__";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is __BR34K__
ArticleText = Regex.Replace(ArticleText, pattern, "$1");
// insert a * at the beginning of a __BR34K__ list (|Ship <parameter> = <value>__BR34K__<value> ... becomes |Ship <parameter> =*<value>__BR34K__<value> ...
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\|\s*[^\|\}]*=)\s*([^\*\|][^\|\}]*__BR34K__)";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is __BR34K__
{
br_list=true;
ArticleText = Regex.Replace(ArticleText, pattern, "$1*$2");
}
// replace __BR34K__ with a newline followed by a splat; if next line starts with * the splat is replaced to prevent duplication
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*)__BR34K__\*?";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is __BR34K__
ArticleText = Regex.Replace(ArticleText, pattern, "$1\n*");
//---------------------------< P L A I N L I S T >------------------------------------------------------------
// remove {{plainlist|}} template markup from the list it contains in ship info box templates
// Does not work if there is text between the parameter = sign and the opening {{. Introductory text or other
// cruft will need to be attended to by a human. When this occurs, any subsequent {{plainlist}} is ignored
// because the script can't see beyond the former's }}
//Does not work when a parameter has multiple {{plainlist}} templates
// UNHIDE plainlist: replace __0P3N_PL_ with {{
ArticleText = Regex.Replace(ArticleText, @"__0P3N_PL_", "{{");
// UNHIDE plainlist: replace __CL0S3_PL_ with }}
ArticleText = Regex.Replace(ArticleText, @"__CL0S3_PL_", "}}");
// remove plainlist named parameters if present |class=, |style=, |indent=
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=\s*\{\{\s*" + IS_PLAINLIST + @"[^\}]*)\|\s*(?:class|style|indent)\s*=[^\|\}]*([\|\}])";
while (Regex.Match (ArticleText, pattern).Success) // repeat til gone
ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
// remove plainlist empty parameters if present
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=\s*\{\{\s*" + IS_PLAINLIST + @"[^\}]*)\|\s*([\|\}])";
while (Regex.Match (ArticleText, pattern).Success) // repeat til gone
ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
// remove {{plainlist|}} markup when it directly follows the parameter = sign (spaces excepted) and {{plainlist}} can be followed by nothing but spaces so:
// |Ship param = {{plainlist|...}}
// |Ship param = ...
// but not other text precedes or follows {{plainlist}}:
// |Ship param = <text> {{plainlist|...}} <text>
// |Ship param = ...
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=)\s*\{\{\s*" + IS_PLAINLIST + @"\s*\|\s*(\*\s*[^\}]*)\}\}(\s*[\|\}])"; // {{plainlist}} must follow the parameter = sign followed by nothing but spaces; must have a leading asterisk
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is {{plainlist|}}
{
plainlist=true;
ArticleText = Regex.Replace(ArticleText, pattern, "$1$2$3");
}
// remove {{plainlist}} and {{endplainlist}} templates from ship info box templates
/*
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=)\s*\{\{\s*[Pp]lainlist\s*\}\}";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is {{plainlist}}
{
Skip = false;
ArticleText = Regex.Replace(ArticleText, pattern, "$1");
}
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=)\s*\{\{\s*[Ee]ndplainlist\s*\}\}";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there is {{endplainlist}}
{
Skip = false;
ArticleText = Regex.Replace(ArticleText, pattern, "$1");
}
*/
//---------------------------< U N B U L L E T E D L I S T >------------------------------------------------
// remove {{unbulleted list|}} template markup from the list it contains in ship info box templates
// UNHIDE {{unbulleted list}}: replace __0P3N_UB_ with {{
ArticleText = Regex.Replace(ArticleText, @"__0P3N_UB_", "{{");
// UNHIDE {{unbulleted list}}: replace __CL0S3_UB_ with }}
ArticleText = Regex.Replace(ArticleText, @"__CL0S3_UB_", "}}");
// remove {{unbulleted list}} named parameters if present |class=, |style=, |list_style=, |item_style=, |item#_style=
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=\s*\{\{\s*" + IS_UNBULLETED_LIST + @"[^\}]*)\|\s*(?:class|style|list_style|item\d*_style)\s*=[^\|\}]*([\|\}])";
while (Regex.Match (ArticleText, pattern).Success) // repeat til gone
ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=)\s*\{\{\s*" + IS_UNBULLETED_LIST + @"\s*\|\s*([^\}]*)\}\}";
while (Regex.Match (ArticleText, pattern).Success) // repeat til gone
{
ArticleText = Regex.Replace(ArticleText, pattern,
delegate(Match match)
{
string raw_capture = match.Groups[0].Value; // 0 - the whole match
string ret_val = match.Groups[1].Value; // 1 - start of infobox through parameter = sign
string source = match.Groups[2].Value; // 2 - the pipe-separated list items from {{unbulleted list}}
string[] items = source.Split ('|'); // create a string array of list items
foreach (string item in items)
{
ret_val = ret_val + "*" + item.Trim() + "\n"; // reassemble the list as a regular unordered list
}
return ret_val;
});
ublist=true;
}
//---------------------------< M I S C C L E A N U P >------------------------------------------------------
// remove extra blank lines from infoboxen which may be the result of {{plainlist}} removal
// doesn't work properly when there is white space between start of line and pipe
// pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*)\s{1,}(\s[\|\}])";
// while (Regex.Match (ArticleText, pattern).Success)
// ArticleText = Regex.Replace(ArticleText, pattern, "$1$2");
// If there are any small bullets (•·) following the * markup in an unordered list, remove them
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\*\s*)[•·]\s*";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there are small bullets
ArticleText = Regex.Replace(ArticleText, pattern, "$1");
// If there are any small bullets (•·) at the beginning of other parameter values, remove them
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*=)\s*[•·]\s*";
while (Regex.Match (ArticleText, pattern).Success) // repeat as long as there are small bullets
ArticleText = Regex.Replace(ArticleText, pattern, "$1");
// clean-up list items in the form * 2x something – should be 2 × something
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\*\s*\d+)\s*x\s";
while (Regex.Match (ArticleText, pattern).Success)
ArticleText = Regex.Replace(ArticleText, pattern, "$1 × ");
// clean-up list items in the form * 2×something – should be 2 × something; this one at the start only
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\*\s*\d+)[x×](\S)";
while (Regex.Match (ArticleText, pattern).Success)
ArticleText = Regex.Replace(ArticleText, pattern, "$1 × $2");
// clean-up list items in the form * 2×2 something – should be 2 × 2 something; this one at the start only
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\*\s*\d+)[x×](\d)";
while (Regex.Match (ArticleText, pattern).Success)
ArticleText = Regex.Replace(ArticleText, pattern, "$1 × $2");
// clean-up list items in the form * 2 × something – should be 2 × something
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\*\s*\d+)\s*×\s+";
while (Regex.Match (ArticleText, pattern).Success)
ArticleText = Regex.Replace(ArticleText, pattern, "$1 × ");
// clean-up list items in the form * 2&nbsp;×&nbsp; – non breaking space not required at start of list item
pattern = @"(\{\{\s*" + IS_INFOBOX_SHIP + @"[^\}]*\*\s*\d+)&nbsp;[x×]&nbsp;";
while (Regex.Match (ArticleText, pattern).Success)
ArticleText = Regex.Replace(ArticleText, pattern, "$1 × ");
//---------------------------< U N H I D E >------------------------------------------------------------------
// UNHIDE: replace __P1P3__ with |
ArticleText = Regex.Replace(ArticleText, @"__P1P3__", "|");
// UNHIDE: replace __3QU4L__ with =
ArticleText = Regex.Replace(ArticleText, @"__3QU4L__", "=");
// UNHIDE: replace __0P3N__ with {{
ArticleText = Regex.Replace(ArticleText, @"__0P3N__", "{{");
// UNHIDE: replace __CL0S3__ with }}
ArticleText = Regex.Replace(ArticleText, @"__CL0S3__", "}}");
if (br_list)
Summary = "line-break";
if (plainlist)
{
if ("" != Summary)
Summary = Summary + ", ";
Summary = Summary + "plain";
}
if (ublist)
{
if ("" != Summary)
Summary = Summary + ", ";
Summary = Summary + "unbulleted";
}
if ("" != Summary)
{
Skip = false; // if there is a summary here then we should not skip this page
Summary = "[[User:Monkbot/Task 9: Ship infobox lists|Monkbot task 9]]: convert " + Summary + " list(s) to unordered list(s) in ship infobox templates;";
}
else
Summary = "no list conversions";
return ArticleText;
}