User:Pearle/pearle-documentation.txt

Publication date

April 30, updated online 11 September 2005 (UTC)

Important notices

Though it works well for the author, this code may eat all the files in your wiki, create an infinite number of network connections, create subtle errors that will only be discovered months later, among many other things. Use it at your own risk.

This code and documentation is hereby released into the public domain by the original author, Christopher Beland <beland at alum.mit.edu>, User:Beland on the English Wikipedia. You may copy and modify it freely. ANY CONTRIBUTIONS TO EITHER THE CODE OR DOCUMENTATION WILL BE ASSUMED TO BE RELEASED TO THE PUBLIC DOMAIN.

Occasional updates may be published which enhance functionality or repair defects. If interested, watchlist this wiki page. Contributions to the code are welcome; you may edit the wiki page displaying the source code directly, or you may comment on the associated talk page. Code contributions will be considered to be released to the public domain.

PLEASE BE AWARE THAT YOU MAY NOT RUN A BOT ON WIKIPEDIA WITHOUT PERSONALLY OBTAINING PERMISSION FROM THE COMMUNITY, AND YOUR BOT MUST FOLLOW COMMUNITY RULES. This applies whether or not you are using the same code as an existing bot. Other public wikis may have similar policies. Be sure to know the rules of the wiki you are changing, and be aware that bots may cause excessive server load, owner and community upset, and/or mild indigestion.

To run a bot on the English Wikipedia project, you must personally obtain permission. See Wikipedia:Bots.

Setup

  1. You will need to create an account on the wiki you wish to edit, and obtain cookies which allow the bot access. Automatic cookie saving may or may not be implemented properly, and in any case, there's no facility provided for you to type your bot's password. The easiest thing to do is to log on to the wiki as your bot, and copy the values of the cookies provided from your web browser. A sample file is provided in cookies.fake.txt. You will need to correct pearle.pl to point to the correct cookie jar.
  2. You should change any instances of the word "Pearle" in pearle.pl to match the name of your bot.
  3. If you are connecting to a wiki other than Wikipedia, you may need to adjust some strings, including edit summaries and error messages. You will probably also need to adjust the names of the variables captured from the HTML, as well as possibly modify some of the HTML-scraping code. Search for instances of "wp" and "wikipedia" (case insensitive).
  4. If running on a Wikimedia site, you must download http://www.wikimedia.org/langlist on a regular basis and somehow load it into $langlist in fixCategoryInterwiki() instead of /home/beland/wikipedia/pearle-wisebot/langlist.
  5. You will need to change $historyFile at the top of opentaskUpdate(). You may also wish to change $target there.
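
For step 4, one possible approach is sketched below. It assumes the downloaded langlist file holds whitespace-separated language codes; the file format, the helper name read_langlist, and the placeholder path are illustrative assumptions and are not taken from pearle.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: read language codes from a local copy of langlist.
# Assumes the file contains whitespace-separated codes such as "en de fr".
sub read_langlist {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @codes;
    while (my $line = <$fh>) {
        # Keep only non-empty fields from each line.
        push @codes, grep { length } split /\s+/, $line;
    }
    close($fh);
    die "No language codes found in $path" unless @codes;
    return @codes;
}

# Example use (the path is a placeholder for wherever you saved the file):
# my @codes = read_langlist('/path/to/langlist');
# print scalar(@codes), " language codes loaded\n";
```

Failing loudly on an empty file is a deliberate choice here: a failed download would otherwise leave the bot with no language codes at all.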

A warning about special characters

When sending pages to users to edit, MediaWiki converts many non-ASCII characters and special punctuation marks to HTML character entities (e.g. < becomes &lt;). Pearle must convert these back to Unicode characters so that the HTTP::Request::Common library can properly URL-encode them. As long as MediaWiki only uses entities recognized by the HTML::Entities library, this works. Any entity that HTML::Entities does not recognize will not be properly preserved when Pearle edits a page.

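To get a list of the entities that HTML::Entities recognizes (if your terminal supports Unicode):

use HTML::Entities;
binmode(STDOUT, ":utf8");   # avoid "Wide character" warnings

# Print each entity name and the character it decodes to.
while (my ($key, $value) = each(%HTML::Entities::entity2char))
{
    print "$key -> $value\n";
}
die "Done.";   # stop here if this snippet is pasted into a larger script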

General reference: http://www.w3.org/TR/REC-html40/sgml/entities.html
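
The conversion described above can be illustrated with a minimal sketch using decode_entities from HTML::Entities. The sample strings are invented for illustration, and this is not necessarily how pearle.pl itself performs the conversion.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Text as it might appear inside an edit form, with entities escaped.
my $escaped = 'if a &lt; b &amp;&amp; caf&eacute;';

# In non-void context, decode_entities returns a decoded copy.
my $decoded = decode_entities($escaped);

# An entity name that HTML::Entities does not know is left untouched,
# which is the failure mode the warning above describes.
my $unknown = decode_entities('&zzunknown;');

binmode(STDOUT, ':encoding(UTF-8)');
print "$decoded\n";
print "$unknown\n";
```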

A note about character encoding

Please note that some futzing may be required on your part to get non-ASCII characters to work properly. Take note of the character encoding that your scripts (including this one) and data are stored in, what character encoding Perl is expecting for input data, what encoding it uses for output data, and whether or not the code in your scripts handles these encodings properly.

As of March 2005, Wikipedia database dumps are in a Latin encoding. Please note that this is not compatible in all ways with the UTF-8 representation of Unicode.

To learn more about how Perl handles character encodings, read the "perlunicode" manpage, and the documentation on the -C flag in the "perlrun" manpage.
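
As a small illustration of why the encodings are not interchangeable, the sketch below converts one string from a Latin encoding to UTF-8 using the core Encode module. It assumes the input is ISO-8859-1; this document says only "a Latin encoding", so the exact charset is an assumption.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they would appear in an ISO-8859-1 file: "caf" plus e-acute.
my $latin1_bytes = "caf\xE9";

# Decode the bytes into a Perl character string...
my $text = decode('ISO-8859-1', $latin1_bytes);

# ...then encode those characters as UTF-8 bytes for output.
my $utf8_bytes = encode('UTF-8', $text);

printf "%d Latin-1 bytes became %d UTF-8 bytes\n",
    length($latin1_bytes), length($utf8_bytes);
```

The accented character occupies one byte in ISO-8859-1 but two in UTF-8, so passing bytes through without decoding them first will corrupt any non-ASCII text.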

To change the character encoding in which Emacs saves a particular buffer, use M-x set-buffer-file-coding-system.

Your system may already be properly configured; you may want to go ahead and see if everything Just Works before you change anything.

Usage

(Usage examples assume a Linux-like system running the tcsh shell.)

There are two ways to run the bot. For safety and security, the only way to initiate an action is from the command line, not from the wiki itself.

The first way is to run a single command directly from the command line, allowing a data stream to be fed through STDIN (e.g. through a pipe).

perl pearle.pl COMMAND ARGUMENT1 ARGUMENT2

To run a series of commands, do something like:

cat commands.txt | perl pearle.pl READ_COMMANDS

All commands are logged in a file in the current working directory. You may wish to clean it out occasionally, as it may eventually get very big.

To see what commands are available, see the function interpretCommands() in pearle.pl. Documentation of what each command does is either near the top of the function that implements the command, or on the page User:Pearle.

When putting commands in a file, simply put one command on each line. Blank lines are OK, and to exit execution, run the command STOP.

Commands are usually written like this:

COMMAND ARGUMENT_ONE ARGUMENT_TWO

Note that underscores take the place of spaces within each argument, so that spaces unambiguously separate one argument from the next.

For move commands, you may be able to do this:

* COMMAND [[ARGUMENT ONE]] -> [[ARGUMENT TWO]]

This second style lets you post a list of commands on a MediaWiki page where the pages being moved appear as "live" links, so people can check up on them.
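
The command-file rules above (one command per line, blank lines ignored, STOP ends execution, and the two argument styles) can be sketched as follows. This is a hypothetical parser, not the code in interpretCommands(); the function names, the command name SOME_COMMAND, and the conversion of spaces to underscores inside [[...]] are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical parser for the two command styles described above.
# Returns (COMMAND, ARG1, ARG2, ...), or an empty list for a blank line.
sub parse_command_line {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
    return () if $line eq '';

    # Wiki-list style: * COMMAND [[ARGUMENT ONE]] -> [[ARGUMENT TWO]]
    if ($line =~ /^\*\s*(\S+)\s+\[\[(.+?)\]\]\s*->\s*\[\[(.+?)\]\]$/) {
        my ($command, $from, $to) = ($1, $2, $3);
        tr/ /_/ for $from, $to;       # assumed: spaces become underscores
        return ($command, $from, $to);
    }

    # Plain style: COMMAND ARGUMENT_ONE ARGUMENT_TWO
    return split /\s+/, $line;
}

# Process a list of lines, skipping blanks and halting at STOP.
sub parse_commands {
    my @parsed;
    for my $line (@_) {
        my @fields = parse_command_line($line);
        next unless @fields;
        last if $fields[0] eq 'STOP';
        push @parsed, [@fields];
    }
    return @parsed;
}

# Example: both styles yield the same fields; the line after STOP is ignored.
my @commands = parse_commands(
    'SOME_COMMAND Page_One Page_Two',
    '',
    '* SOME_COMMAND [[Page One]] -> [[Page Two]]',
    'STOP',
    'SOME_COMMAND Never Reached',
);
print join(' ', @$_), "\n" for @commands;
```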