User talk:OverlordQBot/Archive/Dev1
June 25, 2007
Update 1
Yay, I have it listening to the IRC channel. Stupid color codes tripped me up a bit; then I shamelessly borrowed and fixed the code from VandalFighter to give me clean output. It also filters for talk pages only.
Debug output:
ovrlrdq@myhost:~/svn/perlwikipedia$ ./SigBot.pl
Retrieving http://en.wikipedia.org/w/index.php?title=Special%3AUserlogin&action=edit
Login as "OverlordQBot" succeeded.
Connected to irc.wikimedia.org
.#.#. http://en.wikipedia.org/w/index.php?title=Talk:Good_Samaritan_%28Hellboy%29&diff=140470832&oldid=99140288 !
.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#. http://en.wikipedia.org/w/index.php?title=User_talk:Shaunyboy_Brikman&diff=140470852&oldid=137821030 !
.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#. http://en.wikipedia.org/w/index.php?title=User_talk:LaraLove&diff=140470882&oldid=140440565 !
.#.#. http://en.wikipedia.org/w/index.php?title=User_talk:Rambutan&diff=140470884&oldid=140467437 !
.#.#.#.#.#. http://en.wikipedia.org/w/index.php?title=User_talk:Deryck_Chan&diff=140470890&oldid=140225824 !
.#.#.#. http://en.wikipedia.org/w/index.php?title=Talk:Good_Samaritan_%28Hellboy%29&diff=140470893&oldid=140470832 !
.#.#
A . means the bot received a message from the RC feed about a new edit. A # means the edit was not to a talk page. If it was a talk page, the bot outputs the URL followed by a !.
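For anyone curious, the color stripping and talk-page filter boil down to something like the following. This is a minimal Python sketch, not the bot's Perl code: it assumes each feed message puts the page title in the first `[[...]]` with mIRC color codes around it, and `TALK_PREFIXES` is a hypothetical, incomplete list of talk namespaces.

```python
import re

# mIRC formatting: \x03 followed by an optional "fg" or "fg,bg" color pair,
# plus bold (\x02), reset (\x0f), reverse (\x16) and underline (\x1f).
COLOR_RE = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1f]")

# Assumption: the page title is the first [[...]] in the message.
TITLE_RE = re.compile(r"\[\[(.+?)\]\]")

# Hypothetical, incomplete list of talk namespaces.
TALK_PREFIXES = (
    "Talk:", "User talk:", "Wikipedia talk:", "Image talk:",
    "Template talk:", "Help talk:", "Category talk:", "Portal talk:",
)


def strip_colors(line: str) -> str:
    """Remove mIRC formatting codes from a raw feed message."""
    return COLOR_RE.sub("", line)


def classify(line: str) -> str:
    """Return the page title for talk-page edits, or '#' for anything else."""
    match = TITLE_RE.search(strip_colors(line))
    if match is None:
        return "#"
    title = match.group(1)
    return title if title.startswith(TALK_PREFIXES) else "#"


raw = ("\x0314[[\x0307User talk:LaraLove\x0314]]\x034 \x0310 \x0302"
       "http://en.wikipedia.org/w/index.php?title=User_talk:LaraLove"
       "&diff=140470882&oldid=140440565\x03")
print(classify(raw))  # User talk:LaraLove
```

Using a fixed prefix list rather than matching any "... talk:" keeps article titles that merely contain the word "talk" from slipping through.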
Update 2
Getting a diff engine to work has been a pain; I haven't come up with a better way of creating a diff than pulling the two revisions and running a diff algorithm on them. Any input would be greatly appreciated.
myhost:/home/ovrlrdq/svn/perlwikipedia# ./SigBot.pl
Connected to irc.wikimedia.org
.A.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#
.A.S.#.#.S.#.#.#.#.#.#.A.#.#.#.#.#.#.S.#.#.#.A.#.#.#.#.#.#.A.#.#.#.#.A.#.#.#.#.#.#.#.#.#.A.#.S.#.A
.#.A.A.A.#.A.#.#.#.#.#.#.#.#.#.#.A.#.A.S.#.#.S.#.#.#.#.#.A.#.#.#.#.#.#.#.#.#.#.#.A.#.#.#.A.#.S.#.#
.#.#.#.#.#.#.A.#.#.#.#.#.#
Legend:
- . Revision (followed by one of the following results)
- # Revision was not on a talk page
- A Revision only had additions
- S Revision had subtractions
- N Revision was non-contiguous
Also fixed a bug in the parser where an edit was being read as both a new page and a minor edit. Need sleep.
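The pull-both-revisions-and-diff approach and the A/S/N legend above can be sketched with Python's `difflib`. This is an illustration under assumptions, not the bot's actual Perl logic: it compares the texts line by line, and the rules (any removed or changed line counts as a subtraction, more than one inserted block counts as non-contiguous) are guesses at how the categories were defined.

```python
import difflib


def classify_revision(old_text: str, new_text: str) -> str:
    """Label an edit using the legend's letters.

    Assumed rules: 'S' if any line was removed or changed, 'N' if lines
    were inserted in more than one place, 'A' if a single contiguous block
    was inserted, and '' if the texts are identical.
    """
    matcher = difflib.SequenceMatcher(
        None, old_text.splitlines(), new_text.splitlines(), autojunk=False
    )
    inserted_blocks = 0
    for tag, _i1, _i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            return "S"
        if tag == "insert":
            inserted_blocks += 1
    if inserted_blocks == 0:
        return ""
    return "A" if inserted_blocks == 1 else "N"


old = "== Hi ==\nFirst comment."
print(classify_revision(old, old + "\nReply here."))            # A
print(classify_revision(old, "Top note.\n" + old + "\nReply."))  # N
print(classify_revision(old, "== Hi =="))                       # S
```

Only the "A" case is a straightforward candidate for signing: a single new block of text appended somewhere on the page.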
June 30, 2007
Update 1
Completely scrapped the Perlwikipedia module and switched to using as many POE components as possible. Also switched to the API instead of making a normal request and scraping the HTML; I don't know whether it puts any less load on their end, but I'm sure that's why it's there.
Working on writing the XML parser for the API responses. Q T C 05:53, 30 June 2007 (UTC)
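A sketch of the XML parsing side, in Python with `xml.etree.ElementTree` rather than the Perl/POE stack the bot uses. It assumes a `prop=revisions` query response in XML format, with revision text inside `<rev>` elements that carry a `user` attribute; `parse_revisions` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parse_revisions(xml_text: str) -> dict:
    """Map each page title to a list of (user, text) tuples.

    Assumes an api.php response shaped like
    <api><query><pages><page title="..."><revisions><rev user="...">text</rev>...
    """
    root = ET.fromstring(xml_text)
    pages = {}
    for page in root.iter("page"):
        revisions = [(rev.get("user"), rev.text or "") for rev in page.iter("rev")]
        pages[page.get("title")] = revisions
    return pages


sample = (
    '<api><query><pages><page pageid="1" ns="1" title="Talk:Foo">'
    '<revisions><rev user="Alice">Hello</rev></revisions>'
    '</page></pages></query></api>'
)
print(parse_revisions(sample))  # {'Talk:Foo': [('Alice', 'Hello')]}
```

Walking the tree with `iter()` means the same function copes with one page or several pages in a single reply.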
July 2, 2007
Test
Update 1
I hacked myself into a corner with some excessive, similarly named subroutines, so I scrapped what I had and redid those logic portions. It's working well again; the only problem is that some of the HTTP requests are timing out :-/ I added an HTTP Keep-Alive pool, so hopefully that'll smooth those out; otherwise I'll have to write an error handler. Still torn on whether diffing the two revisions or pulling the diff page and munging it is the better method. The code is at a point where it can go either way, so I'll save a copy of what I have now and pursue the diff-page munging method. In the best case, this will cut requests to WP in half. Q T C 00:29, 3 July 2007 (UTC)
July 06, 2007
Update 1
Testing new diff parsing routines. Q T C 09:03, 6 July 2007 (UTC)
Hopefully it works :) Q T C 09:04, 6 July 2007 (UTC)
One last test Q T C 09:04, 6 July 2007 (UTC)
July 24, 2007
Update 1
Ouch, has it been 18 days? Got sidetracked by Real Life (tm). I'm a horrible procrastinator, but I'm going to sit down now and finish it off.
- Rewrote requests to api.php: it now only sends requests for pages that are new (i.e., not an oldrev/newrev pair from the RC feed).
- Switched parsing from diffing the results of two calls to api.php to munging the HTML of the actual diff page.
- Parsing 'engines' done.
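The request routing in the first two bullets might look roughly like this. It's a Python sketch assuming that edits to existing pages arrive with both `diff` and `oldid` in the feed URL while new pages don't; `plan_request` and `API_URL` are hypothetical names, though `action=query`, `prop=revisions`, `rvprop`, `titles` and `format` are standard api.php parameters.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed endpoint; the bot's real configuration isn't shown in the log.
API_URL = "http://en.wikipedia.org/w/api.php"


def plan_request(rc_url: str) -> str:
    """Pick the one URL to fetch for an edit announced on the RC feed."""
    query = parse_qs(urlparse(rc_url).query)
    if "diff" in query and "oldid" in query:
        # Existing page: the rendered diff already contains both revisions,
        # so one request replaces two api.php revision fetches.
        return rc_url
    # New page (assumed: no diff/oldid pair), so there is nothing to diff
    # against; fetch the current text and author through api.php instead.
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content|user",
        "titles": query["title"][0],
        "format": "xml",
    }
    return API_URL + "?" + urlencode(params)
```

Either branch issues exactly one request per edit, which is where the "half the requests" saving comes from.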
ToDo:
- Logic for whether or not to sign a post.
- Submit edit back to WP.
Q T C 05:09, 24 July 2007 (UTC)
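One possible shape for the "sign or not" check in the ToDo list. It's a hypothetical heuristic in Python, not something the bot is known to do: it treats added text as signed only if it contains both a `[[User:...]]` or `[[User talk:...]]` link for the editor and a `HH:MM, D Month YYYY (UTC)` timestamp.

```python
import re

# Hypothetical check for a "05:09, 24 July 2007 (UTC)" style timestamp.
TIMESTAMP_RE = re.compile(r"\d{2}:\d{2}, \d{1,2} \w+ \d{4} \(UTC\)")


def needs_signature(added_text: str, username: str) -> bool:
    """Guess whether text added by `username` is missing a signature.

    Treats the text as signed only if it links to the editor's user or
    user talk page AND carries a timestamp. Spaces vs underscores in the
    link are not normalised here.
    """
    user_link = re.compile(
        r"\[\[[Uu]ser(?:[ _][Tt]alk)?:" + re.escape(username) + r"[|/#\]]"
    )
    signed = user_link.search(added_text) and TIMESTAMP_RE.search(added_text)
    return not signed
```

Requiring both parts errs on the side of flagging a post; a real bot would want to err the other way and skip anything ambiguous.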
Update 2
Still running into a bug where I get the revision notice from the RC IRC bot and request the page through api.php, but get an error reply saying that the page doesn't exist. Of course this only happens on new pages, but it's still slightly aggravating. Q T C 05:37, 24 July 2007 (UTC)
E.g.:
'<page ns="3" title="User talk:76.210.5.146" missing=""/>'
Q T C 05:46, 24 July 2007 (UTC)
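A sketch of detecting that reply and retrying, in Python. It assumes the cause is the new page not being visible to the API yet, which the log doesn't confirm; `fetch` is a hypothetical stand-in for whatever performs the api.php request, and the retry count and delay are arbitrary.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional


def missing_titles(xml_text: str) -> List[str]:
    """Titles of <page> elements that api.php flagged with a 'missing' attribute."""
    root = ET.fromstring(xml_text)
    # iter() includes the root itself, so a bare <page .../> reply also works.
    return [page.get("title") for page in root.iter("page") if "missing" in page.attrib]


def fetch_with_retry(fetch: Callable[[str], str], title: str,
                     attempts: int = 3, delay: float = 5.0) -> Optional[str]:
    """Call fetch(title) until the page is no longer reported missing.

    Returns the first reply that isn't flagged missing, or None if every
    attempt was. Assumes `title` matches the form the API echoes back.
    """
    for attempt in range(attempts):
        reply = fetch(title)
        if title not in missing_titles(reply):
            return reply
        if attempt < attempts - 1:
            time.sleep(delay)  # give the new page time to show up
    return None
```

Giving up after a few tries keeps one stubborn page from stalling the rest of the feed.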
Update 3
Fixed the nonexistent page bug; now filtering out revisions that only add a template is proving to be the pain. While working on the conditional portion of the bot, I figured out I was chewing off one too many letters, so usernames were getting truncated by one letter. That explains why skipping bot edits wasn't working: names became BetacommandBo. Q T C 00:30, 25 July 2007 (UTC)
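The off-by-one and the two filters can be illustrated like this. It's a Python sketch assuming a color-stripped feed line laid out as `... * Username * (+12) ...`; the `endswith("bot")` rule and the template-only regex (which ignores nested templates) are hypothetical simplifications.

```python
import re


def extract_user(clean_line: str) -> str:
    """Pull the username out of '... * Username * (+12) ...' (assumed layout)."""
    start = clean_line.index("* ") + len("* ")
    end = clean_line.index(" *", start)
    # Slicing up to `end` keeps the last character; stopping at end - 1
    # would turn 'BetacommandBot' into 'BetacommandBo'.
    return clean_line[start:end]


def is_bot(username: str) -> bool:
    """Hypothetical rule: treat accounts whose names end in 'bot' as bots."""
    return username.lower().endswith("bot")


# One or more {{...}} templates and whitespace, nothing else. Nested
# templates such as {{a|{{b}}}} are deliberately not handled.
TEMPLATE_ONLY_RE = re.compile(r"\s*(?:\{\{[^{}]*\}\}\s*)+")


def is_template_only(added_text: str) -> bool:
    """True if the added text consists solely of templates."""
    return TEMPLATE_ONLY_RE.fullmatch(added_text) is not None
```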
Update 4
Parsing the HTML is proving to be a PITA. Gotta take a break from this. On the plus side, I went the parse-it-as-XML route, since using regexes to parse HTML is a 'Bad Thing' *winks at wikilinkwatcher* Q T C 01:08, 25 July 2007 (UTC)
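For comparison, here's a sketch that pulls the added lines out of a rendered diff using Python's tolerant `html.parser.HTMLParser`. It's swapped in for a strict XML parser because entities such as `&nbsp;` make real HTML awkward to feed to an XML parser. The `diff-addedline` cell class is an assumption about the diff markup, and `AddedLineCollector` and `added_lines` are hypothetical names.

```python
from html.parser import HTMLParser


class AddedLineCollector(HTMLParser):
    """Collect the text of <td class="diff-addedline"> cells (class name assumed)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &nbsp; arrives as '\xa0'
        self.added = []
        self._in_added = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and "diff-addedline" in classes:
            self._in_added = True
            self.added.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_added = False

    def handle_data(self, data):
        # Text inside nested <div>/<span> elements still belongs to the cell.
        if self._in_added:
            self.added[-1] += data


def added_lines(diff_html: str) -> list:
    """Return the text of every added-line cell, in page order."""
    parser = AddedLineCollector()
    parser.feed(diff_html)
    parser.close()
    return parser.added
```

Unlike a regex, the parser doesn't care about attribute order, quoting style, or extra markup inside the cell.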
July 25, 2007
Update 1
*sigh* Looks like somebody else wrote a similar bot. I'm going to finish up anyway and throw it on my resume. Seems they just looked at the bot account's activity log and assumed I was idle instead of asking me about it. *shrug* It would have been done by now, but I rewrote lots of the guts to cut the WP server requests in half; guess the saying 'nice guys finish last' is oh so true.
Anyway, on a related note, it's kinda hard to test the last bit of functionality because there are so few unsigned talk page edits going on :) Q T C 10:37, 25 July 2007 (UTC)
Update 2
Argh, looks like I optimized myself into a hole. Back to the drawing board. Q T C 10:48, 25 July 2007 (UTC)
July 27, 2007
Update 1
Development of this strain terminated. Q T C 22:07, 27 July 2007 (UTC)