User talk:CheMoBot/Data

Latest comment: 16 years ago by Beetstra in topic Database format

General

  • We could make the bot in such a way that it:
    • reports changes on-wiki.
    • autoreverts changes of values (this may meet resistance; we are the encyclopedia that anyone can edit).
    • autorepairs changes (e.g. checks the fields 5 minutes after the 'offending' edit has been performed, and resets changed fields back to the verified value).
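
To make the report and autorepair options concrete, here is a minimal sketch of the comparison step. It assumes the infobox fields and the verified values are already available as plain dictionaries of strings; the function names `changed_fields` and `repair` are hypothetical and not part of any existing bot code.

```python
def changed_fields(verified, current):
    """Return {field: (verified_value, current_value)} for each verified
    field whose current value differs (current_value is None if missing)."""
    diffs = {}
    for field, good in verified.items():
        now = current.get(field)
        if now != good:
            diffs[field] = (good, now)
    return diffs


def repair(verified, current):
    """Return a copy of `current` with every changed field reset to its
    verified value. Fields without a verified value are left untouched."""
    fixed = dict(current)
    for field in changed_fields(verified, current):
        fixed[field] = verified[field]
    return fixed
```

The output of `changed_fields` would be enough for on-wiki reporting, while `repair` covers the reset; the 5 minute delay and the actual page edit are left out here.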

--Dirk Beetstra T C 18:15, 1 July 2008 (UTC)

Database format

I started this to have some fields to work with, but this is not a 'handy format'. Some suggestions to discuss:

'Comma separated', which is similar to what it is now. A line would look like:

Water=Water,0,100

Points:

  • Easy to read when there are not too many fields (but we have 50 fields).
  • Page would be huge in the end (for 4000+ compounds).
  • Not too sensitive to errors; one missing field on a compound would render only that line useless.
  • Relatively easy to update; many database programs can provide this output, and a simple find and replace can provide the proper format.
  • Only one (or a few) page(s) to render.
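
As a rough illustration of the error behaviour described above, the sketch below reads lines of the form `Water=Water,0,100` and skips only the lines with a wrong number of fields. The column order in `FIELDS` and the reading of the part before `=` as the page title are assumptions, and `parse_line` and `parse_database` are hypothetical names.

```python
import csv

# Assumed column order; the real database would list all ~50 fields here.
FIELDS = ["IUPACName", "MeltingPt", "BoilingPt"]


def parse_line(line):
    """Parse one 'Page=value,value,...' line into (page, {field: value})."""
    page, sep, rest = line.partition("=")
    if not sep or not page.strip():
        raise ValueError("missing 'Page=' prefix")
    # csv.reader copes with quoted values that themselves contain commas.
    values = next(csv.reader([rest]), [])
    if len(values) != len(FIELDS):
        raise ValueError("wrong number of fields")
    return page.strip(), dict(zip(FIELDS, (v.strip() for v in values)))


def parse_database(text):
    """Return (database, bad_line_numbers); a bad line loses only itself."""
    database, bad = {}, []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            page, record = parse_line(line)
        except ValueError:
            bad.append(number)
            continue
        database[page] = record
    return database, bad
```

Systematic names often contain commas, so such values would need quoting in this format; that is one more thing a find and replace step would have to handle.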

--Dirk Beetstra T C 18:15, 1 July 2008 (UTC)

XML is another format that is easy for a computer to read:

<?xml version="1.0" encoding="utf-8"?>
<compounds>
   <compound IUPACName="Water" MeltingPt="0" BoilingPt="100"/>
</compounds>

Points:

  • Easy to read, even with many fields, as every one is named.
  • MUCH bigger than the CSV above.
  • Per compound not sensitive to errors, though some typos (especially in the tags) may render the WHOLE database useless.
  • Easy to update; many database programs can provide this output.
  • Only one (or a few) page(s) to render.
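
The point about typos can be shown with a small sketch using Python's standard `xml.etree.ElementTree`. It assumes the layout from the example above (one `compound` element per compound, keyed by its `IUPACName` attribute); `load_compounds` and `try_load` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET


def load_compounds(xml_text):
    """Parse the whole database into {IUPACName: {attribute: value}}."""
    # Parsing bytes lets the parser honour the utf-8 XML declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {c.get("IUPACName"): dict(c.attrib) for c in root.iter("compound")}


def try_load(xml_text):
    """Return the parsed database, or None if the XML is not well-formed."""
    try:
        return load_compounds(xml_text)
    except ET.ParseError:
        # A single unclosed tag anywhere makes the parser reject everything.
        return None
```

A wrong value in one attribute only affects that compound, but a broken tag makes `try_load` return `None` for the entire page, which is the whole-database failure mentioned in the list.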

--Dirk Beetstra T C 18:15, 1 July 2008 (UTC)

data-sub-page

For each compound, create a sub-page in some easy to read/edit format, and use that as the base for the data. So the subpage for Water (molecule) could be Water (molecule)/Verified, which could contain:

IUPACName=Water
MeltingPt=0
BoilingPt=100

Points:

  • Easy to read, even when there are many fields.
  • Small; if a field in Water (molecule) gets edited, the bot only needs to read this data and check it.
  • Really low sensitivity to errors.
  • Difficult to update; a bot would have to go and update every subpage, which would be impossible when the pages get 'protected'.
  • Bot has to render on every edit, but the data-throughput would be small, so that can be done quickly.
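
Reading such a subpage could be as simple as the sketch below. It assumes one `Field=value` pair per line and the `/Verified` suffix suggested above; `verified_title` and `parse_verified` are hypothetical names, and fetching the page text from the wiki is not shown.

```python
def verified_title(article_title):
    """Title of the data subpage for an article (assumed '/Verified' suffix)."""
    return article_title + "/Verified"


def parse_verified(text):
    """Parse 'Field=value' lines into a dict, ignoring malformed lines."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields
```

A malformed line here costs only that one field of that one compound, which matches the low error sensitivity claimed for this format.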

--Dirk Beetstra T C 18:15, 1 July 2008 (UTC)