User talk:CheMoBot/Data
Latest comment: 16 years ago by Beetstra in topic Database format
General
- We could make the bot in such a way that it
- reports changes on-wiki.
- autoreverts changes of values (this may get resistance, we are the encyclopedia that anyone can edit).
- autorepairs changes (e.g. checks the fields 5 minutes after the 'offending' edit has been performed, and resets changed fields back to the verified value).
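To make the report/autorepair options a bit more concrete, here is a minimal sketch of the comparison step only. It assumes the verified and current field values are already available as plain dictionaries; the function names and the changed boiling point in the example are hypothetical, and no actual wiki access is shown.

```python
def find_changed_fields(verified, current):
    """Return {field: (verified_value, current_value)} for every field that differs."""
    changed = {}
    for field, verified_value in verified.items():
        current_value = current.get(field)  # None if the field was removed
        if current_value != verified_value:
            changed[field] = (verified_value, current_value)
    return changed


def autorepair(verified, current):
    """Return a copy of `current` with changed fields reset to the verified values."""
    repaired = dict(current)
    for field, (verified_value, _) in find_changed_fields(verified, current).items():
        repaired[field] = verified_value
    return repaired


# Hypothetical example: someone changed the boiling point of water.
verified = {"IUPACName": "Water", "MeltingPt": "0", "BoilingPt": "100"}
current = {"IUPACName": "Water", "MeltingPt": "0", "BoilingPt": "110"}

changes = find_changed_fields(verified, current)  # what the bot would report
repaired = autorepair(verified, current)          # what the bot would write back
```

The same `find_changed_fields` result could feed either the on-wiki report or the repair, so the choice between the options above would mostly be a policy decision, not a technical one.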
Database format
I started this to have some fields to work with, but this is not a 'handy format'. Some suggestions to discuss:
csv
'Comma separated', which is similar to what it is now. A line would look like:
Water=Water,0,100
Points:
- Easy to read when there are not too many fields (but we have 50 fields).
- Page would be huge in the end (for 4000+ compounds).
- Not too sensitive to errors, one missing field on a compound would render only that line useless.
- Relatively easy to update: many database programs can produce this output, and a simple find-and-replace can provide the proper format.
- Only one (or a few) page(s) to render.
--Dirk Beetstra T C 18:15, 1 July 2008 (UTC)
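As a rough illustration of how a bot might read this format, the sketch below parses lines of the form `Page=value,value,...`. The field order in `FIELD_NAMES` is an assumption (borrowed from the three fields used in the examples on this page), and the function names and the malformed line in the example are hypothetical.

```python
import csv
import io

# Assumed field order; the real list would have ~50 entries.
FIELD_NAMES = ["IUPACName", "MeltingPt", "BoilingPt"]


def parse_csv_line(line):
    """Parse one 'Page=v1,v2,...' line into (page, {field: value})."""
    if "=" not in line:
        raise ValueError("missing '=' in line: %r" % line)
    page, _, values = line.partition("=")
    row = next(csv.reader(io.StringIO(values)), [])
    if len(row) != len(FIELD_NAMES):
        raise ValueError("expected %d fields, got %d" % (len(FIELD_NAMES), len(row)))
    return page.strip(), dict(zip(FIELD_NAMES, row))


def parse_csv_database(text):
    """Parse all lines; a bad line is collected and skipped, the rest still load."""
    database, bad_lines = {}, []
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            page, fields = parse_csv_line(line)
        except ValueError:
            bad_lines.append(line)
            continue
        database[page] = fields
    return database, bad_lines


# The second line is a deliberately malformed, made-up entry with a missing field.
database, bad_lines = parse_csv_database("Water=Water,0,100\nBroken=Broken,0\n")
```

This matches the point about error sensitivity: the malformed line ends up in `bad_lines` while the entry for Water is still usable.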
xml
XML is another format that is easy for a computer to read:
<?xml version="1.0" encoding="utf-8"?>
<compounds>
  <compound IUPACName="Water" MeltingPt="0" BoilingPt="100"/>
</compounds>
Points:
- Easy to read, even with many fields as every one is named
- MUCH bigger than the csv above.
- Per compound not sensitive to errors, though some typos (especially in the tags) may render the WHOLE database useless
- Easy to update, many database programs can provide this output
- Only one (or a few) page(s) to render.
--Dirk Beetstra T C 18:15, 1 July 2008 (UTC)
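For comparison, a minimal sketch of reading the XML variant with Python's standard `xml.etree.ElementTree` module. The element and attribute names are the ones from the example above; the function name is hypothetical, and keying the result on `IUPACName` is an assumption.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<?xml version="1.0" encoding="utf-8"?>
<compounds>
  <compound IUPACName="Water" MeltingPt="0" BoilingPt="100"/>
</compounds>"""


def parse_xml_database(text):
    """Return {IUPACName: {attribute: value}} for every <compound> element."""
    # Encode to bytes so the parser honours the encoding declaration itself.
    root = ET.fromstring(text.encode("utf-8"))
    return {c.get("IUPACName"): dict(c.attrib) for c in root.iter("compound")}


database = parse_xml_database(XML_TEXT)

# A single typo in a tag makes the whole document unparseable.
broken_text = XML_TEXT.replace("</compounds>", "</compound>")
try:
    parse_xml_database(broken_text)
    whole_database_lost = False
except ET.ParseError:
    whole_database_lost = True
```

The second half shows the weakness mentioned in the points above: with a mismatched tag the parser raises `ET.ParseError` and nothing is loaded, not even the entries that were fine.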
data-sub-page
Create a subpage for each compound in some easy-to-read/edit format, and use that as the base for the data. So the subpage for water (molecule) could be water (molecule)/Verified, which could contain:
IUPACName=Water
MeltingPt=0
BoilingPt=100
Points:
- Easy to read, even when many fields are there
- Small: if a field in Water (molecule) gets edited, the bot only needs to read this subpage's data and check the edit against it.
- Really low sensitivity to errors
- Difficult to update: a bot would have to go and update every subpage, which would be impossible when the pages get 'protected'.
- Bot has to render a subpage on every edit, but the data throughput would be small, so that can be done quickly.
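A minimal sketch of how the subpage variant could be read, assuming one `field=value` pair per line and a `/Verified` suffix on the compound's page title as suggested above. Both function names are hypothetical, and fetching the subpage text from the wiki is left out.

```python
def verified_subpage_title(page_title):
    """Return the title of the data subpage for a compound page."""
    return page_title + "/Verified"


def parse_verified_subpage(text):
    """Parse 'field=value' lines into a dict; lines without '=' are ignored."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, _, value = line.partition("=")
        fields[name.strip()] = value.strip()
    return fields


title = verified_subpage_title("Water (molecule)")
fields = parse_verified_subpage("IUPACName=Water\nMeltingPt=0\nBoilingPt=100\n")
```

Because each subpage holds only one compound, a garbled line affects a single field of a single compound, which is where the low error sensitivity comes from.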