Wikipedia:Reference desk/Archives/Computing/2019 August 9

Computing desk
< August 8 << Jul | August | Sep >> August 10 >
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


August 9

edit

Follow-up to database operation question

edit

You might remember this question I asked back in late June about which way to handle a mass update in a database would be more efficient, when there are some rows that already have the values they are supposed to be updated with.

I just tested it at work with a mass of about one million rows, where only about one thousand rows would actually change their values.

It turns out it's much faster to first find out the rows that don't need updating and exclude them from the update, than to just update everything. I don't have any exact numbers, but the impression I got the time taken is in the order of tens of seconds vs. several hours. JIP | Talk 20:29, 9 August 2019 (UTC)[reply]

Wow, that's an impressive difference. But it's possible a future update of the DBMS may make that disappear, if they add a feature to "only do updates that result in a change". (I'm actually surprised they haven't optimized that aspect already.) Also, if others are accessing the table to be updated at the same time, then table locking could cause the time delays, since it could only lock the table when nobody else has a transaction pending. The smaller number of updates might only have to lock those records instead of the entire table. SinisterLefty (talk) 20:47, 9 August 2019 (UTC)[reply]
It is necessary to look at the plan to see what is taking time. It is very likely that indexing is taking a very long time. If you update a value to the same value, the DBMS knows it has been updated. The row is kicked to a heap of some kind and then reinserted into the index tree (the specifics depend on the actual DBMS). In doing that, you may need to rebalance your index. That takes time. You can avoid all that indexing stuff if you add to your where clause. Instead of "update mytable set x=5 where id=100", add "update mytable set x=5 where id=100 and x!=5". Now, it will be processed, but it won't be updated and it won't be reindexed. 199.164.8.1 (talk) 13:30, 12 August 2019 (UTC)[reply]
There could also be a flag to set, something like FORCE_UPDATE = false, which will tell it not to force an update where no change occurs. SinisterLefty (talk) 15:26, 13 August 2019 (UTC)[reply]