Wikipedia:Reference desk/Archives/Computing/2019 August 17

Computing desk
< August 16 << Jul | August | Sep >> August 18 >
Welcome to the Wikipedia Computing Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


August 17

edit

SQL update

edit

This is from a practical problem but part of the motivation is trying to get better at SQL in general, which should frame any perceived XY problem. I.e. there are only a few hundred rows involved so I could easily write a script that does it row by row, but I'm trying to figure out if it's possible to write a sane SQL update that does the following.

I have a table OLD with columns NAME and ADDRESS, and some of the people in it have moved, so I have another table NEW which also has NAME and ADDRESS (giving the new addresses of the people who have moved). Most of the people in OLD have not moved, so they don't have entries in NEW. I want to update OLD to replace the addresses of people who have moved with their new addresses. This is wrong SQL but is conceptually something like what I'm looking for:

UPDATE old SET address = new.address FROM old JOIN new ON old.name = new.name;

Any suggestions? Thanks. 173.228.123.207 (talk) 01:42, 17 August 2019 (UTC)[reply]

That's pretty much right, but you don't need to put the FROM clause in because you are calling the old table in the UPDATE statement, and you also need to disambiguate the address field, like this:

UPDATE old JOIN new ON old.name = new.name SET old.address = new.address;

--Canley (talk) 02:23, 17 August 2019 (UTC)[reply]
Wow, thanks! Sqlite3 complains of a syntax error near "JOIN" when I try that, but I'll keep loking into that approach (didn't know it was possible to update anything but a named table). Meanwhile I've messed up the table and have to rebuild it from the data sources, which should be easy except the scripts that build it have gotten disorganized so I have to straighten those out first. 173.228.123.207 (talk) 02:54, 17 August 2019 (UTC)[reply]
Oh OK, I didn't know you were using SQLite, that doesn't use JOINs the same way, so I think you would have to use a nested query, like this:

UPDATE old SET address = (SELECT address FROM new WHERE name = old.name) WHERE EXISTS (SELECT address FROM new WHERE name = old.name);

--Canley (talk) 04:15, 17 August 2019 (UTC)[reply]
Thanks! That worked, though I'll have to study it to figure out what it did. It hadn't occurred to me that you could use a select on the rhs of an = like that. It makes sense though. There must be a good place to read up on this stuff: it seems like every sql db has its own sql dialect and it's not really clear how anything works. I'll keep at it though. 173.228.123.207 (talk) 06:33, 17 August 2019 (UTC)[reply]
The term to read up on is "subquery". I find a lot of SQL material online to be very difficult to read, but this is a decent overview. This is an overview based on SQLite. Matt Deres (talk) 12:29, 17 August 2019 (UTC)[reply]

Getting the exact URL of a PDF file I want to archive on megalodon.jp

edit

Hi, guys! At https://www.webcitation.org/6XfufE6Jb?url=http://ci.nii.ac.jp/els/110000469807.pdf?id%3DART0000851255%26type%3Dpdf%26lang%3Den%26host%3Dcinii%26order_no%3D%26ppv_type%3D0%26lang_sw%3D%26no%3D1428634639%26cp%3D there is a PDF file that I want to make a second archive of, on http://megalodon.jp

However the archive seems to only show a blank page https://megalodon.jp/2019-0818-0337-28/https://www.webcitation.org:443/6XfufE6Jb ... I'd like a find a way to archive the actual PDF. However I need the actual URL of the PDF itself, and:

  • When I try to view only the PDF frame the resulting URL is "https://www.webcitation.org/mainframe.php" which obviously won't result in the actual PDF if I archive it.
  • And when I try "View Page Info" in Firefox, the PDF does not show up in the file list (otherwise it would as "embed").

How do I get the exact URL of the embedded PDF file? Thanks, WhisperToMe (talk) 18:41, 17 August 2019 (UTC)[reply]

It looks messy, i.e. the server is sending the pdf to mainframe.php based on a cookie set in the outer page or something like that. Simplest might be to set up your browser preferences for application/pdf to save the pdf to disk instead of previewing it. Then re-upload it to wherever you want your 2nd archive to be. The file is about 1.4MB so the download/upload is not too bad. 67.164.113.165 (talk) 08:21, 18 August 2019 (UTC)[reply]

Yes to explain the IP's answer in more detail, there is sometimes no "exact URL" and the only way that the server will serve you the content you want is when you browser tells the server certain things. This could be via cookies, or it could be via POST or referers or a bunch of other things. These can of course also interact with URLs, so in some cases there is an exact URL, it just won't work for most. (Although it can get complicated. For example for many of those download sites with a crap load of dodgy ads etc will generate URLs when you convince them to, which will maybe work for a limited time and maybe also only from your IP and/or only for a single attempt.) Nil Einne (talk) 16:48, 19 August 2019 (UTC)[reply]
BTW https://archive.is despite their questionable history here on wikipedia, is often good a archiving stuff which other sites have problems with although it doesn't work here [1]. Nil Einne (talk) 16:53, 19 August 2019 (UTC)[reply]

If you're really determined you can usually figure out a way to do scripted downloads from those sites, but if it's just for one file it's not worth the trouble. I had no trouble manually downloading the pdf using a browser. After that it's just a matter of re-uploading it to wherever. 67.164.113.165 (talk) 22:50, 21 August 2019 (UTC)[reply]

Thank you for the explanation! WhisperToMe (talk) 18:33, 11 September 2019 (UTC)[reply]