Computing desk

Welcome to the Wikipedia Computing Reference Desk Archives

May 4

This robots.txt confuses me

I asked the Internet Archive about a specific page from census.gov, whereupon the IA told me that it couldn't crawl the Census page because of its robots.txt file; okay, that's normal, but I'm curious to see what the page says. We begin with:

User-agent: *
Disallow: /

Robots exclusion standard tells me that this code tells all robots to stay away. Makes sense, but below this code are lots of instructions to Googlebot, Yahoo! Slurp, and Bingbot. (1) What are these instructions doing, i.e. as long as they're behaving properly, what do these bots do differently because of these instructions? (2) What's the point of these instructions, since all robots have already been instructed to stay away? Do they act as a whitelist for these three websites' bots? Nyttend (talk) 13:24, 4 May 2016 (UTC)

A number of online sources, such as this one, quote Google as saying "Each section in the robots.txt file is separate and does not build upon previous sections", although the URL they give for that no longer says those precise words. -- Finlay McWalter··–·Talk 14:18, 4 May 2016 (UTC)

Furthermore, compliance with the Robot Exclusion "standard" is entirely voluntary. The robots standard provides no technical enforcement to prevent a robot from ignoring any or all directives. Disallow... only helps with well-behaved robots. Nimur (talk) 18:13, 4 May 2016 (UTC)

Our article says that a number of crawlers permit "Allow" sections to override previous "Disallow" sections, even though this is not strictly conforming to the standard. So yes, it's whitelisting those bots. --71.110.8.102 (talk) 23:07, 4 May 2016 (UTC)
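For anyone who wants to poke at this, here is a minimal sketch using the robots.txt parser in Python's standard library. The Googlebot rule below is invented for illustration (it is not copied from the real census.gov file), and the URLs are only parsed locally, never fetched. It shows one parser's behaviour: a bot that finds a group naming it follows that group and ignores the catch-all "*" group, which is how the named bots end up effectively whitelisted. Other crawlers may interpret the file differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, loosely modelled on the one described above.
# The Googlebot rule is an assumption made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
"""


def can_fetch(agent: str, url: str) -> bool:
    """Return whether `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "ia_archiver"):
        for path in ("/data/", "/cgi-bin/query"):
            url = "http://www.census.gov" + path
            print(agent, path, can_fetch(agent, url))
```

With this file, Googlebot is only kept out of /cgi-bin/, while a bot with no group of its own (ia_archiver here) falls back to the "*" group and is kept out of everything.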

How do I find folders on my PC ?

This critical part of my Q went unanswered in my EasyPHP Devserver Q from a few days back. Can anyone help me locate these folders and add files to them ? StuRat (talk) 15:08, 4 May 2016 (UTC)

OK, Google Chrome seemed to recognize "localhost" when I typed it in the address bar. It took me to a page that says EasyPHP Devserver, so that's a good sign. It lists 3 folders there, all empty: my portable files, projects, and scripts. I went into "my portable files", and the address bar changed to "http://localhost/my%20portable%20files/". But here's where I hit a wall. I can't navigate there using the "My Computer" icon on the Desktop, and I can't find it using Start + search bar. So how do I place my hello.php there ? I tried dragging it into the folder in the Google Chrome window, but that didn't work.

I also found an option on the EasyPHP Devserver mini icon (down by the clock) labeled "Local Web", and that takes me to the same folders listed above (within Google Chrome), except the word "localhost" is replaced by "127.0.0.1". StuRat (talk) 04:40, 3 May 2016 (UTC)

"Localhost" and "127.0.0.1" are the name and address of the local loopback network for modern computers. Both mean "this computer" as opposed to some other computer. 209.149.115.199 (talk) 18:09, 4 May 2016 (UTC)

Somebody said something to the effect that EasyPHP Devserver "listens to port 80", if that helps any here. StuRat (talk) 15:09, 4 May 2016 (UTC)

It's probably C:\Program Files (x86)\EasyPHP-5.3.8.1\www. This link might help you: [1] CodeTalker (talk) 15:52, 4 May 2016 (UTC)

You might find this makes more sense and is easier to work through if you do this all through Cygwin, as described here [2]. Maybe that's a bad idea, but if so someone will probably tell you/me. I just feel like PHP's natural environment is unix-y and CLI-ish, but I'm really just making guesses ;) SemanticMantis (talk) 16:21, 4 May 2016 (UTC)

With EasyPHP running, you should have an EasyPHP icon in the taskbar. Click on it (right-left-middle-inner-outer - I don't know. I don't use Windows). Some click of some kind should get you into the settings. Under the Apache settings, you should be able to open the httpd.conf file in Notepad (or some sort of text editor if Windows doesn't use Notepad anymore). Look for DocumentRoot. The value following DocumentRoot is the root directory of the web documents. 209.149.115.199 (talk) 18:02, 4 May 2016 (UTC)
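If you'd rather not scroll through httpd.conf by eye, something like the following sketch could pull the DocumentRoot value out for you. The function name and the sample configuration text are made up for illustration; in practice you would read the real httpd.conf from wherever your install keeps it. It assumes the directive sits on one line, optionally quoted, and it skips lines that are commented out with "#".

```python
import re

# Matches a line such as:  DocumentRoot "C:/some/path"
# Apache directive names are case-insensitive, hence IGNORECASE.
_DOCROOT = re.compile(
    r'^\s*DocumentRoot\s+"?([^"\n]+?)"?\s*$',
    re.MULTILINE | re.IGNORECASE,
)


def find_document_root(conf_text):
    """Return the first DocumentRoot value in conf_text, or None."""
    match = _DOCROOT.search(conf_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical configuration fragment, not taken from a real install.
    sample = (
        "# DocumentRoot: where the served files live\n"
        'ServerRoot "C:/example/apache"\n'
        'DocumentRoot "C:/example/data/localweb"\n'
    )
    print(find_document_root(sample))
```

Whatever path comes back is the folder that "localhost" serves, so that is where a file like hello.php would go.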

Found it ! It's in:

C:\Program Files\EasyPHP-DevServer-13.1VC9\data\localweb

I even got my hello.php test script to work !

But I would still like to know if there is some general way I could search for something like this myself. For some reason, the regular Windows search function didn't work when I typed in the names of folders in that directory. What went wrong ? StuRat (talk) 19:17, 4 May 2016 (UTC)

I assume that Windows search only searches your personal folder. That was in the program files folder. 209.149.115.199 (talk) 19:21, 4 May 2016 (UTC)

Install Cygwin to get access to better search commands. SemanticMantis (talk) 20:19, 4 May 2016 (UTC)

I believe Windows Search, by default, only looks at the search index, and by default only user profile folders, the Start Menu, and Internet Explorer history are indexed. You can change what is indexed from the Control Panel. You can also explicitly tell Windows to do a full disk search. --71.110.8.102 (talk) 23:01, 4 May 2016 (UTC)
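As a general-purpose alternative that doesn't depend on the index at all, a short script can walk the disk and report every folder with a given name. This is only a sketch: the function name is made up, and the starting folder and the folder name searched for in the usage lines are examples based on the path found above. It is slow on a whole drive, and folders it has no permission to read are silently skipped.

```python
import os


def find_dirs(root, name):
    """Return paths of all directories under root whose name equals `name`.

    The comparison ignores case, since Windows file names are
    normally case-insensitive.
    """
    target = name.lower()
    matches = []
    # os.walk visits every subdirectory; unreadable ones are skipped
    # because no onerror handler is supplied.
    for dirpath, dirnames, _filenames in os.walk(root):
        for dirname in dirnames:
            if dirname.lower() == target:
                matches.append(os.path.join(dirpath, dirname))
    return matches


if __name__ == "__main__":
    # Example starting point and folder name; adjust to taste.
    for path in find_dirs("C:\\Program Files", "localweb"):
        print(path)
```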

@StuRat: I use PowerGrep, because fuck Windows Search. PowerGrep is awesome, and that is my totally unbiased opinion. If you do not believe me you can download it from a torrent site and use it for free for a while, I did, but then I ended up using it so often that I actually bought the software to support the creator (who happens to be the sexiest man alive, and a genius). I also recommend his regex tools. The Quixotic Potato (talk) 23:45, 4 May 2016 (UTC)

Wikipedia:Reference desk/Archives/Computing/2016 May 4
