Talk:Newline
This article is rated C-class on Wikipedia's content assessment scale.
Flat file vs. Text processing / Desktop publishing
The article currently describes only the usage in text files/flat files. Somebody should add a section about Text processing/Desktop publishing as well. User:ScotXWt@lk 12:25, 17 December 2017 (UTC)
Merge
Please discuss merging this article with End-of-line in the "Merge?" section. — Preceding unsigned comment added by Chris Chittleborough (talk • contribs) 06:54, 6 September 2006 (UTC)
- Not that there's anything to discuss any more, as the merger has been done. Guy Harris (talk) 06:23, 23 August 2017 (UTC)
Examples to convert line endings
The article has examples of how sed handles line endings. They work only with some variants of sed: they require non-POSIX extensions (such as those available on most Linux systems). For example, the variant described at http://www.grymoire.com/Unix/Sed.html#uh-nl doesn't work with the version of sed shipped with Mac OS X. — Preceding unsigned comment added by 195.4.206.111 (talk • contribs) 20:28, 22 September 2009 (UTC)
1958 version of ASCII
This requires further explanation, because of an apparent contradiction with "1960 Originated what have become the ASCII and ISO character codes." on http://www.unt.edu/isrc/Faculty/FacultyFellows/bemer.htm, and with the material on http://www.bobbemer.com. Also, http://homepages.cwi.nl/~dik/english/codes/stand.html says that the first standardised version dates from 1963. — Preceding unsigned comment added by 213.119.232.254 (talk • contribs) 16:13, 13 December 2003 (UTC)
AIX
AIX is a Unix variant after all. Does it really follow OS/360?? Yaron 21:47, Aug 2, 2004 (UTC)
- OS/360 and Unix have nothing in common except that OS/360 was developed by IBM before Unix existed and AIX is a distribution of Unix from IBM. Windows is more like Unix than OS/360. Sam Tomato (talk) 07:36, 24 January 2017 (UTC)
Unclear text
While doing a thorough copyedit of this article, I marked a few places where the text was unclear to me with HTML comments. Please see the page source. - dcljr 08:10, 17 Sep 2004 (UTC)
- You write "what is X'15'??" I believe it's simply what a C programmer would call 0x15 -- it's a convention for hexadecimal I've seen in a few places. (Given that it appears in the EBCDIC section, perhaps it's an IBM-specific notation?) JTN 20:47, 2004 Oct 4 (UTC)
- See IBM Knowledge Center - ASCII and EBCDIC character sets. In EBCDIC, hexadecimal 15 is new-line and hexadecimal 25 is line feed. However I find an IBM conversion table that is more confusing. Sam Tomato (talk) 07:56, 24 January 2017 (UTC)
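The two EBCDIC values mentioned above can be checked with a small Python sketch. It assumes that Python's built-in cp037 codec (EBCDIC US/Canada) is a fair stand-in for "EBCDIC"; other EBCDIC code pages may map these bytes differently, and the helper name ebcdic_to_unicode is made up for this illustration.

```python
def ebcdic_to_unicode(byte_value: int) -> str:
    """Decode a single EBCDIC byte using Python's built-in cp037 codec."""
    return bytes([byte_value]).decode("cp037")


# X'15' is the EBCDIC new-line character, X'25' is the EBCDIC line feed.
for value in (0x15, 0x25):
    print(f"X'{value:02X}' -> {ebcdic_to_unicode(value)!r}")
```

Under this codec, X'15' decodes to U+0085 (NEL, "next line") and X'25' to U+000A (LF), which matches the distinction drawn in the IBM documentation.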
Solutions
Would be nice if this article included or linked to methods for converting the various formats since this is still occasionally problematic. I doubt my long-standing method of running a sed command from vi involving CTRL-V CTRL-M is optimal... --16:13, 17 May 2005 (UTC) — Preceding unsigned comment added by Chinasaur (talk • contribs) 16:13, 17 May 2005 (UTC)
I just added a Perl program and a couple of other conversion hints, including one for Emacs. Hope that helps? It's the best I know. Grem 10:43, 13 September 2005 (UTC)
Merge?
This has been a problem for some time, and it's only getting worse. The articles Line feed, CRLF, Carriage Return, and Newline all contain pretty much the same information, just phrased differently -- I propose we merge and redirect all other articles to this one, as it is the most complete and has a platform-neutral title. Comments? 63.188.144.35 19:16, 11 Jun 2005 (UTC)
- I'm not entirely convinced that line feed and carriage return shouldn't retain their own articles, as their distinct semantics don't quite sit right under newline. I agree that all the stuff about line-ending conventions, including the whole of CRLF, should come to newline, though, with prominent pointers from the individual articles. -- JTN 22:48, 2005 Jun 11 (UTC)
- I agree. It was just bad to begin with these separate articles for very closely related topics. Today, line feed and carriage return simply denote a newline, and historical backgrounds can be easily put in each corresponding section. -- Taku 22:53, Jun 11, 2005 (UTC)
Well, it seems there's consensus for CRLF at the very least, so I'll go ahead and merge it in. As for line feed and carriage return, I realize they have historical significance, but it's not exactly a practical usage nowadays. I think the historical implications of the terms can be best described in the Typewriter article; it already explains that the terms have been adopted amongst computer users. 63.188.168.95 17:48, 12 Jun 2005 (UTC)
This article should be merged with End-of-line, which has a {{mergeto|newline}} posted in it. --Mareklug talk 18:20, 5 September 2006 (UTC)
- Yes, merge CWC(talk) 06:44, 6 September 2006 (UTC)
- Yes. End-of-line could remain as a pointer to the Newline article, but I think the content fits best in newline: \js 17:04, 7 November 2006 (UTC)
- End-of-line has been merged here. Guy Harris (talk) 06:24, 23 August 2017 (UTC)
"Line anchors"
I haven't found any reference to "line anchors" on the web, except in the context of regular expressions, where the term seems to be used for the '^' and maybe the '$' zero-width assertions when in multi-line mode. While these constructs obviously work with newlines, they are not newlines, so I think the statement in the intro that newlines are sometimes called line anchors is wrong. If there is some other use of the term line anchor that I'm not aware of, could somebody please cite a source? --K. Sperling 10:38, July 25, 2005 (UTC)
- I thought that I heard the term "line anchor" somewhere I don't remember. I will restore the mention of this if I find some reference. -- Taku 23:18, July 25, 2005 (UTC)
Entering Special Characters / Editor Treatment and Conversion
I'm going to remove these two sections, because the content is inaccurate and partially also irrelevant. I wanted to give a more detailed explanation here first, though:
- ctrl-j / ctrl-m: These aren't codes for LF and CR. The ctrl-? / C^? / \c? notation / keyboard translation / escape sequence is just a way of referring to a character with a certain number. J is the 10th letter in the alphabet (counting from A=1), so ctrl-J is simply the character with number 10, i.e. 0x0A. Similarly, Ctrl-M is 0x0D. Saying that these are "usually" LF and CR is wrong, unless you assume that computers "usually" use ASCII.
- Entering them in vi/emacs/etc: I don't think this article is about teaching people how to enter control characters into various shells and editors.
- The Common Problems section already says that modern text editors generally recognize all flavours of CR / LF newlines; this obviously includes the mentioned vi, emacs and eclipse (even though some people might not consider vi modern ;-). I don't think the article needs to mention how to perform the conversion in emacs specifically, though.
- The perl one-liner will actually work on UNIX. Whether or not it works on Cygwin depends on the Cygwin configuration: If Cygwin is configured to use DOS/Windows newlines, it won't work, because the script won't see any CRs on input and they will be re-added on output (Perl uses the same text/binary IO modes as C does, and files are in text mode by default).
- Neither GNU make nor bash ignore a final unterminated line in the versions I have tested (GNU Make 3.80, GNU bash 2.05b.0(1)-release). The only program I can think of off-hand that still has this bug is cron.
- And last but not least, questions should go on the talk page, never in the article!
--K. Sperling (talk) 23:53, 13 September 2005 (UTC)
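The numbering argument in the first point can be made concrete with a short Python sketch. It assumes the common ASCII terminal convention in which Ctrl-<letter> is obtained by keeping only the low five bits of the letter's code; the function name ctrl is hypothetical.

```python
def ctrl(letter: str) -> int:
    """Return the code sent by Ctrl-<letter> under the ASCII convention.

    Masking with 0x1F keeps the low five bits, which maps A..Z to 1..26.
    """
    return ord(letter.upper()) & 0x1F


print(hex(ctrl("J")))  # the code that ASCII assigns to line feed
print(hex(ctrl("M")))  # the code that ASCII assigns to carriage return
```

The result is only a number; whether that number means LF or CR depends on the character set in use, which is the point being made above.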
- ctrl-j / ctrl-m: These aren't codes for LF and CR. The ctrl-? / C^? / \c? notation / keyboard translation / escape sequence is just a way of referring to a character with a certain number. J is the 10th letter in the alphabet (counting from A=1), so ctrl-J is simply the character with number 10, i.e. 0x0A. Similarly, Ctrl-M is 0x0D. Saying that these are "usually" LF and CR is wrong, unless you assume that computers "usually" use ASCII.
Most computers DO usually use ASCII.
- Entering them in vi/emacs/etc: I don't think this article is about teaching people how to enter control characters into various shells and editors.
This is one of the most common questions I get. I would like to be able to point users at Wikipedia to help solve it. Should I start a new article? Can you suggest a title?
- The Common Problems section already says that modern text editors generally recognize all flavours of CR / LF newlines; this obviously includes the mentioned vi, emacs and eclipse (even though some people might not consider vi modern ;-). I don't think the article needs to mention how to perform the conversion in emacs specifically, though.
Again, conversion is a very common question, and it is not addressed elsewhere. Instead of just removing useful information, you might instead move it to a better place? Why is it wrong for the conversion section?
- The perl one-liner will actually work on UNIX. Whether or not it works on Cygwin depends on the Cygwin configuration: If Cygwin is configured to use DOS/Windows newlines, it won't work, because the script won't see any CRs on input and they will be re-added on output (Perl uses the same text/binary IO modes as C does, and files are in text mode by default).
Good point. I would be happy to see the one-liner in Conversion, with a mention that it will work under UNIX. Any reason why not?
- Neither GNU make nor bash ignore a final unterminated line in the versions I have tested (GNU Make 3.80, GNU bash 2.05b.0(1)-release). The only program I can think of off-hand that still has this bug is cron.
Who said GNU? The fact is that not everyone is using the latest version of all-GNU products. In any case, the caveat is useful even without the particular examples.
- And last but not least, questions should go on the talk page, never in the article!
I don't really understand this. As a reader, seeing a question in the article points me to relevant information where I am encouraged to edit and contribute. Isn't that the point of Wikipedia? There was some discussion of "publishable" versions, and I understand that questions look unprofessional in such a text. Which is more important?
Grem 11:35, 15 September 2005 (UTC)
- I'm well aware that many computers use ASCII, but it's also a fact that many don't. Especially seeing that there is a fair bit of confusion even among programmers (e.g. many people don't realize that CR and LF exist in other codes besides ASCII and have different numerical representations there), it's important not to gloss over these details. A statement like "A line feed is usually typed ctrl-j" is simply too imprecise in this context -- not only because it ignores the issue of character sets other than ASCII; in a GUI-based application (e.g. on Windows) pressing ctrl-j will often not produce any character at all.
- Bash is GNU Bash, and 2.05 is not the latest version, they're up to 3.something. GNU make is probably one of the more widely used make implementations, too. You can't just go claiming that "Some unix programs (like make and bash) will silently ignore the last line if there is no newline at its end.", listing bash and make (without naming any versions or specific make implementations) as examples when the problem doesn't exist in very widely used versions. It really wouldn't have hurt if you'd tried to verify these claims before adding them to the article. (Incidentally, the introduction already says that some programs have problems if the last line isn't NL terminated, without restricting it to Unix.)
- About editors (and conversion utilities), if you include Emacs and vi, why not also include pico, nano, Scite, KWrite, UltraEdit, ...? This article is about newlines, not about how to use one editor or another. Also see Wikipedia:What_Wikipedia_is_not, particularly "wikipedia is not instructive". I realize that viewing text files from other platforms is a somewhat common problem for end-users, so I think it's OK to have a few hints for the most commonly used platforms, and I've added one way to do it for Windows and listed two methods for Unix, but generally this is not what this article (or probably any article of an encyclopedia) is about. I don't see much reason for including a Perl version; there's already the comfortable dos2unix one mentioned, and tr (which is part of the POSIX standard, and available on practically every Unix platform) for where dos2unix isn't available. It could also be done with sed, awk, or even in plain bash; I don't see what merits inclusion of the Perl version. If you get asked about this often, and want to point people somewhere, why not point them to the manual of whatever editor they're using?
- As for the questions to prompt contributions, I don't have a link handy, but I'm fairly certain it's mentioned on some policy or guide. It's just not done, and the fact that you edited without being prompted by a question also proves that it's not necessary ;-) --K. Sperling (talk) 13:38, 15 September 2005 (UTC)
Using the diff Program with Different Line Endings
When you use programs like diff to compare the text in two files which use different line endings, there are some ambiguities. Unlike most modern text editors, the original Unix diff program and GNU diff seem to think that the files differ even though the content, apart from the line endings, is the same. This makes porting between, for example, GNU/Linux and MS Windows more difficult. The http://www.GnuWin32.org/ port of the GNU utilities has changed this behaviour so that diff does not care whether CR/LF or just LF is used as the line ending. This seems useful to me, and I think that, just like the text editors, the diff program ought to treat line endings as just line endings and not care about their actual format when comparing two files. — Preceding unsigned comment added by 83.249.205.211 (talk • contribs) 19:01, 24 September 2005 (UTC)
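As an illustration of the behaviour asked for here (not of how any existing diff program is implemented), the following Python sketch compares two byte strings after splitting them into lines without their terminators. The function name diff_ignoring_eol and the labels "a" and "b" are made up for the example.

```python
import difflib


def diff_ignoring_eol(a: bytes, b: bytes) -> list:
    """Unified diff of two texts, treating CRLF, CR and LF as equivalent."""
    # bytes.splitlines() splits on \n, \r and \r\n and drops the terminator,
    # so only the visible content of each line is compared.
    a_lines = [line.decode("latin-1") for line in a.splitlines()]
    b_lines = [line.decode("latin-1") for line in b.splitlines()]
    return list(difflib.unified_diff(a_lines, b_lines, "a", "b", lineterm=""))


dos_text = b"one\r\ntwo\r\n"
unix_text = b"one\ntwo\n"
print(diff_ignoring_eol(dos_text, unix_text))  # empty list: no differences
```

The trade-off is that a real change of line-ending style goes unreported, which is presumably why tools that compare bytes report it by default.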
Conversion using Windows Notepad/Edit?
It's true Notepad doesn't understand LF as "new line". But instead of using edit (and advising Windows users to use an old text-mode DOS program) I'd recommend WordPad. It opens files greater than 64 KiB (Notepad can't do this) and easily converts LF to CR LF with just open/save. WordPad is part of the standard Windows distribution and is for sure more Notepad-user-friendly than any DOS tool. I'd change the page myself, but, as can be seen in this text, my English skills won't suffice :)
- Hm, I don't have wordpad installed on my windows xp... maybe I manually de-selected it during the installation, I don't really remember. Maybe we should just mention both Wordpad and EDIT then. --K. Sperling (talk) 13:20, 1 October 2005 (UTC)
- From the command line (Windows Vista; this may also apply to other versions such as XP) WordPad is executed by the command "write" and there is no "wordpad" command. The advice above about using WordPad in Windows to convert line endings is valid and useful, but should mention that to execute WordPad from the command line the user needs to enter "write", as in C:>write sample.txt; otherwise it may appear that WordPad is not installed. --Pdegregorio (talk) 12:40, 7 May 2009 (UTC)
Is the DOA mnemonic Original Research?
I'm troubled by one of my own inclusions on this page. I have used the DOA mnemonic, which I added to the Newline in programming languages section, to help myself remember the hexadecimal equivalent of CR-LF in assembly language using the debug program on Windows. It is short, simple, and (I think) useful to programmers; but in the interest of fairness, my conscience requires I state:
- I have not seen this mnemonic mentioned in any book or website, and so it could be argued that it is unverifiable/OR (not suitable for Wikipedia); this was particularly a problem with my original phrasing, which seemed to indicate that it was widely-used (I've rephrased it).
- However, it is so simple that it could be thought of by anyone familiar with the hex code for CR-LF. In other words, it might have been thought of before and I am simply unaware of it.
- Short pieces of code and such are regularly contributed to technical articles (including this one) without any source, and some are obviously at least semi-original. This is a form of "mental code", if you will, to aid memory. We do not hunt down every minutely original thing, do we? (If we did, how could we do anything but copy exact phrasings of others... and wouldn't that violate copyright?)
I'm really up in the air about this one. Any thoughts of anyone else would be appreciated. BlueGuy213 04:52, 30 January 2007 (UTC)
- Any information which is not mentioned in any primary or secondary source (such as a textbook or research paper) is not appropriate for Wikipedia. That's covered by WP:OR. Also relevant: Wikipedia is not for things made up in school one day — which in the intro paragraph actually mentions "original mnemonics" as something that Wikipedia Is Not for. (I didn't know that that was there until just now, either.) WP:NOT also mentions that Wikipedia is not a "how-to" guide, so if an article contains a mnemonic, it should be because the mnemonic is encyclopedic (e.g., popular), not just "here's how to remember XYZ..."
- That particular mnemonic is a tad morbid, don't you think? I'm not surprised it's not used in any textbooks!
- I'm going to remove the "dead on arrival" mnemonic from the article, now that we've established that it's original research, and thus no citation can be provided. (Console yourself with the thought that since it's so simple, anyone who needs it will immediately think of it on their own, without needing help from Wikipedia. :) --Quuxplusone 08:17, 30 January 2007 (UTC)
- I agree that it is somewhat morbid, but it has been useful to me also, so I was hoping it could be kept. I guess if policy specifically says no original mnemonics, then I've got no legs to stand on. Oh well, I'll move on to other things... but maybe someday I'll write a computer book using it (and then maybe somebody else will add it back)! 75.5.199.76 09:59, 30 January 2007 (UTC)
- The previous post was mine (forgot to sign in). BlueGuy213 10:01, 30 January 2007 (UTC)
Conversion utilities
The overwriting example given like this:
cat filename | tr -d '\r' > filename
only worked because both cat's own buffering and the in-kernel pipe buffer meant that the beginning of the file could be read and eventually given to tr before the shell truncated the file and set it up as tr's STDOUT. You would lose the rest of your file if it was larger than those buffers (usually no more than a few tens of KB). It doesn't work because the file is truncated before you begin to overwrite it, but it might have worked somewhat if it were just overwritten, since the new chunks would only be 1 char larger or smaller per line and the buffering would have allowed enough slack to prevent unread parts from being overwritten. But that would still have been risky. —Preceding unsigned comment added by 24.200.77.59 (talk • contribs)
- True enough (depending on the OS and a bunch of other stuff, obviously). I've removed the useless use of cat. (Gee, that's a silly redirect, but it does seem to get used occasionally...) If some reader doesn't realize that infile and outfile must be different files on some operating systems, then that reader probably isn't going to be using tr to port text files between different OSes in the first place. There's no need to clutter the article itself with irrelevant technical minutiae. --Quuxplusone 06:06, 25 April 2007 (UTC)
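For readers wondering what a safe "same file in, same file out" conversion could look like, here is a minimal Python sketch. It is not what the article recommended; it simply avoids the truncation race described above by reading the whole input first and writing to a temporary file that is then renamed over the original. The function name strip_cr_in_place is hypothetical.

```python
import os
import tempfile


def strip_cr_in_place(path: str) -> None:
    """Rewrite the file at `path` with every CR byte removed."""
    with open(path, "rb") as src:
        data = src.read()  # read everything before any output file exists
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so that the final
    # rename stays on one filesystem and replaces the original in one step.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst:
            dst.write(data.replace(b"\r", b""))  # like: tr -d '\r'
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # do not leave the temporary file behind
        raise
```

Note that the replacement file gets the permissions of the temporary file rather than those of the original, and that the whole file is held in memory; a production tool would need to handle both.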
Terminal conventions.
Does anyone know what conventions were used by terminals like the VT100? I'm aware the IBM PC compatibles had a ROM BIOS that converted the key code from what is now called the "Enter" key into CR (0x0D, Ctrl-M), but what did standard serial terminals send?
- Usually CR, but you could tell the terminal to send LF instead using an escape sequence. CWC 08:04, 5 July 2007 (UTC)
- They sent ^M, and required both ^M and ^J to go to the start of the next line. VT220's at least had a configuration option to make ^J act like both ^M and ^J, not sure about VT100. —Preceding unsigned comment added by Spitzak (talk • contribs) 16:51, 13 October 2009 (UTC)
Disambiguation
I have created a disambiguation page for line break and changed the link that redirects to newline from line break to line break (computing). I also have a copy of this page saved in case anyone is looking for it, but I'm pretty sure it's just as easy to find now. —Preceding unsigned comment added by Ark2120 (talk • contribs) 16:55, 18 October 2007 (UTC)
some modern Adobe products still exhibit the obsolete Mac OS 9 linefeed behavior
We encountered this issue while trying to svn checkin a Flash ActionScript file edited alternately on Mac OS X 10.4.10 with "Adobe Flash CS3" and "Adobe Flex beta 3", built on Eclipse. Adobe Flash CS3 saves with CR, while Flex saves with LF only (the right thing for a pseudo-UNIX like Mac OS X).
Anyhow, just thought Adobe should be publicly flogged for persisting Mac OS 9 behavior well past its obsolescence, and praised for moving in the right direction finally. —Preceding unsigned comment added by 208.72.192.23 (talk) 01:08, 22 November 2007 (UTC)
Office X does the "CR" thing too, if you export an Excel document as text it'll use CR line delimiters. I don't think CR is "obsolete Mac OS 9" behavior; Mac OS X is a platform consisting of multiple glued together systems, and as such it has more of these kinds of problems than most other platforms. Adobe shouldn't be flogged so much as Apple for not laying down a standard and ensuring the Carbon and Cocoa APIs encourage that standard's use. --Squiggleslash (talk) 15:27, 22 November 2007 (UTC)
Programming Helpers
I added a bit about special handling of newlines during program execution to the "Newline in programming languages" section. It feels a bit wordy to me, but is the best I could explain it. Someone more gifted with English may want to clean it up. badmonkey (talk) 03:58, 18 December 2007 (UTC)
I am deleting the part about C++ std::endl. Using it "because '\n' isn't portable" is a very frequent bug in C++ programs, hurting performance by forcing a stream flush for every line written. I see the section correctly mention what std::endl does near the top, but further down it lists it among things "[faciliting] newlines during program execution", which is plainly wrong. JöG (talk) 10:49, 30 March 2008 (UTC)
AT command
When accessing modems using the AT command set, instructions are terminated with a carriage return character. You can read this in Command and Data modes (modem) and Hayes command set. I have also tested this with minicom and an old POTS modem. Hence this should be mentioned, but my English is not good enough to edit an article on such a major topic. --84.156.100.251 (talk) 17:50, 20 March 2008 (UTC)
End of Line detection
In the bash and ksh93 shells the following will not work, as '\r\n' will be seen as the string 'rn'.
egrep -L '\r\n' myfile.txt # show UNIX style file (LF terminated)
egrep -l '\r\n' myfile.txt # show DOS style file (CRLF terminated)
For these shells, one needs to use the built-in shell expansion $'word':
egrep -L $'\r\n' myfile.txt # show UNIX style file (LF terminated)
egrep -l $'\r\n' myfile.txt # show DOS style file (CRLF terminated)
The file command should also detect the type of EOL used:
file myfile.txt
myfile.txt: ASCII text, with CRLF line terminators
Other tools make the EOL characters visible, for example:
od -a myfile.txt
cat -e myfile.txt
hexdump -c myfile.txt
--Ripat (talk) 13:51, 24 June 2008 (UTC)
- The grep commands in the article (the same as these above) do not work with bash 4.1.5 and grep 2.6.3, which is the second-latest stable version. -Pgan002 (talk) 18:42, 14 April 2011 (UTC)
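Since the grep approach depends on the shell and grep version, here is a version-independent way to do the same detection, sketched in Python. It works on raw bytes, so no text-mode translation can hide the CR characters; the function name count_line_endings is made up for this example.

```python
import re


def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone LF and lone CR terminators in `data`."""
    counts = {"CRLF": 0, "LF": 0, "CR": 0}
    names = {b"\r\n": "CRLF", b"\n": "LF", b"\r": "CR"}
    # CRLF is listed first in the pattern so that its two bytes are
    # matched as one terminator instead of a CR followed by an LF.
    for match in re.finditer(rb"\r\n|\n|\r", data):
        counts[names[match.group()]] += 1
    return counts


print(count_line_endings(b"dos line\r\nunix line\nold mac line\r"))
```

To check a real file, read it in binary mode, e.g. open("myfile.txt", "rb").read(), and pass the result in. A file with a single convention will show a non-zero count in only one of the three slots.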
Inherently smarter
Dear Sirs, I notice there are different ways to write a paragraph:
- No newline (\n) until the very end, maybe hundreds of characters along.
- Newlines every 80 characters or so. Then a pair of them at the end.
Please mention which format is inherently smarter. Jidanni (talk) 01:50, 2 February 2009 (UTC)
- Are you looking for ways to store multiple rectangular texts in a file? 2A01:119F:31B:5D00:CDA:B9B8:6D94:5EE (talk) 15:10, 22 September 2020 (UTC)
Microsoft Excel for Mac txt export
I recently noticed that the export plugin of Excel for Mac exports txt files with CR only. Maybe this should be mentioned, since it's a nasty source of errors for programmers who expect that LF and CR LF are all they need to handle. CatzHoek (talk) 20:51, 4 April 2010 (UTC)
- Please state the Excel version number. Hyungjin Ahn (talk) 05:22, 1 January 2011 (UTC)
Delete section 5.1 (Microsoft product)
Hi, I don't think that the section on one particular Microsoft product is relevant for this article. I am sure there are many more programs that have bugs / incorrect handling when it comes to newlines and this is not the place to document them. I suggest to remove section 5.1 and 5.1.1 Drdee (talk) 18:51, 3 June 2011 (UTC)
Clarification needed for operating system dependence
When one makes a statement such as "Windows uses CR+LF" or "UNIX uses LF", what precisely makes these operating systems dependent on the respective convention? Namely, are we referring to command line shells, native GUI text widgets, keyboard driver, etc? There are several examples of software that can interpret these characters however they wish (or give the option to the user). As such, it is misleading to say that "On this OS, xx is the newline sequence" when the more accurate statement would be "This software in the OS treats xx as the newline sequence". Otherwise please explain why it is that (without a hack of some sort), LF cannot be used as a newline on Windows. Ham Pastrami (talk) 01:42, 26 September 2011 (UTC)
- Yes, but the text says "...usually represent a newline..." which seems simple and accurate. Of course LF can be used as a newline on Windows, and I can't think of anything in the kernel which "uses CR+LF", but is there any text in the article that is wrong or misleading? Johnuniq (talk) 03:35, 26 September 2011 (UTC)
Well, the operating system dependence exists in libc, one of the most fundamental libraries of both OSes. For example, printf("\n") has different runtime behavior by default (Windows outputs CR+LF while Unix outputs LF). An application can bypass the libc convention on its OS, but it may then be considered non-native. — Preceding unsigned comment added by 130.126.60.252 (talk) 06:12, 27 October 2012 (UTC)
- libc is one of the most fundamental libraries of *nix, because by design and specification the C library is the OS library (see the POSIX specification). On Windows, MSVCRT.dll is the compatibility layer that makes it possible to program against the standard *nix library. When people first started porting C (a 'small language with only 32 keywords'), they used the OS library instead of the *nix library, but that made the code totally non-portable, which led to the definition of the 'standard C library'. The 'standard C library' (which includes the definition of 'text' and 'binary' modes) is not a definition of MS Windows.
- The behaviour of the file operations in the standard C library is important in the standard C library, because the string operations assume terminated strings. Other languages which do not use the standard C library for string handling do not have file operations designed for interoperability with the standard C string handling library.
- In DOS, there were two fundamental file APIs: 'text', a fast, forward-only character API which determined EOF by the presence of an EOF character, commonly used for line-oriented read and append; and 'binary', a slow block/record API with forward/reverse and edit to record position, which determined EOF by the file size recorded in the directory. This distinction does not exist in Windows. The Windows API (CreateFile, ReadFile, WriteFile etc.) does not include a 'text' mode. — Preceding unsigned comment added by 203.206.162.148 (talk) 07:34, 20 November 2012 (UTC)
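The point that the translation happens in a library layer rather than in the operating system can be demonstrated on any platform. The sketch below uses Python's io.TextIOWrapper, which is only an analogue of C's text mode and not the C library itself; the helper name encode_text is hypothetical.

```python
import io


def encode_text(text: str, newline: str) -> bytes:
    """Write `text` through a text-mode layer and return the raw bytes."""
    raw = io.BytesIO()
    # With newline="\r\n" every "\n" written is translated to CR+LF,
    # which is what a Windows-style text mode does. With newline="\n"
    # the character is written unchanged, as on Unix.
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # leave `raw` open when the wrapper is discarded
    return data


print(encode_text("hello\n", "\r\n"))  # Windows-style text layer
print(encode_text("hello\n", "\n"))    # Unix-style text layer
```

The same program text produces different bytes depending only on how the text layer is configured, with no involvement of the kernel.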
LF+CR or LFCR
The beginning (abstract?) of the article specifies "LF+CR, or LFCR" as options. Where is the authority for that? I doubt it is true. Sam Tomato (talk) 08:25, 24 January 2017 (UTC)
- I just fixed it. LF followed by CR was only used in a few rare early computers. Unless you work at a computer museum, you will only encounter:
- LF: (Unix/Linux/Mac OS/BSD)
- CR+LF: (Windows/DOS)
- CR: (Commodore/TRS-80/Apple II/Classic Mac OS)
- --Guy Macon (talk) 17:53, 16 December 2017 (UTC)
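Because only these three conventions matter in practice, converting any of them to LF takes two byte replacements. A minimal Python sketch, with the function name normalize_newlines chosen for illustration:

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    # CRLF must be handled first; replacing lone CRs first would turn
    # every CRLF into two LFs and insert blank lines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


print(normalize_newlines(b"windows\r\nclassic mac\runix\n"))
```

This deliberately ignores the rare LF+CR order mentioned above, which would come out as two line breaks.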
Notepad
Notepad interprets CR LF as a newline, but a lone CR or a lone LF will show as a graphic character. — Preceding unsigned comment added by PiotrGrochowski000 (talk • contribs) 08:53, 17 March 2015 (UTC)
- Not anymore: https://www.howtogeek.com/fyi/microsoft-finally-fixes-notepad-after-20-years-of-inadequacy/ https://www.zdnet.com/article/windows-notepad-finally-understands-everyone-elses-end-of-line-characters/ Chrisahn (talk) 18:44, 10 June 2018 (UTC)
- "Microsoft just announced yesterday that the upcoming update to Windows 10 is going to finally, after 20 years or so, fix Notepad" does not equal "not any more". We need to wait until they actually do it. --Guy Macon (talk) 19:10, 10 June 2018 (UTC)
- Added the cites, as we've waited long enough :-) Lent (talk) 06:52, 4 January 2023 (UTC)
- And rightfully. A line feed by itself is invalid in the .txt format. It's just that poorly programmed text editors produce tons of invalid text files every year. 160.86.19.55 (talk) 15:38, 23 January 2019 (UTC)
Not true. The .txt format does not define what a newline is.
According to IEEE Std 1003.1-2017 / POSIX.1-2017, the following definitions apply:
- 3.403 Text File
- A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character. Although POSIX.1-2017 does not distinguish between text files and binary files (see the ISO C standard), many utilities only produce predictable or meaningful output when operating on text files. The standard utilities that have such restrictions always specify "text files" in their STDIN or INPUT FILES sections.
- 3.206 Line
- A sequence of zero or more non- <newline> characters plus a terminating <newline> character.
- 3.243 Newline Character (<newline>)
- A character that in the output stream indicates that printing should start at the beginning of the next line. It is the character designated by '\n' in the C language. It is unspecified whether this character is the exact sequence transmitted to an output device by the system to accomplish the movement to the next line.
That "the character designated by '\n' in the C language" language gives us another important clue:
C escape sequences include:
\f -- Form Feed (often called Line feed)
\n -- New Line
\r -- Carriage Return
\t -- Horizontal Tab (often just called tab)
\v -- Vertical Tab
The new-line character \n is mapped to the OS-specific newline byte or byte sequence. That's because in Unix printers and terminals are abstracted in the operating system, and the operating system determines which byte sequences are generated for the device.
Do you want CR? Use \r
Do you want LF? Use \f
Do you want CRLF? Use \r\f
Do you want LFCR? Use \f\r
Do you want the output to automatically be the correct newline sequence for your operating system, whether it be Unix, Linux, Mac OS, BSD, Windows, DOS, Commodore, TRS-80, Apple II, Classic Mac OS or some rare early mainframe? Use \n
--Guy Macon (talk) 16:48, 23 January 2019 (UTC)
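As a rough illustration of the text-mode translation this comment is talking about, here is a minimal sketch in Python rather than C. Python's built-in `open()` has a `newline` parameter that controls what each written "\n" becomes on disk, which makes the effect easy to observe. The helper name `write_text_lines` and the sample strings are made up for this example; it is not a statement about how any particular C library does it.

```python
import os
import tempfile

# Code values of the escapes discussed in this thread (ASCII):
# "\r" is CR (13), "\n" is LF (10), "\f" is FF (12).
ESCAPE_CODES = {"\\r": ord("\r"), "\\n": ord("\n"), "\\f": ord("\f")}


def write_text_lines(lines, newline):
    """Write lines in text mode and return the raw bytes that reached the file."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # newline=... selects what every "\n" written by the program turns into.
        with open(path, "w", encoding="ascii", newline=newline) as f:
            for line in lines:
                f.write(line + "\n")
        # Read back in binary mode so no translation hides the stored bytes.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


unix_bytes = write_text_lines(["ab", "cd"], "\n")    # Unix-style text file
dos_bytes = write_text_lines(["ab", "cd"], "\r\n")   # DOS/Windows-style text file
print(ESCAPE_CODES)
print(unix_bytes)
print(dos_bytes)
```

The program writes the same single "\n" character in both cases; only the bytes stored in the file differ.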
- Form feed is often called line feed? Where is that?
- "Line feed" (LF) and "form feed" (FF) are separate ASCII characters - LF moves the paper up by one line (without returning the carriage to the left margin), and FF moves the paper up to the top of the next page.
- But, yes, there is no single standard format for text files:
- Multics, and Unix-like systems, use LF/NL (octal 12) at the end of a line;
- various DEC operating systems, CP/M, DOS, and Windows use CR-LF (octal 15 followed by octal 12) at the end of a line;
- the classic Mac OS used CR (octal 15) at the end of a line;
- OS/360 and successors, RSX-11, and VMS use either fixed-length records or variable-length records with a byte count and no end-of-line character, although VMS later added support for text-stream files;
- and so on, so "A line feed by itself is invalid in the .txt format." is a false statement if "the .txt format" means "text files" or "files with the extension .txt". It may be a true statement if it means "DOS/Windows text files", although, in practice, at least some software for Windows probably accepts lines of that format just to make life simpler for people who might have to deal with files from Unix-like systems, just as at least some software for Unix-like systems accepts lines ending with CR/LF, ignoring the CR, just to make life simpler for people who have to deal with files from Windows. Guy Harris (talk) 19:39, 23 January 2019 (UTC)
- @Guy Macon: No, you're thinking of something else? A correct statement is that \r = CR = 13 decimal in ASCII, \n = LF = 10, \f = FF = 12. CR = carriage return = move cursor to left margin; LF = linefeed = move cursor down one line; FF = formfeed = advance paper to top of next sheet when printing. In DOS-based systems a text file has lines separated by \r\n (CRLF). In Unix-based systems a text file has lines terminated by \n (LF, and software that manipulates such files treats LF in the same way that CRLF is treated in DOS). FF is never called linefeed (although, on the internet, someone probably uses that incorrect statement). The IP's comments above are totally wrong. I think Notepad in Windows 10 now treats \n (LF) as a line terminator. Johnuniq (talk) 00:17, 24 January 2019 (UTC)
- I made a silly error regarding form feed. I don't know what I was thinking, but what I wrote was dead wrong.
- Regarding \n, do you agree or disagree with [ https://en.cppreference.com/w/cpp/language/escape ] which says
- "The new-line character \n has special meaning when used in text mode I/O: it is converted to the OS-specific newline representation, usually a byte or byte sequence. Some systems mark their lines with length fields instead."
- Do you agree or disagree with Escape sequences in C, which says
- "Each escape sequence in the above table maps to a single byte, including \n. This is despite the fact that the platform may use more than one byte to denote a newline, such as the DOS/Windows CR-LF sequence, 0x0d 0x0a. The translation from 0x0a to 0x0d 0x0a on DOS and Windows occurs when the byte is written out to a file or to the console, but \n only creates a single byte within the memory of the program itself"
- Do you agree or disagree with IEEE Std 1003.1-2017, which says
- "Text File: A file that contains characters organized into zero or more lines. Line: A sequence of zero or more non- <newline> characters plus a terminating <newline> character. Newline Character: The character designated by '\n' in the C language. It is unspecified whether this character is the exact sequence transmitted to an output device by the system to accomplish the movement to the next line."
- Do you know of another standard that defines text files in such a way that the definition of newline disagrees with IEEE Std 1003.1-2017?
- Do you agree or disagree that, in the context of a text file in memory being written out to a file on disk, the definition of \n is OS specific? --Guy Macon (talk) 06:21, 24 January 2019 (UTC)
- I don't know anything about standards for text files but those quotes are correct for what happens in the real world. RSX-11 had a text file format whose name I have forgotten. Each line is prefixed by its length (a binary number), and there are no line terminators or separators. A C program might read characters from a text file using a standard library function. On RSX-11 that program would find \n as a line terminator despite the fact that there would normally be no CR or LF characters in the file on disk. That happens because the standard library translates operating system quirks into a common format. The OP seems to be about Notepad which does not necessarily follow C/Unix procedures, although I think the Windows devs finally made it work with a file using LF terminators to avoid the ridicule that had been rightly dumped on Notepad. Johnuniq (talk) 07:10, 24 January 2019 (UTC)
- And VMS supported at least that if not more. RMS-11 supported both variable and variable with fixed control; I'm not sure whether whatever its predecessor was called supported both. Guy Harris (talk) 08:37, 24 January 2019 (UTC)
- "do you agree or disagree with [ https://en.cppreference.com/w/cpp/language/escape ]"
- I agree with its description of the way C text-mode I/O works. I disagree with any and all claims that it describes anything whatsoever about the format of text files.
- "Do you agree or disagree with Escape sequences in C, which says"
- I agree with that, given that it explicitly says "This is despite the fact that the platform may use more than one byte to denote a newline, such as the DOS/Windows CR-LF sequence, 0x0d 0x0a. The translation from 0x0a to 0x0d 0x0a on DOS and Windows occurs when the byte is written out to a file or to the console", i.e. it explicitly says that it makes no claims about the format of text files.
- "Do you agree or disagree with IEEE Std 1003.1-2017"
- I agree that it describes what might be called "POSIX text files", although not all systems conforming to some version of POSIX use that format for all text files (some versions of z/OS are UNIX 95-certified, but if you have a file that was read in from a card deck back in the old days and never removed, it's probably a bunch of 80-byte fixed-length EBCDIC card images, with no EBCDIC newlines in it), and not all systems with text files are POSIX-compliant, so it isn't a general description of all text files.
- "Do you agree or disagree that, in the context of a text file in memory being written out to a file on disk, the definition of \n is OS specific?"
- If you rephrase that as "the definition of whatever a \n turns into when written to a file in a C program", I agree, except that it doesn't necessarily "turn into" anything directly - if the OS has a text file format in which a line is represented by, for example, a 2-byte or 4-byte binary octet count, followed by the octets of the line, with no line terminator character, the \n doesn't get mapped to a sequence of characters, it just indicates to the I/O library that the line ends, so you write out the length followed by the characters preceding the \n.
- So, again, the C conventions have nothing to do with any standard text file format; they're based on the Multics-derived UNIX conventions, but that doesn't require the target OS's text files to look anything like UNIX text files.
- I.e., there's no universal standard for text files, so both "lines in text files must end with just an LF character" and "lines in text files must end with a CR-LF pair", as general statements about text files, are false. Guy Harris (talk) 07:23, 24 January 2019 (UTC)
- Perfectly said, and that's without even mentioning EBCDIC. Johnuniq (talk) 08:52, 24 January 2019 (UTC)
- And thus the claim by the IP at the top of this thread that "A line feed by itself is invalid in the .txt format. It's just that poorly programmed text editors produce tons of invalid text files every year" is false as a general statement about text files.
- "There's no universal standard for text files" is exactly right. As an example, one of the first uses of text files (stored on punch cards instead of magnetic disks) was when IBM expanded on Hollerith's original punched card to store the following characters:
________________________________________________________________
/ -0123456789ABCDEFGHIJKLMNOPQR/STUVWXYZ . $ ,
12 / O OOOOOOOOO OOOOOO
11| O OOOOOOOOO OOOOOO
0| O OOOOOOOOO ? ? OOOOOO
1| O O O O
2| O O O O O ? ? O
3| O O O O O O O O
4| O O O O O O O O
5| O O O O O O O O
6| O O O O O O O O
7| O O O O O O O O
8| O O O O OOOOOOOOOOOOOOOOOOOOOOOO
9| O O O O
|__________________________________________________________________
- Source: [1]
- While this was unambiguously a "text file", the end of a line wasn't signified by a character. It was signified by running out of space on the 12x80 hole card. The space character was signified by no punches, so if you only put 10 characters on a card the remaining 70 characters were considered to be spaces. --Guy Macon (talk) 18:14, 24 January 2019 (UTC)
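To make the two terminator-free layouts mentioned in this thread concrete (a binary length count in front of each line, and fixed-width card images padded with spaces), here is a small Python sketch. The 2-byte little-endian count, the 80-column width, and the names `encode_counted`, `decode_counted` and `card_image` are assumptions chosen for illustration only; this is not the actual on-disk record format of RSX-11, VMS or OS/360.

```python
import struct

CARD_COLUMNS = 80  # assumed card width, matching the 80-column card described above


def encode_counted(lines):
    """Store each line as a 2-byte length followed by its bytes, with no terminator."""
    out = bytearray()
    for line in lines:
        data = line.encode("ascii")
        out += struct.pack("<H", len(data)) + data
    return bytes(out)


def decode_counted(blob):
    """Recover the lines from a blob produced by encode_counted."""
    lines = []
    pos = 0
    while pos < len(blob):
        (length,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        lines.append(blob[pos:pos + length].decode("ascii"))
        pos += length
    return lines


def card_image(line):
    """Pad a line with spaces to a fixed-width record; the record end is the line end."""
    if len(line) > CARD_COLUMNS:
        raise ValueError("line does not fit on one card")
    return line.ljust(CARD_COLUMNS)


blob = encode_counted(["ab", "", "cde"])
# What an I/O library could hand to a program that expects "\n"-terminated lines:
as_stream = "".join(line + "\n" for line in decode_counted(blob))
print(blob)
print(repr(as_stream))
print(len(card_image("HELLO")))
```

Neither stored form contains a CR or LF byte, yet a library can still present "\n"-terminated lines to the program reading them.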
Let me clarify. The txt format was made by Microsoft. Microsoft made Notepad to open and save valid txt files. It doesn't treat U+0000000A by itself as a newline, because a newline is only U+0000000D U+0000000A in the .txt format. Your examples of plaintext using different encoding are invalid because they aren't the .txt format. 124.59.78.196 (talk) 19:38, 26 January 2019 (UTC)
- And you know what is and isn't "valid" ... how? And you got your definition for "the .txt format" ... where? The rest of the world seems to think that there are all kinds of different text formats on different systems, and that there exists no One True Standard for text files. And who told you that "the txt format was made by Microsoft?" That's complete bullshit. CP/M was making extensive use of text files with the .txt extension years before Microsoft did, and I am pretty sure (but have not verified) that various mainframes did that even earlier. --Guy Macon (talk) 20:31, 26 January 2019 (UTC)
- Microsoft was known for its awesomeness (until Windows 8 and Windows 10 appeared). So, it's only natural, that they would develop all the things modern people take for granted, like the .txt, .bmp and .wav formats. 46.41.73.166 (talk) 06:51, 27 January 2019 (UTC)
- The above trolling was clearly compiled with inferior tools. My guess is that you used Visual Troll++, or possibly TurboTroll 2000.
- These first generation tools are quite limited, and there is a severe garbage-collection-related performance hit when you try optimizing the output of VT++ for flaming or insults.
- I suggest that you try the latest version of GTC; the Gnu Troller Collection. It is *the* standard when it comes to creating Trolls. It is also Free and Open Source, reentrant, and is fully compliant with the Triple Troll, Troll-On-Troll and TrollChow protocols. --Guy Macon (talk) 17:38, 27 January 2019 (UTC)
Intelligibility
Without knowing already what this article is about, I would not understand it at all. Currently the article begins with:
»In computing, a newline, also known as a line ending, end of line (EOL), or line break, is a special character or sequence of characters signifying the end of a line of text and the start of a new line. The actual codes representing a newline vary across operating systems, which can be a problem when exchanging text files between systems with different newline representations.«
- A link to character encoding should be there as well, to embed this whole thing somewhere (<- BTW, that article is also crap, prefer de:Zeichenkodierung.)
- character (computing) should be replaced with control character
- For practical purposes one could bother to mention the most frequent issue: to encode a line-break/newline in a text file UNIX uses 0a, Microsoft uses 0d 0a.
- printing User:ScotXWt@lk 10:51, 16 December 2017 (UTC)
- Some context for the title "wall of text" would be nice. This phrase is used as a title and never explained. I assume this refers to what you get in an editor if newlines are omitted. — Preceding unsigned comment added by Iohazard (talk • contribs) 14:22, 13 March 2018 UTC (UTC)
I moved around a lot of stuff and got rid of the "Wall of text" section (whose title was probably not on topic, but mild meta-level sarcasm). I hope the whole thing makes more sense now. Chrisahn (talk) 18:41, 10 June 2018 (UTC)
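For the 0a versus 0d 0a point raised above, a short Python sketch of how one might tell the conventions apart in raw file bytes. The function name `count_line_endings` and the sample data are invented for this example, and the counting rule (a CR immediately followed by LF counts once, as CRLF) is a simplifying assumption.

```python
def count_line_endings(data: bytes) -> dict:
    """Count LF-only, CR LF, and CR-only line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf   # LFs not preceded by CR
    cr_only = data.count(b"\r") - crlf   # CRs not followed by LF
    return {"LF": lf_only, "CRLF": crlf, "CR": cr_only}


unix_sample = b"one\ntwo\n"                      # 0a after each line
windows_sample = b"one\r\ntwo\r\n"               # 0d 0a after each line
mixed_sample = b"one\r\ntwo\nthree\r"

print(bytes.fromhex("0d 0a") == b"\r\n")
print(count_line_endings(unix_sample))
print(count_line_endings(windows_sample))
print(count_line_endings(mixed_sample))
```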
In popular culture?
End of line is significantly referenced in Tron and less so in Battlestar Galactica. Probably elsewhere. Is it worth a brief section? Titaniumlegs (talk) 21:42, 22 January 2020 (UTC)
Visual representations
There should be a section on visual representations of line breaks, such as these:
/ ⪪ ≡ ↵ ⏎  ␍ ␊
I know some of these are briefly mentioned, but there should be a section with examples of how these are used. — Preceding unsigned comment added by 77.61.180.106 (talk) 10:25, 2 June 2020 (UTC)
Escape codes
My contribution of the RS escape code (revision 962480569) was undone for "contradicting" this document: https://www.qnx.com/developers/docs/6.3.0SP3/dinkum_en/cpp/charset.html#Escape%20Sequences
I do not see how this contradicts that document. While C does not have a mnemonic escape sequence for the record separator character, a numeric sequence can be used (see the "Numeric Escape Sequences" section of the aforementioned document). The RS character is 30 in decimal (as documented within the article) which is 36 in octal, hence the \036 escape code. 68.117.55.155 (talk) 00:45, 15 June 2020 (UTC)
- The actual language was "This appears to contradict [ https://www.qnx.com/developers/docs/6.3.0SP3/dinkum_en/cpp/charset.html#Escape%20Sequences ]. Please discuss on talk page." I have restored your version. --Guy Macon (talk) 01:03, 15 June 2020 (UTC)
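The decimal-to-octal arithmetic behind the \036 escape can be checked mechanically. The sketch below uses Python, whose string literals accept the same three-digit octal escapes as C; the helper name `octal_escape` is made up for this illustration.

```python
def octal_escape(code: int) -> str:
    """Return the three-digit octal escape text for a character code, e.g. 30 -> \\036."""
    return "\\{:03o}".format(code)


RS = 30  # record separator, decimal value as given in the comment above

print(octal_escape(RS))          # escape text for RS
print(octal_escape(10))          # LF, "octal 12"
print(octal_escape(13))          # CR, "octal 15"
print("\036" == chr(RS))         # the octal escape denotes the same character
print("\036" == "\x1e")          # and so does the hexadecimal form
```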
IETF quote
Hello User:Guy Harris,
I can't say I understand your reasoning for changing back the capitalisation. MOS:CONFORM states:
Generally preserve bold and italics (see § Italics), but most other styling should be altered. Underlining, spac ing within words, colors, ALL CAPS, small caps, etc. should generally be normalized to plain text.
What is the importance of leaving those words capitalised? You say "the capitalisation gives the words their magic powers" (not a direct quote). I'm not sure what that means, or why it's important in this article. Please can you elaborate? Regards, DesertPipeline (talk) 08:43, 3 March 2021 (UTC)
- RFC documents form the standards for a lot of networking and related protocols (the text in question concerns newlines in emails). If an RFC says "must" it is a sort-of opinion, but if it says "MUST" it means that the associated text is part of the standard and must be followed by compliant systems. The edit summary mentions RFC 2119 which documents this. The caps are part of computer jargon. Johnuniq (talk) 09:16, 3 March 2021 (UTC)
- Very well then. Thank you for the explanation, User:Johnuniq. DesertPipeline (talk) 09:25, 3 March 2021 (UTC)
"Line feed" listed at Redirects for discussion
A discussion is taking place to address the redirect Line feed. The discussion will occur at Wikipedia:Redirects for discussion/Log/2021 July 28#Line feed until a consensus is reached, and readers of this page are welcome to contribute to the discussion. Shmuel (Seymour J.) Metz Username:Chatul (talk) 15:30, 28 July 2021 (UTC)
"Some text editors"
The last sentence of the lead paragraph of the article, "Some text editors set this special character when pressing the ↵ Enter key." (revision in question), felt very misleading to me, so I changed it. My gripe was that the wording of "some text editors" implies that this behavior is only relatively common, rather than virtually universal. Now, I know that there are, of course, situations where pressing enter doesn't produce a newline (of any type, be it LF or CR or CR+LF or whatever else), like in a single-line form element on a webpage it'll submit the form, and in an instant messenger it'll send a message, but still I maintain this sentence is misleading, implying that producing newlines isn't the key's primary purpose. In typical text-editing situations, whether in a text editor or a multi-line form element, Enter is the key that produces newlines, and it's a special case where it doesn't (But actually...). I changed it to "The newline is the control character (or sequence of characters) produced in most situations when the ↵ Enter key is pressed."
^(But actually...) Software either processes scancodes, that relate to physical keys, or characters, that are produced by the OS after doing keyboard layout mapping (and possibly going thru an IME). Text handling would typically use the latter (which automatically deals with keyboard layouts, different scripts etc), until it needs to handle stuff like the function keys, which don't correspond to any characters. The Enter key is kind of a mixed bag: it both means a literal newline character (unlike F1, which doesn't mean any character), but is also kind of a function key, as it has other common functions – namely, form submission, entering data. Still, my point here is: if Enter produces characters, it produces newlines, and implying it's just some text editors that do this is wrong. — oatco (talk) 00:14, 29 January 2022 (UTC)
- The stuff about the Enter key should be removed as misleading and unhelpful. It's a very simplistic interpretation of what text editors typically do. Also, per WP:LEAD, the introduction should be a summary of what is in the article and I don't see an explanation regarding Enter (or a reference). There is no mention of "newline" at Enter key (it does mention producing a "new line" which is more accurate). Johnuniq (talk) 01:02, 29 January 2022 (UTC)
- Good points. I removed the sentence entirely. oatco (talk) 18:14, 29 January 2022 (UTC)
- I went ahead and also removed the second paragraph, which was just the sentence "When displaying (or printing) a text file, this control character or sequence of characters causes the text editor to show the characters following it in a new line.", because it was just a rewording of the first sentence. Like the old version of the now-gone Enter key sentence, it was also specific to text editors. oatco (talk) 18:25, 29 January 2022 (UTC)
Teletypes requiring CR+CR+LF
Some older teletypewriters did require CR+CR+LF, i.e. a doubling of the CR code, to give the mechanics enough time to finish their movements. Though it might be that these are (or were) all pre-ASCII, 5-bit Baudot code machines.
I saw this requirement in action on machines used by the German military in 1993/94, though the machines themselves were far older. Some of the slightly newer machines had a special key to produce CR+CR+LF, if I remember correctly that key had three horizontal lines on it, just like the modern "hamburger" symbol used to bring up a menu on web pages and other current UIs. Should this be mentioned? -- 2003:C0:972B:6C00:1A07:7E20:B066:1ECB (talk) 22:30, 22 May 2023 (UTC)
- The need for padding characters is pretty well documented, I thought there was something about it here already. Do you know if the extra characters had to be CR or would NUL work? Spitzak (talk) 23:56, 22 May 2023 (UTC)
- I don't remember CR+CR+LF being used -- but NUL certainly was (on Teletypes, but also after some control sequences on display terminals). See the new article output padding. --Macrakis (talk) 17:44, 15 January 2024 (UTC)