Every programmer knows what a newline is, but many are not familiar with the details. In this post, I want to write down what I have learned about newline handling in various cases.

Newline characters on different platforms

For historical reasons, different platforms use different characters to signify a new line. On Windows, <CR><LF> is used to represent a newline. On Linux, <LF> (byte value 0x0A) is used. On older Mac systems [1], <CR> (byte value 0x0D) is used instead.

<CR> and <LF> come from the days when typewriters were used for printing text on paper. <CR> stands for carriage return, which means moving the carriage to its leftmost position. <LF> stands for line feed, which means moving the paper up a little so that you can type on the next line. Combined, these two actions start a new line that is ready for typing.
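The byte values above are easy to check from Python. The following is a minimal sketch; detect_line_ending is a hypothetical helper written for this post, not a standard library function, and it only reports the first convention it finds, so it is not reliable for files with mixed line endings.

```python
CR = b"\r"  # carriage return, 0x0D
LF = b"\n"  # line feed, 0x0A


def detect_line_ending(data: bytes) -> str:
    """Guess the newline convention used in raw bytes (hypothetical helper)."""
    # Check CRLF first, because it contains both CR and LF.
    if b"\r\n" in data:
        return "windows"
    if b"\r" in data:
        return "old mac"
    if b"\n" in data:
        return "unix"
    return "unknown"


print(CR.hex(), LF.hex())
print(detect_line_ending(b"one\r\ntwo\r\n"))
```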

Newline handling in Python

In Python 2, there is a universal newline mode: no matter what the line ending of the file is (Linux (LF), old Mac (CR), or Windows (CR-LF)), it is translated to \n when you read the file with the mode specifier rU.

In Python 3, things have changed. The old U mode specifier has been deprecated in favor of the newline parameter of the open() function. According to the documentation:

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

  • When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
  • When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
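The reading rules quoted above can be tried out with a small experiment. This is only a sketch: the file content and the helper name read_lines are made up for illustration. It writes raw bytes with three different line endings and reads them back with different newline values.

```python
import os
import tempfile

# Write raw bytes so that nothing is translated on the way out.
raw = b"unix\nwindows\r\nold mac\rend"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name


def read_lines(newline):
    """Read the test file in text mode with the given newline setting."""
    with open(path, "r", newline=newline, encoding="ascii") as f:
        return f.readlines()


lines_default = read_lines(None)  # universal newlines, endings translated to "\n"
lines_empty = read_lines("")      # universal newlines, endings kept as they are
lines_lf = read_lines("\n")       # only "\n" ends a line, nothing is translated

os.remove(path)

print(lines_default)
print(lines_empty)
print(lines_lf)
```

Note that with newline="\n" the lone \r no longer ends a line, so "old mac\rend" comes back as a single line.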

By default, when reading text files, newline is None, which means that every line ending in the file (\n, \r, or \r\n) is replaced by \n. If you are not aware of this behavior, you may get into trouble. For example, suppose you read a file with \r\n line endings on Windows and want to split the text into lines. If you write:

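import os

with open("some_file.txt", "r") as f:
    text = f.read()  # line endings are already translated to "\n" here
# Problem: on Windows, os.linesep is "\r\n", which no longer occurs in text.
lines = text.split(os.linesep)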

you will not be able to split the text into lines, because on Windows, os.linesep is \r\n. But Python has secretly translated the \r\n to \n!
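One way around this is to split on the translated newline instead of os.linesep. The sketch below creates its own temporary file with \r\n endings (the content is made up) so that it behaves the same on every platform; str.splitlines() is usually the most convenient choice because it also drops the trailing empty string.

```python
import os
import tempfile

# Create a file with Windows-style line endings, regardless of platform.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first\r\nsecond\r\nthird\r\n")
    path = tmp.name

# Default text mode: every "\r\n" is translated to "\n" while reading.
with open(path, "r", encoding="ascii") as f:
    text = f.read()
os.remove(path)

lines = text.splitlines()      # no trailing empty element
lines_split = text.split("\n")  # keeps an empty string after the last "\n"

print(lines)
print(lines_split)
```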

When writing files, you should also be aware that \n may be translated to the platform-dependent line ending.
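To see what actually ends up on disk, you can write the same string with different newline values and read the bytes back in binary mode. This is a minimal sketch; written_bytes is a hypothetical helper used only for this demonstration.

```python
import os
import tempfile


def written_bytes(newline):
    """Write "a\\nb\\n" in text mode and return the raw bytes stored on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", newline=newline, encoding="ascii") as f:
            f.write("a\nb\n")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


bytes_default = written_bytes(None)  # "\n" becomes os.linesep
bytes_lf = written_bytes("\n")       # no translation
bytes_crlf = written_bytes("\r\n")   # "\n" becomes "\r\n"

print(bytes_default, bytes_lf, bytes_crlf)
```

Passing newline="\n" explicitly is a simple way to get the same output file on every platform.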

Newline handling in different editors

Vim

When reading a file, Vim automatically detects its file format [2]. Vim then replaces the platform-dependent newline characters with a special internal mark for the end of each line. When writing the buffer content back to the file, Vim writes the actual newline characters based on the detected file format.

For example, if you open a file with Windows-style line endings (i.e., <CR><LF>), Vim will replace every <CR><LF> with its own newline mark. If you try to search for these two characters using their byte codes (\%x0A for <LF> and \%x0D for <CR>), you will not find anything. Nor can you find the <CR> character using \r in a file that Vim has correctly detected as a Windows file. When searching in Vim, \n is used to specify the end of a line, no matter what the actual newline character of the file is.

How do I show the <CR> characters in Vim then?

You can open a Windows file in Vim and use :e ++ff=unix to force Vim to treat it as a Unix file. Vim will then treat only the \n characters as line endings, so they do not appear as text in the buffer. The \r characters, however, are now treated as ordinary characters and are shown as ^M, so you can see them.

You can also press <Ctrl-V> and then <Enter> to type a carriage return character. You can then search for this character using \r.

A caveat in searching and replacing newlines

In Vim, \n is used to represent a newline when searching. If you want to insert a newline in the replacement, use \r instead [3].

Sublime Text

According to online discussions, Sublime Text also converts platform-dependent newlines to \n in memory. When writing to a file, it writes the newline characters that match the detected line-ending type.

Notepad++

Notepad++ is also a popular code editor. It can detect your line endings, but it will not replace the newline with \n. To show the newline characters in a file, go to View --> Show Symbol and turn on the option Show End of Line.

Conversion between different file formats

In Vim, you can use :set ff=<Format> to convert the current file to the desired format, where <Format> can be unix, dos or mac.

In Sublime Text, just choose the desired format from the bottom right status bar.

In Notepad++, go to Edit --> EOL Conversion and choose the desired file format.
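If you would rather convert line endings in a script than in an editor, a few bytes.replace calls are enough. The following is a rough sketch; convert_line_endings is a hypothetical helper, and it works on bytes so that Python's own newline translation does not get in the way.

```python
def convert_line_endings(data: bytes, target: bytes = b"\n") -> bytes:
    """Convert every CRLF, CR, or LF line ending in data to target."""
    # Normalize to LF first. CRLF must be replaced before a lone CR,
    # otherwise each CRLF would turn into two newlines.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if target == b"\n":
        return normalized
    return normalized.replace(b"\n", target)


mixed = b"a\r\nb\rc\n"
print(convert_line_endings(mixed))             # Unix style
print(convert_line_endings(mixed, b"\r\n"))    # Windows style
```

To convert a file, read it with open(path, "rb"), pass the bytes through the function, and write the result back with open(path, "wb").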

References

  1. Newer Mac systems also use the Unix-style newline character.
  2. In Vim, use :h fileformats, :h file-read and :h file-formats for more info about how Vim detects the file format and reads files.
  3. In the replacement, \n means the null character \0, which is shown as ^@ in Vim.