You are not printing the string; you are looking at the string's representation, which is how it would be written as a string literal:
>>> 'Hello\nWorld'
'Hello\nWorld'
>>> print 'Hello\nWorld'
Hello
World
>>> print repr('Hello\nWorld')
'Hello\nWorld'
Whenever you enter a variable name in the Python interactive interpreter (or IDLE), the interpreter echoes the representation of its value back to you:
>>> var = 'Hello\nWorld'
>>> var
'Hello\nWorld'
Printing the value, however, outputs to the same location, but is a different action altogether:
>>> print var
Hello
World
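The transcript above uses Python 2 print statements. As a minimal sketch of the same distinction, assuming Python 3, repr() gives the text the prompt echoes and str() gives the text that print() writes (the names as_repr and as_text are only illustrative):

```python
var = "Hello\nWorld"

as_repr = repr(var)  # the quoted, escaped form the interactive prompt echoes
as_text = str(var)   # the plain text that print() writes

print(as_repr)  # one line, with quotes and a visible \n
print(as_text)  # two lines, no quotes
```

The representation is three characters longer than the string itself: two for the quotes, and one because the single newline character is shown as the two characters \ and n.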
If I were to call a function that both printed something and returned a value, for example, you'd see both echoed to the screen:
>>> def foo():
...     print 'Hello\nWorld'
...     return 'Goodbye\nWorld'
...
>>> foo()
Hello
World
'Goodbye\nWorld'
In the above example, Hello and World were printed by the function, but 'Goodbye\nWorld' is the return value of the function, which the interpreter helpfully echoed back to me in the form of its representation.
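To see the two channels separately outside the interactive prompt, the printed text can be captured while the return value is kept in a variable. This is a sketch assuming Python 3; the names buffer, printed and result are illustrative only:

```python
import contextlib
import io


def foo():
    print("Hello\nWorld")    # side effect: writes to standard output
    return "Goodbye\nWorld"  # value handed back to the caller


# Capture whatever foo() prints, and keep what it returns.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = foo()

printed = buffer.getvalue()
print(repr(printed))  # what the function printed
print(repr(result))   # what the function returned
```

The printed text ends with an extra newline added by print(), while the return value is exactly the string given to return.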
Escape Characters
To insert characters that are illegal in a string, use an escape character.
An escape character is a backslash \ followed by the character you want to insert.
An example of an illegal character is a double quote inside a string that is surrounded by double quotes:
Example
You will get an error if you use double quotes inside a string that is surrounded by double quotes:
txt = "We are the so-called "Vikings" from the north."
To fix this problem, use the escape character \":
Example
The escape character allows you to use double quotes when you normally would not be allowed:
txt = "We are the so-called \"Vikings\" from the north."
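Escaping is not the only way to get double quotes into a string. As a small sketch (the variable names are illustrative), the same text can be written with single quotes or triple quotes as delimiters, and all three spellings produce an identical string:

```python
escaped = "We are the so-called \"Vikings\" from the north."
single_quoted = 'We are the so-called "Vikings" from the north.'
triple_quoted = """We are the so-called "Vikings" from the north."""

# All three literals describe the same characters.
same = escaped == single_quoted == triple_quoted
print(escaped)
print(same)
```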
Other escape characters used in Python:
Code | Result |
\' | Single Quote |
\\ | Backslash |
\n | New Line |
\r | Carriage Return |
\t | Tab |
\b | Backspace |
\f | Form Feed |
\ooo | Octal value |
\xhh | Hex value |
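A few rows of the table can be checked directly. The sketch below (illustrative names only) shows that each escape sequence, however many characters it takes in the source, becomes a single character in the string:

```python
octal_a = "\101"       # octal 101 is decimal 65, the code point of "A"
hex_a = "\x41"         # hexadecimal 41 is also decimal 65
backslash = "\\"       # two characters in the source, one in the string
newline_text = "a\nb"  # three characters: a, newline, b

print(octal_a, hex_a, len(backslash), len(newline_text))
```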
An escape sequence is a combination of characters (usually prefixed with an escape character) that has a non-literal interpretation: the characters in the sequence mean something other than the literal characters they contain. Most programming languages use a backslash \ as the escape character. It acts as the escape sequence initiator, and the character or characters that follow it are interpreted as part of the escape sequence. If an escape sequence stands for a non-printable character or a control code, the character it produces is called a control character.
List of escape sequences in Python:
\' | Single quote |
\" | Double quote |
\\ | Backslash |
\n | New line |
\r | Carriage return |
\t | Horizontal tab |
\b | Backspace |
\f | Form feed |
\v | Vertical tab |
\0 | Null character |
\N{name} | Unicode Character Database named lookup |
\uxxxx | Unicode character with 16-bit hex value XXXX |
\Uxxxxxxxx | Unicode character with 32-bit hex value XXXXXXXX |
\ooo | Character with octal value OOO |
\xhh | Character with hex value HH |
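The three Unicode forms in the table are interchangeable ways of naming the same code point. As a short sketch, using the Greek letter alpha (U+03B1) as an arbitrary example and illustrative variable names:

```python
by_name = "\N{GREEK SMALL LETTER ALPHA}"  # lookup by Unicode character name
by_u16 = "\u03b1"                         # exactly 4 hex digits
by_u32 = "\U000003b1"                     # exactly 8 hex digits

null_char = "\0"     # a real character with code point 0
vertical_tab = "\v"  # code point 11

print(by_name, by_u16, by_u32)
print(ord(null_char), ord(vertical_tab))
```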
The table above applies to Python. Different languages have different escape sequences and control characters, so it may not carry over to your language of choice. For example, the Windows command-line interpreter uses a caret (^) to escape characters, so the table does not apply there.
Escape Sequence Interpretation
Escape sequence interpretation happens when a backslash is encountered inside a string literal. The backslash and the character or characters that follow it are looked up in the table above. If a match is found, the sequence is removed from the string and replaced by its translation. If no match is found, the sequence is copied into the string unchanged, backslash included.
Example
Python3
print("I will go\tHome")
print("See you\jtomorrow")
Output:
I will go       Home
See you\jtomorrow
As the output shows, in the first print call the \t was resolved into a horizontal tab, so the characters \t themselves do not appear in the output. In the second print call the \j persists, because no legal resolution for that sequence exists.
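The lookup behaviour described above can be inspected with repr(). Newer Python versions warn about unrecognized sequences such as \j, so this sketch evaluates that literal at run time with warnings silenced; using eval() and the warnings module here is purely a choice made for the illustration, and the variable names are hypothetical:

```python
import warnings

tabbed = "I will go\tHome"  # \t is in the table, so it becomes one tab

# Evaluate a literal containing the unrecognized sequence \j at run time,
# so that any warning about it can be silenced locally.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    unknown = eval(r'"See you\jtomorrow"')

print(repr(tabbed))   # the tab is shown as \t
print(repr(unknown))  # the kept backslash is shown doubled
```

In the second result the backslash and the j are both still present as separate characters, which is why repr() displays the backslash in its escaped, doubled form.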
Preventing Escape Sequence Interpretation
There are instances where we don’t want the strings to behave in this way. In those cases, we generally want to preserve the backslashes. Some of the situations in which this may be required are:
- String contains a Network or Local path
- String contains regex, which would further be processed by the regex engine
Methods of Prevention
Method 1:
Consistently doubling the backslashes overcomes this problem. In this method, we manually find every backslash in the string and put another backslash immediately before it. This is tedious and only advisable for short strings.
Python3
s = "I love to use \t instead of using 4 spaces"
print(s)
s = "I love to use \\t instead of using 4 spaces"
print(s)
Output:
I love to use    instead of using 4 spaces
I love to use \t instead of using 4 spaces
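Listing the individual characters makes the effect of doubling visible. In this sketch (illustrative names), the doubled form contains a real backslash followed by an ordinary t, while the single form contains one tab character:

```python
tab_version = "a\tb"       # a, tab, b
literal_version = "a\\tb"  # a, backslash, t, b

tab_chars = list(tab_version)
literal_chars = list(literal_version)

print(tab_chars)
print(literal_chars)
```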
Method 2:
Using the r'...' or R'...' construct, commonly referred to as a raw string, which preserves escape sequences as literal characters. It does what the previous method did, but automatically (no manual editing required). To turn a normal string literal into a raw string, prefix it (before the opening quote) with an r or R. This is the method of choice for avoiding the escape sequence problem.
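Python3
s = "C:\Program Files\norton\appx"
print(s)
s = r"C:\Program Files\norton\appx"
print(s)
Output:
C:\Program Files
ortonppx
C:\Program Files\norton\appx
In the first string, \n is read as a newline and \a as the (invisible) bell character, while \P is left alone because it is not a recognized escape sequence.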
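A raw string and a string with doubled backslashes describe the same characters, and the raw form is also the usual way to pass patterns to the regex engine. The sketch below uses the path from the example above; the sample text and variable names are made up for illustration:

```python
import re

raw_path = r"C:\Program Files\norton\appx"
doubled_path = "C:\\Program Files\\norton\\appx"
paths_match = raw_path == doubled_path

# A raw string keeps regex escapes such as \d intact for the regex engine.
numbers = re.findall(r"\d+", "build 42, revision 7")

print(raw_path)
print(paths_match)
print(numbers)
```

One limitation to keep in mind: a raw string literal cannot end in a single backslash, so a path with a trailing backslash still needs the doubled form or string concatenation.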
Problems due to escape characters do not always show up as undesirable output; they can also cause errors. For example, the code below produces an error when executed.
Python3
print("C:\Users\Desktop\JSON")
Produces the following error:
    print("C:\Users\Desktop\JSON")
          ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
The error occurs because \U introduces an escape that must be followed by exactly 8 hexadecimal digits, which together give a 32-bit Unicode code point. Here the next character is s, which is not a hexadecimal digit, so the escape is truncated and the string cannot be decoded.
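Both prevention methods described earlier avoid this error. The sketch below first reproduces the failure safely by compiling the offending source text at run time (compile() is used only so the example file itself stays valid), then shows the raw and doubled spellings; the variable names are illustrative:

```python
# The source text of the failing literal, held in a raw string.
bad_source = r'"C:\Users\Desktop\JSON"'

try:
    compile(bad_source, "<example>", "eval")
    error_message = None
except SyntaxError as exc:
    error_message = str(exc)

# Either fix yields the intended path.
raw_fixed = r"C:\Users\Desktop\JSON"
doubled_fixed = "C:\\Users\\Desktop\\JSON"

print(error_message)
print(raw_fixed)
print(raw_fixed == doubled_fixed)
```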