Determines whether the string contains only whitespace characters
Usage
The isspace() method returns True if the string is nonempty and all characters in it are whitespace characters. Otherwise, it returns False.
Syntax
string.isspace()
Basic Example
# Check if the string contains only whitespace characters
S = ' '
x = S.isspace()
print(x)
# Prints True
S = ' a'
x = S.isspace()
print(x)
# Prints False
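One edge case follows from the definition above: the string must be nonempty, so an empty string is not reported as whitespace. The sketch below shows this, together with a small helper that treats empty strings as blank too. The name is_blank is hypothetical and not part of Python.

```python
def is_blank(text):
    """Return True if text is empty or contains only whitespace.

    is_blank is a hypothetical helper for illustration, not a built-in.
    """
    return text == '' or text.isspace()


print(''.isspace())      # Prints False
print(is_blank(''))      # Prints True
print(is_blank(' \t'))   # Prints True
print(is_blank(' a'))    # Prints False
```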
ASCII Whitespace Characters
The most common whitespace characters are space ' ', tab '\t', and newline '\n'. Carriage return '\r' and form feed '\f' are also considered whitespace characters.
S = ' \t \n \r \f '
x = S.isspace()
print(x)
# Prints True
Unicode Whitespace Characters
Some Unicode characters qualify as whitespace.
S = '\u2005 \u2007'
x = S.isspace()
print(x)
# Prints True
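Here is a list of the Unicode space separators (general category Zs). The control characters shown earlier, such as tab and newline, and the line and paragraph separators U+2028 and U+2029 also count as whitespace.
Unicode Whitespace Characters
Unicode Character | Description |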
U+0020 | Space |
U+00A0 | No-Break Space |
U+1680 | Ogham Space Mark |
U+2000 | En Quad |
U+2001 | Em Quad |
U+2002 | En Space |
U+2003 | Em Space |
U+2004 | Three-Per-Em Space |
U+2005 | Four-Per-Em Space |
U+2006 | Six-Per-Em Space |
U+2007 | Figure Space |
U+2008 | Punctuation Space |
U+2009 | Thin Space |
U+200A | Hair Space |
U+202F | Narrow No-Break Space |
U+205F | Medium Mathematical Space |
U+3000 | Ideographic Space |
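A table like the one above can be regenerated from the standard unicodedata module. This is a minimal sketch that assumes the table corresponds to the general category Zs. The exact output depends on the Unicode database bundled with your Python build.

```python
import sys
import unicodedata

# Collect every code point whose general category is Zs (space separator).
# The result depends on the Unicode database bundled with this Python build.
zs = [cp for cp in range(sys.maxunicode + 1)
      if unicodedata.category(chr(cp)) == 'Zs']

for cp in zs:
    # unicodedata.name() returns the official character name, e.g. 'EN QUAD'
    print('U+%04X | %s' % (cp, unicodedata.name(chr(cp))))
```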
In Python 3, string.whitespace is a pre-initialized string constant. It contains the characters space, tab, linefeed, return, formfeed, and vertical tab.
Syntax : string.whitespace
Parameters : Doesn’t take any parameter, since it’s not a function.
Returns : The characters space, tab, linefeed, return, formfeed, and vertical tab.
Note : Make sure to import the string module in order to use string.whitespace.
Code #1 :
import string

print("Hello")
result = string.whitespace
print(result)
print("Geeksforgeeks")
Output:
Hello
(the whitespace characters print here as blank space)
Geeksforgeeks
Code #2 : The given code tests each character of a sentence for membership in string.whitespace.
import string

Sentence = "Hey, Geeks !, How are you?"
for i in Sentence:
    # Print a message for every character that is ASCII whitespace
    if i in string.whitespace:
        print("printable Value is: " + i)
Output:
printable Value is: 
printable Value is: 
printable Value is: 
printable Value is: 
printable Value is: 
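The membership test in Code #2 only recognizes the six ASCII characters stored in string.whitespace. As a rough comparison, the sketch below checks a few sample characters against both the constant and str.isspace(). The sample characters are chosen only for illustration.

```python
import string

samples = [' ', '\t', '\u2005', '\u00a0']

# For each sample: (character, in the ASCII constant?, whitespace per isspace?)
results = [(ch, ch in string.whitespace, ch.isspace()) for ch in samples]

for ch, in_constant, by_method in results:
    print(repr(ch), in_constant, by_method)
```

The two Unicode spaces are missing from the constant but are accepted by isspace(), so the constant is not a substitute for the method when the input may contain non-ASCII text.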
Is there a Python constant for Unicode whitespace?
Short answer: No. I have personally grepped for these characters (specifically, the numeric code points) in the Python code base, and such a constant is not there.
The sections below explain why it is not necessary, and how it is implemented without this information being available as a constant. But having such a constant would also be a really bad idea.
If the Unicode Consortium added another character/code-point that is semantically whitespace, the maintainers of Python would have a poor choice between continuing to support semantically incorrect code or changing the constant and possibly breaking pre-existing code that might (inadvisably) make assumptions about the constant not changing.
How could it add these character code-points? There are 1,111,998 possible characters in Unicode. But only 120,672 are occupied as of version 8. Each new version of Unicode may add additional characters. One of these new characters might be a form of whitespace.
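For reference, the figure of 1,111,998 possible characters can be reproduced with a little arithmetic. This sketch assumes the usual accounting: all code points, minus the surrogates and the designated noncharacters.

```python
import sys

code_points = sys.maxunicode + 1      # 0x110000 = 1,114,112 code points
surrogates = 0xDFFF - 0xD800 + 1      # 2,048 surrogate code points
# 32 noncharacters in U+FDD0..U+FDEF, plus the last two code points
# (U+xxFFFE and U+xxFFFF) of each of the 17 planes
noncharacters = 32 + 2 * 17           # 66

possible_characters = code_points - surrogates - noncharacters
print(possible_characters)            # Prints 1111998
```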
The information is stored in a dynamically generated C function
The C function that determines what is whitespace in Unicode is generated dynamically by the following Python code.
# Generate code for _PyUnicode_IsWhitespace()
print("/* Returns 1 for Unicode characters having the bidirectional", file=fp)
print(" * type 'WS', 'B' or 'S' or the category 'Zs', 0 otherwise.", file=fp)
print(" */", file=fp)
print('int _PyUnicode_IsWhitespace(const Py_UCS4 ch)', file=fp)
print('{', file=fp)
print('    switch (ch) {', file=fp)
for codepoint in sorted(spaces):
    print('    case 0x%04X:' % (codepoint,), file=fp)
print('        return 1;', file=fp)
print('    }', file=fp)
print('    return 0;', file=fp)
print('}', file=fp)
print(file=fp)
This is a switch statement, which is a constant code block, but this information is not available as a module "constant" like the string module has. It is instead buried in the function compiled from C and not directly accessible from Python.
This is likely because as more code points are added to Unicode, we would not be able to change constants for backwards compatibility reasons.
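The comment emitted by the generator states the rule: bidirectional type 'WS', 'B' or 'S', or category 'Zs'. That rule can be approximated from Python with the unicodedata module. In the sketch below, matches_rule is a hypothetical helper, and whether its result agrees exactly with str.isspace() on your interpreter is an assumption that the last line checks rather than guarantees.

```python
import sys
import unicodedata


def matches_rule(ch):
    """Apply the rule from the generated comment to one character.

    Hypothetical helper for illustration; not part of CPython.
    """
    return (unicodedata.bidirectional(ch) in ('WS', 'B', 'S')
            or unicodedata.category(ch) == 'Zs')


rule_points = [cp for cp in range(sys.maxunicode + 1)
               if matches_rule(chr(cp))]
isspace_points = [cp for cp in range(sys.maxunicode + 1)
                  if chr(cp).isspace()]

print(['U+%04X' % cp for cp in rule_points])
# Whether the two lists agree depends on the interpreter's Unicode data.
print(rule_points == isspace_points)
```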
The Generated Code
Here's the generated code currently at the tip:
int _PyUnicode_IsWhitespace(const Py_UCS4 ch)
{
    switch (ch) {
    case 0x0009:
    case 0x000A:
    case 0x000B:
    case 0x000C:
    case 0x000D:
    case 0x001C:
    case 0x001D:
    case 0x001E:
    case 0x001F:
    case 0x0020:
    case 0x0085:
    case 0x00A0:
    case 0x1680:
    case 0x2000:
    case 0x2001:
    case 0x2002:
    case 0x2003:
    case 0x2004:
    case 0x2005:
    case 0x2006:
    case 0x2007:
    case 0x2008:
    case 0x2009:
    case 0x200A:
    case 0x2028:
    case 0x2029:
    case 0x202F:
    case 0x205F:
    case 0x3000:
        return 1;
    }
    return 0;
}
Making your own constant:
The following code (from my answer here), in Python 3, generates a constant of all whitespace:
import re
import sys
s = ''.join(chr(c) for c in range(sys.maxunicode+1))
ws = ''.join(re.findall(r'\s', s))
As an optimization, you could store this in a code base, instead of auto-generating it every new process, but I would caution against assuming that it would never change.
>>> ws
'\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f \x85\xa0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000'
(Other answers to the question linked show how to get that for Python 2.)
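As a possible alternative to the regular expression, the same kind of string can be built by asking str.isspace() about each code point directly. This is only a sketch. The name ws_alt is made up here, and the result reflects whatever Unicode data your interpreter ships with.

```python
import sys

# Keep every code point that str.isspace() accepts on this interpreter.
ws_alt = ''.join(chr(c) for c in range(sys.maxunicode + 1)
                 if chr(c).isspace())

print(len(ws_alt))
print(ws_alt.isspace())   # Prints True
```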
Remember that at one point, some people probably thought 256-character encodings were all that we'd ever need.
>>> import string
>>> string.whitespace
' \t\n\r\x0b\x0c'
If you're insisting on keeping a constant in your code base, just generate the constant for your version of Python, and store it as a literal:
unicode_whitespace = u'\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f \x85\xa0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000'
The u prefix makes it Unicode in Python 2 (2.7 happens to recognize the entire string above as whitespace too), and in Python 3 it is ignored as string literals are Unicode by default.
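If you do store such a literal, one way to guard against it going stale is to compare it with what the running interpreter reports. The sketch below assumes Python 3. The helper names current_whitespace and literal_is_current are hypothetical.

```python
import sys

# Literal copied from the text above; it reflects one particular Python version.
unicode_whitespace = (
    '\t\n\x0b\x0c\r\x1c\x1d\x1e\x1f \x85\xa0\u1680'
    '\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a'
    '\u2028\u2029\u202f\u205f\u3000'
)


def current_whitespace():
    """Return every character str.isspace() accepts on this interpreter."""
    return ''.join(chr(c) for c in range(sys.maxunicode + 1)
                   if chr(c).isspace())


def literal_is_current(literal):
    """Check a stored literal against the running interpreter (hypothetical helper)."""
    return set(literal) == set(current_whitespace())


if not literal_is_current(unicode_whitespace):
    print('stored whitespace literal is out of date for this Python')
```

Running a check like this once, for example in a test suite, keeps the speed of a stored literal while still flagging a mismatch after an upgrade.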