Guide to using re escape in Python

Both patterns and strings to be searched can be Unicode strings (str) as well as 8-bit strings (bytes). However, Unicode strings and 8-bit strings cannot be mixed: that is, you cannot match a Unicode string with a byte pattern or vice-versa; similarly, when asking for a substitution, the replacement string must be of the same type as both the pattern and the search string.
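A minimal sketch of the type rule using only standard `re` calls: a str pattern searches str data, a bytes pattern searches bytes data, and mixing them (including a str replacement with a bytes pattern in `re.sub`) raises TypeError. The helper `raises_type_error` is a hypothetical name introduced only for this illustration.

```python
import re


def raises_type_error(func, *args):
    """Hypothetical helper: return True if func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


# Same-type pattern and subject both work.
str_result = re.search(r"\d+", "order 42").group()      # '42' (str)
bytes_result = re.search(rb"\d+", b"order 42").group()  # b'42' (bytes)

# A str pattern cannot search bytes data.
mixed_search = raises_type_error(re.search, r"\d+", b"order 42")

# In a substitution, a str replacement does not fit a bytes pattern and subject.
mixed_sub = raises_type_error(re.sub, rb"\d+", "N", b"order 42")

print(str_result, bytes_result, mixed_search, mixed_sub)
# 42 b'42' True True
```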

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal. Also, please note that any invalid escape sequences in Python’s usage of the backslash in string literals now generate a DeprecationWarning (a SyntaxWarning since Python 3.12) and in the future this will become a SyntaxError. This behaviour will happen even if it is a valid escape sequence for a regular expression.
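The double escaping described above can be checked directly: the ordinary literal "\\\\" and the raw literal r"\\" produce the same two-character string, and that string is a pattern matching one backslash. The sample path is made up for the illustration.

```python
import re

# The source literal "\\\\" is the two-character string \\ ,
# which the regex engine reads as "match one literal backslash".
plain_pattern = "\\\\"
raw_pattern = r"\\"   # the same two characters written as a raw string

print(len(plain_pattern), plain_pattern == raw_pattern)  # 2 True

path = "C:\\temp"     # the string C:\temp, containing one backslash
print(re.search(raw_pattern, path).start())              # 2
```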

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
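A quick check of the raw-string distinction: r"\n" keeps the backslash, "\n" does not, yet both work as patterns for a newline because the regex engine also understands the escape \n.

```python
import re

raw = r"\n"     # two characters: a backslash and the letter n
cooked = "\n"   # one character: a newline

print(len(raw), list(raw))              # 2 ['\\', 'n']
print(len(cooked), cooked == chr(10))   # 1 True

# Both work as patterns for a newline: the regex engine itself
# interprets the two-character sequence \n as a newline.
print(bool(re.fullmatch(raw, "\n")), bool(re.fullmatch(cooked, "\n")))  # True True
```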

It is important to note that most regular expression operations are available as module-level functions and methods on compiled regular expressions. The functions are shortcuts that don’t require you to compile a regex object first, but miss some fine-tuning parameters.
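A small sketch of the two styles. The module-level `re.search` and the method on a compiled pattern give the same result here; the compiled method additionally accepts a start position (its third positional parameter on `re.search` is `flags`, not a position), which is one of the fine-tuning parameters the shortcut lacks.

```python
import re

pattern = r"\d+"
text = "id=7, count=42"

# Module-level shortcut: no explicit compile step.
shortcut = re.search(pattern, text).group()        # '7'

# Compiled object: same search, plus fine-tuning arguments such as pos.
prog = re.compile(pattern)
compiled = prog.search(text).group()               # '7'
from_pos = prog.search(text, 6).group()            # '42' (scanning starts at index 6)

print(shortcut, compiled, from_pos)   # 7 7 42
```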

See also

The third-party regex module, which has an API compatible with the standard library re module, but offers additional functionality and more thorough Unicode support.

Regular Expression Syntax

A regular expression (or RE) specifies a set of strings that matches it; the functions in this module let you check if a particular string matches a given regular expression (or if a given regular expression matches a particular string, which comes down to the same thing).

Regular expressions can be concatenated to form new regular expressions; if A and B are both regular expressions, then AB is also a regular expression. In general, if a string p matches A and another string q matches B, the string pq will match AB. This holds unless A or B contain low precedence operations; boundary conditions between A and B; or have numbered group references. Thus, complex expressions can easily be constructed from simpler primitive expressions like the ones described here. For details of the theory and implementation of regular expressions, consult the Friedl book [Frie09], or almost any textbook about compiler construction.
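An illustration of the concatenation rule with made-up sub-patterns A and B, plus one of the listed exceptions: a low-precedence | inside A changes how the joined string parses, and wrapping A in a non-capturing group restores the intended meaning.

```python
import re

A = r"a+"      # matches p = "aa"
B = r"b\d"     # matches q = "b7"
p, q = "aa", "b7"

# Concatenating the pattern strings gives AB, which matches pq.
concat_ok = bool(re.fullmatch(A + B, p + q))                 # True

# A low-precedence operator in A breaks the rule: "x|y" + "b\d" parses as
# x | yb\d, so it no longer matches "x" followed by "b7".
A2 = r"x|y"
low_precedence = bool(re.fullmatch(A2 + B, "x" + q))         # False

# Grouping A2 restores the intended concatenation.
grouped = bool(re.fullmatch("(?:" + A2 + ")" + B, "x" + q))  # True

print(concat_ok, low_precedence, grouped)   # True False True
```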

A brief explanation of the format of regular expressions follows. For further information and a gentler presentation, consult the Regular Expression HOWTO.

Regular expressions can contain both special and ordinary characters. Most ordinary characters, like 'A', 'a', or '0', are the simplest regular expressions; they simply match themselves. You can concatenate ordinary characters, so last matches the string 'last'. (In the rest of this section, we’ll write RE’s as bare text, usually without quotes, and strings to be matched 'in single quotes'.)
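Ordinary characters match themselves, and concatenating them matches the concatenated text; matching is case-sensitive unless a flag says otherwise.

```python
import re

whole = bool(re.fullmatch("last", "last"))   # True
first_a = re.search("a", "banana").start()   # 1
case = bool(re.fullmatch("A", "a"))          # False: case-sensitive by default

print(whole, first_a, case)   # True 1 False
```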

Some characters, like '|' or '(', are special. Special characters either stand for classes of ordinary characters, or affect how the regular expressions around them are interpreted.
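Since this guide is about escaping, a short sketch of what happens to a special character like '|': unescaped it acts as alternation, while `re.escape` (a standard function in the module) backslash-escapes it so the text is matched literally.

```python
import re

text = "a|b"

# Unescaped, | is the alternation operator: "a" or "b".
alternation = re.findall("a|b", text)    # ['a', 'b']

# re.escape backslash-escapes special characters so they match literally.
literal = re.escape("a|b")               # the pattern a\|b
literal_hits = re.findall(literal, text) # ['a|b']

print(alternation, literal, literal_hits)
# ['a', 'b'] a\|b ['a|b']
```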

Repetition operators or quantifiers (*, +, ?, {m,n}, etc.) cannot be directly nested. This avoids ambiguity with the non-greedy modifier suffix ?, and with other modifiers in other implementations. To apply a second repetition to an inner repetition, parentheses may be used. For example, the expression (?:a{6})* matches any multiple of six 'a' characters.
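Checking the (?:a{6})* example against a few lengths, and confirming that stacking one quantifier directly on another is rejected at compile time with `re.error`:

```python
import re

six_multiple = re.compile(r"(?:a{6})*")
results = {n: bool(six_multiple.fullmatch("a" * n)) for n in (0, 6, 7, 12)}
print(results)   # {0: True, 6: True, 7: False, 12: True}

# Applying a quantifier directly to another quantifier is rejected.
try:
    re.compile(r"a**")
    nested_rejected = False
except re.error:
    nested_rejected = True
print(nested_rejected)   # True
```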

The special characters are:

.

(Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.
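The effect of re.DOTALL on the dot, shown with a subject that contains a newline:

```python
import re

no_flag = re.match(".", "\n")                          # None
with_flag = bool(re.match(".", "\n", re.DOTALL))       # True
default_hits = re.findall("a.c", "abc a\nc")           # ['abc']
dotall_hits = re.findall("a.c", "abc a\nc", re.DOTALL) # ['abc', 'a\nc']

print(no_flag, with_flag, default_hits, dotall_hits)
```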

^

(Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.
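With a multi-line subject, ^ anchors only at the very start by default and at every line start under re.MULTILINE:

```python
import re

text = "alpha\nbeta\ngamma"
default_starts = re.findall(r"^\w+", text)                  # ['alpha']
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)  # ['alpha', 'beta', 'gamma']

print(default_starts, multiline_starts)
```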

$

Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. foo matches both 'foo' and 'foobar', while the regular expression foo$ matches only 'foo'. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches 'foo2' normally, but 'foo1' in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.
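Each claim in the $ description can be reproduced directly; `re.finditer` exposes the positions of the two empty matches in 'foo\n'.

```python
import re

unanchored = bool(re.search("foo", "foobar"))   # True
anchored = bool(re.search("foo$", "foobar"))    # False

text = "foo1\nfoo2\n"
default_hit = re.search("foo.$", text).group()                  # 'foo2'
multiline_hit = re.search("foo.$", text, re.MULTILINE).group()  # 'foo1'

# A lone $ in 'foo\n' matches twice, both times with an empty match.
empty_matches = re.findall("$", "foo\n")                        # ['', '']
positions = [m.start() for m in re.finditer("$", "foo\n")]      # [3, 4]

print(unanchored, anchored, default_hit, multiline_hit, empty_matches, positions)
```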

*

Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. ab* will match 'a', 'ab', or 'a' followed by any number of 'b's.
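Testing ab* against a few strings, and showing the "as many repetitions as are possible" (greedy) behaviour with `re.match`:

```python
import re

star = {s: bool(re.fullmatch("ab*", s)) for s in ("a", "ab", "abbb", "b")}
print(star)    # {'a': True, 'ab': True, 'abbb': True, 'b': False}

# Greedy: the * consumes as many b's as it can.
greedy = re.match("ab*", "abbbc").group()
print(greedy)  # abbb
```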

+

Causes the resulting RE to match 1 or more repetitions of the preceding RE. ab+ will match 'a' followed by any non-zero number of 'b's; it will not match just 'a'.
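The difference from * shows up on the bare 'a', which ab+ rejects:

```python
import re

plus = {s: bool(re.fullmatch("ab+", s)) for s in ("a", "ab", "abbb")}
print(plus)   # {'a': False, 'ab': True, 'abbb': True}
```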

?

Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
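A sketch of the 0-or-1 rule using an assumed pattern ab?: it accepts 'a' and 'ab' but not 'abb', and, being greedy like the other quantifiers, it takes the optional 'b' when one is present.

```python
import re

optional = {s: bool(re.fullmatch("ab?", s)) for s in ("a", "ab", "abb")}
print(optional)   # {'a': True, 'ab': True, 'abb': False}

# Greedy by default: ab? takes the optional b when it is there.
taken = re.match("ab?", "abb").group()
print(taken)      # ab
```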
