Python regex tutorial (GitHub)

Python Regular Expressions

Regular expressions are a powerful language for matching text patterns. This page gives a basic introduction to regular expressions, sufficient for our Python exercises, and shows how regular expressions work in Python. The Python "re" module provides regular expression support.

Contents

Python Regular Expressions
Basic Patterns
Basic Examples
Repetition
Leftmost & Largest
Repetition Examples
Emails Example
Square Brackets
Group Extraction
findall With Files
findall and Groups
RE Workflow and Debug
Options
Greedy vs. Non-Greedy (optional)
Substitution (optional)
Exercise
See Also

In Python, a regular expression search is typically written as:

import re
match = re.search(pat, str)

The re.search() method takes a regular expression pattern and a string, and searches for that pattern within the string. If the search is successful, search() returns a match object; otherwise it returns None. Therefore, the search is usually followed immediately by an if-statement that tests whether the search succeeded, as shown in the following example, which searches for the pattern 'word:' followed by a 3-letter word (details below):

import re
str = 'an example word:cat!!'
match = re.search(r'word:\w\w\w', str)
if match:
    print('found', match.group())
else:
    print('did not find')

The code match = re.search(pat, str) stores the search result in a variable named "match". Then the if-statement tests the match: if true, the search succeeded and match.group() is the matching text (e.g. 'word:cat'). Otherwise, if the match is false (None, to be more specific), the search did not succeed and there is no matching text.

The 'r' at the start of the pattern string designates a Python "raw" string, which passes backslashes through without change. This is very handy for regular expressions. I recommend that you always write pattern strings with the 'r' just as a habit.
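To see what the 'r' prefix changes in practice, here is a small sketch comparing an ordinary string literal with a raw one. The variable names (plain, raw, plain_match, raw_match) are chosen only for this illustration.

```python
import re

# In an ordinary string literal, Python itself turns '\b' into a
# backspace character before the re module ever sees the pattern.
plain = '\bcat\b'
raw = r'\bcat\b'

print(len(plain))  # 5: backspace, 'c', 'a', 't', backspace
print(len(raw))    # 7: every backslash is kept as its own character

# Only the raw pattern reaches re as the word-boundary code \b.
plain_match = re.search(plain, 'a cat sat')
raw_match = re.search(raw, 'a cat sat')
print(plain_match)        # None
print(raw_match.group())  # cat
```

The ordinary literal still "works" as a pattern, but it searches for literal backspace characters, which is almost never what was intended.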

Basic Patterns

The power of regular expressions is that they can specify patterns, not just fixed characters. Here are the most basic patterns, which match single chars:

a, X, 9, < - ordinary characters just match themselves exactly. The meta-characters, which do not match themselves because they have special meanings, are: . ^ $ * + ? { [ ] \ | ( ) (details below)
. (a period) - matches any single character except newline '\n'
\w - (lowercase w) matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]. Note that although "word" is the mnemonic for this, it only matches a single word char, not a whole word. \W (upper case W) matches any non-word character.
\b - boundary between word and non-word
\s - (lowercase s) matches a single whitespace character: space, newline, return, tab, form feed [ \n\r\t\f]. \S (upper case S) matches any non-whitespace character.
\t, \n, \r - tab, newline, return
\d - decimal digit [0-9] (some older regex utilities do not support \d, but they all support \w and \s)
^ = start, $ = end - match the start or end of the string
\ - inhibits the "specialness" of a character. So, for example, use \. to match a period or \\ to match a backslash. If you are unsure whether a character has special meaning, such as '@', you can put a backslash in front of it, \@, to make sure it is treated just as a character.
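The single-character codes listed above can be tried side by side. The following is only a sketch: the sample text and the results dictionary are made up for this illustration.

```python
import re

text = 'Room 42, wing-B\tkey.card'

# Each pattern is searched in the same text; the comment names the code it exercises.
patterns = [
    r'\d\d',      # two decimal digits
    r'\w\w\w\w',  # four word characters in a row
    r'\W',        # first non-word character
    r'\s\w',      # a whitespace character followed by a word character
    r'\bw',       # a 'w' that starts a word
    r'^R',        # 'R' only at the start of the string
    r'd$',        # 'd' only at the end of the string
    r'y\.c',      # escaped dot matches a literal period
]

results = {}
for pat in patterns:
    match = re.search(pat, text)
    results[pat] = match.group() if match else None
    print(pat, '->', repr(results[pat]))
```

Printing with repr() makes whitespace matches such as ' ' visible in the output.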

Basic Examples

Joke: what do you call a pig with three eyes? piiig!

The basic rules of regular expression search for a pattern within a string are:

The search proceeds through the string from start to end, stopping at the first match found
All of the pattern must be matched, but not all of the string
If match = re.search(pat, str) is successful, match is not None and in particular match.group() is the matching text

import re

## Search for pattern 'iii' in string 'piiig'.
## All of the pattern must match, but it may appear anywhere.
## On success, match.group() is matched text.
match = re.search(r'iii', 'piiig')
print(match.group())  ## match.group() == 'iii'

match = re.search(r'igs', 'piiig')
print(match)  ## not found, match == None

## . = any char but \n
match = re.search(r'..g', 'piiig')
print(match.group())  ## match.group() == 'iig'

## \d = digit char, \w = word char
match = re.search(r'\d\d\d', 'p123g')
print(match.group())  ## match.group() == '123'

match = re.search(r'\w\w\w', '@@abcd!!')
print(match.group())  ## match.group() == 'abc'

Repetition

Things get more interesting when you use + and * to specify repetition in the pattern.

+ - 1 or more occurrences of the pattern to its left, e.g. 'i+' = one or more i's
* - 0 or more occurrences of the pattern to its left
? - match 0 or 1 occurrences of the pattern to its left
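A short sketch of how the three repetition operators differ, using made-up sample strings (the variable names are illustrative only):

```python
import re

# colou?r: the '?' makes the 'u' optional (0 or 1 occurrences).
optional = [re.search(r'colou?r', s).group() for s in ('color', 'colour')]
print(optional)  # ['color', 'colour']

# 'a+' needs at least one 'a'; 'a*' is satisfied by zero of them.
plus_match = re.search(r'ba+', 'b')
star_match = re.search(r'ba*', 'b')
print(plus_match)          # None
print(star_match.group())  # b
```

The contrast between the last two searches is the practical difference between + and *: only * lets the repeated item be absent.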

Leftmost & Largest

First the search finds the leftmost match for the pattern, and second it tries to use up as much of the string as possible, i.e. + and * go as far as possible (the + and * are said to be "greedy").
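The two steps of this rule can be observed through the match position. This is a minimal sketch with a made-up sample string; match.span() and match.end() are standard match-object methods.

```python
import re

text = 'moo mooooo'
match = re.search(r'o+', text)

# The search settles on the leftmost run of o's, then extends it greedily.
leftmost = (match.group(), match.span())
print(leftmost)  # ('oo', (1, 3))

# The longer run further right is only found by searching past the first match.
later = re.search(r'o+', text[match.end():])
print(later.group())  # ooooo
```

Note that "largest" applies only within the leftmost match: the search does not prefer the longer run of o's later in the string.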

Repetition Examples

import re

## i+ = one or more i's, as many as possible.
match = re.search(r'pi+', 'piiig')  ## match.group() == 'piii'

## Finds the first/leftmost solution, and within it drives the +
## as far as possible (aka 'leftmost and largest').
## In this example, note that it does not get to the second set of i's.
match = re.search(r'i+', 'piigiiii')  ## match.group() == 'ii'

## \s* = zero or more whitespace chars
## Here look for 3 digits, possibly separated by whitespace.
match = re.search(r'\d\s*\d\s*\d', 'xx1 2   3xx')  ## match.group() == '1 2   3'
match = re.search(r'\d\s*\d\s*\d', 'xx12  3xx')  ## match.group() == '12  3'
match = re.search(r'\d\s*\d\s*\d', 'xx123xx')  ## match.group() == '123'

## ^ = matches the start of string, so this fails:
match = re.search(r'^b\w+', 'foobar')  ## not found, match == None
## but without the ^ it succeeds:
match = re.search(r'b\w+', 'foobar')  ## match.group() == 'bar'

Emails Example

Suppose you want to find the email address inside the string 'purple alice-b@google.com monkey dishwasher'. We'll use this as a running example to demonstrate more regular expression features. Here's an attempt using the pattern r'\w+@\w+':

import re
str = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'\w+@\w+', str)
if match:
    print(match.group())  ## 'b@google'

The search does not get the whole email address in this case because \w does not match the '-' or '.' in the address. We'll fix this using the regular expression features below.

Square Brackets

Square brackets can be used to indicate a set of chars, so [abc] matches 'a' or 'b' or 'c'. The codes \w, \s, etc. work inside square brackets too, with the one exception that dot (.) just means a literal dot. For the emails problem, the square brackets are an easy way to add '.' and '-' to the set of chars which can appear around the @, using the pattern r'[\w.-]+@[\w.-]+' to get the whole email address:

import re
str = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'[\w.-]+@[\w.-]+', str)
if match:
    print(match.group())  ## 'alice-b@google.com'

(More square-bracket features) You can also use a dash to indicate a range, so [a-z] matches all lowercase letters. To use a dash without indicating a range, put the dash last, e.g. [abc-]. An up-hat (^) at the start of a square-bracket set inverts it, so [^ab] means any char except 'a' or 'b'.
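The range, trailing-dash, and inverted forms of a square-bracket set can be compared on one string. This is a sketch only; the sample text and variable names are made up for illustration.

```python
import re

text = 'Order A-17 shipped to zone b9'

# [a-z]+ : a run of lowercase letters (the dash denotes a range).
lower = re.search(r'[a-z]+', text).group()

# [A-Z-]+ : uppercase letters or a literal dash (dash placed last),
# here required to be followed by one digit.
upper_or_dash = re.search(r'[A-Z-]+\d', text).group()

# [^a-zA-Z ]+ : the caret inverts the set, so this matches anything
# except letters and spaces.
not_letters = re.search(r'[^a-zA-Z ]+', text).group()

print(lower)          # rder
print(upper_or_dash)  # A-1
print(not_letters)    # -17
```

The first result starts at 'r' rather than 'O' because the uppercase 'O' is outside the [a-z] range.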

Group Extraction

The "group" feature of a regular expression allows you to pick out parts of the matching text. Suppose for the emails problem that we want to extract the username and host separately. To do this, add parentheses ( ) around the username and host in the pattern, like this: r'([\w.-]+)@([\w.-]+)'. In this case, the parentheses do not change what the pattern will match; instead they establish logical "groups" inside the match text. On a successful search, match.group(1) is the match text corresponding to the 1st left parenthesis, and match.group(2) is the text corresponding to the 2nd left parenthesis. The plain match.group() is still the whole match text as usual.

import re
str = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'([\w.-]+)@([\w.-]+)', str)
if match:
    print(match.group())   ## 'alice-b@google.com' (the whole match)
    print(match.group(1))  ## 'alice-b' (the username, group 1)
    print(match.group(2))  ## 'google.com' (the host, group 2)

A common workflow with regular expressions is that you write a pattern for the thing you are looking for, adding parenthesis groups to extract the parts you want.
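That workflow can be wrapped in a small function. The sketch below assumes a helper named split_email, which is a hypothetical name invented for this illustration; match.groups() is the standard method that returns all groups as a tuple.

```python
import re

def split_email(text):
    """Return (username, host) for the first email-like match, or None.

    split_email is a hypothetical helper name used only for this sketch.
    """
    match = re.search(r'([\w.-]+)@([\w.-]+)', text)
    if not match:
        return None
    # match.groups() returns all parenthesis groups as a tuple,
    # equivalent here to (match.group(1), match.group(2)).
    return match.groups()

found = split_email('purple alice-b@google.com monkey dishwasher')
missing = split_email('no address here')
print(found)    # ('alice-b', 'google.com')
print(missing)  # None
```

Returning None when nothing matches mirrors how re.search() itself reports failure, so callers can use the same if-statement test.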

findall

findall() is probably the single most powerful function in the re module. Above we used re.search() to find the first match for a pattern. findall() finds *all* the matches and returns them as a list of strings, with each string representing one match.

import re

## Suppose we have a text with many email addresses
str = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

## Here re.findall() returns a list of all the found email strings
emails = re.findall(r'[\w\.-]+@[\w\.-]+', str) ## ['alice@google.com', 'bob@abc.com']
for email in emails:
    # do something with each found email string
    print(email)

findall With Files

For files, you may be in the habit of writing a loop to iterate over the lines of the file, and you could then call findall() on each line. Instead, let findall() do the iteration for you, which is much better! Just feed the whole file text into findall() and let it return a list of all the matches in a single step (recall that f.read() returns the whole text of a file in a single string):

import re

# Open the file and read its whole text into one string
with open('foo.txt', 'r') as f:
    text = f.read()

# Feed the file text into findall(); it returns a list of all the found strings
strings = re.findall(r'another', text)
print(strings)
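For a version that runs without an existing file, the sketch below first writes a throwaway file and then reads it back. The file contents and the example.com style addresses are invented for this illustration.

```python
import os
import re
import tempfile

# Create a small throwaway file so the sketch is self-contained.
# The file contents and addresses are made up for illustration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('first line alice@example.com\n')
    tmp.write('second line, no address\n')
    tmp.write('third line bob@example.org and carol@example.net\n')
    path = tmp.name

# One read() and one findall() call cover every line of the file.
with open(path, encoding='utf-8') as f:
    emails = re.findall(r'[\w.-]+@[\w.-]+', f.read())

os.remove(path)
print(emails)
# ['alice@example.com', 'bob@example.org', 'carol@example.net']
```

Reading the whole file at once is convenient for small and medium files; for very large files a line-by-line loop may still be the better trade-off.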

findall and Groups

The parenthesis ( ) group mechanism can be combined with findall(). If the pattern includes 2 or more parenthesis groups, then instead of returning a list of strings, findall() returns a list of tuples. Each tuple represents one match of the pattern, and inside the tuple is the group(1), group(2), ... data. So if 2 parenthesis groups are added to the email pattern, findall() returns a list of tuples, each of length 2, containing the username and host, e.g. ('alice', 'google.com').

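import re

str = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

tuples = re.findall(r'([\w\.-]+)@([\w\.-]+)', str)
print(tuples)  ## [('alice', 'google.com'), ('bob', 'abc.com')]

for tuple in tuples:
    print(tuple[0])  ## username
    print(tuple[1])  ## host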

Once you have the list of tuples, you can loop over it to do some computation for each tuple. If the pattern includes no parentheses, then findall() returns a list of found strings as in the earlier examples. If the pattern includes a single set of parentheses, then findall() returns a list of strings corresponding to that single group. (Obscure optional feature: sometimes the pattern has parenthesis groupings that you do not want to extract. In that case, write the parentheses with ?: at the start, e.g. (?: ), and that left parenthesis will not count as a group result.)
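The four cases described above (no groups, one group, two groups, and a non-capturing group) can be sketched as follows. The sample text and addresses are made up for illustration.

```python
import re

# Made-up sample text for this illustration.
text = 'purple alice@example.com, blah monkey bob@example.org blah dishwasher'

# No groups: a list of whole-match strings.
no_groups = re.findall(r'[\w.-]+@[\w.-]+', text)

# One group: a list of strings holding only that group's text.
one_group = re.findall(r'([\w.-]+)@[\w.-]+', text)

# Two groups: a list of (group 1, group 2) tuples.
two_groups = re.findall(r'([\w.-]+)@([\w.-]+)', text)

# (?: ) is non-capturing, so only the host group counts as a result.
non_capturing = re.findall(r'(?:[\w.-]+)@([\w.-]+)', text)

print(no_groups)      # ['alice@example.com', 'bob@example.org']
print(one_group)      # ['alice', 'bob']
print(two_groups)     # [('alice', 'example.com'), ('bob', 'example.org')]
print(non_capturing)  # ['example.com', 'example.org']
```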

Workflow and Debugging

Regular expression patterns pack a lot of meaning into just a few characters, but they are so dense that you can spend a lot of time debugging them. Set up your runtime so you can easily run a pattern and print what it matches, for example by running it on a small test text and printing the result of findall(). If the pattern matches nothing, try weakening it, removing parts of it until you get too many matches. When it matches nothing, you cannot make any progress since there is nothing concrete to look at. Once it matches too much, you can tighten it up incrementally to hit just what you want.
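A minimal sketch of that weaken-then-tighten loop is shown below. The test text and all three patterns are hypothetical, chosen only to show the workflow.

```python
import re

# Small made-up test text for experimenting with patterns.
text = 'order #123 shipped, order #45 pending, order 678 lost'

# First attempt: too strict (needs exactly four digits), so nothing matches.
strict = re.findall(r'#\d\d\d\d', text)

# Weaken the pattern until it matches too much: every run of digits.
loose = re.findall(r'\d+', text)

# Tighten it step by step: only digit runs that follow a '#'.
tightened = re.findall(r'#(\d+)', text)

print(strict)     # []
print(loose)      # ['123', '45', '678']
print(tightened)  # ['123', '45']
```

Printing the result after every change keeps something concrete on screen to compare against the intended matches.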

Options

The re functions take options to modify the behavior of the pattern match. The option flag is added as an extra argument to search() or findall() etc., e.g. re.search(pat, str, re.IGNORECASE).

IGNORECASE - ignore upper/lowercase differences for matching, so 'a' matches both 'a' and 'A'.

DOTALL - allow dot (.) to match newline; normally it matches anything but newline. This can trip you up: you think .* matches everything, but by default it does not go past the end of a line. Note that \s (whitespace) includes newlines, so if you want to match a run of whitespace that may include a newline, you can just use \s*

MULTILINE - within a string made of many lines, allow ^ and $ to match the start and end of each line. Normally ^ and $ would just match the start and end of the whole string.
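The effect of the three flags can be seen in a short sketch. The three-line sample string is made up for illustration; each pattern is run once without the flag and once with it.

```python
import re

# Made-up three-line sample text.
text = 'Foo\nbar\nFOO'

# IGNORECASE: 'foo' matches regardless of upper/lowercase.
plain_case = re.findall(r'foo', text)
ignore_case = re.findall(r'foo', text, re.IGNORECASE)

# DOTALL: by default '.' stops at a newline; with the flag it does not.
dot_default = re.search(r'Foo.*', text).group()
dot_all = re.search(r'Foo.*', text, re.DOTALL).group()

# \s already includes newlines, with no flag needed.
whitespace_run = re.search(r'Foo\s*bar', text).group()

# MULTILINE: '^' and '$' also match at the start and end of each line.
whole_string = re.findall(r'^\w+$', text)
each_line = re.findall(r'^\w+$', text, re.MULTILINE)

print(plain_case, ignore_case)    # [] ['Foo', 'FOO']
print(repr(dot_default))          # 'Foo'
print(repr(dot_all))              # 'Foo\nbar\nFOO'
print(repr(whitespace_run))       # 'Foo\nbar'
print(whole_string, each_line)    # [] ['Foo', 'bar', 'FOO']
```

Flags can be combined with |, for example re.IGNORECASE | re.MULTILINE.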

Greedy vs. Non-Greedy (optional)

This optional section shows a more advanced regular expression technique that is not needed for the exercises.

Suppose you have text with tags in it: <b>foo</b> and <i>so on</i>

Suppose you are trying to match each tag with the pattern (<.*>). What does it match first?

The result is a little surprising, but the greedy aspect of .* causes it to match the whole <b>foo</b> and <i>so on</i> as one big match. The problem is that .* goes as far as it can, instead of stopping at the first > (that is, it is "greedy").

There is an extension to regular expressions where you add a ? at the end, such as .*? or .+?, changing them to be non-greedy. Now they stop as soon as they can. So the pattern (<.*?>) will get just <b> as the first match and </b> as the second match, and so on, getting each <..> pair in turn. The usual style is to use .*? immediately followed by some concrete marker (> in this case) that forces the end of the .*? run.
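A short sketch of the greedy and non-greedy behavior, run on a small sample of tagged text chosen for illustration:

```python
import re

# Sample tagged text for this illustration.
text = '<b>foo</b> and <i>so on</i>'

# Greedy: .* runs as far as it can, so there is one big match.
greedy = re.findall(r'(<.*>)', text)

# Non-greedy: .*? stops at the first '>' it can, so each tag matches alone.
non_greedy = re.findall(r'(<.*?>)', text)

print(greedy)      # ['<b>foo</b> and <i>so on</i>']
print(non_greedy)  # ['<b>', '</b>', '<i>', '</i>']
```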

The *? extension originated in Perl, and regular expressions that include Perl's extensions are known as Perl Compatible Regular Expressions (pcre). Python's re module supports these Perl-style extensions. Many command line utilities have a flag with which they accept pcre patterns.

An older but widely used technique to code this idea of "all of these chars except stopping at X" uses the square-bracket style. For the above you could write the pattern, but instead of .* to get all the chars, use [^>]* which skips over all characters that are not > (the leading ^ inverts the square-bracket set, so it matches any char not in the brackets).
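The square-bracket variant can be sketched like this. Both sample strings are made up; the second one puts a newline inside a tag to show one practical difference from .*?, since a negated set also matches newlines while dot does not by default.

```python
import re

# Sample tagged text for this illustration.
text = '<b>foo</b> and <i>so on</i>'

# [^>]* matches any run of characters that are not '>'.
tags = re.findall(r'<[^>]*>', text)
print(tags)  # ['<b>', '</b>', '<i>', '</i>']

# Hypothetical tag that is split across two lines.
multiline_tag = '<a\nhref="x">'

# Dot does not cross the newline, so the non-greedy form finds nothing.
dot_version = re.findall(r'<.*?>', multiline_tag)

# The negated set has no such limit and matches the whole tag.
set_version = re.findall(r'<[^>]*>', multiline_tag)

print(dot_version)  # []
print(set_version)  # ['<a\nhref="x">']
```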

Nothing in the previous section should be taken to mean that you can parse HTML with regular expressions, because you can't.

Substitution (optional)

The re.sub(pat, replacement, str) function searches for all the instances of the pattern in the given string and replaces them. The replacement string can include \1, \2, which refer to the text from group(1), group(2), and so on from the original matching text.

Here is an example that searches for all the email addresses and changes them to keep the user (\1) but have yo-yo-dyne.com as the host.

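import re

str = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
## re.sub(pat, replacement, str) returns a new string with all replacements;
## \1 is group(1) and \2 is group(2) in the replacement
print(re.sub(r'([\w\.-]+)@([\w\.-]+)', r'\1@yo-yo-dyne.com', str))
## purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher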
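As a further small sketch, both group references can be used in the same replacement string. The sample text, the addresses, and the "user at host" output format are made up for illustration.

```python
import re

# Made-up sample text for this illustration.
text = 'purple alice@example.com, blah monkey bob@example.org blah dishwasher'

# \1 is the user and \2 is the host captured by the two groups.
# The replacement is a raw string so the backslashes reach re.sub() intact.
rewritten = re.sub(r'([\w.-]+)@([\w.-]+)', r'\1 at \2', text)

print(rewritten)
# purple alice at example.com, blah monkey bob at example.org blah dishwasher
```

re.sub() returns a new string and leaves the original text unchanged.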

Exercise

To practice regular expressions, see the Baby Names Exercise.

See Also

Python regular expression editor.

Unless otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License.
