How do I extract a single character from a string?

This article describes how to extract a substring from a string in Python. You can extract a substring by specifying the position and number of characters, or with regular expression patterns.

Extract a substring by specifying the position and number of characters
- Extract a character by index
- Extract a substring by slicing
- Extract based on the number of characters
Extract a substring with regular expressions: re.search(), re.findall()
Regular expression pattern examples
- Wildcard-like patterns
- Greedy and non-greedy
- Extract part of the pattern with parentheses
- Match any single character
- Match the start/end of the string
- Extract by multiple patterns
- Case-insensitive

If you want to replace a substring with another string, see the following article.

Replace strings in Python (replace, translate, re.sub, re.subn)

Extract a substring by specifying the position and number of characters

You can get a character at the desired position by specifying an index in []. Indexes begin with 0 (zero-based indexing).

s = 'abcde'

print(s[0])
# a

print(s[4])
# e

You can specify a backward position with negative values. -1 represents the last character.

print(s[-1])
# e

print(s[-5])
# a

An error is raised if a non-existent index is specified.

# print(s[5])
# IndexError: string index out of range

# print(s[-6])
# IndexError: string index out of range
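
If an index might fall outside the string, one option is to catch the IndexError and return a fallback value. The following is only a minimal sketch; char_at and its default parameter are hypothetical names chosen for illustration, not built-ins.

```python
def char_at(s, index, default=None):
    """Return s[index], or default if the index is out of range."""
    try:
        return s[index]
    except IndexError:
        return default


s = 'abcde'

print(char_at(s, 0))
# a

print(char_at(s, 5))
# None

print(char_at(s, -6, default='') == '')
# True
```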

You can extract a substring in the range start <= x < stop with [start:stop]. If start is omitted, the range begins at the start of the string, and if stop is omitted, it extends to the end.

s = 'abcde'

print(s[1:3])
# bc

print(s[:3])
# abc

print(s[1:])
# bcde

You can also use negative values.

print(s[-4:-2])
# bc

print(s[:-2])
# abc

print(s[-4:])
# bcde

If start > stop, no error is raised and an empty string '' is extracted.

print(s[3:1])
# 

print(s[3:1] == '')
# True

Positions outside the string are ignored rather than raising an error.

print(s[-100:100])
# abcde
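
Because slicing tolerates positions beyond the end of the string, it is a convenient way to take a given number of characters from a given position. A small sketch, assuming a non-negative start; substr is a hypothetical helper name, not a built-in.

```python
def substr(s, start, length):
    """Return up to `length` characters of s, beginning at index `start`.

    Assumes start >= 0 (an assumption of this sketch).
    """
    return s[start:start + length]


s = 'abcde'

print(substr(s, 1, 3))
# bcd

print(substr(s, 3, 10))
# de

print(substr(s, 10, 2) == '')
# True
```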

In addition to the start position start and end position stop, you can specify an increment step like [start:stop:step]. If step is negative, characters are extracted from back to front.

print(s[1:4:2])
# bd

print(s[::2])
# ace

print(s[::3])
# ad

print(s[::-1])
# edcba

print(s[::-2])
# eca
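
As an illustration of where step can be useful, the sketch below separates interleaved characters and compares a string with its reverse. The sample strings are made up for this example.

```python
s = 'a1b2c3'

letters = s[::2]   # characters at indexes 0, 2, 4
digits = s[1::2]   # characters at indexes 1, 3, 5

print(letters)
# abc

print(digits)
# 123

# Comparing a string with its reverse is a common palindrome check
print('level' == 'level'[::-1])
# True
```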

For more information on slicing, see the following article.

How to slice a list, string, tuple in Python

The built-in function len() returns the number of characters. For example, you can use this to get the central character or extract the first or second half of the string with slicing.

Note that you can specify only integer int values for index [] and slice [:]. Division by / raises an error because the result is a floating-point number float.

The following example uses integer division //, which rounds the result down to an integer.

s = 'abcdefghi'

print(len(s))
# 9

# print(s[len(s) / 2])
# TypeError: string indices must be integers

print(s[len(s) // 2])
# e

print(s[:len(s) // 2])
# abcd

print(s[len(s) // 2:])
# efghi
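
A string of even length has no single central character. One possible way to handle both cases is sketched below; middle is a hypothetical helper, and returning two characters for even lengths is just one convention.

```python
def middle(s):
    """Return the central character, or the two central characters
    when the length is even. Returns '' for an empty string."""
    n = len(s)
    if n == 0:
        return ''
    half = n // 2
    if n % 2 == 1:
        return s[half]
    return s[half - 1:half + 1]


print(middle('abcdefghi'))
# e

print(middle('abcd'))
# bc
```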

Extract a substring with regular expressions: re.search(), re.findall()

You can use regular expressions with the re module of the standard library.

re — Regular expression operations — Python 3.10.4 documentation

Use re.search() to extract a substring matching a regular expression pattern. Specify the regular expression pattern as the first parameter and the target string as the second parameter.

import re

s = '012-3456-7890'

print(re.search(r'\d+', s))
# <re.Match object; span=(0, 3), match='012'>

\d matches a digit character, and + matches one or more repetitions of the preceding pattern. Thus, \d+ matches one or more consecutive digits.

Since backslash \ is used in regular expression special sequences such as \d, it is convenient to use a raw string by adding r before '' or "".

Raw strings in Python

If any part of the string matches the pattern, re.search() returns a match object; otherwise, it returns None. You can get the matched part as a string (str) with the group() method of the match object.

m = re.search(r'\d+', s)

print(m.group())
# 012

print(type(m.group()))
# <class 'str'>

As in the example above, re.search() returns a match object for the first match only, even if there are multiple matching parts.

re.findall() returns all matching parts as a list of strings.

print(re.findall(r'\d+', s))
# ['012', '3456', '7890']
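
When nothing matches, re.search() returns None, so calling group() on the result directly would raise an AttributeError. The sketch below shows one way to guard against that, and uses re.finditer() when the position of each match is also needed.

```python
import re

s = '012-3456-7890'

m = re.search(r'[a-z]+', s)
print(m)
# None

# Check for None before calling group()
if m is not None:
    print(m.group())
else:
    print('no match')
# no match

# re.finditer() yields match objects, so span() is available for each match
for m in re.finditer(r'\d+', s):
    print(m.group(), m.span())
# 012 (0, 3)
# 3456 (4, 8)
# 7890 (9, 13)
```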

Regular expression pattern examples

This section presents some examples of regular expression patterns with meta characters/special sequences.

Wildcard-like patterns

. matches any single character except a newline, and * matches zero or more repetitions of the preceding pattern.

For example, a.*b matches the string starting with a and ending with b. Since * matches zero repetitions, it also matches ab.

print(re.findall('a.*b', 'axyzb'))
# ['axyzb']

print(re.findall('a.*b', 'a---b'))
# ['a---b']

print(re.findall('a.*b', 'aあいうえおb'))
# ['aあいうえおb']

print(re.findall('a.*b', 'ab'))
# ['ab']

+ matches one or more repetitions of the preceding pattern. a.+b does not match ab.

print(re.findall('a.+b', 'ab'))
# []

print(re.findall('a.+b', 'axb'))
# ['axb']

print(re.findall('a.+b', 'axxxxxxb'))
# ['axxxxxxb']

? matches zero or one occurrence of the preceding pattern. For example, a.?b matches ab as well as a string with exactly one character between a and b.

print(re.findall('a.?b', 'ab'))
# ['ab']

print(re.findall('a.?b', 'axb'))
# ['axb']

print(re.findall('a.?b', 'axxb'))
# []

Greedy and non-greedy

*, +, and ? are all greedy matches, matching as much text as possible. *?, +?, and ?? are non-greedy, minimal matches, matching as few characters as possible.

s = 'axb-axxxxxxb'

print(re.findall('a.*b', s))
# ['axb-axxxxxxb']

print(re.findall('a.*?b', s))
# ['axb', 'axxxxxxb']
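
The difference tends to matter when the delimiters themselves can repeat. As an illustration with a made-up sample string, extracting double-quoted parts:

```python
import re

s = 'say "hello" and "goodbye"'

# Greedy: runs from the first quote to the last quote
print(re.findall('".*"', s))
# ['"hello" and "goodbye"']

# Non-greedy: stops at the nearest closing quote
print(re.findall('".*?"', s))
# ['"hello"', '"goodbye"']
```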

Extract part of the pattern with parentheses

If you enclose part of a regular expression pattern in parentheses (), you can extract a substring in that part.

print(re.findall('a(.*)b', 'axyzb'))
# ['xyz']

If you want to match parentheses () as characters, escape them with backslash \.

print(re.findall(r'\(.+\)', 'abc(def)ghi'))
# ['(def)']

print(re.findall(r'\((.+)\)', 'abc(def)ghi'))
# ['def']
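
A pattern may contain more than one pair of parentheses. In that case re.findall() returns a list of tuples, and a match object exposes each part through group(n) and groups(). A short sketch:

```python
import re

s = '012-3456-7890'

print(re.findall(r'(\d+)-(\d+)-(\d+)', s))
# [('012', '3456', '7890')]

m = re.search(r'(\d+)-(\d+)-(\d+)', s)

print(m.group())   # the whole match
# 012-3456-7890

print(m.group(2))  # the second parenthesized part
# 3456

print(m.groups())
# ('012', '3456', '7890')
```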

Match any single character

Enclosing characters in [] matches any one of those characters.

If you connect consecutive Unicode code points with -, such as [a-z], all characters between them are covered. For example, [a-z] matches any one character of the lowercase alphabet.

print(re.findall('[abc]x', 'ax-bx-cx'))
# ['ax', 'bx', 'cx']

print(re.findall('[abc]+', 'abc-aaa-cba'))
# ['abc', 'aaa', 'cba']

print(re.findall('[a-z]+', 'abc-xyz'))
# ['abc', 'xyz']
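
Two further variations that may be useful: several ranges can be combined inside one pair of brackets, and a leading ^ inside the brackets negates the set. The sample string below is made up for illustration.

```python
import re

s = 'abc-XYZ-012'

# Several ranges combined in one set
print(re.findall('[a-zA-Z0-9]+', s))
# ['abc', 'XYZ', '012']

# A leading ^ negates the set: runs of characters that are not '-'
print(re.findall('[^-]+', s))
# ['abc', 'XYZ', '012']
```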

Match the start/end of the string

^ matches the start of the string, and $ matches the end of the string.

s = 'abc-def-ghi'

print(re.findall('[a-z]+', s))
# ['abc', 'def', 'ghi']

print(re.findall('^[a-z]+', s))
# ['abc']

print(re.findall('[a-z]+$', s))
# ['ghi']
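
For strings that contain newlines, ^ and $ by default still refer only to the start and end of the whole string. With the re.MULTILINE flag they also match at the start and end of each line, as this sketch with a two-line sample string shows.

```python
import re

s = 'abc-def\nghi-jkl'

print(re.findall('^[a-z]+', s))
# ['abc']

print(re.findall('^[a-z]+', s, flags=re.MULTILINE))
# ['abc', 'ghi']

print(re.findall('[a-z]+$', s, flags=re.MULTILINE))
# ['def', 'jkl']
```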

Extract by multiple patterns

Use | to extract a substring that matches one of the multiple patterns. For example, for regular expression patterns A and B, you can write A|B.

s = 'axxxb-012'

print(re.findall('a.*b', s))
# ['axxxb']

print(re.findall(r'\d+', s))
# ['012']

print(re.findall(r'a.*b|\d+', s))
# ['axxxb', '012']

Case-insensitive

The re module is case-sensitive by default. Set the flags argument to re.IGNORECASE to perform case-insensitive matching.

s = 'abc-Abc-ABC'

print(re.findall('[a-z]+', s))
# ['abc', 'bc']

print(re.findall('[A-Z]+', s))
# ['A', 'ABC']

print(re.findall('[a-z]+', s, flags=re.IGNORECASE))
# ['abc', 'Abc', 'ABC']
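
The same flag can also be passed to re.compile() when a pattern is reused, or written inline as (?i) at the start of the pattern. A brief sketch of both forms:

```python
import re

s = 'abc-Abc-ABC'

# Compile once with the flag, then reuse the pattern object
pattern = re.compile('[a-z]+', flags=re.IGNORECASE)
print(pattern.findall(s))
# ['abc', 'Abc', 'ABC']

# Inline flag at the start of the pattern
print(re.findall('(?i)[a-z]+', s))
# ['abc', 'Abc', 'ABC']
```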

How do I extract one character from a string?

In Java, using String.toCharArray():

Get the string and the index.

Convert the String into a character array using the String.toCharArray() method.

Get the character at the given index of the character array.

Return that character.

How do you extract a single character from a string in Excel?

Depending on where you want to start extraction, use one of these formulas: LEFT function - to extract a substring from the left. RIGHT function - to extract text from the right. MID function - to extract a substring from the middle of a text string, starting at the point you specify.
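
For comparison, the three Excel functions can be approximated in Python with slicing. This is only a sketch: left, right, and mid are hypothetical helper names, and it assumes num_chars >= 0 and start_num >= 1. Excel's MID counts positions from 1, whereas Python indexes from 0.

```python
def left(text, num_chars):
    """Like Excel LEFT: the first num_chars characters."""
    return text[:num_chars]


def right(text, num_chars):
    """Like Excel RIGHT: the last num_chars characters."""
    # text[-0:] would return the whole string, so handle 0 separately
    return text[-num_chars:] if num_chars > 0 else ''


def mid(text, start_num, num_chars):
    """Like Excel MID: num_chars characters from 1-based position start_num."""
    begin = start_num - 1
    return text[begin:begin + num_chars]


s = 'abcde'

print(left(s, 2))
# ab

print(right(s, 2))
# de

print(mid(s, 2, 3))
# bcd
```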

Which method is used to extract a single character from a string object?

In Java, charAt() is the method of the String class used to extract a single character from a String object.
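
Python strings have no charAt() or toCharArray() method; indexing with [] plays the role of charAt(), and list() gives a rough analogue of a character array. A minimal sketch for comparison:

```python
s = 'abcde'
index = 2

# Rough analogue of Java's toCharArray(): a list of one-character strings
chars = list(s)
print(chars[index])
# c

# Rough analogue of Java's charAt(): index the string directly
print(s[index])
# c
```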