I have the following code:
import pandas as pd
import numpy as np
df1 = pd.read_excel('FA9th june.xlsx')
df1.head()
Days Amount Repayments Balance
40.0 19 500.00 15 000.00 4 500.00
40.0 19 500.00 0 19 500.00
40.0 9 750.00 2 670.00 7 080.00
40.0 32 500.00 11 500.00 21 000.00
40.0 3 250.00 580 2 670.00
I want my data without the spaces inside the numbers and without the decimal places, so that it looks like this:
Days Amount Repayments Balance
40 19500 15000 4500
40 19500 0 19500
40 9750 2670 7080
40 32500 11500 21000
40 3250 580 2670
I tried converting it to int but it kept returning this error:
invalid literal for int() with base 10: '19 500.00'
whenever I run this code:
df1['Amount'] = pd.to_numeric(X['Amount'], errors='ignore').astype(int)
Zephyr
asked Jun 9, 2020 at 19:14
You can also do this:
df = df.replace(r' |\.[0-9]*', '', regex=True).astype('int32')
or
df = df.replace(r' |\.\d*', '', regex=True).astype(int)
df
Days Amount Repayments Balance
0 40 19500 15000 4500
1 40 19500 0 19500
2 40 9750 2670 7080
3 40 32500 11500 21000
4 40 3250 580 2670
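For anyone who wants to try this without the Excel file, here is a minimal sketch that rebuilds the sample rows from the question by hand. The column types are an assumption on my part (formatted values as strings, the plain `0` and `580` as integers, `Days` as floats); the real spreadsheet may load differently.

```python
import pandas as pd

# Hand-built stand-in for the Excel data (types are assumed, not confirmed).
df = pd.DataFrame({
    "Days": [40.0, 40.0, 40.0, 40.0, 40.0],
    "Amount": ["19 500.00", "19 500.00", "9 750.00", "32 500.00", "3 250.00"],
    "Repayments": ["15 000.00", 0, "2 670.00", "11 500.00", 580],
    "Balance": ["4 500.00", "19 500.00", "7 080.00", "21 000.00", "2 670.00"],
})

# Remove every space and every ".digits" suffix from string cells,
# then cast all columns to integers.
cleaned = df.replace(r" |\.\d*", "", regex=True).astype(int)
print(cleaned)
```

Note that the pattern simply drops the fractional part, so a value such as `19 500.75` would become `19500` rather than being rounded.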
Zephyr
answered Jun 9, 2020 at 19:25
NYC Coder
To convert numbers in locales where the group (thousands) separator is a space character (1 234 456) and the decimal point/separator is a . (123.456), you can use a regular expression to capture the number:
\d{1,3}( \d{3})*(\.\d+)?
which is to say: match 1-3 decimal digits, followed by zero or more groups consisting of a single space, followed by 3 decimal digits, with the whole followed by an optional group consisting of a single '.' followed by 1 or more decimal digits.
Once you have that, a simple replace will get rid of the group separators (' ') and the fractional part. You'll want to specify the global flag on the regular expression so that it will match all occurrences.
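In Python, the idea above might be sketched as follows. The names `NUMBER`, `_drop_formatting` and `strip_formatting` are made up for this illustration, and the groups are written as non-capturing `(?:...)`. Python's `re.sub` replaces every match by default, so no separate global flag is needed.

```python
import re

# 1-3 digits, then zero or more " ddd" groups, then an optional ".digits" part.
NUMBER = re.compile(r"\d{1,3}(?: \d{3})*(?:\.\d+)?")

def _drop_formatting(match):
    """Return the matched number without group separators or fraction."""
    whole_part = match.group(0).split(".")[0]  # discard the fractional part
    return whole_part.replace(" ", "")         # discard the group separators

def strip_formatting(text):
    """Rewrite every space-grouped number in text as plain digits."""
    return NUMBER.sub(_drop_formatting, text)

print(strip_formatting("19 500.00"))
print(strip_formatting("1 234 456"))
```

The result is still a string, so it needs an `int(...)` call (or `astype(int)` on a column) afterwards. The fractional part is discarded, not rounded.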
answered Jun 9, 2020 at 20:09
Nicholas Carey
You need these conversions:
df1['Days'] = df1['Days'].astype(int)
df1['Amount'] = df1['Amount'].map(lambda x: x.replace(' ','')).astype(float).astype(int)
df1['Repayments'] = df1['Repayments'].astype(str).map(lambda x: x.replace(' ','')).astype(float).astype(int)
df1['Balance'] = df1['Balance'].map(lambda x: x.replace(' ','')).astype(float).astype(int)
which gives:
Days Amount Repayments Balance
0 40 19500 15000 4500
1 40 19500 0 19500
2 40 9750 2670 7080
3 40 32500 11500 21000
4 40 3250 580 2670
The Days column is easy: just convert it to int.
For the other columns, first convert them to str if necessary (as with Repayments), then apply the .replace(' ','') method to remove the spaces, then convert them to float, and finally to int.
A direct conversion from str to int fails whenever the string contains a decimal point, as in '19500.00', so you need to go through float first.
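To see why the intermediate float step matters, here is a small plain-Python sketch of the same chain applied to a single value. The helper name `to_whole_number` is only for illustration and is not part of pandas.

```python
def to_whole_number(value):
    """Mimic the str -> float -> int chain for one cell."""
    text = str(value).replace(" ", "")  # "19 500.00" -> "19500.00"
    return int(float(text))             # float first, then truncate to int

# int() refuses a string that still contains a decimal point.
try:
    int("19500.00")
except ValueError as exc:
    print("direct conversion failed:", exc)

print(to_whole_number("19 500.00"))
print(to_whole_number(580))
```

Keep in mind that `int(float(...))` truncates toward zero, which is harmless here because every fractional part in the sample data is `.00`.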
answered Jun 9, 2020 at 19:32
Zephyr