Guide: Python read byte array

I have a file with the following contents:

b'prefix:input_text'
b'oEffect:PersonX \xd8\xaf\xd8\xb1 \xd8\xac\xd9\x86\xda\xaf ___ \xd8\xa8\xd8\xa7\xd8\xb2\xdb\x8c \xd9\x85\xdb\x8c \xda\xa9\xd9\x86\xd8\xaf'
b'oEffect:PersonX \xd8\xaf\xd8\xb1 \xd8\xac\xd9\x86\xda\xaf ___ \xd8\xa8\xd8\xa7\xd8\xb2\xdb\x8c \xd9\x85\xdb\x8c \xda\xa9\xd9\x86\xd8\xaf'

This is my attempt to read the lines and convert them to readable UTF-8 characters, but the output file still shows the same strings:

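import pandas as pd
from tqdm import tqdm

inpcol, predcol, targcol = [], [], []

# Input file: read as bytes and decode each line as UTF-8
f = open(input_file, "rb")
for x in f:
  inpcol.append(x.decode('utf-8'))

# Prediction file: read as text
f = open(pred_file, "r")
for x in f:
  predcol.append(x)

# Target file: read as text
f = open(target_file, "r")
for x in f:
  targcol.append(x)
data = []
# Combine the three columns row by row
for i in tqdm(range(len(targcol))):
  data.append([inpcol[i], targcol[i], predcol[i]])

pd.DataFrame(data, columns=["input_text", "target_text", "pred_text"]).to_csv(f"{path}/merge_{predfile}.csv", encoding="utf-8")
print("Done!")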

The output file is:

,input_text,target_text,pred_text
0,"b'prefix:input_text'
","target_text
","ﺏﺭﺎﯾ ﺩﺮﮐ ﻮﻀﻌﯿﺗ
"
1,"b'xNeed:PersonX \xd8\xaf\xd8\xb1 \xd8\xac\xd9\x86\xda\xaf ___ \xd8\xa8\xd8\xa7\xd8\xb2\xdb\x8c \xd9\x85\xdb\x8c \xda\xa9\xd9\x86\xd8\xaf'
","ﺞﻨﮕﯾﺪﻧ
","ﺏﺭﺎﯾ ﭗﯾﺩﺍ ﮎﺭﺪﻧ ﯽﮐ ﺖﯿﻣ
"

As you can see, the problem affects the input lines but not the target and prediction lines (those look scrambled here, but that's okay).
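
One possible explanation, assuming the input file literally contains the text b'...' with the backslash escapes written out as ordinary characters (as the listing of the file suggests): decoding such a line as UTF-8 just gives back the same text, because every character in it is plain ASCII. A minimal sketch of one workaround is to parse each line as a Python bytes literal with ast.literal_eval and only then decode the result. The helper name decode_bytes_literal and the sample string are hypothetical, chosen only for illustration.

```python
import ast


def decode_bytes_literal(line: str) -> str:
    """Convert the text of a bytes literal, e.g. "b'\\xd8\\xaf'", into a str."""
    # literal_eval safely parses the literal into a real bytes object
    raw = ast.literal_eval(line.strip())
    if not isinstance(raw, bytes):
        raise ValueError(f"not a bytes literal: {line!r}")
    # Now the escapes are real bytes, so UTF-8 decoding yields readable text
    return raw.decode("utf-8")


# Hypothetical sample: a raw string, so the backslashes are literal characters,
# just as they would be when the line is read from the file
sample = r"b'oEffect:PersonX \xd8\xaf\xd8\xb1 \xd8\xac\xd9\x86\xda\xaf ___'"
decoded = decode_bytes_literal(sample)
print(decoded)
```

Under that assumption, the input file could be opened in text mode and each line passed through decode_bytes_literal instead of being decoded directly.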

bytes() in Python returns a bytes object: an immutable sequence of integers, initialized with the given size and data, in the range 0
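
As a rough illustration of the built-in bytes() constructor, the sketch below builds bytes objects from a size and from a list of integers, then shows that out-of-range values and item assignment are rejected. The variable names are made up for this example.

```python
zeros = bytes(3)              # a size gives that many zero bytes
from_ints = bytes([80, 121])  # an iterable of integers gives those byte values
print(zeros, from_ints, list(from_ints))

# Each element must fit in one byte, so 256 is rejected
try:
    bytes([256])
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True

# bytes objects are immutable, so item assignment fails
try:
    from_ints[0] = 0
    assignment_rejected = False
except TypeError:
    assignment_rejected = True

print(out_of_range_rejected, assignment_rejected)
```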
