Convert Windows-1252 to UTF-8 in PHP

We have several database fields that contain Windows-1252 characters:

an example pain— if you’re

Those values map to the desired values from this list:

//www.i18nqa.com/debug/utf8-debug.html

I've tried various permutations of htmlentities, mb_detect_encoding, utf8_decode, etc., but have not yet been able to transform those values to:

an example pain — if you're

How can I transform these characters to their listed values in php?

asked Feb 5, 2016 at 21:49

You can use mb_convert_encoding

$str = "an example pain— if you’re";
// Argument order is (string, to_encoding, from_encoding)
$str = mb_convert_encoding($str, "Windows-1252", "UTF-8");
echo $str;
//an example pain— if you’re

DEMO:
//ideone.com/NsIb5x
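
Why this works can be seen at the byte level. The sketch below is an illustration, not part of the linked demo: it writes the mojibake for the apostrophe as explicit UTF-8 bytes (the variable names are made up for the example) and shows that re-encoding it as Windows-1252 leaves exactly the three bytes of the real UTF-8 apostrophe.

```php
<?php
// "’" as UTF-8 bytes: â = C3 A2, € = E2 82 AC, ™ = E2 84 A2
$mojibake = "you\xC3\xA2\xE2\x82\xAC\xE2\x84\xA2re";

// In Windows-1252 those three characters are the single bytes E2, 80 and 99,
// which together form the UTF-8 encoding of U+2019 (right single quotation mark).
$fixed = mb_convert_encoding($mojibake, "Windows-1252", "UTF-8");

echo bin2hex($fixed), PHP_EOL; // 796f75e280997265
```

The result is only meaningful if the stored text really was UTF-8 that had been misread as Windows-1252; applying the same call to clean text would damage it.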

answered Feb 5, 2016 at 22:03

Pedro Lobito

Oh my god, this took too long to solve, so I want to post my answer here since this link kept coming up in searches. My MySQL table uses the utf8mb4_unicode_520_ci collation, and one column contains those annoying Word curly quotes. I was reading the DB value and passing it to json_encode, but that failed and returned a blank result, so I tried utf8_encode, which converted the character incorrectly. I then had to use mb_convert_encoding to go from Windows-1252 to UTF-8, but json_encode messed that up too. In the end, this worked:

$file = urlencode(mb_convert_encoding($stringwithcurlyquotes, "UTF-8", 'Windows-1252'));

Since I was having the issue with an image URL, this worked perfectly and didn't require me to decode it on the other side.
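
The json_encode failure described here can be reproduced in a few lines. This is a minimal sketch with a made-up input string (Windows-1252 curly quotes written as raw bytes), not the original code behind this answer:

```php
<?php
// Hypothetical value as it might come out of the database: "hi" wrapped in
// Windows-1252 curly double quotes (bytes 0x93 and 0x94), which is not valid UTF-8.
$raw = "\x93hi\x94";

// json_encode() only accepts valid UTF-8, so it returns false here.
var_dump(json_encode($raw)); // bool(false)

// Convert from Windows-1252 to UTF-8 first; then encoding succeeds.
$utf8 = mb_convert_encoding($raw, "UTF-8", "Windows-1252");
echo json_encode($utf8), PHP_EOL; // "\u201chi\u201d"
```

By default json_encode escapes non-ASCII characters as \uXXXX sequences, which is why the quotes appear escaped in the output.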

Dave

answered Apr 9, 2019 at 16:03

(PHP 4 >= 4.0.5, PHP 5, PHP 7, PHP 8)

iconv - Convert a string from one character encoding to another

Description

iconv(string $from_encoding, string $to_encoding, string $string): string|false

Parameters

from_encoding

The current encoding used to interpret string.

to_encoding

The desired encoding of the result.

If the string //TRANSLIT is appended to to_encoding, then transliteration is activated. This means that when a character can't be represented in the target charset, it may be approximated through one or several similarly looking characters. If the string //IGNORE is appended, characters that cannot be represented in the target charset are silently discarded. Otherwise, E_NOTICE is generated and the function will return false.

Caution

If and how //TRANSLIT works exactly depends on the system's iconv() implementation (cf. ICONV_IMPL). Some implementations are known to ignore //TRANSLIT, so the conversion is likely to fail for characters which are illegal for the to_encoding.

string

The string to be converted.

Return Values

Returns the converted string, or false on failure.

Examples

Example #1 iconv() example
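
The listing for this example is missing here. A sketch consistent with the output shown below, assuming UTF-8 input and ISO-8859-1 (which has no euro sign) as the target encoding, could look like this:

```php
<?php
$text = "This is the Euro symbol '€'."; // source file assumed to be saved as UTF-8

echo 'Original : ', $text, PHP_EOL;
echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text), PHP_EOL; // approximate
echo 'IGNORE   : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $text), PHP_EOL;   // drop
echo 'Plain    : ', iconv("UTF-8", "ISO-8859-1", $text), PHP_EOL;           // fails
```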

The above example will output something similar to:

Original : This is the Euro symbol '€'.
TRANSLIT : This is the Euro symbol 'EUR'.
IGNORE   : This is the Euro symbol ''.
Plain    :
Notice: iconv(): Detected an illegal character in input string in .\iconv-example.php on line 7

Notes

Note:

The character encodings and options available depend on the installed implementation of iconv. If the argument to from_encoding or to_encoding is not supported on the current system, false will be returned.
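
Because both an unsupported encoding name and an unconvertible character produce false, callers generally need a strict comparison on the return value. A minimal sketch, where to_utf8_or_null is a hypothetical helper name and not part of PHP:

```php
<?php
// Hypothetical helper: convert $string to UTF-8, or return null when iconv() fails.
function to_utf8_or_null(string $string, string $from_encoding): ?string
{
    // @ silences the notice/warning iconv() raises on failure; we test for false instead.
    $converted = @iconv($from_encoding, 'UTF-8', $string);
    return $converted === false ? null : $converted;
}

var_dump(to_utf8_or_null("caf\xE9", 'ISO-8859-1') === "caf\xC3\xA9"); // bool(true)
var_dump(to_utf8_or_null('abc', 'NO-SUCH-CHARSET'));                  // NULL
```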

See Also

  • mb_convert_encoding() - Convert a string from one character encoding to another
  • UConverter::transcode() - Convert a string from one character encoding to another

orrd101 at gmail dot com

10 years ago

The "//ignore" option doesn't work with recent versions of the iconv library. So if you're having trouble with that option, you aren't alone.

That means you can't currently use this function to filter invalid characters. Instead it silently fails and returns an empty string (or you'll get a notice, but only if you have E_NOTICE enabled).

This has been a known bug with a known solution since at least 2009, but no one seems to be willing to fix it (PHP must pass the -c option to iconv). It's still broken as of the latest release, 5.4.3.

//bugs.php.net/bug.php?id=48147
//bugs.php.net/bug.php?id=52211
//bugs.php.net/bug.php?id=61484

[UPDATE 15-JUN-2012]
Here's a workaround...

  ini_set('mbstring.substitute_character', "none");
  $text = mb_convert_encoding($text, 'UTF-8', 'UTF-8');

That will strip invalid characters from UTF-8 strings (so that you can insert them into a database, etc.). Instead of "none" you can also use the value 32 if you want it to insert spaces in place of the invalid characters.

Ritchie

15 years ago

Please note that iconv('UTF-8', 'ASCII//TRANSLIT', ...) doesn't work properly when the locale category LC_CTYPE is set to C or POSIX. You must choose another locale; otherwise all non-ASCII characters will be replaced with question marks. This is at least true with glibc 2.5.

Example:
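
The example from this note is missing. A hedged sketch of the behaviour it describes follows; the sample text and the en_US.UTF-8 locale name are assumptions, and the output depends on the system's iconv implementation and installed locales, so none is shown:

```php
<?php
$text = "Žluťoučký kůň"; // source file assumed to be saved as UTF-8

// With LC_CTYPE set to C (or POSIX), the note above reports "?" for non-ASCII characters.
setlocale(LC_CTYPE, 'C');
var_dump(iconv('UTF-8', 'ASCII//TRANSLIT', $text));

// 'en_US.UTF-8' is an assumed locale name; setlocale() returns false if it is not installed.
if (setlocale(LC_CTYPE, 'en_US.UTF-8') !== false) {
    var_dump(iconv('UTF-8', 'ASCII//TRANSLIT', $text));
}
```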

daniel dot rhodes at warpasylum dot co dot uk

11 years ago

Interestingly, setting different target locales results in different, yet appropriate, transliterations. For example:
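
The listing is missing here as well. A minimal sketch of such a comparison, assuming the en_GB.UTF-8 and de_DE.UTF-8 locales (names vary by system) and a made-up German sample sentence; results depend on the iconv implementation, so no output is claimed:

```php
<?php
$sentence = 'Weiß, Göbel und Götz'; // hypothetical sample text, saved as UTF-8

foreach (['en_GB.UTF-8', 'de_DE.UTF-8'] as $locale) {
    // Skip locales that are not installed (setlocale() returns false for those).
    if (setlocale(LC_CTYPE, $locale) === false) {
        echo "$locale : (not installed)", PHP_EOL;
        continue;
    }
    // @ silences the notice raised when a character cannot be converted.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $sentence);
    echo "$locale : ", ($ascii === false ? '(failed)' : $ascii), PHP_EOL;
}
```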

annuaireehtp at gmail dot com

12 years ago

To test different combinations of conversions between charsets (when we don't know the source charset or which destination charset is suitable), this is an example:
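
The original listing did not survive. A minimal sketch of the idea, assuming a $tab array of candidate charsets as the note below mentions; the sample string, the charset list, and the $results array are made up for illustration:

```php
<?php
// Hypothetical sample: "café" encoded as UTF-8, treated as text of unknown origin.
$my_string = "caf\xC3\xA9";
$tab = ["UTF-8", "ASCII", "Windows-1252", "ISO-8859-15", "ISO-8859-1"];

$results = [];
foreach ($tab as $i) {
    foreach ($tab as $j) {
        // @ silences the notice iconv() raises on illegal input; failures yield false.
        $converted = @iconv($i, $j, $my_string);
        $results["$i$j"] = $converted;
        echo "$i$j : ", ($converted === false ? '(failed)' : $converted), PHP_EOL;
    }
}
```

Each output line is labelled with the source and destination charset, so the pair that renders correctly can be read straight off the list.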



Then, after displaying the results, use the $i$j combination that displays correctly.
NB: you can add other charsets to $tab to test other cases.

Daniel Klein

2 years ago

If you want to convert to a Unicode encoding without the byte order mark (BOM), add the endianness to the encoding, e.g. instead of "UTF-16", which will add a BOM to the start of the string, use "UTF-16BE", which will convert the string without adding a BOM.

i.e.
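
The listing is missing from this note. A minimal sketch of the comparison, with a one-character sample string chosen for illustration; the exact BOM bytes and byte order for plain "UTF-16" depend on the iconv implementation, so only the UTF-16BE result is stated:

```php
<?php
$text = "A";

// Plain "UTF-16": the implementation picks the byte order and typically prepends a BOM.
$withBom = iconv('CP1252', 'UTF-16', $text);

// Explicit endianness: no BOM is written.
$withoutBom = iconv('CP1252', 'UTF-16BE', $text);

echo bin2hex($withBom), PHP_EOL;    // implementation-dependent
echo bin2hex($withoutBom), PHP_EOL; // 0041
```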



workaround suggested here and elsewhere will also break when encountering illegal characters, at least dropping a useful note ("htmlentities(): Invalid multibyte sequence in argument in...")

I have found a lot of hints, suggestions and alternative methods (it's scary, and in my opinion not a good sign, how many ways PHP natively provides to convert the encoding of strings), but none of them really worked, except for this one:

Leigh Morresi

13 years ago

If you are getting question-marks in your iconv output when transliterating, be sure to 'setlocale' to something your system supports.

Some PHP CMSs default setlocale to 'C', which can be a problem.

Use the "locale" command to list the locales your system supports:

$ locale -a
C
en_AU.utf8
POSIX
