PHP: convert Windows-1252 to UTF-8
We have several database fields that contain Windows-1252 characters:
Those values map to the desired values from this list: http://www.i18nqa.com/debug/utf8-debug.html

I've tried various permutations of htmlentities, mb_detect_encoding, utf8_decode, etc., but have not yet been able to transform those values to:

an example pain — if you're

How can I transform these characters to their listed values in PHP?

asked Feb 5, 2016 at 21:49
You can use mb_convert_encoding
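A minimal sketch of what that can look like, assuming the mbstring extension is loaded. The helper names cp1252_to_utf8 and undo_mojibake are hypothetical, and the direction you need depends on what is actually stored: raw Windows-1252 bytes, or UTF-8 text that was once misread as Windows-1252 and saved again (the garbled sequences on the chart linked in the question).

```php
<?php
// Hypothetical helpers; only mb_convert_encoding() itself comes from the answer.

// Case 1: the stored bytes really are Windows-1252 (0x97 is a dash, 0x92 a curly apostrophe).
function cp1252_to_utf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

// Case 2: UTF-8 text that was misread as Windows-1252 and then re-encoded as UTF-8.
// Converting "back" to Windows-1252 restores the original UTF-8 byte sequence.
function undo_mojibake(string $text): string
{
    return mb_convert_encoding($text, 'Windows-1252', 'UTF-8');
}

$raw = "an example pain \x97 if you\x92re";
echo cp1252_to_utf8($raw), PHP_EOL;

// U+00E2 U+20AC U+201D is how the UTF-8 bytes E2 80 94 look when read as Windows-1252.
$garbled = "an example pain \u{00E2}\u{20AC}\u{201D} if you\u{00E2}\u{20AC}\u{2122}re";
echo undo_mojibake($garbled), PHP_EOL;
```

Both calls should print the same readable sentence. If the output still looks wrong, the data is probably in the other of the two states.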
answered Feb 5, 2016 at 22:03 by Pedro Lobito

Oh my god, this took too long to solve, so I want to post my answer here since this page kept coming up in searches. My MySQL table uses the utf8mb4_unicode_520_ci collation, and one column contains those annoying Word curly quotes. I was trying to read the value from the database and pass it to json_encode, but json_encode failed and returned nothing, so I tried utf8_encode. That converted the characters incorrectly. I then used mb_convert_encoding to go from Windows-1252 to UTF-8, but json_encode messed that up too. In the end, this worked:
Since I was having the issue with an image URL, this worked perfectly and didn't require me to decode it on the other side.
answered Apr 9, 2019 at 16:03 by Dave
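The json_encode failure described in that answer can be reproduced and worked around roughly as follows. This is a sketch under assumptions, not the answer's own snippet: the helper name encode_cp1252_value_as_json is hypothetical, the input is assumed to be raw Windows-1252 bytes, and JSON_THROW_ON_ERROR needs PHP 7.3 or later.

```php
<?php
// Hypothetical helper: convert Windows-1252 bytes to UTF-8, then JSON-encode
// without escaping non-ASCII characters or slashes.
function encode_cp1252_value_as_json(string $bytes): string
{
    $utf8 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');

    return json_encode(
        $utf8,
        JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
    );
}

// 0x93 and 0x94 are the curly double quotes in Windows-1252.
$value = "\x93logo\x94.png";

// Raw Windows-1252 bytes are not valid UTF-8, so json_encode() gives up.
var_dump(json_encode($value)); // bool(false)

echo encode_cp1252_value_as_json($value), PHP_EOL;
```

Without JSON_UNESCAPED_UNICODE the curly quotes would come out as \u201c and \u201d escapes, which is still valid JSON but may be what "messed that up" refers to.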
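(PHP 4 >= 4.0.5, PHP 5, PHP 7, PHP 8)

iconv: Convert a string from one character encoding to another

Description

iconv(string $from_encoding, string $to_encoding, string $string): string|false

Parameters

from_encoding
The current encoding used to interpret string.

to_encoding
The desired encoding of the result. If the string //TRANSLIT is appended to to_encoding, transliteration is activated: a character that cannot be represented in the target charset may be approximated by one or several similar-looking characters. If the string //IGNORE is appended, characters that cannot be represented in the target charset are silently discarded. Otherwise, an E_NOTICE is generated and the function returns false.

Caution: If and how //TRANSLIT works exactly depends on the system's iconv() implementation.

string
The string to be converted.

Return Values

Returns the converted string, or false on failure.

Examples

Example #1 iconv() example

echo 'Original : ', $text, PHP_EOL;

The above example will output something similar to:

Original : This is the Euro symbol '€'.
TRANSLIT : This is the Euro symbol 'EUR'.
IGNORE   : This is the Euro symbol ''.
Plain    :
Notice: iconv(): Detected an illegal character in input string in .\iconv-example.php on line 7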
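The three output variants listed above (TRANSLIT, IGNORE, Plain) can be explored with a sketch like the following. The wrapper try_iconv is hypothetical, and the ISO-8859-1 target is an assumption chosen because it has no Euro sign. Results differ between iconv implementations, so no fixed output is claimed for the loop.

```php
<?php
// Hypothetical wrapper: returns null instead of false and hides the
// E_NOTICE that iconv() raises when a conversion fails.
function try_iconv(string $from, string $to, string $text): ?string
{
    $out = @iconv($from, $to, $text);

    return $out === false ? null : $out;
}

$text = "This is the Euro symbol '\u{20AC}'.";

foreach (['ISO-8859-1//TRANSLIT', 'ISO-8859-1//IGNORE', 'ISO-8859-1'] as $target) {
    // Implementation-dependent: approximated, dropped, or a failed conversion.
    var_dump(try_iconv('UTF-8', $target, $text));
}

// The direction this page is about needs no fallback for assigned bytes,
// because every Windows-1252 character also exists in Unicode.
echo try_iconv('Windows-1252', 'UTF-8', "\x80 \x93ok\x94"), PHP_EOL;
```

Checking for a null (or false) result explicitly is usually safer than echoing the return value directly, since a failed conversion otherwise prints as an empty string.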
manuel at kiessling dot net ¶ 13 years ago
… = html_entity_decode(htmlentities($oldstring, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'ISO-8859-15');
Is Windows-1252 a subset of UTF-8?

Windows-1252 is a subset of UTF-8 in terms of which characters are available, but not in terms of their byte-by-byte representation. Windows-1252 assigns characters to bytes 128 through 255, and UTF-8 encodes those characters differently, as multi-byte sequences. Characters in the ASCII range (bytes 0 through 127) are encoded 1:1 in UTF-8.
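The byte-level difference can be made visible with a short sketch, assuming the mbstring extension is available. The helper name utf8_hex_of_cp1252 is hypothetical.

```php
<?php
// Hypothetical helper: hex dump of the UTF-8 bytes for a Windows-1252 string.
function utf8_hex_of_cp1252(string $bytes): string
{
    return bin2hex(mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252'));
}

// "A" is ASCII, 0xE9 is e-acute, 0x80 is the Euro sign in Windows-1252.
foreach (["A", "\xE9", "\x80"] as $byte) {
    printf("cp1252 %s -> utf8 %s\n", bin2hex($byte), utf8_hex_of_cp1252($byte));
}
// cp1252 41 -> utf8 41
// cp1252 e9 -> utf8 c3a9
// cp1252 80 -> utf8 e282ac
```

The ASCII byte is unchanged, while the two high bytes become two-byte and three-byte sequences.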
What is Windows-1252?

Windows-1252 or CP-1252 (code page 1252) is a single-byte character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and many European languages including Spanish, French, and German.