Decode url utf 8 javascript

The decodeURI() function decodes a Uniform Resource Identifier (URI) previously created by encodeURI() or by a similar routine.

Try it

Syntax

Parameters

encodedURI

A complete, encoded Uniform Resource Identifier.

Return value

A new string representing the unencoded version of the given encoded Uniform Resource Identifier (URI).

Exceptions

Throws an URIError ("malformed URI sequence") exception when encodedURI contains invalid character sequences.

Description

Replaces each escape sequence in the encoded URI with the character that it represents, but does not decode escape sequences that could not have been introduced by encodeURI. The character # is not decoded from escape sequences.

Examples

Decoding a Cyrillic URL

decodeURI('https://developer.mozilla.org/ru/docs/JavaScript_%D1%88%D0%B5%D0%BB%D0%BB%D1%8B');
// "https://developer.mozilla.org/ru/docs/JavaScript_шеллы"

Catching errors

try {
  const a = decodeURI('%E0%A4%A');
} catch (e) {
  console.error(e);
}

// URIError: malformed URI sequence

Specifications

Specification
ECMAScript Language Specification
# sec-decodeuri-encodeduri

Browser compatibility

BCD tables only load in the browser

See also

The decodeURIComponent() function decodes a Uniform Resource Identifier (URI) component previously created by encodeURIComponent or by a similar routine.

Try it

Syntax

decodeURIComponent(encodedURI)

Parameters

encodedURI

An encoded component of a Uniform Resource Identifier.

Return value

A new string representing the decoded version of the given encoded Uniform Resource Identifier (URI) component.

Exceptions

Throws an URIError ("malformed URI sequence") exception when used wrongly.

Description

Replaces each escape sequence in the encoded URI component with the character that it represents.

Examples

Decoding a Cyrillic URL component

decodeURIComponent('JavaScript_%D1%88%D0%B5%D0%BB%D0%BB%D1%8B');
// "JavaScript_шеллы"

Catching errors

try {
  const a = decodeURIComponent('%E0%A4%A');
} catch (e) {
  console.error(e);
}

// URIError: malformed URI sequence

Decoding query parameters from a URL

decodeURIComponent cannot be used directly to parse query parameters from a URL. It needs a bit of preparation.

function decodeQueryParam(p) {
  return decodeURIComponent(p.replace(/\+/g, ' '));
}

decodeQueryParam('search+query%20%28correct%29');
// 'search query (correct)'

Specifications

Specification
ECMAScript Language Specification
# sec-decodeuricomponent-encodeduricomponent

Browser compatibility

BCD tables only load in the browser

See also

I have Javascript in an XHTML web page that is passing UTF-8 encoded strings. It needs to continue to pass the UTF-8 version, as well as decode it. How is it possible to decode a UTF-8 string for display?


Jon Adams

23.9k18 gold badges83 silver badges118 bronze badges

asked Nov 13, 2012 at 6:37

Jarrett MattsonJarrett Mattson

9352 gold badges8 silver badges14 bronze badges

21

To answer the original question: here is how you decode utf-8 in javascript:

http://ecmanaut.blogspot.ca/2006/07/encoding-decoding-utf8-in-javascript.html

Specifically,

function encode_utf8(s) {
  return unescape(encodeURIComponent(s));
}

function decode_utf8(s) {
  return decodeURIComponent(escape(s));
}

We have been using this in our production code for 6 years, and it has worked flawlessly.

Note, however, that escape() and unescape() are deprecated. See this.

Decode url utf 8 javascript

Anna

1894 silver badges17 bronze badges

answered Dec 3, 2012 at 20:53

12

This should work:

// http://www.onicos.com/staff/iz/amuse/javascript/expert/utf.txt

/* utf.js - UTF-8 <=> UTF-16 convertion
 *
 * Copyright (C) 1999 Masanao Izumo <>
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free.  You can redistribute it and/or modify it.
 */

function Utf8ArrayToStr(array) {
    var out, i, len, c;
    var char2, char3;

    out = "";
    len = array.length;
    i = 0;
    while(i < len) {
    c = array[i++];
    switch(c >> 4)
    { 
      case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
        // 0xxxxxxx
        out += String.fromCharCode(c);
        break;
      case 12: case 13:
        // 110x xxxx   10xx xxxx
        char2 = array[i++];
        out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
        break;
      case 14:
        // 1110 xxxx  10xx xxxx  10xx xxxx
        char2 = array[i++];
        char3 = array[i++];
        out += String.fromCharCode(((c & 0x0F) << 12) |
                       ((char2 & 0x3F) << 6) |
                       ((char3 & 0x3F) << 0));
        break;
    }
    }

    return out;
}

Check out the JSFiddle demo.

Also see the related questions: here and here

answered Mar 13, 2014 at 8:34

AlbertAlbert

62.7k58 gold badges229 silver badges367 bronze badges

3

Perhaps using the textDecoder will be sufficient.

Not supported in IE though.

var decoder = new TextDecoder('utf-8'),
    decodedMessage;

decodedMessage = decoder.decode(message.data);

Handling non-UTF8 text

In this example, we decode the Russian text "Привет, мир!", which means "Hello, world." In our TextDecoder() constructor, we specify the Windows-1251 character encoding, which is appropriate for Cyrillic script.

    let win1251decoder = new TextDecoder('windows-1251');
    let bytes = new Uint8Array([207, 240, 232, 226, 229, 242, 44, 32, 236, 232, 240, 33]);
    console.log(win1251decoder.decode(bytes)); // Привет, мир!

The interface for the TextDecoder is described here.

Retrieving a byte array from a string is equally simpel:

const decoder = new TextDecoder();
const encoder = new TextEncoder();

const byteArray = encoder.encode('Größe');
// converted it to a byte array

// now we can decode it back to a string if desired
console.log(decoder.decode(byteArray));

If you have it in a different encoding then you must compensate for that upon encoding. The parameter in the constructor for the TextEncoder is any one of the valid encodings listed here.

answered Nov 17, 2016 at 9:53

JonathanJonathan

1,20513 silver badges22 bronze badges

6

Update @Albert's answer adding condition for emoji.

function Utf8ArrayToStr(array) {
    var out, i, len, c;
    var char2, char3, char4;

    out = "";
    len = array.length;
    i = 0;
    while(i < len) {
    c = array[i++];
    switch(c >> 4)
    { 
      case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
        // 0xxxxxxx
        out += String.fromCharCode(c);
        break;
      case 12: case 13:
        // 110x xxxx   10xx xxxx
        char2 = array[i++];
        out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
        break;
      case 14:
        // 1110 xxxx  10xx xxxx  10xx xxxx
        char2 = array[i++];
        char3 = array[i++];
        out += String.fromCharCode(((c & 0x0F) << 12) |
                       ((char2 & 0x3F) << 6) |
                       ((char3 & 0x3F) << 0));
        break;
     case 15:
        // 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx
        char2 = array[i++];
        char3 = array[i++];
        char4 = array[i++];
        out += String.fromCodePoint(((c & 0x07) << 18) | ((char2 & 0x3F) << 12) | ((char3 & 0x3F) << 6) | (char4 & 0x3F));

        break;
    }

    return out;
}

answered Feb 25, 2017 at 7:27

Decode url utf 8 javascript

lauthulauthu

3063 silver badges11 bronze badges

1

Here is a solution handling all Unicode code points include upper (4 byte) values and supported by all modern browsers (IE and others > 5.5). It uses decodeURIComponent(), but NOT the deprecated escape/unescape functions:

function utf8_to_str(a) {
    for(var i=0, s=''; i

Tested and available on GitHub

To create UTF-8 from a string:

function utf8_from_str(s) {
    for(var i=0, enc = encodeURIComponent(s), a = []; i < enc.length;) {
        if(enc[i] === '%') {
            a.push(parseInt(enc.substr(i+1, 2), 16))
            i += 3
        } else {
            a.push(enc.charCodeAt(i++))
        }
    }
    return a
}

Tested and available on GitHub

answered Feb 15, 2017 at 5:35

1

This is what I found after a more specific Google search than just UTF-8 encode/decode. so for those who are looking for a converting library to convert between encodings, here you go.

https://github.com/inexorabletash/text-encoding

var uint8array = new TextEncoder().encode(str);
var str = new TextDecoder(encoding).decode(uint8array);

Paste from repo readme

All encodings from the Encoding specification are supported:

utf-8 ibm866 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-8-i iso-8859-10 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 koi8-r koi8-u macintosh windows-874 windows-1250 windows-1251 windows-1252 windows-1253 windows-1254 windows-1255 windows-1256 windows-1257 windows-1258 x-mac-cyrillic gb18030 hz-gb-2312 big5 euc-jp iso-2022-jp shift_jis euc-kr replacement utf-16be utf-16le x-user-defined

(Some encodings may be supported under other names, e.g. ascii, iso-8859-1, etc. See Encoding for additional labels for each encoding.)

answered Feb 26, 2019 at 10:33

Decode url utf 8 javascript

Olle TiinusOlle Tiinus

2042 silver badges9 bronze badges

1

@albert's solution was the closest I think but it can only parse up to 3 byte utf-8 characters

function utf8ArrayToStr(array) {
  var out, i, len, c;
  var char2, char3;

  out = "";
  len = array.length;
  i = 0;

  // XXX: Invalid bytes are ignored
  while(i < len) {
    c = array[i++];
    if (c >> 7 == 0) {
      // 0xxx xxxx
      out += String.fromCharCode(c);
      continue;
    }

    // Invalid starting byte
    if (c >> 6 == 0x02) {
      continue;
    }

    // #### MULTIBYTE ####
    // How many bytes left for thus character?
    var extraLength = null;
    if (c >> 5 == 0x06) {
      extraLength = 1;
    } else if (c >> 4 == 0x0e) {
      extraLength = 2;
    } else if (c >> 3 == 0x1e) {
      extraLength = 3;
    } else if (c >> 2 == 0x3e) {
      extraLength = 4;
    } else if (c >> 1 == 0x7e) {
      extraLength = 5;
    } else {
      continue;
    }

    // Do we have enough bytes in our data?
    if (i+extraLength > len) {
      var leftovers = array.slice(i-1);

      // If there is an invalid byte in the leftovers we might want to
      // continue from there.
      for (; i < len; i++) if (array[i] >> 6 != 0x02) break;
      if (i != len) continue;

      // All leftover bytes are valid.
      return {result: out, leftovers: leftovers};
    }
    // Remove the UTF-8 prefix from the char (res)
    var mask = (1 << (8 - extraLength - 1)) - 1,
        res = c & mask, nextChar, count;

    for (count = 0; count < extraLength; count++) {
      nextChar = array[i++];

      // Is the char valid multibyte part?
      if (nextChar >> 6 != 0x02) {break;};
      res = (res << 6) | (nextChar & 0x3f);
    }

    if (count != extraLength) {
      i--;
      continue;
    }

    if (res <= 0xffff) {
      out += String.fromCharCode(res);
      continue;
    }

    res -= 0x10000;
    var high = ((res >> 10) & 0x3ff) + 0xd800,
        low = (res & 0x3ff) + 0xdc00;
    out += String.fromCharCode(high, low);
  }

  return {result: out, leftovers: []};
}

This returns {result: "parsed string", leftovers: [list of invalid bytes at the end]} in case you are parsing the string in chunks.

EDIT: fixed the issue that @unhammer found.

answered Jan 21, 2016 at 14:50

fakedrakefakedrake

6,2398 gold badges38 silver badges59 bronze badges

3

// String to Utf8 ByteBuffer

function strToUTF8(str){
  return Uint8Array.from(encodeURIComponent(str).replace(/%(..)/g,(m,v)=>{return String.fromCodePoint(parseInt(v,16))}), c=>c.codePointAt(0))
}

// Utf8 ByteArray to string

function UTF8toStr(ba){
  return decodeURIComponent(ba.reduce((p,c)=>{return p+'%'+c.toString(16),''}))
}

answered Apr 13, 2018 at 16:39

2

Using my 1.6KB library, you can do

ToString(FromUTF8(Array.from(usernameReceived)))

answered Jan 24, 2019 at 13:45

Decode url utf 8 javascript

MCCCSMCCCS

9723 gold badges18 silver badges42 bronze badges

This is a solution with extensive error reporting.

It would take an UTF-8 encoded byte array (where byte array is represented as array of numbers and each number is an integer between 0 and 255 inclusive) and will produce a JavaScript string of Unicode characters.

function getNextByte(value, startByteIndex, startBitsStr, 
                     additional, index) 
{
    if (index >= value.length) {
        var startByte = value[startByteIndex];
        throw new Error("Invalid UTF-8 sequence. Byte " + startByteIndex 
            + " with value " + startByte + " (" + String.fromCharCode(startByte) 
            + "; binary: " + toBinary(startByte)
            + ") starts with " + startBitsStr + " in binary and thus requires " 
            + additional + " bytes after it, but we only have " 
            + (value.length - startByteIndex) + ".");
    }
    var byteValue = value[index];
    checkNextByteFormat(value, startByteIndex, startBitsStr, additional, index);
    return byteValue;
}

function checkNextByteFormat(value, startByteIndex, startBitsStr, 
                             additional, index) 
{
    if ((value[index] & 0xC0) != 0x80) {
        var startByte = value[startByteIndex];
        var wrongByte = value[index];
        throw new Error("Invalid UTF-8 byte sequence. Byte " + startByteIndex 
             + " with value " + startByte + " (" +String.fromCharCode(startByte) 
             + "; binary: " + toBinary(startByte) + ") starts with " 
             + startBitsStr + " in binary and thus requires " + additional 
             + " additional bytes, each of which shouls start with 10 in binary."
             + " However byte " + (index - startByteIndex) 
             + " after it with value " + wrongByte + " (" 
             + String.fromCharCode(wrongByte) + "; binary: " + toBinary(wrongByte)
             +") does not start with 10 in binary.");
    }
}

function fromUtf8 (str) {
        var value = [];
        var destIndex = 0;
        for (var index = 0; index < str.length; index++) {
            var code = str.charCodeAt(index);
            if (code <= 0x7F) {
                value[destIndex++] = code;
            } else if (code <= 0x7FF) {
                value[destIndex++] = ((code >> 6 ) & 0x1F) | 0xC0;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else if (code <= 0xFFFF) {
                value[destIndex++] = ((code >> 12) & 0x0F) | 0xE0;
                value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else if (code <= 0x1FFFFF) {
                value[destIndex++] = ((code >> 18) & 0x07) | 0xF0;
                value[destIndex++] = ((code >> 12) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else if (code <= 0x03FFFFFF) {
                value[destIndex++] = ((code >> 24) & 0x03) | 0xF0;
                value[destIndex++] = ((code >> 18) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 12) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else if (code <= 0x7FFFFFFF) {
                value[destIndex++] = ((code >> 30) & 0x01) | 0xFC;
                value[destIndex++] = ((code >> 24) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 18) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 12) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else {
                throw new Error("Unsupported Unicode character \"" 
                    + str.charAt(index) + "\" with code " + code + " (binary: " 
                    + toBinary(code) + ") at index " + index
                    + ". Cannot represent it as UTF-8 byte sequence.");
            }
        }
        return value;
    }

answered Mar 27, 2020 at 19:27

Decode url utf 8 javascript

I reckon the easiest way would be to use a built-in js functions decodeURI() / encodeURI().

function (usernameSent) {
  var usernameEncoded = usernameSent; // Current value: utf8
  var usernameDecoded = decodeURI(usernameReceived);  // Decoded
  // do stuff
}

answered Mar 2, 2018 at 16:26

1

const decoder = new TextDecoder();
console.log(decoder.decode(new Uint8Array([97])));

Decode url utf 8 javascript

MDN resource link

answered Apr 21 at 0:04

Decode url utf 8 javascript

I searched for a simple solution and this works well for me:

//input data
view = new Uint8Array(data);

//output string
serialString = ua2text(view);

//convert UTF8 to string
function ua2text(ua) {
    s = "";
    for (var i = 0; i < ua.length; i++) {
        s += String.fromCharCode(ua[i]);
    }
    return s;               
}

Only issue I have is sometimes I get one character at a time. This might be by design with my source of the arraybuffer. I'm using https://github.com/xseignard/cordovarduino to read serial data on an android device.

Decode url utf 8 javascript

Adween

2,7722 gold badges17 silver badges19 bronze badges

answered Aug 12, 2015 at 13:41

Evan GrantEvan Grant

131 silver badge4 bronze badges

1

Preferably, as others have suggested, use the Encoding API. But if you need to support IE (for some strange reason) MDN recommends this repo FastestSmallestTextEncoderDecoder

If you need to make use of the polyfill library:

    import {encode, decode} from "fastestsmallesttextencoderdecoder";

Then (regardless of the polyfill) for encoding and decoding:

    // takes in USVString and returns a Uint8Array object
    const encoded = new TextEncoder().encode('€')
    console.log(encoded);

    // takes in an ArrayBuffer or an ArrayBufferView and returns a DOMString
    const decoded = new TextDecoder().decode(encoded);
    console.log(decoded);

answered May 5, 2021 at 20:02

1

How do you decode or encode a URL in JavaScript?

Decoding in Javascript can be achieved using decodeURI function. It takes encodeURIComponent(url) string so it can decode these characters. 2. unescape() function: This function takes a string as a single parameter and uses it to decode that string encoded by the escape() function.

What is decodeURI in JavaScript?

The decodeURI() function decodes a Uniform Resource Identifier (URI) previously created by encodeURI() or by a similar routine.

How do I use encodeURI in JavaScript?

The encodeURI() method encodes a URI..
Note. Use the decodeURI() method to decode a URI..
Special Characters. The encodeURI() method does not encode characters like: , / ? : @ & = + $ * # ... .
See Also: The encodeURIComponent() method to encode a URI. The decodeURIComponent() method to decode a URI..

How do you decode a space in a URL?

URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits. URLs cannot contain spaces. URL encoding normally replaces a space with a plus (+) sign or with %20.