




How To Fix Python Error - UnicodeEncodeError: 'ascii' codec can't encode character



This is a very common error. It typically looks like this:


UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position x

Fix - UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0':

This error is quite common when dealing with Unicode characters, especially if you fetch or crawl data from web pages on different sites. Let's understand why this problem happens -

  • When you print a Unicode string or convert it with str(), Python encodes it with the default character encoding.
    • Check the value of sys.stdout.encoding; sometimes it is None (in Python 2, for example, when the output is piped or redirected).
    • On Linux, the system default is stored in /etc/default/locale (the path used on Debian and Ubuntu).
    • The default is defined by the variables LANG, LC_ALL and LC_CTYPE.
    • Check which values are set for these variables.
    • For example, if the default is UTF-8, these would be LANG="en_US.UTF-8", LC_ALL="en_US.UTF-8", LC_CTYPE="en_US.UTF-8".
  • Now assume the default encoding is "XYZ". Python then tries to encode the text (the input data) into bytes using this encoding.
  • Assume some of this text contains characters that "XYZ" cannot represent, such as the non-breaking space u'\xa0'.
  • If the default encoding is not equipped to handle those characters, the error is raised.
  • To handle this, you have to tell Python the right encoding to use, so it knows how to convert the text.
  • The standard choice is "UTF-8", which can represent every Unicode character.
  • There are also other ways to work around or ignore the error, covered in the fixes that follow.
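To make the explanation above concrete, here is a minimal sketch, assuming Python 3, that reports the encoding settings currently in effect and reproduces the error by encoding explicitly with the ascii codec. The helper name describe_encodings and the sample string are illustrative and not part of the original post.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the encoding-related settings discussed above."""
    return {
        # May be None when stdout has been replaced or redirected.
        "stdout": getattr(sys.stdout, "encoding", None),
        "preferred": locale.getpreferredencoding(False),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


# \xa0 is a non-breaking space, which is common in scraped HTML.
text = "price:\xa0100"

try:
    text.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# The same text encodes without trouble once UTF-8 is requested.
utf8_bytes = text.encode("utf-8")

for name, value in describe_encodings().items():
    print(name, "=", value)
print(error_message)
print(utf8_bytes)
```

The position reported in the message is the index of the offending character in the string, which helps when tracking down where the bad input came from.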
  The ASCII codec comfortably handles only ASCII characters, such as the sets below from Python's string module -


whitespace = ' \t\n\r\v\f'
ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
ascii_uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
ascii_letters = ascii_lowercase + ascii_uppercase
digits = '0123456789'
hexdigits = digits + 'abcdef' + 'ABCDEF'
octdigits = '01234567'
punctuation = r"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
printable = digits + ascii_letters + punctuation + whitespace
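Anything outside the ASCII range is what triggers the error. As a rough debugging aid, the sketch below (assuming Python 3; find_non_ascii is a hypothetical helper name) lists the position and value of every non-ASCII character in a string.

```python
def find_non_ascii(text):
    """Return (position, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


sample = "na\xefve caf\xe9"
offenders = find_non_ascii(sample)

# ascii() escapes non-ASCII characters, so this print is safe on any terminal.
print(ascii(offenders))
```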
        

  Fix -

  • Set the Python I/O encoding to UTF-8 with the PYTHONIOENCODING environment variable. This applies the fix for the current shell session.

$ export PYTHONIOENCODING=utf8
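One way to confirm that the variable takes effect is to start a child interpreter with it set and ask for its stdout encoding. This is only a sketch, assuming Python 3; child_stdout_encoding is an illustrative name.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and report its stdout encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    reported = result.stdout.decode("ascii").strip()
    # Normalise spellings such as "utf8" and "UTF-8" to the codec's canonical name.
    return codecs.lookup(reported).name


print(child_stdout_encoding("utf8"))
```

The variable is read when the interpreter starts, which is why the check is done in a child process rather than in the running script.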

  • Set the environment variables correctly in /etc/default/locale. This sets the system's default locale encoding to UTF-8. Use a full locale name such as en_US.UTF-8; a bare "UTF-8" is generally not accepted as a locale name on Linux.

LANG="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"


Or set them from the command line:
export LANG="en_US.UTF-8"
export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"

  • Set the encoding at code level.

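str1 = u'caf\xe9'  # sample value; any string with non-ASCII characters
str2 = str1.encode('utf-8')  # encode explicitly instead of relying on the default
print(str2)


str1 = u'caf\xe9'  # sample value; any string with non-ASCII characters
# Round trip through UTF-8, dropping characters that cannot be encoded.
str2 = str1.encode('utf-8', 'ignore').decode('utf-8')
print(str2)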
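The same idea can be wrapped in a small helper that never raises when printing. The sketch below assumes Python 3; safe_print is a hypothetical name, and replacing unencodable characters with "?" is one possible policy among several.

```python
import io
import sys


def safe_print(text, stream=None):
    """Write text to stream, replacing characters the stream cannot encode."""
    stream = stream if stream is not None else sys.stdout
    # Fall back to ASCII when the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    printable = text.encode(encoding, "replace").decode(encoding)
    stream.write(printable + "\n")


buffer = io.StringIO()  # reports no encoding, so the ASCII fallback applies
safe_print("caf\xe9", buffer)
safe_print("caf\xe9")   # uses sys.stdout and whatever encoding it reports
```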

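  • Set the encoding using sys. This works in Python 2 only: Python 3 has no reload() builtin and no sys.setdefaultencoding(), and its default encoding is already UTF-8.

# encoding=utf8
from __future__ import unicode_literals
import sys
reload(sys)  # restores setdefaultencoding(), which site.py removes at startup
sys.setdefaultencoding('utf8')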
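For Python 3, a rough counterpart is to set the encoding on the stream instead of on the interpreter. The sketch below is an assumption about how one might do that, not a method from the original post; utf8_writer is an illustrative name.

```python
import io
import sys

# Python 3 fixes the default string encoding at UTF-8 and has no setter for it.
default_encoding = sys.getdefaultencoding()
has_setter = hasattr(sys, "setdefaultencoding")


def utf8_writer(binary_stream):
    """Wrap a binary stream in a text layer that always encodes as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


raw = io.BytesIO()
writer = utf8_writer(raw)
writer.write("caf\xe9")
writer.flush()
encoded = raw.getvalue()

# For the real standard output, Python 3.7+ offers reconfigure() on the usual
# text wrapper; the hasattr() guard covers streams that have been replaced.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")

print(default_encoding, has_setter, encoded)
```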

  • Set the encoding using locale

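import os
import locale
# PYTHONIOENCODING is read at interpreter startup, so setting it here only
# affects child processes started from this script.
os.environ["PYTHONIOENCODING"] = "utf-8"
# Raises locale.Error if en_GB.UTF-8 is not installed on the system.
script_locale = locale.setlocale(category=locale.LC_ALL, locale="en_GB.UTF-8")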
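Because a given locale may not be installed on every machine, a more defensive variant tries several names in turn. This is a minimal sketch assuming Python 3; first_available_locale and the candidate list are illustrative choices.

```python
import locale


def first_available_locale(candidates):
    """Activate and return the first locale in candidates that is installed."""
    for name in candidates:
        try:
            return locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            continue  # this locale is not installed on the system
    return None


# "C" is always available, so this call cannot come back empty-handed.
active = first_available_locale(["en_GB.UTF-8", "en_US.UTF-8", "C.UTF-8", "C"])
print(active)
```

Note that setlocale() changes the locale for the whole process, so it is best called once, early in the program.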

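  • Declare the source file encoding with an Emacs-style comment on the first or second line. This tells Python how to read non-ASCII characters in the source file itself; it does not change the encoding used for input and output.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
u = u'abcdé'
print(ord(u[-1]))  # 233, the code point of é


The declaration can also be written in a shorter form:

#!/usr/bin/env python
# coding: utf8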
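To see which encoding Python would pick up from such a declaration, the standard tokenize module can be asked directly. The sketch below assumes Python 3; declared_encoding and the sample sources are illustrative.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the source encoding Python would use for the given file content."""
    encoding, _first_lines = tokenize.detect_encoding(
        io.BytesIO(source_bytes).readline
    )
    return encoding


with_cookie = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nprint('hi')\n"
without_cookie = b"print('hi')\n"

print(declared_encoding(with_cookie))     # iso-8859-1 (normalised name for latin-1)
print(declared_encoding(without_cookie))  # utf-8, the Python 3 default
```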

  • If you do not need the non-ASCII characters and can safely drop them, you can also use the option below. In this example, str2 no longer contains any non-ASCII characters (they are ignored and dropped).

str2 = str1.encode('ascii', 'ignore').decode('ascii')
print (str2)
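Dropping characters outright can glue words together or lose letters. A gentler alternative, which is not from the original post, is to decompose the text with unicodedata first so that accented letters keep their base letter. A minimal sketch, assuming Python 3, with to_ascii as a hypothetical helper name:

```python
import unicodedata


def to_ascii(text):
    """Strip accents where possible, then drop whatever is still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


sample = "caf\xe9\xa0au lait"  # é plus a non-breaking space
plain_drop = sample.encode("ascii", "ignore").decode("ascii")
normalized = to_ascii(sample)

print(plain_drop)  # cafau lait
print(normalized)  # cafe au lait
```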

  • Use codecs for file operations - codecs.open(encoding="utf-8") - to read and write Unicode files. The encoding can be any supported codec, such as utf-8, utf-16 or utf-32.

import codecs
with codecs.open("inputfile.txt", "r", "utf-8") as opened:
    text = opened.read()  # text is decoded to Unicode while reading
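In Python 3 the built-in open() accepts the same encoding argument, so codecs is not strictly required. The round trip below is a sketch under that assumption; the file name and sample text are made up for illustration.

```python
import os
import tempfile

text = "caf\xe9 \u2013 na\xefve"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")
    # Write and read back with an explicit encoding instead of the locale default.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    with open(path, "r", encoding="utf-8") as handle:
        round_trip = handle.read()
    # Reading in binary mode shows the UTF-8 bytes actually stored on disk.
    with open(path, "rb") as handle:
        raw_bytes = handle.read()

print(round_trip == text)
print(raw_bytes)
```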

Additional points :

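  • In Python 3, UTF-8 is the default source encoding, so the declaration comment is only needed for other encodings.
  • The encode() function converts Unicode text to bytes (it returns a bytes representation of the Unicode string). The second argument selects the error handler. Various encode() options -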
    • encode('ascii', 'ignore')
    • encode('ascii', 'replace')
    • encode('ascii', 'xmlcharrefreplace')
    • encode('ascii', 'backslashreplace')
    • encode('ascii', 'namereplace')
  • The decode() function converts bytes to a string. It takes an encoding argument, such as UTF-8, and optionally an errors argument. The errors argument (e.g. "ignore") specifies what happens when the bytes can't be decoded with the given encoding. Various decode() options -
    • decode("utf-8", "strict")
    • decode("utf-8", "replace")
    • decode("utf-8", "backslashreplace")
    • decode("utf-8", "ignore")
  • UTF-8 properties -
    • Can handle any Unicode code point.
    • A string of ASCII text is also valid UTF-8 text.
    • UTF-8 is a byte oriented encoding. The encoding specifies that each character is represented by a specific sequence of one or more bytes. This avoids the byte-ordering issues that can occur with integer and word oriented encodings, like UTF-16 and UTF-32, where the sequence of bytes varies depending on the hardware on which the string was encoded.
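The error handlers and UTF-8 properties listed above can be compared side by side. The sketch below assumes Python 3.5 or later (needed for namereplace and for backslashreplace when decoding); the sample string is illustrative.

```python
text = "caf\xe9"

# Each handler deals differently with the character ASCII cannot represent.
encode_results = {
    handler: text.encode("ascii", handler)
    for handler in ("ignore", "replace", "xmlcharrefreplace",
                    "backslashreplace", "namereplace")
}

# Latin-1 bytes are not valid UTF-8 here, which exercises the decode handlers.
latin1_bytes = text.encode("latin-1")
decode_results = {
    handler: latin1_bytes.decode("utf-8", handler)
    for handler in ("replace", "backslashreplace", "ignore")
}

try:
    latin1_bytes.decode("utf-8", "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# UTF-8 properties: ASCII text is unchanged, and there is no byte-order choice,
# unlike UTF-16 where the same character has two possible byte sequences.
ascii_is_utf8 = "abc".encode("ascii") == "abc".encode("utf-8")
utf16_orders = ("\xe9".encode("utf-16-le"), "\xe9".encode("utf-16-be"))

for handler, value in encode_results.items():
    print(handler, value)
for handler, value in decode_results.items():
    print(handler, ascii(value))
print(strict_failed, ascii_is_utf8, utf16_orders)
```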
  Hope this helps to solve the issue.  
