




How to Enable UTF-8 in Python?



In this post, we will see how to enable UTF-8 in Python.

  • In Python 3, UTF-8 is the default source encoding.
  • When the encoding is not set up correctly, Python commonly raises errors such as "UnicodeEncodeError: 'ascii' codec can't encode character" or "UnicodeDecodeError: 'ascii' codec can't decode byte".
  • Python's string functions use the default character encoding.
    • Check the value of sys.stdout.encoding. It is sometimes set to "None".
    • On Debian and Ubuntu systems, the default locale is configured in /etc/default/locale.
    • The default is defined by the variables LANG, LC_ALL and LC_CTYPE.
    • Check the values set for these variables.
      • For example, if the default is a UTF-8 locale, these would look like LANG="en_US.UTF-8", LC_ALL="en_US.UTF-8", LC_CTYPE="en_US.UTF-8".
  • A standard option is to pass "UTF-8" as the encoding, which works in most cases.
  • Verify that your text editor saves your code as UTF-8. Otherwise the file may contain invisible characters that cannot be interpreted as UTF-8.
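
The checks listed above can be gathered in one place. The following is a minimal sketch, assuming Python 3; the helper name encoding_report is hypothetical and only collects values that the standard library already exposes.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encoding-related settings the interpreter currently sees."""
    return {
        # Encoding used when printing; may be None if stdout was replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Default str <-> bytes encoding ("utf-8" on Python 3).
        "default": sys.getdefaultencoding(),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding derived from the locale settings.
        "preferred": locale.getpreferredencoding(False),
        # Locale variables as seen by this process (None if unset).
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name:12} {value}")
```

If "stdout" or "preferred" shows an ASCII codec while the locale variables are unset, the locale configuration is the likely cause.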
  Let's see the options for setting the UTF-8 encoding. (If you are using Python 3, UTF-8 is already the default source encoding.)

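  • Set the Python I/O encoding to UTF-8 with the PYTHONIOENCODING environment variable. It controls the encoding of stdin, stdout and stderr, and the fix applies to the current shell session only.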


$ export PYTHONIOENCODING=utf8
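
To confirm that the variable takes effect, one option is to start a child interpreter with it set and read back what the child reports. This is a sketch assuming Python 3.7 or later; child_stdout_encoding is a hypothetical helper name.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child interpreter with PYTHONIOENCODING set and report its stdout encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # Normalise spellings such as "utf8" and "UTF-8" to the codec's canonical name.
    return codecs.lookup(result.stdout.strip()).name


if __name__ == "__main__":
    print(child_stdout_encoding("utf8"))
```

The variable is read when the interpreter starts, which is why the check runs in a child process rather than in the current one.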


   

  • Set the environment variables in /etc/default/locale. This way the system's default locale encoding is set to UTF-8.


LANG="UTF-8" or "en_US.UTF-8"
LC_ALL="UTF-8" or "en_US.UTF-8"
LC_CTYPE="UTF-8" or "en_US.UTF-8"




Or use the command line:
export LANG="en_US.UTF-8"
export LC_ALL="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"


   

  • You can also set the encoding in the code.


a = <STRING_WITH_UNICODE_CHARACTER>
b = a.encode('utf-8')
print(a.encode('utf-8'))
print(b)




a = <STRING_WITH_UNICODE_CHARACTER>
b = a.encode('utf-8', 'ignore').decode('utf-8')
print(b)
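
As a concrete illustration of the two snippets above, here is a small sketch, assuming Python 3, with a sample string chosen only for demonstration. It shows the round trip between str and bytes and what the 'ignore' error handler does with bytes that cannot be decoded.

```python
# "café", written with an escape so this file itself stays pure ASCII.
text = "caf\u00e9"

encoded = text.encode("utf-8")      # str -> bytes
decoded = encoded.decode("utf-8")   # bytes -> str

# Cut the two-byte sequence for "é" in half to simulate damaged input.
truncated = encoded[:-1]
# 'ignore' silently drops the bytes that cannot be decoded.
recovered = truncated.decode("utf-8", "ignore")

print(encoded)          # b'caf\xc3\xa9'
print(decoded == text)  # True
print(recovered)        # caf
```

Note that 'ignore' discards data without warning, so it suits cleanup tasks better than code paths where losing characters would matter.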


   

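  • Set the encoding at script level. This approach works in Python 2 only, because reload() is no longer a built-in and sys.setdefaultencoding() does not exist in Python 3.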


# encoding=utf8
from __future__ import unicode_literals
import sys
reload(sys)  # restores sys.setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('utf8')
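
For Python 3 scripts, a comparable script-level effect can be had by controlling the encoding of the output stream itself. The sketch below is one possible approach rather than an equivalent of the Python 2 code; utf8_text_stream and force_utf8_stdout are hypothetical helper names, and reconfigure() is assumed to be available (Python 3.7 or later).

```python
import io
import sys


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", write_through=True)


def force_utf8_stdout():
    """Switch sys.stdout to UTF-8 where the stream supports it (Python 3.7+)."""
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")
        return True
    return False


# Demonstration on an in-memory stream instead of the real stdout.
buffer = io.BytesIO()
stream = utf8_text_stream(buffer)
stream.write("\u00e9")
stream.flush()
print(buffer.getvalue())  # b'\xc3\xa9'
```

Calling force_utf8_stdout() at the top of a script changes only that process, so it does not depend on the locale of the machine the script runs on.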


 

  • Using locale


import os
import locale
# PYTHONIOENCODING is read at interpreter startup, so setting it here only affects child processes.
os.environ["PYTHONIOENCODING"] = "utf-8"
thisLocale = locale.setlocale(category=locale.LC_ALL, locale="en_GB.UTF-8")
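
locale.setlocale() raises locale.Error when the requested locale is not installed, so a defensive variant can try several names in turn. This is a sketch; the helper name and the list of candidate locales are assumptions and should be adjusted to the locales available on your system.

```python
import locale


def set_first_available_locale(candidates=("en_GB.UTF-8", "en_US.UTF-8", "C.UTF-8")):
    """Try each locale name in order and return the first one that can be set."""
    for name in candidates:
        try:
            return locale.setlocale(locale.LC_ALL, name)
        except locale.Error:
            # This locale is not installed; try the next candidate.
            continue
    return None


chosen = set_first_available_locale()
print(chosen)                                # locale name, or None if none was available
print(locale.getpreferredencoding(False))    # encoding implied by the active locale
```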


 

  • When you use IDLE (Python 2) and the file contains non-ASCII characters, it prompts you to add an encoding declaration in the Emacs -*- style. This declaration tells the text editor and the Python interpreter which codec to use.


#!/usr/bin/env python
# -*- coding: utf-8 -*-





#!/usr/bin/env python 
# coding: utf8
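
To see which encoding Python itself reads from such a declaration, the standard library's tokenize.detect_encoding() can be applied to the first lines of a source file. The sketch below assumes Python 3; declared_encoding is a hypothetical helper, and the sample sources are made up for illustration.

```python
import codecs
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the canonical codec name Python would use to read this source."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    # Normalise spellings such as "utf8" to the codec's canonical name.
    return codecs.lookup(encoding).name


emacs_style = b"#!/usr/bin/env python\n# -*- coding: utf-8 -*-\nprint('hi')\n"
short_style = b"#!/usr/bin/env python\n# coding: utf8\nprint('hi')\n"
no_declaration = b"print('hi')\n"

print(declared_encoding(emacs_style))     # utf-8
print(declared_encoding(short_style))     # utf-8
print(declared_encoding(no_declaration))  # utf-8 (the Python 3 default)
```

Both declaration styles resolve to the same codec, and in Python 3 a file without any declaration is read as UTF-8 as well.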


 

  • If you encode with ASCII and decide to discard the non-ASCII characters, use the option below. In this example, the non-ASCII characters are dropped from varB.


varB = str1.encode('ascii', 'ignore').decode('ascii')
print(varB)
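
Dropping characters outright can remove whole letters. A common refinement, shown here as a sketch that goes slightly beyond the option above, is to decompose accented characters with unicodedata.normalize() first, so that only the accent marks are discarded. The helper name to_ascii and the sample string are assumptions for illustration.

```python
import unicodedata


def to_ascii(text, keep_base_letters=True):
    """Reduce text to ASCII, optionally keeping the base letter of accented characters."""
    if keep_base_letters:
        # NFKD splits "é" into "e" plus a combining accent, which 'ignore' then drops.
        text = unicodedata.normalize("NFKD", text)
    return text.encode("ascii", "ignore").decode("ascii")


sample = "na\u00efve caf\u00e9"  # "naïve café"
print(to_ascii(sample, keep_base_letters=False))  # nave caf
print(to_ascii(sample))                           # naive cafe
```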


 

Additional points:

  • UTF-8 properties:
    • Can handle any Unicode code point.
    • A string of ASCII text is also valid UTF-8 text.
    • UTF-8 is a byte-oriented encoding. The encoding specifies that each character is represented by a specific sequence of one or more bytes. This avoids the byte-ordering issues that can occur with integer- and word-oriented encodings such as UTF-16 and UTF-32, where the sequence of bytes varies depending on the hardware on which the string was encoded.
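
These properties can be checked directly in Python 3. The short sketch below uses sample characters chosen only for demonstration.

```python
# Characters from different parts of Unicode need different numbers of UTF-8 bytes.
samples = {"A": "A", "euro sign": "\u20ac", "grinning face": "\U0001F600"}
utf8_lengths = {label: len(ch.encode("utf-8")) for label, ch in samples.items()}
print(utf8_lengths)  # {'A': 1, 'euro sign': 3, 'grinning face': 4}

# ASCII text produces exactly the same bytes in ASCII and in UTF-8.
ascii_text = "plain ASCII"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)  # True

# UTF-16 depends on byte order; UTF-8 has no such variants.
little_endian = "A".encode("utf-16-le")
big_endian = "A".encode("utf-16-be")
print(little_endian, big_endian)  # b'A\x00' b'\x00A'
```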
  Hope this helps.  
