Monday, January 23, 2023

Python: reading a text file - character


   When reading a text file (csv or txt) in Python 3, the file may start with a character that shows up with "more" in the terminal but is much harder to spot from Python 3.


  $ more epa.csv

<U+FEFF>the text


   Python 3 reads the file without raising an error, but that invisible character stays at the start of the first line, so it ends up in variables and text and can cause problems, for example when comparing strings.
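   To make the problem concrete, here is a minimal sketch. It assumes a small made-up file (sample.csv, written to a temporary folder) that starts with the UTF-8 byte order mark, like epa.csv above, and it is opened explicitly as plain UTF-8. The file name and its contents are only for illustration.

```python
import os
import tempfile

# Hypothetical sample data: a UTF-8 file that starts with a BOM (bytes EF BB BF).
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfname,value\n")

# Reading as plain UTF-8 raises no error, but keeps the BOM as the first character.
with open(path, encoding="utf-8") as f:
    first_line = f.readline()

# strip() removes the newline but not U+FEFF, which is not whitespace.
header = first_line.strip().split(",")
print(repr(header[0]))      # the first field carries the invisible character
print(header[0] == "name")  # so a plain comparison does not match
```

   The first field looks like "name" when printed normally, which is why the issue is easy to miss.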


   The solution is to read the file and specify the encoding, something as simple as:


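FILENAME = 'epa.csv'
with open(FILENAME, encoding='utf-8-sig') as file:
    for line in file:
        print(line.rstrip('\n'))  # a leading BOM, if present, has already been removed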


Explanation (taken from:

The Unicode character U+FEFF is the byte order mark, or BOM, and is used to tell the difference between big- and little-endian UTF-16 encoding.
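   As a small check of what 'utf-8-sig' does, the sketch below works on bytes in memory instead of a file. The sample text is made up. codecs.BOM_UTF8 is the standard library constant for the three bytes that encode U+FEFF in UTF-8.

```python
import codecs

# In UTF-8 the BOM is the three bytes EF BB BF.
raw = codecs.BOM_UTF8 + "the text".encode("utf-8")

plain = raw.decode("utf-8")      # the BOM survives as the character U+FEFF
clean = raw.decode("utf-8-sig")  # the BOM is removed while decoding

# 'utf-8-sig' also accepts data that has no BOM at all.
no_bom = b"the text".decode("utf-8-sig")

print(repr(plain))
print(repr(clean))
print(repr(no_bom))
```

   Since 'utf-8-sig' also reads files without a BOM, it should be safe to use when you don't know whether a UTF-8 file has one.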

Good luck,
