This repository has been archived by the owner on Sep 2, 2021. It is now read-only.

UnicodeDecodeError: 'gbk' codec can't decode byte (problem with Chinese characters) #76

Open
kmcbest opened this issue Jul 8, 2020 · 3 comments

Comments


kmcbest commented Jul 8, 2020

Test example:

example.zip

If I run
markdown-pp index.mdpp -o out.md
on these two files, markdown-pp throws this error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 10: illegal multibyte sequence

Appending "-e latexrender" doesn't help in this case.
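
The same kind of error can be reproduced outside markdown-pp. This is a minimal sketch, assuming the failure comes from UTF-8 bytes being decoded with the gbk codec. The sample text and the helper name decode_as are illustrative and are not taken from example.zip.

```python
# Illustrative sample text, not taken from example.zip.
sample = "一二"
utf8_bytes = sample.encode("utf-8")  # b'\xe4\xb8\x80\xe4\xba\x8c'


def decode_as(data: bytes, encoding: str) -> str:
    """Decode bytes; on failure, return the error class name and reason."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"{type(exc).__name__}: {exc.reason}"


# Decoding with the encoding the bytes were written in succeeds.
print(decode_as(utf8_bytes, "utf-8"))
# Decoding the same bytes as gbk hits a byte sequence gbk cannot interpret.
print(decode_as(utf8_bytes, "gbk"))
```

Not every UTF-8 file fails this way under gbk. Some byte sequences happen to decode into unrelated characters instead of raising, so a file that loads without error may still be read incorrectly.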

Owner

amyreese commented Jul 8, 2020

What version of Python are you using? markdown-pp only supports unicode documents in Python 3.

Author

kmcbest commented Jul 9, 2020

> What version of Python are you using? markdown-pp only supports unicode documents in Python 3.

Python 3.8.2, my example files are in UTF-8 without BOM.

@amyreese
Owner

The project tries to read files with Python's default encoding. If your system locale specifies an encoding other than UTF-8, decoding the contents of a UTF-8 file will fail. You can override the system locale by setting the appropriate environment variables, and you can check the default encoding with locale.getpreferredencoding().
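
A short sketch of that check, plus a way to read a UTF-8 file regardless of the locale. The helper read_utf8 and the sample file contents are hypothetical. This is not markdown-pp's own code, only an illustration of how an explicit encoding sidesteps the locale default.

```python
import locale
import os
import tempfile

# The encoding open() uses when no encoding= argument is given.
print("preferred encoding:", locale.getpreferredencoding())


def read_utf8(path: str) -> str:
    """Read a text file as UTF-8, ignoring the system locale (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# Write a small UTF-8 file with made-up content, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.mdpp")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("# 标题\n")
    text = read_utf8(path)

print(text)
```

On Python 3.7 and later, one environment-level option is UTF-8 mode, enabled with PYTHONUTF8=1 or python -X utf8, which makes UTF-8 the default for open(). Whether that is the setting meant above is an assumption.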
