This repository has been archived by the owner on Sep 2, 2021. It is now read-only.

Choose different output encoding in Windows #56

Open
Frank-Krick opened this issue Oct 2, 2018 · 3 comments

@Frank-Krick
Contributor

It seems that the output encoding on Windows is cp1252 by default, which creates problems when the source files contain Unicode characters for which the charmap defines no suitable character.

When I try to process a document containing the character '●' with MarkdownPP on Windows, it exits with the following error:

Traceback (most recent call last):
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\Scripts\markdown-pp-script.py", line 11, in <module>
    load_entry_point('MarkdownPP==1.4', 'console_scripts', 'markdown-pp')()
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\lib\site-packages\MarkdownPP\main.py", line 112, in main
    MarkdownPP.MarkdownPP(input=mdpp, output=md, modules=modules)
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\lib\site-packages\MarkdownPP\MarkdownPP.py", line 28, in __init__
    pp.process()
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\lib\site-packages\MarkdownPP\Processor.py", line 49, in process
    transforms = module.transform(self.data)
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\lib\site-packages\MarkdownPP\Modules\Include.py", line 39, in transform
    includedata = self.include(match)
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\lib\site-packages\MarkdownPP\Modules\Include.py", line 70, in include
    data[linenum:linenum+1] = self.include(match, dirname)
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\lib\site-packages\MarkdownPP\Modules\Include.py", line 61, in include
    data = f.readlines()
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 6569: character maps to <undefined>

The same input can be processed fine using Linux.
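
The failure can be reproduced without MarkdownPP. A minimal sketch, assuming the source file is stored as UTF-8: the character '●' (U+25CF) encodes to the bytes E2 97 8F, and byte 0x8F has no mapping in Python's cp1252 codec, which matches the byte named in the traceback. The helper name `try_decode` is made up for illustration.

```python
# UTF-8 bytes of the bullet character from the report (U+25CF).
bullet_bytes = "\u25cf".encode("utf-8")


def try_decode(raw, encoding):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


# Decoding with the encoding the file was written in succeeds.
utf8_result = try_decode(bullet_bytes, "utf-8")

# Decoding with the Windows default from the report fails on the third byte.
cp1252_result = try_decode(bullet_bytes, "cp1252")

print(repr(bullet_bytes))
print(repr(utf8_result))
print(repr(cp1252_result))
```

Only the bytes are involved here, so this behaves the same on any platform; what differs between Windows and Linux is which encoding Python picks when none is given.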

amyreese added the bug label Oct 2, 2018
@amyreese
Owner

amyreese commented Oct 2, 2018

Seems related to #53. I'm not familiar with how Python handles encodings on Windows, but on Linux, it uses the default encodings specified by the OS/environment.

@Frank-Krick
Contributor Author

Yes, that seems to be the same issue. The Python 3 interpreter uses the encoding returned by locale.getpreferredencoding(), which is cp1252 on my Windows system and UTF-8 on my Linux system. My .mdpp files are encoded as UTF-8 and contain non-ASCII characters, so Python on Windows can't read them.
The fix described in #53 would work, but only for Python 3.
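
One way to avoid depending on the locale default on both Python 2 and Python 3 is `io.open`, which accepts an `encoding` argument on both. The following is only a sketch and does not claim to be the change proposed in #53; the helper name `read_lines` and the temporary `.mdpp` file are assumptions made for the example.

```python
import io
import locale
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default."""
    # io.open takes an `encoding` argument on Python 2.6+ as well as Python 3.
    with io.open(path, "r", encoding=encoding) as f:
        return f.readlines()


# The encoding Python 3 falls back to when open() is called without one.
print(locale.getpreferredencoding())

# Round trip through a temporary UTF-8 file containing the bullet character.
fd, path = tempfile.mkstemp(suffix=".mdpp")
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(u"\u25cf item\n")
    lines = read_lines(path)
finally:
    os.remove(path)

print(repr(lines))
```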

@VincenzoLaSpesa
Contributor

I might have a fix for this here:
https://github.com/VincenzoLaSpesa/markdown-pp

I exposed the encoding parameter on the MarkdownPP class, and now I can call it with:

MarkdownPP(input=infile, modules=['include', 'toc'], output=outfile, encoding="UTF8")

If no encoding is provided, it defaults to sys.getdefaultencoding().
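
A rough sketch of how an optional encoding with that fallback could be threaded through to the file reads. This is not the code from the fork; `read_source` is a hypothetical helper. Note that `sys.getdefaultencoding()` returns 'utf-8' on Python 3, while on Python 2 it is typically 'ascii'.

```python
import io
import os
import sys
import tempfile


def read_source(path, encoding=None):
    """Hypothetical helper: read a file using a caller-supplied encoding.

    Falls back to sys.getdefaultencoding() when no encoding is given.
    """
    if encoding is None:
        encoding = sys.getdefaultencoding()
    with io.open(path, "r", encoding=encoding) as f:
        return f.readlines()


# Write a small UTF-8 file, then read it back with an explicit encoding.
fd, path = tempfile.mkstemp(suffix=".mdpp")
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(u"\u25cf first\n\u25cf second\n")
    explicit = read_source(path, encoding="UTF8")
finally:
    os.remove(path)

print(repr(explicit))
```

The spelling "UTF8" works because Python's codec lookup treats it as an alias for UTF-8.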

I will test it a little more and then I will open a merge request.
