Choose different output encoding in Windows #56

Frank-Krick · 2018-10-02T01:51:12Z

It seems that the output encoding on Windows is cp1252 by default, which creates problems when the source files contain Unicode characters for which the charmap defines no suitable character.

When I try to process a document containing the character '●' with MarkdownPP on Windows it exits with the following error:

Traceback (most recent call last):
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\Scripts\markdown-pp-script.py", line 11, in <module>
    load_entry_point('MarkdownPP==1.4', 'console_scripts', 'markdown-pp')()
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\lib\site-packages\MarkdownPP\main.py", line 112, in main
    MarkdownPP.MarkdownPP(input=mdpp, output=md, modules=modules)
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\lib\site-packages\MarkdownPP\MarkdownPP.py", line 28, in __init__
    pp.process()
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\lib\site-packages\MarkdownPP\Processor.py", line 49, in process
    transforms = module.transform(self.data)
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\lib\site-packages\MarkdownPP\Modules\Include.py", line 39, in transform
    includedata = self.include(match)
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\lib\site-packages\MarkdownPP\Modules\Include.py", line 70, in include
    data[linenum:linenum+1] = self.include(match, dirname)
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\lib\site-packages\MarkdownPP\Modules\Include.py", line 61, in include
    data = f.readlines()
  File "C:\Users\frank\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 6569: character maps to <undefined>

The same input is processed without errors on Linux.
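The failing byte in the traceback can be reproduced without MarkdownPP. The sketch below assumes the file is stored as UTF-8, in which case '●' (U+25CF) is the byte sequence E2 97 8F, and 0x8f has no mapping in Python's cp1252 codec. The helper name try_decode is made up for this illustration.

```python
# '●' is U+25CF; encoded as UTF-8 it becomes the three bytes E2 97 8F.
bullet_utf8 = u"\u25cf".encode("utf-8")


def try_decode(raw, encoding):
    """Hypothetical helper: return the decoded text, or None if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Decoding with the encoding the file was written in round-trips cleanly.
decoded_as_utf8 = try_decode(bullet_utf8, "utf-8")

# Decoding the same bytes as cp1252 fails, because byte 0x8f is undefined there.
decoded_as_cp1252 = try_decode(bullet_utf8, "cp1252")
```

This is the same failure mode as the UnicodeDecodeError above: the bytes are valid UTF-8, but the reader interprets them as cp1252.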

amyreese · 2018-10-02T04:19:27Z

Seems related to #53. I'm not familiar with how Python handles encodings on Windows, but on Linux, it uses the default encodings specified by the OS/environment.

Frank-Krick · 2018-10-02T04:55:16Z

Yes, that seems to be the same issue. The Python 3 interpreter uses the encoding returned by locale.getpreferredencoding(), which is cp1252 on my Windows system and UTF-8 on my Linux system. My .mdpp files are encoded as UTF-8 and contain non-ASCII characters, so Python on Windows can't read them.
The fix described in #53 would work, but only for Python 3.
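One way to avoid depending on the locale default is to pass the encoding explicitly when opening the file. The sketch below is not necessarily the fix from #53. It uses io.open, which accepts an encoding argument on both Python 2.6+ and Python 3. The function name read_lines_utf8 and the temporary demo file are assumptions made for illustration.

```python
import io
import locale
import os
import tempfile


def read_lines_utf8(path):
    """Read a text file as UTF-8, ignoring the platform's locale default."""
    with io.open(path, "r", encoding="utf-8") as f:
        return f.readlines()


# The encoding open() falls back to when none is given (cp1252 on many
# Windows installations, UTF-8 on most Linux systems).
platform_default = locale.getpreferredencoding()

# Demo: write a small UTF-8 file containing '●' and read it back.
fd, tmp_path = tempfile.mkstemp(suffix=".mdpp")
os.close(fd)
with io.open(tmp_path, "w", encoding="utf-8") as f:
    f.write(u"\u25cf item\n")

lines = read_lines_utf8(tmp_path)
os.remove(tmp_path)
```

Because the encoding is fixed in the call, the result no longer depends on what locale.getpreferredencoding() returns on the machine running the script.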

VincenzoLaSpesa · 2021-03-05T18:34:14Z

I might have a fix for this here:
https://github.com/VincenzoLaSpesa/markdown-pp

I exposed the encoding parameter on the MarkdownPP class, and now I can call it with:

MarkdownPP(input=infile, modules=['include', 'toc'], output=outfile, encoding="UTF8")

If no encoding is provided, it defaults to sys.getdefaultencoding().
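The defaulting behaviour described above could look roughly like the following. This is only a sketch, not the code in the linked fork. The names resolve_encoding and read_source are hypothetical. Note that on Python 3 sys.getdefaultencoding() returns 'utf-8' and is independent of locale.getpreferredencoding().

```python
import io
import sys


def resolve_encoding(encoding=None):
    """Hypothetical helper: use the caller's encoding, else the interpreter default."""
    if encoding is not None:
        return encoding
    return sys.getdefaultencoding()


def read_source(path, encoding=None):
    """Hypothetical helper: read a source file using the resolved encoding."""
    with io.open(path, "r", encoding=resolve_encoding(encoding)) as f:
        return f.read()
```

With this shape, a call such as read_source(infile, encoding="UTF8") overrides the default, while omitting the argument falls back to sys.getdefaultencoding().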

I will test it a little more and then I will open a merge request.

amyreese added the bug label Oct 2, 2018
