a converted message have a broken encoding in a message body (have cp1251 in msg ) #49

Open
Ivan-753 opened this issue May 31, 2023 · 3 comments · May be fixed by #57

Ivan-753 commented May 31, 2023

Input: a .msg file saved by Outlook in cp1251
Output:
- headers: good
- attachments: good
- message body: encoding is broken

The converted file itself seems to be in UTF-8.

The message body part looks something like this:
--16855398770.0877C.31779
Content-Type: text/plain; charset="UTF-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Óâàæàåìûé ...!

Îçíàêîìüòåñü ñ ... ... ...

It should look like this (from the same message re-exported as Unicode):
--16855410730.aC1d0Ed.5308
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Content-Disposition: inline

Уважаемый ...!

Ознакомьтесь с ... ... ...

Based on this decoder: https://2cyr.com/decode/?lang=en
Source encoding: WINDOWS-1251
Displayed as: WINDOWS-1252
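The diagnosis above can be reproduced outside the converter. The following is a minimal Python sketch, not code from Email::Outlook::Message: the helper names `garble` and `repair` are hypothetical, and it assumes the body bytes were cp1251 but were interpreted as cp1252 before being written out as UTF-8.

```python
# Illustration only: reproduces the garbling reported above.
# Assumption: the .msg body bytes are cp1251 but are read as cp1252.


def garble(text: str, actual: str = "cp1251", assumed: str = "cp1252") -> str:
    """Encode text in its real code page, then decode it with the wrong one."""
    return text.encode(actual).decode(assumed)


def repair(garbled: str, actual: str = "cp1251", assumed: str = "cp1252") -> str:
    """Undo garble(): recover the original bytes, then decode them correctly."""
    return garbled.encode(assumed).decode(actual)


original = "Уважаемый"
broken = garble(original)

print(broken)           # the garbled form seen in the converted .eml
print(repair(broken))   # the text as it should appear
```

If this assumption holds, a garbled body can be recovered after the fact with `repair`, but the proper fix is to decode the body with the code page stored in the message before re-encoding it as UTF-8.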

Email::Outlook::Message version: 0.921

Ivan-753 changed the title from "a converted message have broken encoding in message body (have cp1251 in msg )" to "a converted message have a broken encoding in a message body (have cp1251 in msg )" May 31, 2023

czende commented Dec 14, 2023

@mvz, could someone please look at this? I've been using your package since version 0.903, but since it was changed from utf8 to raw binary in version 0.919 (by Andreas Pflug), it has been useless for our Latin characters. The same happens with multiple code pages.

I still have to use Perl v5.26.1 from Ubuntu 18 and the old version 0.903, run as perl msgconvert.pl <msgfile>

mvz (Owner) commented Dec 16, 2023

I'm guessing this means the codepage is set in a property that I'm currently skipping.

@czende if you have a message with this problem that you can share, sending it to me would really help with debugging.

mvz self-assigned this Dec 16, 2023
mvz linked a pull request Dec 16, 2023 that will close this issue

ubpre commented Jan 16, 2024

Hi, I'm a co-worker of @czende. Unfortunately, we can't share the problematic email because it contains sensitive data, and we couldn't reproduce the problem any other way.

I tested your PR locally on our problematic email, but the result was the same: the output .eml still had a broken message body, so your modification did not help in our case.

However, I found that in our case "Latin Extended" characters such as "ěščřž...." in the headers are probably responsible for the problem.
If the Subject or other headers such as From or To contain these characters, it causes "Value for 'Subject' header with wide characters at .../perl5/lib/perl5/Email/Outlook/Message.pm line 331."

Based on this, I changed the header_set method to header_str_set on the mentioned line. The wide characters problem disappeared, and the message body in the output .eml file is also fixed.

I'm not a Perl developer, so I'm not sure about this modification, but according to the documentation the header_str_set method seems to be preferable, and it also seems to solve the broken message body problem.
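For context on why a character-string setter matters here: mail headers must be ASCII on the wire, so non-ASCII text has to be wrapped in MIME encoded-words (RFC 2047). The sketch below shows that step with Python's standard `email.header` module as an analogy only. It is not the Perl change described above, and the helper names `encode_header_value` and `decode_header_value` are hypothetical.

```python
# Analogy in Python: encoding a non-ASCII header value as MIME encoded-words.
# This mirrors the idea of passing character strings to a setter that does
# the encoding, rather than placing wide characters into a raw header.
from email.header import Header, decode_header, make_header


def encode_header_value(text: str) -> str:
    """Return an ASCII-only representation suitable for a mail header."""
    return Header(text, charset="utf-8").encode()


def decode_header_value(raw: str) -> str:
    """Turn an encoded header value back into a Unicode string."""
    return str(make_header(decode_header(raw)))


subject = "ěščřž"
encoded = encode_header_value(subject)

print(encoded)                       # ASCII-only encoded-word
print(decode_header_value(encoded))  # back to the original characters
```

Whether the unencoded headers are also the reason the body ends up garbled in Email::Outlook::Message is not established in this thread; the observation above only shows that the two symptoms disappeared together.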
