Feature request: readpst should produce identical output for identical mails (?switch-controlled? boundary behavior) #9
I think the default behaviour should be deterministic output.
I also doubt there is a use-case for non-deterministic output,
so we should not need to keep the current behaviour at all.
The non-determinism you mention is just a random integer from `rand()`.
In addition to the random boundary name, there is also a random
filename for mails with appointments converted to calendar files.
I welcome patches for both these issues and any other ones that you can
find, please add more comments if you notice other issues and submit
merge requests for any fixes you make. If you aren't able to work on
fixes, then I will work on it when I find some time.
I quote from the code for the issue mentioned above:
```
src/readpst.c-1728- // create our MIME boundaries here.
src/readpst.c:1729: snprintf(boundary, sizeof(boundary), "--boundary-LibPST-iamunique-%i_-_-", rand());
src/readpst.c-1730- snprintf(altboundary, sizeof(altboundary), "alt-%s", boundary);
```
```
src/readpst.c-1664- // attachment appointment request
src/readpst.c:1665: snprintf(fname, sizeof(fname), "i%i.ics", rand());
src/readpst.c-1666- fprintf(f_output, "\n--%s\n", boundary);
src/readpst.c-1667- fprintf(f_output, "Content-Type: %s; charset=\"%s\"; name=\"%s\"\n", "text/calendar", "utf-8", fname);
src/readpst.c-1668- fprintf(f_output, "Content-Disposition: attachment; filename=\"%s\"\n\n", fname);
src/readpst.c-1669- write_schedule_part_data(f_output, item, sender, method);
src/readpst.c-1670- fprintf(f_output, "\n");
```
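One possible direction for both call sites, as a minimal sketch only: replace `rand()` with a hash of some value that is stable for a given message. The helper names `make_boundary` and `make_ics_name`, the choice of FNV-1a, and the `seed` parameter are all assumptions for illustration and are not taken from the libpst source; what the seed should be (for example a message ID or the item creation date) is left open here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a hash over a byte buffer. */
static uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = UINT64_C(0xcbf29ce484222325);   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= UINT64_C(0x100000001b3);            /* FNV prime */
    }
    return h;
}

/* Hypothetical helper: same seed in, same boundary out.
 * `seed` is assumed to be a string that is stable for a given message. */
static void make_boundary(char *buf, size_t len, const char *seed)
{
    assert(buf != NULL && len > 0 && seed != NULL);
    snprintf(buf, len, "--boundary-LibPST-iamunique-%016" PRIx64 "_-_-",
             fnv1a64(seed, strlen(seed)));
}

/* Hypothetical helper: deterministic name for the calendar attachment. */
static void make_ics_name(char *buf, size_t len, const char *seed)
{
    assert(buf != NULL && len > 0 && seed != NULL);
    snprintf(buf, len, "i%016" PRIx64 ".ics", fnv1a64(seed, strlen(seed)));
}
```

With something like this, two runs over the same item would emit the same boundary and the same `.ics` name, provided the seed itself is reproducible. As with `rand()`, nothing here guarantees that the boundary string does not occur in the message body.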
…--
bye,
pabs
https://bonedaddy.net/pabs3/
Hi Paul,
No need to apologise, in open source every contribution is useful,
even feature request suggestions and user discussions.
As a freelance open source developer I am always on the lookout for
opportunities. Please send me an email to discuss the specifics.
…--
bye,
pabs
https://bonedaddy.net/pabs3/
Fall back on the current time when the item creation date is missing. This makes the mail file output deterministic most of the time, which means it is easier to compare the results. Leave a FIXME about using the attendee/owner critical change property for DTSTAMP, since extracting those from the PST data isn't supported yet. Partially-fixes: #9
@fhanzlik this feature has been implemented in git. Could you test it? It works for me, but I'd like a second set of eyes and data before closing the issue.
I have a task: extract mail from several (10+) .PST files (all from one account, collected as backups over the past 15 years or so), remove duplicates, and convert the mail into a Maildir structure.
My idea was to extract the individual messages from these .PST files (using libpst/readpst) into separate trees, then delete the duplicates (using e.g. fdupes), and then merge the results.
In practice (apart from the problem of a different number of files being extracted when one .pst file is processed repeatedly, which issue #7 touches on), I ran into a problem detecting identical/duplicate messages, because readpst currently generates the internal message boundaries as random strings. As a result, even identical messages do not come out as identical files.
Perhaps it should be possible (via some switch for this behavior) to generate boundary strings that are predictable and the same in all mails, so that identical mails would also be represented by identical message files (in terms of content, not file names).
Thanks in advance, Franta Hanzlík