Fix mismatching of sentence final punctuation #100

Merged
merged 7 commits into from
Nov 16, 2023

XapaJIaMnu
Collaborator

Fixes cases where the sentence-final punctuation doesn't match between src and trg, or where there is an extra space before the final punctuation.

For example, it converts these lines:

In the name of Allah the Most Gracious, the Most Merciful	Au nom d'Allah le Tout Miséricordieux, le Très Miséricordieux,
In the Name of Allah, the Compassionate, the Merciful!	Au nom d'Allah, le tout Miséricordieux et très Miséricordieux
In the Holy name of Allah most gracious	A Nom d'Allah le Très Miséricordieux, le Tout Miséricordieux,
In the holy name of Allah most gracious,	Au nom D'Allah le Tout Miséricordieux, le Très Miséricordieux,
To those who are not aware of searchEstate;	À ceux qui ne se rendent pas compte du searchEstate ;

into:

In the name of Allah the Most Gracious, the Most Merciful,	Au nom d'Allah le Tout Miséricordieux, le Très Miséricordieux,
In the Name of Allah, the Compassionate, the Merciful!	Au nom d'Allah, le tout Miséricordieux et très Miséricordieux!
In the Holy name of Allah most gracious,	A Nom d'Allah le Très Miséricordieux, le Tout Miséricordieux,
In the holy name of Allah most gracious,	Au nom D'Allah le Tout Miséricordieux, le Très Miséricordieux,
To those who are not aware of searchEstate;	À ceux qui ne se rendent pas compte du searchEstate;
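A minimal Python sketch of the two fixes described above, for readers following along. It assumes a hypothetical punctuation set `MY_PUNCT` (the PR's actual `my_punct` is not shown in this thread) and hypothetical helper names. It covers only what the examples exercise: one space before the final mark is dropped, and a mark present on only one side is copied to the other. The case where both sides end in *different* marks is left unchanged here, because its resolution is not shown in this conversation.

```python
# Hypothetical set of sentence-final marks; the PR's `my_punct` may differ.
MY_PUNCT = set(".,;:!?")


def strip_space_before_final_punct(s: str) -> str:
    """Drop a single space directly before the final punctuation mark."""
    if len(s) >= 2 and s[-1] in MY_PUNCT and s[-2] == " ":
        return s[:-2] + s[-1]
    return s


def fix_final_punct(src: str, trg: str) -> tuple[str, str]:
    """Align sentence-final punctuation between a source/target pair (sketch)."""
    src = strip_space_before_final_punct(src)
    trg = strip_space_before_final_punct(trg)
    if not src or not trg:
        return src, trg
    if src[-1] in MY_PUNCT and trg[-1] not in MY_PUNCT:
        trg += src[-1]  # only the source is punctuated: copy its mark to the target
    elif trg[-1] in MY_PUNCT and src[-1] not in MY_PUNCT:
        src += trg[-1]  # only the target is punctuated: copy its mark to the source
    # Both punctuated with different marks: left unchanged in this sketch.
    return src, trg


print(fix_final_punct("To those who are not aware of searchEstate;",
                      "À ceux qui ne se rendent pas compte du searchEstate ;"))
```

Run on the five example pairs, this sketch reproduces the converted lines shown above.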

graemenail left a comment

Just some concerns about edge cases.

Comment on lines +28 to +29
elif trg[-1] in my_punct and src[-1] in my_punct and src[-1] != trg[-1] and src[-1] != '»' \
and src[-1] != '«' and trg[-1] != '»' and trg[-1] != '«' and src[0] != '–' and trg[0] != '–' and src[0] != '—' and trg[0] != '—':

Writing the special cases as `and not (case1 or case2 or ...)` would be much more readable.
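A sketch of how the suggested `and not (...)` form might look. `QUOTE_MARKS`, `DASHES`, and `both_end_in_different_punct` are hypothetical names, and `my_punct` is passed in as a parameter because its contents are not shown. By De Morgan's law, `not (a or b or c or d)` equals the chain of `!=` checks in the quoted condition, so the behaviour should be unchanged.

```python
QUOTE_MARKS = ("»", "«")  # guillemets excluded from the mismatch rule
DASHES = ("–", "—")       # en dash and em dash at the start of a line


def both_end_in_different_punct(src: str, trg: str, my_punct: set[str]) -> bool:
    """True when both sides end in punctuation, the marks differ, and no special case applies."""
    # Core condition: both sides punctuated, but with different marks.
    mismatch = src[-1] in my_punct and trg[-1] in my_punct and src[-1] != trg[-1]
    # All exceptions grouped under a single `not (...)`.
    is_special = (
        src[-1] in QUOTE_MARKS
        or trg[-1] in QUOTE_MARKS
        or src[0] in DASHES
        or trg[0] in DASHES
    )
    return mismatch and not is_special


print(both_end_in_different_punct("Hi.", "Salut!", set(".,;:!?»«")))
```

Grouping the exceptions this way also makes it easier to add another one later without rereading a long chain of `!=` tests.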

What about the case where `end of quote."` maps to `fin de devis».`?

Collaborator Author

> What about the situation when we have end of quote." mapping to fin de devis».

This case is handled earlier, by a separate edge-case check.

Comment on lines +13 to +15
if len(src) >= 2 and src[-1] in my_punct and src[-2] == " " and src[-1] != '»' and src[-1] != '«':
src = src[:-2] + src[-1]
if len(trg) >= 2 and trg[-1] in my_punct and trg[-2] == " " and trg[-1] != '»' and trg[-1] != '«':

What if there are two spaces, or two final punctuation marks, such as `weird quote ."`?

Collaborator Author

XapaJIaMnu, Aug 8, 2023

Other filters should take care of this (by normalising whitespace).
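A sketch of the interaction this reply relies on, under stated assumptions: `normalise_whitespace` is a hypothetical stand-in for an upstream whitespace-normalising filter (the real one is not shown), and the hypothetical `MY_PUNCT` includes a straight double quote purely to illustrate the reviewer's example. `strip_space_before_final_punct` mirrors the quoted `if` check.

```python
# Hypothetical punctuation set; '"' is included only to illustrate the review question.
MY_PUNCT = set('.,;:!?"')
QUOTE_MARKS = ("»", "«")


def normalise_whitespace(s: str) -> str:
    """Collapse every run of whitespace to a single space and trim both ends."""
    return " ".join(s.split())


def strip_space_before_final_punct(s: str) -> str:
    """Drop one space directly before a final punctuation mark (guillemets excluded)."""
    if len(s) >= 2 and s[-1] in MY_PUNCT and s[-2] == " " and s[-1] not in QUOTE_MARKS:
        return s[:-2] + s[-1]
    return s


for raw in ["end  .", 'weird quote ."']:
    print(repr(raw), "->", repr(strip_space_before_final_punct(normalise_whitespace(raw))))
```

Under these assumptions, normalising first turns `"end  ."` into `"end."`, whereas without normalisation only one of the two spaces is removed. A string that ends in two marks, such as `weird quote ."`, passes through unchanged, because the character before the final `"` is `.`, not a space.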

XapaJIaMnu
Collaborator Author

@graemenail, are the changes good now?

XapaJIaMnu merged commit 24014c4 into main on Nov 16, 2023
8 checks passed