Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable silent stemming #13

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Disable silent stemming #13

wants to merge 1 commit into from

Conversation

grusky
Copy link

@grusky grusky commented Dec 11, 2017

Up until this point, pyrouge has applied Porter stemming (-m) silently, unchangeably, and by default. There is no way to turn this off, even if the user provides an explicit set of ROUGE parameters they want to use. It's also unclear given the existing code structure that the -m parameter is being added, and it seems most users don't realize they are using stemming.

Applying stemming like this is a deviation from the default ROUGE parameters, and tends to fairly significantly inflate ROUGE scores across the board. For this reason, using stemming really needs to be an explicit choice the user makes, rather than being done by default.

My only change here is only deleting the -m when it's added separately in __add_config_option. This makes the set of default parameters clearer, forcing the choice to use stemming to be explicit. This may not be the right solution if some people use and cite pyrouge knowing fully that it stems by default, but my impression is that this is not the case.

@jantrienes
Copy link

jantrienes commented Aug 4, 2021

I agree with @grusky that it would be quite important to let the user remove the -m flag from the set of options, if needed. For example when evaluating languages other than English.

Given that pyrouge is used by quite a lot of projects (either directly or through wrappers like summ-eval), changing the default may not be the best option here.

We could remove the [-m] from this section of code:

pyrouge/pyrouge/Rouge155.py

Lines 584 to 585 in 08e9cc3

def __add_config_option(self, options):
return options + ['-m'] + [self._config_file]

And add it to the set of default options in this block:

pyrouge/pyrouge/Rouge155.py

Lines 515 to 531 in 08e9cc3

if self.args:
options = self.args.split()
elif rouge_args:
options = rouge_args.split()
else:
options = [
'-e', self._data_dir,
'-c', 95,
'-2',
'-1',
'-U',
'-r', 1000,
'-n', 4,
'-w', 1.2,
'-a',
]
options = list(map(str, options))

That way, the existing behavior of pyrouge keeps the same, unless a user had previously provided arguments and relied on the fact that -m is added automatically.

@bheinzerling is there any chance to get this change into pyrouge?

@bheinzerling
Copy link
Owner

return options + ['-m'] + [self._config_file]

That's of course horrible code and I don't know what 8-years-ago me was thinking. I also agree that moving the -m flag into the list of default options is the proper way to fix it. However, I'm weary of changing anything, since it's possible that there are users who rely on the current behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants