Idea: Add stemming #1
Comments
Agreed! Some words become bullshit only in combination, but there are others that definitely should be stemmed. Thanks for the idea!
We could add a point value to each word, or just put words in groups with the same bullshit level, and modify the value based on proximity to other bullshit words. For example, with a threshold of 1, 'monetize' might have 1.2 and always be bullshit, while 'functionality' at 0.8 is not bullshit on its own; but if it is 3 words away from 'empowerment' (also 0.8), it becomes bullshit: 0.8 + (0.8/3) ≈ 1.07.
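To make the arithmetic above concrete, here is a minimal Python sketch of proximity-weighted scoring. Only the three weights and the threshold of 1 come from this thread; the names `WEIGHTS`, `score_words` and `flagged`, summing over every other listed word in the text, and treating a score equal to the threshold as bullshit are all assumptions made for illustration.

```python
import re

# Example weights from the comment above; any real list would need tuning.
WEIGHTS = {"monetize": 1.2, "functionality": 0.8, "empowerment": 0.8}
THRESHOLD = 1.0


def score_words(text, weights=WEIGHTS):
    """Return (word, score) for every listed word found in the text.

    Each word starts at its own weight and gains weight/distance for every
    other listed word, where distance is the gap in word positions.
    (Assumption: contributions from all neighbours are summed.)
    """
    words = re.findall(r"[a-z]+", text.lower())
    hits = [(i, w) for i, w in enumerate(words) if w in weights]
    scores = []
    for i, word in hits:
        score = weights[word]
        for j, other in hits:
            if j != i:
                score += weights[other] / abs(i - j)
        scores.append((word, round(score, 2)))
    return scores


def flagged(text, threshold=THRESHOLD):
    """Words whose proximity-adjusted score reaches the threshold."""
    return [word for word, score in score_words(text) if score >= threshold]


# 'functionality' is 3 words away from 'empowerment': 0.8 + 0.8/3
print(score_words("functionality that drives empowerment"))
```

With these values, 'functionality' alone stays below the threshold, while 'monetize' is flagged regardless of its neighbours. A real version would probably also cap the distance so that far-apart words do not influence each other.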
Lol, that's an awesome idea. :) It may be hard to implement though, and tough to assign and maintain the values.
Yes, but the usual trick here is to come up with the right weights. How do we know that 'monetize' should have 1.2 and not 1.875? On Jan 9, 2013, at 4:47 PM, Calvin Metcalf [email protected] wrote:
My bad, I was thinking of solutions to the issue of words that are not bullshit by themselves.
The idea of weights is a good one. The only thing is that one needs a set of manually classified bullshit texts in order to get the values. But we can discuss it in another issue, as @mourner mentioned. On Jan 9, 2013, at 4:54 PM, Calvin Metcalf [email protected] wrote:
I experimented with some of the available stemming libraries. Neither the Porter stemmer nor Snowball.js is at a level that is really usable here.
Reduce derived words to their stems (stemming) and afterwards match only the stems. It might be more computationally intensive, but the list should become easier to maintain and more bullshit could be discovered.
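A rough Python sketch of the stem-then-match idea, for illustration only. The suffix list is a deliberately naive stand-in and not the Porter or Snowball algorithm; `SUFFIXES`, `stem`, `find_matches` and the sample terms are hypothetical names and values, not anything from the project.

```python
import re

# Naive suffix list, longest first, so the longest matching suffix is stripped.
# (Assumption: a crude stand-in for a real stemmer.)
SUFFIXES = ("ization", "izing", "ized", "izes", "ize", "ing", "ed", "es", "s", "e")
MIN_STEM_LENGTH = 3


def stem(word):
    """Strip the longest known suffix, keeping at least MIN_STEM_LENGTH letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def find_matches(text, terms):
    """Return the words in text whose stem equals the stem of any listed term."""
    stems = {stem(term.lower()) for term in terms}
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if stem(word) in stems]


# The list only needs the base forms; derived forms are caught via the stem.
print(find_matches("We are monetizing and leveraged the monetization",
                   ["monetize", "leverage"]))
```

The list stays short because 'monetize' covers 'monetizing', 'monetized' and 'monetization'. The cost is one stemming pass per word of input, plus the risk of false positives when unrelated words share a stem.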