Add README section on advanced usage via classes #113

Merged on Apr 16, 2024 (2 commits)

osma (Contributor) commented Aug 11, 2023

As discussed in #110 (comment), this PR adds a section to the top-level README with examples of advanced usage via the Simplemma classes. I used the cache-limiting use case I had in mind for the example, but I tried to explain it as a pattern that can also be applied to other customization requirements. Any comments are welcome; I'm happy to adjust as necessary.

While working on this, I discovered some problems that I reported as separate issues #111 and #112.

codecov bot commented Aug 11, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.12%. Comparing base (fa1d964) to head (5c5a34c).
Report is 5 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #113       +/-   ##
===========================================
- Coverage   96.62%   81.12%   -15.50%     
===========================================
  Files          33       35        +2     
  Lines         651      779      +128     
===========================================
+ Hits          629      632        +3     
- Misses         22      147      +125     


osma (Contributor, Author) commented Aug 11, 2023

One aspect of the class API I'm wondering about is the difference between Lemmatizer and LanguageDetector. Lemmatizer is language-agnostic: the same instance can be used with many languages or combinations of languages, just by passing a different lang argument to the lemmatize() method. LanguageDetector, by contrast, is given a language (or tuple of languages) when it's constructed, so the same instance cannot be reused if you happen to need a different set of languages.

Neither way is wrong, but it seems like the two could perhaps be harmonized, either by making Lemmatizer language-specific or by making LanguageDetector language-agnostic. Maybe @juanjoDiaz can explain the current situation and whether it makes sense to keep it as it is or to try to unify them.
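
To make the contrast concrete, here is a minimal sketch built from toy stand-in classes. `ToyLemmatizer`, `ToyLanguageDetector`, and `TOY_DICTIONARIES` are hypothetical names invented for illustration. This is not Simplemma's actual implementation; it only mirrors the calling pattern described above, where one class takes the languages per call and the other takes them at construction.

```python
from typing import Dict, Tuple, Union

LangArg = Union[str, Tuple[str, ...]]

# Hypothetical toy data standing in for real language dictionaries.
TOY_DICTIONARIES: Dict[str, Dict[str, str]] = {
    "en": {"cats": "cat", "houses": "house"},
    "de": {"katzen": "katze", "häuser": "haus"},
}


def _as_tuple(lang: LangArg) -> Tuple[str, ...]:
    """Accept either a single language code or a tuple of codes."""
    return (lang,) if isinstance(lang, str) else tuple(lang)


class ToyLemmatizer:
    """Language-agnostic: the languages are chosen on every call."""

    def lemmatize(self, token: str, lang: LangArg) -> str:
        for code in _as_tuple(lang):
            lemma = TOY_DICTIONARIES[code].get(token.lower())
            if lemma is not None:
                return lemma
        return token  # unknown token: return it unchanged


class ToyLanguageDetector:
    """Language-bound: the languages are fixed at construction."""

    def __init__(self, lang: LangArg) -> None:
        self._langs = _as_tuple(lang)

    def main_language(self, text: str) -> str:
        tokens = text.lower().split()
        # Score each configured language by how many tokens it knows.
        scores = {
            code: sum(token in TOY_DICTIONARIES[code] for token in tokens)
            for code in self._langs
        }
        return max(scores, key=lambda code: scores[code])


# One lemmatizer instance serves any language combination.
lemmatizer = ToyLemmatizer()
english_lemma = lemmatizer.lemmatize("cats", lang="en")
mixed_lemma = lemmatizer.lemmatize("Katzen", lang=("en", "de"))

# A detector is tied to its languages; a new set needs a new instance.
detector_en_de = ToyLanguageDetector(("en", "de"))
detector_en = ToyLanguageDetector("en")
detected = detector_en_de.main_language("katzen häuser")
```

In this toy version, harmonizing the two would mean either moving the `lang` parameter of `lemmatize()` into a `ToyLemmatizer` constructor, or moving the constructor argument of `ToyLanguageDetector` into `main_language()`.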

adbar (Owner) commented Aug 11, 2023

Thanks for the added docs, and good point above. You could open an issue regarding the harmonization of Lemmatizer and LanguageDetector. It's not a priority right now, though, so we can add a corresponding sentence to the docs if you feel users might not understand the current difference.

adbar merged commit 8f66a43 into adbar:main on Apr 16, 2024
14 of 15 checks passed