Add README section on advanced usage via classes #113

osma · 2023-08-11T07:42:13Z

As discussed in #110 (comment), this PR adds a section to the top-level README with examples of advanced usage via the Simplemma classes. I used the cache-limiting use case I had in mind for the example, but I tried to explain it as a pattern that can also be applied to other customization requirements. Any comments are welcome; I'm happy to adjust as necessary.
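The general pattern here is dependency injection: instead of hard-coding a caching policy, you build the component that owns the cache with your own limits and pass it into the class that uses it. The sketch below only illustrates that idea. `BoundedDictionaryFactory`, `ToyLemmatizer`, `cache_max_size` and the tiny in-memory `DATA` dictionaries are hypothetical stand-ins, not simplemma's actual API; the README section in this PR shows the real classes.

```python
from collections import OrderedDict
from typing import Callable, Dict, List


class BoundedDictionaryFactory:
    """Hypothetical factory that keeps at most `cache_max_size` language
    dictionaries in memory, evicting the least recently used one."""

    def __init__(self, loader: Callable[[str], Dict[str, str]], cache_max_size: int = 8) -> None:
        if cache_max_size < 1:
            raise ValueError("cache_max_size must be at least 1")
        self._loader = loader
        self._max = cache_max_size
        self._cache: "OrderedDict[str, Dict[str, str]]" = OrderedDict()
        self.loads = 0  # how many times a dictionary had to be (re)loaded

    def get_dictionary(self, lang: str) -> Dict[str, str]:
        if lang in self._cache:
            self._cache.move_to_end(lang)  # mark as most recently used
            return self._cache[lang]
        self.loads += 1
        data = self._loader(lang)
        self._cache[lang] = data
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return data

    def cached_languages(self) -> List[str]:
        return list(self._cache)


class ToyLemmatizer:
    """Hypothetical lemmatizer: the caching policy is injected, not hard-coded."""

    def __init__(self, dictionary_factory: BoundedDictionaryFactory) -> None:
        self._factory = dictionary_factory

    def lemmatize(self, token: str, lang: str) -> str:
        # Fall back to the token itself when no lemma is known.
        return self._factory.get_dictionary(lang).get(token.lower(), token)


# Tiny in-memory stand-in for the real on-disk language dictionaries (made up).
DATA = {
    "en": {"doughnuts": "doughnut"},
    "de": {"häuser": "haus"},
    "fr": {"chevaux": "cheval"},
}

factory = BoundedDictionaryFactory(lambda lang: DATA[lang], cache_max_size=2)
lemmatizer = ToyLemmatizer(dictionary_factory=factory)
print(lemmatizer.lemmatize("doughnuts", lang="en"))  # doughnut
print(lemmatizer.lemmatize("Häuser", lang="de"))  # haus
print(lemmatizer.lemmatize("chevaux", lang="fr"))  # cheval
print(factory.cached_languages())  # ['de', 'fr'] ("en" was evicted)
```

Because the limit lives in the injected object, the same factory instance could be shared by several consumers so that they all respect one memory budget.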

While working on this, I discovered some problems that I reported as separate issues #111 and #112.

codecov · 2023-08-11T07:43:30Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.12%. Comparing base (fa1d964) to head (5c5a34c).
Report is 5 commits behind head on main.


@@             Coverage Diff             @@
##             main     #113       +/-   ##
===========================================
- Coverage   96.62%   81.12%   -15.50%     
===========================================
  Files          33       35        +2     
  Lines         651      779      +128     
===========================================
+ Hits          629      632        +3     
- Misses         22      147      +125


osma · 2023-08-11T08:38:39Z

One aspect of the class API I'm wondering about is the difference between Lemmatizer and LanguageDetector. Lemmatizer is language-agnostic: the same instance can be used with many languages or language combinations, simply by passing a different lang argument to the lemmatize() method. LanguageDetector, by contrast, is given a language (or tuple of languages) when it is constructed, so the same instance cannot be reused if you later need a different set of languages.

Neither way is wrong, but it seems like these could perhaps be harmonized - either by making Lemmatizer language-specific, or by making LanguageDetector language-agnostic. Maybe @juanjoDiaz has an explanation for the current situation and whether it makes sense to keep it as it is or to try to unify these.
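To make the asymmetry concrete, here is a minimal sketch of the two API shapes being compared. `BoundDetector`, `AgnosticDetector`, `scores()` and the toy `VOCAB` word lists are hypothetical and do not reflect how simplemma actually detects languages; `BoundDetector` only mirrors the "languages fixed at construction" shape of LanguageDetector, and `AgnosticDetector` mirrors the per-call lang argument of Lemmatizer.lemmatize().

```python
from typing import Dict, Tuple, Union

# Hypothetical toy vocabularies standing in for real language data.
VOCAB: Dict[str, set] = {
    "en": {"the", "house", "is", "red"},
    "de": {"das", "haus", "ist", "rot"},
    "fr": {"la", "maison", "est", "rouge"},
}

LangSpec = Union[str, Tuple[str, ...]]


def _normalize(lang: LangSpec) -> Tuple[str, ...]:
    # Accept a single language code or a tuple of codes.
    return (lang,) if isinstance(lang, str) else tuple(lang)


def _score(text: str, lang: str) -> float:
    """Share of whitespace-separated tokens found in the toy vocabulary."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(tok in VOCAB[lang] for tok in tokens) / len(tokens)


class BoundDetector:
    """Current LanguageDetector shape: languages fixed at construction."""

    def __init__(self, lang: LangSpec) -> None:
        self.langs = _normalize(lang)

    def scores(self, text: str) -> Dict[str, float]:
        return {code: _score(text, code) for code in self.langs}


class AgnosticDetector:
    """Harmonized alternative mirroring Lemmatizer: languages passed per call."""

    def scores(self, text: str, lang: LangSpec) -> Dict[str, float]:
        return {code: _score(text, code) for code in _normalize(lang)}


text = "das Haus ist rot"
en_de = BoundDetector(("en", "de"))
print(en_de.scores(text))  # {'en': 0.0, 'de': 1.0}
# With the bound design, checking French needs a second instance...
fr = BoundDetector("fr")
print(fr.scores(text))  # {'fr': 0.0}
# ...whereas a single agnostic instance serves any set of languages.
detector = AgnosticDetector()
print(detector.scores(text, lang=("en", "de", "fr")))  # {'en': 0.0, 'de': 1.0, 'fr': 0.0}
```

The bound shape lets a detector precompute per-language state once, while the agnostic shape avoids creating a new object per language set; which trade-off fits better is exactly the open harmonization question.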

adbar · 2023-08-11T11:15:05Z

Thanks for the added docs, and good point above. You could actually open an issue regarding the harmonization of Lemmatizer and LanguageDetector. It's not a priority right now, though, so we can add a corresponding sentence to the docs if you feel users might not understand the current difference.

Add README section on advanced usage via classes

5071464

osma mentioned this pull request Aug 11, 2023

Plans for simplemma 1.0 release? #110

Closed

Update README.rst

5c5a34c

adbar merged commit 8f66a43 into adbar:main Apr 16, 2024
14 of 15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add README section on advanced usage via classes #113

Add README section on advanced usage via classes #113

osma commented Aug 11, 2023

codecov bot commented Aug 11, 2023 •

edited

Loading

osma commented Aug 11, 2023

adbar commented Aug 11, 2023

Add README section on advanced usage via classes #113

Add README section on advanced usage via classes #113

Conversation

osma commented Aug 11, 2023

codecov bot commented Aug 11, 2023 • edited Loading

Codecov Report

osma commented Aug 11, 2023

adbar commented Aug 11, 2023

codecov bot commented Aug 11, 2023 •

edited

Loading