[DataCatalog2.0]: Protocol abstraction for DataCatalog #4160

Merged: ElenaKhaustova merged 107 commits into main from 4138-catalog-protocol on Sep 17, 2024

@ElenaKhaustova (Contributor) commented Sep 12, 2024

Description

Solves #4138

This PR introduces a Protocol abstraction for the current DataCatalog; the protocol will be used to add new catalog implementations. For now, it fully relies on the existing DataCatalog implementation, so there are no breaking changes.
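To illustrate the idea, here is a minimal sketch of how a `typing.Protocol` can describe a catalog interface that an existing class satisfies without inheriting from it (structural typing). The method set (`load`, `save`, `exists`) and the `ExistingCatalog` class are simplified assumptions for illustration only; they are not the actual `CatalogProtocol` or `DataCatalog` definitions in Kedro.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class CatalogProtocol(Protocol):
    """Hypothetical, simplified catalog interface (not Kedro's real definition)."""

    def load(self, name: str) -> Any: ...

    def save(self, name: str, data: Any) -> None: ...

    def exists(self, name: str) -> bool: ...


class ExistingCatalog:
    """Stand-in for an existing catalog class; note it does NOT subclass the protocol."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def load(self, name: str) -> Any:
        return self._data[name]

    def save(self, name: str, data: Any) -> None:
        self._data[name] = data

    def exists(self, name: str) -> bool:
        return name in self._data


catalog = ExistingCatalog()
# Structural match: the class has the required methods, so the check passes.
assert isinstance(catalog, CatalogProtocol)
```

`@runtime_checkable` only checks at runtime that the methods exist, not their signatures; signature compatibility is left to a static type checker such as mypy. Because no inheritance is required, the existing class keeps working unchanged.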

Development notes

The Protocol class was excluded from test coverage because coverage.py reports its placeholder method bodies as untested; see the related issue: nedbat/coveragepy#1616
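The description does not say how the exclusion was configured. One option, assuming coverage.py's default exclusion pattern, is a `# pragma: no cover` comment on the `class` line: when an excluded line introduces a block, coverage.py excludes the entire block, including the `...` placeholder bodies. The protocol below is a hypothetical example, not the PR's code.

```python
from typing import Any, Protocol


class CatalogProtocol(Protocol):  # pragma: no cover
    # The pragma on the class line excludes this whole class body from coverage.

    def load(self, name: str) -> Any:
        # Placeholder body: never executed by tests, so it would show as missed.
        ...

    def save(self, name: str, data: Any) -> None:
        # Placeholder body, excluded together with the rest of the class.
        ...
```

An alternative is a regular expression under `exclude_also` in the coverage configuration, which avoids per-class comments.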

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: Elena Khaustova <[email protected]>
@ElenaKhaustova changed the title from "Protocol abstraction for DataCatalog" to "[DataCatalog2.0]: Protocol abstraction for DataCatalog" Sep 13, 2024
@datajoely (Contributor)

HUGE!

@idanov (Member) left a comment

So neat and self-contained, and best of all, non-breaking! Once all references to DataCatalog are replaced with CatalogProtocol, I think it is ready to merge.
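Replacing concrete type hints with the protocol means code such as the runner accepts any object that has the right methods. The sketch below assumes a two-method protocol; `run_node`, `MemoryCatalog` and `RecordingCatalog` are hypothetical names for illustration, not Kedro's runner API.

```python
from typing import Any, Protocol


class CatalogProtocol(Protocol):
    """Hypothetical two-method catalog interface used only for this sketch."""

    def load(self, name: str) -> Any: ...

    def save(self, name: str, data: Any) -> None: ...


class MemoryCatalog:
    """Overwrites the stored value on every save."""

    def __init__(self, initial: dict[str, Any]) -> None:
        self._store = dict(initial)

    def load(self, name: str) -> Any:
        return self._store[name]

    def save(self, name: str, data: Any) -> None:
        self._store[name] = data


class RecordingCatalog:
    """Keeps every saved version; load returns the latest one."""

    def __init__(self, initial: dict[str, Any]) -> None:
        self.history: dict[str, list[Any]] = {k: [v] for k, v in initial.items()}

    def load(self, name: str) -> Any:
        return self.history[name][-1]

    def save(self, name: str, data: Any) -> None:
        self.history.setdefault(name, []).append(data)


def run_node(catalog: CatalogProtocol, input_name: str, output_name: str) -> None:
    """Typed against the protocol, so either catalog class above is accepted."""
    value = catalog.load(input_name)
    catalog.save(output_name, value * 2)
```

Neither catalog class inherits from `CatalogProtocol`, yet a type checker accepts both as arguments to `run_node`, which is what lets two catalog implementations coexist behind one type.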

kedro/runner/runner.py (outdated, resolved)
@noklam (Contributor) left a comment

My original understanding was that this would be a separate catalog, but I see this PR changes many type hints in the existing APIs, so I am slightly confused.

[Screenshot]

Is the type hint supposed to work here? The IDE doesn't seem to understand the catalog at all, since all the IDE features appear broken.

@@ -178,13 +178,13 @@ class KedroContext:
         )

     @property
-    def catalog(self) -> DataCatalog:
-        """Read-only property referring to Kedro's ``DataCatalog`` for this context.
+    def catalog(self) -> CatalogProtocol:
Contributor

Question: Is this how type hints are supposed to be done with a Protocol? I roughly understand that a Protocol works like a trait or interface, but I haven't seen it used much in a real codebase.

Contributor Author

Yep, because a protocol is a type in the end; see the examples here: https://peps.python.org/pep-0544/#protocol-members
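As PEP 544 describes, a protocol class can be used anywhere a type is expected, including a property's return annotation like the `catalog` property in the `KedroContext` diff. A minimal sketch follows; `Context` and `SimpleCatalog` are hypothetical stand-ins, and the one-method protocol is an assumption for brevity.

```python
from typing import Any, Protocol


class CatalogProtocol(Protocol):
    """Hypothetical minimal protocol for this illustration."""

    def load(self, name: str) -> Any: ...


class SimpleCatalog:
    """Conforms structurally: it never mentions CatalogProtocol."""

    def __init__(self, data: dict[str, Any]) -> None:
        self._data = data

    def load(self, name: str) -> Any:
        return self._data[name]


class Context:
    """Hypothetical stand-in for a context object such as KedroContext."""

    def __init__(self, catalog: CatalogProtocol) -> None:
        self._catalog = catalog

    @property
    def catalog(self) -> CatalogProtocol:
        # The protocol is used in the annotation exactly like any other type.
        return self._catalog


ctx = Context(SimpleCatalog({"companies": [1, 2, 3]}))
```

A static checker such as mypy then treats `ctx.catalog` as `CatalogProtocol`, so it accepts only members declared on the protocol, even if the concrete object has more.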

@ElenaKhaustova (Contributor, Author) commented Sep 13, 2024

@noklam

Hmm, it seems to work well in PyCharm, and mypy does not complain either.
[Screenshot 2024-09-13 at 17 21 53]

The new catalog is added in a follow-up PR, but first we need this protocol so that both implementations fit together. Here is some context regarding this decision: #4151 (comment)

@noklam (Contributor) left a comment

Overall, looks good to me. I find the name CatalogProtocol slightly weird; I don't know what the naming convention is in Python, but I found a list here: python/typeshed#4174

@noklam self-requested a review September 13, 2024 23:56
@noklam (Contributor) left a comment

Approved; my comment is non-blocking.

@merelcht (Member) left a comment

This mostly looks good, but I left a comment about the methods added here that will be removed in the future.

kedro/io/core.py (resolved)
kedro/io/core.py (outdated, resolved)
kedro/io/core.py (resolved)
@merelcht (Member) left a comment

Don't forget to add this to the release notes ✍️

Otherwise, looks all good to me!

@ElenaKhaustova enabled auto-merge (squash) September 17, 2024 14:48
@ElenaKhaustova merged commit 6bf29f9 into main Sep 17, 2024
41 checks passed
@ElenaKhaustova deleted the 4138-catalog-protocol branch September 17, 2024 15:08
Labels: None yet
Projects: None yet
6 participants