feat: Add asdict() #109

Merged
merged 7 commits into from
Jun 4, 2024

Conversation

msto
Contributor

@msto msto commented May 6, 2024

I'd like to add a function to convert a Metric instance to a dict. (In support of #107 )

I'm currently receiving the following mypy errors, which I suspect are due to a bug in how Metric is typed - as mypy appears to be inferring that Metric is a union of DataclassInstance and type[DataclassInstance], when it should only be the former.

fgpyo/util/metric.py:346: error: Argument 1 to "asdict" has incompatible type "DataclassInstance | type[DataclassInstance]"; expected "DataclassInstance"  [arg-type]
fgpyo/util/metric.py:347: error: Argument 1 to "has" has incompatible type "Metric[Any]"; expected "type"  [arg-type]
fgpyo/util/metric.py:348: error: Argument 1 to "asdict" has incompatible type "type[AttrsInstance]"; expected "AttrsInstance"  [arg-type]
fgpyo/util/metric.py:348: note: ClassVar protocol member AttrsInstance.__attrs_attrs__ can never be matched by a class object
Found 3 errors in 1 file (checked 46 source files)
Failed Type Checking: [mypy -p fgpyo --config /Users/msto/code/fulcrumgenomics/fgpyo/ci/mypy.ini]

NB: it may be valuable to permit formatting/casting the values as part of this function (e.g. by adding a parameter format: bool = False), but to do so we'll probably want to extract Metric.format_value() to a standalone function instead of a classmethod


Edit: copying my explanation of the updated solution from Slack:

For the curious - the issue was two-fold, and due to my own incorrect typing.

dataclasses.is_dataclass accepts either an instance or a class object, and has a TypeGuard to narrow the type of the argument to DataclassInstance | type[DataclassInstance]. I had to add a helper function to override this guard and narrow the type further to DataclassInstance.

Meanwhile, attr.has only accepts a class object. I was passing an instance, which was the source of one type error. Fixing this by calling attr.has(metric.__class__) was insufficient, because this did not narrow the type of the metric instance, so I added a similar helper for AttrsInstance.
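
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of helper described above. The name is_dataclass_instance and its body are assumptions for illustration, not necessarily the code merged in this PR. An analogous helper for AttrsInstance could wrap attr.has(type(obj)) in the same way.

```python
from __future__ import annotations

import dataclasses
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only needed by the type checker, so nothing here is imported at runtime.
    # On Python < 3.10, TypeGuard would come from typing_extensions instead.
    from typing import TypeGuard

    from _typeshed import DataclassInstance


def is_dataclass_instance(obj: object) -> TypeGuard[DataclassInstance]:
    """Hypothetical helper: True only for dataclass *instances*.

    dataclasses.is_dataclass() is also true for dataclass class objects, so
    its built-in guard narrows to `DataclassInstance | type[DataclassInstance]`.
    Excluding class objects lets us declare the narrower guard above.
    """
    return dataclasses.is_dataclass(obj) and not isinstance(obj, type)


@dataclasses.dataclass
class Person:
    name: str
    age: int


person = Person(name="name", age=42)
if is_dataclass_instance(person):
    # Inside this branch a type checker treats `person` as a DataclassInstance,
    # so dataclasses.asdict() accepts it without a `type: ignore`.
    person_dict = dataclasses.asdict(person)
```

The guard is an ordinary function returning bool at runtime; the narrowing only exists for the type checker.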

@msto
Contributor Author

msto commented May 6, 2024

Note - I've tried changing the type hint to metric: MetricType, as in Metric.write(), but get the same errors

@msto msto requested review from tfenne, TedBrookings and nh13 May 6, 2024 13:46
@TedBrookings
Contributor

I'm sorry for the delay, today had a lot of non-work emergencies. This is my suggestion:

  1. Inside the metric class, directly below the existing values method, add asdict
    def values(self) -> Iterator[Any]:
        """An iterator over attribute values in the same order as the header."""
        for field in inspect.get_fields(self.__class__):  # type: ignore[arg-type]
            yield getattr(self, field.name)

    def asdict(self) -> Dict[str, Any]:
        """A dictionary of attribute values in the same order as the header."""
        return {
            field.name: getattr(self, field.name)
            for field in inspect.get_fields(self.__class__)  # type: ignore[arg-type]
        }

You need to add the # type: ignore[arg-type] because we are insisting that all actual Metric subclasses will be attr.s or dataclasses. As far as I know there isn't any way to signal to mypy that this will be the case though. This task has really made me appreciate the advantage of a typing system that prioritizes traits over types.

  2. Add this test to test_metric.py, directly below the existing test_metric_values
@pytest.mark.parametrize("data_and_classes", (attr_data_and_classes, dataclasses_data_and_classes))
def test_metric_values(data_and_classes: DataBuilder) -> None:
    assert list(data_and_classes.Person(name="name", age=42).values()) == ["name", 42]


@pytest.mark.parametrize("data_and_classes", (attr_data_and_classes, dataclasses_data_and_classes))
def test_metric_asdict(data_and_classes: DataBuilder) -> None:
    assert data_and_classes.Person(name="name", age=42).asdict() == {"name": "name", "age": 42}

I created and pushed a branch that does this: tb-add-asdict. The tests pass.

@msto
Contributor Author

msto commented May 7, 2024

Thanks!

I wanted to avoid another type: ignore.

I found a solution using TypeGuard that I would be satisfied with. I've implemented it within metric instead of inspect so we have access to Metric when type hinting the arguments

@msto msto changed the base branch from main to ms_dataclass-instance May 7, 2024 00:29
@msto msto force-pushed the ms_asdict branch 2 times, most recently from 035630d to 16d3f60 Compare May 7, 2024 00:41
Contributor

@TedBrookings TedBrookings left a comment

I've never heard of TypeGuard before, that's cool. I think that's probably the recipe to clean up a lot of the type ignore statements currently in inspect.

My only remaining suggestion is that I think asdict could be a member function of the Metric class, with all the statements just acting on self.

@msto
Contributor Author

msto commented May 7, 2024

I had the same thought, but I'm of two minds.

I strongly prefer being consistent with established convention when possible, and both dataclasses and attr implement asdict() as a standalone function rather than an instance method.

However, packaging it as a method removes the need for an import (and possibly makes it more discoverable).

Curious what @nh13 @tfenne @clintval think

(NB: if we were to make asdict() a method, I would also make the is_*_instance() functions instance methods - and make them private)

@msto msto requested a review from clintval May 7, 2024 19:57
Member

@clintval clintval left a comment

I see why this convenience function is desired. It is useful when you have a Metric and you don't know if it is an attrs-defined or dataclass-defined instance. In my experience, I usually know which flavor I'm dealing with and use the respective import accordingly:

attr.asdict(metric)
dataclasses.asdict(metric)

I lean towards letting the user import their specific "as dict" implementation for their use case over providing another import that does functionally the same thing, but I won't let that opinion of mine block this PR! Here's another idea, though: what about adding an __iter__(self) dunder method on the base Metric class so we can start doing dict(metric) instead, which uses a built-in? I'm also a bigger fan of as_dict() over asdict(), if we're allowed to vote on function naming too!

Will this function be needed when we eventually remove attrs support? Should we consider not adding it to the public API because eventually all Metrics should be using @dataclass and can use the corresponding dataclasses.asdict built-in?

Contributor Author

@msto msto left a comment

It is useful when you have a Metric and you don't know if it is an attrs-defined or dataclass-defined instance. In my experience, I usually know which flavor I'm dealing with and use the respective import accordingly:

Exactly, it's a convenience function to abstract the concern away.

IMO if we intend to support both attr.s and dataclass, then we should have an API that works with Metric (as a pseudo-alias for the union of AttrsInstance and DataclassInstance).

This is also intended to facilitate the MetricWriter in #107 and other such utilities which may not be able to assume which import to use.

what about adding an __iter__(self) dunder method on the base Metric class so we can start doing dict(metric) instead, which uses a built-in?

See below comment - dict() and asdict() do different things, and I think the latter implementation is preferable here.
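
As a quick illustration of one way the two can differ: dataclasses.asdict() recurses into nested dataclasses, whereas dict() over an __iter__ that yields (name, value) pairs is shallow. This is a sketch with hypothetical Name and Person classes, and it is not necessarily the difference raised in the review thread.

```python
import dataclasses
from typing import Any, Iterator, Tuple


@dataclasses.dataclass
class Name:
    first: str
    last: str


@dataclasses.dataclass
class Person:
    name: Name
    age: int

    def __iter__(self) -> Iterator[Tuple[str, Any]]:
        # Yield (field name, value) pairs so that dict(person) works.
        for field in dataclasses.fields(self):
            yield field.name, getattr(self, field.name)


person = Person(name=Name(first="Ada", last="Lovelace"), age=36)

# dict() consumes the pairs as-is: the nested Name stays a Name instance.
shallow = dict(person)

# dataclasses.asdict() converts nested dataclasses into dicts recursively.
deep = dataclasses.asdict(person)
```

Here shallow["name"] is still a Name object, while deep["name"] is a plain dict.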

I'm also a bigger fan of as_dict() instead of asdict() if we're allowed to vote on function naming too!

I agree that snakecasing is generally preferable, but I have a stronger preference for not conflicting with the established naming convention from dataclass and attr.s

Will this function be needed when we eventually remove attrs support? Should we consider not adding it to the public API because eventually all Metrics should be using @dataclass and can use the corresponding dataclasses.asdict built-in?

At that time we could simply import dataclasses.asdict into this module to avoid breakage? Or replace from fgpyo.util.metric import asdict with from dataclasses import asdict (which I consider another argument in favor of leaving the naming as is)

Member

@clintval clintval left a comment

Makes sense. I'm on board.

My approval is contingent on replacing the "unreachable" branches, which are actually reachable, with a TypeError. Thanks Matt!


Returns:
A dictionary representation of the given metric.
"""
Member

This overlooks the format_value method on Metric that is used to format values when written to a file.

Contributor Author

@msto msto May 27, 2024

@nh13 the omission was deliberate. I left a comment on the topic in the PR description, but given how much conversation this one has attracted it was easy to overlook 🙂

NB: it may be valuable to permit formatting/casting the values as part of this function (e.g. by adding a parameter format: bool = False), but to do so we'll probably want to extract Metric.format_value() to a standalone function instead of a classmethod

I do not think we should have an asdict() function that changes the value types by default. This function is primarily intended as a dispatcher, selecting the correct (dataclasses or attr.s) function depending on how the Metric in question was decorated.
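
To make the "dispatcher" idea concrete, here is a rough sketch under stated assumptions. The name asdict_sketch is hypothetical, the typed is_*_instance helpers are replaced by plain runtime checks, and attrs is treated as an optional import so the snippet also runs without it. It is not the implementation merged in this PR.

```python
import dataclasses
from typing import Any, Dict

try:
    import attr  # the third-party attrs package, if installed
except ImportError:
    attr = None  # assumption: fall back to dataclass-only dispatch


def asdict_sketch(metric: Any) -> Dict[str, Any]:
    """Hypothetical dispatcher: pick the asdict() matching the decorator used.

    Values are returned unchanged; no formatting is applied.
    """
    # dataclasses.is_dataclass() is also true for class objects, so exclude them.
    if dataclasses.is_dataclass(metric) and not isinstance(metric, type):
        return dataclasses.asdict(metric)
    # attr.has() expects a class object, so pass the instance's type.
    if attr is not None and attr.has(type(metric)):
        return attr.asdict(metric)
    # Anything else is a caller error rather than an "unreachable" branch.
    raise TypeError(
        f"Expected a dataclass or attrs instance, got {type(metric).__name__}"
    )


@dataclasses.dataclass
class Person:
    name: str
    age: int


person_dict = asdict_sketch(Person(name="name", age=42))
```

Raising TypeError in the final branch mirrors the review request above to avoid treating that branch as unreachable.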

Ideally, this function will be deprecated once we drop support for attrs. As I mentioned in my comments above to Clint, when that happens it would be preferable to be able to transparently replace this with dataclasses.asdict .

At that time we could simply import dataclasses.asdict into this module to avoid breakage? Or replace from fgpyo.util.metric import asdict with from dataclasses import asdict

I am open to adding an argument to optionally support formatting (e.g. format_values: bool = False), with the stipulation that it should be False by default and the caveat that I expect it to add debt and increase friction when we deprecate attr.s.

However, I'd prefer to leave as is, and then call format_value on the resulting dict when necessary, e.g.

metric_dict: dict[str, str] = {key: format_value(val) for key, val in asdict(metric).items()}

Thoughts?
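
A self-contained sketch of that "format afterwards" approach, for illustration only. The format_value function below is a hypothetical stand-in for a standalone version of Metric.format_value(), and its formatting rules are invented for the example; dataclasses.asdict is used in place of the new asdict().

```python
import dataclasses
from typing import Any, Dict


def format_value(value: Any) -> str:
    """Hypothetical stand-in for a standalone formatter.

    These rules are assumptions for the example, not fgpyo's behavior.
    """
    if value is None:
        return ""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


@dataclasses.dataclass
class Person:
    name: str
    age: int


person = Person(name="name", age=42)

# Step 1: convert to a dict with the original value types preserved.
raw: Dict[str, Any] = dataclasses.asdict(person)

# Step 2: format only when string values are actually needed.
formatted: Dict[str, str] = {key: format_value(val) for key, val in raw.items()}
```

Keeping the two steps separate means callers who want the original types never pay for, or have to undo, the formatting.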

Member

Let’s omit it but document it clearly.

I do like that format_value is a class method, so that any new Metric type can perform custom formatting by overriding the class method. I think passing in a parsing function, as defopt does, causes a conflict in the rare situations where we want to format the type differently.

Contributor Author

@msto msto Jun 4, 2024

I like the idea of overriding the classmethod, but wouldn't consider doing so in practice. Given that it specifies formatting behavior for a wide variety of primitive and compound types, it doesn't seem to lend itself to easy extension or modification. (Since in order to override formatting for one type, you would have to override them all.)

Base automatically changed from ms_dataclass-instance to main May 23, 2024 18:29

codecov bot commented May 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.74%. Comparing base (a41a565) to head (27f6876).
Report is 8 commits behind head on main.

Current head 27f6876 differs from pull request most recent head ad13758

Please upload reports for the commit ad13758 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #109      +/-   ##
==========================================
+ Coverage   88.53%   88.74%   +0.20%     
==========================================
  Files          16       16              
  Lines        1727     1750      +23     
  Branches      321      372      +51     
==========================================
+ Hits         1529     1553      +24     
+ Misses        132      131       -1     
  Partials       66       66              


@msto msto changed the base branch from main to ms_metric-writer-feature-branch June 4, 2024 17:26
@msto msto merged commit 8cb2ee9 into ms_metric-writer-feature-branch Jun 4, 2024
6 checks passed
@msto msto deleted the ms_asdict branch June 4, 2024 17:26
@msto msto mentioned this pull request Jun 4, 2024
6 tasks
msto added a commit that referenced this pull request Jun 6, 2024
* feat: add asdict

* fix: typeguard

fix: typeguard import

doc: update docstring

refactor: import Instance types

refactor: import Instance types

* fix: 3.8 compatible Dict typing

* refactor: make instance checks private

* tests: add coverage

* fix: typeerror

* doc: clarify that asdict does not format values
4 participants