feat: Add asdict() #109

msto · 2024-05-06T13:41:06Z

I'd like to add a function to convert a Metric instance to a dict (in support of #107).

I'm currently receiving the following mypy errors, which I suspect are due to a bug in how Metric is typed - as mypy appears to be inferring that Metric is a union of DataclassInstance and type[DataclassInstance], when it should only be the former.

fgpyo/util/metric.py:346: error: Argument 1 to "asdict" has incompatible type "DataclassInstance | type[DataclassInstance]"; expected "DataclassInstance"  [arg-type]
fgpyo/util/metric.py:347: error: Argument 1 to "has" has incompatible type "Metric[Any]"; expected "type"  [arg-type]
fgpyo/util/metric.py:348: error: Argument 1 to "asdict" has incompatible type "type[AttrsInstance]"; expected "AttrsInstance"  [arg-type]
fgpyo/util/metric.py:348: note: ClassVar protocol member AttrsInstance.__attrs_attrs__ can never be matched by a class object
Found 3 errors in 1 file (checked 46 source files)
Failed Type Checking: [mypy -p fgpyo --config /Users/msto/code/fulcrumgenomics/fgpyo/ci/mypy.ini]

NB: it may be valuable to permit formatting/casting the values as part of this function (e.g. by adding a parameter format: bool = False), but to do so we'll probably want to extract Metric.format_value() to a standalone function instead of a classmethod

Edit: copying my explanation of the updated solution from Slack:

For the curious - the issue was two-fold, and due to my own incorrect typing.

dataclasses.is_dataclass accepts either an instance or a class object, and has a TypeGuard to narrow the type of the argument to DataclassInstance | type[DataclassInstance]. I had to add a helper function to override this guard and narrow the type further to DataclassInstance.

Meanwhile, attr.has only accepts a class object. I was passing an instance, which was the source of one type error. Fixing this by calling attr.has(metric.__class__) was insufficient, because this did not narrow the type of the metric instance, so I added a similar helper for AttrsInstance.
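For readers unfamiliar with the pattern, here is a minimal sketch of the dataclass half of that idea. The helper name `is_dataclass_instance` and the `Person` class are illustrative assumptions, not the names used in fgpyo; the attrs helper would follow the same shape using `attr.has(type(obj))`.

```python
from __future__ import annotations

import dataclasses
from typing import TYPE_CHECKING, Any, Dict

if TYPE_CHECKING:
    # Only needed by the type checker; never imported at runtime.
    from _typeshed import DataclassInstance
    from typing_extensions import TypeGuard


def is_dataclass_instance(obj: object) -> TypeGuard[DataclassInstance]:
    """True for dataclass instances only, not for dataclass class objects.

    Hypothetical helper: it narrows further than dataclasses.is_dataclass(),
    whose own guard leaves `DataclassInstance | type[DataclassInstance]`.
    """
    return dataclasses.is_dataclass(obj) and not isinstance(obj, type)


@dataclasses.dataclass
class Person:
    name: str
    age: int


def person_to_dict(obj: object) -> Dict[str, Any]:
    """Inside the guarded branch, mypy treats `obj` as a DataclassInstance."""
    if is_dataclass_instance(obj):
        return dataclasses.asdict(obj)
    return {}
```

The guard returns False for `Person` itself and True for `Person("name", 42)`, so `dataclasses.asdict` only ever sees an instance.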

msto · 2024-05-06T13:43:09Z

Note - I've tried changing the type hint to metric: MetricType, as in Metric.write(), but get the same errors

TedBrookings · 2024-05-06T22:59:59Z

I'm sorry for the delay, today had a lot of non-work emergencies. This is my suggestion:

Inside the Metric class, directly below the existing values method, add asdict:

    def values(self) -> Iterator[Any]:
        """An iterator over attribute values in the same order as the header."""
        for field in inspect.get_fields(self.__class__):  # type: ignore[arg-type]
            yield getattr(self, field.name)

    def asdict(self) -> Dict[str, Any]:
        """A dictionary of attribute values in the same order as the header."""
        return {
            field.name: getattr(self, field.name)
            for field in inspect.get_fields(self.__class__)  # type: ignore[arg-type]
        }

You need to add the # type: ignore[arg-type] because we are insisting that all actual Metric subclasses will be attr.s or dataclasses. As far as I know there isn't any way to signal to mypy that this will be the case though. This task has really made me appreciate the advantage of a typing system that prioritizes traits over types.

Add this test to test_metric.py, directly below the existing test_metric_values

@pytest.mark.parametrize("data_and_classes", (attr_data_and_classes, dataclasses_data_and_classes))
def test_metric_values(data_and_classes: DataBuilder) -> None:
    assert list(data_and_classes.Person(name="name", age=42).values()) == ["name", 42]


@pytest.mark.parametrize("data_and_classes", (attr_data_and_classes, dataclasses_data_and_classes))
def test_metric_asdict(data_and_classes: DataBuilder) -> None:
    assert data_and_classes.Person(name="name", age=42).asdict() == {"name": "name", "age": 42}

I created and pushed a branch that does this: tb-add-asdict. The tests pass.

msto · 2024-05-07T00:27:31Z

Thanks!

I wanted to avoid another type: ignore.

I found a solution using TypeGuard that I would be satisfied with. I've implemented it within metric instead of inspect so we have access to Metric when type hinting the arguments

TedBrookings

I've never heard of TypeGuard before, that's cool. I think that's probably the recipe to clean up a lot of the type ignore statements currently in inspect.

My only remaining suggestion is that I think asdict could be a member function of the Metric class, with all the statements just acting on self.

msto · 2024-05-07T19:53:58Z

I had the same thought, but I'm of two minds.

I strongly prefer being consistent with established convention when possible, and both dataclasses and attr implement asdict() as a standalone function rather than an instance method.

However, packaging it as a method removes the need for an import (and possibly makes it more discoverable).

Curious what @nh13 @tfenne @clintval think

(NB: if we were to make asdict() a method, I would also make the is_*_instance() functions instance methods - and make them private)

clintval

I see why this convenience function is desired. It is useful when you have a Metric and you don't know if it is an attrs-defined or dataclass-defined instance. In my experience, I usually know which flavor I'm dealing with and use the respective import accordingly:

attr.asdict(metric)
dataclasses.asdict(metric)

I lean towards letting the user import their specific "as dict" implementation for their use case over providing another import to do functionally the same thing but I won't let that opinion of mine block this PR! Here's another idea though, what about adding an __iter__(self) dunder method on the base Metric class so we can start doing dict(metric) instead, which uses a built-in? I'm also a bigger fan of as_dict() instead of asdict() if we're allowed to vote on function naming too!

Will this function be needed when we eventually remove attrs support? Should we consider not adding it to the public API because eventually all Metrics should be using @dataclass and can use the corresponding dataclasses.asdict built-in?

msto

It is useful when you have a Metric and you don't know if it is an attrs-defined or dataclass-defined instance. In my experience, I usually know which flavor I'm dealing with and use the respective import accordingly:

Exactly, it's a convenience function to abstract the concern away.

IMO if we intend to support both attr.s and dataclass, then we should have an API that works with Metric (as a pseudo-alias for the union of AttrsInstance and DataclassInstance).

This is also intended to facilitate the MetricWriter in #107 and other such utilities which may not be able to assume which import to use.

what about adding an __iter__(self) dunder method on the base Metric class so we can start doing dict(metric) instead, which uses a built-in?

See below comment - dict() and asdict() do different things, and I think the latter implementation is preferable here.
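One difference that may be relevant here, shown as a sketch with made-up `Name` and `Person` classes (not fgpyo types): `dict()` built from an `__iter__` that yields (name, value) pairs is shallow, whereas `dataclasses.asdict()` recurses into nested dataclasses.

```python
import dataclasses
from typing import Any, Iterator, Tuple


@dataclasses.dataclass
class Name:
    first: str
    last: str


@dataclasses.dataclass
class Person:
    name: Name
    age: int

    def __iter__(self) -> Iterator[Tuple[str, Any]]:
        # Yield (field name, value) pairs so that dict(person) works.
        for field in dataclasses.fields(self):
            yield field.name, getattr(self, field.name)


person = Person(name=Name("Ada", "Lovelace"), age=36)

shallow = dict(person)            # nested Name stays a Name instance
deep = dataclasses.asdict(person) # nested Name becomes a dict
```

Whether this is the exact difference referred to above is an assumption; it is simply the most visible one between the two approaches.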

I'm also a bigger fan of as_dict() instead of asdict() if we're allowed to vote on function naming too!

I agree that snakecasing is generally preferable, but I have a stronger preference for not conflicting with the established naming convention from dataclass and attr.s

Will this function be needed when we eventually remove attrs support? Should we consider not adding it to the public API because eventually all Metrics should be using @dataclass and can use the corresponding dataclasses.asdict built-in?

At that time we could simply import dataclasses.asdict into this module to avoid breakage? Or replace from fgpyo.util.metric import asdict with from dataclasses import asdict (which I consider another argument in favor of leaving the naming as is)

clintval

Makes sense. I'm on board.

My approval is contingent on cleaning up the actually reachable "unreachable" branches with a TypeError. Thanks Matt!
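A rough sketch of what a dispatcher with an explicit TypeError fall-through could look like. The function name `asdict_sketch` is hypothetical, and the attrs import is made optional only so the snippet runs without the third-party package installed; it is not the merged implementation.

```python
import dataclasses
from typing import Any, Dict

try:
    import attr  # third-party "attrs" package; optional for this sketch
except ImportError:
    attr = None


def asdict_sketch(metric: Any) -> Dict[str, Any]:
    """Dispatch to the matching asdict(), or fail loudly for anything else."""
    if dataclasses.is_dataclass(metric) and not isinstance(metric, type):
        return dataclasses.asdict(metric)
    if attr is not None and attr.has(type(metric)):
        return attr.asdict(metric)
    # Reachable whenever the caller passes a class object or a non-metric value.
    raise TypeError(
        f"Expected a dataclass or attrs instance, got {type(metric).__name__}"
    )


@dataclasses.dataclass
class Person:
    name: str
    age: int
```

Raising here, rather than asserting the branch is unreachable, means a caller who passes `Person` (the class) or an arbitrary object gets a clear error.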

nh13 · 2024-05-20T22:46:45Z

fgpyo/util/metric.py

+
+    Returns:
+        A dictionary representation of the given metric.
+    """


This overlooks the format_value method on Metric that is used to format values when written to a file.

msto

@nh13 the omission was deliberate. I left a comment on the topic in the PR description, but given how much conversation this one has attracted it was easy to overlook 🙂

NB: it may be valuable to permit formatting/casting the values as part of this function (e.g. by adding a parameter format: bool = False), but to do so we'll probably want to extract Metric.format_value() to a standalone function instead of a classmethod

I do not think we should have an asdict() function that changes the value types by default. This function is primarily intended as a dispatcher, selecting the correct (dataclasses or attr.s) function depending on how the Metric in question was decorated.

Ideally, this function will be deprecated once we drop support for attrs. As I mentioned in my comments above to Clint, when that happens it would be preferable to be able to transparently replace this with dataclasses.asdict .

At that time we could simply import dataclasses.asdict into this module to avoid breakage? Or replace from fgpyo.util.metric import asdict with from dataclasses import asdict

I am open to adding an argument to optionally support formatting (e.g. format_values: bool = False), with the stipulation that it should be False by default and the caveat that I expect it to add debt and increase friction when we deprecate attr.s.

However, I'd prefer to leave as is, and then call format_value on the resulting dict when necessary, e.g.

metric_dict: dict[str, str] = {key: format_value(val) for key, val in asdict(metric).items()}

Thoughts?
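A self-contained sketch of that two-step approach. The `format_value` below is a hypothetical stand-in that simply calls `str()`; it is not Metric.format_value(), which handles many more types.

```python
import dataclasses
from typing import Any, Dict


@dataclasses.dataclass
class Person:
    name: str
    age: int


def format_value(value: Any) -> str:
    """Hypothetical stand-in for Metric.format_value(); just stringifies."""
    return str(value)


# Step 1: convert without touching value types.
raw: Dict[str, Any] = dataclasses.asdict(Person(name="name", age=42))

# Step 2: format only when the caller actually needs strings.
formatted: Dict[str, str] = {key: format_value(val) for key, val in raw.items()}
```

Keeping the steps separate leaves the conversion a drop-in match for `dataclasses.asdict`, which is the property argued for above.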

nh13

Let’s omit it but document it clearly.

I do like that format_value is a class method, so that any new Metric type can perform custom formatting by overriding the class method. I think passing in a parsing function, as defopt does, can in rare situations cause a conflict when we want to format a type differently.

msto

I like the idea of overriding the classmethod, but wouldn't consider doing so in practice. Given that it specifies formatting behavior for a wide variety of primitive and compound types, it doesn't seem to lend itself to easy extension or modification. (Since in order to override formatting for one type, you would have to override them all.)

codecov · 2024-05-27T17:02:55Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.74%. Comparing base (a41a565) to head (27f6876).
Report is 8 commits behind head on main.

❗ Current head 27f6876 differs from pull request most recent head ad13758

Please upload reports for the commit ad13758 to get more accurate results.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #109      +/-   ##
==========================================
+ Coverage   88.53%   88.74%   +0.20%     
==========================================
  Files          16       16              
  Lines        1727     1750      +23     
  Branches      321      372      +51     
==========================================
+ Hits         1529     1553      +24     
+ Misses        132      131       -1     
  Partials       66       66

* feat: add asdict
* fix: typeguard
  fix: typeguard import
  doc: update docstring
  refactor: import Instance types
  refactor: import Instance types
* fix: 3.8 compatible Dict typing
* refactor: make instance checks private
* tests: add coverage
* fix: typeerror
* doc: clarify that asdict does not format values

msto requested review from tfenne, TedBrookings and nh13 May 6, 2024 13:46

msto force-pushed the ms_asdict branch from 9638f31 to f12c3fd Compare May 7, 2024 00:26

msto changed the base branch from main to ms_dataclass-instance May 7, 2024 00:29

msto force-pushed the ms_asdict branch 2 times, most recently from 035630d to 16d3f60 Compare May 7, 2024 00:41

TedBrookings approved these changes May 7, 2024

msto requested a review from clintval May 7, 2024 19:57

clintval reviewed May 8, 2024

msto commented May 8, 2024

clintval approved these changes May 8, 2024

nh13 requested changes May 20, 2024

Base automatically changed from ms_dataclass-instance to main May 23, 2024 18:29

msto mentioned this pull request May 27, 2024

feat: Add MetricWriter class #107

Open

nh13 reviewed May 28, 2024

msto added 7 commits June 4, 2024 13:20

feat: add asdict

7df4106

fix: typeguard

3d49776

fix: typeguard import doc: update docstring refactor: import Instance types refactor: import Instance types

fix: 3.8 compatible Dict typing

b3dcc7e

refactor: make instance checks private

7042cce

tests: add coverage

3aed50b

fix: typeerror

fc6d2ba

doc: clarify that asdict does not format values

ad13758

msto force-pushed the ms_asdict branch from 27f6876 to ad13758 Compare June 4, 2024 17:25

msto changed the base branch from main to ms_metric-writer-feature-branch June 4, 2024 17:26

msto merged commit 8cb2ee9 into ms_metric-writer-feature-branch Jun 4, 2024
6 checks passed

msto deleted the ms_asdict branch June 4, 2024 17:26

msto mentioned this pull request Jun 4, 2024

feat: MetricWriter #123

Draft

6 tasks
