Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support reading Picard-style metrics files with comments above the header #103

Open
wants to merge 4 commits into
base: ms_metric-writer-feature-branch
Choose a base branch
from

Conversation

msto
Copy link
Contributor

@msto msto commented Mar 15, 2024

It would be helpful if Metric.read could parse metrics files with comments above the header, such as those produced by Picard. I've updated Metric.read to use the read_header() method introduced in #124 , and also simplified the typing on io.to_reader() and io.to_writer() (per Clint's suggestion)


note I want to keep this separate from the MetricWriter feature branch, but it's dependent on the Metric._read_header method introduced in that branch.

I am waiting to merge until #123 is merged.

Copy link

codecov bot commented Mar 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.07%. Comparing base (032388d) to head (6b094cc).

Current head 6b094cc differs from pull request most recent head c8a083c

Please upload reports for the commit c8a083c to get more accurate results.

Additional details and impacted files
@@                         Coverage Diff                         @@
##           ms_metric-writer-feature-branch     #103      +/-   ##
===================================================================
+ Coverage                            88.56%   94.07%   +5.50%     
===================================================================
  Files                                   16       33      +17     
  Lines                                 1775     3441    +1666     
  Branches                               378      706     +328     
===================================================================
+ Hits                                  1572     3237    +1665     
- Misses                                 134      135       +1     
  Partials                                69       69              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@msto msto force-pushed the ms_read-picard-metrics branch 2 times, most recently from aad1b76 to 2ac66c4 Compare March 15, 2024 16:08
fgpyo/io/__init__.py Outdated Show resolved Hide resolved
fgpyo/util/metric.py Outdated Show resolved Hide resolved
@msto msto marked this pull request as ready for review March 15, 2024 16:25
Copy link
Contributor

@TedBrookings TedBrookings left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think you should wait to merge until Nils signs off. My general sense is that Metric at least originated as an internal tool for Fulcrum code, not a general-purpose TSV <-> data utility. It would then read and write in a way that Fulcrum approved of. Header comments and CSV instead of TSV were deliberately excluded. I personally don't see the harm in relaxing the allowed inputs while retaining strict outputs.

@msto
Copy link
Contributor Author

msto commented Mar 15, 2024

That's fair.

I would be open to refactoring so .read() remains specific to Metric-formatted files. In that case I'd suggest providing a .read_picard() method or similar, which would filter comment lines before calling .read() under the hood.

I do think it would be generally valuable to have built-in support for reading Picard metrics into a Metric, and it would be very specifically helpful for one of my clients.

@msto
Copy link
Contributor Author

msto commented Mar 15, 2024

@nh13 thoughts?

@nh13
Copy link
Member

nh13 commented Mar 15, 2024

@msto I'd like to review this one before it gets merged

Copy link
Member

@clintval clintval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm indifferent on the PR because for Picard files I use this library:

It's not typed but it's been good enough for me since it doesn't force a co-install of things like pysam.

fgpyo/io/__init__.py Outdated Show resolved Hide resolved
@msto msto changed the title feat: Support reading metrics files with comments above the header feat: Support reading Picard-style metrics files with comments above the header Apr 17, 2024
@msto msto mentioned this pull request Apr 17, 2024
6 tasks
Copy link
Member

@tfenne tfenne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some reasonably minor suggestions.

fgpyo/util/metric.py Outdated Show resolved Hide resolved
fgpyo/util/metric.py Outdated Show resolved Hide resolved
fgpyo/util/metric.py Outdated Show resolved Hide resolved
Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments from me, but a few others to fixup from others.

fgpyo/io/__init__.py Outdated Show resolved Hide resolved
fgpyo/util/metric.py Outdated Show resolved Hide resolved
@msto msto marked this pull request as draft May 27, 2024 16:47
* feat: add asdict

* fix: typeguard

fix: typeguard import

doc: update docstring

refactor: import Instance types

refactor: import Instance types

* fix: 3.8 compatible Dict typing

* refactor: make instance checks private

* tests: add coverage

* fix: typeerror

* doc: clarify that asdict does not format values
@msto msto mentioned this pull request Jun 4, 2024
@msto msto changed the base branch from main to ms_get-header June 4, 2024 20:13
@msto msto force-pushed the ms_read-picard-metrics branch 2 times, most recently from f99f839 to 066cf19 Compare June 4, 2024 20:15
@msto msto marked this pull request as ready for review June 4, 2024 23:35
@msto msto requested review from nh13 and tfenne June 4, 2024 23:35
Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, take it or leave the additional comments/suggestions.

cls,
path: Path,
ignore_extra_fields: bool = True,
comment_prefix: str = "#",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this backwards incompatible? Should we bump the minor version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, it isn't.

One possible solution for backwards compatibility - I could update Metric._read_header to take an Optional comment prefix, and return the first line if the prefix is None (i.e. no skipping of prefixed or blank lines)

Is that something you'd like to see, or should I leave as is and bump the version?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer it to the Optional[str] = None, but I don't have that strong of a feeling.

I'd also ask if comment_prefix is the name we want to go with for this parameter. @tfenne is a good judge of naming.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might suggest comment_symbol?

Also i agree with @nh13 that ideally the method gets expanded to include both:

comment_symbol: Optional[str] = None,
skip_blank_lines: bool = False

I might even argue for skip_blank_lines to even be a regex that defines lines to skip ... so e.g. I could use skip_lines_matching = "$\s*^" to skip stupid lines that are all tabs/spaces but no content!

fgpyo/util/metric.py Show resolved Hide resolved
* feat: add read_header

* fix: rm kw_only to support 3.9

* fix: use Typing.List to support 3.8

* fix: pr suggestions
Base automatically changed from ms_get-header to ms_metric-writer-feature-branch June 6, 2024 00:11
chore: type hint

fix: method name
@msto
Copy link
Contributor Author

msto commented Jun 6, 2024

note I want to keep this separate from the MetricWriter feature branch, but it's dependent on the Metric._read_header method introduced in that branch.

I am waiting to merge until #123 is merged.

@msto msto force-pushed the ms_metric-writer-feature-branch branch from 032388d to 78f2246 Compare June 6, 2024 14:46
@msto msto assigned msto and unassigned nh13 and tfenne Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants