Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

230 work on adding basic plots #257

Open
wants to merge 6 commits into
base: v0.4-beta
Choose a base branch
from

Conversation

samland1116
Copy link

  • Adds basic timeseries plotting functionality through a new class modeled off Sam Lamont's v0.3-beta example
  • Adds example jupyter notebook to test plotting functionality

…g primary_timeseries with no provided arguments (currently handles one variable only). Created dataframe_accessor_v0_4.py to eventually store functions in new class.
…otting example in '07-2_site_plotting.ipynb'. Removes scratch work in '06-2_site_query.ipynb'.
@samland1116 samland1116 added the enhancement New feature or request label Sep 19, 2024
@samland1116 samland1116 added this to the v0.4 Release milestone Sep 19, 2024
@samland1116 samland1116 linked an issue Sep 19, 2024 that may be closed by this pull request
2 tasks
Copy link
Collaborator

@samlamont samlamont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Here are couple more general comments:

  • We tend to make use of Flake8 in VSCode (an extension) to help enforce coding style like line length, spacing, etc. It can be annoying at first but it very helpful to keep our code consistent.
  • We try to add tests as much as possible to ensure the functions are working as expected, usually starting with a test to help with development. There's an example for the previous accessor in tests/v0_3/test_duckdb_dataframe_accessors.py
  • Also we're trying to make an effort to include logging as much as possible, using info for "normal" priority messages, and debug for lower priority (things with in a loop for example)
  • Sorry to overload you with comments! Happy to discuss more

as well as timeseries. This requires more validation in each method to
ensure the DataFrame has the appropriate data.

Methods operating on metrics data should start with 'metrics_' and methods
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have mentioned this was not intended to be a hard-and-fast rule, just pointing out that it could be useful to use some naming pattern for the accessor methods since we'll likely have methods for plotting both timeseries and metrics (potentially more?) in the same accessor. Maybe we could split out the methods into separate modules that get imported in the class just to keep it from becoming too huge.

"from teehr import Evaluation\n",
"from pathlib import Path\n",
"\n",
"from teehr.visualization.dataframe_accessor import TEEHRDataFrameAccessor\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be imported in the Evaluation class perhaps, so that the users doesn't need to manually import it. That way the teehr accessor could be available on the dataframe that is returned to the user from a query. So they could just do df.teehr.plot_xxxxx(...)

return

def timeseries_plot(self):
"""Calls workflow to generate plot(s) based on number of unique values in self._df['variable_name'] column."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More detailed docstrings are important, especially for methods that are exposed to the user. These are viewable in your IDE (vscode, pycharm) and can be helpful to developers. We typically use the Numpy style docstrings to describe the method functionality, it's arguments and types, and what it returns. Examples can also be helpful and get parsed by the documentation creator (sphinx), like https://rtiinternational.github.io/teehr/autoapi/teehr/queries/duckdb/index.html#teehr.queries.duckdb.get_timeseries

palette_count = 0

p = figure(title="Click legend entry to toggle display of timeseries",
y_axis_label="{} [{}]".format(variable,unique_units[0]),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally prefer to use f-strings over .format for consistency


return

def timeseries_plot(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could more options be available to the user on what to plot and how to plot it? It could be useful for the user to be able to further filter the dataframe for specific time range, or location, configuration, etc. Maybe could make use of DataFrame.filter for example. Just a thought. Also some mechanism for controlling the plot options could be helpful, probably beyond scope of this PR though :)

self._validate(pandas_obj)

@staticmethod
def _validate(obj):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main plotting methods should have some validation checking if the df contains the fields the user is asking for, and returning a message if not

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Work on adding basic plots
2 participants