-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
230 work on adding basic plots #257
base: v0.4-beta
Are you sure you want to change the base?
Conversation
samland1116
commented
Sep 19, 2024
- Adds basic timeseries plotting functionality through a new class modeled off Sam Lamont's v0.3-beta example
- Adds example jupyter notebook to test plotting functionality
…g primary_timeseries with no provided arguments (currently handles one variable only). Created dataframe_accessor_v0_4.py to eventually store functions in new class.
…r x unique variable names to example 6.
…otting example in '07-2_site_plotting.ipynb'. Removes scratch work in '06-2_site_query.ipynb'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Here are couple more general comments:
- We tend to make use of Flake8 in VSCode (an extension) to help enforce coding style like line length, spacing, etc. It can be annoying at first but it very helpful to keep our code consistent.
- We try to add tests as much as possible to ensure the functions are working as expected, usually starting with a test to help with development. There's an example for the previous accessor in
tests/v0_3/test_duckdb_dataframe_accessors.py
- Also we're trying to make an effort to include logging as much as possible, using
info
for "normal" priority messages, anddebug
for lower priority (things with in a loop for example) - Sorry to overload you with comments! Happy to discuss more
as well as timeseries. This requires more validation in each method to | ||
ensure the DataFrame has the appropriate data. | ||
|
||
Methods operating on metrics data should start with 'metrics_' and methods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have mentioned this was not intended to be a hard-and-fast rule, just pointing out that it could be useful to use some naming pattern for the accessor methods since we'll likely have methods for plotting both timeseries and metrics (potentially more?) in the same accessor. Maybe we could split out the methods into separate modules that get imported in the class just to keep it from becoming too huge.
"from teehr import Evaluation\n", | ||
"from pathlib import Path\n", | ||
"\n", | ||
"from teehr.visualization.dataframe_accessor import TEEHRDataFrameAccessor\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be imported in the Evaluation class perhaps, so that the users doesn't need to manually import it. That way the teehr
accessor could be available on the dataframe that is returned to the user from a query. So they could just do df.teehr.plot_xxxxx(...)
return | ||
|
||
def timeseries_plot(self): | ||
"""Calls workflow to generate plot(s) based on number of unique values in self._df['variable_name'] column.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More detailed docstrings are important, especially for methods that are exposed to the user. These are viewable in your IDE (vscode, pycharm) and can be helpful to developers. We typically use the Numpy style docstrings to describe the method functionality, it's arguments and types, and what it returns. Examples can also be helpful and get parsed by the documentation creator (sphinx), like https://rtiinternational.github.io/teehr/autoapi/teehr/queries/duckdb/index.html#teehr.queries.duckdb.get_timeseries
palette_count = 0 | ||
|
||
p = figure(title="Click legend entry to toggle display of timeseries", | ||
y_axis_label="{} [{}]".format(variable,unique_units[0]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally prefer to use f-strings over .format
for consistency
|
||
return | ||
|
||
def timeseries_plot(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could more options be available to the user on what to plot and how to plot it? It could be useful for the user to be able to further filter the dataframe for specific time range, or location, configuration, etc. Maybe could make use of DataFrame.filter for example. Just a thought. Also some mechanism for controlling the plot options could be helpful, probably beyond scope of this PR though :)
self._validate(pandas_obj) | ||
|
||
@staticmethod | ||
def _validate(obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main plotting methods should have some validation checking if the df
contains the fields the user is asking for, and returning a message if not