Benchmark Supported Data Types #2206
base: main
np.datetime64('2025-01-01T00:00:00'),
])
}),
'np.timedelta64': pd.DataFrame({
Add a few more numpy dtypes:
import numpy as np
np.dtypes.Float16DType()
np.dtypes.Float32DType()
np.dtypes.Float64DType()
4b35bad to e3f337a
}

PYARROW_DTYPES = {
    'pa.int8': pd.DataFrame({'pa.int8': pd.Series([1, -1, 127], dtype=pd.ArrowDtype(pa.int8()))}),
Should there be NaNs for all the columns? I believe pyarrow supports that.
- main

jobs:
  build:
minor: can we use more specific names for the jobs? We can actually require certain jobs to pass before allowing merging, so it's helpful if the names are unique
on:
  push:
    branches:
      - main
are we still running the tests every time without updating the sheet?
Yes, the benchmark is only for the message in Slack and updating the gdrive.
Resolves #2200
CU-86b1xxa7d
This pull request introduces a benchmarking suite designed to test all supported data types for our synthesizers and validation processes. Key changes and additions include:
Benchmarking Integration: Added a new benchmarking framework to evaluate the functionality of all supported data types.
Private Spreadsheet Integration: The benchmarking results are compared against data read from a private spreadsheet. This spreadsheet contains the expected outcomes for each data type, ensuring that our tests remain accurate and relevant.
Automated Test Failures: If a data type is no longer supported due to recent changes, the test will automatically fail. This helps in catching unsupported data types and ensures that our system continues to function correctly with all valid data types.
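The failure rule described above can be sketched roughly as follows. This is an illustrative outline only, not the code in this PR: the function names `find_regressions` and `check_supported_dtypes`, the dictionary layout, and the sample values are all hypothetical, and reading the expected outcomes from the private spreadsheet is left out.

```python
def find_regressions(expected, observed):
    """Return data types that were expected to be supported but no longer are.

    Both arguments are hypothetical mappings of data type name -> bool
    (True means the data type is supported).
    """
    regressions = []
    for dtype_name, was_supported in expected.items():
        # A data type missing from the observed results counts as unsupported.
        if was_supported and not observed.get(dtype_name, False):
            regressions.append(dtype_name)
    return sorted(regressions)


def check_supported_dtypes(expected, observed):
    """Raise AssertionError if any previously supported data type regressed."""
    regressions = find_regressions(expected, observed)
    if regressions:
        raise AssertionError(f'Data types no longer supported: {regressions}')


# Hypothetical sample values, standing in for the spreadsheet contents
# and for a fresh benchmark run.
expected = {'np.int8': True, 'pa.int8': True, 'np.timedelta64': False}
observed = {'np.int8': True, 'pa.int8': False, 'np.timedelta64': False}
regressed = find_regressions(expected, observed)
```

In this sketch a data type that was already unsupported (`np.timedelta64` in the sample) does not fail the check; only a change from supported to unsupported does.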