Benchmark Supported Data Types #2206

pvk-developer · 2024-09-06T12:13:15Z

Resolves #2200
CU-86b1xxa7d

This pull request introduces a benchmarking suite designed to test all supported data types for our synthesizers and validation processes. Key changes and additions include:

- Benchmarking Integration: Added a new benchmarking framework that checks whether each supported data type works with our synthesizers and validation processes.
- Private Spreadsheet Integration: The benchmark results are compared against data read from a private spreadsheet. This spreadsheet holds the expected outcome for each data type and acts as the source of truth the benchmark checks against.
- Automated Test Failures: If a change causes a previously supported data type to stop working, the test fails automatically. This catches regressions in data type support and helps ensure the system keeps working with every valid data type.
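The comparison against the spreadsheet can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes the benchmark results and the spreadsheet's expected outcomes have both been loaded into pandas DataFrames indexed by dtype name, with one boolean column per check, and `find_regressions` is a hypothetical helper name.

```python
import pandas as pd


def find_regressions(results: pd.DataFrame, expected: pd.DataFrame) -> pd.DataFrame:
    """Return the dtypes whose checks were expected to pass but did not.

    Hypothetical helper: both frames are assumed to be indexed by dtype name
    (e.g. 'np.float16', 'pa.int8') with one boolean column per check.
    """
    # Only compare the checks that appear in both the results and the sheet.
    checks = results.columns.intersection(expected.columns)

    # Align results to the expected dtypes; dtypes missing from the results
    # become NaN, and .eq(True) turns NaN into False ("did not pass").
    passed = results.reindex(index=expected.index, columns=checks).eq(True)
    should_pass = expected[checks].eq(True)

    # A regression is a check the sheet marks as passing that no longer passes.
    regressed = should_pass & ~passed
    return regressed[regressed.any(axis=1)]
```

A dtype that disappears from the results entirely is treated as failing, so a silently dropped fixture is also reported. The benchmark could then fail with something like `assert regressions.empty, regressions.index.tolist()`.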

sdv-team · 2024-09-06T12:13:22Z

Task linked: CU-86b1xxa7d SDV - Supported data types benchmark #2200

gsheni · 2024-09-06T16:55:18Z

tests/benchmark/supported_dtypes_benchmark.py

+            np.datetime64('2025-01-01T00:00:00'),
+        ])
+    }),
+    'np.timedelta64': pd.DataFrame({


Add a few more numpy dtypes:

import numpy as np
np.dtypes.Float16DType()
np.dtypes.Float32DType()
np.dtypes.Float64DType()
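As a sketch of how those float dtypes could sit alongside the existing fixtures, the snippet below follows the `'name': pd.DataFrame({...})` pattern from the diff. The dictionary name `NUMPY_FLOAT_DTYPES`, the key format, and the sample values are assumptions for illustration; `np.dtypes.Float16DType()` and its siblings (NumPy 1.25+) describe the same dtypes as the `'float16'`, `'float32'` and `'float64'` strings used here.

```python
import numpy as np
import pandas as pd

# Hypothetical fixtures: one single-column DataFrame per float width.
# Each column includes a NaN so missing-value handling is exercised as well.
NUMPY_FLOAT_DTYPES = {
    f'np.{name}': pd.DataFrame({
        f'np.{name}': pd.Series([1.5, -2.25, np.nan], dtype=name),
    })
    for name in ('float16', 'float32', 'float64')
}
```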

tests/benchmark/supported_dtypes_benchmark.py

tests/benchmark/utils.py

gsheni · 2024-09-10T19:29:19Z

tests/benchmark/supported_dtypes_benchmark.py

+}
+
+PYARROW_DTYPES = {
+    'pa.int8': pd.DataFrame({'pa.int8': pd.Series([1, -1, 127], dtype=pd.ArrowDtype(pa.int8()))}),


Should there be NaNs in all the columns? I believe pyarrow supports that.

amontanez24 · 2024-09-16T17:23:24Z

.github/workflows/dtypes_benchmark.yml

+      - main
+
+jobs:
+  build:


minor: can we use more specific names for the jobs? We can require certain jobs to pass before allowing a merge, so it's helpful if the names are unique.

amontanez24 · 2024-09-16T17:58:37Z

.github/workflows/dtypes_benchmark.yml

+on:
+  push:
+    branches:
+      - main


are we still running the tests every time without updating the sheet?

Yes. The benchmark workflow only posts the message to Slack and updates the spreadsheet on Google Drive.
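A single summary message like the one this workflow posts could be assembled roughly as sketched below. `build_slack_summary` and its wording are hypothetical; the sketch assumes a boolean table with one row per failing dtype and one column per check, and it only builds the text. Sending it (for example with `slack_sdk`'s `WebClient.chat_postMessage`) is left out.

```python
import pandas as pd


def build_slack_summary(regressions: pd.DataFrame, total_dtypes: int) -> str:
    """Build one Slack message body from a table of failed checks (hypothetical helper).

    `regressions` has one row per failing dtype and one boolean column per
    check, where True means that check regressed.
    """
    if regressions.empty:
        return f'All {total_dtypes} data types match the expected results.'

    lines = [
        f'{len(regressions)} of {total_dtypes} data types no longer match '
        'the expected results:'
    ]
    for dtype, row in regressions.iterrows():
        # Keep only the checks that are True (regressed) for this dtype.
        failed_checks = ', '.join(row[row].index)
        lines.append(f'- {dtype}: {failed_checks}')

    return '\n'.join(lines)
```

Collecting everything into one string keeps the channel to a single post per run rather than one message per failing dtype.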

tests/benchmark/supported_dtypes_benchmark.py

pvk-developer added 2 commits September 6, 2024 14:04

Add benchmarking for data types

6ae829d

Add dependency

c7f08b0

pvk-developer added 2 commits September 6, 2024 14:15

Fix lint

014a68f

Add to integration workflow

1934059

pvk-developer marked this pull request as ready for review September 6, 2024 12:19

pvk-developer requested a review from a team as a code owner September 6, 2024 12:19

pvk-developer requested review from rwedge and removed request for a team September 6, 2024 12:19

auto-assign bot assigned pvk-developer Sep 6, 2024

pvk-developer requested a review from amontanez24 September 6, 2024 12:19

Fix constraints evaluating int or float instances

633447e

gsheni requested changes Sep 6, 2024

pvk-developer added 3 commits September 9, 2024 18:58

Update benchmark workflow to ignore errors

ba10af8

Fix lint

1aff4d9

Fix lint from ruff

ecbc660

amontanez24 reviewed Sep 9, 2024

tests/benchmark/supported_dtypes_benchmark.py (outdated, resolved)

tests/benchmark/supported_dtypes_benchmark.py (resolved)

tests/benchmark/supported_dtypes_benchmark.py (resolved)

tests/benchmark/utils.py (outdated, resolved)

pvk-developer added 2 commits September 10, 2024 16:06

Split gdrive utils

e270e16

Fix lint

e3f337a

pvk-developer force-pushed the issue-2200-supported-data-types-benchmark branch from 4b35bad to e3f337a September 10, 2024 14:25

pvk-developer requested review from amontanez24 and gsheni September 10, 2024 15:27

gsheni reviewed Sep 10, 2024

pvk-developer added 2 commits September 16, 2024 18:00

Split data, reorganize code base and add slack messaging

3e218b6

Fix: install the dependencies when running the report

1cdf943

amontanez24 reviewed Sep 16, 2024

pvk-developer added 4 commits September 17, 2024 15:53

Use args for utils

e94a4bf

Fix lint

dae1d47

Fix parseargs

c484891

Add credentials

76dc7ef

pvk-developer added 19 commits September 17, 2024 18:34

Improve slack message

23cfbb6

Fix workflow uploading spreadsheet

0db3da5

Fix message formatting

0e6ab21

Fix: Typo code

5c6e272

Add summary and improve slack messaging

c8fdecc

Rename method

b25a9ce

Rename method in utility.py

b53cb5a

Fix report generation

6f2ff18

Change artifact download order

4bafbda

Fix saving to csv / json. Improve messaging

f99e13e

Improve summary report

a6ae49e

Add excluded tests

7b3bbfa

Reduce workflows while testing

5059b7b

Add py38 failing combinations

e4cd822

Remove args

eb24dc3

Remove summary for now

0b637e4

Try channel with #

3269b8e

Do one slack message instead

de9b4ee

Remove response from comparing dfs

bd52d6e

