Add environment checks (#233)
* Clear existing sparsify source

* Add back version file

* Port of sparsify.auto from private repository (#124)

* remove javascript deps

* Initial port of autosparse to sparsify.auto

* Initial port autosparse -> sparsify.auto

* Added tests and fixes

* Add back yarn

* Add github workflow for test checks

* Update workflows

Co-authored-by: Benjamin Fineran <[email protected]>

workflow

* Add GHA tests for base, package, and auto (#133)

* `sparsify.package` base CLI (#125)

* bump up main to 1.2.0 (#128)

Co-authored-by: dhuang <[email protected]>

* Adds the following:

* Setup directory Structure
* `from sparsify import package` importable + callable function
* A constants file with supported tasks, criterions, and deployment scenarios (Should probably be converted to `Enums` or something better than `python lists`)
* Add `click` as a required dependency
* Additional CLI helpers for updated acceptance criterion
* `sparsify.package` cli utility
* setup test directory
* Add tests for CLI
* Setup Entrypoints

* Remove old docstring

* - Moved utils outside `package`
- Renamed package_ to package
- Add more tests
- Update Usage command
- Rebased on `sparsify.alpha`
- Add typing
- Add version info to cli

Apply review comments from @corey-nm
- Remove `cli_helpers.py` and rely on `click`

* Remove unintended change added while resolving merge conflicts

* Style

* Add dataset registry
update cli to use dataset registry

* Fix failing tests

* Centralize task registry (#132)

* Centralize task name and alias handling

* Propagate TaskName updates to auto tasks

* Fix click parse args call

* Fix failing tests after TASK name updates

* Prevent auto install of integrations on sparsify import (#134)

* * Change `NO_VNNI` --> `DEFAULT`
* Refactor CLI arg parsing cause originally `System.exit()` was thrown on invoking help
* Rename `scenario` --> `target`
* Remove single character shortcuts, as suggested by @bfineran
* Default directory to `None` for now, logic to choose an appropriate name will be added to diff #130
* Added show defaults at the top level `click.command()` decorator
* Added a `DEFAULT_OPTIMIZNG_METRIC`
* Added a `DEFAULT_DEPLOYMENT_SCENARIO`
* Changed `optimizing_metric` help message
* Updated Tests

* - Style
- Example Usage

Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuang <[email protected]>
Co-authored-by: Konstantin Gulin <[email protected]>

* Add DDP support (#126)

* `sparsify.package` backend-call (#130)

* bump up main to 1.2.0 (#128)

Co-authored-by: dhuang <[email protected]>

* Adds the following:

* Setup directory Structure
* `from sparsify import package` importable + callable function
* A constants file with supported tasks, criterions, and deployment scenarios (Should probably be converted to `Enums` or something better than `python lists`)
* Add `click` as a required dependency
* Additional CLI helpers for updated acceptance criterion
* `sparsify.package` cli utility
* setup test directory
* Add tests for CLI
* Setup Entrypoints

* Remove old docstring

* - Moved utils outside `package`
- Renamed package_ to package
- Add more tests
- Update Usage command
- Rebased on `sparsify.alpha`
- Add typing
- Add version info to cli

Apply review comments from @corey-nm
- Remove `cli_helpers.py` and rely on `click`

* Remove unintended change added while resolving merge conflicts

* Style

* Add dataset registry
update cli to use dataset registry

* Fix failing tests

* Centralize task registry (#132)

* Centralize task name and alias handling

* Propagate TaskName updates to auto tasks

* Fix click parse args call

* Fix failing tests after TASK name updates

* Prevent auto install of integrations on sparsify import (#134)

* * Change `NO_VNNI` --> `DEFAULT`
* Refactor CLI arg parsing cause originally `System.exit()` was thrown on invoking help
* Rename `scenario` --> `target`
* Remove single character shortcuts, as suggested by @bfineran
* Default directory to `None` for now, logic to choose an appropriate name will be added to diff #130
* Added show defaults at the top level `click.command()` decorator
* Added a `DEFAULT_OPTIMIZNG_METRIC`
* Added a `DEFAULT_DEPLOYMENT_SCENARIO`
* Changed `optimizing_metric` help message
* Updated Tests

* - Style
- Example Usage

* Add proper commands + gha workflows

* Refactor package function to make a call to the backend service

* Add template function for output
Add importable Backend Base url
Remove unnecessary args from package function
Add end to end integration test

* Updated tests, addressed comments

* Base Cli + importable function

* Style

* Remove files added in faulty rebase

* Changed base url, styling

Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuang <[email protected]>
Co-authored-by: Konstantin Gulin <[email protected]>
Co-authored-by: Konstantin <[email protected]>

* `sparsify.package` updates (#141)

* Update output to also print model metrics
Update `--optimizing_metrics` to take in a string containing comma separated metrics for example `--optimizing_metric "compression, accuracy"`(added a `_csv_callback` function for that)
Update Usage instructions accordingly
Add a log statement to package function
Added more tests

* Address comments

* Rename `normalized_metric` --> `metric_` to avoid potential confusion

* Add a getter for TASK_REGISTRY and DATASET_REGISTRY (#142)

* Add a getter for TASK_REGISTRY and DATASET_REGISTRY

* typing

* fix potential bug

* Add None to test

* Updated tests according to comments from @bfineran

* Make test cleaner based on feedback from @corey-nm

* Remove config creator (#136)

* [Auto] Add Tensorboard Support (#147)

* Support for Hyperparameter Tuning (#145)

* force convert yolov5 metric keys to float (#151)

* [Auto] Update function name and description to be more generic (#149)

* rename and flip logic for stopping_condition flag (#152)

* [Auto] Support for multi-stage tuning (#157)

* Support for updated tuning flow (#159)

* Support tuning of CLI args (#158)

* Support multiple optimizing metrics (#160)

* Log important updates with an easily visible format (#161)

* Update the user output for `sparsify.package` (#166)

* Add Dockerfile
Download deployment directory, and
Update instructions for user
Update tests

* Add volume mount to docker command

* [Auto] Update interface for sparsifyml (#173)

* Fix: remove debug line

* Update sparsify.auto interface for sparsifyml

* rename interface -> schemas

* Sparsify.alpha.auto (#179)

* Update: sparsify.version to match with main

* Delete: sparsify.package

* Empty commit

* Add: stitch functions

* Update: Env var name
Update: stitch functions slightly

* Add: Sparsifyml to dependencies in setup.py

* Style: Fixes

* Some more fixers

* OLD IC integration working

* Run Integration Tests only when sparsifyml installed

* Fix yolov5 integration

* Propagate student args to teacher

* Update teacher kwargs only when key not present for safety

* Updated: integration_test

* Updated: num trials to 2

* Fix: failing GHA

* make sparsifyml optional
implement own strtobool function

* [Create] alpha implementation (#181)

* [Create] alpha implementation

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: corey-nm <[email protected]>

---------

Co-authored-by: corey-nm <[email protected]>

* Adding one shot cli (#184)

* [Feature branch] standard clis (#187)

* Adding skeleton clis

* [CLI standardization] sparsify.run one-shot impl (#188)

* [CLI standardization] sparsify.run one-shot impl

* Fixing one-shot cli

---------

Co-authored-by: Corey Lowman <[email protected]>

* [WIP][CLI standardization] sparsify.run training-aware and sparse-transfer initial impl (#189)

* [CLI standardization] sparsify.run one-shot impl

* [WIP][CLI standardization] sparsify.run training-aware and sparse-transfer initial impl

* Fixing training-aware/sparse-transfer

---------

Co-authored-by: Corey Lowman <[email protected]>

* Adding docstring to sparsify.run

* Moving use case to top arg

* Removing apply/init

---------

Co-authored-by: Benjamin Fineran <[email protected]>

* Style changes for sparsify.alpha (#194)

* Update: Minimum supported Python Version to `3.7` as it's consistent with our other repos (#193)

* [Add] `sparsify.login` CLI and function (#180)

* Adding sparsify.login entrypoint and function

* Adding docstring to exception

* Adding pip install of sparsifyml

* Respond to review

* Adding help message at top

* Adding setup python to workflow

* Adding checked sparsifyml import

* Apply suggestions from code review

Co-authored-by: Danny Guinther <[email protected]>

* check against major minor version only

* add client_id and other bug fixes

* Fix: `--index` --> `--index-url`

* Update install command missed during rebase

* * Clean up code
* Remove Global variables
* Update PyPi Server link
* Add Logging
* Move exceptions to their own file

* Style fixes

* Apply suggestions from code review

Add: suggestion from @KSGulin

Co-authored-by: Konstantin Gulin <[email protected]>

* Update src/sparsify/login.py

Co-authored-by: Konstantin Gulin <[email protected]>

* remove comment

---------

Co-authored-by: Benjamin Fineran <[email protected]>
Co-authored-by: Danny Guinther <[email protected]>
Co-authored-by: Benjamin <[email protected]>
Co-authored-by: rahul-tuli <[email protected]>
Co-authored-by: Konstantin Gulin <[email protected]>

* training aware and sparse transfer run mode support (#191)

* add sparsifyml dependencies to sparsify install (#195)

* update task registry + generalize matching (#201)

* rename performance to optim-level in legacy auto api (#199)

* [sparsify.run one-shot] CLI propagation of recipe_args (#198)

* Remove hardware optimization options (#200)

* Remove hardware optimization options

* Rename instead of remove optim_level

* Add OPTIM_LEVEL back to all list

* simple fixes in initial one-shot testing flow (#206)

* fixes for initial E2E runs of sparse transfer and training aware (#207)

* fixes for initial E2E runs of sparse transfer and training aware

* quality

* [Alpha] Rework Auto main script into Training-Aware and Sparse-Transfer script (#208)

* Initial scratch work

* Complete, but untested implementation

* Working yolov5

* Working across all integrations

* IC path fix

* Require model

* Remove debug adds

* make API KEY an argument (#211)

* Update integration and unit tests (#214)

* Update integration and unit tests

* Update IC base test model

* Add login step to test setup (#216)

* bump up version to 1.6.0 (#215) (#218)

Co-authored-by: dhuang <[email protected]>

(cherry picked from commit 699a476)

Co-authored-by: dhuangnm <[email protected]>

* [BugFixes] Fix failing tests in `sparsify.alpha` (#223)

* Intermediate commit should be amended

* Remove failing test as synced with @KSGulin

* Explicitly pin protobuf dependencies. (#225)

* Default num_samples to None (#227)

* remove legacy UI cmds from `make build` (#229)

* Remove dev print statements from IC runner (#231)

* Remove dev print statements

* Remove logger

* Fix incomplete wheel build (#232)

* Fix incomplete wheel build

* Add license string

* Add environment checks

* Address review comments

* Catch generic Exception

* signal test

---------

Co-authored-by: Rahul Tuli <[email protected]>
Co-authored-by: dhuangnm <[email protected]>
Co-authored-by: dhuang <[email protected]>
Co-authored-by: Benjamin Fineran <[email protected]>
Co-authored-by: corey-nm <[email protected]>
Co-authored-by: Danny Guinther <[email protected]>
Co-authored-by: Benjamin <[email protected]>
8 people committed Jul 7, 2023
1 parent fb685ec commit cd5a938
Showing 7 changed files with 307 additions and 0 deletions.
1 change: 1 addition & 0 deletions setup.py
@@ -81,6 +81,7 @@ def _setup_entry_points() -> Dict:
        "console_scripts": [
            "sparsify.run=sparsify.cli.run:main",
            "sparsify.login=sparsify.login:main",
            "sparsify.check_environment=sparsify.check_environment.main:main",
        ]
    }
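
The new `console_scripts` line follows the `command=module:attribute` form. As a rough illustration of how such a spec maps to a callable, the sketch below parses the string and imports the target with `importlib`. The helper names `parse_entry_point` and `load_entry_point` are hypothetical and not part of sparsify or setuptools, and the loader is demonstrated on a standard-library target (`json:dumps`) so it runs without sparsify installed.

```python
import importlib
from typing import Callable, Tuple


def parse_entry_point(spec: str) -> Tuple[str, str, str]:
    """Split 'command=package.module:attr' into (command, module, attr)."""
    command, _, target = spec.partition("=")
    module_name, _, attr = target.partition(":")
    return command.strip(), module_name.strip(), attr.strip()


def load_entry_point(spec: str) -> Callable:
    """Import the module named in the spec and return the referenced object."""
    _, module_name, attr = parse_entry_point(spec)
    module = importlib.import_module(module_name)
    return getattr(module, attr)


if __name__ == "__main__":
    spec = "sparsify.check_environment=sparsify.check_environment.main:main"
    # Parsing alone needs no import of the target package
    print(parse_entry_point(spec))

    # Loading is shown on a standard-library target (an assumption made
    # purely so the example is runnable anywhere)
    dumps = load_entry_point("demo.dumps=json:dumps")
    print(dumps({"ok": True}))
```

With the package installed, running `sparsify.check_environment` from a shell should therefore amount to calling `main()` in `sparsify/check_environment/main.py`.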

20 changes: 20 additions & 0 deletions src/sparsify/check_environment/__init__.py
@@ -0,0 +1,20 @@
# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# flake8: noqa
# isort: skip_file

from .gpu_device import *
from .ort_health import *
from .pathway_checks import *
39 changes: 39 additions & 0 deletions src/sparsify/check_environment/gpu_device.py
@@ -0,0 +1,39 @@
# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import logging

import torch


_LOGGER = logging.getLogger(__name__)

__all__ = ["check_for_gpu"]


def check_for_gpu():
    """
    Check for GPU and warn if not found
    """
    _LOGGER.warning("Checking for GPU...")
    if not torch.cuda.is_available():
        # Logger.warn is a deprecated alias, use Logger.warning
        _LOGGER.warning(
            "*************************** NO GPU DETECTED ***************************\n"
            "No GPU(s) detected on machine. The use of a GPU for training-aware "
            "sparsification, sparse-transfer learning, and one-shot sparsification is "
            "highly recommended.\n"
            "************************************************************************"
        )
    else:
        _LOGGER.warning("GPU check completed successfully")
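
A variant of this check that also works when `torch` is not installed can be sketched as below. This is an illustration under assumptions, not the code from this commit: the names `gpu_available` and `check_for_gpu_sketch` are hypothetical, the function returns a bool so callers can branch on the result, and routine progress is logged at `info` rather than `warning`.

```python
import importlib.util
import logging

_LOGGER = logging.getLogger("gpu_check_sketch")


def gpu_available() -> bool:
    """Return True only if torch is importable and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        # torch is not installed, so no GPU can be detected through it
        return False
    import torch  # imported lazily so this module loads without torch

    return bool(torch.cuda.is_available())


def check_for_gpu_sketch() -> bool:
    """Log the outcome of the GPU check and return it (hypothetical helper)."""
    found = gpu_available()
    if found:
        _LOGGER.info("GPU check completed successfully")
    else:
        _LOGGER.warning(
            "No GPU(s) detected on machine. A GPU is highly recommended "
            "for sparsification workloads."
        )
    return found


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print("GPU available:", check_for_gpu_sketch())
```

Logging progress at `info` means the messages appear only once the application configures logging, which may be the reason the commit uses `warning` throughout; that reading is a guess.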
26 changes: 26 additions & 0 deletions src/sparsify/check_environment/main.py
@@ -0,0 +1,26 @@
# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from sparsify.check_environment import check_for_gpu, check_ort_health


def main():
    """
    Check the environment for compatibility with the sparsifyml package
    """
    check_for_gpu()
    check_ort_health()


if __name__ == "__main__":
    main()
179 changes: 179 additions & 0 deletions src/sparsify/check_environment/ort_health.py
@@ -0,0 +1,179 @@
# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import logging
import signal
from typing import List, Optional

import numpy
import torch
from onnx import TensorProto, helper

import onnxruntime as ort
from deepsparse.utils import generate_random_inputs, get_input_names
from sparsifyml.one_shot.utils import run_onnx_model


__all__ = ["check_ort_health"]

_LOGGER = logging.getLogger(__name__)


CUDA_HELP_STRING = (
"If you would like to run on GPU, please ensure that your CUDA and cuDNN "
"versions are compatible with the installed version of onnxruntime-gpu: "
"https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements" # noqa: E501
)


def _create_simple_conv_graph(
    image_pixels_side: int = 32,
    channel_count: int = 3,
    batch_size: int = 1,
    kernel_size: int = 3,
    kernel_count: int = 10,
):
    # Output side length of an unpadded, stride-1 convolution
    feature_size_side = image_pixels_side - kernel_size + 1

    # The inputs and outputs
    X = helper.make_tensor_value_info(
        "X",
        TensorProto.FLOAT,
        [batch_size, channel_count, image_pixels_side, image_pixels_side],
    )
    Y = helper.make_tensor_value_info(
        "Y",
        TensorProto.FLOAT,
        [batch_size, kernel_count, feature_size_side, feature_size_side],
    )

    # Create nodes for the Conv and Relu operations
    conv_node = helper.make_node(
        "Conv",
        inputs=["X", "conv_weight", "conv_bias"],
        outputs=["conv_result"],
        kernel_shape=[kernel_size, kernel_size],
    )

    relu_node1 = helper.make_node(
        "Relu",
        inputs=["conv_result"],
        outputs=["Y"],
    )

    # Define the weight and bias for the Conv layer
    conv_weight = helper.make_tensor(
        "conv_weight",
        TensorProto.FLOAT,
        [kernel_count, channel_count, kernel_size, kernel_size],
        numpy.random.randn(kernel_count, channel_count, kernel_size, kernel_size),
    )
    conv_bias = helper.make_tensor(
        "conv_bias", TensorProto.FLOAT, [kernel_count], numpy.random.randn(kernel_count)
    )

    # Create the graph (model)
    graph_def = helper.make_graph(
        [conv_node, relu_node1],
        "SimpleCNN",
        inputs=[X],
        outputs=[Y],
        initializer=[conv_weight, conv_bias],
    )

    return helper.make_model(graph_def, producer_name="onnx-example")


def check_ort_health(providers: Optional[List[str]] = None):
    """
    Checks that a simple test model can be executed with the given providers

    :param providers: list of providers to use for ORT execution. Defaults to
        CUDAExecutionProvider if torch detects a CUDA device, otherwise
        CPUExecutionProvider
    """
    _LOGGER.warning("Checking onnxruntime-gpu environment health...")

    model = _create_simple_conv_graph()

    # Only choose defaults when the caller did not pass providers explicitly
    if providers is None:
        providers = (
            ["CUDAExecutionProvider"]
            if torch.cuda.is_available()
            else ["CPUExecutionProvider"]
        )

    # If cuda device found by torch, ensure it's found by ORT as well
    if ort.get_device() != "GPU" and "CUDAExecutionProvider" in providers:
        raise RuntimeError(
            "CUDA enabled device detected on your machine, but is not detected by "
            "onnxruntime. If you would like to run on CPU, please set "
            "CUDA_VISIBLE_DEVICES=-1. Note that this is likely to slow down model "
            f"compression significantly. {CUDA_HELP_STRING}"
        )

    # Ensure that ORT can execute the model
    random_input = {
        input_name: input
        for input_name, input in zip(
            get_input_names(model), generate_random_inputs(model)
        )
    }

    # Define a custom exception and signal handler
    class _TerminationSignal(Exception):
        pass

    def handle_termination_signal(signum, frame):
        raise _TerminationSignal("Termination signal received")

    # Register the signal handler for SIGTERM and SIGINT signals
    signal.signal(signal.SIGTERM, handle_termination_signal)
    signal.signal(signal.SIGINT, handle_termination_signal)

    try:
        run_onnx_model(
            model=model,
            input_batch=random_input,
            providers=providers,
        )
    except _TerminationSignal as ts:
        print("Termination signal caught:", ts)
    except Exception as e:
        # If the run fails, try again with CPU only to determine whether this
        # is a CUDA environment issue
        if providers != ["CPUExecutionProvider"]:
            try:
                run_onnx_model(
                    model=model,
                    input_batch=random_input,
                    providers=["CPUExecutionProvider"],
                )
            except Exception:
                # CPU failed as well, fall through to the generic error below
                pass
            else:
                # Raised from the else branch so the except clause above
                # cannot swallow it
                raise RuntimeError(
                    "ONNXRuntime execution failed with CUDAExecutionProvider "
                    "but succeeded with CPUExecutionProvider. This is indicative "
                    "of a likely issue with the onnxruntime-gpu install. "
                    f"{CUDA_HELP_STRING}"
                ) from e

        raise RuntimeError(
            "ONNXRuntime execution failed with both CUDAExecutionProvider and "
            "CPUExecutionProvider. Ensure that onnxruntime-gpu and its dependencies "
            "are properly installed."
        ) from e

    _LOGGER.warning("onnxruntime-gpu environment check completed successfully")
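
The retry logic in `check_ort_health` (run with the requested providers, and on failure retry on CPU to tell a CUDA setup problem from a general onnxruntime problem) can be exercised in isolation. The sketch below is a stand-alone illustration, not the project's code: `diagnose_providers` and the fake runners are hypothetical stand-ins for `run_onnx_model`, and `try/except/else` is used so that the "succeeded on CPU" error is raised outside the scope of the inner `except`.

```python
from typing import Callable, List, Sequence

CPU_PROVIDER = "CPUExecutionProvider"
CUDA_PROVIDER = "CUDAExecutionProvider"


def diagnose_providers(
    run: Callable[[List[str]], None], providers: Sequence[str]
) -> None:
    """Run once with `providers`; on failure, retry on CPU to localize the fault."""
    requested = list(providers)
    try:
        run(requested)
    except Exception as primary_error:
        if requested != [CPU_PROVIDER]:
            try:
                run([CPU_PROVIDER])
            except Exception:
                pass  # CPU failed too, so report the generic failure below
            else:
                # CPU worked, so the fault is specific to the requested providers
                raise RuntimeError(
                    f"Execution failed with {requested} but succeeded with "
                    f"{CPU_PROVIDER}"
                ) from primary_error
        raise RuntimeError(
            "Execution failed with every provider tried"
        ) from primary_error


def fake_run_cuda_broken(providers: List[str]) -> None:
    """Hypothetical runner that fails only when CUDA is requested."""
    if CUDA_PROVIDER in providers:
        raise ValueError("simulated CUDA failure")


def fake_run_all_broken(providers: List[str]) -> None:
    """Hypothetical runner that always fails."""
    raise ValueError("simulated runtime failure")


if __name__ == "__main__":
    for runner in (fake_run_cuda_broken, fake_run_all_broken):
        try:
            diagnose_providers(runner, [CUDA_PROVIDER])
        except RuntimeError as err:
            print(f"{runner.__name__}: {err}")
```

Raising from the `else` branch matters here: if the "succeeded on CPU" error were raised inside the inner `try` and that `try` caught `RuntimeError`, the more specific diagnosis would be discarded and only the generic message would reach the user.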
35 changes: 35 additions & 0 deletions src/sparsify/check_environment/pathway_checks.py
@@ -0,0 +1,35 @@
# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from sparsify.check_environment import check_for_gpu, check_ort_health


__all__ = ["one_shot_checks", "auto_checks"]


def one_shot_checks():
    """
    Check environment for compatibility with one-shot sparsification
    """
    check_for_gpu()
    check_ort_health()


def auto_checks():
    """
    Check environment for compatibility with training-aware sparsification and
    sparse-transfer learning
    """
    check_for_gpu()
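
The mapping these two functions encode (one-shot runs the GPU and onnxruntime checks, while the training-aware and sparse-transfer pathways run only the GPU check) could also be written as a lookup table. The sketch below is one possible arrangement, not what the commit does: `PATHWAY_CHECKS` and `run_checks` are hypothetical names, and the two check functions are stubs that only record that they were called, so the example runs without torch or onnxruntime.

```python
from typing import Callable, Dict, List, Tuple

# Records which stub checks ran, in order (for demonstration only)
calls: List[str] = []


def check_for_gpu() -> None:
    """Stub standing in for the real GPU check."""
    calls.append("gpu")


def check_ort_health() -> None:
    """Stub standing in for the real onnxruntime health check."""
    calls.append("ort")


# Hypothetical registry: pathway name -> checks to run, in order
PATHWAY_CHECKS: Dict[str, Tuple[Callable[[], None], ...]] = {
    "one-shot": (check_for_gpu, check_ort_health),
    "sparse-transfer": (check_for_gpu,),
    "training-aware": (check_for_gpu,),
}


def run_checks(pathway: str) -> None:
    """Run every check registered for `pathway`, failing fast on unknown names."""
    if pathway not in PATHWAY_CHECKS:
        raise ValueError(f"unknown pathway: {pathway!r}")
    for check in PATHWAY_CHECKS[pathway]:
        check()


if __name__ == "__main__":
    run_checks("one-shot")
    print(calls)
```

A table like this keeps the per-pathway requirements in one place, at the cost of an extra level of indirection compared with the two explicit functions in the diff.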
7 changes: 7 additions & 0 deletions src/sparsify/cli/run.py
@@ -18,6 +18,7 @@

import click
from sparsezoo import Model
from sparsify.check_environment import auto_checks, one_shot_checks
from sparsify.cli import opts


@@ -45,6 +46,8 @@ def one_shot(**kwargs):
    # raises exception if sparsifyml not installed
    from sparsify.one_shot import one_shot

    one_shot_checks()

    recipe_args = kwargs.get("recipe_args")
    if isinstance(recipe_args, str):
        recipe_args = json.loads(recipe_args)
@@ -75,6 +78,8 @@ def sparse_transfer(**kwargs):
    """
    from sparsify import auto

    auto_checks()

    # recipe arg should be a sparse transfer recipe
    auto.main(_parse_run_args_to_auto(sparse_transfer=True, **kwargs))

@@ -92,6 +97,8 @@ def training_aware(**kwargs):
    """
    from sparsify import auto

    auto_checks()

    # recipe arg should be a training aware recipe
    auto.main(_parse_run_args_to_auto(sparse_transfer=False, **kwargs))
