Add environment checks (#233)

* Clear existing sparsify source * Add back version file * Port of sparsify.auto from private repository (#124) * remove javascript deps * Initial port of autosparse to sparsify.auto * Initial port autosparse -> sparsify.auto * Added tests and fixes * Add back yarn * Add github workflow for test checks * Update workflows Co-authored-by: Benjamin Fineran <[email protected]> workflow * Add GHA tests for base, package, and auto (#133) * `sparsify.package` base CLI (#125) * bump up main to 1.2.0 (#128) Co-authored-by: dhuang <[email protected]> * Adds the following: * Setup directory Structure * `from sparsify import package` importable + callable function * A constants file with supported tasks, criterions, and deployment scenarios (Should probably converted to `Enums` or something better than `python lists`) * Add `click` as a required dependency * Additional CLI helpers for updated acceptance criterion * `sparsify.package` cli utility * setup test directory * Add tests for CLI * Setup Entrypoints * Remove old docstring * - Moved utils outside `package` - Renamed package_ to package - Add more tests - Update Usage command - Rebased on `sparsify.alpha` - Add typing - Add version info to cli Apply review comments from @corey-nm - Remove `cli_helpers.py` and rely on `click` * Remove unintended change added while resolving merge conflicts * Style * Add dataset registry update cli to use dataset registry * Fix failing tests * Centralize task registry (#132) * Centralize task name and alias handeling * Propagate TaskName updates to auto tasks * Fix click parse args call * Fix failing tests after TASK name updates * Prevent auto install of integrations on sparsify import (#134) * * Change `NO_VNNI` --> `DEFAULT` * Refactor CLI arg parsing cause originally `System.exit()` was thrown on invoking help * Rename `scenario` --> `target` * Remove single character shortcuts, as suggested by @bfineran * Default directory to `None` for now, logic to choose an appropriate name will be 
added to diff #130 * Added show defaults at the top level `click.command()` decorator * Added a `DEFAULT_OPTIMIZNG_METRIC` * Added a `DEFAULT_DEPLOYMENT_SCENARIO` * Changed `optimizing_metric` help message * Updated Tests * - Style - Example Usage Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuang <[email protected]> Co-authored-by: Konstantin Gulin <[email protected]> * Add DDP support (#126) * `sparsify.package` backend-call (#130) * bump up main to 1.2.0 (#128) Co-authored-by: dhuang <[email protected]> * Adds the following: * Setup directory Structure * `from sparsify import package` importable + callable function * A constants file with supported tasks, criterions, and deployment scenarios (Should probably converted to `Enums` or something better than `python lists`) * Add `click` as a required dependency * Additional CLI helpers for updated acceptance criterion * `sparsify.package` cli utility * setup test directory * Add tests for CLI * Setup Entrypoints * Remove old docstring * - Moved utils outside `package` - Renamed package_ to package - Add more tests - Update Usage command - Rebased on `sparsify.alpha` - Add typing - Add version info to cli Apply review comments from @corey-nm - Remove `cli_helpers.py` and rely on `click` * Remove unintended change added while resolving merge conflicts * Style * Add dataset registry update cli to use dataset registry * Fix failing tests * Centralize task registry (#132) * Centralize task name and alias handeling * Propagate TaskName updates to auto tasks * Fix click parse args call * Fix failing tests after TASK name updates * Prevent auto install of integrations on sparsify import (#134) * * Change `NO_VNNI` --> `DEFAULT` * Refactor CLI arg parsing cause originally `System.exit()` was thrown on invoking help * Rename `scenario` --> `target` * Remove single character shortcuts, as suggested by @bfineran * Default directory to `None` for now, logic to choose an appropriate name will be added to diff 
#130 * Added show defaults at the top level `click.command()` decorator * Added a `DEFAULT_OPTIMIZNG_METRIC` * Added a `DEFAULT_DEPLOYMENT_SCENARIO` * Changed `optimizing_metric` help message * Updated Tests * - Style - Example Usage * Add proper commands + gha workflows * Refactor package function to make a call to the backend service * Add template function for output Add importable Backend Base url Remove unnecessary args from package function Add end to end integration test * Updated tests, addressed comments * Base Cli + importable function * Style * Remove files added in faulty rebase * Changed base url, styling Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuang <[email protected]> Co-authored-by: Konstantin Gulin <[email protected]> Co-authored-by: Konstantin <[email protected]> * `sparsify.package` updates (#141) * Update output to also print model metrics Update `--optimizing_metrics` to take in a string containing comma separated metrics for example `--optimizing_metric "compression, accuracy"`(added a `_csv_callback` function for that) Update Usage instructions accordingly Add a log statement to package function Added more tests * Address comments * Rename `normalized_metric` --> `metric_` to avoid potential confusion * Add a getter for TASK_REGISTRY and DATASET_REGISTRY (#142) * Add a getter for TASK_REGISTRY and DATASET_REGISTRY * typing * fix potential bug * Add None to test * Updated tests according to comments from @bfineran * Make test cleaner based on feedback from @corey-nm * Remove config creator (#136) * [Auto] Add Tensorboard Support (#147) * Support for Hyperparameter Tuning (#145) * force convert yolov5 metric keys to float (#151) * [Auto] Update function name and description to be more generic (#149) * rename and flip logic for stopping_condition flag (#152) * [Auto] Support for multi-stage tuning (#157) * Support for updated tuning flow (#159) * Support tuning of CLI args (#158) * Support multiple optimizing metrics (#160) 
* Log important updates with an easily visible format (#161) * Update the user output for `sparsify.package` (#166) * Add Dockerfile Download deployment directory, and Update instructions for user Update tests * Add volume mount to docker command * [Auto] Update interface for sparsifyml (#173) * Fix: remove debug line * Update sparsify.auto interface for sparsifyml * rename interface -> schemas * Sparsify.alpha.auto (#179) * Update: sparsify.version to match with main * Delete: sparsify.package * Empty commit * Add: stitch functions * Update: Env var name Update: stitch functions slightly * Add: Sparsifyml to dependencies in setup.py * Style: Fixes * Some more fixers * OLD IC integration working * Run Integration Tests only when sparsifyml installed * Fix yolov5 integration * Propagate student args to teacher * Update teacher kwargs only when key not present for safety * Updated: integration_test * Updated: num trials to 2 * Fix: failing GHA * make sparsifyml optional implement own strtobool function * [Create] alpha implementation (#181) * [Create] alpha implementation * Apply suggestions from code review * Apply suggestions from code review Co-authored-by: corey-nm <[email protected]> --------- Co-authored-by: corey-nm <[email protected]> * Adding one shot cli (#184) * [Feature branch] standard clis (#187) * Adding skeleton clis * [CLI standardization] sparsify.run one-shot impl (#188) * [CLI standardization] sparsify.run one-shot impl * Fixing one-shot cli --------- Co-authored-by: Corey Lowman <[email protected]> * [WIP][CLI standardization] sparsify.run training-aware and spares-transfer initial impl (#189) * [CLI standardization] sparsify.run one-shot impl * [WIP][CLI standardization] sparsify.run training-aware and spares-transfer initial impl * Fixing training-aware/sparse-transfer --------- Co-authored-by: Corey Lowman <[email protected]> * Adding docstring to sparsify.run * Moving use case to top arg * Removing apply/init --------- Co-authored-by: 
Benjamin Fineran <[email protected]> * Style changes for sparsify.alpha (#194) * Update: Minimum supported Python Version to `3.7` as it's consistent with our other repos (#193) * [Add] `sparsify.login` CLI and function (#180) * Adding sparsify.login entrypoint and function * Adding docstring to exception * Adding pip install of sparsifyml * Respond to review * Adding help message at top * Adding setup python to workflow * Adding checked sparsifyml import * Apply suggestions from code review Co-authored-by: Danny Guinther <[email protected]> * check against major minor version only * add client_id and other bug fixes * Fix: `--index` --> `--index-url` * Update install command missed during rebase * * Clean up code * Remove Global variables * Update PyPi Server link * Add Logging * Move exceptions to their own file * Style fixes * Apply suggestions from code review Add: suggestion from @KSGulin Co-authored-by: Konstantin Gulin <[email protected]> * Update src/sparsify/login.py Co-authored-by: Konstantin Gulin <[email protected]> * remove comment --------- Co-authored-by: Benjamin Fineran <[email protected]> Co-authored-by: Danny Guinther <[email protected]> Co-authored-by: Benjamin <[email protected]> Co-authored-by: rahul-tuli <[email protected]> Co-authored-by: Konstantin Gulin <[email protected]> * training aware and sparse transfer run mode support (#191) * add sparsifyml dependencies to sparsify install (#195) * update task registry + generalize matching (#201) * rename performance to optim-level in legacy auto api (#199) * [sparsify.run one-shot] CLI propagation of recipe_args (#198) * Remove hardware optimization options (#200) * Remove hardware optimization options * Rename instead of remove optim_level * Add OPTIM_LEVEL back to all list * simple fixes in initial one-shot testing flow (#206) * fixes for initial E2E runs of sparse transfer and training aware (#207) * fixes for initial E2E runs of sparse transfer and training aware * quality * [Alpha] Rework 
Auto main script into Training-Aware and Sparse-Transfer script (#208) * Initial scratch work * Complete, but untested implementation * Working yolov5 * Working across all integrations * IC path fix * Require model * Remove debug adds * make API KEY an argument (#211) * Update integration and unit tests (#214) * Update integration and unit tests * Update IC base test model * Add login step to test setup (#216) * bump up version to 1.6.0 (#215) (#218) Co-authored-by: dhuang <[email protected]> (cherry picked from commit 699a476) Co-authored-by: dhuangnm <[email protected]> * [BugFixes] Fix failing tests in `sparsify.alpha` (#223) * Intermediate commit should be amended * Remove failing test as synced with @KSGulin * Explicitly pin protobuff depencies. (#225) * Default num_samples to None (#227) * remove legacy UI cmds from `make build` (#229) * Remove dev print statements from IC runner (#231) * Remove dev print statements * Remove logger * Fix incomplete wheel build (#232) * Fix incomplete wheel build * Add license string * Add environment hecks * Address review comments * Catch generic Exception * signal test --------- Co-authored-by: Rahul Tuli <[email protected]> Co-authored-by: dhuangnm <[email protected]> Co-authored-by: dhuang <[email protected]> Co-authored-by: Benjamin Fineran <[email protected]> Co-authored-by: corey-nm <[email protected]> Co-authored-by: Danny Guinther <[email protected]> Co-authored-by: Benjamin <[email protected]>
neuralmagic · Jul 7, 2023 · cd5a938
1 parent fb685ec
commit cd5a938
Showing 7 changed files with 307 additions and 0 deletions.
diff --git a/setup.py b/setup.py
@@ -81,6 +81,7 @@ def _setup_entry_points() -> Dict:
         "console_scripts": [
             "sparsify.run=sparsify.cli.run:main",
             "sparsify.login=sparsify.login:main",
+            "sparsify.check_environment=sparsify.check_environment.main:main",
         ]
     }
 

diff --git a/src/sparsify/check_environment/__init__.py b/src/sparsify/check_environment/__init__.py
@@ -0,0 +1,20 @@
+# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# flake8: noqa
+# isort: skip_file
+
+from .gpu_device import *
+from .ort_health import *
+from .pathway_checks import *
diff --git a/src/sparsify/check_environment/gpu_device.py b/src/sparsify/check_environment/gpu_device.py
@@ -0,0 +1,39 @@
+# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import logging
+
+import torch
+
+
+_LOGGER = logging.getLogger(__name__)
+
+__all__ = ["check_for_gpu"]
+
+
+def check_for_gpu():
+    """
+    Check for GPU and warn if not found
+    """
+    _LOGGER.warning("Checking for GPU...")
+    if not torch.cuda.is_available():
+        _LOGGER.warning(
+            "*************************** NO GPU DETECTED ***************************\n"
+            "No GPU(s) detected on machine. The use of a GPU for training-aware "
+            "sparsification, sparse-transfer learning, and one-shot sparsification is "
+            "highly recommended.\n"
+            "************************************************************************"
+        )
+    else:
+        _LOGGER.warning("GPU check completed successfully")
diff --git a/src/sparsify/check_environment/main.py b/src/sparsify/check_environment/main.py
@@ -0,0 +1,26 @@
+# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from sparsify.check_environment import check_for_gpu, check_ort_health
+
+
+def main():
+    """
+    Check the environment for compatibility with the sparsifyml package
+    """
+    check_for_gpu()
+    check_ort_health()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/src/sparsify/check_environment/ort_health.py b/src/sparsify/check_environment/ort_health.py
@@ -0,0 +1,179 @@
+# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+import logging
+import signal
+from typing import List, Optional
+
+import numpy
+import torch
+from onnx import TensorProto, helper
+
+import onnxruntime as ort
+from deepsparse.utils import generate_random_inputs, get_input_names
+from sparsifyml.one_shot.utils import run_onnx_model
+
+
+__all__ = ["check_ort_health"]
+
+_LOGGER = logging.getLogger(__name__)
+
+
+CUDA_HELP_STRING = (
+    "If you would like to run on GPU, please ensure that your CUDA and cuDNN "
+    "versions are compatible with the installed version of onnxruntime-gpu: "
+    "https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements"  # noqa: E501
+)
+
+
+def _create_simple_conv_graph(
+    image_pixels_side: int = 32,
+    channel_count: int = 3,
+    batch_size: int = 1,
+    kernel_size: int = 3,
+    kernel_count: int = 10,
+):
+    feature_size_side = image_pixels_side - kernel_size + 1
+
+    # The inputs and outputs
+    X = helper.make_tensor_value_info(
+        "X",
+        TensorProto.FLOAT,
+        [batch_size, channel_count, image_pixels_side, image_pixels_side],
+    )
+    Y = helper.make_tensor_value_info(
+        "Y",
+        TensorProto.FLOAT,
+        [batch_size, kernel_count, feature_size_side, feature_size_side],
+    )
+
+    # Create nodes for the Conv and Relu operations
+    conv_node = helper.make_node(
+        "Conv",
+        inputs=["X", "conv_weight", "conv_bias"],
+        outputs=["conv_result"],
+        kernel_shape=[kernel_size, kernel_size],
+    )
+
+    relu_node1 = helper.make_node(
+        "Relu",
+        inputs=["conv_result"],
+        outputs=["Y"],
+    )
+
+    # Define the weights and bias for the Conv layer
+    conv_weight = helper.make_tensor(
+        "conv_weight",
+        TensorProto.FLOAT,
+        [kernel_count, channel_count, kernel_size, kernel_size],
+        numpy.random.randn(kernel_count, channel_count, kernel_size, kernel_size),
+    )
+    conv_bias = helper.make_tensor(
+        "conv_bias", TensorProto.FLOAT, [kernel_count], numpy.random.randn(kernel_count)
+    )
+
+    # Create the graph (model)
+
+    graph_def = helper.make_graph(
+        [conv_node, relu_node1],
+        "SimpleCNN",
+        inputs=[X],
+        outputs=[Y],
+        initializer=[conv_weight, conv_bias],
+    )
+
+    return helper.make_model(graph_def, producer_name="onnx-example")
+
+
+def check_ort_health(providers: Optional[List[str]] = None):
+    """
+    Checks that a small test model can be executed with the set providers
+
+    :param providers: list of providers to use for ORT execution. Defaults to
+        CUDAExecutionProvider if a GPU is detected by torch, otherwise CPU
+    """
+    _LOGGER.warning("Checking onnxruntime-gpu environment health...")
+
+    model = _create_simple_conv_graph()
+
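+    if providers is None:
+        # Default to CUDA when torch detects a GPU, otherwise use CPU
+        use_cuda = torch.cuda.is_available()
+        provider = "CUDAExecutionProvider" if use_cuda else "CPUExecutionProvider"
+        providers = [provider]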
+
+    # If cuda device found by torch, ensure it's found by ORT as well
+    if ort.get_device() != "GPU" and "CUDAExecutionProvider" in providers:
+        raise RuntimeError(
+            "CUDA enabled device detected on your machine, but is not detected by "
+            "onnxruntime. If you would like to run on CPU, please set "
+            "CUDA_VISIBLE_DEVICES=-1. Note that this is likely to slow down model "
+            f"compression significantly. {CUDA_HELP_STRING}"
+        )
+
+    # Ensure that ORT can execute the model
+    random_input = {
+        input_name: input
+        for input_name, input in zip(
+            get_input_names(model), generate_random_inputs(model)
+        )
+    }
+
+    # Define a custom exception and signal handler
+    class _TerminationSignal(Exception):
+        pass
+
+    def handle_termination_signal(signum, frame):
+        raise _TerminationSignal("Termination signal received")
+
+    # Register the signal handler for SIGTERM and SIGINT signals
+    signal.signal(signal.SIGTERM, handle_termination_signal)
+    signal.signal(signal.SIGINT, handle_termination_signal)
+
+    try:
+        run_onnx_model(
+            model=model,
+            input_batch=random_input,
+            providers=providers,
+        )
+    except _TerminationSignal as ts:
+        print("Termination signal caught:", ts)
+    except Exception as e:
+        # If run fails, try again with CPU only to ensure this is a CUDA environment
+        # issue
+        if providers != ["CPUExecutionProvider"]:
+            try:
+                run_onnx_model(
+                    model=model,
+                    input_batch=random_input,
+                    providers=["CPUExecutionProvider"],
+                )
+            except Exception:
+                # CPU run failed as well; fall through to the generic error below
+                pass
+            else:
+                raise RuntimeError(
+                    "ONNXRuntime execution failed with CUDAExecutionProvider "
+                    "but succeeded with CPUExecutionProvider. This is indicative "
+                    f"of a likely onnxruntime-gpu install issue. {CUDA_HELP_STRING}"
+                ) from e
+
+        raise RuntimeError(
+            "ONNXRuntime execution failed with both CUDAExecutionProvider and "
+            "CPUExecutionProvider. Ensure that onnxruntime-gpu and its dependencies "
+            "are properly installed."
+        ) from e
+
+    _LOGGER.warning("onnxruntime-gpu environment check completed successfully")
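
Review note: the error handling in `check_ort_health` aims to tell a CUDA-specific failure apart from a general onnxruntime failure by retrying on CPU. A minimal sketch of that decision logic follows, assuming a hypothetical `run` callable that stands in for the model execution call and raises on failure. The names `diagnose`, `run`, and the returned labels are illustrative only and are not part of this commit.

```python
from typing import Callable, List

CPU = "CPUExecutionProvider"
CUDA = "CUDAExecutionProvider"


def diagnose(run: Callable[[List[str]], None], providers: List[str]) -> str:
    """Return "ok", "gpu-only-failure", or "general-failure".

    `run` is a hypothetical stand-in for a call that executes a model with
    the given providers and raises an exception when execution fails.
    """
    try:
        run(providers)
        return "ok"
    except Exception:
        if providers == [CPU]:
            # Already on CPU, so there is nothing left to fall back to
            return "general-failure"

    # The requested providers failed; retry on CPU to isolate the cause
    try:
        run([CPU])
    except Exception:
        return "general-failure"
    return "gpu-only-failure"


def broken_cuda(providers: List[str]) -> None:
    # Simulates an environment where only the CUDA provider is unusable
    if CUDA in providers:
        raise RuntimeError("CUDA libraries not found")


print(diagnose(broken_cuda, [CUDA]))  # gpu-only-failure
```

Returning a label instead of raising inside the retry block keeps the "succeeded on CPU" outcome from being swallowed by an enclosing `except` clause.
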
diff --git a/src/sparsify/check_environment/pathway_checks.py b/src/sparsify/check_environment/pathway_checks.py
@@ -0,0 +1,35 @@
+# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from sparsify.check_environment import check_for_gpu, check_ort_health
+
+
+__all__ = ["one_shot_checks", "auto_checks"]
+
+
+def one_shot_checks():
+    """
+    Check environment for compatibility with one-shot sparsification
+    """
+    check_for_gpu()
+    check_ort_health()
+
+
+def auto_checks():
+    """
+    Check environment for compatibility with training-aware sparsification and
+    sparse-transfer learning
+    """
+    check_for_gpu()
diff --git a/src/sparsify/cli/run.py b/src/sparsify/cli/run.py
@@ -18,6 +18,7 @@
 
 import click
 from sparsezoo import Model
+from sparsify.check_environment import auto_checks, one_shot_checks
 from sparsify.cli import opts
 
 
@@ -45,6 +46,8 @@ def one_shot(**kwargs):
     # raises exception if sparsifyml not installed
     from sparsify.one_shot import one_shot
 
+    one_shot_checks()
+
     recipe_args = kwargs.get("recipe_args")
     if isinstance(recipe_args, str):
         recipe_args = json.loads(recipe_args)
@@ -75,6 +78,8 @@ def sparse_transfer(**kwargs):
     """
     from sparsify import auto
 
+    auto_checks()
+
     # recipe arg should be a sparse transfer recipe
     auto.main(_parse_run_args_to_auto(sparse_transfer=True, **kwargs))
 
@@ -92,6 +97,8 @@ def training_aware(**kwargs):
     """
     from sparsify import auto
 
+    auto_checks()
+
     # recipe arg should be a training aware recipe
     auto.main(_parse_run_args_to_auto(sparse_transfer=False, **kwargs))