diff --git a/README.md b/README.md
index 5faab566..f1b4bb60 100644
--- a/README.md
+++ b/README.md
@@ -81,7 +81,7 @@ To empower you in compressing models, Sparsify is made up of two components: the
 The Sparsify Cloud is a web application that allows you to create and manage Sparsify Experiments, explore hyperparameters, predict performance, and compare results across both Experiments and deployment scenarios.
 The Sparsify CLI/API is a Python package that allows you to run Sparsify Experiments locally, sync with the Sparsify Cloud, and integrate into your own workflows.
 
-To get started immediately, [create an account](https://account.neuralmagic.com/signup) and then check out the [Installation](https://github.com/neuralmagic/sparsify/blob/main/README.md#installation) and [Quick Start](https://github.com/neuralmagic/sparsify/blob/main/README.md#quick-start) sections of this README.
+To get started immediately, [create an account](https://account.neuralmagic.com/signup) and then check out the [Installation](#installation) and [Quick Start](#quick-start) sections of this README.
 
 With all of that setup, sparsifying your models is as easy as:
 ```bash
@@ -131,7 +131,7 @@ An account is required to manage your Experiments and API keys.
 Visit the [Neural Magic's Web App Platform](https://account.neuralmagic.com/signup) and create an account by entering your email, name, and a unique password.
 If you already have a Neural Magic Account, [sign in](https://account.neuralmagic.com/signin) with your email.
 
-For more details, see the [Sparsify Cloud User Guide](https://github.com/neuralmagic/sparsify/docs/cloud-user-guide.md).
+For more details, see the [Sparsify Cloud User Guide](https://github.com/neuralmagic/sparsify/blob/main/docs/cloud-user-guide.md).
 
 ### Install Sparsify
 
@@ -188,7 +188,7 @@ Where the values for each of the arguments follow these general rules:
 | **++** | **+++++** | **+++** |
 
 One-Shot Experiments are the quickest way to create a faster and smaller version of your model.
-The algorithms are applied to the model post training utilizing a calibration dataset, so they result in no further training time and much faster sparsification times compared with Training-Aware Experiments.
+The algorithms are applied to the model post-training utilizing a calibration dataset, so they result in no further training time and much faster sparsification times compared with Training-Aware Experiments.
 
 Generally, One-Shot Experiments result in a 3-5x speedup with minimal accuracy loss.
 They are ideal for when you want to quickly sparsify your model and don't have a lot of time to spend on the sparsification process.
@@ -251,17 +251,56 @@ NLP Example:
 sparsify.run training-aware --use-case text_classification --model bert-base --data sst2 --optim-level 0.5
 ```
 
-### Compare the Results
+### Compare the Experiment results
 
-Once you have run your Experiment, you can compare the results printed out to the console.
+Once you have run your Experiment, you can compare the results printed to the console using the `deepsparse.benchmark` command.
 In the near future, you will be able to compare the results in the Cloud, measure other scenarios, and compare the results to other Experiments.
+
+To compare the results of your Experiment with the original dense baseline model, you can use the `deepsparse.benchmark` command with your original model and the new optimized model on your deployment hardware.
+Models that have been optimized using Sparsify will generally run performantly on DeepSparse, Neural Magic's sparsity-aware CPU inference runtime.
+
+
+For more information on benchmarking, see the [DeepSparse Benchmarking User Guide](https://github.com/neuralmagic/deepsparse/blob/main/docs/user-guide/deepsparse-benchmarking.md).
+
+Here is an example of a `deepsparse.benchmark` command:
+
+```bash
+deepsparse.benchmark zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none --scenario sync
+```
 
 The results will look something like this:
 ```bash
-Sparsify Results:
-TODO
+2023-06-30 15:20:41 deepsparse.benchmark.benchmark_model INFO Thread pinning to cores enabled
+downloading...: 100%|████████████████████████| 105M/105M [00:18<00:00, 5.81MB/s]
+DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20230629 COMMUNITY | (fc8b788a) (release) (optimized) (system=avx512, binary=avx512)
+[7ffba5a84700 >WARN< operator() ./src/include/wand/utility/warnings.hpp:14] Generating emulated code for quantized (INT8) operations since no VNNI instructions were detected. Set NM_FAST_VNNI_EMULATION=1 to increase performance at the expense of accuracy.
+2023-06-30 15:21:13 deepsparse.benchmark.benchmark_model INFO deepsparse.engine.Engine:
+	onnx_file_path: /home/rahul/.cache/sparsezoo/neuralmagic/obert-base-sst2_wikipedia_bookcorpus-pruned90_quantized/model.onnx
+	batch_size: 1
+	num_cores: 10
+	num_streams: 1
+	scheduler: Scheduler.default
+	fraction_of_supported_ops: 0.9981
+	cpu_avx_type: avx512
+	cpu_vnni: False
+2023-06-30 15:21:13 deepsparse.utils.onnx INFO Generating input 'input_ids', type = int64, shape = [1, 128]
+2023-06-30 15:21:13 deepsparse.utils.onnx INFO Generating input 'attention_mask', type = int64, shape = [1, 128]
+2023-06-30 15:21:13 deepsparse.utils.onnx INFO Generating input 'token_type_ids', type = int64, shape = [1, 128]
+2023-06-30 15:21:13 deepsparse.benchmark.benchmark_model INFO Starting 'singlestream' performance measurements for 10 seconds
+Original Model Path: zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none
+Batch Size: 1
+Scenario: sync
+Throughput (items/sec): 134.5611
+Latency Mean (ms/batch): 7.4217
+Latency Median (ms/batch): 7.4245
+Latency Std (ms/batch): 0.0264
+Iterations: 1346
 ```
+*Note: performance improvement is not guaranteed across all runtimes and hardware types.*
+
+
 ### Package for Deployment
 
 Landing soon!
diff --git a/docs/cli-api-guide.md b/docs/cli-api-guide.md
index fa9bb319..5ba36c87 100644
--- a/docs/cli-api-guide.md
+++ b/docs/cli-api-guide.md
@@ -14,5 +14,186 @@ See the License for the
 specific language governing permissions and
 limitations under the License.
 -->
+
 # Sparsify CLI/API Guide
+
+The Sparsify CLI/API is a Python package that allows you to run Sparsify Experiments locally, sync with the Sparsify Cloud, and integrate into your own workflows.
+
+## Install Sparsify
+
+First, you'll need to install Sparsify on your training hardware.
+To do this, run the following command:
+
+```bash
+pip install sparsify
+```
+
+For more details and system/hardware requirements, see the [Installation](https://github.com/neuralmagic/sparsify#installation) section.
+
+## Login to Sparsify
+
+With Sparsify installed on your training hardware, you'll need to authorize the local CLI to access your account.
+This is done by running the `sparsify.login` command and providing your API key.
+Locate your API key on the home page of the [Sparsify Cloud](https://apps.neuralmagic.com/sparsify) under the **'Get set up'** modal.
+Once you have located this, copy the command or the API key itself and run the following command:
+
+```bash
+sparsify.login API_KEY
+```
+
+The `sparsify.login API_KEY` command syncs your local training environment with the Sparsify Cloud in order to keep track of your Experiments. Once you run it, you should see a confirmation via the console that you are logged into Sparsify. To log out of Sparsify, use the `exit` command.
+
+If you encounter any issues with your API key, reach out to the team via the [nm-sparsify Slack Channel](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-1xkdlzwv9-2rvS6yQcCs7VDNUcWxctnw), via [email](mailto:rob@neuralmagic.com), or via [GitHub Issues](https://github.com/neuralmagic/sparsify/issues).
+
+
+## Run an Experiment
+
+Experiments are the core of sparsifying a model.
+They apply sparsification algorithms to a model and dataset in One-Shot, Training-Aware, or Sparse-Transfer mode.
+
+All Experiments are run locally on your training hardware and can be synced with the cloud for further analysis and comparison.
+
+To run an Experiment, you can use either the CLI or the API, depending on your use case.
+The Sparsify Cloud provides a UI for exploring hyperparameters, predicting performance, and generating the desired CLI/API command.
+
+The general command for running an Experiment is:
+
+```bash
+sparsify.run EXPERIMENT_TYPE --use-case USE_CASE --model MODEL --data DATA --optim-level OPTIM_LEVEL
+```
+
+Where the values for each of the arguments follow these general rules:
+- EXPERIMENT_TYPE: one of `one-shot`, `training-aware`, or `sparse-transfer`.
+
+- USE_CASE: the use case you're solving for, such as `image-classification`, `object-detection`, `text-classification`, a custom use case, etc. A full list of supported use cases for each Experiment type can be found [here](https://github.com/neuralmagic/sparsify/blob/main/docs/use-cases-guide.md).
+
+- MODEL: the model you want to sparsify, which can be a model name such as `resnet50`, a stub from the [SparseZoo](https://sparsezoo.neuralmagic.com), or a path to a local model. For One-Shot, the model currently must be in ONNX format. For Training-Aware and Sparse-Transfer, the model must be in a PyTorch format. More details on model formats can be found [here](https://github.com/neuralmagic/sparsify/blob/main/docs/models-guide.md).
+
+- DATA: the dataset you want to use to sparsify the model. This can be a dataset name such as `imagenette` or a path to a local dataset. Currently, One-Shot only supports NPZ-formatted datasets. Training-Aware and Sparse-Transfer support PyTorch ImageFolder datasets for image classification, YOLOv5/v8 datasets for object detection and segmentation, and HuggingFace datasets for NLP/NLG. More details on dataset formats can be found [here](https://github.com/neuralmagic/sparsify/blob/main/docs/datasets-guide.md).
+
+- OPTIM_LEVEL: the desired sparsification level from 0 (none) to 1 (max). The general rule is that 0 is the baseline model, <0.3 only quantizes the model, and 0.3-1.0 increases the sparsity of the model and applies quantization. More details on sparsification levels can be found [here](https://github.com/neuralmagic/sparsify/blob/main/docs/optim-levels-guide.md). A hedged sketch of sweeping several optim levels follows this list.
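+
+Since `--optim-level` is the main knob you will tune, it can be useful to sweep a few values and compare the resulting models. The loop below is a hedged sketch rather than an official workflow: it reuses the documented One-Shot example arguments from the next section, and the log filenames are arbitrary placeholders.
+
+```bash
+# Hypothetical sweep over optim levels; each run's console output is saved
+# so the printed metrics can be compared afterwards.
+for level in 0.25 0.5 0.75; do
+  sparsify.run one-shot \
+    --use-case image_classification \
+    --model resnet50 \
+    --data imagenette \
+    --optim-level "$level" | tee "one-shot-$level.log"
+done
+```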
+
+
+### Experiment Type Examples
+
+#### Running One-Shot
+
+| Sparsity | Sparsification Speed | Accuracy |
+|----------|----------------------|----------|
+| **++** | **+++++** | **+++** |
+
+One-Shot Experiments are the quickest way to create a faster and smaller version of your model.
+The algorithms are applied to the model post-training utilizing a calibration dataset, so they result in no further training time and much faster sparsification times compared with Training-Aware Experiments.
+
+Generally, One-Shot Experiments result in a 3-5x speedup with minimal accuracy loss.
+They are ideal for when you want to quickly sparsify your model and don't have a lot of time to spend on the sparsification process.
+
+CV Example:
+```bash
+sparsify.run one-shot --use-case image_classification --model resnet50 --data imagenette --optim-level 0.5
+```
+
+NLP Example:
+```bash
+sparsify.run one-shot --use-case text_classification --model bert-base --data sst2 --optim-level 0.5
+```
+
+#### Running Sparse-Transfer
+
+| Sparsity | Sparsification Speed | Accuracy |
+|----------|----------------------|-----------|
+| **++++** | **++++** | **+++++** |
+
+Sparse-Transfer Experiments are the second quickest way to create a faster and smaller model for your dataset.
+Sparse foundational models are sparsified in a Training-Aware manner on a large dataset such as ImageNet.
+Then, the sparse patterns are transferred to your dataset through a fine-tuning process.
+
+Generally, Sparse-Transfer Experiments result in a 5-10x speedup with minimal accuracy loss.
+They are ideal when a sparse model already exists for your use case, and you want to quickly utilize it for your dataset.
+Note, the model argument is optional for Sparse-Transfer Experiments as Sparsify will select the best one from the SparseZoo for your use case if not supplied.
+
+CV Example:
+```bash
+sparsify.run sparse-transfer --use-case image_classification --data imagenette --optim-level 0.5
+```
+
+NLP Example:
+```bash
+sparsify.run sparse-transfer --use-case text_classification --data sst2 --optim-level 0.5
+```
+
+#### Running Training-Aware
+
+| Sparsity | Sparsification Speed | Accuracy |
+|-----------|-----------------------|-----------|
+| **+++++** | **++** | **+++++** |
+
+Training-Aware Experiments are the most accurate way to create a faster and smaller model for your dataset.
+The algorithms are applied to the model during training, so they offer the best possible recovery of accuracy.
+However, they do require additional training time and hyperparameter tuning to achieve the best results.
+
+Generally, Training-Aware Experiments result in a 6-12x speedup with minimal accuracy loss.
+They are ideal when you have the time to train a model, have a custom model, or want to achieve the best possible accuracy.
+Note, the model argument is optional for Training-Aware Experiments as Sparsify will select the best one from the SparseZoo for your use case if not supplied.
+
+CV Example:
+```bash
+sparsify.run training-aware --use-case image_classification --model resnet50 --data imagenette --optim-level 0.5
+```
+
+NLP Example:
+```bash
+sparsify.run training-aware --use-case text_classification --model bert-base --data sst2 --optim-level 0.5
+```
+
+## Advanced CLI/API Usage
+
+Landing soon!
+
+
+## Compare the Experiment results
+
+Once you have run your Experiment, you can compare the results printed to the console using the `deepsparse.benchmark` command.
+In the near future, you will be able to compare the results in the Cloud, measure other scenarios, and compare the results to other Experiments.
+
+
+To compare the results of your Experiment with the original dense baseline model, you can use the `deepsparse.benchmark` command with your original model and the new optimized model on your deployment hardware. Models that have been optimized using Sparsify will generally run performantly on DeepSparse, Neural Magic's sparsity-aware CPU inference runtime.
+
+
+For more information on benchmarking, see the [DeepSparse Benchmarking User Guide](https://github.com/neuralmagic/deepsparse/blob/main/docs/user-guide/deepsparse-benchmarking.md).
+
+Here is an example of a `deepsparse.benchmark` command:
+
+```bash
+deepsparse.benchmark zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none --scenario sync
+```
+
+The results will look something like this:
+```bash
+2023-06-30 15:20:41 deepsparse.benchmark.benchmark_model INFO Thread pinning to cores enabled
+downloading...: 100%|████████████████████████| 105M/105M [00:18<00:00, 5.81MB/s]
+DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20230629 COMMUNITY | (fc8b788a) (release) (optimized) (system=avx512, binary=avx512)
+[7ffba5a84700 >WARN< operator() ./src/include/wand/utility/warnings.hpp:14] Generating emulated code for quantized (INT8) operations since no VNNI instructions were detected. Set NM_FAST_VNNI_EMULATION=1 to increase performance at the expense of accuracy.
+2023-06-30 15:21:13 deepsparse.benchmark.benchmark_model INFO deepsparse.engine.Engine:
+	onnx_file_path: /home/rahul/.cache/sparsezoo/neuralmagic/obert-base-sst2_wikipedia_bookcorpus-pruned90_quantized/model.onnx
+	batch_size: 1
+	num_cores: 10
+	num_streams: 1
+	scheduler: Scheduler.default
+	fraction_of_supported_ops: 0.9981
+	cpu_avx_type: avx512
+	cpu_vnni: False
+2023-06-30 15:21:13 deepsparse.utils.onnx INFO Generating input 'input_ids', type = int64, shape = [1, 128]
+2023-06-30 15:21:13 deepsparse.utils.onnx INFO Generating input 'attention_mask', type = int64, shape = [1, 128]
+2023-06-30 15:21:13 deepsparse.utils.onnx INFO Generating input 'token_type_ids', type = int64, shape = [1, 128]
+2023-06-30 15:21:13 deepsparse.benchmark.benchmark_model INFO Starting 'singlestream' performance measurements for 10 seconds
+Original Model Path: zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none
+Batch Size: 1
+Scenario: sync
+Throughput (items/sec): 134.5611
+Latency Mean (ms/batch): 7.4217
+Latency Median (ms/batch): 7.4245
+Latency Std (ms/batch): 0.0264
+Iterations: 1346
+```
+
+*Note: performance improvement is not guaranteed across all runtimes and hardware types.*
diff --git a/docs/cloud-user-guide.md b/docs/cloud-user-guide.md
index b5b16501..8e225dd2 100644
--- a/docs/cloud-user-guide.md
+++ b/docs/cloud-user-guide.md
@@ -45,7 +45,7 @@ To do this, run the following command:
 pip install sparsify
 ```
 
-For more details and system/hardware requirements, see the [Installation](https://github.com/neuralmagic/sparsify/README.md#installation) section.
+For more details and system/hardware requirements, see the [Installation](https://github.com/neuralmagic/sparsify/blob/main/README.md#installation) section.
 
 You may copy the command from the Sparsify Cloud in step 1 and run that in your training environment to install Sparsify.
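+
+If you want to confirm the install landed in the intended environment before moving on, a standard pip check is sufficient; this is generic pip behavior, not a Sparsify-specific command:
+
+```bash
+# Verify Sparsify is installed and report its version and install location.
+pip show sparsify
+```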
@@ -64,7 +64,7 @@ Once you have located this, copy the command or the API key itself and run the f
 sparsify.login API_KEY
-````
+```
 
-You may copy the command from the Sparsify Cloud in step 2 and run that in your training environment after installing Sparsify to log in via the Sparsify CLI. For more details on the `sparsify.login` command, see the [CLI/API Guide](https://github.com/neuralmagic/sparsify/docs/cli-api-guide.md).
+You may copy the command from the Sparsify Cloud in step 2 and run that in your training environment after installing Sparsify to log in via the Sparsify CLI. For more details on the `sparsify.login` command, see the [CLI/API Guide](https://github.com/neuralmagic/sparsify/blob/main/docs/cli-api-guide.md).
 
 ## Run an Experiment
 Experiments are the core of sparsifying a model.
@@ -79,7 +79,7 @@ To run an Experiment, use the Sparsify Cloud to generate a code command to run i
 
 ![Sparsify a model](https://drive.google.com/uc?id=1FyayVSqq5YtKO_dEgt5iMNSZQNsqaQFq)
 3. Select a Use Case for your model. Note that if your use case is not present in the dropdown, fear not; the use case does not affect the optimization of the model.
-4. Choose the Experiment Type. To learn more about the Experiments, see the [Sparsify README](https://github.com/neuralmagic/sparsify/README.md#run-an-experiment).
+4. Choose the Experiment Type. To learn more about the Experiments, see the [Sparsify README](https://github.com/neuralmagic/sparsify/blob/main/README.md#run-an-experiment).
-5. Adjust the Hyperparameter Compression slider to designate whether you would like to to optimize the model for performance, accuracy, or a balance of both. Note that selecting an extreme on the slider will not completely tank the opposing metric.
+5. Adjust the Hyperparameter Compression slider to designate whether you would like to optimize the model for performance, accuracy, or a balance of both. Note that selecting an extreme on the slider will not completely tank the opposing metric.
-6. Click 'Generate Code Snippet' to view the code snipppet generated from your sparsification selections on the next modal.
+6. Click 'Generate Code Snippet' to view the code snippet generated from your sparsification selections on the next modal.
-![Generate Code Snippetl](https://drive.google.com/uc?id=14B193hHeYqLeSX8r6C5N1G8beBeXUkYE)
+![Generate Code Snippet](https://drive.google.com/uc?id=14B193hHeYqLeSX8r6C5N1G8beBeXUkYE)
@@ -90,12 +90,15 @@ To run an Experiment, use the Sparsify Cloud to generate a code command to run i
 
-![Generate Code Snippetl](https://drive.google.com/uc?id=1xWrla3ps0qeS70P1bzOIYGeIPXWgHfF_)
+![Generate Code Snippet](https://drive.google.com/uc?id=1xWrla3ps0qeS70P1bzOIYGeIPXWgHfF_)
 
-To learn more about the arguments for the `sparsify.run` command, see the [CLI/API Guide](https://github.com/neuralmagic/sparsify/docs/cli-api-guide.md).
-
+To learn more about the arguments for the `sparsify.run` command, see the [CLI/API Guide](https://github.com/neuralmagic/sparsify/blob/main/docs/cli-api-guide.md).
 
 ## Compare the Experiment results
 
+Once you have run your Experiment, you can compare the results printed to the console using the `deepsparse.benchmark` command.
+In the near future, you will be able to compare the results in the Cloud, measure other scenarios, and compare the results to other Experiments.
+
+
+To compare the results of your Experiment with the original dense baseline model, you can use the `deepsparse.benchmark` command with your original model and the new optimized model on your deployment hardware. Models that have been optimized using Sparsify will generally run performantly on DeepSparse, Neural Magic's sparsity-aware CPU inference runtime.
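+
+As a concrete shape for that comparison, you would run the benchmark twice on the same machine and scenario, once per model. This is an illustrative sketch: both paths below are placeholders for your own dense baseline and your Experiment's optimized output.
+
+```bash
+# Benchmark the dense baseline and the Sparsify-optimized model under the
+# same scenario so the printed metrics are directly comparable.
+deepsparse.benchmark path/to/dense-baseline.onnx --scenario sync
+deepsparse.benchmark path/to/optimized-model.onnx --scenario sync
+```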
@@ -108,5 +111,33 @@ deepsparse.benchmark zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/s ``` -*Note: performance improvement is not guaranteed across all runtimes and hardware types.* +The results will look something like this: +```bash +2023-06-30 15:20:41 deepsparse.benchmark.benchmark_model INFO Thread pinning to cores enabled +downloading...: 100%|████████████████████████| 105M/105M [00:18<00:00, 5.81MB/s] +DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.6.0.20230629 COMMUNITY | (fc8b788a) (release) (optimized) (system=avx512, binary=avx512) +[7ffba5a84700 >WARN< operator() ./src/include/wand/utility/warnings.hpp:14] Generating emulated code for quantized (INT8) operations since no VNNI instructions were detected. Set NM_FAST_VNNI_EMULATION=1 to increase performance at the expense of accuracy. +2023-06-30 15:21:13 deepsparse.benchmark.benchmark_model INFO deepsparse.engine.Engine: + onnx_file_path: /home/rahul/.cache/sparsezoo/neuralmagic/obert-base-sst2_wikipedia_bookcorpus-pruned90_quantized/model.onnx + batch_size: 1 + num_cores: 10 + num_streams: 1 + scheduler: Scheduler.default + fraction_of_supported_ops: 0.9981 + cpu_avx_type: avx512 + cpu_vnni: False +2023-06-30 15:21:13 deepsparse.utils.onnx INFO Generating input 'input_ids', type = int64, shape = [1, 128] +2023-06-30 15:21:13 deepsparse.utils.onnx INFO Generating input 'attention_mask', type = int64, shape = [1, 128] +2023-06-30 15:21:13 deepsparse.utils.onnx INFO Generating input 'token_type_ids', type = int64, shape = [1, 128] +2023-06-30 15:21:13 deepsparse.benchmark.benchmark_model INFO Starting 'singlestream' performance measurements for 10 seconds +Original Model Path: zoo:nlp/sentiment_analysis/obert-base/pytorch/huggingface/sst2/pruned90_quant-none +Batch Size: 1 +Scenario: sync +Throughput (items/sec): 134.5611 +Latency Mean (ms/batch): 7.4217 +Latency Median (ms/batch): 7.4245 +Latency Std (ms/batch): 0.0264 +Iterations: 1346 +``` +*Note: performance improvement is not guaranteed across all runtimes and hardware types.*
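+
+To reduce two such runs to a single speedup number, you can extract the `Throughput (items/sec)` line that appears in the output above. The snippet below is a hedged sketch: the model paths are placeholders, and it assumes the summary lines are written to stdout.
+
+```bash
+# Illustrative comparison: benchmark both models, pull the throughput from
+# each summary, and print the optimized/dense ratio.
+dense=$(deepsparse.benchmark path/to/dense-baseline.onnx --scenario sync \
+  | grep "Throughput (items/sec):" | awk '{print $3}')
+optimized=$(deepsparse.benchmark path/to/optimized-model.onnx --scenario sync \
+  | grep "Throughput (items/sec):" | awk '{print $3}')
+awk -v o="$optimized" -v d="$dense" 'BEGIN {printf "Speedup: %.2fx\n", o / d}'
+```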