
Failed on optimization step of Unet - Intel Meteor Lake Processor #1194

Open
hooroobaby opened this issue Jun 12, 2024 · 0 comments
Labels
DirectML

Describe the bug
I am using an Intel Meteor Lake processor (Core Ultra 7 165H).

Under Olive\examples\directml\stable_diffusion_xl I ran the command python stable_diffusion_xl.py --model_id stabilityai/sdxl-turbo --optimize --clean_cache, and encountered an error during the Unet optimization step.

The error is: onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : C:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlCommandRecorder.cpp(371)\onnxruntime_pybind11_state.pyd!00007FFDFEB251FE: (caller: 00007FFDFEB0BE04) Exception(1) tid(13d8c) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.
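As a side note for anyone triaging: the hex code 887A0006 in that message is a DXGI HRESULT. A minimal sketch decoding it (values taken from the DXGI error-code documentation; the lookup table here is illustrative, not exhaustive):

```python
# Map the DXGI HRESULT embedded in the DML error message to its symbolic name.
# Values are from Microsoft's DXGI_ERROR documentation.
DXGI_ERRORS = {
    0x887A0005: "DXGI_ERROR_DEVICE_REMOVED",
    0x887A0006: "DXGI_ERROR_DEVICE_HUNG",
    0x887A0007: "DXGI_ERROR_DEVICE_RESET",
}

code = 0x887A0006  # the code from the traceback above
print(DXGI_ERRORS.get(code, "unknown DXGI error"))  # DXGI_ERROR_DEVICE_HUNG
```

So the failure is the GPU device being reported as hung during the latency-evaluation run, rather than a model-conversion problem.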

Expected behavior

Also, stable_diffusion_xl.py currently appears to support only CUDA and DML. Do you have any suggestions if I want to use OpenVINOExecutionProvider as the provider? Thank you.
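For anyone experimenting in that direction before the script gains native support: OpenVINOExecutionProvider is exposed by the onnxruntime-openvino package (not onnxruntime-directml), and a session can request it through the usual providers list. A minimal, hypothetical sketch of a provider-fallback helper — the helper name and fallback policy are my own, only the provider strings are onnxruntime's:

```python
def select_providers(available):
    """Given the list from ort.get_available_providers(), return a provider
    priority list that prefers OpenVINO and falls back to CPU."""
    preferred = ["OpenVINOExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]

# Usage sketch (requires an onnxruntime build with the OpenVINO EP):
#   import onnxruntime as ort
#   providers = select_providers(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=providers)
```

Whether the Olive workflow in this example accepts that provider out of the box is a separate question; this only shows the session-level wiring.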

Other information

  • OS: Windows
  • Olive version: 0.7.0
  • ONNXRuntime package and version: onnxruntime-directml 1.18.0

Additional context


Optimizing unet
[2024-06-11 17:15:01,391] [INFO] [run.py:138:run_engine] Running workflow default_workflow
[2024-06-11 17:15:01,401] [INFO] [engine.py:986:save_olive_config] Saved Olive config to cache\default_workflow\olive_config.json
[2024-06-11 17:15:01,403] [INFO] [accelerator_creator.py:224:create_accelerators] Running workflow on accelerator specs: gpu-dml
[2024-06-11 17:15:01,403] [INFO] [engine.py:109:initialize] Using cache directory: cache\default_workflow
[2024-06-11 17:15:01,404] [INFO] [engine.py:265:run] Running Olive on accelerator: gpu-dml
[2024-06-11 17:15:01,405] [INFO] [engine.py:1085:_create_system] Creating target system ...
[2024-06-11 17:15:01,405] [INFO] [engine.py:1088:_create_system] Target system created in 0.000000 seconds
[2024-06-11 17:15:01,405] [INFO] [engine.py:1097:_create_system] Creating host system ...
[2024-06-11 17:15:01,405] [INFO] [engine.py:1100:_create_system] Host system created in 0.000000 seconds
[2024-06-11 17:15:01,449] [INFO] [engine.py:867:_run_pass] Running pass convert:OnnxConversion
[2024-06-11 17:19:36,875] [INFO] [engine.py:954:_run_pass] Pass convert:OnnxConversion finished in 275.420215 seconds
[2024-06-11 17:19:36,876] [INFO] [engine.py:867:_run_pass] Running pass optimize:OrtTransformersOptimization
fusion: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [07:23<00:00, 24.67s/it]
[2024-06-11 17:28:57,261] [INFO] [common.py:101:model_proto_to_file] Model is too large to save as a single file but 'save_as_external_data' is False. Saving tensors as external data, regardless.
[2024-06-11 17:29:13,243] [INFO] [engine.py:954:_run_pass] Pass optimize:OrtTransformersOptimization finished in 576.364000 seconds
[2024-06-11 17:29:13,244] [INFO] [engine.py:845:_run_passes] Run model evaluation for the final model...
[2024-06-11 17:35:53,708] [WARNING] [engine.py:360:run_accelerator] Failed to run Olive on gpu-dml.
Traceback (most recent call last):
  File "C:\Users\ftespo\Desktop\newTest\venv\Lib\site-packages\olive\engine\engine.py", line 339, in run_accelerator
    output_footprint = self.run_no_search(
                       ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ftespo\Desktop\newTest\venv\Lib\site-packages\olive\engine\engine.py", line 431, in run_no_search
    should_prune, signal, model_ids = self._run_passes(
                                      ^^^^^^^^^^^^^^^^^
  File "C:\Users\ftespo\Desktop\newTest\venv\Lib\site-packages\olive\engine\engine.py", line 846, in _run_passes
    signal = self._evaluate_model(model_config, model_id, data_root, evaluator_config, accelerator_spec)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ftespo\Desktop\newTest\venv\Lib\site-packages\olive\engine\engine.py", line 1052, in _evaluate_model
    signal = self.target.evaluate_model(model_config, data_root, metrics, accelerator_spec)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ftespo\Desktop\newTest\venv\Lib\site-packages\olive\systems\local.py", line 47, in evaluate_model
    return evaluator.evaluate(model, data_root, metrics, device=device, execution_providers=execution_providers)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ftespo\Desktop\newTest\venv\Lib\site-packages\olive\evaluator\olive_evaluator.py", line 205, in evaluate
    metrics_res[metric.name] = self._evaluate_latency(
                               ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ftespo\Desktop\newTest\venv\Lib\site-packages\olive\evaluator\olive_evaluator.py", line 123, in _evaluate_latency
    latencies = self._evaluate_raw_latency(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ftespo\Desktop\newTest\venv\Lib\site-packages\olive\evaluator\olive_evaluator.py", line 762, in _evaluate_raw_latency
    return self._evaluate_onnx_latency(model, metric, dataloader, post_func, device, execution_providers)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ftespo\Desktop\newTest\venv\Lib\site-packages\olive\evaluator\olive_evaluator.py", line 543, in _evaluate_onnx_latency
    latencies = session.time_run(
                ^^^^^^^^^^^^^^^^^
  File "C:\Users\ftespo\Desktop\newTest\venv\Lib\site-packages\olive\common\ort_inference.py", line 334, in time_run
    self.session.run(input_feed=input_feed, output_names=None)
  File "C:\Users\ftespo\Desktop\newTest\venv\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : C:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlCommandRecorder.cpp(371)\onnxruntime_pybind11_state.pyd!00007FFDFEB251FE: (caller: 00007FFDFEB0BE04) Exception(1) tid(13d8c) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

[2024-06-11 17:35:54,328] [INFO] [engine.py:282:run] Run history for gpu-dml:
[2024-06-11 17:35:54,329] [INFO] [engine.py:570:dump_run_history] run history:
+--------------------------------------------------+------------------------------------+-----------------------------+----------------+-----------+ 
| model_id                                         | parent_model_id                    | from_pass                   |   duration_sec | metrics   | 
+==================================================+====================================+=============================+================+===========+ 
| 3252ef1c                                         |                                    |                             |                |           | 
+--------------------------------------------------+------------------------------------+-----------------------------+----------------+-----------+ 
| 4_OnnxConversion-3252ef1c-89c11e05               | 3252ef1c                           | OnnxConversion              |        275.42  |           | 
+--------------------------------------------------+------------------------------------+-----------------------------+----------------+-----------+ 
| 5_OrtTransformersOptimization-4-d70e84da-gpu-dml | 4_OnnxConversion-3252ef1c-89c11e05 | OrtTransformersOptimization |        576.364 |           | 
+--------------------------------------------------+------------------------------------+-----------------------------+----------------+-----------+ 
[2024-06-11 17:35:54,331] [INFO] [engine.py:297:run] No packaging config provided, skip packaging artifacts
Traceback (most recent call last):
  File "C:\Users\ftespo\Desktop\newTest\Olive\examples\directml\stable_diffusion_xl\stable_diffusion_xl.py", line 635, in <module>
    main()
  File "C:\Users\ftespo\Desktop\newTest\Olive\examples\directml\stable_diffusion_xl\stable_diffusion_xl.py", line 601, in main
    optimize(
  File "C:\Users\ftespo\Desktop\newTest\Olive\examples\directml\stable_diffusion_xl\stable_diffusion_xl.py", line 374, in optimize
    with footprints_file_path.open("r") as footprint_file:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\pathlib.py", line 1013, in open
    return io.open(self, mode, buffering, encoding, errors, newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\ftespo\\Desktop\\newTest\\Olive\\examples\\directml\\stable_diffusion_xl\\footprints\\unet_gpu-dml_footprints.json'

@guotuofeng guotuofeng added the DirectML DirectML label Jun 14, 2024