vae_encoder_gpu-dml_footprints.json file not found when converting stable diffusion xl base model #1202

Open
AshD opened this issue Jun 19, 2024 · 6 comments
Labels: DirectML, waiting for response

Comments

AshD commented Jun 19, 2024

Describe the bug
python stable_diffusion_xl.py --model_id=stabilityai/stable-diffusion-xl-base-1.0 --optimize
/home/ash/ai/lib/python3.12/site-packages/diffusers/models/transformers/transformer_2d.py:34: FutureWarning: Transformer2DModelOutput is deprecated and will be removed in version 1.0.0. Importing Transformer2DModelOutput from diffusers.models.transformer_2d is deprecated and this will be removed in a future version. Please use from diffusers.models.modeling_outputs import Transformer2DModelOutput, instead.
deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
Download stable diffusion PyTorch pipeline...
Loading pipeline components...: 100%|█████████████████████████████████████████████████████| 7/7 [00:00<00:00, 9.47it/s]

Optimizing vae_encoder
[2024-06-18 20:35:41,419] [INFO] [run.py:138:run_engine] Running workflow default_workflow
[2024-06-18 20:35:41,422] [INFO] [engine.py:986:save_olive_config] Saved Olive config to cache/default_workflow/olive_config.json
[2024-06-18 20:35:41,425] [WARNING] [accelerator_creator.py:182:_check_execution_providers] The following execution providers are not supported: 'DmlExecutionProvider' by the device: 'gpu' and will be ignored. Please consider installing an onnxruntime build that contains the relevant execution providers.
[2024-06-18 20:35:41,425] [INFO] [accelerator_creator.py:224:create_accelerators] Running workflow on accelerator specs: gpu-cpu
[2024-06-18 20:35:41,425] [INFO] [engine.py:109:initialize] Using cache directory: cache/default_workflow
[2024-06-18 20:35:41,425] [INFO] [engine.py:265:run] Running Olive on accelerator: gpu-cpu
[2024-06-18 20:35:41,425] [INFO] [engine.py:1085:_create_system] Creating target system ...
[2024-06-18 20:35:41,425] [INFO] [engine.py:1088:_create_system] Target system created in 0.000057 seconds
[2024-06-18 20:35:41,425] [INFO] [engine.py:1097:_create_system] Creating host system ...
[2024-06-18 20:35:41,425] [INFO] [engine.py:1100:_create_system] Host system created in 0.000053 seconds
[2024-06-18 20:35:41,453] [INFO] [engine.py:867:_run_pass] Running pass convert:OnnxConversion
[2024-06-18 20:35:41,453] [INFO] [engine.py:901:_run_pass] Loaded model from cache: 3_OnnxConversion-45ce4523-e3495161 from cache/default_workflow/runs
[2024-06-18 20:35:41,453] [INFO] [engine.py:867:_run_pass] Running pass optimize:OrtTransformersOptimization
[2024-06-18 20:35:41,454] [INFO] [transformer_optimization.py:169:validate_search_point] CPUExecutionProvider does not support float16 very well, please avoid to use float16.
[2024-06-18 20:35:41,454] [WARNING] [engine.py:873:_run_pass] Invalid search point, prune
[2024-06-18 20:35:41,454] [WARNING] [engine.py:850:_run_passes] Skipping evaluation as model was pruned
[2024-06-18 20:35:41,454] [WARNING] [engine.py:437:run_no_search] Flow ['convert', 'optimize'] is pruned due to failed or invalid config for pass 'optimize'
[2024-06-18 20:35:41,454] [INFO] [engine.py:364:run_accelerator] Save footprint to footprints/vae_encoder_gpu-cpu_footprints.json.
[2024-06-18 20:35:41,454] [INFO] [engine.py:282:run] Run history for gpu-cpu:
[2024-06-18 20:35:41,457] [INFO] [engine.py:570:dump_run_history] run history:
+------------------------------------+-------------------+----------------+----------------+-----------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+====================================+===================+================+================+===========+
| 45ce4523 | | | | |
+------------------------------------+-------------------+----------------+----------------+-----------+
| 3_OnnxConversion-45ce4523-e3495161 | 45ce4523 | OnnxConversion | 6.64365 | |
+------------------------------------+-------------------+----------------+----------------+-----------+
[2024-06-18 20:35:41,457] [INFO] [engine.py:297:run] No packaging config provided, skip packaging artifacts
Traceback (most recent call last):
File "/home/ash/ai/Olive/examples/directml/stable_diffusion_xl/stable_diffusion_xl.py", line 635, in
main()
File "/home/ash/ai/Olive/examples/directml/stable_diffusion_xl/stable_diffusion_xl.py", line 601, in main
optimize(
File "/home/ash/ai/Olive/examples/directml/stable_diffusion_xl/stable_diffusion_xl.py", line 374, in optimize
with footprints_file_path.open("r") as footprint_file:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/pathlib.py", line 1015, in open
return io.open(self, mode, buffering, encoding, errors, newline)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/home/ash/ai/Olive/examples/directml/stable_diffusion_xl/footprints/vae_encoder_gpu-dml_footprints.json'

To Reproduce
Run python stable_diffusion_xl.py --model_id=stabilityai/stable-diffusion-xl-base-1.0 --optimize

Other information

  • OS: Ubuntu 22.04
  • olive-ai 0.6.2
  • onnx 1.16.1
  • onnxruntime 1.18.0
jambayk commented Jun 27, 2024

Hi,

From the logs, it appears that the DML workflow is being skipped because you are running in a Linux environment without the DML execution provider. Since the workflow contains an evaluator, it checks for the presence of the DML EP and fails when it does not find it.

Can you try again after removing the "evaluator": "common_evaluator" part from the config JSON?
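
To confirm the root cause, you can check which execution providers the installed onnxruntime build actually exposes. A minimal diagnostic sketch (not part of the example script):

```python
import onnxruntime as ort

# List the execution providers compiled into this onnxruntime build.
providers = ort.get_available_providers()
print(providers)

if "DmlExecutionProvider" not in providers:
    # DirectML ships only in the Windows-only onnxruntime-directml package,
    # so on Linux the gpu-dml workflow never runs and the
    # vae_encoder_gpu-dml_footprints.json file is never written.
    print("DmlExecutionProvider not available in this environment")
```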

jambayk added the waiting for response label on Jun 27, 2024
AshD commented Jun 28, 2024

Tried it.

Optimizing vae_encoder
Traceback (most recent call last):
File "/home/ash/ai/Olive/examples/directml/stable_diffusion_xl/stable_diffusion_xl.py", line 635, in
main()
File "/home/ash/ai/Olive/examples/directml/stable_diffusion_xl/stable_diffusion_xl.py", line 601, in main
optimize(
File "/home/ash/ai/Olive/examples/directml/stable_diffusion_xl/stable_diffusion_xl.py", line 369, in optimize
olive_run(olive_config)
File "/home/ash/ai/lib/python3.12/site-packages/olive/workflows/run/run.py", line 284, in run
run_config = RunConfig.parse_file_or_obj(run_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ash/ai/lib/python3.12/site-packages/olive/common/config_utils.py", line 120, in parse_file_or_obj
return cls.parse_obj(file_or_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ash/ai/lib/python3.12/site-packages/pydantic/v1/main.py", line 526, in parse_obj
return cls(**obj)
^^^^^^^^^^
File "/home/ash/ai/lib/python3.12/site-packages/pydantic/v1/main.py", line 341, in init
raise validation_error
pydantic.v1.error_wrappers.ValidationError: 4 validation errors for RunConfig
engine
Evaluator common_evaluator not found in evaluators (type=value_error)
passes -> convert
Invalid engine (type=value_error)
passes -> optimize
Invalid engine (type=value_error)
passes -> optimize_cuda
Invalid engine (type=value_error)

jambayk commented Jun 28, 2024

Looks like you only removed it from the "evaluators" section. Sorry, I was unclear. Please remove the "evaluator" field under "engine".
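
For reference, this is the shape of the edit (a sketch only; the "evaluator" line is the one to delete, and the sibling field shown is illustrative):

```json
"engine": {
    "evaluator": "common_evaluator",
    "cache_dir": "cache"
}
```

should become

```json
"engine": {
    "cache_dir": "cache"
}
```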

devang-ml added the DirectML label on Jul 1, 2024
WickedHorse commented
I had this same problem. I ran pip install -r requirements.txt at the project's root, but there was another requirements.txt in C:\Users\Cole\olive\Olive\examples\stable_diffusion. I re-ran the install against that file, then reissued python stable_diffusion.py --optimize, and it ran through.
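
For reference, the sequence that worked (the paths are from my checkout; adjust to yours):

```
pip install -r requirements.txt
pip install -r examples/stable_diffusion/requirements.txt
python stable_diffusion.py --optimize
```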

sanjeev671 commented Jul 18, 2024

I am also getting the same issue with "python stable_diffusion.py --optimize":
"FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\HCKTest\Desktop\sanjeev\olive\stable_diffusion\footprints\vae_encoder_gpu-dml_footprints.json'"

The issue above is caused by:
"pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlCommandRecorder.cpp(371)\onnxruntime_pybind11_state.pyd!00007FFFF9E61070: (caller: 00007FFFF9E47F84) Exception(1) tid(1da0) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application."

The complete traceback is:

[2024-07-18 02:01:44,745] [WARNING] [engine.py:370:run_accelerator] Failed to run Olive on gpu-dml.
Traceback (most recent call last):
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\engine\engine.py", line 349, in run_accelerator
output_footprint = self.run_no_search(
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\engine\engine.py", line 441, in run_no_search
should_prune, signal, model_ids = self._run_passes(
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\engine\engine.py", line 856, in _run_passes
signal = self._evaluate_model(model_config, model_id, evaluator_config, accelerator_spec)
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\engine\engine.py", line 1078, in _evaluate_model
signal = self.target.evaluate_model(model_config, metrics, accelerator_spec)
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\systems\local.py", line 46, in evaluate_model
return evaluator.evaluate(model, metrics, device=device, execution_providers=execution_providers)
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\evaluator\olive_evaluator.py", line 193, in evaluate
metrics_res[metric.name] = self._evaluate_latency(
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\evaluator\olive_evaluator.py", line 118, in _evaluate_latency
latencies = self._evaluate_raw_latency(model, metric, dataloader, post_func, device, execution_providers)
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\evaluator\olive_evaluator.py", line 706, in _evaluate_raw_latency
return self._evaluate_onnx_latency(model, metric, dataloader, post_func, device, execution_providers)
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\evaluator\olive_evaluator.py", line 495, in _evaluate_onnx_latency
latencies = session.time_run(
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\olive\common\ort_inference.py", line 334, in time_run
self.session.run(None, input_feed)
File "C:\Users\HCKTest\Desktop\sanjeev\python_env\stable_diffusion\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlCommandRecorder.cpp(371)\onnxruntime_pybind11_state.pyd!00007FFFF9E61070: (caller: 00007FFFF9E47F84) Exception(1) tid(1da0) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

[2024-07-18 02:01:44,881] [INFO] [engine.py:290:run] Run history for gpu-dml:
[2024-07-18 02:01:44,896] [INFO] [engine.py:589:dump_run_history] run history:
+--------------------------------------------------+------------------------------------+-----------------------------+----------------+-----------+
| model_id | parent_model_id | from_pass | duration_sec | metrics |
+==================================================+====================================+=============================+================+===========+
| 8ddbdd91 | | | | |
+--------------------------------------------------+------------------------------------+-----------------------------+----------------+-----------+
| 0_OnnxConversion-8ddbdd91-076cfb73 | 8ddbdd91 | OnnxConversion | 34.4892 | |
+--------------------------------------------------+------------------------------------+-----------------------------+----------------+-----------+
| 1_OrtTransformersOptimization-0-0f55df8a-gpu-dml | 0_OnnxConversion-8ddbdd91-076cfb73 | OrtTransformersOptimization | 8.00802 | |
+--------------------------------------------------+------------------------------------+-----------------------------+----------------+-----------+
[2024-07-18 02:01:44,897] [INFO] [engine.py:305:run] No packaging config provided, skip packaging artifacts
Traceback (most recent call last):
File "C:\Users\HCKTest\Desktop\sanjeev\olive\stable_diffusion\stable_diffusion.py", line 457, in
main()
File "C:\Users\HCKTest\Desktop\sanjeev\olive\stable_diffusion\stable_diffusion.py", line 389, in main
optimize(common_args.model_id, common_args.provider, unoptimized_model_dir, optimized_model_dir)
File "C:\Users\HCKTest\Desktop\sanjeev\olive\stable_diffusion\stable_diffusion.py", line 266, in optimize
save_optimized_onnx_submodel(submodel_name, provider, model_info)
File "C:\Users\HCKTest\Desktop\sanjeev\olive\stable_diffusion\sd_utils\ort.py", line 59, in save_optimized_onnx_submodel
with footprints_file_path.open("r") as footprint_file:
File "C:\Users\HCKTest\Desktop\sanjeev\python\py3_10_9\lib\pathlib.py", line 1119, in open
return self._accessor.open(self, mode, buffering, encoding, errors,
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\HCKTest\Desktop\sanjeev\olive\stable_diffusion\footprints\vae_encoder_gpu-dml_footprints.json'

How can this issue be solved? I also tried installing from both requirements.txt files.

Jay19751103 commented
> Looks like you only removed it from the "evaluators" section. Sorry, I was unclear. Please remove the "evaluator" field under "engine".

This works for me. How about the protobuf 2 GB issue?

Traceback (most recent call last):
File "G:\Olive\examples\directml\stable_diffusion_xl\stable_diffusion_xl.py", line 635, in
main()
File "G:\Olive\examples\directml\stable_diffusion_xl\stable_diffusion_xl.py", line 601, in main
optimize(
File "G:\Olive\examples\directml\stable_diffusion_xl\stable_diffusion_xl.py", line 369, in optimize
olive_run(olive_config)
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\workflows\run\run.py", line 297, in run
return run_engine(package_config, run_config, data_root)
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\workflows\run\run.py", line 261, in run_engine
engine.run(
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\engine\engine.py", line 267, in run
run_result = self.run_accelerator(
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\engine\engine.py", line 339, in run_accelerator
output_footprint = self.run_no_search(
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\engine\engine.py", line 431, in run_no_search
should_prune, signal, model_ids = self._run_passes(
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\engine\engine.py", line 829, in _run_passes
model_config, model_id = self._run_pass(
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\engine\engine.py", line 937, in _run_pass
output_model_config = host.run_pass(p, input_model_config, data_root, output_model_path, pass_search_point)
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\systems\local.py", line 32, in run_pass
output_model = the_pass.run(model, data_root, output_model_path, point)
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\passes\olive_pass.py", line 224, in run
output_model = self._run_for_config(model, data_root, config, output_model_path)
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\passes\onnx\transformer_optimization.py", line 332, in run_for_config
return model_proto_to_olive_model(optimizer.model, output_model_path, config)
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\passes\onnx\common.py", line 164, in model_proto_to_olive_model
has_external_data = model_proto_to_file(
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\olive\passes\onnx\common.py", line 108, in model_proto_to_file
onnx.save_model(model, str(output_path))
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\onnx_init
.py", line 327, in save_model
serialized = _get_serializer(format, model_filepath).serialize_proto(proto)
File "C:\Users\wenchien\AppData\Local\anaconda3\envs\olive_sd\lib\site-packages\onnx\serialization.py", line 100, in serialize_proto
result = proto.SerializeToString()
ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 5136056262
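
The 2 GB failure happens because onnx.save_model serializes the entire ModelProto into a single protobuf message, which is capped at 2 GB. The usual workaround (a sketch, not confirmed by the maintainers in this thread) is to store the large tensors as external data next to the .onnx file:

```python
import onnx

# Workaround sketch for the 2 GB protobuf limit: "model" stands in for the
# >2 GB optimized ModelProto. Saving with external data moves the large
# initializers into a sidecar file, keeping the protobuf itself small.
onnx.save_model(
    model,
    "model.onnx",
    save_as_external_data=True,    # move initializers out of the protobuf
    all_tensors_to_one_file=True,  # collect them into one sidecar file
    location="model.onnx.data",    # written relative to model.onnx
)
```

In the Olive pass config this should correspond to enabling the external-data options on the ONNX pass (e.g. "save_as_external_data": true), if I remember the option name correctly.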
