I encountered an error when I executed `ray_client.get_job_info(submission_id)` in the notebook "llm_train_serve".
Failure # 1 (occurred at 2023-11-04_22-13-27)
ray::_Inner.train() (pid=825, ip=10.8.39.156, actor_id=70e4f07c8e58982fae704cbb05000000, repr=TransformersTrainer)
File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/tune/trainable/trainable.py", line 375, in train
raise skipped from exception_cause(skipped)
File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/_internal/utils.py", line 54, in check_for_failure
ray.get(object_ref)
ray.exceptions.RayTaskError(TypeError): ray::_RayTrainWorker__execute.get_next() (pid=999, ip=10.8.40.116, actor_id=b615fcc6abc8662c6289d4a205000000, repr=<ray.train._internal.worker_group.RayTrainWorker object at 0x7f6d22cf4cd0>)
File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/_internal/worker_group.py", line 32, in __execute
raise skipped from exception_cause(skipped)
File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/_internal/utils.py", line 129, in discard_return_wrapper
train_func(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.8/site-packages/ray/train/huggingface/transformers/transformers_trainer.py", line 402, in _huggingface_train_loop_per_worker
trainer: transformers.trainer.Trainer = trainer_init_per_worker(
File "train_script.py", line 134, in trainer_init_per_worker
File "/tmp/ray/session_2023-11-04_20-36-48_261997_29/runtime_resources/pip/a937a63b874d2ea72d0cb4fbcd6d256f72de9f22/virtualenv/lib/python3.8/site-packages/peft/utils/other.py", line 106, in prepare_model_for_int8_training
return prepare_model_for_kbit_training(*args, **kwargs)
File "/tmp/ray/session_2023-11-04_20-36-48_261997_29/runtime_resources/pip/a937a63b874d2ea72d0cb4fbcd6d256f72de9f22/virtualenv/lib/python3.8/site-packages/peft/utils/other.py", line 95, in prepare_model_for_kbit_training
model.gradient_checkpointing_enable()
File "/tmp/ray/session_2023-11-04_20-36-48_261997_29/runtime_resources/pip/a937a63b874d2ea72d0cb4fbcd6d256f72de9f22/virtualenv/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1872, in gradient_checkpointing_enable
self._set_gradient_checkpointing(enable=True, gradient_checkpointing_func=gradient_checkpointing_func)
TypeError: _set_gradient_checkpointing() got an unexpected keyword argument 'enable'
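
The last frame shows `gradient_checkpointing_enable` in transformers calling `self._set_gradient_checkpointing(enable=True, ...)`, and the model's own `_set_gradient_checkpointing` rejecting the `enable` keyword. That would be consistent with the model class defining the method with a different signature than the installed transformers expects, though this is only a guess from the traceback. Below is a minimal sketch of how one might check whether a method accepts a given keyword before calling it. The helper `accepts_keyword` and the two stand-in classes are hypothetical and only illustrate the two signature shapes; they are not taken from transformers or from the model code.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-in: a method that does not know the `enable` keyword.
class OldStyleModel:
    def _set_gradient_checkpointing(self, module, value=False):
        pass


# Hypothetical stand-in: a method matching the call seen in the traceback.
class NewStyleModel:
    def _set_gradient_checkpointing(self, enable=True, gradient_checkpointing_func=None):
        pass


old_ok = accepts_keyword(OldStyleModel()._set_gradient_checkpointing, "enable")
new_ok = accepts_keyword(NewStyleModel()._set_gradient_checkpointing, "enable")
print(old_ok, new_ok)
```

Running the same check against the real model object inside `trainer_init_per_worker` (for example `accepts_keyword(model._set_gradient_checkpointing, "enable")`) should show whether the signature is the source of the `TypeError`.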