System Info

Information

Tasks

One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)

Reproduction
from accelerate import Accelerator, DeepSpeedPlugin
from accelerate.utils import gather_object

DEEPSPEED_CONFIG = {
    "fp16": {
        "enabled": True
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": False
        },
        "overlap_comm": True,
        "contiguous_gradients": True,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "gather_16bit_weights_on_model_save": True,
        "round_robin_gradients": True
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 10,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": False
}

deepspeed_plugin = DeepSpeedPlugin(hf_ds_config=DEEPSPEED_CONFIG)
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin, mixed_precision="fp16")

# each GPU creates a string
message = [f"Hello this is GPU {accelerator.process_index}"]

# collect the messages from all GPUs
messages = gather_object(message)

# output the messages only on the main process with accelerator.print()
accelerator.print(messages)
Running this on a machine with 4 GPUs, the main process prints only:

['Hello this is GPU 0']
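For reference, one way to see how many processes are actually participating (a minimal check, not part of the script above) is to print accelerator.num_processes right after creating the Accelerator; gather_object is expected to return one entry per launched process, so a single-element list usually means only one process is running:

# Sketch: report how many processes Accelerate launched.
# gather_object() collects one entry from each process, so the gathered
# list length should match this number.
accelerator.print(f"num_processes = {accelerator.num_processes}")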
Expected behavior
It should show all 4 GPUs in the output.
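Concretely, with one process per GPU the gathered list should look something like this (illustrative):

['Hello this is GPU 0', 'Hello this is GPU 1', 'Hello this is GPU 2', 'Hello this is GPU 3']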
Also, with Accelerate I am not able to fine-tune the Mistral NeMo model with a batch size of more than 1.
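For context, with train_micro_batch_size_per_gpu set to "auto", Accelerate typically resolves the per-GPU batch size from the DataLoader passed to accelerator.prepare(). The sketch below shows what requesting a batch size of 4 would look like; the dataset, model, and optimizer here are placeholders, not from the actual training script:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data and model, only to illustrate where the batch size is set.
train_dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
train_dataloader = DataLoader(train_dataset, batch_size=4, shuffle=True)
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# With "auto" values in the DeepSpeed config above, the micro batch size per
# GPU is typically taken from this DataLoader's batch_size during prepare().
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)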