Accelerator.process_index only shows 0 in a 4 GPU env #3122

abpani · 2024-09-19T05:45:04Z

System Info

- `Accelerate` version: 0.34.2
- Platform: Linux-5.15.0-1035-aws-x86_64-with-glibc2.31
- `accelerate` bash location: /home/ubuntu/abpani/FundName/myenv/bin/accelerate
- Python version: 3.10.14
- Numpy version: 2.1.1
- PyTorch version (GPU?): 2.4.1+cu121 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- PyTorch MUSA available: False
- System RAM: 186.72 GB
- GPU type: NVIDIA A10G
- `Accelerate` default config:
        Not found

Information

The official example scripts
My own modified scripts

Tasks

One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)

Reproduction

from accelerate import Accelerator, DeepSpeedPlugin
from accelerate.utils import gather_object

# DeepSpeed ZeRO stage 3 config with the optimizer state offloaded to CPU
DEEPSPEED_CONFIG = {
    "fp16": {
        "enabled": True
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": False
        },
        "overlap_comm": True,
        "contiguous_gradients": True,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "gather_16bit_weights_on_model_save": True,
        "round_robin_gradients": True
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 10,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": False
}
deepspeed_plugin = DeepSpeedPlugin(hf_ds_config=DEEPSPEED_CONFIG)

accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin, mixed_precision="fp16")

# each GPU creates a string
message = [f"Hello this is GPU {accelerator.process_index}"]

# collect the messages from all GPUs
messages = gather_object(message)

# output the messages only on the main process with accelerator.print()
accelerator.print(messages)

['Hello this is GPU 0']
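
A single message with index 0 is also what one would see if only one process had been started. The report does not say how the script was launched, so this is only a possibility, not a diagnosis. A minimal sketch for checking it, assuming the launcher exports the usual `torch.distributed` variables `WORLD_SIZE`, `RANK` and `LOCAL_RANK` (the helper name `describe_launch` is hypothetical and not part of Accelerate):

```python
import os


def describe_launch(env=None):
    """Summarize the distributed-launch variables visible to this process.

    Assumption: a multi-process launcher sets WORLD_SIZE, RANK and
    LOCAL_RANK. If they are missing, we fall back to single-process values.
    """
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))
    return {
        "world_size": world_size,
        "rank": int(env.get("RANK", "0")),
        "local_rank": int(env.get("LOCAL_RANK", "0")),
        "multi_process": world_size > 1,
    }


if __name__ == "__main__":
    # Printed once per process; a plain `python` run prints a single line.
    print(describe_launch())
```

If `multi_process` comes back `False`, starting the script through a launcher, for example `accelerate launch --num_processes 4` followed by the script path, would be the first thing to try.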

Expected behavior

It should show all 4 GPUs in the output.
Also, with Accelerate I am not able to fine-tune the Mistral Nemo model with a batch size of more than 1.
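
To make the expected output concrete, here is a small single-process simulation of the gather step. It assumes that `gather_object` concatenates the per-process lists in rank order; `simulate_gather_object` is a hypothetical stand-in written in plain Python, not the Accelerate function itself.

```python
def simulate_gather_object(per_process_lists):
    """Concatenate one list per process, in rank order.

    Stand-in for the gather step: each inner list plays the role of the
    `message` list built on one process.
    """
    return [item for process_list in per_process_lists for item in process_list]


# One message list per simulated process (ranks 0..3).
per_process = [[f"Hello this is GPU {rank}"] for rank in range(4)]
messages = simulate_gather_object(per_process)
print(messages)
```

With four processes the gathered list would hold four entries, one per rank, rather than the single entry shown in the reproduction.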
