I am using WSL 2 with Ubuntu 22.04. This is the GPU information when I run "sudo lshw -C display".

I installed torch with this command:

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

When I run the command

torchrun --nproc_per_node 1 example_instructions.py \
    --ckpt_dir CodeLlama-7b-Instruct/ \
    --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 4

it fails with: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'.

This is the full log:
:/mnt/c/Users/john.john/codelama/weight/codellama-main$ torchrun --nproc_per_node 1 example_instructions.py \
> --ckpt_dir CodeLlama-7b-Instruct/ \
> --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
> --max_seq_len 512 --max_batch_size 4
initializing model parallel with size 1
initializing ddp with size 1
initializing pipeline with size 1
Loaded in 38.10 seconds
Traceback (most recent call last):
File "/mnt/c/Users/john.john/codelama/weight/codellama-main/example_instructions.py", line 68, in <module>
fire.Fire(main)
File "/home/john/.local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/john/.local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/john/.local/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/mnt/c/Users/john.john/codelama/weight/codellama-main/example_instructions.py", line 51, in main
results = generator.chat_completion(
File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/generation.py", line 351, in chat_completion
generation_tokens, generation_logprobs = self.generate(
File "/home/john/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/generation.py", line 164, in generate
logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
File "/home/john/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/model.py", line 300, in forward
h = layer(h, start_pos, freqs_cis, (mask.to(device) if mask is not None else mask))
File "/home/john/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/model.py", line 252, in forward
h = x + self.attention.forward(
File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/model.py", line 165, in forward
xq, xk, xv = self.wq(x), self.wk(x), self.wv(x)
File "/home/john/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/john/.local/lib/python3.10/site-packages/fairscale/nn/model_parallel/layers.py", line 290, in forward
output_parallel = F.linear(input_parallel, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 433) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/home/john/.local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 761, in main
run(args)
File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 752, in run
elastic_launch(
File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
example_instructions.py FAILED
Failures:
<NO_OTHER_FAILURES>
Root Cause (first observed failure):
[0]:
time : 2024-01-24_16:51:25
host : company
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 433)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
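For context, the error means the fp16 ("Half") checkpoint weights ended up on the CPU, where PyTorch 1.12 has no Half kernel for the addmm behind F.linear. A minimal illustration of the failure mode and the usual float32 fallback (an illustrative sketch with made-up tensor shapes, not code from this repo; on some newer PyTorch builds the Half path may succeed on CPU, in which case the except branch is simply skipped):

```python
import torch
import torch.nn.functional as F

# Checkpoint weights are stored in fp16, like the CodeLlama shards.
x = torch.randn(2, 4, dtype=torch.float16)  # activations
w = torch.randn(3, 4, dtype=torch.float16)  # a linear layer's weight

try:
    # On a CUDA device this works; on CPU, PyTorch 1.12 raises
    # RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
    y = F.linear(x, w)
except RuntimeError:
    # CPU fallback: cast to float32, which CPU kernels support
    y = F.linear(x.float(), w.float())

print(y.shape)  # torch.Size([2, 3])
```

This is only a workaround for running on CPU; the real fix for the issue above is getting the model onto the GPU.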
hi @akashdhruv, my PC has an NVIDIA GPU; please check the screenshot above.
It is supposed to run on the GPU. Do you know why it only runs on the CPU?
I think you need to look into your system and torchrun configuration to figure out why the GPU is not being identified. Is your PyTorch installed with GPU support? If yes, maybe try,
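One quick way to check the suggestion above is to verify that the installed wheel was built with CUDA and can actually see the GPU from inside WSL 2 (a diagnostic sketch, not part of the original thread):

```python
import torch

print(torch.__version__)          # should show a +cu113 build, not a CPU-only wheel
print(torch.version.cuda)         # None means a CPU-only wheel was installed
print(torch.cuda.is_available())  # False under WSL 2 often points to a driver issue

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```

If torch.version.cuda is None, reinstalling from the cu113 index is needed; if it is set but is_available() is False, the WSL 2 NVIDIA driver setup is the likely culprit.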