
Enable Ascend NPU support #1758

Open · wants to merge 2 commits into base: main
Conversation

MengqingCao

Description

Enable the Ascend NPU backend for finetuning, inference, and the Gradio web UI.
Main changes:

  • replace the hard-coded CUDA handling with a device abstraction (see the sketch after this list)
  • add NPU-related configuration constraints
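
A rough sketch of what such a device abstraction could look like is below; the helper names (`get_device`, `empty_cache`) are illustrative assumptions, not necessarily the exact functions added in this PR.

```python
# Hypothetical sketch of the device abstraction; names are illustrative.
import torch

try:
    # torch_npu registers the "npu" device with PyTorch on Ascend machines.
    import torch_npu  # noqa: F401
except ImportError:
    pass


def get_device() -> str:
    """Return the best available accelerator instead of hard-coding "cuda"."""
    if torch.cuda.is_available():
        return "cuda"
    if hasattr(torch, "npu") and torch.npu.is_available():
        return "npu"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"


def empty_cache() -> None:
    """Device-agnostic replacement for torch.cuda.empty_cache()."""
    device = get_device()
    if device == "cuda":
        torch.cuda.empty_cache()
    elif device == "npu":
        torch.npu.empty_cache()
```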

Motivation and Context

There are two benefits:

  1. Abstracting the device makes it easier for more backends to plug in, and Ascend NPU is a good example.
  2. Allow Ascend NPU users to use axolotl for LLM finetuning and inference.

Example

# preprocess datasets - optional but recommended
ASCEND_RT_VISIBLE_DEVICES=0 python -m axolotl.cli.preprocess examples/openllama-3b/lora.yml

# finetune lora
accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml

# inference
accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
    --lora_model_dir="./lora-out"

# gradio
accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
    --lora_model_dir="./lora-out" --gradio

Screenshots

NPU-supported CLI inference: [screenshot: axolotl_cli_chat]

NPU-supported Gradio web UI inference: [screenshot: axolotl_cli_chat_gradio]

Config

lora.yml

base_model: openlm-research/open_llama_3b_v2
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: true
load_in_4bit: false
strict: false
push_dataset_to_hub:
datasets:
  - path: teknium/GPT4-LLM-Cleaned
    type: alpaca
dataset_prepared_path:
val_set_size: 0.02
adapter: lora
lora_model_dir:
sequence_len: 1024
sample_packing: true
lora_r: 8
lora_alpha: 16
lora_dropout: 0.0
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
output_dir: ./outputs/lora-out
gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 4
optimizer: adamw_torch
torchdistx_path:
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
float32: true
bf16: false
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank: 0
logging_steps: 1
xformers_attention:
flash_attention: false
gptq_groupsize:
s2_attention:
gptq_model_v1:
warmup_steps: 20
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"

max_memory = None

model_kwargs["device_map"] = device_map
set_model_device(cfg, max_memory, model_config, model_kwargs, device_map)
Collaborator

The way Python passes these by reference and updates them inside the function feels a bit awkward here. I'm not sure right now what a good solution would be to make this more obvious.

Author

I think the simplest way is to make model_kwargs the return value of set_model_device. That is simple, but the effect would look much the same as the current code, and since model_kwargs is mutable anyway, this probably makes little difference.

A more involved solution would be to write a ModelKwargs class with member functions such as __init__, update_model_device, update_dtype, update_attention, update_quantization, and so on. These functions would be called from load_model, making load_model and the changes to model_kwargs clearer. However, this would bring a lot of changes into src/axolotl/utils/models.py and might introduce issues, so some time is needed to validate it.
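
To make that second option concrete, a minimal sketch of such a class is below; the method names come from this comment, but the bodies and signatures are illustrative assumptions, not actual axolotl code.

```python
# Minimal sketch of the ModelKwargs idea; bodies are illustrative assumptions.
from typing import Any, Dict


class ModelKwargs:
    """Builds the kwargs passed to from_pretrained through named update steps."""

    def __init__(self, cfg: Any) -> None:
        self.cfg = cfg
        self.kwargs: Dict[str, Any] = {}

    def update_model_device(self, device_map: Any, max_memory: Any = None) -> None:
        # Mutation is confined to clearly named methods instead of a free function.
        self.kwargs["device_map"] = device_map
        if max_memory is not None:
            self.kwargs["max_memory"] = max_memory

    def update_dtype(self, torch_dtype: Any) -> None:
        self.kwargs["torch_dtype"] = torch_dtype

    # update_attention, update_quantization, etc. would follow the same pattern.

    def build(self) -> Dict[str, Any]:
        return dict(self.kwargs)
```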

@MengqingCao
Author

Good day, @winglian! I tried to create a ModelKwargs class, but alongside the modification of model_kwargs there are many other operations, such as patching and creating models, and their conditional logic seems inseparable.

So, in the end, I refactored the whole load_model function into a ModelLoader class. All the operations of the original load_model have been placed into several member functions, following the original logical order.

This brings a lot of changes, but it makes the model-loading pipeline clearer. Moreover, changes to member variables such as model_kwargs are now more obvious. I am not sure, though, whether the current function naming and the way the pipeline is split are completely reasonable.
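
Roughly, the new structure looks like the sketch below; the method names and signatures here are illustrative assumptions rather than the exact code in this PR.

```python
# Rough sketch of the ModelLoader refactor; names are illustrative assumptions.
from typing import Any, Tuple


class ModelLoader:
    """Runs the steps of the original load_model in their original order."""

    def __init__(self, cfg: Any, tokenizer: Any, inference: bool = False) -> None:
        self.cfg = cfg
        self.tokenizer = tokenizer
        self.inference = inference
        self.model_kwargs: dict = {}  # mutations are now visible on self.model_kwargs

    def load(self) -> Tuple[Any, Any]:
        # Each step below corresponds to a block of the original load_model.
        self.apply_patches()
        self.set_device_map_config()
        self.set_quantization_config()
        self.set_attention_config()
        model = self.build_model()
        return model, getattr(model, "peft_config", None)

    # The member functions keep the original logic; bodies omitted in this sketch.
    def apply_patches(self) -> None: ...
    def set_device_map_config(self) -> None: ...
    def set_quantization_config(self) -> None: ...
    def set_attention_config(self) -> None: ...
    def build_model(self) -> Any: ...
```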

Please review the latest code and give me some suggestions. Thanks a lot!

@MengqingCao
Author

Hi @winglian, could you help review the latest code in this PR? Let me know if the breaking changes brought in by the refactoring of the original code are not what you want.

Just FYI, I accidentally deleted the original commit; it can be found in this branch.

  1. add Ascend NPU backend support
  2. refactor func load_model in src/axolotl/utils/models.py
  3. refactor load_in_8bit as a kwarg

@Yikun

Yikun commented Sep 12, 2024

It looks like these commits include two parts: the model loader refactor and the Ascend NPU support. Maybe we could split this into two PRs: the first would be the model loader refactor, and then we would rebase the Ascend NPU support PR on top of it.

Or do you have any other suggestions? @winglian, please feel free to let us know if you have any more concerns. Thanks!

@MengqingCao
Author

The ModelLoader refactoring has been split out into #1909, and the Ascend NPU support will be committed after #1909. Hopefully this makes it easier to review and test. cc @winglian
