
Issue of tensors share memory #591

Open
heraldiclily opened this issue Apr 1, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@heraldiclily

heraldiclily commented Apr 1, 2024

🐛 Describe the bug

I'm running into this shared-memory warning when training LLMs with TRLX:

Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'base_model.transformer.wte.weight', 'base_model.lm_head.weight'}].
A potential way to correctly save your model is to use save_model.
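For context, this warning typically comes from weight tying: in GPT-2-style models the input embedding (`wte.weight`) and the output head (`lm_head.weight`) point at the same underlying storage, which safetensors refuses to serialize twice. A minimal sketch of how such sharing arises (the module names are illustrative, not TRLX internals):

```python
import torch

# GPT-2-style weight tying: the LM head reuses the embedding matrix.
emb = torch.nn.Embedding(10, 4)          # stands in for transformer.wte
lm_head = torch.nn.Linear(4, 10, bias=False)
lm_head.weight = emb.weight              # tie the weights

# Both parameters now reference the same storage, which is what
# the safetensors serializer detects and warns about.
print(emb.weight.data_ptr() == lm_head.weight.data_ptr())
```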

Most forums recommend the following setting to fix the issue in non-RL applications:

save_safetensors=False

Unfortunately, the TRLX library doesn't expose this argument, which belongs to the Transformers module. Is there an equivalent way to set it in order to resolve the "tensors share memory" problem?
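For reference, in a plain Transformers training script the commonly suggested workaround looks like this (the output path is illustrative):

```python
from transformers import TrainingArguments

# Fall back to pickle-based torch.save instead of safetensors,
# avoiding the shared-tensor error for tied weights.
args = TrainingArguments(
    output_dir="./checkpoints",  # illustrative path
    save_safetensors=False,
)
```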

Which trlX version are you using?

0.7.0

Additional system and package information

Linux 20.04
python 3.11.8
pytorch 2.2.2

@heraldiclily heraldiclily added the bug Something isn't working label Apr 1, 2024
@RekkimiARG

Do you have any solution? I'm hitting the same problem.

@PamKing7

I seem to have solved this problem by setting `safe_serialization=False` on line 99 of `python3.10/site-packages/accelerate/checkpointing.py`; saving the model then falls back to the `torch.save()` method by default.
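Rather than patching the installed package, the same effect can sometimes be achieved at the call site, since `torch.save` records storage sharing through pickle instead of erroring. A minimal sketch, with key names that mirror the warning rather than TRLX's actual state dict:

```python
import os
import tempfile
import torch

# Two tied parameters, as in GPT-2's wte / lm_head.
emb = torch.nn.Embedding(10, 4)
lm_head = torch.nn.Linear(4, 10, bias=False)
lm_head.weight = emb.weight

state = {"wte.weight": emb.weight, "lm_head.weight": lm_head.weight}
path = os.path.join(tempfile.mkdtemp(), "pytorch_model.bin")

# torch.save handles shared storages natively (no duplicate-memory error).
torch.save(state, path)

loaded = torch.load(path)
# Storage sharing is preserved on reload.
print(loaded["wte.weight"].data_ptr() == loaded["lm_head.weight"].data_ptr())
```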
