
chore: minor changes adapting the code to our needs #1

Open · wants to merge 9 commits into main from chore/adjust-code-for-fine-tuning
Conversation

@MagdalenaKotynia (Member) commented on Aug 16, 2024

Description of the changes

  • Renamed bridge_orig to bridge so that fine-tuning can run on the example data from README.md (bridge_orig is not registered in the tensorflow_datasets registry).
  • Added an evaluate.py script. It is currently of limited use, since metrics on the validation dataset turn out not to be a good indicator (an OpenVLA contributor says so in the Issue).
  • Commented out the merging of adapter weights for each saved checkpoint in the training loop and implemented it in a separate merge_adapter_weights.py script to speed up training: merging and saving adapter weights is very slow, so it is optimal to do it only for a chosen checkpoint.
  • Added some configs for our dataset.
  • Modified pyproject.toml so that a poetry shell can be created for the project.
  • Added custom LR scheduling (experimental; a rough sketch of the general idea follows this list).
  • Refactored the finetune.py script to save each new checkpoint in a separate directory instead of overwriting previous checkpoints.
  • Tried to use multiple camera views, but this does not seem to be supported for now; see the comment in the Issue.
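
For the experimental LR scheduling, here is a minimal sketch of one common shape: linear warmup followed by cosine decay, via torch.optim.lr_scheduler.LambdaLR. The schedule shape and the warmup_steps/total_steps values are illustrative assumptions, not the PR's actual scheduler.

# Hedged sketch: linear warmup + cosine decay. The schedule shape and the
# warmup_steps/total_steps values are illustrative, not the PR's exact code.
import math
import torch

def warmup_cosine(optimizer, warmup_steps, total_steps):
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)              # ramp 0 -> base LR
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))   # decay base LR -> 0
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Usage: step the scheduler once per optimizer step.
model = torch.nn.Linear(8, 8)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)
sched = warmup_cosine(opt, warmup_steps=100, total_steps=10_000)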

How to run

Instructions for running on the robo-srv-004 server.

Install openVLA

git clone [email protected]:RobotecAI/openvla.git
cd openvla
git checkout chore/adjust-code-for-fine-tuning
poetry env use python3.10
poetry shell
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0
pip install -e .
pip install packaging ninja
ninja --version; echo $?  # Verify Ninja --> should return exit code "0"
pip install "flash-attn==2.5.5" --no-build-isolation

Prepare dataset for training

A dataset is already prepared; you only need to copy it into the tensorflow_datasets directory in your home directory:

cd
mkdir -p tensorflow_datasets
cd tensorflow_datasets
cp -r /home/mkotynia/tensorflow_datasets/robotec_o3de_panda_dataset_vX .  # replace X with the dataset version
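
To verify that the copy is visible to TFDS, a quick check along these lines should work. The builder name is inferred from the folder name above and may differ; vX stays a placeholder for the dataset version.

# Sanity check (our suggestion): open the copied dataset as a read-only
# TFDS builder. The builder name mirrors the folder name and is an
# assumption; replace vX with the actual dataset version.
import os
import tensorflow_datasets as tfds

data_dir = os.path.expanduser("~/tensorflow_datasets")
builder = tfds.builder("robotec_o3de_panda_dataset_vX", data_dir=data_dir)
print(builder.info.splits)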

Merge adapter weights

If you are resuming training, you first need to merge the LoRA weights from the last saved checkpoint of the run you want to resume:

cd openvla
# modify EXP variable (experiment name)
# modify the STEP variable (take latest saved checkpoint)
python merge_adapter_weights.py
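
For reference, a minimal sketch of what the merge amounts to, using PEFT's standard merge path. The base model ID is OpenVLA's released checkpoint; the runs/EXP/step-STEP layout is an illustrative assumption (see merge_adapter_weights.py on the branch for the actual script).

# Hedged sketch of the adapter merge; the checkpoint path layout is assumed.
import torch
from peft import PeftModel
from transformers import AutoModelForVision2Seq

base = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
)
adapter_dir = "runs/EXP/step-STEP"  # hypothetical layout: experiment name + step
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
merged.save_pretrained(adapter_dir + "--merged")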

Resume training

Modify EXP and STEP in the finetune_robotec_resume.sh script, then run:

export CUDA_VISIBLE_DEVICES=0,1
./finetune_robotec_resume.sh

Run fine-tuning

Adjust the paths in finetune_robotec.sh (https://github.com/RobotecAI/openvla/blob/chore/adjust-code-for-fine-tuning/finetune_robotec.sh), then run:

./finetune_robotec.sh

@MagdalenaKotynia force-pushed the chore/adjust-code-for-fine-tuning branch from 09435f4 to d31ffea on August 23, 2024.