
chore: minor changes adapting the code to our needs #1

Open · wants to merge 9 commits into main from chore/adjust-code-for-fine-tuning
Conversation

@MagdalenaKotynia (Member) commented on Aug 16, 2024

Description of the changes

  • Renamed bridge_orig to bridge so that fine-tuning can run on the example data from README.md (bridge_orig is not registered in the tensorflow_datasets registry).
  • Added an evaluate.py script. It is currently of limited use, since metrics on the validation dataset turn out not to be a good indicator (an OpenVLA contributor says so in the Issue).
  • Commented out the merging of adapter weights for each saved checkpoint in the training loop and implemented it in a separate merge_adapter_weights.py script to speed up training: merging and saving adapter weights is very slow, so it is optimal to do it only for a chosen checkpoint.
  • Added some configs for our dataset.
  • Modified pyproject.toml so that a poetry shell can be created for the project.
  • Added custom LR scheduling (experimental; a rough sketch of the general idea follows this list).
  • Refactored the finetune.py script to save each new checkpoint in a separate directory instead of overwriting previous checkpoints.
  • Tried to use multiple camera views, but this does not seem to be supported for now; see the comment in the Issue.
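
For the experimental LR scheduling, here is a minimal sketch of one common shape: linear warmup followed by cosine decay, via torch.optim.lr_scheduler.LambdaLR. The schedule shape and the warmup_steps/total_steps values are illustrative assumptions, not the PR's actual scheduler.

# Hedged sketch: linear warmup + cosine decay. The schedule shape and the
# warmup_steps/total_steps values are illustrative, not the PR's exact code.
import math
import torch

def warmup_cosine(optimizer, warmup_steps, total_steps):
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)              # ramp 0 -> base LR
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))   # decay base LR -> 0
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Usage: step the scheduler once per optimizer step.
model = torch.nn.Linear(8, 8)
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)
sched = warmup_cosine(opt, warmup_steps=100, total_steps=10_000)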

How to run

Instructions for running on the robo-srv-004 server.

Install openVLA

git clone [email protected]:RobotecAI/openvla.git
cd openvla
git checkout chore/adjust-code-for-fine-tuning
poetry env use python3.10
poetry shell
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0
pip install -e .
pip install packaging ninja
ninja --version; echo $?  # Verify Ninja --> should return exit code "0"
pip install "flash-attn==2.5.5" --no-build-isolation

Prepare dataset for training

A dataset is already prepared; you only need to copy it into the tensorflow_datasets directory in your home directory:

cd
mkdir -p tensorflow_datasets
cd tensorflow_datasets
cp -r /home/mkotynia/tensorflow_datasets/robotec_o3de_panda_dataset_vX .  # replace X with the dataset version
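
To verify that the copy is visible to TFDS, a quick check along these lines should work. The builder name is inferred from the folder name above and may differ; vX stays a placeholder for the dataset version.

# Sanity check (our suggestion): open the copied dataset as a read-only
# TFDS builder. The builder name mirrors the folder name and is an
# assumption; replace vX with the actual dataset version.
import os
import tensorflow_datasets as tfds

data_dir = os.path.expanduser("~/tensorflow_datasets")
builder = tfds.builder("robotec_o3de_panda_dataset_vX", data_dir=data_dir)
print(builder.info.splits)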

Merge adapter weights

If you are resuming training, you first need to merge the LoRA weights from the last saved checkpoint of the run you want to resume:

cd openvla
# modify EXP variable (experiment name)
# modify the STEP variable (take latest saved checkpoint)
python merge_adapter_weights.py
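
For reference, a minimal sketch of what the merge amounts to, using PEFT's standard merge path. The base model ID is OpenVLA's released checkpoint; the runs/EXP/step-STEP layout is an illustrative assumption (see merge_adapter_weights.py on the branch for the actual script).

# Hedged sketch of the adapter merge; the checkpoint path layout is assumed.
import torch
from peft import PeftModel
from transformers import AutoModelForVision2Seq

base = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
)
adapter_dir = "runs/EXP/step-STEP"  # hypothetical layout: experiment name + step
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
merged.save_pretrained(adapter_dir + "--merged")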

Resume training

Modify EXP and STEP in the finetune_robotec_resume.sh script, then run:

export CUDA_VISIBLE_DEVICES=0,1
./finetune_robotec_resume.sh

Run fine-tuning

Adjust the paths in finetune_robotec.sh (https://github.com/RobotecAI/openvla/blob/chore/adjust-code-for-fine-tuning/finetune_robotec.sh), then run:

./finetune_robotec.sh

@MagdalenaKotynia force-pushed the chore/adjust-code-for-fine-tuning branch from 09435f4 to d31ffea on August 23, 2024.