
WIP: video_unet_generator_attn #669

Merged: 1 commit merged into jolibrain:master from video_gen on Aug 21, 2024
Conversation

wr0124 (Collaborator) commented Jul 10, 2024

  • step0: Build a UNet architecture in joliGEN (ResBlock+AttentionBlock) comparable to AnimateDiff's (ResBlock+TransformerBlock+MotionModule)

  • step1: Modify ResBlock to take 5D tensor images as input and output

  • step1.1: Test ResBlock with 5D tensor input and output

  • step1.2: ResBlock embedding?

  • step2: Replace AttentionBlock by MotionModule (a temporal-attention sketch follows this list)

  • step2.1: Use the MotionModule code to replace AttentionBlock

  • step2.2: Test the MotionModule replacement of AttentionBlock with 5D tensor input and output

  • step2.3: MotionModule embedding?

  • step3: Merge MotionModule into the Video_generator_attn file

  • step3.1: Align attention heads, input/output channels and possibly other variables; clean up the code?

  • step3.2: Test MotionModule in the Video_generator_attn file with 5D tensor input/output

  • step3.3: Use QKVAttention for attention score calculation throughout the file?

  • step4: Test UNet with 5D tensor input/output?

  • step5: Create the dataloader

  • step6: Test UNet with the dataloader

  • step7: Test training and visualization with visdom

  • step8: Inference script

  • step9: Unit tests
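As a reference for step2, here is a minimal sketch of the temporal self-attention at the heart of a MotionModule, using a stock nn.MultiheadAttention as a stand-in for the actual AnimateDiff/joliGEN module: spatial positions are folded into the batch so that attention runs across the f frames only.

```python
import torch
import torch.nn as nn

# Stand-in temporal attention: fold (h, w) into the batch so that
# self-attention mixes information across the f frames only.
b, f, c, h, w = 1, 4, 32, 8, 8
x = torch.randn(b, f, c, h, w)

attn = nn.MultiheadAttention(embed_dim=c, num_heads=4, batch_first=True)

t = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, f, c)  # (b*h*w, f, c)
t, _ = attn(t, t, t)                                   # attend over frames
x = t.reshape(b, h, w, f, c).permute(0, 3, 4, 1, 2)    # back to (b, f, c, h, w)
```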

@wr0124 wr0124 changed the title from "feat(ml):step1 ResBlock input/output 5D tensor image" to "WIP: video_gen" on Jul 10, 2024
@wr0124 wr0124 changed the title from "WIP: video_gen" to "WIP: video_unet_generator_attn" on Jul 10, 2024
wr0124 (Collaborator, Author) commented Jul 10, 2024

UNet = ((ResBlock+Attention) * 2) * 4 for input_blocks

python3 -W ignore::UserWarning train.py \
  --dataroot /data1/juliew/mini_dataset/online_mario2sonic_lite \
  --checkpoints_dir /data1/juliew/checkpoints \
  --name mario \
  --config_json examples/example_ddpm_mario.json \
  --gpu_ids 1 \
  --output_display_env test_mario_unet \
  --output_display_freq 1 \
  --output_print_freq 1 \
  --G_diff_n_timestep_test 5 \
  --G_diff_n_timestep_train 2000 \
  --G_unet_mha_channel_mults 1 2 4 8 \
  --G_unet_mha_res_blocks 2 2 2 2 \
  --train_batch_size 1 \
  --G_unet_mha_attn_res 1 2 4 8 \
  --data_num_threads 1

)
]
ch = int(mult * self.inner_channel)
if ds in attn_res:
Contributor:
Variable attn_res should not condition the motion module (MM). The MM is mandatory, not conditioned, I believe.

Also, attn_res conditions the AttentionBlock in the frame-only UNet, and we should keep this code here as well.

This is because the MM is an addition to any configuration of the frame-only UNet.
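A minimal sketch of this layout, with nn.Identity stand-ins for the real joliGEN modules: the within-frame AttentionBlock stays gated on attn_res, while the MotionModule is appended unconditionally.

```python
import torch.nn as nn

# Stand-ins for the real ResBlock / AttentionBlock / MotionModule.
ResBlock = AttentionBlock = MotionModule = nn.Identity

def make_down_block(ds, attn_res):
    layers = [ResBlock()]
    if ds in attn_res:               # frame-only attention stays conditional
        layers.append(AttentionBlock())
    layers.append(MotionModule())    # the MM is mandatory, not conditioned
    return nn.Sequential(*layers)

block = make_down_block(ds=2, attn_res=(1, 2, 4, 8))
```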

efficient=efficient,
freq_space=self.freq_space,
),
MotionModule(
Contributor:
Let's verify this, because:

  • the frame-only UNet has a "within-frame" AttentionBlock here, which needs to be kept;
  • I'm not sure the MM applies to the bottleneck: please double-check in the publications and code.

wr0124 (Collaborator, Author) commented Jul 15, 2024:
The AttentionBlock is kept; the MM is added after it.

wr0124 (Collaborator, Author):
In the publication's code, whether the MM is applied to the bottleneck depends on two options. However, in the publication's two illustration figures, the bottleneck does not have an MM.

)
]
ch = int(self.inner_channel * mult)
if ds in attn_res:
Contributor:
Same remark here.

wr0124 (Collaborator, Author) commented Jul 18, 2024

Since joliGEN's temporal DDPM (use_temporal) creates tensors of shape (b, f, c, h, w), which differs from the original paper's format of (b, c, f, h, w), all tensors in this version flow in the (b, f, c, h, w) format (see the sketch below). For compatibility with other models in joliGEN, might it be advantageous to treat the tensor in 4D format during training?
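A small sketch of the two layouts and of the 4D view mentioned above (pure reshapes, no assumptions beyond the shapes):

```python
import torch

b, f, c, h, w = 2, 4, 3, 64, 64
x = torch.randn(b, f, c, h, w)       # joliGEN layout: (b, f, c, h, w)

x_paper = x.permute(0, 2, 1, 3, 4)   # original paper's layout: (b, c, f, h, w)
x_4d = x.reshape(b * f, c, h, w)     # 4D view: frames folded into the batch

assert torch.equal(x_4d.reshape(b, f, c, h, w), x)  # the fold is invertible
```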

wr0124 (Collaborator, Author) commented Jul 18, 2024

python3 -W ignore::UserWarning train.py \
  --dataroot /data1/juliew/dataset/online_mario2sonic_full_mario \
  --checkpoints_dir /data1/juliew/checkpoints \
  --name mario_temporal \
  --config_json examples/example_ddpm_mario.json \
  --gpu_ids 2 \
  --output_display_env test_mario_temporal \
  --output_print_freq 1 \
  --output_display_freq 1 \
  --data_dataset_mode self_supervised_temporal_labeled_mask_online \
  --train_batch_size 1 \
  --train_iter_size 4 \
  --data_temporal_number_frames 4 \
  --data_temporal_frame_step 1 \
  --data_num_threads 1 \
  --train_temporal_criterion \
  --G_diff_n_timestep_test 1000 \
  --G_diff_n_timestep_train 2000 \
  --train_temporal_criterion_lambda 1.0 \
  --G_netG unet_vid \
  --data_online_creation_crop_size_A 64 \
  --data_online_creation_crop_size_B 64 \
  --data_crop_size 64 \
  --data_load_size 64 \
  --G_unet_mha_attn_res 1 2 4 8 \
  --output_verbose

  • This UNetVid has 4 down and 4 up blocks of (ResBlock+Attention+MM), with a middle block of (ResBlock+Attention+ResBlock); a structural sketch follows this list. Its architecture is similar to the paper's, but due to GPU limitations it cannot handle image sizes of 128.
  • batch-size bug
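A structural sketch of that layout, again with nn.Identity stand-ins rather than the real modules:

```python
import torch.nn as nn

ResBlock = AttentionBlock = MotionModule = nn.Identity

down = nn.ModuleList(
    [nn.Sequential(ResBlock(), AttentionBlock(), MotionModule()) for _ in range(4)]
)
middle = nn.Sequential(ResBlock(), AttentionBlock(), ResBlock())  # no MM here
up = nn.ModuleList(
    [nn.Sequential(ResBlock(), AttentionBlock(), MotionModule()) for _ in range(4)]
)
```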

wr0124 (Collaborator, Author) commented Jul 22, 2024

Works with this command line:

python3 -W ignore::UserWarning train.py \
  --dataroot /data1/juliew/dataset/online_mario2sonic_full_mario \
  --checkpoints_dir /data1/juliew/checkpoints \
  --name mario_antoine \
  --gpu_ids 2 \
  --output_display_env test_mario_antoine \
  --model_type palette \
  --output_print_freq 1 \
  --output_display_freq 1 \
  --data_dataset_mode self_supervised_temporal_labeled_mask_online \
  --train_batch_size 1 \
  --train_iter_size 1 \
  --model_input_nc 3 \
  --model_output_nc 3 \
  --data_relative_paths \
  --train_G_ema \
  --train_optim adamw \
  --train_temporal_criterion_lambda 1.0 \
  --G_netG unet_vid \
  --data_online_creation_crop_size_A 64 \
  --data_online_creation_crop_size_B 64 \
  --data_crop_size 64 \
  --data_load_size 64 \
  --G_unet_mha_attn_res 16 \
  --data_online_creation_rand_mask_A \
  --train_G_lr 0.0001 \
  --dataaug_no_rotate \
  --G_diff_n_timestep_train 5 \
  --G_diff_n_timestep_test 6 \
  --data_temporal_number_frames 4 \
  --data_temporal_frame_step 1 \
  --data_num_threads 4 \
  --UNetVid

wr0124 (Collaborator, Author) commented Jul 22, 2024

python3 -W ignore::UserWarning train.py \
  --dataroot /data1/juliew/dataset/online_mario2sonic_full_mario \
  --checkpoints_dir /data1/juliew/checkpoints \
  --name mario_vid_bs1 \
  --gpu_ids 2 \
  --model_type palette \
  --output_print_freq 1 \
  --output_display_freq 1 \
  --data_dataset_mode self_supervised_temporal_labeled_mask_online \
  --train_batch_size 1 \
  --train_iter_size 4 \
  --model_input_nc 3 \
  --model_output_nc 3 \
  --data_relative_paths \
  --train_G_ema \
  --train_optim adamw \
  --train_temporal_criterion_lambda 1.0 \
  --G_netG unet_vid \
  --data_online_creation_crop_size_A 64 \
  --data_online_creation_crop_size_B 64 \
  --data_crop_size 64 \
  --data_load_size 64 \
  --G_unet_mha_attn_res 1 2 4 8 \
  --data_online_creation_rand_mask_A \
  --train_G_lr 0.0001 \
  --dataaug_no_rotate \
  --G_diff_n_timestep_train 8 \
  --G_diff_n_timestep_test 6 \
  --data_temporal_number_frames 10 \
  --data_temporal_frame_step 1 \
  --data_num_threads 8 \
  --UNetVid \
  --output_verbose

  • Due to broadcasting in PyTorch, this works when the batch size is 1; but when the batch size is larger than 1, diffusion_generator.py, which mainly works with 4D tensors, runs into an issue (a work-around sketch follows this list).
  • This setting reaches "22625 / 24564 MB" on one GPU.
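A minimal sketch of the 4D work-around mentioned above: fold frames into the batch dimension so that 4D-only code sees (b*f, c, h, w) whatever the batch size. frame_model is a stand-in for any per-frame module, not joliGEN's diffusion_generator.

```python
import torch
import torch.nn as nn

frame_model = nn.Conv2d(3, 3, kernel_size=3, padding=1)  # any 4D-only module

x = torch.randn(2, 4, 3, 64, 64)                 # (b, f, c, h, w) with b > 1
b, f = x.shape[:2]
y = frame_model(x.reshape(b * f, *x.shape[2:]))  # runs as a plain 4D batch
y = y.reshape(b, f, *y.shape[1:])                # restore (b, f, c, h, w)
```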

wr0124 (Collaborator, Author) commented Jul 26, 2024

Launch inference:

cd scripts/
python3 gen_vid_diffusion.py \
  --model_in_file /data1/juliew/checkpoints/mario_vid_bs1/latest_net_G_A.pth \
  --img_in /data1/juliew/mini_dataset/online_mario2sonic_video/trainA/paths_part.txt \
  --paths_file /data1/juliew/mini_dataset/online_mario2sonic_video/trainA/paths_part.txt \
  --mask_in /data1/juliew/mini_dataset/online_mario2sonic_video/trainA/paths_part.txt \
  --data_root /data1/juliew/mini_dataset/online_mario2sonic_video/ \
  --dir_out ../inference_mario \
  --img_width 128 \
  --img_height 128

wr0124 (Collaborator, Author) commented Jul 29, 2024

Create videos with this command line:

cd scripts/
python3 gen_vid_diffusion.py \
  --model_in_file /data1/juliew/checkpoints/test_vid/latest_net_G_A.pth \
  --img_in /data1/paths_part.txt \
  --paths_file /data1/juliew/ori_dataset/online_mario2sonic_full/trainA/paths_part4.txt \
  --mask_in /paths_part.txt \
  --data_root /data1/juliew/ori_dataset/online_mario2sonic_full/ \
  --dir_out ../inference_mario_vid \
  --img_width 128 \
  --img_height 128 \
  --nb_samples 2

for k in range(min(nb_imgs, self.get_current_batch_size())):
    self.fake_B_pool.query(self.visuals[k : k + 1])

if self.opt.G_netG == "unet_vid":
Contributor:
else?

efficient=efficient,
freq_space=self.freq_space,
),
# MotionModule(
Contributor:
Remove commented code?


# attention, what we cannot get enough of
###attention_score get
# hidden_states_select = self._attention(query, key, value, attention_mask)
Contributor:
Remove commented code?

data/__init__.py Outdated
@@ -61,11 +61,20 @@ def create_dataloader(opt, rank, dataset, batch_size):


def create_dataset_temporal(opt, phase):
dataset_class = find_dataset_using_name("temporal_labeled_mask_online")
dataset_class = find_dataset_using_name(
Contributor:
I believe this function needs to be changed so that either temporal_labeled_mask_online or self_supervised_temporal_labeled_mask_online is selected, depending on whether cut or palette is running; a sketch follows.
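A sketch of that dispatch (dataset names as in the snippet; testing opt.model_type for "palette" is an assumption based on the training commands above):

```python
def create_dataset_temporal(opt, phase):
    # Assumption: opt.model_type distinguishes palette from cut runs.
    if opt.model_type == "palette":
        name = "self_supervised_temporal_labeled_mask_online"
    else:  # cut and other non-palette models
        name = "temporal_labeled_mask_online"
    dataset_class = find_dataset_using_name(name)  # as in data/__init__.py
    return dataset_class(opt, phase)               # construction sketched
```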

# sort
self.A_img_paths.sort(key=natural_keys)
self.A_label_mask_paths.sort(key=natural_keys)
if self.use_domain_B:
Contributor:
In the self_supervised dataloader, domain B is not needed.

wr0124 (Collaborator, Author) commented Aug 5, 2024

Created one unit-test file "test_run_video_diffusion_online.py" for unit testing.

@wr0124 wr0124 force-pushed the video_gen branch 3 times, most recently from c7888cb to 1fe3f60 on August 9, 2024 13:44
wr0124 (Collaborator, Author) commented Aug 14, 2024

During inference, additional frames beyond the specified opt.data_temporal_number_frames can be added for video generation, but according to the literature this often degrades the results. The additional_frame variable in the gen_vid_diffusion file needs to be tested when its value is negative (see the sketch below).
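For the negative case, an illustrative guard (names hypothetical, not the repo's code):

```python
def total_frames(base_frames: int, additional_frame: int) -> int:
    # Clamp so a negative additional_frame cannot drop below one frame.
    return max(1, base_frames + additional_frame)

assert total_frames(8, -10) == 1
```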

@@ -346,7 +346,8 @@ def generate(
bbox_select[3] = min(img.shape[0], bbox_select[3])
else:
bbox = bboxes[bbox_idx]

opt.data_online_creation_load_size_A = (1280, 720)
Contributor:
In general we don't want hardcoded values here.

wr0124 (Collaborator, Author) commented Aug 19, 2024:
We temporarily did this to run inference with your bdd100k_vid_64_2 model, since in that model opt.data_online_creation_load_size_A is 720. Normally this hardcoded line is not required; it has been deleted.

"train_batch_size": 1,
"data_temporal_number_frames": 8,
"data_temporal_frame_step": 1,
"G_diff_n_timestep_train": 6,
Contributor:
Beware, I don't believe you can theoretically have timestep_test < timestep_train.

wr0124 (Collaborator, Author):

Maybe I misunderstood: opt.G_diff_n_timestep_train is 2000 and opt.G_diff_n_timestep_test is 1000 in the default settings?

wr0124 (Collaborator, Author):

I had overwritten "G_diff_n_timestep_test"; it is corrected now.
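For reference, sampling with fewer test steps than training steps is usually realized by respacing, i.e. running inference on a subset of the trained timesteps (DDIM-style); a minimal sketch, assuming this is the scheme behind joliGEN's 2000/1000 defaults:

```python
import numpy as np

n_timestep_train, n_timestep_test = 2000, 1000
# Evenly spaced subset of the trained timesteps, used at inference.
test_timesteps = np.linspace(0, n_timestep_train - 1, n_timestep_test, dtype=np.int64)
```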

…poral consistency and inference

feat(ml):step2 replace AttentionBlock by MotionModule.
ResBlock/MotionModule class instance pass

feat(ml):UNet=ResBlock+Attention(optional)+MM

feat(ml): create UNetVid class with temporal MHA for U-Net

feat(ml):add dataloader

feat(ml): dataloader works with UNet

feat(ml): dataloader and UNetVid work for input (b,f,c,h,w), not visdom yet

feat(ml): visdom shows the training

feat(ml):dataloader with mask

feat(ml): dataloader fixed with command-line

feat(ml): visdom show one batch of frame

feat(ml): frame is treated as a batch, so no additional normalisation is needed

feat(ml): inference for UNetVid

feat(ml): use efficient_attention_xformers for attention

feat(ml): xformer bug PR

feat(ml): create video based on generated and orig images

feat(ml):remove unnecessary option --UNetVid

feat(ml): add doc for training and inference

feat(ml): fix inference paths requirement

feat(ml): improve the inference for any paths.txt and longer frames

feat(ml): unit test only for vid

feat(ml): debug for unit test on metrics

doc: modify script for inference

feat(ml):debug inference paths_file

feat(ml): add one option for max frame

feat(ml): inference debug bbox_in not img_in, and for bdd100k video

feat(ml): delete hardcoding in inference

feat(ml): dataloader load frames from same video

feat(ml): adapt processing of frames from either a video series or a single video
@beniz beniz merged commit 43b7018 into jolibrain:master Aug 21, 2024
2 checks passed