
[Community Pipeline] Add 🪆Matryoshka Diffusion Models #9157

Draft · wants to merge 49 commits into main
Conversation

@tolgacangoz (Contributor) commented on Aug 12, 2024

Thanks for the opportunity to work on this model!

The Abstract of the paper (emphasis is mine):

Diffusion models are the de-facto approach for generating high-quality images and videos but learning high-dimensional models remains a formidable task due to computational and optimization challenges. Existing methods often resort to training cascaded models in pixel space, or using a downsampled latent space of a separately trained auto-encoder. In this paper, we introduce Matryoshka Diffusion (MDM), an end-to-end framework for high-resolution image and video synthesis. We propose a diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture where features and parameters for small-scale inputs are nested within those of large scales. In addition, MDM enables a progressive training schedule from lower to higher resolutions which leads to significant improvements in optimization for high-resolution generation. We demonstrate the effectiveness of our approach on various benchmarks, including class-conditioned image generation, high-resolution text-to-image, and text-to-video applications. Remarkably, we can train a single pixel-space model at resolutions of up to 1024 × 1024 pixels, demonstrating strong zero-shot generalization using the CC12M dataset, which contains only 12 million images.

Paper: 🪆Matryoshka Diffusion Models
Repository: https://github.com/apple/ml-mdm
License: MIT license


Key takeaways from the paper:

  • VAE: none, since Matryoshka Diffusion Models operate directly in (extended) pixel space(s).
  • Text-encoder: flan-t5-xl
  • Enables:
    1. a multi-resolution loss that greatly improves the convergence speed of high-resolution input denoising (see the sketch after this list).
    2. an efficient progressive training schedule that starts by training a low-resolution diffusion model and gradually adds higher-resolution inputs and outputs, which speeds up overall convergence.
  • MDM allows training high-resolution models without resorting to cascaded models (since each sub-model is trained separately, generation quality can be bottlenecked by exposure bias (Bengio et al., 2015) from imperfect predictions, and several models must be trained for the different resolutions), latent diffusion (which not only increases the complexity of learning but also bounds the generation quality due to the lossy compression), or other end-to-end models (which do not fully exploit the innate structure of hierarchical generation, so their results lag behind cascaded and latent models).
  • Resolution-specific noise schedules are used.
  • More computation is allocated to the low-resolution feature maps.
  • MDM has extensive parameter sharing across resolutions.
  • The authors observe that increasing from two resolution levels to three consistently improves the model's convergence, while adding nesting levels brings only negligible cost.
  • LDM and MDM methods are complementary. It is possible to build MDM on top of autoencoder codes.
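
The multi-resolution loss and the resolution-specific noise schedules mentioned above can be sketched roughly as follows. This is an illustration, not the PR's code: `model`, `alpha_bars` (one ᾱ schedule per resolution), and the ε-prediction interface are assumptions, and the paper's exact loss weighting and schedule shifts are omitted.

```python
import torch
import torch.nn.functional as F

def q_sample(x0, noise, alpha_bar_t):
    # Standard DDPM forward process: x_t = sqrt(a_bar_t)*x_0 + sqrt(1 - a_bar_t)*eps
    return alpha_bar_t.sqrt() * x0 + (1.0 - alpha_bar_t).sqrt() * noise

def multi_resolution_loss(model, x0, t, text_embeds, alpha_bars,
                          resolutions=(64, 256, 1024)):
    # The same image, resized to every nesting level's resolution.
    xs = [F.interpolate(x0, size=(r, r), mode="bilinear", antialias=True)
          for r in resolutions]
    eps = [torch.randn_like(x) for x in xs]
    # Each resolution uses its own (shifted) noise schedule, as in the paper.
    noisy = [q_sample(x, e, alpha_bars[r][t].view(-1, 1, 1, 1))
             for x, e, r in zip(xs, eps, resolutions)]
    # The NestedUNet is assumed to return one epsilon-prediction per level.
    preds = model(noisy, t, text_embeds)
    return sum(F.mse_loss(p, e) for p, e in zip(preds, eps)) / len(xs)
```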

TODOs:
✅ The U-Net, i.e. the inner-most structure, NestedUNet2DConditionModel(nesting_level=0), would approximately be configured as follows:

```python
UNet2DConditionModel(
    in_channels=3, out_channels=3, block_out_channels=(256, 512, 768),
    cross_attention_dim=2048, resnet_time_scale_shift='scale_shift',
    down_block_types=('DownBlock2D', 'CrossAttnDownBlock2D', 'CrossAttnDownBlock2D'),
    up_block_types=('CrossAttnUpBlock2D', 'CrossAttnUpBlock2D', 'UpBlock2D'),
    ff_act_fn='gelu', transformer_layers_per_block=[0, 1, 5],
    use_linear_projection='no_projection', attention_bias=True,
    norm_type='layer_norm_matryoshka', ff_norm_type='group_norm_matryoshka',
    cross_attention_norm='layer_norm', attention_pre_only=True,
    encoder_hid_dim_type='text_proj', encoder_hid_dim=2048,
    flip_sin_to_cos=False, masked_cross_attention=False,
    micro_conditioning_scale=64, addition_embed_type='matryoshka')
```
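
For intuition only, here is a toy version of the nesting idea (NestedUNet2DConditionModel itself is far more elaborate): each level denoises its own resolution, and the inner level's output feeds the next level up. Every name below is hypothetical.

```python
from typing import List, Optional

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNestedUNet(nn.Module):
    """Toy model: the level-0 net is literally nested inside level 1, and so on."""
    def __init__(self, inner: Optional["TinyNestedUNet"] = None, channels: int = 64):
        super().__init__()
        self.inner = inner                       # sub-model for the smaller resolutions
        self.encode = nn.Conv2d(3, channels, 3, padding=1)
        self.fuse = nn.Conv2d(channels + 3, channels, 3, padding=1)
        self.decode = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, xs: List[torch.Tensor]) -> List[torch.Tensor]:
        # xs is ordered from the smallest to the largest resolution.
        if self.inner is None:                   # nesting_level == 0
            return [self.decode(self.encode(xs[0]))]
        inner_outs = self.inner(xs[:-1])         # denoise the smaller levels first
        low = F.interpolate(inner_outs[-1], size=xs[-1].shape[-2:], mode="nearest")
        h = self.fuse(torch.cat([self.encode(xs[-1]), low], dim=1))
        return inner_outs + [self.decode(h)]

# Three nesting levels, mirroring the 64 -> 256 -> 1024 setup:
net = TinyNestedUNet(inner=TinyNestedUNet(inner=TinyNestedUNet()))
outs = net([torch.randn(1, 3, r, r) for r in (64, 256, 1024)])
```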

⏳ Scheduler(s)
NestedUNet2DConditionModel(nesting_level=(1, 2))
convert_matryoshka_model_to_diffusers.py
⏳ Verify outputs with the original implementation (a minimal parity-check sketch follows this list) for:

  • 64×64, nesting_level=0
  • 256×256, nesting_level=1
  • 1024×1024, nesting_level=2

⬜ Show example results
⏳ Upload converted checkpoints to HF
README.md
examples/**/train_matryoshka.py
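
For the verification items, a minimal parity check might look like the sketch below; `original_mdm_unet` and `ported_unet` are placeholders for the apple/ml-mdm model and the converted diffusers model, and the call signatures, embedding shape, and tolerance are all assumptions.

```python
import torch

def check_parity(original_mdm_unet, ported_unet, atol=1e-4):
    """Run both UNets on identical inputs and compare their outputs."""
    torch.manual_seed(0)
    sample = torch.randn(1, 3, 64, 64)        # 64x64, i.e. nesting_level=0
    timestep = torch.tensor([500])
    prompt_embeds = torch.randn(1, 77, 2048)  # flan-t5-xl, projected dim assumed

    with torch.no_grad():
        ref = original_mdm_unet(sample, timestep, prompt_embeds)
        out = ported_unet(sample, timestep, encoder_hidden_states=prompt_embeds).sample

    max_diff = (ref - out).abs().max().item()
    assert torch.allclose(ref, out, atol=atol), f"max abs diff: {max_diff}"
```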

@tolgacangoz tolgacangoz changed the title Add Matryoshka Diffusion Models Add 🪆Matryoshka Diffusion Models Aug 12, 2024
@sayakpaul (Member)

@tolgacangoz would you have cycles to work on this soon? Another contributor has expressed interest in working on it. Maybe you two could collaborate?

@tolgacangoz (Contributor, Author)

I am working on the inference code at the moment. Will the training code in examples/**/train_matryoshka.py be implemented as well (since this model is very efficient to train)? If so, he can take this up.

@sayakpaul (Member)

For now, we don't have to focus on training.

@tolgacangoz tolgacangoz changed the title Add 🪆Matryoshka Diffusion Models [Community Pipeline] Add 🪆Matryoshka Diffusion Models Sep 7, 2024