[examples] add train flux-controlnet scripts in example. #9324
base: main
Conversation
@haofanwang @wangqixun
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Can we have some sample training results (such as images) from this script attached in the doc, or anywhere suitable?
examples/controlnet/README_flux.md
Outdated
* `report_to="wandb"` will ensure the training runs are tracked on Weights and Biases.
* `validation_image`, `validation_prompt`, and `validation_steps` to allow the script to do a few validation inference runs. This allows us to qualitatively check if the training is progressing as expected.

Our experiments were conducted on a single 40GB A100 GPU.
Wow, 40GB A100 seems doable.
I'm sorry, this is the 80GB A100 (I wrote it wrong). I did a lot of extra work to get it to train with ZeRO-3 on the 40GB A100, but I don't think that setup is suitable for everyone.
Not at all. I think it would still be nice to include the changes you had to make in the form of notes in the README. Does that work?
I'll see if I can add it later.
@sayakpaul We added a tutorial on configuring DeepSpeed in the README.
There are some tricks to lower GPU memory:
1. gradient_checkpointing
2. bf16 or fp16
3. batch size 1, with gradient_accumulation_steps above 1
With 1, 2, and 3, can this be trained under 40GB?
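The three tricks above correspond to standard flags in the diffusers example scripts. A minimal sketch of how a launch command might combine them (the script name and the flag values here are illustrative, not taken from this PR):

```shell
# Illustrative only: combining the three memory-saving tricks in one launch.
accelerate launch train_controlnet_flux.py \
  --gradient_checkpointing \
  --mixed_precision="bf16" \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4
```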
In my experience, DeepSpeed ZeRO-3 must be used. @linjiapro, your settings cost about 70GB at 1024 resolution with batch size 1, or at 512 with batch size 3.
Sorry to bother you, but have you tried caching the text-encoder and VAE latents to run with lower GPU memory? @PromeAIpro @linjiapro
Text-encoder caching is already available in this script (saving about 10GB of GPU memory on T5). For caching VAE latents, you can check how to use DeepSpeed in the README, which covers VAE caching.
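As a hedged sketch of the caching idea (the names below are illustrative, not the script's actual API): encode each unique prompt once, store the result, and the text encoder no longer needs to stay resident on the GPU during training.

```python
# Illustrative sketch of prompt-embedding caching; `encode_fn` stands in for
# the real text-encoder forward pass and is an assumption, not the PR's API.
embedding_cache = {}

def cache_prompt_embeddings(prompts, encode_fn):
    """Encode each unique prompt once and reuse the stored embedding."""
    for prompt in prompts:
        if prompt not in embedding_cache:
            embedding_cache[prompt] = encode_fn(prompt)
    return embedding_cache
```

Once the cache is filled, the encoder can be offloaded, which is where a saving like the reported ~10GB on T5 would come from.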
Hi, thanks for your PR. I just left some initial comments. LMK what you think.
Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
Thanks! Appreciate your hard work here. Left some more comments.
Can we fix the code quality issues?
Co-authored-by: Sayak Paul <[email protected]>
…ain_memory,report_to=wandb
Left some additional minor comments but I see existing comments are yet to be addressed. Let me know when you would like another round of review.
@sayakpaul hey, I think I have fixed all the issues; time to start a new review.
Co-authored-by: Sayak Paul <[email protected]>
# for weighting schemes where we sample timesteps non-uniformly
u = compute_density_for_timestep_sampling(
    weighting_scheme=args.weighting_scheme,
    batch_size=bsz,
    logit_mean=args.logit_mean,
    logit_std=args.logit_std,
    mode_scale=args.mode_scale,
)
indices = (u * noise_scheduler_copy.config.num_train_timesteps).long()
timesteps = noise_scheduler_copy.timesteps[indices].to(device=pixel_latents.device)

# Add noise according to flow matching.
sigmas = get_sigmas(timesteps, n_dim=pixel_latents.ndim, dtype=pixel_latents.dtype)
noisy_model_input = (1.0 - sigmas) * pixel_latents + sigmas * noise
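For context, the hunk above samples timesteps from a logit-normal density and then linearly interpolates between clean latents and noise. A self-contained sketch of the same math, using NumPy in place of the script's torch/diffusers helpers (so the function names here are illustrative):

```python
import numpy as np

def sample_logit_normal(batch_size, logit_mean=0.0, logit_std=1.0, rng=None):
    # Sigmoid of a Gaussian: concentrates sampled timesteps mid-schedule,
    # i.e. the "logit_normal" weighting scheme referenced above.
    rng = np.random.default_rng() if rng is None else rng
    return 1.0 / (1.0 + np.exp(-rng.normal(logit_mean, logit_std, batch_size)))

def flow_matching_noise(latents, noise, sigmas):
    # Flow-matching forward process: x_t = (1 - sigma) * x_0 + sigma * eps,
    # with sigmas broadcast over the non-batch dimensions.
    sigmas = sigmas.reshape(-1, *([1] * (latents.ndim - 1)))
    return (1.0 - sigmas) * latents + sigmas * noise
```

At sigma = 0 the model input is the clean latents; at sigma = 1 it is pure noise, matching the interpolation in the hunk.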
I thought we were using a different timestep sampling procedure, and I suggested having that as the default. Are we not doing that anymore?
Do you mean to set the original sampling scheme as the default?
For the weighting scheme I just copied from here.
Yeah, I meant to keep the sigmoid sampling as your default and let users configure it, as we do in the other scripts.
Okay. But it depends on a std and mean. IIRC your scheme did torch.randn() and applied sigmoid, right?
Yes, this used torch.randn() at first, but given the examples you provided, I think this may be a better solution for us.
Left some comments, but my main concerns:
- Why remove the previous timestep computation scheme?
- Let's provide a reasonable ControlNet checkpoint derived from your experiments.
LMK if anything is unclear.
@PromeAIpro we didn't have to close this PR. Is there anything we could do to revive it? We would very much like to do that. Please let us know.
Sorry, I did it by mistake.
Thanks. I think this is looking good. Some minor comments.
Also, we would need to add tests like in https://github.com/huggingface/diffusers/blob/main/examples/controlnet/test_controlnet.py.
@yiyixuxu could you review the changes made to the ControlNet pipeline?
…/diffusers into flux-controlnet-train
…/diffusers into flux-controlnet-train
Added tests in test_controlnet.
What does this PR do?
This PR adds Flux-ControlNet training scripts to the examples, tested on an A100-SXM4-80GB.
Using this training script, we can customize the number of transformer layers by setting
--num_double_layers=4 --num_single_layers=0
With this setting, the GPU memory demand is 60GB with batch size 2 at 512 resolution. Discussed in #9085.
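Based on the flags quoted above, a sketch of how such a run might be launched (only --num_double_layers and --num_single_layers come from this PR's description; the script name and remaining flags are illustrative assumptions):

```shell
# Illustrative launch; the layer-count flags are from the PR description.
accelerate launch train_controlnet_flux.py \
  --num_double_layers=4 \
  --num_single_layers=0 \
  --train_batch_size=2 \
  --resolution=512
```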
Before submitting
See the documentation guidelines, and the tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.