
[docs] Files and formats #7874

Merged
merged 5 commits into from
May 29, 2024
Conversation

stevhliu
Member

@stevhliu stevhliu commented May 6, 2024

This PR attempts to clarify the difference between file types (safetensors, ckpt, bin) and storage layouts (multifolder, single file) to avoid confusion when discussing the "Diffusers" format and the "single file" format. The most common misconception is treating safetensors as a layout; in fact, safetensors is just a file type, and both the single-file and multifolder layouts can hold safetensors files. The PR also removes the old "Load safetensors" doc and discusses its content in the current doc, where it is more contextually relevant.

todo:
- [ ] add Keras section once Space is fixed
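The file-type vs. storage-layout distinction the PR describes can be sketched with plain Python (the directory and file names below are illustrative, not real checkpoint contents): both layouts end up holding `.safetensors` files, which is exactly why "safetensors" alone doesn't tell you the layout.

```python
# Illustrative sketch: safetensors is a *file type*; "multifolder" (Diffusers)
# and "single file" are *storage layouts*, and both can hold safetensors files.
from pathlib import Path
import tempfile

root = Path(tempfile.mkdtemp())

# Multifolder (Diffusers) layout: one subfolder per pipeline component.
multifolder = root / "stable-diffusion-multifolder"
for component in ["unet", "vae", "text_encoder"]:
    (multifolder / component).mkdir(parents=True)
    (multifolder / component / "model.safetensors").touch()

# Single-file layout: the whole pipeline packed into one checkpoint file.
single_file = root / "stable-diffusion.safetensors"
single_file.touch()

# Every file in both layouts is a safetensors file.
layout_files = sorted(p.relative_to(root).as_posix()
                      for p in root.rglob("*.safetensors"))
print(layout_files)
```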

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@yiyixuxu yiyixuxu left a comment


I left some comments.
Thanks a ton for this PR! It is a very important doc to us :)

docs/source/en/using-diffusers/other-formats.md (review thread, resolved)

```py
from diffusers import DiffusionPipeline
```

> 1. Faster to load each component file individually or in parallel.

Collaborator

The diffusers-style format provides much greater flexibility and allows users to customize loading. We should expand this section and support it with more practical and convincing examples! (The example here demonstrates the API users can use to customize loading, but it is very simple and does not directly demonstrate the speed and memory-saving advantages; users have to deduce those themselves.)

If you want to use two different models in single-file format, they will most likely share some components, e.g. the VAE, text encoders, etc., and you will need to load those components twice - it's slower, and they take up more memory than needed. With the diffusers format, you would load one model, load only the components that differ for the second one, and use the from_pipe API to switch. cc @asomoza here. Can you help think of an example that will be meaningful for the single-file fans?
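The reuse being described can be mimicked in a stdlib-only sketch (class names and sizes are made up for illustration; this is not the diffusers API): two pipelines that differ only in their UNet share the same VAE and text-encoder objects, so the shared components cost memory only once, which is the effect from_pipe-style reuse gives you.

```python
# Toy model of component reuse between pipelines (illustrative only).
class Component:
    def __init__(self, name, size_gb):
        self.name, self.size_gb = name, size_gb

class Pipeline:
    def __init__(self, unet, vae, text_encoder):
        self.unet, self.vae, self.text_encoder = unet, vae, text_encoder

    def memory_gb(self, seen):
        # Count each distinct component object once, even if shared.
        total = 0.0
        for c in (self.unet, self.vae, self.text_encoder):
            if id(c) not in seen:
                seen.add(id(c))
                total += c.size_gb
        return total

# Assumed (made-up) component sizes in GB.
vae = Component("vae", 0.335)
te = Component("text_encoder", 1.7)

# Two fine-tunes differ only in their UNet; the VAE and text encoder
# objects are reused by the second pipeline instead of loaded again.
pipe_a = Pipeline(Component("unet_a", 3.5), vae, te)
pipe_b = Pipeline(Component("unet_b", 3.5), vae, te)

seen = set()
shared_total = pipe_a.memory_gb(seen) + pipe_b.memory_gb(seen)
print(round(shared_total, 3))  # shared components counted once
```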

Another strong argument we can make is for models that include large components - I wonder how it is even possible to handle models like SD3 in a single-file format, with 3 text encoders (including T5) and an 8B UNet? Also cc @asomoza here for insights.

Member

It's mostly the same that you mentioned, here are the "maybe" reasons:

  • For SDXL we can literally use just one VAE for all the models, so if you have 10 models, you will save 3.5 GB of space and bandwidth.
  • For SD 1.5 there are like 4 or 5 good VAE models; they're smaller, but even then, since we can switch them on the fly, we can always use the optimal one for each model.

So all the models could be distributed without VAEs. On sites like civitai.com there's always confusion about models that come with or without a VAE; this is not a problem with the diffusers format.
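The space estimate above can be checked with quick arithmetic (the ~0.35 GB figure for one SDXL VAE checkpoint is an assumption chosen to match the numbers in the comment): distributing 10 single-file checkpoints repackages the VAE 10 times, while the multifolder layout downloads it once.

```python
# Back-of-the-envelope check of the "10 models save ~3.5 GB" claim.
vae_gb = 0.35      # assumed size of one SDXL VAE checkpoint
num_models = 10

bundled = num_models * vae_gb  # every single-file checkpoint repackages the VAE
shared = 1 * vae_gb            # multifolder layout: one VAE reused by all
saved_gb = bundled - shared
print(round(saved_gb, 2))      # roughly the ~3.5 GB figure cited
```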

For the SDXL and SD 1.5 text encoders, we could save that space too, but we really don't know which models have trained text encoders and which haven't, so this is harder to make evident to the single-file fans unless the model owners and sharing sites start making that information visible.

These ones are the "strong" reasons:

  • Turbo, Lightning, and Hyper-SD models only change the UNet, so there's no reason to share anything else.
  • For new models with T5 (PixArt-Sigma, Kandinsky 3, SD3), the text encoder alone is between 17 GB and 22 GB. It's simply not feasible to ship the text encoder with every fine-tuned model, and probably no one will train it either; there's really no need to do so.

IMO the last one is the main reason people will be forced to stop sharing single-file checkpoints.
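As a rough sanity check on that last point (all sizes below are assumptions picked from the ranges mentioned above, and the number of fine-tunes is arbitrary): repackaging a frozen T5-class text encoder into every single-file fine-tune multiplies the download cost, while the multifolder layout pays for it once.

```python
# Rough distribution cost for N fine-tunes of a T5-based model (SD3-like).
text_encoders_gb = 17.0  # assumed combined size of the frozen text encoders
unet_gb = 8.0            # assumed size of the fine-tuned diffusion backbone
num_finetunes = 5

# Single-file: the text encoders are repackaged in every checkpoint.
single_file_gb = num_finetunes * (text_encoders_gb + unet_gb)

# Multifolder: the text encoders are downloaded once, reused everywhere.
multifolder_gb = text_encoders_gb + num_finetunes * unet_gb

print(single_file_gb, multifolder_gb)
```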

@stevhliu stevhliu force-pushed the formats branch 2 times, most recently from 565b392 to 8571903 Compare May 16, 2024 17:29
@stevhliu stevhliu marked this pull request as ready for review May 16, 2024 17:35
@stevhliu
Member Author

We can add the Keras section back whenever @sayakpaul gets a chance to fix it (no rush!), and I think it'd also be helpful to move the content from here (usage examples for from_single_file in the API docs) to this doc in a separate PR. WDYT?

@sayakpaul
Member

We can add the Keras section back whenever @sayakpaul gets a chance to fix it (no rush!),

Yeah sure. Actually, the problem is with the installation and the code itself.

@stevhliu stevhliu requested a review from yiyixuxu May 20, 2024 23:56
Collaborator

@yiyixuxu yiyixuxu left a comment


Good to merge now!
I want to follow up a bit more with recommendations on how to use diffusers checkpoints to maximize efficiency. I think we need to refactor from_single_file a little for that, so it will be a future PR :)

@stevhliu stevhliu merged commit 9e00b72 into huggingface:main May 29, 2024
1 check passed
@stevhliu stevhliu deleted the formats branch May 29, 2024 16:31
5 participants