About the implementation of the multi-scale condition #41

XiaoqiangZhou · 2023-07-27T03:44:52Z

Thanks for sharing this great work.

In the paper, you mention that the method can "transfer rich multi-scale texture patterns from the source image distribution to the noise prediction".

However, in the code, I find that only the last-layer feature of the encoder is used for cross-attention, as the [-1] index shows:
pose_out = self.cros_attn2(x = xt_feats[-1], cond = pose_feats[-1]).mean([2,3])
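To make the contrast concrete, here is a minimal NumPy sketch (not the repository's code) of what the quoted line computes and what a multi-scale variant could look like. It assumes `xt_feats` and `pose_feats` are lists of `(B, C, H, W)` feature maps, one per encoder level, with matching channel counts per level. The learned `cros_attn2` module is replaced by plain parameter-free scaled dot-product attention, and `cross_attn`, `single_scale`, and `multi_scale` are hypothetical names.

```python
import numpy as np

def cross_attn(x, cond):
    """Parameter-free scaled dot-product cross-attention (hypothetical
    stand-in for a learned module such as cros_attn2).
    x, cond: (B, C, H, W); queries come from x, keys/values from cond."""
    b, c, h, w = x.shape
    q = x.reshape(b, c, h * w).transpose(0, 2, 1)       # (B, HW, C)
    kv = cond.reshape(b, c, -1).transpose(0, 2, 1)      # (B, H'W', C)
    scores = q @ kv.transpose(0, 2, 1) / np.sqrt(c)     # (B, HW, H'W')
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over cond tokens
    out = weights @ kv                                  # (B, HW, C)
    return out.transpose(0, 2, 1).reshape(b, c, h, w)   # back to (B, C, H, W)

def single_scale(xt_feats, pose_feats):
    # Mirrors the quoted line: only the deepest level ([-1]) is attended,
    # then pooled over H and W, like .mean([2,3]) -> (B, C).
    return cross_attn(xt_feats[-1], pose_feats[-1]).mean(axis=(2, 3))

def multi_scale(xt_feats, pose_feats):
    # Hypothetical multi-scale variant: attend at every level, pool each one.
    return [cross_attn(x, p).mean(axis=(2, 3)) for x, p in zip(xt_feats, pose_feats)]

rng = np.random.default_rng(0)
shapes = [(2, 8, 16, 16), (2, 16, 8, 8), (2, 32, 4, 4)]
xt_feats = [rng.standard_normal(s) for s in shapes]
pose_feats = [rng.standard_normal(s) for s in shapes]
print(single_scale(xt_feats, pose_feats).shape)               # (2, 32)
print([o.shape for o in multi_scale(xt_feats, pose_feats)])   # [(2, 8), (2, 16), (2, 32)]
```

The single-scale output is exactly the last entry of the multi-scale list, which is why indexing with [-1] discards every shallower, higher-resolution level.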

Could you briefly point me to where the "multi-scale" feature for cross-attention is implemented?


XiaoqiangZhou · 2023-07-27T03:53:41Z

Well, I think the actual main model is the class "BeatGANsAutoencModel" rather than "BeatGANsPoseGuideModel", and the multi-scale condition features are stored in the variables "enc_cond_emb", "mid_cond_emb", and "dec_cond_emb". Is that right?
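If that reading is right, the routing can be pictured with the toy below. It is a hedged NumPy sketch, not the repository's implementation: it assumes one condition map per UNet resolution, with the encoder consuming them from high to low resolution, the middle block the coarsest one, and the decoder from low to high. `build_cond_pyramid`, `toy_unet`, `attend`, `down`, and `up` are hypothetical names; channels are kept constant, and plain average pooling stands in for a learned condition encoder.

```python
import numpy as np

def attend(x, cond):
    """Parameter-free cross-attention: queries from x, keys/values from cond.
    Both (B, C, H, W) with the same C; returns a tensor shaped like x."""
    b, c, h, w = x.shape
    q = x.reshape(b, c, -1).transpose(0, 2, 1)
    kv = cond.reshape(b, c, -1).transpose(0, 2, 1)
    s = q @ kv.transpose(0, 2, 1) / np.sqrt(c)
    a = np.exp(s - s.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)               # softmax over cond tokens
    return (a @ kv).transpose(0, 2, 1).reshape(b, c, h, w)

def down(x):  # 2x2 average pooling
    b, c, h, w = x.shape
    return x.reshape(b, c, h // 2, 2, w // 2, 2).mean(axis=(3, 5))

def up(x):    # nearest-neighbour 2x upsampling
    return x.repeat(2, axis=2).repeat(2, axis=3)

def build_cond_pyramid(cond, levels):
    # Hypothetical condition encoder: repeated pooling yields one map per scale.
    feats = [cond]
    for _ in range(levels):
        feats.append(down(feats[-1]))
    # encoder: high -> low, middle: coarsest, decoder: low -> high
    return feats[:-1], feats[-1], feats[-2::-1]

def toy_unet(x, enc_cond_emb, mid_cond_emb, dec_cond_emb):
    skips = []
    for cond in enc_cond_emb:
        x = x + attend(x, cond)        # inject the condition at this scale
        skips.append(x)
        x = down(x)
    x = x + attend(x, mid_cond_emb)    # bottleneck
    for cond, skip in zip(dec_cond_emb, reversed(skips)):
        x = up(x) + skip
        x = x + attend(x, cond)
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 16, 16))
source = rng.standard_normal((1, 4, 16, 16))
enc, mid, dec = build_cond_pyramid(source, levels=2)
print([e.shape[-1] for e in enc], mid.shape[-1], [d.shape[-1] for d in dec])  # [16, 8] 4 [8, 16]
print(toy_unet(x, enc, mid, dec).shape)  # (1, 4, 16, 16)
```

The point of the three groups is that each UNet stage attends to a condition map of its own resolution, so texture detail from every scale can reach the noise prediction rather than only the deepest feature.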
