This repository has been archived by the owner on Apr 28, 2024. It is now read-only.

I found some confusing code in peft #2

Open
guokan987 opened this issue Mar 15, 2024 · 3 comments
Labels: question (Further information is requested)

Comments

@guokan987

Code: `result_dora = (mag_norm_scale - 1) * (F.linear(x, transpose(weight, self.fan_in_fan_out))) + mag_norm_scale * lora_B(lora_A(x)) * scaling`
Question: what is the role of the factors (mag_norm_scale - 1) and mag_norm_scale? Also, because of the (mag_norm_scale - 1) factor, it seems that result_dora cannot equal F.linear(x, transpose(weight, self.fan_in_fan_out)) at initialization.
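A minimal numpy sketch of the initialization case, under stated assumptions: `result_dora` is added on top of the frozen base layer's output rather than replacing it (my reading of peft's LoRA layers, not confirmed in this thread), `lora_B` is zero-initialized, and the magnitude `m` starts as the per-output-feature norm of the pretrained weight. The names `W0`, `A`, `B`, and `m` are hypothetical stand-ins for the layer's tensors, and numpy replaces PyTorch only to keep the example dependency-light.

```python
import numpy as np

rng = np.random.default_rng(0)
out_f, in_f, r = 4, 6, 2
W0 = rng.normal(size=(out_f, in_f))      # frozen pretrained weight, shape (out, in)
A = rng.normal(size=(r, in_f))           # hypothetical lora_A weight
B = np.zeros((out_f, r))                 # hypothetical lora_B weight, zero-initialized
scaling = 2.0
m = np.linalg.norm(W0, axis=1)           # assumed magnitude init: norm of each output row of W0

W = W0 + scaling * (B @ A)               # V + dV, identical to W0 because B is all zeros
mag_norm_scale = m / np.linalg.norm(W, axis=1)  # one scale per output feature

x = rng.normal(size=(3, in_f))
base = x @ W0.T                          # same as F.linear(x, W0) with no bias
lora = (x @ A.T) @ B.T * scaling         # lora_B(lora_A(x)) * scaling
result_dora = (mag_norm_scale - 1) * base + mag_norm_scale * lora
output = base + result_dora              # assumed: DoRA term is added to the base output

print(np.allclose(mag_norm_scale, 1.0))  # True
print(np.allclose(result_dora, 0.0))     # True
print(np.allclose(output, base))         # True
```

Under these assumptions `mag_norm_scale` is exactly 1 at initialization, so `(mag_norm_scale - 1)` zeroes the first term, the LoRA term is zero, and the layer output equals the base output even though `result_dora` itself is not `F.linear(x, W0)`.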

@nbasyl
Owner

nbasyl commented Mar 15, 2024

Hi, you can refer to this formula:
[image: formula]
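One way to read the quoted code line against the DoRA weight decomposition, assuming the layer returns the base output $XW_0$ plus `result_dora`. Here $s$ stands for `mag_norm_scale` (the magnitude $m$ divided by the column-wise norm $\lVert V + \Delta V \rVert_c$, one value per output feature, so $s$ scales each output column), $V = W_0$, $\Delta V$ is the scaled LoRA update, and dropout is treated as the identity. This is an illustrative derivation, not the formula from the image.

```math
\underbrace{XW_0}_{\text{base layer}} + \underbrace{(s - 1)\,XW_0 + s\,X\Delta V}_{\text{result\_dora}}
= s\,X(W_0 + \Delta V)
= X\,\frac{m}{\lVert V + \Delta V \rVert_c}\,(V + \Delta V)
```

With a dropout $D$ applied only to the adapter branch, the same code computes $XW_0 + (s - 1)\,D(X)W_0 + s\,D(X)\Delta V$, which reduces to the expression above when $D$ is the identity (for example at inference).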

@guokan987
Author

Thanks to the authors, but I don't understand why the following equation is used during training:
$XW_0 + \mathrm{dropout}(X)\,\frac{m}{\lVert V + \Delta V \rVert}\,(V + \Delta V)$
The effect of the term shown in this image is unclear to me:
[image]

@nbasyl
Owner

nbasyl commented Mar 18, 2024

Note that (mag_norm_scale - 1) * (F.linear(x, transpose(weight, self.fan_in_fan_out))) must be included to apply dropout properly; otherwise, the outcome would be inaccurate. You can refer to huggingface/peft#1474, where we discuss this.
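A hedged numpy check of the point above, assuming the base layer sees the undropped `x` while only the DoRA branch sees `dropout(x)` (my reading of the peft forward pass, not stated in this thread). All function and variable names are hypothetical. With dropout disabled, the base output plus `result_dora` should match a layer that uses the merged DoRA weight `m * (V + dV) / ||V + dV||` directly; removing the `(mag_norm_scale - 1)` term should break that match.

```python
import numpy as np

def merged_scale(W0, A, B, m, scaling):
    """Return mag_norm_scale (m / per-output-feature norm of V + dV) and V + dV."""
    W = W0 + scaling * (B @ A)               # V + dV, shape (out_features, in_features)
    return m / np.linalg.norm(W, axis=1), W

def dora_forward(x, W0, A, B, m, scaling, drop_mask):
    """Base output on clean x, plus the quoted result_dora computed on dropout(x)."""
    s, _ = merged_scale(W0, A, B, m, scaling)
    x_d = x * drop_mask                      # stand-in for dropout(x)
    base = x @ W0.T                          # base layer sees the undropped input
    lora = (x_d @ A.T) @ B.T * scaling       # lora_B(lora_A(dropout(x))) * scaling
    result_dora = (s - 1) * (x_d @ W0.T) + s * lora
    return base + result_dora

def dora_forward_no_correction(x, W0, A, B, m, scaling, drop_mask):
    """Same as dora_forward but without the (s - 1) * F.linear(x_d, W0) term."""
    s, _ = merged_scale(W0, A, B, m, scaling)
    x_d = x * drop_mask
    return x @ W0.T + s * ((x_d @ A.T) @ B.T * scaling)

def dora_reference(x, W0, A, B, m, scaling):
    """Apply the merged DoRA weight m * (V + dV) / ||V + dV|| with no dropout."""
    s, W = merged_scale(W0, A, B, m, scaling)
    return x @ (s[:, None] * W).T

rng = np.random.default_rng(0)
out_f, in_f, r, batch = 4, 6, 2, 3
W0 = rng.normal(size=(out_f, in_f))
A = rng.normal(size=(r, in_f))
B = rng.normal(size=(out_f, r))              # nonzero, as if after some training
m = np.linalg.norm(W0, axis=1)               # magnitude initialized from W0
x = rng.normal(size=(batch, in_f))
no_drop = np.ones_like(x)                    # dropout disabled (eval mode)

full = dora_forward(x, W0, A, B, m, 0.5, no_drop)
ref = dora_reference(x, W0, A, B, m, 0.5)
naive = dora_forward_no_correction(x, W0, A, B, m, 0.5, no_drop)
print(np.allclose(full, ref))    # True
print(np.allclose(naive, ref))   # False
```

In this arrangement the unscaled base path `x @ W0.T` never sees dropout, while the magnitude correction and the LoRA update both use `dropout(x)`; with dropout off, the pieces recombine into the exact DoRA output.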

nbasyl added the question (Further information is requested) label on Apr 20, 2024