Some confusing code in peft #2

guokan987 · 2024-03-15T06:59:35Z

code: result_dora = (mag_norm_scale - 1) * (F.linear(x, transpose(weight, self.fan_in_fan_out))) + mag_norm_scale * lora_B(lora_A(x)) * scaling
Question: What is the effect of (mag_norm_scale - 1) and mag_norm_scale? Also, because of the (mag_norm_scale - 1) factor, result_dora cannot equal F.linear(x, transpose(weight, self.fan_in_fan_out)) at initialization.

nbasyl · 2024-03-15T11:48:37Z

Hi, you can refer to this formula:

guokan987 · 2024-03-17T05:05:43Z

Thanks, authors, but I can't understand why the following equation is used:
Training: XW_0 + dropout(X) (m / norm(V + deltaV)) (V + deltaV),
the effect of

is unclear

nbasyl · 2024-03-18T03:28:39Z

Note that (mag_norm_scale - 1) * (F.linear(x, transpose(weight, self.fan_in_fan_out))) must be included to properly apply dropout; otherwise, the outcome would be inaccurate. You can refer to huggingface/peft#1474 where we discuss this.
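To make the role of the two factors concrete, here is a minimal pure-Python sketch, not the PEFT implementation. It assumes that result_dora is added on top of the base layer's output, that the norm is taken per output row of W_0 + scaling * B A, and that lora_B starts at zero with the magnitude initialized to the norm of W_0. The helper names (matvec, merged_weight, dora_delta, dora_full) and all numbers are made up for illustration. Under these assumptions mag_norm_scale is 1 at initialization, so result_dora is zero rather than equal to the base output, and the sum with the base output reproduces the base layer.

```python
import math

def matvec(W, x):
    """Multiply an (out x in) matrix by a length-in vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def merged_weight(W0, A, B, scaling):
    """W0 + scaling * (B @ A); B is (out x r), A is (r x in)."""
    return [
        [W0[i][j] + scaling * sum(B[i][k] * A[k][j] for k in range(len(A)))
         for j in range(len(W0[0]))]
        for i in range(len(W0))
    ]

def row_norms(W):
    """L2 norm of each output row (assumed norm convention)."""
    return [math.sqrt(sum(v * v for v in row)) for row in W]

def dora_delta(x, W0, A, B, magnitude, scaling):
    """(mag_norm_scale - 1) * W0 x + mag_norm_scale * B(A x) * scaling."""
    norms = row_norms(merged_weight(W0, A, B, scaling))
    mag_norm_scale = [m / n for m, n in zip(magnitude, norms)]
    base = matvec(W0, x)
    lora = matvec(B, matvec(A, x))
    return [(s - 1) * b + s * l * scaling
            for s, b, l in zip(mag_norm_scale, base, lora)]

def dora_full(x, W0, A, B, magnitude, scaling):
    """Reference DoRA output: (m / norm(W')) * W' x with W' = W0 + scaling * B A."""
    W = merged_weight(W0, A, B, scaling)
    return [m / n * y for m, n, y in zip(magnitude, row_norms(W), matvec(W, x))]

# Hypothetical toy values: 2 inputs, 2 outputs, rank 1.
W0 = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]
scaling = 2.0
x = [1.0, -2.0]

# Initialization: B = 0 and magnitude = row norms of W0, so mag_norm_scale = 1.
B_init = [[0.0], [0.0]]
m_init = row_norms(W0)
delta_init = dora_delta(x, W0, A, B_init, m_init, scaling)
out_init = [b + d for b, d in zip(matvec(W0, x), delta_init)]

# After some hypothetical training, without dropout.
B = [[0.1], [0.2]]
m = [2.0, 6.0]
out_trained = [b + d for b, d in zip(matvec(W0, x), dora_delta(x, W0, A, B, m, scaling))]
ref_trained = dora_full(x, W0, A, B, m, scaling)

# With dropout (p = 0.5, second input dropped, survivor rescaled by 1 / (1 - p)).
x_drop = [2.0, 0.0]
delta_drop = dora_delta(x_drop, W0, A, B, m, scaling)
ref_drop = [f - b for f, b in zip(dora_full(x_drop, W0, A, B, m, scaling),
                                  matvec(W0, x_drop))]
```

Without dropout, base output plus result_dora collapses to the full DoRA output, because W_0 x + (s - 1) W_0 x = s W_0 x. With dropout, the base output is computed on the undropped x, so result_dora has to carry the whole difference between the DoRA output and the base output on the dropped input, which is what the (mag_norm_scale - 1) term provides.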

nbasyl added the question label (Further information is requested) on Apr 20, 2024
