
[FR] (Q)DoRA #893

Open
DreamGenX opened this issue Apr 28, 2024 · 6 comments

@DreamGenX

(Q)DoRA, an alternative to (Q)LoRA, is quickly proving to be a superior technique for closing the gap between full fine-tuning (FFT) and PEFT.

Known existing implementations:

Field reports / comparisons to LoRA:

@rohan-varma
Member

Thanks for filing this issue! The torchtune team has definitely been following the progress around (Q)DoRA and is super interested in adding it. Stay tuned for more thoughts and discussion soon!

@RdoubleA
Contributor

@DreamGenX If this is something you're interested in contributing to the library, we'd be happy to work with you on it :)

@Prakyathkantharaju

@RdoubleA I added a DoRA-based update to my fork. If you approve, I can submit a pull request.
Link to the DoRA update line here: https://github.com/Prakyathkantharaju/torchtune/blob/aefb8cbb02712177d690ca65cbac480fcb8ac429/torchtune/modules/peft/lora.py#L137

@ebsmothers
Contributor

Hi @Prakyathkantharaju, thanks for sharing the fork. If I understand correctly, you are returning Wx + (m * BAx / ||BAx||), is that right? (Here W is the original weight matrix, x is the input, A and B the usual LoRA matrices, and m the magnitude vector.) If I'm reading eq. (5) of the DoRA paper correctly, don't we want to instead return m * [(W + BA) / ||W + BA||]x? Please let me know if I'm misunderstanding the weight update here.
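For reference, here is a minimal numerical sketch of the eq. (5) formulation discussed above, written in numpy with the nn.Linear storage convention (weight of shape `(out_features, in_features)`, so the paper's "column-wise" norm becomes a per-row norm here). The shapes and initialization are illustrative assumptions, not torchtune code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2

# nn.Linear-style storage: weight has shape (out_features, in_features), y = W @ x
W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.1      # LoRA down-projection
B = rng.normal(size=(d_out, r)) * 0.1     # LoRA up-projection
x = rng.normal(size=d_in)

# Magnitude vector m, initialized to the norm of each row of W
# (the paper's column-wise norm, expressed in (out, in) storage layout)
m = np.linalg.norm(W, axis=1)             # shape (d_out,)

# Eq. (5): W' = m * (W + BA) / ||W + BA||, norm taken per output row
W_merged = W + B @ A
norms = np.linalg.norm(W_merged, axis=1)  # shape (d_out,)
W_dora = (m / norms)[:, None] * W_merged  # broadcast scale over rows
y = W_dora @ x                            # DoRA forward pass
```

By construction, each row of `W_dora` has norm `m`, which is the point of the decomposition: direction comes from `W + BA`, magnitude from the learned `m`.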

@Prakyathkantharaju

Hello @ebsmothers ,

Yes, that is correct. The method I implemented differs slightly from the one presented in the paper; I based my code on this repo: https://github.com/rasbt/dora-from-scratch/blob/main/Using-LinearDoRA.ipynb

I updated the code based on the author's recommendation here, which apparently makes inference faster on GPU.

Here is the reference for the author's comment: huggingface/peft#1474 (comment)
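I haven't verified that this is exactly the optimization from the linked comment, but one way such a refactoring can speed things up is that the per-row scale in eq. (5) can be applied to the outputs rather than the weights, so the frozen base matmul and the cheap low-rank path are reused instead of materializing the merged, normalized weight on every forward pass. A sketch under that assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r = 8, 6, 2
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in)) * 0.1
B = rng.normal(size=(d_out, r)) * 0.1
x = rng.normal(size=d_in)
m = np.linalg.norm(W, axis=1)             # magnitude vector, shape (d_out,)

# Naive: materialize the merged, normalized weight on every forward pass
W_merged = W + B @ A
norms = np.linalg.norm(W_merged, axis=1)
y_naive = ((m / norms)[:, None] * W_merged) @ x

# Factored: the per-row scale commutes with the matmul, so apply it to the
# output and reuse W @ x plus the low-rank path B @ (A @ x)
y_factored = (m / norms) * (W @ x + B @ (A @ x))

assert np.allclose(y_naive, y_factored)
```

The two forms are algebraically identical; the factored one avoids a dense (d_out, d_in) weight construction per forward, which matters when W is large and frozen.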

I have added references in the code as well.

I have also opened pull request #936 so that any changes can be tracked more clearly.

@ebsmothers
Contributor

Thanks for the clarification @Prakyathkantharaju, and thanks for opening the PR. I agree, let's consolidate the discussion on the PR. I left a few initial comments, let's continue the conversation there.
