
[FR] (Q)DoRA #893

Open
DreamGenX opened this issue Apr 28, 2024 · 6 comments

@DreamGenX

(Q)DoRA, an alternative to (Q)LoRA, is quickly proving to be a superior technique for closing the gap between full fine-tuning (FFT) and PEFT.

Known existing implementations:

Field reports / comparisons to LoRA:

@rohan-varma
Member

Thanks for filing this issue! The torchtune team has definitely been following the progress around (Q)DoRA and is super interested in adding it. Stay tuned for more thoughts and discussion soon!

@RdoubleA
Contributor

@DreamGenX If this is something you're interested in contributing to the library, we'd be happy to work with you on it :)

@Prakyathkantharaju

@RdoubleA I added a DoRA-based update to my fork. If you approve, I can submit a pull request.
Link to the DoRA update line here: https://github.com/Prakyathkantharaju/torchtune/blob/aefb8cbb02712177d690ca65cbac480fcb8ac429/torchtune/modules/peft/lora.py#L137

@ebsmothers
Contributor

Hi @Prakyathkantharaju, thanks for sharing the fork. If I understand correctly, you are returning Wx + (m * BAx / ||BAx||), is that right? (Here W is the original weight matrix, x is the input, A and B the usual LoRA matrices, and m the magnitude vector.) If I'm reading eq. (5) of the DoRA paper correctly, don't we want to instead return m * [(W + BA) / ||W + BA||]x? Please let me know if I'm misunderstanding the weight update here.
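For reference, here is a minimal numerical sketch of the eq. (5) formulation discussed above, written in numpy with the nn.Linear storage convention (weight of shape `(out_features, in_features)`, so the paper's "column-wise" norm becomes a per-row norm here). The shapes and initialization are illustrative assumptions, not torchtune code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2

# nn.Linear-style storage: weight has shape (out_features, in_features), y = W @ x
W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.1      # LoRA down-projection
B = rng.normal(size=(d_out, r)) * 0.1     # LoRA up-projection
x = rng.normal(size=d_in)

# Magnitude vector m, initialized to the norm of each row of W
# (the paper's column-wise norm, expressed in (out, in) storage layout)
m = np.linalg.norm(W, axis=1)             # shape (d_out,)

# Eq. (5): W' = m * (W + BA) / ||W + BA||, norm taken per output row
W_merged = W + B @ A
norms = np.linalg.norm(W_merged, axis=1)  # shape (d_out,)
W_dora = (m / norms)[:, None] * W_merged  # broadcast scale over rows
y = W_dora @ x                            # DoRA forward pass
```

By construction, each row of `W_dora` has norm `m`, which is the point of the decomposition: direction comes from `W + BA`, magnitude from the learned `m`.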

@Prakyathkantharaju

Hello @ebsmothers ,

Yes, that is correct. The method I implemented differs slightly from the one presented in the paper; I based my code on this repo: https://github.com/rasbt/dora-from-scratch/blob/main/Using-LinearDoRA.ipynb

I updated the code based on the author's recommendation here, which apparently makes inference faster on GPU.

Here is the reference for the author's comment: huggingface/peft#1474 (comment)
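I haven't verified that this is exactly the optimization from the linked comment, but one way such a refactoring can speed things up is that the per-row scale in eq. (5) can be applied to the outputs rather than the weights, so the frozen base matmul and the cheap low-rank path are reused instead of materializing the merged, normalized weight on every forward pass. A sketch under that assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r = 8, 6, 2
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in)) * 0.1
B = rng.normal(size=(d_out, r)) * 0.1
x = rng.normal(size=d_in)
m = np.linalg.norm(W, axis=1)             # magnitude vector, shape (d_out,)

# Naive: materialize the merged, normalized weight on every forward pass
W_merged = W + B @ A
norms = np.linalg.norm(W_merged, axis=1)
y_naive = ((m / norms)[:, None] * W_merged) @ x

# Factored: the per-row scale commutes with the matmul, so apply it to the
# output and reuse W @ x plus the low-rank path B @ (A @ x)
y_factored = (m / norms) * (W @ x + B @ (A @ x))

assert np.allclose(y_naive, y_factored)
```

The two forms are algebraically identical; the factored one avoids a dense (d_out, d_in) weight construction per forward, which matters when W is large and frozen.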

I have added references in the code as well.

I have also opened pull request #936 so that any changes can be tracked more clearly.

@ebsmothers
Contributor

Thanks for the clarification @Prakyathkantharaju, and thanks for opening the PR. I agree, let's consolidate the discussion on the PR. I left a few initial comments, let's continue the conversation there.
