Staging PR for implementing Phi-2 support. #97

Open
wants to merge 54 commits into base: main

Conversation


@cm2435 cm2435 commented Jan 18, 2024

….org/main/getting-started/tutorials/05-layer-norm.html]

@cm2435 cm2435 changed the title from Staging PR for implementing Phi-2 support. to Staging PR for implementing Phi-2 support. addresses #85 Jan 18, 2024
@cm2435 cm2435 changed the title from Staging PR for implementing Phi-2 support. addresses #85 back to Staging PR for implementing Phi-2 support. Jan 18, 2024
@cm2435
Author

cm2435 commented Jan 18, 2024

Addresses #85

@danielhanchen
Contributor

@cm2435 Oh cool, great work!! I was just working on the Jan release :) Yes, torch.allclose is what I normally use!!
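
For reference, a minimal sketch of that kind of closeness check (a plain PyTorch layer norm stands in for a Triton kernel here; shapes and tolerances are illustrative):

import torch
import torch.nn.functional as F

def naive_layernorm(x, weight, bias, eps=1e-5):
    # Plain PyTorch reference; a Triton kernel's output would be compared the same way.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps) * weight + bias

torch.manual_seed(0)
x = torch.randn(4, 128, 2560)
w = torch.randn(2560)
b = torch.randn(2560)

out_custom = naive_layernorm(x, w, b)
out_ref = F.layer_norm(x, (2560,), w, b, eps=1e-5)
# fp16 Triton kernels rarely match bit-for-bit, so real tests need looser tolerances.
assert torch.allclose(out_custom, out_ref, rtol=1e-4, atol=1e-4)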

@cm2435
Author

cm2435 commented Jan 22, 2024

@danielhanchen I've implemented the ReLU and LayerNorm kernels for the model and added some test coverage around their numerical closeness. Mind having a review when you get a minute? ty!

@danielhanchen
Contributor

@cm2435 Great work again!! I'll do a PR review! :)

@danielhanchen
Contributor

@cm2435 Super great work! Some questions and notes from my side:

  1. torch.manual_seed sadly only sets the seed on the CPU side - maybe you meant torch.cuda.manual_seed? I normally use transformers.set_seed (see the seeding sketch at the end of this comment).
  2. On ReLU - is this just an extra kernel you would like to have, or is this related to Phi? Phi I think uses gelu_new and not ReLU, and one issue I needed to solve was finding the derivative of GeLU. ReLU's derivative is simple, since it's just either 1 or 0 depending on whether the unit fired, multiplied by the backprop deltas.
  3. Triton's layernorm sadly is extremely cumbersome last I checked - https://github.com/lucidrains/triton-transformer/blob/main/triton_transformer/layernorm.py might be more useful - in fact lucidrains' other kernels might be useful too - I haven't checked them all yet though.
  4. Another issue I need to investigate is what "partial RoPE" is. Phi uses a partial rotary factor of 0.4 (40%):
# Partial rotary embedding
query_rot, query_pass = (
    query_states[..., : self.rotary_emb.dim],
    query_states[..., self.rotary_emb.dim :],
)
key_rot, key_pass = (
    key_states[..., : self.rotary_emb.dim],
    key_states[..., self.rotary_emb.dim :],
)
# [batch_size, seq_length, num_heads, head_dim // config.partial_rotary_factor]
query_rot, key_rot = apply_rotary_pos_emb(query_rot, key_rot, cos, sin, position_ids)

I'm assuming this is RoPE but only on the first 40% of head dimensions? Super weird - this'll affect the RoPE kernel as well.
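
For what it's worth, a minimal PyTorch sketch of what that split amounts to (illustrative only; it follows the usual rotate-half convention and assumes cos/sin are already broadcastable to the rotary slice, which may not match Phi's kernel exactly):

import torch

def rotate_half(x):
    # Standard RoPE helper: swap and negate the two halves of the rotary slice.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_partial_rope(q, cos, sin, partial_rotary_factor=0.4):
    # q: [batch, heads, seq, head_dim]. Only the first
    # head_dim * partial_rotary_factor dims are rotated; the rest pass through.
    rotary_dim = int(q.shape[-1] * partial_rotary_factor)
    q_rot, q_pass = q[..., :rotary_dim], q[..., rotary_dim:]
    q_rot = q_rot * cos + rotate_half(q_rot) * sin
    return torch.cat((q_rot, q_pass), dim=-1)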

All in all, super good work - Phi can just be very annoying to work with, since I'm not sure why the authors of Phi decided to dramatically change so much from Llama, i.e.:

  1. GeLU instead of SwiGLU
  2. Partial RoPE instead of full RoPE (full RoPE doesn't make training that much slower, so I'm unsure why they're doing this - Stability AI's Stable Code release also did this - unsure of their reasoning?)
  3. Residual dropout after attention and the MLP, i.e. dropping 10% of the attention and MLP outputs - I guess this partially makes sense as a way to counteract overfitting.
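
On point 1, a minimal seeding sketch for the tests (transformers.set_seed versus seeding torch by hand; the seed value is arbitrary):

import torch
from transformers import set_seed

# set_seed seeds Python's random, NumPy, and torch (CPU and every CUDA device) in one call.
set_seed(3407)

# Rough manual equivalent for the torch side only:
torch.manual_seed(3407)           # CPU generator
torch.cuda.manual_seed_all(3407)  # all CUDA device generators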

@cm2435
Author

cm2435 commented Jan 24, 2024

Yep. I'm dumb and misread GeLU as ReLU. I need more coffee in my life, haha.

I'll take a crack this week at contributing GeLU forward and backward kernels, and tidy up the other bits.

Yeah, that partial rotary embedding is weird; some GitHub posts I've found on it suggest it was done so the rotational transform is only applied to the first X% of the embedding dimensions.

To quote the author in that issue:
"yup, we found that it worked very slightly better, never any worse, in the exception that the value is too low (I saw performance dip when frequencies reach around 6 - dimension of 24)"

It will affect any Triton kernels for it, but it shouldn't be too hard - surely it's just an extra kernel parameter for what fraction of the embedding to apply the transform to.

@cm2435
Author

cm2435 commented Jan 24, 2024

As a separate point: what are you using as your reference implementation? Are you working off the paper, or do you have code?
I've found transformers code, but that's for Phi-1, which seems to diverge a good chunk in architecture.

@danielhanchen
Contributor

@cm2435 Interesting on partial RoPE!! I did hear it was used in the new Stable Code - to quote https://huggingface.co/stabilityai/stable-code-3b:

Position Embeddings: Rotary Position Embeddings (Su et al., 2021) applied to the first 25% of head embedding dimensions for improved throughput following Black et al. (2022).

Unsure whether accuracy would improve - I actually thought this might make Phi-2 not RoPE-scalable, i.e. it sadly cannot be adjusted "properly" to handle longer sequences. Unsure though. Throughput is extremely unlikely to be an issue if one uses good kernels.

But I'll research more on the increased accuracy - that's interesting! Implementation should hopefully be fine - we can just take the same view as what Phi-2 did.

In terms of reference implementation - transformers 4.37 got released - you can now follow their official implementation here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/phi/modeling_phi.py
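
For comparison tests, a minimal sketch of pulling the reference model from that implementation (assuming transformers >= 4.37 and the microsoft/phi-2 checkpoint; the prompt is just an example):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Official Phi implementation shipped with transformers >= 4.37.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", torch_dtype=torch.float16
).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to("cuda")
with torch.no_grad():
    ref_logits = model(**inputs).logits

# ref_logits can then be checked against the patched model's output with
# torch.allclose under loose fp16 tolerances.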

@danielhanchen
Contributor

@cm2435 So I tried Wolfram but then found this paper: https://arxiv.org/pdf/2305.12073.pdf

[screenshot from the paper]

Absolute horrible nightmare

@cm2435
Author

cm2435 commented Jan 24, 2024

@danielhanchen I've not had time to read that paper, but I'd be careful, because at a glance that looks like an approximation.

GeLU involves an $erf(x)$ term; those $tanh(...)$ terms are approximations:
https://paperswithcode.com/method/gelu

I think it should be fine, but that's not strictly the derivative. I've got a free hour or two, so I'll take a swing at an implementation.
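
For reference, the exact (erf-based) GeLU and its derivative, which the $tanh$ form approximates:

$$\mathrm{GeLU}(x) = x\,\Phi(x) = \frac{x}{2}\left(1 + \mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right), \qquad \mathrm{GeLU}'(x) = \Phi(x) + x\,\phi(x) = \frac{1}{2}\left(1 + \mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right) + \frac{x}{\sqrt{2\pi}}\,e^{-x^{2}/2}$$

where $\Phi$ and $\phi$ are the standard normal CDF and PDF.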

@danielhanchen
Contributor

@cm2435 Oh I don't think Phi uses the error function, but rather an approximation

@cm2435
Author

cm2435 commented Jan 24, 2024

@danielhanchen That's true. I've taken a first punt at a forward and backward GeLU implementation, but I still need to test it. After that I'll get onto either the partial RoPE factor or the dropout.
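
A minimal PyTorch sketch of such a forward/backward pair and one way to sanity-check it (plain autograd here, standing in for the Triton kernels; shapes are illustrative):

import math
import torch

class ExactGELU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        cdf = 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))          # Phi(x)
        pdf = torch.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)   # phi(x)
        return grad_out * (cdf + x * pdf)

x = torch.randn(64, dtype=torch.float64, requires_grad=True)
# gradcheck compares the analytic backward against finite differences.
assert torch.autograd.gradcheck(ExactGELU.apply, (x,))
assert torch.allclose(ExactGELU.apply(x), torch.nn.functional.gelu(x))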

@danielhanchen
Contributor

@cm2435 Cool fabulous work!

@danielhanchen danielhanchen mentioned this pull request Feb 2, 2024
@cm2435
Author

cm2435 commented Feb 4, 2024

@danielhanchen It's been a minute since I pushed something, but some progress: I've patched the RoPE scaling kernel to take in a partial scaling factor (which I think is in line with what is done in Phi, but you will absolutely want to check). I've also done a basic pass at a seeded dropout kernel based on lucidrains' implementation and the Triton tutorials.

If those look good, then based on the first message you sent I think we can move on to implementing the model?
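
For reference, the seeded dropout pattern from the Triton tutorials that this follows (a sketch only, not the exact kernel in this PR; it assumes a flat, contiguous CUDA tensor):

import torch
import triton
import triton.language as tl

@triton.jit
def seeded_dropout_kernel(x_ptr, out_ptr, n_elements, p, seed, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    # tl.rand is deterministic given (seed, offset), so the same seed
    # reproduces the same dropout mask without ever storing it.
    random = tl.rand(seed, offsets)
    keep = random > p
    out = tl.where(keep, x / (1 - p), 0.0)
    tl.store(out_ptr + offsets, out, mask=mask)

def seeded_dropout(x, p=0.1, seed=0):
    out = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    seeded_dropout_kernel[grid](x, out, n, p, seed, BLOCK_SIZE=1024)
    return out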

danielhanchen and others added 29 commits February 26, 2024 21:25
* Squashed merge commits bringing upstream main into this branch - "2x faster inference (#151)", "Hotfix - fix inference (#146)", "Fix inference attention mask (#142)", "Nightly (#140)", plus README revamps - covering faster saving & inference, RoPE scaling fixes, the transformers 4.37 update, and repeated edits to llama.py, mistral.py, fast_lora.py, swiglu.py, and save.py.

    * print

    * Mistral patch

    * Update mistral.py

    * Update save.py

    * saving

commit 8faf469f028a05852b2dc29ec8df1f36998fab33
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Mon Jan 29 02:52:39 2024 +1100

    Fix saving issues (#139)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

    * Update mistral.py

    * attention mask

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Patch saving

    * Update save.py

    * Update save.py

    * patch_saving_functions

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * print

commit 1ecc0185a5759c7a0c95dfc96aceea5023cebdfc
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:30:29 2024 +1100

    1 more bug (#138)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

    * Update save.py

    * Update save.py

commit cd32ba76b71adf3317ede9de7d1cf6f30ad3bf0d
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 28 04:20:06 2024 +1100

    Fix bugs + more accurate Swiglu (#137)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

    * Works?

    * Update pyproject.toml

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Swiglu

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * attention_mask

    * Update llama.py

    * Update llama.py

    * labels

    * Update mistral.py

    * Update llama.py

    * attention mask

commit 89daa0efcc38c7690abbb8170b5d9f3d364796ce
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:50:22 2024 +1100

    Inference bug fix (#134)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Revert "Update llama.py"

    This reverts commit a208ec46e012cf470ecefe6268a66358215df7b6.

    * Update llama.py

commit 87a7ef1049f6fca409a0673f51f4758e0aff248d
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 27 04:47:54 2024 +1100

    More bug fixes (#133)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update llama.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update fast_lora.py

    * Update save.py

    * Update fast_lora.py

    * Update utils.py

    * Update llama.py

    * Update fast_lora.py

    * Update swiglu.py

    * Update save.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

commit 3d67790901696e953171f64b4bf9d980780051a0
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 26 04:19:17 2024 +1100

    Fix bugs (#129)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

    * Update llama.py

    * hidden_states

    * q_len == 1

    * q_len issue

    * Update mistral.py

    * Update mistral.py

    * incorrect inference

    * Update to transformers 4.37

    * Graceful FA2 error + torch 2.1.1

    * Update mapper.py

    * Update pyproject.toml

    * Fix saving and bnb-4bit

    * Update fast_lora.py

    * Update fast_lora.py

    * remove patching

    * Update llama.py

    * Update llama.py

    * Update swiglu.py

    * Repatch

    * Update fast_lora.py

commit a833f403462e9cfc1f96b3b84d9da15d7d8db5ee
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Tue Jan 23 03:55:24 2024 +1100

    2-4x faster native HF inference (#119)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * fast inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Mistral correct RoPE scaling

    * Max sequence lengths

    * Apache 2

    * fast_linear_forward

    * Update utils.py

    * Update utils.py

    * No print

    * Update utils.py

    * Update utils.py

    * inference

    * Update llama.py

    * Fast inference RoPE

    * Update llama.py

    * Update llama.py

    * RoPE

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * LoRA

    * Fast LoRA saving

commit b370c9c8aacc31a7845404566dd95dfa8c0e3bac
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 21 22:20:22 2024 +1100

    Hotfix (#118)

    * faster saving & inference

    * Update llama.py

    * Update save.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update mistral.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update llama.py

commit 57a5b5a49da588b1db8e9a988cc985dc20393d34
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 05:00:37 2024 +1100

    Update save.py

commit 5145a61e69ab9b3035465f649e1c1e5aae749f8f
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 04:21:54 2024 +1100

    Update save.py

commit a7bd8d119c16433de4f8b6a36903ef7131f225e5
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sun Jan 21 04:13:03 2024 +1100

    Update save.py

commit be4b97e7d89074b6dd1d2e984fa429051d328192
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sun Jan 21 03:43:49 2024 +1100

    Fixed saving! (#113)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

    * Fix quantization_method

    * Fix quantization_config

    * patch model

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update save.py

    * Update save.py

    * tokenizer_save_settings

    * Update save.py

    * quantization and loftq

    * Update save.py

    * Update llama.py

    * Update save.py

    * upload_to_huggingface

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

commit abb462be71e8cf01ad989dca0efaa17441113651
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 20 23:23:00 2024 +1100

    Hotfix for Jan 2024 Release (#110)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

    * Fix quantization_method

    * Fix quantization_config

    * patch model

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Update save.py

    * Update save.py

    * tokenizer_save_settings

    * Update save.py

    * quantization and loftq

    * Update save.py

    * Update llama.py

    * Update save.py

commit 31e2d71720e64b854145d7779833b7d2d3d4177e
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Sat Jan 20 04:25:06 2024 +1100

    Quick fixes (#106)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

    * RSLoRA and LoftQ direct support

    * Update llama.py

    * Update llama.py

    * Update llama.py

    * Fix DPO + GGUF

commit 8846337e5c8c2f206a4ac8fe6d239f3d1221f7ac
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Sat Jan 20 02:30:31 2024 +1100

    Update _utils.py

commit d378df87e5f3945474915a098c9aa58313465064
Merge: c1e7480 920e3c2
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 23:15:38 2024 +1100

    Merge branch 'main' of https://github.com/unslothai/unsloth

commit c1e7480ac2ad0e5efa05e84fe0997619ccdd86a4
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 23:15:20 2024 +1100

    Revert quantization methods

commit 920e3c2ea07a044addeb7c3fa8be6f0189cb7f84
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 22:57:22 2024 +1100

    getattr issues (#103)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

    * getattr

commit fc25ab0df032f8ee5ea750f27c68d63f49d2d9a9
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 22:52:30 2024 +1100

    Quick fixes (#101)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

    * Quick fixes

    * Update llama.py

    * Update llama.py

    * Update dpo.py

    * Update dpo.py

    * Update llama.py

    * Update save.py

commit b8b1eafda35d124046e11766aeeb6343957e0daf
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Fri Jan 19 04:51:19 2024 +1100

    2024 Release (#96)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

    * Faster saving + other changes

    * Update llama.py

    * Saving modules

    * spelling

    * Update llama.py

    * Update save.py

    * Update save.py

    * Update loader.py

    * Update llama.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * patch saving

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * original_model

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * saving to RAM leakage?

    * Update save.py

    * new_save_directory

    * Update save.py

    * Update save.py

    * Update save.py

    * Update save.py

    * Update pyproject.toml

    * Update pyproject.toml

    * Update pyproject.toml

commit 4112eb4a3df4c0911e36211b47381086c963b4e0
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 03:41:00 2024 +1100

    Update pyproject.toml

commit 59d74753362ff59e664cb6d650b564511e6e20f3
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Fri Jan 19 03:35:17 2024 +1100

    Update pyproject.toml

commit c1ac4d2707574868767345e76ebe49c8353f9057
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Thu Jan 11 04:08:03 2024 +1100

    Fix some bugs (#83)

    * Fix tokenizer, dropout, bias for LoRA

    * Update loader.py

    * Fix LoRA downcasting

    * Update _utils.py

    * Saving to GGUF

    * fix

    * colab_quantize_to_gguf

    * move save modules

    * save module

    * Update __init__.py

    * Update save.py

    * Temp downgrade due to TRL issue

    * Fix up bugs

commit d3887c7fd93d9b910bf6ee3ab3c7fd485fc55e46
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 10 23:10:48 2024 +1100

    Update README.md (#81)

commit b5d94d9a0ad9532494e1b3c7badbb94fa92c50eb
Author: shimmy <107991372+shimmyshimmer@users.noreply.github.com>
Date:   Wed Jan 10 23:10:23 2024 +1100

    Discord button redo (#80)

commit 01d7f58e11373ab07b9282a42bc14f542dbdabf0
Author: shimmy <107991372+shimmyshimmer@users.noreply.github.com>
Date:   Wed Jan 10 23:02:20 2024 +1100

    Update logos (#79)

    * HF Perf Button

    * Update README.md

    Adding new buttons cleanup

    * Update README.md

    * Delete images/Discord.png

    * Delete images/try live demo green.png

    * new transparent logos

    * Revamping page

    * Revamp mainpage

    * Update README.md

    * Update README.md

commit 9faaf5b388e025f8ffc302450a12ffb84e7e1750
Author: Daniel Han <danielhanchen@gmail.com>
Date:   Wed Jan 10 20:03:01 2024 +1100

    Create FUNDING.yml (#78)

commit 82e6fece0b78011707090639823d2d7acf5a3864
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Wed Jan 10 01:02:44 2024 +1100

    fix_tokenizer

commit b52278199b7ae2764f242622275bb8a85ba7b721
Author: Daniel Han-Chen <danielhanchen@gmail.com>
Date:   Tue Jan 9 23:40:43 2024 +1100

    check_tokenizer

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update save.py

* Update fast_lora.py

* Update utils.py

* Update llama.py

* Update fast_lora.py

* Update swiglu.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a208ec4.

* Update llama.py

* Works?

* Update pyproject.toml

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Swiglu

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* attention_mask

* Update llama.py

* Update llama.py

* labels

* Update mistral.py

* Update llama.py

* attention mask

* Update save.py

* Update save.py

* Update mistral.py

* attention mask

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update dpo.py

* Patch saving

* Update save.py

* Update save.py

* patch_saving_functions

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* print

* Mistral patch

* Update mistral.py

* Update save.py

* saving

* Update llama.py

* Update llama.py

* Fast inference repatch

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mistral.py

* Update __init__.py

* Fix inference

* Update mistral.py

* fast lm_head

* Remove fast path

* Update rope_embedding.py

* Update loader.py

* LlamaAttention_fast_forward_inference

* if past_key_value is not None and q_len == 1:

* revert inference

* Update loader.py

* past_key_value

* Update llama.py

* Update llama.py

* Fix SDPA

* Update llama.py

* padding

* Inference

* Update llama.py

* Revert

* Update mistral.py

* faster inference

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* inference

* Update llama.py

* Update utils.py

* faster inference

* Update llama.py

* revert

* lm_head

* Update llama.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* faster inference

* Update llama.py

* fast inference

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* torch compile

* past_key_values

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* fast inference + saving config.json

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* fast inference again

* more temp matrices

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update mistral.py

* Update llama.py

* SDPA

* attention_mask

* New version

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update save.py

* Update save.py

* Torch 2.2.0

* Update save.py

* mistral swa

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Fix SWA inference

* Fix llm_int8_skip_modules

* SWA inference

* Update save.py

* Update save.py

* Update pyproject.toml

* __version__

* __version__

* Update save.py

* Update save.py

* Update mistral.py

* Update fast_lora.py

* Update utils.py

* Update llama.py

* Update fast_lora.py

* Update swiglu.py

* Update save.py

* Update save.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Revert "Update llama.py"

This reverts commit a208ec4.

* Update llama.py

* Works?

* Update pyproject.toml

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Swiglu

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* attention_mask

* Update llama.py

* Update llama.py

* labels

* Update mistral.py

* Update llama.py

* attention mask

* Update save.py

* Update save.py

* Update mistral.py

* attention mask

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update dpo.py

* Patch saving

* Update save.py

* Update save.py

* patch_saving_functions

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* print

* Mistral patch

* Update mistral.py

* Update save.py

* saving

* Update llama.py

* Update llama.py

* Fast inference repatch

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mistral.py

* Update __init__.py

* Fix inference

* Update mistral.py

* fast lm_head

* Remove fast path

* Update rope_embedding.py

* Update loader.py

* LlamaAttention_fast_forward_inference

* if past_key_value is not None and q_len == 1:

* revert inference

* Update loader.py

* past_key_value

* Update llama.py

* Update llama.py

* Fix SDPA

* Update llama.py

* padding

* Inference

* Update llama.py

* Revert

* Update mistral.py

* faster inference

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* inference

* Update llama.py

* Update utils.py

* faster inference

* Update llama.py

* revert

* lm_head

* Update llama.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* faster inference

* Update llama.py

* fast inference

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* torch compile

* past_key_values

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* fast inference + saving config.json

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* fast inference again

* more temp matrices

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update mistral.py

* Update llama.py

* SDPA

* attention_mask

* New version

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update save.py

* Update save.py

* Torch 2.2.0

* Update save.py

* mistral swa

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Fix SWA inference

* Fix llm_int8_skip_modules

* SWA inference

* Update save.py

* Update save.py

* Update pyproject.toml

* __version__

* __version__

* Update save.py

* Update save.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Works?

* Update pyproject.toml

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Swiglu

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update swiglu.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* Update fast_lora.py

* attention_mask

* Update llama.py

* Update llama.py

* labels

* Update mistral.py

* Update llama.py

* attention mask

* Update save.py

* Update save.py

* Update mistral.py

* attention mask

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update dpo.py

* Patch saving

* Update save.py

* Update save.py

* patch_saving_functions

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* print

* Mistral patch

* Update mistral.py

* Update save.py

* saving

* Update llama.py

* Update llama.py

* Fast inference repatch

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update mistral.py

* Update __init__.py

* Fix inference

* Update mistral.py

* fast lm_head

* Remove fast path

* Update rope_embedding.py

* Update loader.py

* LlamaAttention_fast_forward_inference

* if past_key_value is not None and q_len == 1:

* revert inference

* Update loader.py

* past_key_value

* Update llama.py

* Update llama.py

* Fix SDPA

* Update llama.py

* padding

* Inference

* Update llama.py

* Revert

* Update mistral.py

* faster inference

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* inference

* Update llama.py

* Update utils.py

* faster inference

* Update llama.py

* revert

* lm_head

* Update llama.py

* inference

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* faster inference

* Update llama.py

* fast inference

* Update llama.py

* Update llama.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* torch compile

* past_key_values

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update utils.py

* Update llama.py

* fast inference + saving config.json

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update mistral.py

* fast inference again

* more temp matrices

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* fast inference

* Update mistral.py

* Update llama.py

* SDPA

* attention_mask

* New version

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update utils.py

* Update utils.py

* Update save.py

* Update save.py

* Torch 2.2.0

* Update save.py

* mistral swa

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Fix SWA inference

* Fix llm_int8_skip_modules

* SWA inference

* Update save.py

* Update save.py

* Update pyproject.toml

* __version__

* __version__

* Update save.py

* Update save.py

* Update mistral.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Chat Templates

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* patch tokenizer

* Update chat_templates.py

* Saving, LlamaRotaryEmbedding issues

* Update llama.py

* Update mistral.py