Optimize quantized masked fill #162

Merged: 1 commit, Apr 16, 2024


lucasavila00
Contributor

@lucasavila00 lucasavila00 commented Apr 16, 2024

Closes #161

This MR:

Prompt T/s = 70.70707
Completion T/s = 61.244022

Master:

Prompt T/s = 59.82906
Completion T/s = 61.538464
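
For readers comparing the two runs, the relative change can be worked out directly from the four figures above. The following is only a small arithmetic sketch; the helper name `percent_change` is made up for illustration and is not part of this repository. It gives roughly +18% for prompt speed and about -0.5% for completion speed.

```rust
/// Relative change from `baseline` to `candidate`, in percent.
/// (Hypothetical helper, used here only to illustrate the arithmetic.)
fn percent_change(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // Figures copied from the PR description: master first, this MR second.
    let prompt = percent_change(59.82906, 70.70707);
    let completion = percent_change(61.538464, 61.244022);

    println!("prompt:     {prompt:+.2}%");
    println!("completion: {completion:+.2}%");
}
```

A difference of half a percent on a single run is well within the noise one would expect from a shared desktop GPU, so only the prompt figure is a clear signal here.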


Code Metrics Report
  ───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Rust                        60     19995     1439       821    17735       1128
───────────────────────────────────────────────────────────────────────────────
Total                       60     19995     1439       821    17735       1128
───────────────────────────────────────────────────────────────────────────────
Estimated Cost to Develop 53,182
Estimated Schedule Effort 10.982569 months
Estimated People Required 4.474866
───────────────────────────────────────────────────────────────────────────────
Processed 676190 bytes, 0.676 megabytes (SI)
───────────────────────────────────────────────────────────────────────────────
  

@lucasavila00 lucasavila00 marked this pull request as draft April 16, 2024 22:28
@lucasavila00
Contributor Author

It reduces completion speed too much.

I wonder whether that's a measurement artifact or a real regression. I'm profiling it.

@lucasavila00 lucasavila00 marked this pull request as ready for review April 16, 2024 22:48
@lucasavila00
Contributor Author

It was a measurement issue, which is fixed by #163.

@lucasavila00 lucasavila00 marked this pull request as draft April 16, 2024 23:07
@EricLBuehler
Owner

@lucasavila00, I just merged #163. Can you please update the benchmarks?

@lucasavila00 lucasavila00 marked this pull request as ready for review April 16, 2024 23:12
@lucasavila00
Contributor Author

lucasavila00 commented Apr 16, 2024

@EricLBuehler I just did.

I'm having a hard time measuring small changes reliably.

I ran it a bunch of times and completion speed is unchanged. Prompt speed improved.

I think we need something like https://github.com/ggerganov/llama.cpp/blob/master/examples/llama-bench/README.md instead of the --prompt setup to benchmark.

I'm also using my local GPU, which has bad cooling and other processes running on it, etc.

The llama-bench setup runs a bunch of repetitions to remove this noise.

I created an issue for it #164
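
As a rough illustration of the repetition idea described above, the sketch below times a workload several times and reports the mean and sample standard deviation in tokens per second. It is not llama-bench's implementation and it does not use any mistral.rs API: `bench_tokens_per_sec`, `mean_and_stddev`, and the dummy workload are all hypothetical names invented for this example.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Mean and sample standard deviation of a set of measurements.
/// Returns (0.0, 0.0) for an empty slice and a deviation of 0.0 for one sample.
fn mean_and_stddev(samples: &[f64]) -> (f64, f64) {
    if samples.is_empty() {
        return (0.0, 0.0);
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    if samples.len() < 2 {
        return (mean, 0.0);
    }
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}

/// Runs `workload` `reps` times and returns one tokens/second figure per run.
/// `workload` must return the number of tokens it processed; here it is a
/// stand-in for a real prompt or completion pass.
fn bench_tokens_per_sec<F: FnMut() -> usize>(reps: usize, mut workload: F) -> Vec<f64> {
    (0..reps)
        .map(|_| {
            let start = Instant::now();
            let tokens = workload();
            tokens as f64 / start.elapsed().as_secs_f64()
        })
        .collect()
}

fn main() {
    // Placeholder workload: busy arithmetic pretending to process 128 tokens.
    let samples = bench_tokens_per_sec(5, || {
        let mut acc = 0u64;
        for i in 0..1_000_000u64 {
            acc = acc.wrapping_add(black_box(i) * i);
        }
        black_box(acc);
        128
    });

    let (mean, sd) = mean_and_stddev(&samples);
    println!("{mean:.2} ± {sd:.2} T/s over {} runs", samples.len());
}
```

Reporting the spread alongside the mean makes it easier to tell whether a small difference between two branches is larger than the run-to-run noise.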

@EricLBuehler
Owner

That is a great idea. I think we should add that so we can measure future improvements. I'll merge this, as I think the performance gains are very significant!

@EricLBuehler EricLBuehler merged commit e309168 into EricLBuehler:master Apr 16, 2024
11 checks passed

Successfully merging this pull request may close these issues.

Quantized Mistral: Optimize masked fill
2 participants