llama3 generation time (GPU) #6878

Answered by sebastienbo
gopalgk asked this question in Q&A
Apr 24, 2024 · 1 comments · 2 replies
There are 2 possible reasons:

  1. You are not running on the GPU and have a recent CPU with both performance cores and efficiency cores. If the process does not pin itself to the performance cores, the OS will send some of the processing to the performance cores and the rest to the efficiency cores.

  2. You have two GPUs: one integrated into the CPU (iGPU) and one dedicated GPU. The same problem applies here: if the process does not state a preference for the fastest GPU, some of the processing might be sent to the small iGPU and the rest to the dedicated GPU.

It comes down to the code (the process) to state a preference. If it does not, the OS will choose.
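
As a rough illustration of what "stating a preference" can look like, here is a minimal Python sketch. It rests on several assumptions that are not from this thread: the CPU part uses `os.sched_setaffinity`, which only exists on Linux, and you have to find out yourself which core IDs are the performance cores on your machine. The GPU part sets `CUDA_VISIBLE_DEVICES`, which only affects programs using NVIDIA's CUDA runtime; other backends have their own mechanisms. The helper names `pin_to_cores` and `prefer_gpu` are hypothetical.

```python
import os


def pin_to_cores(cores):
    """Restrict the current process to the given CPU core IDs.

    Returns the resulting affinity set, or None if nothing was changed
    (unsupported platform, or none of the requested cores is available).
    """
    # os.sched_setaffinity is Linux-only; skip quietly elsewhere.
    if not hasattr(os, "sched_setaffinity"):
        return None
    available = os.sched_getaffinity(0)  # 0 means "this process"
    chosen = set(cores) & available
    if not chosen:
        return None
    os.sched_setaffinity(0, chosen)
    return os.sched_getaffinity(0)


def prefer_gpu(index, env=None):
    """Return a copy of the environment that exposes only one CUDA device.

    Assumption: the program you launch with this environment uses CUDA.
    The index refers to CUDA's own device numbering.
    """
    env = dict(os.environ if env is None else env)
    env["CUDA_VISIBLE_DEVICES"] = str(index)
    return env
```

The environment returned by `prefer_gpu` would be passed to the inference program when launching it, for example through the `env` argument of `subprocess.run`. Child processes started after `pin_to_cores` normally inherit the CPU affinity on Linux.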

The same thing could happen with NPUs. NPUs might be…

Answer selected by gopalgk