When speculative decoding is enabled and a request generates up to the model's max positional embeddings, generation can fail at runtime with a CUDA device-side assert. We should either detect this condition during warmup or handle the edge case gracefully per request.
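One possible shape for the per-request handling is sketched below. It assumes the assert is triggered when the speculated positions run past the model's positional embedding table, which is a guess and not something confirmed in this issue. The function name `clamp_max_new_tokens` and the parameter `num_speculative_tokens` are hypothetical and do not correspond to any existing API.

```python
def clamp_max_new_tokens(
    input_length: int,
    max_new_tokens: int,
    max_position_embeddings: int,
    num_speculative_tokens: int,
) -> int:
    """Return a max_new_tokens value that keeps every position in range.

    Assumption: each decoding step may look up to `num_speculative_tokens`
    positions past the last accepted token, so that much headroom is
    reserved at the end of the positional embedding table.
    """
    if min(input_length, max_new_tokens, num_speculative_tokens) < 0:
        raise ValueError("lengths and token counts must be non-negative")

    # Positions left for generation once the prompt and the speculative
    # headroom have been accounted for.
    available = max_position_embeddings - input_length - num_speculative_tokens
    if available <= 0:
        # Reject on the host with a readable error instead of letting the
        # request reach the device and fail with an assert.
        raise ValueError(
            f"input_length={input_length} plus {num_speculative_tokens} "
            f"speculative tokens leaves no room within "
            f"max_position_embeddings={max_position_embeddings}"
        )

    return min(max_new_tokens, available)
```

A request with a 2000-token prompt asking for 48 new tokens on a 2048-position model with 2 speculative tokens would be clamped to 46 new tokens. Running the same arithmetic against the largest allowed request during warmup could surface the problem before any traffic is served.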