
Scope for batched predictions #71

Open
saswat0 opened this issue Dec 1, 2023 · 3 comments
Labels
enhancement New feature or request

Comments


saswat0 commented Dec 1, 2023

@snexus Kudos on this awesome project!

I was wondering whether support for batched prompts is on your roadmap. There are existing solutions that make this possible for several language models, so are you planning to include these optimisations in your codebase?

TIA

@saswat0 saswat0 changed the title Scope for batched preedictions Scope for batched predictions Dec 1, 2023
snexus (Owner) commented Dec 3, 2023

Hi,

Thanks for the suggestion. How do you think batched prompts can be useful in the context of RAG?

saswat0 (Author) commented Dec 3, 2023

One use case I can think of: if deployed to production, the server could queue incoming requests (prompts) and run the RAG pipeline only once per batch. Per-request latency would be slightly higher, but GPU utilisation would increase severalfold. A minimal sketch of the kind of server-side micro-batching I have in mind is below.
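For illustration only, here is a rough sketch of server-side micro-batching. The class name and the `generate_batch` callable are hypothetical and not part of this project; the callable stands in for whatever batched inference function the backend exposes.

```python
import queue
import threading
import time
from typing import Callable, List, Tuple


class PromptBatcher:
    """Collects incoming prompts and runs them through one batched generate call."""

    def __init__(self, generate_batch: Callable[[List[str]], List[str]],
                 max_batch_size: int = 8, max_wait_s: float = 0.05):
        self._generate_batch = generate_batch          # hypothetical batched inference callable
        self._requests: "queue.Queue[Tuple[str, queue.Queue]]" = queue.Queue()
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, prompt: str) -> str:
        """Called per request; blocks until the batched result for this prompt is ready."""
        result_box: "queue.Queue[str]" = queue.Queue(maxsize=1)
        self._requests.put((prompt, result_box))
        return result_box.get()

    def _worker(self) -> None:
        while True:
            batch = [self._requests.get()]             # wait for the first request
            deadline = time.monotonic() + self._max_wait_s
            while len(batch) < self._max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._requests.get(timeout=remaining))
                except queue.Empty:
                    break
            prompts = [prompt for prompt, _ in batch]
            outputs = self._generate_batch(prompts)    # one forward pass for the whole batch
            for (_, box), output in zip(batch, outputs):
                box.put(output)


# Dummy usage: replace the lambda with a real batched model call.
batcher = PromptBatcher(lambda prompts: [f"answer to: {p}" for p in prompts])
print(batcher.submit("What is RAG?"))
```

The trade-off is exactly the one described above: each request waits up to `max_wait_s` for the batch to fill, in exchange for the model running once per batch instead of once per prompt.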

snexus (Owner) commented Dec 5, 2023

I will add it as a potential improvement when implementing support for vLLM in the future. Thanks for the suggestion.
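For reference, vLLM's offline API already accepts a list of prompts, so batched generation could look roughly like the sketch below. This is only an illustration; the model name is an example and nothing here reflects how this project would wire it up.

```python
from vllm import LLM, SamplingParams

# Example model; any model supported by vLLM could be used here.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(max_tokens=256)

# A queued batch of RAG prompts (context + question) passed in a single call.
prompts = [
    "Context: ...\nQuestion: What is covered in section 2?",
    "Context: ...\nQuestion: Summarise the conclusion.",
]
outputs = llm.generate(prompts, params)  # batched generation in one call
for out in outputs:
    print(out.outputs[0].text)
```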

snexus added the enhancement (New feature or request) label on Dec 5, 2023