
Scope for batched predictions #71

Open
saswat0 opened this issue Dec 1, 2023 · 3 comments
Labels
enhancement New feature or request

Comments


saswat0 commented Dec 1, 2023

@snexus Kudos on this awesome project!

I was wondering whether support for batched prompts is on your roadmap. There are existing solutions that make this possible for several language models, so are you planning to include these optimisations in your codebase?

TIA

@saswat0 saswat0 changed the title Scope for batched preedictions Scope for batched predictions Dec 1, 2023
snexus (Owner) commented Dec 3, 2023

Hi,

Thanks for the suggestion. How do you think batched prompts can be useful in the context of RAG?

saswat0 (Author) commented Dec 3, 2023

One use case I can think of: if deployed to production, the server could queue incoming requests (prompts) and run the RAG pipeline only once per batch. Per-request latency would be slightly higher, but GPU utilisation would increase severalfold. A minimal sketch of the kind of server-side micro-batching I have in mind is below.
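For illustration only, here is a rough sketch of server-side micro-batching. The class name and the `generate_batch` callable are hypothetical and not part of this project; the callable stands in for whatever batched inference function the backend exposes.

```python
import queue
import threading
import time
from typing import Callable, List, Tuple


class PromptBatcher:
    """Collects incoming prompts and runs them through one batched generate call."""

    def __init__(self, generate_batch: Callable[[List[str]], List[str]],
                 max_batch_size: int = 8, max_wait_s: float = 0.05):
        self._generate_batch = generate_batch          # hypothetical batched inference callable
        self._requests: "queue.Queue[Tuple[str, queue.Queue]]" = queue.Queue()
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, prompt: str) -> str:
        """Called per request; blocks until the batched result for this prompt is ready."""
        result_box: "queue.Queue[str]" = queue.Queue(maxsize=1)
        self._requests.put((prompt, result_box))
        return result_box.get()

    def _worker(self) -> None:
        while True:
            batch = [self._requests.get()]             # wait for the first request
            deadline = time.monotonic() + self._max_wait_s
            while len(batch) < self._max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._requests.get(timeout=remaining))
                except queue.Empty:
                    break
            prompts = [prompt for prompt, _ in batch]
            outputs = self._generate_batch(prompts)    # one forward pass for the whole batch
            for (_, box), output in zip(batch, outputs):
                box.put(output)


# Dummy usage: replace the lambda with a real batched model call.
batcher = PromptBatcher(lambda prompts: [f"answer to: {p}" for p in prompts])
print(batcher.submit("What is RAG?"))
```

The trade-off is exactly the one described above: each request waits up to `max_wait_s` for the batch to fill, in exchange for the model running once per batch instead of once per prompt.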

snexus (Owner) commented Dec 5, 2023

I will add it as a potential improvement when implementing support for vLLM in the future. Thanks for the suggestion.
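For reference, vLLM's offline API already accepts a list of prompts, so batched generation could look roughly like the sketch below. This is only an illustration; the model name is an example and nothing here reflects how this project would wire it up.

```python
from vllm import LLM, SamplingParams

# Example model; any model supported by vLLM could be used here.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(max_tokens=256)

# A queued batch of RAG prompts (context + question) passed in a single call.
prompts = [
    "Context: ...\nQuestion: What is covered in section 2?",
    "Context: ...\nQuestion: Summarise the conclusion.",
]
outputs = llm.generate(prompts, params)  # batched generation in one call
for out in outputs:
    print(out.outputs[0].text)
```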

snexus added the enhancement (New feature or request) label on Dec 5, 2023