
[BUG]: RAG use with large context windows/many files #1187

Open · RahSwe opened this issue Apr 25, 2024 · 8 comments
Labels
needs info / can't replicate Issues that require additional information and/or cannot currently be replicated, but possible bug

Comments

RahSwe commented Apr 25, 2024

How are you running AnythingLLM?

Not listed

What happened?

Using a Railway instance with Gemini Pro 1.5 (1M-token context window).

I am trying to take full advantage of the context window by pinning many large files (20+). However, it seems that AnythingLLM does not include all of the pinned files in the request.
Also, only some of the pinned files appear in the citations.
From the answers, it seems clear that some of the pinned files are not being provided.

Are there known steps to reproduce?

I would suggest using the Gemini Pro 1.5 API, pinning a large number of files while leaving others unpinned, and then verifying that AnythingLLM submits all of the pinned files along with snippets from the unpinned ones, and that the citations correctly reflect the sources.

@RahSwe added the possible bug label (Bug was reported but is not confirmed or is unable to be replicated) on Apr 25, 2024
@timothycarambat added the needs info / can't replicate label and removed the possible bug label on Apr 25, 2024
timothycarambat (Member) commented:

Are you sure files are being omitted, and it is not just the model ignoring the context? In the code there is no limit on how many files can be pinned, but there is still a context window, and anything over 1M tokens will still be pruned from the chat.

Since you are on Railway, you should see in the logs when a chat is sent whether the context is too large for Google; it should say something like cannonballing context xxxx to xxxx, or something along those lines, which indicates you sent too much text and we had to truncate it.
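For illustration, here is a minimal sketch of that kind of truncation. This is not AnythingLLM's actual implementation; the `estimateTokens` heuristic and the head-truncation strategy are assumptions made for the example:

```js
// Minimal sketch of token-budget truncation (assumptions: ~4 chars/token,
// simple head-truncation; the real implementation may differ).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function cannonball(text, maxTokens) {
  const tokens = estimateTokens(text);
  if (tokens <= maxTokens) return text;
  console.log(`cannonballing context ${tokens} to ${maxTokens}`);
  return text.slice(0, maxTokens * 4); // keep only what fits the budget
}

// ~500K tokens of pinned-document text gets cut down to the budget:
const fitted = cannonball("x".repeat(2_000_000), 157_286);
console.log(estimateTokens(fitted)); // 157286
```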

RahSwe commented Apr 25, 2024 via email

RahSwe commented Apr 25, 2024 via email

RahSwe commented Apr 25, 2024 via email

RahSwe commented Apr 25, 2024 via email

timothycarambat (Member) commented:

🤔 Hm, if a document is pinned then it for sure will be used in the messages sent to the model.

If you change the embedding model, this will break all previously embedded documents and workspaces, because the differences between embedder models' outputs make vector search impossible. They would need to be deleted and re-embedded.
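As a hedged illustration of why mixing embedders breaks retrieval (this is not AnythingLLM code; the dimensions are made up for the example):

```js
// Cosine similarity is only meaningful when the query and the stored
// documents come from the same embedding model. A different model may
// not even produce vectors of the same dimension.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] ** 2;
    normB += b[i] ** 2;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// e.g. documents embedded with a 384-dim model, query with a 768-dim model:
const storedDoc = new Array(384).fill(0.1);
const newQuery = new Array(768).fill(0.1);
cosineSimilarity(storedDoc, newQuery); // throws: Dimension mismatch: 384 vs 768
```

Even when two models happen to share a dimension, their vector spaces are unrelated, so the similarity scores are meaningless; hence the need to delete and re-embed.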

This is where it occurs

await new DocumentManager({

Gemini Pro's context window limit is defined here:
https://github.com/Mintplex-Labs/anything-llm/blob/dfaaf1680ff4de647de727fcd404ec044c5dd8e2/server/utils/AiProviders/gemini/index.js#L59C16-L59C25
Because we manage the context window, and snippets are appended to the system prompt, the real limit is

system: this.promptWindowLimit() * 0.15,

or ~157,286 tokens, roughly 15%.

The reason we put those limits in place is that some people like to use really large inputs, and some like to use really large system prompts. We have to cut it one way or another.
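As a quick sanity check of those numbers (assuming the 1,048,576-token window Gemini 1.5 Pro advertises):

```js
// Back-of-the-envelope check of the 15% system allocation described above.
const promptWindowLimit = 1_048_576; // Gemini 1.5 Pro window (assumed)
const systemBudget = Math.floor(promptWindowLimit * 0.15);
console.log(systemBudget); // 157286 — pinned snippets beyond this get pruned
```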

I'm willing to bet that because you have so many documents pinned, and that context is still more than 100K tokens, most of your documents are not making it into the chat.

I am also willing to bet that running /reset on a chat window to wipe the history will result in a better first response. I think what we need to do is make the prompt-window allocations dynamic, based on which value is larger.

If the system prompt is huge, it should take the majority of the window and shrink the allocation for the user prompt. If the user prompt is large, then shrink the allocation for the system prompt.

The context window is limited, so you can't have it both ways. That is for sure what is going on here.
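A minimal sketch of that dynamic split (hypothetical: `splitWindow`, the proportional strategy, and the 10% response reserve are illustrative assumptions, not the current code):

```js
// Sketch: give whichever side is larger the bigger share of the window,
// instead of a fixed 15% system allocation. Purely illustrative.
function splitWindow(windowLimit, systemTokens, userTokens, reserve = 0.1) {
  const usable = Math.floor(windowLimit * (1 - reserve)); // leave room for the response
  const total = systemTokens + userTokens;
  if (total <= usable) return { system: systemTokens, user: userTokens };
  // Shrink both sides proportionally: a huge system prompt squeezes the
  // user allocation, and vice versa.
  const system = Math.floor(usable * (systemTokens / total));
  return { system, user: usable - system };
}

// e.g. 900K tokens of pinned docs in system vs a 300K-token user prompt:
console.log(splitWindow(1_048_576, 900_000, 300_000));
// → { system: 707788, user: 235930 }
```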

RahSwe commented Apr 25, 2024 via email

RahSwe commented Apr 25, 2024 via email
