RAG documentation, trouble with large text documents #964

benjaminantunes · 2024-02-29T14:23:55Z


Hi,

I am currently using OllamaWeb-UI, and I am trying to figure out how RAG is implemented.

What I have found out so far:
True RAG is implemented using Langchain and ChromaDB (see /backend/apps/rag/main.py).
I found several files related to this.

However, I need an explanation of the exact workflow. Is there any documentation somewhere? I could not find it.

Here is what happens inside the web browser:

_initNewChat

File { name: "cleanedMergedDiderot.txt_part2", lastModified: 1709209652720, webkitRelativePath: "", size: 6135799, type: "" }
Array [ {…} ]
length: 1
: Array []
siecle ; depuis ce tems, chaque secrétaire d'état a
ses archives ou son dépôt. ( G ) ( a ),Code des Grecs. Voyez Code canonique .,Mais M. Richer, célebre docteur de Sorbonne,
contrebalance cette autorité dans son histoire des conciles généraux, liv. I. chap. ij. num. 7. en rapportant,
[CUT TEXT]
mortel étoit une disposition indispensable pour la fréquente communion ; mais ils ont aussi pensé que
cette disposition étoit suffisante. . . . . . . . . .

vigostral-7b-chat.Q8_0.gguf:latest_

The HTTP request contains the following prompt template (it can be found in src/lib/utils/rag/index.ts):

Use the following context as your learned knowledge, inside XML tags.

[context]

When answer to user:

If you don't know, just say that you don't know.

If you don't know when you are not sure, ask for clarification.
Avoid mentioning that you obtained the information from the context.
And answer according to the language of the user's question.

Given the context information, answer the query.
Query: [query]
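
To make the rest of my question concrete, here is a minimal Python sketch of how I assume the [context] and [query] placeholders get filled before the request is sent. The function name `fill_rag_template` and the shortened template are my own illustration, not the actual code from src/lib/utils/rag/index.ts.

```python
import re

# Shortened copy of the template shown above (illustration only).
RAG_TEMPLATE = (
    "Use the following context as your learned knowledge, inside XML tags.\n"
    "\n"
    "[context]\n"
    "\n"
    "Given the context information, answer the query.\n"
    "Query: [query]"
)


def fill_rag_template(template: str, context: str, query: str) -> str:
    """Hypothetical helper: substitute both placeholders in a single pass.

    A single pass means placeholder-like text inside the retrieved
    context (for example a literal "[query]") is not substituted again.
    """
    values = {"[context]": context, "[query]": query}
    return re.sub(r"\[context\]|\[query\]", lambda m: values[m.group(0)], template)


if __name__ == "__main__":
    print(fill_rag_template(RAG_TEMPLATE, "Some retrieved text.", "What is this about?"))
```

If [context] ends up empty, as in my 140 MB test, the model would receive the template with nothing useful between the instructions and the query.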


My questions are:

I see that some text is sent to the model along with the query in the HTTP request. Are we simply extracting text from the provided document and adding it to the model's context prompt (which cannot hold the full file content, because it is too large for the model's context)?
Or are we doing true RAG, searching for the parts of the text that match the request and then adding only those to the model's context? What is the workflow? (I am not sure how RAG really works.)
I tried giving the model a 140 MB text file. I got no errors, but it did not seem to work: nothing happened, and when I checked the HTTP request to the model, the [context] part was empty. For a really small file (5 KB), the full file seems to be placed in [context]. For a medium-sized text file (5 MB), only part of the text appears in [context] in the HTTP request, ending with ". . . . . . . . .". Can someone give me an explanation, or a link to some documentation?
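
For reference, this is how I picture a document being split into overlapping chunks before indexing. It is only a sketch of the general technique: the function name `split_into_chunks` and the `chunk_size` and `overlap` values are assumptions of mine, not the settings used in /backend/apps/rag/main.py.

```python
def split_into_chunks(text: str, chunk_size: int = 1500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly.

    chunk_size and overlap are illustrative defaults (assumptions),
    not values taken from the project.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    print(split_into_chunks("abcdefghij", chunk_size=4, overlap=2))
```

If the workflow is something like this, a 5 KB file would give only a handful of chunks while a 5 MB file would give thousands, so retrieving just a few of them would explain why I only see part of the text in [context]. I have not confirmed that this is what happens.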
We are using "all-MiniLM-L6-v2" from sentence-transformers (https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). This is a sentence-transformers model: it maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks like clustering or semantic search.
So it takes our request, searches the vector database for chunks of text that match it, and gives us back plain text?
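
Here is my understanding of that retrieval step as a small, self-contained sketch. Important assumption: `toy_embed` is a word-count stand-in that I made up so the example runs without downloading anything; the real setup would use the 384-dimensional vectors from all-MiniLM-L6-v2 and a vector database instead of a Python list. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in embedding (assumption): a bag-of-words count vector.

    The real model produces dense 384-dimensional vectors instead.
    """
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, as plain text."""
    query_vec = toy_embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    docs = [
        "The archives of the secretary of state are kept in Paris.",
        "Frequent communion requires a certain disposition.",
        "Cats sleep most of the day.",
    ]
    print(retrieve("Where are the archives kept?", docs, k=1))
```

If this picture is right, only the top-ranked chunks would be pasted into [context], never the whole file.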

Thank you for your answers
