Reformat and improve RAG module and agents #184

ZiTao-Li · 2024-04-28T03:54:56Z

Description

Updates

Changes on code structure

migrate and reformat RAG/knowledge module(s) and RAG agent(s) from examples to a module in src
add llama-index as rag_requires in setup.py

Changes on the RAG agent module

be compatible with the new KnowledgeBank feature
the configurations for the RAG-related functionalities are relocated back to knowledge modules
the retrieve method merges the retrievers from the KnowledgeBank members
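
As a rough illustration of merging retrievers, the sketch below queries several knowledge objects and combines their results into one ranked list. `ToyKnowledge` and `merged_retrieve` are hypothetical names, not the classes in this PR, and the word-overlap score is only a placeholder for embedding similarity.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToyKnowledge:
    """Hypothetical stand-in for a knowledge object holding text chunks."""

    knowledge_id: str
    chunks: List[str]

    def retrieve(self, query: str, top_k: int) -> List[Tuple[float, str]]:
        # Word-overlap score as a placeholder for embedding similarity.
        q_words = set(query.lower().split())
        scored = [
            (len(q_words & set(chunk.lower().split())) / max(len(q_words), 1), chunk)
            for chunk in self.chunks
        ]
        return sorted(scored, key=lambda item: item[0], reverse=True)[:top_k]


def merged_retrieve(
    knowledge_list: List[ToyKnowledge], query: str, top_k: int = 2
) -> List[Tuple[float, str, str]]:
    """Query every knowledge object and merge the hits into one ranked list."""
    merged = []
    for knowledge in knowledge_list:
        for score, chunk in knowledge.retrieve(query, top_k):
            merged.append((score, knowledge.knowledge_id, chunk))
    # Rank across all sources by score only.
    merged.sort(key=lambda item: item[0], reverse=True)
    return merged[:top_k]
```

Each hit keeps its `knowledge_id`, so the caller can still tell which source a chunk came from after merging.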

Changes on the RAG/knowledge module

Rename the RAG modules to Knowledge (e.g., LlamaIndexRAG -> LlamaIndexKnowledge)
store and persist processed embeddings/indices/documents
support loading multiple doc types and dirs for one index
support docs management in the obtained (persisted) index
add a refresh function to update the index when needed
enable agents to reset or add new retrievers
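
To make the persist and refresh ideas concrete, here is a minimal sketch that stores a per-file record on disk and re-processes only files whose content changed. It is not the LlamaIndex-based implementation in this PR: `refresh_index`, the JSON layout, and the content-hash check are all assumptions chosen for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def _digest(path: Path) -> str:
    """Content hash used to detect changed files."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def refresh_index(data_dir: str, suffixes: List[str], persist_path: str) -> List[str]:
    """Re-index new or changed files and return the paths that were re-processed."""
    store = Path(persist_path)
    # Load the persisted index if one exists, otherwise start empty.
    index = json.loads(store.read_text(encoding="utf-8")) if store.exists() else {}
    updated, seen = [], set()
    for path in sorted(Path(data_dir).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        key = str(path)
        seen.add(key)
        digest = _digest(path)
        if index.get(key, {}).get("digest") != digest:
            # Placeholder for the real work: chunking and embedding the document.
            text = path.read_text(encoding="utf-8")
            index[key] = {"digest": digest, "n_chars": len(text)}
            updated.append(key)
    # Drop entries whose source file no longer exists.
    for key in list(index):
        if key not in seen:
            del index[key]
    store.write_text(json.dumps(index), encoding="utf-8")
    return updated
```

Calling the function twice on unchanged data returns an empty list the second time, which is the behavior that avoids re-embedding documents.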

Improving utility of knowledge module

reformat the knowledge module config to be easier to use: the new format only configures the KnowledgeBank
introduce KnowledgeBank:
- KnowledgeBank provides an easier way to initialize a knowledge object: call add_data_as_knowledge with knowledge_id (a string that identifies this knowledge object), emb_model_name (the name of the embedding model config) and data_dirs_and_types (a dictionary mapping data directories to the wanted file extensions), as shown in rag_example.py:
```python
knowledge_bank.add_data_as_knowledge(
    knowledge_id="agentscope_tutorial_rag",
    emb_model_name="qwen_emb_config",
    data_dirs_and_types={
        "../../docs/sphinx_doc/en/source/tutorial": [".md"],
    },
)
```
- Knowledge objects in KnowledgeBank can be shared or duplicated among multiple agents, which avoids embedding the same documents more than once.
- RAG agents can load multiple Knowledge objects (based on the "knowledge_id" in knowledge_config.json) with their associated retrievers to perform multi-source information retrieval. To do so, pass the agent to the KnowledgeBank.equip function.
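
The sharing and equipping behavior can be sketched with toy classes. Everything below is hypothetical: `ToyKnowledgeBank`, `ToyAgent`, the `knowledge_ids` attribute, and the `duplicate` flag are illustrative assumptions and do not reproduce the signatures in `src/agentscope/rag/knowledge_bank.py`.

```python
import copy
from typing import Dict, List


class ToyKnowledge:
    """Hypothetical knowledge object; `documents` stands in for an embedded index."""

    def __init__(self, knowledge_id: str, documents: List[str]):
        self.knowledge_id = knowledge_id
        self.documents = documents


class ToyAgent:
    """Hypothetical agent that names the knowledge it wants by ID."""

    def __init__(self, name: str, knowledge_ids: List[str]):
        self.name = name
        self.knowledge_ids = knowledge_ids
        self.knowledge_list: List[ToyKnowledge] = []


class ToyKnowledgeBank:
    """Keeps one processed copy of each knowledge object, keyed by ID."""

    def __init__(self) -> None:
        self._stored: Dict[str, ToyKnowledge] = {}

    def add(self, knowledge: ToyKnowledge) -> None:
        if knowledge.knowledge_id in self._stored:
            raise ValueError(f"knowledge_id {knowledge.knowledge_id!r} already exists")
        self._stored[knowledge.knowledge_id] = knowledge

    def equip(self, agent: ToyAgent, duplicate: bool = False) -> None:
        """Attach the requested knowledge objects to the agent."""
        for knowledge_id in agent.knowledge_ids:
            knowledge = self._stored[knowledge_id]
            # Share the stored object by default; deep-copy only on request.
            agent.knowledge_list.append(
                copy.deepcopy(knowledge) if duplicate else knowledge
            )
```

With sharing as the default, two agents that list the same `knowledge_id` hold the same object, so the documents are processed once.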

Tutorial

Both the English and Chinese tutorials are added as 209-rag.md.

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

DavdGao

It's hard to follow the tutorial to use the RAG module. Some important guidance is missing, as listed below. Please refer to the inline comments for more details.

How should the knowledge objects be used, and what methods do they provide?
How should the configuration be set, and what are the meaning and candidate values of each parameter?

src/agentscope/agents/rag_agents.py

examples/conversation_with_RAG_agents/rag_example.py

src/agentscope/rag/knowledge_bank.py

docs/sphinx_doc/en/source/tutorial/209-rag.md

examples/conversation_with_RAG_agents/rag_example.py

src/agentscope/rag/knowledge_bank.py

src/agentscope/agents/rag_agents.py

docs/sphinx_doc/zh_CN/source/tutorial/209-rag.md

examples/conversation_with_RAG_agents/rag_example.py

update as comments suggest

update as comments suggest (for docs)

garyzhang99

Please see the inline comments.

docs/sphinx_doc/zh_CN/source/tutorial/209-rag.md

src/agentscope/rag/llama_index_knowledge.py

pan-x-c

Please see the inline comments.

The current version of RAG is not compatible with distributed mode. We can add support in future PRs.

src/agentscope/rag/knowledge.py

src/agentscope/agents/rag_agents.py

src/agentscope/rag/knowledge_bank.py

update as comments

…s used in previous versions, but no longer needed.

# Conflicts: # examples/conversation_with_RAG_agents/README.md

DavdGao

please see inline comments

DavdGao · 2024-05-28T02:37:26Z

docs/sphinx_doc/zh_CN/source/tutorial/209-rag.md

+### Knowledge Bank
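+The knowledge bank maintains a set of Knowledge modules (e.g., knowledge from different datasets) as a collection of knowledge. Different agents can therefore reuse Knowledge modules without unnecessary re-initialization. Since configuring a Knowledge module may be too complicated for most users, the knowledge bank also provides a simple function call for creating Knowledge modules.
+
+* `KnowledgeBank.add_data_as_knowledge`: creates a Knowledge module. In the simplest case, only knowledge_id, emb_model_name and data_dirs_and_types need to be provided.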


What types do we support here? And for unsupported file types, what should the users do?

DavdGao · 2024-05-28T02:41:38Z

docs/sphinx_doc/zh_CN/source/tutorial/209-rag.md

+* `KnowledgeBank.add_data_as_knowledge`: creates a Knowledge module. In the simplest case, only knowledge_id, emb_model_name and data_dirs_and_types need to be provided.
+  ```python
+  knowledge_bank.add_data_as_knowledge(
+        knowledge_id="agentscope_tutorial_rag",


This is more like a knowledge name than an ID.

DavdGao · 2024-05-28T02:44:02Z

examples/conversation_with_RAG_agents/README.md

@@ -23,35 +22,28 @@ capability can be used to build easily.
 **Note:** This example has been tested with `dashscope_chat` and `dashscope_text_embedding` model wrapper, with `qwen-max` and `text-embedding-v2` models.
 However, you are welcome to replace the Dashscope language and embedding model wrappers or models with other models you like to test.

-## Start AgentScope Consultants
+## Start AgentScope Copilots
 * **Terminal:** The most simple way to execute the AgentScope Consultants is running in terminal.


copilot or copilots

DavdGao · 2024-05-28T02:44:25Z

examples/conversation_with_RAG_agents/README.md

@@ -23,35 +22,28 @@ capability can be used to build easily.
 **Note:** This example has been tested with `dashscope_chat` and `dashscope_text_embedding` model wrapper, with `qwen-max` and `text-embedding-v2` models.
 However, you are welcome to replace the Dashscope language and embedding model wrappers or models with other models you like to test.

-## Start AgentScope Consultants
+## Start AgentScope Copilots
 * **Terminal:** The most simple way to execute the AgentScope Consultants is running in terminal.


consultants or copilot

DavdGao · 2024-05-28T02:48:42Z

docs/sphinx_doc/zh_CN/source/tutorial/209-rag.md

+
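+### RAG Agent
+A RAG agent is an agent that can generate answers based on retrieved knowledge.
+  * Enabling RAG for an agent: a RAG agent requires a `rag_config` in its configuration, which contains a list of `knowledge_id` values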


What parameters should this rag_config contain? The tutorial does not tell users.

DavdGao · 2024-05-28T02:55:32Z

docs/sphinx_doc/zh_CN/source/tutorial/209-rag.md

+  ```json
+  [
+  {
+    "knowledge_id": "{your_knowledge_id}",


If we use a config to set up the RAG module, should we consider adding a config file that explains the usage of each parameter? For example, like this file in FederatedScope:
https://github.com/alibaba/FederatedScope/blob/master/federatedscope/core/configs/config.py#L258
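
One possible shape for such a self-documenting config is sketched below. `KnowledgeConfigSketch` and `describe` are hypothetical; the field names are taken from this PR's description and commits, and the help texts are illustrative assumptions rather than the project's wording.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class KnowledgeConfigSketch:
    """Hypothetical config whose fields carry their own explanations."""

    knowledge_id: str = field(
        metadata={"help": "Unique name that agents use to refer to this knowledge object."}
    )
    emb_model_config_name: str = field(
        metadata={"help": "Name of the embedding model config used to embed documents."}
    )
    data_dirs_and_types: Dict[str, List[str]] = field(
        default_factory=dict,
        metadata={"help": "Maps each data directory to the file extensions to load."},
    )


def describe(config_cls) -> str:
    """Render one 'name: help' line per field, usable as generated documentation."""
    return "\n".join(f"{f.name}: {f.metadata['help']}" for f in fields(config_cls))
```

Keeping the explanation next to each field means the documentation cannot drift from the set of accepted parameters.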

DavdGao · 2024-05-28T05:22:45Z

examples/conversation_with_RAG_agents/configs/agent_config.json

+      "description": "Code-Search-Assistant is an agent that can provide answer based on AgentScope code base. It can answer questions about specific modules in AgentScope.",
+      "sys_prompt": "You're a coding assistant of AgentScope. The answer starts with appreciation for the question, then provide details regarding the functionality and features of the modules mentioned in the question. The language should be in a professional and simple style. The answer is limited to be less than 100 words.",
+      "model_config_name": "qwen_config",
+      "rag_config": {


Can we do this pre-processing outside the agent? For example, using a get function:

knowledge_bank = KnowledgeBank(...)
knowledges = knowledge_bank.get(
    knowledge_ids=["kb1", "kb2"],
    similarity_top_k=5,
    log_retrieval=5,
    recent_n_mem=1,
)
AgentClass(name="assistant", knowledges=knowledges, ...)

Alternatively, users can set up their own knowledge inside the agent's constructor.

There are two advantages:

No need to know what parameters should be written in a rag config. All parameters are in the declaration of this get() function, which can be accessed easily.

The agent is not required to have a rag config attribute.
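
The proposal above can be sketched end to end with toy classes. This is only an illustration of the suggested design, not code from the PR: `SketchKnowledgeBank`, `BoundKnowledge`, and `SketchAgent` are hypothetical, only the `similarity_top_k` parameter is carried over from the comment, and word overlap stands in for embedding similarity.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BoundKnowledge:
    """Hypothetical knowledge handle with its retrieval settings already bound."""

    knowledge_id: str
    chunks: Tuple[str, ...]
    similarity_top_k: int

    def retrieve(self, query: str) -> List[str]:
        # Word overlap as a placeholder for embedding similarity.
        words = set(query.lower().split())
        ranked = sorted(
            self.chunks,
            key=lambda chunk: len(words & set(chunk.lower().split())),
            reverse=True,
        )
        return list(ranked[: self.similarity_top_k])


class SketchKnowledgeBank:
    def __init__(self, stored: Dict[str, List[str]]):
        self._stored = stored

    def get(self, knowledge_ids: List[str], similarity_top_k: int = 3) -> List[BoundKnowledge]:
        """All retrieval options live in this signature instead of a rag_config."""
        return [
            BoundKnowledge(kid, tuple(self._stored[kid]), similarity_top_k)
            for kid in knowledge_ids
        ]


class SketchAgent:
    """The agent receives ready-to-use knowledge and needs no rag_config attribute."""

    def __init__(self, name: str, knowledges: List[BoundKnowledge]):
        self.name = name
        self.knowledges = knowledges

    def gather_context(self, query: str) -> List[str]:
        return [chunk for k in self.knowledges for chunk in k.retrieve(query)]
```

Under this design the agent class only depends on a `retrieve` method, so users could also pass in knowledge objects they built themselves.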

FredericW and others added 30 commits April 11, 2024 14:48

new testing code

d4731d0

test the idea that using two agents to analyze different aspects of c…

9b4b060

…odes.

handle irrelevant question and simplify the setting first

7551c2f

fix dir

86fac88

function added for persisting the index.

0687138

Merge pull request #1 from FredericW/testing

c598310

Testing

Merge branch 'modelscope:main' into main

ddddbe4

json files are deleted.

2b1617b

Merge branch 'modelscope:main' into main

8afab8f

Merge branch 'modelscope:main' into testing

6c11924

Merge branch 'main' into testing

dc5b3e5

Merge pull request #2 from FredericW/testing

fc9066c

Persist function added.

Delete rag_storage directory

8daaf4b

Merge pull request #3 from FredericW/main

47a9e22

persist function added.

merge

a890ac1

add mention function

702c5c0

add ui

1e4b4c5

fix bugs

70bb7bb

runnable ui with flask

484fb4d

The agent dialog flow is modified. We remove the summary agent, and a…

99d5a73

…dd a guide agent.

add docstring agent

b3011b1

update info

449c4dc

Changes are made to improve the performance

e55be4a

Changes are made to improve the performance

133dbc9

Merge pull request #1 from ZiTao-Li/zitao/dev_copilot

0fc5e16

Enhance RAG example

config modified

5d6c0fe

Merge pull request #5 from FredericW/dev_copilot_zitao

dd0d328

copilot dialog agents update

Merge branch 'main' into zitao/dev_copilot

3d80561

add file

ccc5b46

add api assistant

ec3d8c0

艾渔 and others added 7 commits May 10, 2024 15:29

Minor edits.

847f3b5

Merge pull request #13 from FredericW/dev/as_copilot_fei

55c34fa

as_copilot updates

move emb_model_config_name to knowledge_config

6186188

Merge pull request #14 from ZiTao-Li/zitao/dev_copilot

123cd06

move emb_model_config_name to knowledge_config

improve after discussion, change names, add equip function for knowle…

4f024b1

…dge bank and add tutorials

fix init function bug

0c25ada

Merge pull request #15 from ZiTao-Li/zitao/dev_copilot

06ce25d

improve rag module

DavdGao reviewed May 16, 2024