
improve: aggregate user infos as standalone user context within prompt #47

Open
Tracked by #48
nekomeowww opened this issue May 6, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@nekomeowww
Owner

No description provided.

@nekomeowww nekomeowww changed the title improve: aggregate user infos as standalone context within prompt improve: aggregate user infos as standalone user context within prompt May 6, 2023
@nekomeowww nekomeowww added the enhancement New feature or request label May 6, 2023
@rafiramadhana
Contributor

@nekomeowww I am interested in working on this issue.

However, I have a few questions before getting started.

Would you mind explaining these terms:

  • User infos
  • Standalone user context
  • Within prompt

Thanks.

@nekomeowww
Owner Author

nekomeowww commented May 8, 2023

@nekomeowww I am interested in working on this issue.

However, I have a few questions before getting started.

Would you mind explaining these terms:

  • User infos
  • Standalone user context
  • Within prompt

Thanks.

Thank you!

Yes, you can pick up this issue and work on it, but I am afraid it might be too hard for you to understand and change the OpenAI prompt in pkg/openai/prompt.go, because it is written in Simplified Chinese. You will also need a valid OpenAI account to do the prompt engineering.

This issue is meant to improve the chat history summarization feature (aka recap); the main goal is to reduce the token usage of the OpenAI prompt.

Before diving into this feature, please allow me to summarize how recap works. There is a middleware called RecordMessage; it extracts and formats the chat messages coming from Telegram and then stores them in Postgres. When a user sends the /recap command, or when it is time to send an automatic recap message to chat groups (implemented in internal/services/auto recap), the chathistories model formats the messages into the following pattern (implementation at

func (m *Model) SummarizeChatHistories(chatID int64, histories []*ent.ChatHistories) (string, error) {
):

msgId:1 UserName1 sent: ```Hello!```
msgId:2 UserName2 replying to [UserName1 sent msgId:1]: ```Hello! How are you today?```
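A minimal Go sketch of this per-message formatting (the struct and field names here are assumptions for illustration, not the actual ent.ChatHistories schema):

```go
package main

import "fmt"

// tq is the triple-backtick delimiter from the pattern above, written with
// escapes so this example stays a valid fenced code block.
const tq = "\x60\x60\x60"

// Msg is a hypothetical stand-in for one stored chat history row; the field
// names are assumptions, not the real schema.
type Msg struct {
	MsgID          int64
	UserName       string
	Text           string
	RepliedToMsgID int64 // 0 when the message is not a reply
	RepliedToUser  string
}

// formatMsg renders one message in the current recap pattern, repeating the
// full username on every line.
func formatMsg(m Msg) string {
	if m.RepliedToMsgID != 0 {
		return fmt.Sprintf("msgId:%d %s replying to [%s sent msgId:%d]: %s%s%s",
			m.MsgID, m.UserName, m.RepliedToUser, m.RepliedToMsgID, tq, m.Text, tq)
	}
	return fmt.Sprintf("msgId:%d %s sent: %s%s%s", m.MsgID, m.UserName, tq, m.Text, tq)
}

func main() {
	fmt.Println(formatMsg(Msg{MsgID: 1, UserName: "UserName1", Text: "Hello!"}))
	fmt.Println(formatMsg(Msg{MsgID: 2, UserName: "UserName2",
		Text: "Hello! How are you today?", RepliedToMsgID: 1, RepliedToUser: "UserName1"}))
}
```

Note how every line carries the full username; that repetition is exactly the token cost this issue wants to remove.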

It then injects this pattern into the OpenAI prompt template so that OpenAI's GPT-3.5 model can summarize the chat histories for us.

You may notice that UserName1 and UserName2 are stated explicitly each time they appear. This can be inefficient and consume a lot of prompt tokens when multiple users each appear several times in the same group chat.

Therefore I came up with an idea: why don't we aggregate the usernames that appear in the chat histories, place a formatted username map before the chat histories, and use the userId or an array index to represent each username, like this:

Users:"""
1: UserName1
2: UserName2
...
10: UserName10
"""

Chat histories:"""
msgId:1 user:1 sent: """Hello!"""
msgId:2 user:2 replying to [user:1 sent msgId:1]: """Hello! How are you today?"""
msgId:3 user:3 sent: """Nice to meet you guys!"""
msgId:4 user:10 sent """The party is just about to start!!!"""
"""

The terms can now be explained.

  • User infos: includes the user's full name, username, and userId; these are used later in the prompt. In the example above I only used the username (it could be the full name too), and the number before the username can be the userId or just an index.
  • Standalone user context: the Users: section in the example above; it holds the aggregated usernames and the numbers that represent them.
  • Within prompt: the pattern I described above; the formatted message pattern is part of the prompt, and the full prompt is located in pkg/openai/prompt.go.

I think the best way to get you started on this issue is to wait for me to implement the i18n support we talked about previously (#67); then we can write the OpenAI prompt in English for better understanding. How does that sound?

@rafiramadhana
Contributor

Yes, you can pick up this issue and work on it, but I am afraid it might be too hard for you to understand and change the OpenAI prompt in pkg/openai/prompt.go, because it is written in Simplified Chinese. You will also need a valid OpenAI account to do the prompt engineering.

OK. No worries.

Thanks for the explanation.

Do you have any recommendations for issues I can help with? Maybe something that needs to be done sooner and has fewer dependencies (e.g. credentials).

@nekomeowww
Owner Author

Yes, you can pick up this issue and work on it, but I am afraid it might be too hard for you to understand and change the OpenAI prompt in pkg/openai/prompt.go, because it is written in Simplified Chinese. You will also need a valid OpenAI account to do the prompt engineering.

OK. No worries.

Thanks for the explanation.

Do you have any recommendations for issues I can help with? Maybe something that needs to be done sooner and has fewer dependencies (e.g. credentials).

What do you mean by credentials?

@nekomeowww
Owner Author

TBH, there aren't any simple, easy issues, or upcoming issues with fewer dependencies, for you to work on right now, due to the lack of i18n support; insights-bot is a project that relies heavily on natural languages and GPT models, and we initially developed it with Chinese support only. You may have to wait for us to support i18n.

@rafiramadhana
Contributor

What do you mean by credentials?

Something like access to GPT products or the Telegram bot.

You may have to wait for us to support i18n.

Ok no worries. Thanks.
