Very High Token Consumption #2

Open
Murtuza-Chawala opened this issue Aug 29, 2023 · 9 comments

@Murtuza-Chawala
Contributor

Hi there, first of all, this is an amazing project you have built.

But it seems that for each query, even the basic ones, the token usage is in the 40-50k range for queries based on my personal .csv data (the .csv file contains only 30 records).
Any suggestions on how to reduce the token usage?

@pgalko
Owner

pgalko commented Aug 29, 2023

I am glad you like it :-)
The high token use is due to a few things. Please see the breakdown below.

  1. To ensure that the models respond accurately and with properly formatted output, the prompts are quite lengthy. The prompts are located in the prompts.py module if you want to explore them.
  2. For the models to remember the context of previous conversations, BambooAI needs to include them in the API messages. Each message can by default have up to 4 pairs of question/response included in it.
  3. Each execution consists of several LLM calls (a rough sketch of this flow follows the list):
    Call 1: Based on the question, the LLM selects the expert best suited to address it.
    Call 2: The question is broken down into a list of individual tasks (pseudo code).
    Call 3: The LLM generates code based on the task list.
    Call 4: The code executes and the output is summarized. If code execution fails, the LLM is called again for code correction.
    Call 5: The LLM ranks the final solution.
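
For illustration, here is a rough sketch of that call sequence. The helper names are hypothetical placeholders, not the actual BambooAI internals; the point is simply that a single question fans out into multiple chat-completion calls, each of which carries the lengthy prompts and the conversation history.

```python
# Hypothetical sketch of the call flow described above; the helper names are
# illustrative placeholders and do not correspond to the real BambooAI code.

def call_llm(step, *context):
    """Stand-in for a single chat-completion call (each one consumes tokens)."""
    return f"<llm output for {step}>"

def execute(code):
    """Stand-in for running the generated code; returns (result, error)."""
    return "<result>", None

def answer_question(question):
    expert = call_llm("select_expert", question)               # Call 1: pick an expert
    task_list = call_llm("plan_tasks", question, expert)       # Call 2: pseudo-code task list
    code = call_llm("generate_code", task_list)                # Call 3: code generation
    result, error = execute(code)                              # Call 4: run and summarize
    while error:                                               # every retry is another LLM call
        code = call_llm("correct_code", code, error)
        result, error = execute(code)
    summary = call_llm("summarize", result)
    rank = call_llm("rank_solution", question, code, result)   # Call 5: rank the solution
    return summary, rank
```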

There are a few ways to reduce the token cost:

  1. For simple tasks, choose the default model gpt-3.5, which is 10 times less expensive than gpt-4.
  2. I have recently added support for local open-source models like CodeLlama 7B, 13B and 34B, or WizardCoder. The answers are not as accurate as gpt-4's, but are very close to gpt-3.5's.
  3. You can reduce the number of previous conversations that are included in the messages by setting the 'max_conversations' parameter to a lower number. For example, setting max_conversations=2 will halve the token usage (see the sketch below).
  4. I have modified a few things related to error correction in the last release (0.3.20), aiming at token usage reduction. Upgrade to this version and you should see some reduction in the token usage.
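
As a concrete illustration of point 3, this is a minimal sketch assuming the constructor accepts max_conversations as a keyword argument (check the README for the exact, current signature):

```python
import pandas as pd
from bambooai import BambooAI  # import path assumed from the README

df = pd.read_csv("my_data.csv")

# Keep only 2 question/response pairs of history instead of the default 4,
# which roughly halves the conversation-context portion of every prompt.
bamboo = BambooAI(df, max_conversations=2)

# Method name assumed from the README examples.
bamboo.pd_agent_converse("What is the average value per category?")
```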

Please note that BambooAI only sends the headers and the first row of the dataset to the LLM, so very large datasets will incur the same cost as small ones.
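
To illustrate what "headers and the first row" means in practice, this is roughly what the dataset preview placed in the prompt amounts to (a hedged sketch, not the actual BambooAI code):

```python
import pandas as pd

df = pd.read_csv("my_data.csv")

# Only the column names and a single sample row go into the prompt, so the
# prompt size does not grow with the number of rows in the dataset.
dataset_preview = df.head(1).to_string(index=False)
print(dataset_preview)
```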
I am aware that the token usage is quite high and am planning to do a thorough review of the prompts and the flow soon. The prompts were designed to work with the March versions of the OpenAI models, and some sections might no longer be necessary with the current models. I hope this helps.

@pgalko
Owner

pgalko commented Aug 29, 2023

I forgot to mention one more thing. You can set 'exploratory=False'. BambooAI will then skip breaking the question down into a task list and go straight to code generation. This will result in a significant token usage reduction, particularly if you also reduce max_conversations.
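
A minimal sketch of that setting, again assuming the constructor signature from the README:

```python
import pandas as pd
from bambooai import BambooAI  # import path assumed from the README

df = pd.read_csv("my_data.csv")

# exploratory=False skips the task-list planning call and goes straight to
# code generation; a smaller conversation window saves further tokens.
bamboo = BambooAI(df, exploratory=False, max_conversations=2)
bamboo.pd_agent_converse("Plot the monthly totals.")  # method name assumed from the README
```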

@Murtuza-Chawala
Contributor Author

@pgalko Your code runs like a charm; it sometimes feels like I'm using OpenAI's own Code Interpreter (CI) API, it is so damn good.
Thanks for all the steps, I will follow them and keep you updated (regarding the high token usage).

  • Also, I'd love to contribute to your project, as I'm working on a similar use case.

@pgalko
Owner

pgalko commented Aug 29, 2023

@Murtuza-Chawala great to hear that :-).
I frequently use OpenAI CI as a benchmark. CI is much faster and its output formatting is much nicer, but BambooAI often succeeds where CI fails, particularly on multifaceted and complex tasks. You also have full control of the compute and the libraries it can use, not to mention internet access, Google search capabilities, and a vector DB as a knowledge base to provide in-context learning. The biggest drawback at this point in time is the token cost, as you have rightfully pointed out :-(.

  • Feel free to contribute, I would very much appreciate it. The project definitely needs a pair of fresh eyes.

@pgalko
Owner

pgalko commented Aug 30, 2023

@Murtuza-Chawala
I have identified the issue that was leading to excessive token usage. The problem stemmed from the way the default example code was incorporated into the prompt template. Specifically, each line of the default example code had a leading space. This led the language model to interpret the pattern as intentional and incorporate it into its responses, ultimately causing code execution to fail due to an indentation error. This failure, in turn, triggered an unnecessary automatic error-correction call, which led to the token usage ballooning after several iterations.
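
As a hypothetical illustration (not the actual example code from prompts.py), this is how a copied one-space indent breaks execution:

```python
# If the model copies the one-space indentation pattern from the prompt's
# example code, the generated script starts with an indented top-level line.
generated = " import pandas as pd\n df.describe()\n"

try:
    compile(generated, "<generated>", "exec")
except IndentationError as err:
    # "unexpected indent" -> code execution fails, which triggers an extra
    # error-correction LLM call and inflates the token usage.
    print(err)
```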

The issue has now been resolved, and the fix has been committed to the repository. A new version, 0.3.21, has been pushed to PyPI. You can install this updated version via pip, and you should see significantly improved performance and reduced token usage.
Thanks for bringing this to my attention :-). You can still reduce the token usage further by following the steps outlined in the previous messages.

@Murtuza-Chawala
Contributor Author

Wow, that's great!

So this also means that the generated code has a higher chance of succeeding rather than going into a correction loop again, right?

@pgalko
Owner

pgalko commented Sep 7, 2023

Yes, that is correct. You should see far fewer error corrections, and hence reduced token usage.

@jmanhype

Use this: https://twitter.com/raunakdoesdev/status/1700215444542750923

or Zep memory: https://www.getzep.com/

@pgalko
Owner

pgalko commented Sep 12, 2023

Thanks mate, I will take a look.
I would like the code preserved in the messages in its entirety, but it could definitely help with the text.
