
Additional prompt processing during chat. #813

Open
inspir3dArt opened this issue Apr 28, 2024 · 4 comments

@inspir3dArt

Hi,

since I updated koboldcpp today (Termux/Android), many more tokens are processed before the LLM starts to write a reply.

Usually, after processing the provided character card before the first response, koboldcpp only processed the tokens of my last message before replying. Now it processes between 300 and 500 tokens before every reply during the chat.

I usually write one or two sentences per chat message, which used to be processed really fast, and the LLM writes at a comfortable reading speed. But now I have to wait so long for a reply to start that it's no fun anymore.

Is there a way to fix that?

@inspir3dArt
Author

Looks like it's caused by the (new?) Author's Note feature. I put a short instruction in there telling the LLM to reply within a range of words. I was really happy to discover that feature, because for the first time this actually worked (I tried it via the system prompt section in JSON files before, but that never did the job consistently). Unfortunately it causes a lot of additional prompt processing. It would be really nice if that could be fixed or solved differently, like context shifting, which usually works really quickly.

@LostRuins
Owner

If you use author's note then there will always be some reprocessing required. To reduce the amount, you can change the author note depth to strong.
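
For illustration, here is a minimal Python sketch of an assumed prompt-assembly flow (not KoboldCpp's actual code) showing why a floating Author's Note triggers reprocessing: the note is injected a fixed number of messages from the end of the chat, so its position moves every turn and the cached prompt prefix stops matching from that point onward.

```python
# Hypothetical prompt assembly, for illustration only.

def build_prompt(memory, history, note, note_depth):
    """Insert the author's note `note_depth` messages from the end of the history."""
    msgs = list(history)
    if note and note_depth:
        msgs.insert(len(msgs) - note_depth, f"[Author's note: {note}]")
    return "\n".join([memory] + msgs)

def shared_prefix(a, b):
    """Length of the common prefix two prompts share (reusable without reprocessing)."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

history = [f"message {i}" for i in range(1, 9)]
note = "Reply in 40 to 60 words."

# Two consecutive turns with the note floating 3 messages from the end:
turn1 = build_prompt("persona...", history, note, note_depth=3)
turn2 = build_prompt("persona...", history + ["message 9"], note, note_depth=3)

# The prompts diverge where the note sat on the previous turn, so everything
# after that point (several hundred tokens in a real chat) is processed again.
print(shared_prefix(turn1, turn2), "of", len(turn2), "characters reusable")
```

Under this assumption, placing the note closer to the end of the prompt moves the divergence point later, which is presumably why changing the note depth reduces how much has to be reprocessed.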

@inspir3dArt
Author

inspir3dArt commented Apr 28, 2024

Hi, thank you for your reply. I have tried the different options now.

The closest I got to a solution was the "Author's Note" strict mode (putting it at the end of my message, like the "Stop sequences").

From what I can see in the terminal, it looks like this gets removed after the LLM reply finishes, which causes the previous LLM reply to be processed again. Isn't there a way to cut it out without needing to reprocess, or wouldn't it solve the problem to not remove it at all?

Edit: Or are there other options to encourage an LLM to write replies within a defined word or token range?

@LostRuins
Owner

In that case, you can try adding the stuff to Memory instead of author's note, which will stay at a static position.
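
Continuing the sketch above (again an assumption about prompt layout, not KoboldCpp internals): text kept in Memory sits at the very top of the prompt, so two consecutive prompts share everything except the newly appended message, and the cached context can be reused.

```python
# Same instruction kept in Memory at the top of the prompt instead.
memory = "persona...\n[Reply in 40 to 60 words.]"
turn1 = memory + "\n" + "\n".join(f"message {i}" for i in range(1, 9))
turn2 = memory + "\n" + "\n".join(f"message {i}" for i in range(1, 10))

# turn2 starts with turn1 verbatim, so only the new tail needs processing
# (and context shifting can keep working when the context fills up).
assert turn2.startswith(turn1)
```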
