
Additional prompt processing during chat. #813

Open
inspir3dArt opened this issue Apr 28, 2024 · 4 comments

@inspir3dArt

Hi,

since I updated koboldcpp today (Termux/Android), many more tokens are processed before the LLM starts to write a reply.

Usually, after processing the provided character card before the first response, koboldcpp only processed the tokens of my last message before replying. Now it processes between 300 and 500 tokens before every reply during the chat.

I usually write one or two sentences per chat message, which used to be processed really fast, and the LLM writes at a comfortable reading speed. But now I have to wait so long for a reply to start that it's no fun anymore.

Is there a way to fix that?

@inspir3dArt
Author

Looks like it's caused by the (new?) Author's Note feature. I put a short instruction in there telling the LLM to reply within a range of words. I was really happy to discover that feature, because for the first time this actually worked (I tried it via the system prompt section in JSON files before, but that never did the job consistently). Unfortunately it causes a lot of additional prompt processing. It would be really nice if that could be fixed or solved differently, like context shifting, which usually works really quickly.

@LostRuins
Owner

If you use author's note then there will always be some reprocessing required. To reduce the amount, you can change the author note depth to strong.
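
For illustration, here is a minimal Python sketch of an assumed prompt-assembly flow (not KoboldCpp's actual code) showing why a floating Author's Note triggers reprocessing: the note is injected a fixed number of messages from the end of the chat, so its position moves every turn and the cached prompt prefix stops matching from that point onward.

```python
# Hypothetical prompt assembly, for illustration only.

def build_prompt(memory, history, note, note_depth):
    """Insert the author's note `note_depth` messages from the end of the history."""
    msgs = list(history)
    if note and note_depth:
        msgs.insert(len(msgs) - note_depth, f"[Author's note: {note}]")
    return "\n".join([memory] + msgs)

def shared_prefix(a, b):
    """Length of the common prefix two prompts share (reusable without reprocessing)."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

history = [f"message {i}" for i in range(1, 9)]
note = "Reply in 40 to 60 words."

# Two consecutive turns with the note floating 3 messages from the end:
turn1 = build_prompt("persona...", history, note, note_depth=3)
turn2 = build_prompt("persona...", history + ["message 9"], note, note_depth=3)

# The prompts diverge where the note sat on the previous turn, so everything
# after that point (several hundred tokens in a real chat) is processed again.
print(shared_prefix(turn1, turn2), "of", len(turn2), "characters reusable")
```

Under this assumption, placing the note closer to the end of the prompt moves the divergence point later, which is presumably why changing the note depth reduces how much has to be reprocessed.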

@inspir3dArt
Author

inspir3dArt commented Apr 28, 2024

Hi, thank you for your reply. I have tried the different options now.

The closest I got to a solution was the "Author's Note" strict mode (putting it at the end of my message, like the "Stop sequences").

From what I can see in the terminal, it looks like this gets removed after the LLM reply finishes, which causes the previous LLM reply to be processed again. Isn't there a way to cut it out without needing to reprocess, or wouldn't it solve the problem to not remove it at all?

Edit: Or are there other options to encourage an LLM to write replies within a defined word or token range?

@LostRuins
Owner

In that case, you can try adding the stuff to Memory instead of author's note, which will stay at a static position.
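
Continuing the sketch above (again an assumption about prompt layout, not KoboldCpp internals): text kept in Memory sits at the very top of the prompt, so two consecutive prompts share everything except the newly appended message, and the cached context can be reused.

```python
# Same instruction kept in Memory at the top of the prompt instead.
memory = "persona...\n[Reply in 40 to 60 words.]"
turn1 = memory + "\n" + "\n".join(f"message {i}" for i in range(1, 9))
turn2 = memory + "\n" + "\n".join(f"message {i}" for i in range(1, 10))

# turn2 starts with turn1 verbatim, so only the new tail needs processing
# (and context shifting can keep working when the context fills up).
assert turn2.startswith(turn1)
```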
