
Code snippets in the chat window lose syntax highlighting occasionally #244

Open
nicikiefer opened this issue May 10, 2024 · 6 comments
Labels: enhancement, good first issue, help wanted, question, wontfix

Comments

nicikiefer commented May 10, 2024

Describe the bug
The code snippets shown in the chat window occasionally lose syntax highlighting and appear in plain white. Both the input code and the generated output code are affected; sometimes both appear without syntax highlighting. One thing I could observe is that the syntax highlighting is sometimes correctly applied until the very end of the code generation. Once code generation finishes, the syntax highlighting is lost again.

I am not sure what is causing this. I don't use any special themes, and since the code snippets are correctly formatted as code blocks, I assume only the color coding of the syntax highlighting is missing in the last step.

To Reproduce

  1. Use Ollama as your provider
  2. Use llama3 as your chat model
  3. Mark code in your editor and have it fixed, explained, or refactored by twinny
  4. Either the selected code or the generated code is missing syntax highlighting and appears in plain white (see screenshots below)

Expected behavior
Syntax highlighting is correctly applied to both the input code and the generated code.

Screenshots
[Screenshot: Bildschirmfoto vom 2024-05-10 16-00-01]

[Screenshot: grafik]

Edit: the first screenshot shows a different model than llama3 because I switched to codeqwen after encountering the syntax highlighting issue. To be clear, I did use llama3 when the issue occurred.

Logging
Enable logging in the extension settings if not already enabled (you may need to restart VSCode if you don't see logs). Provide the log with the report.

API Provider
Ollama

Chat or Auto Complete?
chat

Model Name
llama3:latest

Desktop (please complete the following information):

  • OS: Ubuntu 24.04
  • VSCode (see metadata attached below)
  • Twinny Version: v3.11.35

VSCode:

  • Version: 1.89.1
  • Commit: dc96b837cf6bb4af9cd736aa3af08cf8279f7685
  • Date: 2024-05-07T05:16:23.416Z
  • Electron: 28.2.8
  • ElectronBuildId: 27744544
  • Chromium: 120.0.6099.291
  • Node.js: 18.18.2
  • V8: 12.0.267.19-electron.0
  • OS: Linux x64 6.8.0-31-generic snap

Additional context
It does happen for other models as well (like codeqwen), but it most often appears with llama3:latest (pulled directly using Ollama).

Please let me know if you need more input or if there are other ways to successfully use llama3 and I am just doing it wrong. Also thanks for this amazing extension ❤️

rjmacarthy added the question label on May 15, 2024
rjmacarthy (Owner)

Hello, thanks for the report. Basically, when an LLM replies with code it needs to wrap it in backticks ("```") to indicate that it is a code block. By editing/improving your prompt you might get better results; if the model decides not to add backticks, I don't know it's code, and I am too lazy to implement some other code detection method.
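For context, here is a minimal sketch (assuming a typical react-markdown + react-syntax-highlighter setup; this is illustrative, not necessarily twinny's actual code) of why the backticks matter: the highlighter only receives a language when the fence carries one, e.g. ```python.

```tsx
// Illustrative sketch, not twinny's implementation.
// Assumes react-markdown + react-syntax-highlighter; component names are examples.
import React from "react";
import ReactMarkdown from "react-markdown";
import { Prism as SyntaxHighlighter } from "react-syntax-highlighter";

export function ChatMessage({ markdown }: { markdown: string }) {
  return (
    <ReactMarkdown
      components={{
        code({ className, children }) {
          // For a fenced block like ```python the renderer receives
          // className="language-python". Without backticks (or without a
          // language tag) there is nothing to match, so the block falls
          // back to plain, unhighlighted text.
          const match = /language-(\w+)/.exec(className ?? "");
          return match ? (
            <SyntaxHighlighter language={match[1]}>
              {String(children)}
            </SyntaxHighlighter>
          ) : (
            <code className={className}>{children}</code>
          );
        },
      }}
    >
      {markdown}
    </ReactMarkdown>
  );
}
```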

rjmacarthy added the enhancement, help wanted, good first issue, and wontfix labels on May 27, 2024
nicikiefer (Author)

Fair enough. Since in this particular case I tried to fix code using twinny, do you think it is safe to apply syntax highlighting even if the model does not provide it itself? If I understand correctly, the missing syntax highlighting in the screenshot does not concern the model output but the pasted code. That said, I am not sure whether the code I highlighted is just pasted into the Fix code editor or whether a model already does some preprocessing there.

Just trying to go for the low-hanging fruit in case such a simple heuristic might already improve this issue, but I also get that it is not high priority and maybe not worth adding workarounds for misbehaving or wrongly used models.

Hope that helps. Let me know if you need anything else, and no hard feelings if you focus on something else instead.


rjmacarthy commented May 27, 2024

I see, sorry, I didn't realise you were referring to pasted code as well. Somehow you'd have to provide the language to the markdown react syntax highlighter, like ```python. I could read the current active editor, but if the user changes it and then pastes the code, the language might be wrong. Edit: there is no pre-processing, but it might be possible to classify the code type and apply the correct annotation.
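For illustration only, here is roughly what reading the active editor's language and using it as the fence hint could look like. The helper name and integration point are assumptions rather than twinny's implementation, and it inherits the caveat above about the user switching editors before pasting:

```ts
// Illustrative sketch, not twinny's actual code: read the language of the
// currently active editor and use it as the fence hint for the selected code.
import * as vscode from "vscode";

export function fenceSelectedCode(): string | undefined {
  const editor = vscode.window.activeTextEditor;
  if (!editor) return undefined;

  // languageId is e.g. "python" or "typescript"; markdown fence tags
  // generally accept these identifiers directly.
  const languageId = editor.document.languageId;
  const selectedText = editor.document.getText(editor.selection);

  return "```" + languageId + "\n" + selectedText + "\n```";
}
```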

nicikiefer (Author)

I see, thanks for your explanation! All in all it sounds like it might be too much work for a corner case like this. I guess the only way you could detect it is via the active editor, as you pointed out: get the file type if possible and set up the markdown syntax highlighter based on that. But as you also pointed out, if a user can switch editors before the code is pasted, that behavior wouldn't be consistent either.

rjmacarthy (Owner)

It might also be possible to get the language at the point of copy, not just paste. However, that doesn't cover the fact that code can be copied from outside the editor too. Code classification would be the best way. It might be a waste of tokens for someone using an external API like OpenAI for chat, but for a local instance it would be fine. I'm not sure whether it would need a new model type or whether it could classify with the instruction model.
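As a rough sketch of the "classify with the instruction model" idea, something like the following could ask a local Ollama instance to name the language. The prompt, endpoint, and model name here are assumptions for illustration, not anything twinny ships:

```ts
// Hypothetical sketch: classify a snippet's language with a local Ollama
// instruction model via the /api/generate endpoint (non-streaming).
async function classifyLanguage(code: string): Promise<string> {
  const response = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3:latest", // example model; any local instruct model could work
      stream: false,
      prompt:
        "Reply with only the lowercase name of the programming language of this code:\n\n" +
        code,
    }),
  });

  const data = (await response.json()) as { response: string };
  // e.g. "python" or "typescript", usable as the markdown fence tag
  return data.response.trim().toLowerCase();
}
```

For a remote API this costs extra tokens per snippet, which is the trade-off mentioned above; locally it is essentially free.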

nicikiefer (Author)

> Code classification would be the best way

I agree. Not sure, but could there be a way for VSCode to handle that for you? I assume they themselves might have some classification to autodetect which syntax highlighting to use.

Anyway, thanks for bouncing some ideas back! I guess Copilot gets away with it by just mentioning the context it takes into account, and afaik it only allows code in the context of VSCode.
