
llama : add Deepseek support #5981

Closed
ggerganov opened this issue Mar 10, 2024 · 8 comments
Labels: good first issue (Good for newcomers), help wanted (Extra attention is needed), model (Model specific)

Comments

@ggerganov (Owner) commented Mar 10, 2024

Support is almost complete. There is a dangling issue with the pre-tokenizer: #7036

A useful discussion related to that is here: #7144


Outdated below

Creating this issue for more visibility

The main problem is around tokenization support, since the models use some variation of the BPE pre-processing regex. There are also some issues with the conversion scripts.
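To illustrate what the pre-tokenization step does, here is a minimal sketch using a simplified, GPT-2-style pattern with Python's stdlib `re`. This is illustrative only; DeepSeek's actual pre-tokenizer regex differs and relies on Unicode property classes (e.g. `\p{L}`) that need the third-party `regex` module.

```python
import re

# Simplified pre-tokenization pattern (illustrative, NOT DeepSeek's real one):
# contractions, optionally space-prefixed word runs, punctuation runs, whitespace.
PRETOKENIZE = re.compile(r"'s|'t|'re|'ve|'m|'ll|'d| ?\w+| ?[^\w\s]+|\s+")

def pretokenize(text):
    """Split text into chunks; BPE merges are then applied within each chunk."""
    return PRETOKENIZE.findall(text)

print(pretokenize("Hello, world! It's llama.cpp"))
```

A model converted with the wrong pre-tokenizer regex splits text into different chunks than the reference tokenizer, so the downstream BPE merges (and therefore the token IDs) diverge even though the vocabulary is identical.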

Anyway, looking for contributions to help with this.

Previous unfinished work:

Possible implementation plan: #5464 (comment)

@ggerganov added the labels help wanted (Extra attention is needed), good first issue (Good for newcomers), and model (Model specific) on Mar 10, 2024
@dragnil1 (Contributor)

Hello, @ggerganov, I'd like to try working on it as my good first issue.

@ggerganov (Owner, Author)

Ok 👍 keep us posted

@Kangmo commented Mar 22, 2024

Waiting for this to land. The DeepSeek models are famous for strong coding and Korean-language ability.

@fostiropoulos
DeepSeek is in the list of supported models, and I am able to use DeepSeek-Coder-33B with results comparable to their API. Can someone please clarify what the failure cases with the current tokenization are?

@Columpio commented Apr 1, 2024

@fostiropoulos There may be a problem with DeepSeek-Coder 1.3B that is somehow not reproducible in the 6.7B, 7B, and 33B variants.
See this

@mirek190
Any progress with DeepSeek support?

@hyperbolic-c
Hey, will the DeepSeek-Coder models be supported once #6920 is merged? Thanks!

@ggerganov (Owner, Author)

I've updated the description of the issue with the latest state.

Support is pretty much complete, though there are some edge cases during tokenization that are handled incorrectly. For example, the letter ü is tokenized to a different token than it should be; more info in the referenced links.
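A plausible reason accented letters are fragile here (my illustration of the general mechanism, not the actual llama.cpp code): byte-level BPE operates on UTF-8 bytes, and a single accented letter is multiple bytes, so its final token depends on vocab merges and on how added tokens are matched against the raw text.

```python
# "ü" is one Unicode code point (U+00FC) but two bytes in UTF-8.
# A byte-level BPE tokenizer sees the bytes 0xC3 0xBC, so whether "ü"
# ends up as a single token depends on the vocabulary's merge rules
# and on how added/special tokens are matched against the input.
s = "ü"
print(len(s))                    # 1 code point
print(s.encode("utf-8").hex())   # "c3bc": two bytes, 0xC3 0xBC
```

If added-token matching happens at a different stage than byte-level splitting, the same character can take two different paths through the tokenizer, which is consistent with this being a general added-token processing issue rather than something DeepSeek-specific.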

I think we can declare the DeepSeek models supported and handle the problem above in a separate task related to correct processing of added tokens, since it is not specific to DeepSeek models.

Projects: Done
Development: no branches or pull requests
7 participants