
Can I use the transformers.AutoTokenizer to load the tokenizer? #151

Open
tian969 opened this issue Apr 25, 2024 · 4 comments

tian969 commented Apr 25, 2024

I know that tokenizer.py in this repo uses tiktoken. Can I use transformers.AutoTokenizer to load the tokenizer instead, so that I don't need to change my code? Also, if I don't use tokenizer.py, ChatFormat can't be used either.

tian969 commented Apr 25, 2024

I mean the transformers.PreTrainedTokenizer class.

ppaanngggg commented

Same question.

ppaanngggg commented

I found the solution: use the model files on Hugging Face. There is a tokenizer.json file that can be loaded directly.
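
A minimal sketch of that approach, assuming the Hugging Face model files (including tokenizer.json) have already been downloaded locally; the path below is a placeholder:

```python
from transformers import PreTrainedTokenizerFast

# Load the tokenizer directly from the tokenizer.json shipped with the
# Hugging Face model files (hypothetical local path).
tokenizer = PreTrainedTokenizerFast(tokenizer_file="path/to/tokenizer.json")

print(tokenizer.encode("Hello, Llama 3!"))
```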

@subramen
Copy link
Contributor

subramen commented May 1, 2024

Yes, you can use AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B-Instruct').
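
A short sketch of this, assuming you have access to the gated meta-llama repo on Hugging Face. Since the original question also asked about ChatFormat, note that transformers' apply_chat_template fills a similar role, wrapping messages in the Llama 3 special tokens:

```python
from transformers import AutoTokenizer

# Requires access to the gated meta-llama repo on Hugging Face.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# apply_chat_template plays a role similar to this repo's ChatFormat:
# it renders messages with the model's special tokens.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```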
