How to pretrain the model and expand the vocabulary? #49

Open
xiongxiaochu opened this issue Apr 6, 2023 · 3 comments

Comments

@xiongxiaochu

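After downloading the 7B model, I tested it with a few Chinese questions and found that the answers contain many unrecognizable characters. Is the model's Chinese vocabulary particularly small? How can I expand the Chinese vocabulary and then continue pretraining on additional Chinese corpora on top of it?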
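One plausible cause of the unrecognizable characters, offered as an assumption rather than a diagnosis: if the 7B model's tokenizer uses byte fallback (as SentencePiece tokenizers can), a Chinese character missing from the vocabulary is encoded as its raw UTF-8 bytes, which is 3 bytes for most CJK characters. If the model then generates an incomplete byte sequence, decoding yields the replacement character U+FFFD. The helpers `byte_fallback_tokens` and `decode_bytes` and the `<0xNN>` token spelling are hypothetical, chosen only for illustration.

```python
from typing import List


def byte_fallback_tokens(char: str) -> List[str]:
    """Hypothetical: spell a character missing from the vocab as <0xNN> byte tokens."""
    return [f"<0x{b:02X}>" for b in char.encode("utf-8")]


def decode_bytes(tokens: List[str]) -> str:
    """Rebuild text from <0xNN> byte tokens; invalid sequences become U+FFFD."""
    raw = bytes(int(t[3:5], 16) for t in tokens)  # "<0xE4>"[3:5] == "E4"
    return raw.decode("utf-8", errors="replace")


tokens = byte_fallback_tokens("中")
print(tokens)                    # ['<0xE4>', '<0xB8>', '<0xAD>']
print(decode_bytes(tokens))      # 中
print(decode_bytes(tokens[:2]))  # incomplete sequence: contains '\ufffd'
```

Each such character costs three tokens instead of one, which is also why a larger Chinese vocabulary tends to shorten Chinese sequences.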

@PhoebusSi
Owner

You could try BLOOM.

@forex24

forex24 commented Apr 9, 2023

@PhoebusSi
Owner

This project has expanded the vocabulary: https://github.com/ymcui/Chinese-LLaMA-Alpaca

I found that this link does not provide code for expanding the vocabulary or for pretraining. Are there any other recommendations?
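As a rough, library-free sketch of the two mechanical steps behind vocabulary expansion (not the method of any project linked in this thread): append the new tokens to the vocabulary with fresh ids, then grow the embedding table so every new id has a row. The function names, the assumption that ids are contiguous from 0, and the mean-of-existing-rows initialization are all illustrative choices; continued pretraining on the Chinese corpus is what would then teach the new rows useful values.

```python
from typing import Dict, List


def expand_vocab(vocab: Dict[str, int], new_tokens: List[str]) -> Dict[str, int]:
    """Hypothetical: add unseen tokens with ids after the existing ones.

    Assumes existing ids are contiguous 0..len(vocab)-1.
    """
    expanded = dict(vocab)
    for tok in new_tokens:
        if tok not in expanded:           # skip tokens already in the vocab
            expanded[tok] = len(expanded)  # next free id under the assumption
    return expanded


def resize_embeddings(emb: List[List[float]], new_size: int) -> List[List[float]]:
    """Hypothetical: grow the table to new_size rows, new rows = mean of old rows."""
    dim = len(emb[0])
    mean = [sum(row[j] for row in emb) / len(emb) for j in range(dim)]
    extra = [list(mean) for _ in range(new_size - len(emb))]
    return [list(row) for row in emb] + extra


vocab = {"<unk>": 0, "hello": 1, "中": 2}
expanded = expand_vocab(vocab, ["中", "中文", "词表"])
emb = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
new_emb = resize_embeddings(emb, len(expanded))
print(expanded)    # {'<unk>': 0, 'hello': 1, '中': 2, '中文': 3, '词表': 4}
print(new_emb[3])  # [2.0, 3.0]
```

With Hugging Face `transformers`, the corresponding calls are `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`; how the new rows are initialized there may differ from the mean initialization used in this sketch.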
