
💡 [REQUEST] - How can I resume yesterday's training and continue training? #359

Open
sunjunlishi opened this issue Apr 12, 2024 · 3 comments
Labels
question Further information is requested

Comments

@sunjunlishi

Start Date

No response

Implementation PR

No response

Reference Issues

No response

Summary

Yesterday I trained up to checkpoint-1200, with the loss at 0.9. Today I want to continue yesterday's training. Is that possible?
Otherwise the loss has to come down from 2.7 all over again, which is time-consuming.

Basic Example

import transformers
from transformers import GPTQConfig

# Load the base model; apply 4-bit GPTQ quantization only when Q-LoRA is enabled.
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    config=config,
    cache_dir=training_args.cache_dir,
    device_map=device_map,
    trust_remote_code=True,
    quantization_config=GPTQConfig(bits=4, disable_exllama=True)
    if training_args.use_lora and lora_args.q_lora
    else None,
)

Drawbacks

Since compute resources are limited, we can only train at night. If yesterday's training cannot be resumed and we have to retrain from scratch, it is very time-consuming.

Unresolved questions

No response

@sunjunlishi added the question label on Apr 12, 2024
@tristanwqy

--resume_from_checkpoint /path/to/your/checkpoint
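
For context, this is the standard Hugging Face Trainer resume mechanism: the flag (or the equivalent argument to Trainer.train) restores the model weights, optimizer and scheduler state, and the training step from the checkpoint directory. A minimal sketch, assuming finetune.py builds a regular HF Trainer (the trainer construction and checkpoint path below are illustrative):

from transformers import Trainer

# Illustrative construction; finetune.py assembles its own Trainer internally.
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)

# Resumes weights, optimizer/scheduler state, and step count, so the loss
# continues from ~0.9 instead of restarting at ~2.7.
# Passing True instead of a path resumes from the latest checkpoint in output_dir.
trainer.train(resume_from_checkpoint="/path/to/your/checkpoint-1200")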

@sunjunlishi
Author

Thank you very much!

@Qinger27

Hello, may I ask how you configured the fine-tuning environment? When I use finetune.py, DeepSpeed does not work properly. Could you reply when convenient? Thanks a lot!
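
If finetune.py uses Hugging Face TrainingArguments, DeepSpeed is typically enabled by passing a JSON config either on the command line (--deepspeed ds_config.json) or in code. A minimal sketch, assuming an HF-style setup (the output directory and config filename below are assumptions, not from this thread):

from transformers import TrainingArguments

# TrainingArguments hands the DeepSpeed JSON config to the Trainer; the script
# must then be launched with a distributed launcher such as torchrun or deepspeed.
training_args = TrainingArguments(
    output_dir="output",               # illustrative output directory
    deepspeed="ds_config_zero2.json",  # illustrative config filename
)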
