
💡 [REQUEST] - How can I resume yesterday's training and continue training? #359

Open
sunjunlishi opened this issue Apr 12, 2024 · 3 comments
Labels
question Further information is requested

Comments

@sunjunlishi

Start Date

No response

Implementation PR

No response

Reference Issues

No response

Summary

Yesterday I trained up to checkpoint-1200, with the loss at 0.9. Today I want to continue yesterday's training. Is that possible?
Otherwise the loss has to come down from 2.7 all over again, which is time-consuming.

Basic Example

import transformers
from transformers import GPTQConfig

# Load the base model; apply 4-bit GPTQ quantization only when Q-LoRA is enabled.
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_args.model_name_or_path,
    config=config,
    cache_dir=training_args.cache_dir,
    device_map=device_map,
    trust_remote_code=True,
    quantization_config=GPTQConfig(bits=4, disable_exllama=True)
    if training_args.use_lora and lora_args.q_lora
    else None,
)

Drawbacks

Since compute resources are limited, we can only train at night. If yesterday's training cannot be resumed and we have to retrain from scratch, it is very time-consuming.

Unresolved questions

No response

@sunjunlishi added the question label on Apr 12, 2024
@tristanwqy

--resume_from_checkpoint /path/to/your/checkpoint
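
For context, this is the standard Hugging Face Trainer resume mechanism: the flag (or the equivalent argument to Trainer.train) restores the model weights, optimizer and scheduler state, and the training step from the checkpoint directory. A minimal sketch, assuming finetune.py builds a regular HF Trainer (the trainer construction and checkpoint path below are illustrative):

from transformers import Trainer

# Illustrative construction; finetune.py assembles its own Trainer internally.
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)

# Resumes weights, optimizer/scheduler state, and step count, so the loss
# continues from ~0.9 instead of restarting at ~2.7.
# Passing True instead of a path resumes from the latest checkpoint in output_dir.
trainer.train(resume_from_checkpoint="/path/to/your/checkpoint-1200")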

@sunjunlishi
Author

Thank you very much!

@Qinger27

Hello, may I ask how you configured the fine-tuning environment? When I use finetune.py, DeepSpeed does not work properly. Could you reply when convenient? Thanks a lot!
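
If finetune.py uses Hugging Face TrainingArguments, DeepSpeed is typically enabled by passing a JSON config either on the command line (--deepspeed ds_config.json) or in code. A minimal sketch, assuming an HF-style setup (the output directory and config filename below are assumptions, not from this thread):

from transformers import TrainingArguments

# TrainingArguments hands the DeepSpeed JSON config to the Trainer; the script
# must then be launched with a distributed launcher such as torchrun or deepspeed.
training_args = TrainingArguments(
    output_dir="output",               # illustrative output directory
    deepspeed="ds_config_zero2.json",  # illustrative config filename
)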
