
Can't load previously trained GPT-2 Language generation model #1527

Open
timmartin opened this issue May 16, 2023 · 0 comments · May be fixed by #1528
Describe the bug
I trained a GPT-2 model from scratch using LanguageModelingModel and saved it to disk. When I then started a new process and tried to load the saved model, loading failed with:

RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
    size mismatch for transformer.wte.weight: copying a param with shape torch.Size([375, 768]) from checkpoint, the shape in current model is torch.Size([10000, 768]).
    size mismatch for lm_head.weight: copying a param with shape torch.Size([375, 768]) from checkpoint, the shape in current model is torch.Size([10000, 768]).
    You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
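The shapes in the traceback suggest the checkpoint was written with a 375-entry vocabulary, while the reloading code constructs a model with a 10000-entry one. As a sanity check (a sketch of my own, assuming a config.json was saved next to the weights at the path from the repro below), the saved config can be inspected directly:

from transformers import GPT2Config

# If this prints 375, the checkpoint and its saved config agree, and the
# 10000-entry shape comes from whatever config the loader builds instead.
saved_config = GPT2Config.from_pretrained("./outputs/from_scratch/best_model")
print(saved_config.vocab_size)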

To Reproduce
Generate a model using the train_new_lm.py script shipped in the examples directory. Try to load the model with:

from simpletransformers.language_modeling import LanguageModelingModel

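# reloading the saved checkpoint raises the size-mismatch RuntimeError shown above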
model = LanguageModelingModel(
    "gpt2",
    "./outputs/from_scratch/best_model",
)
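For comparison (this cross-check is my assumption, not something verified in the report), loading the same directory directly with Hugging Face transformers should build the model from the saved config.json, so the embedding shapes line up:

from transformers import AutoTokenizer, GPT2LMHeadModel

# from_pretrained on a directory picks up the config.json saved with the
# checkpoint, so the model is built with the checkpoint's own vocab size.
model = GPT2LMHeadModel.from_pretrained("./outputs/from_scratch/best_model")
tokenizer = AutoTokenizer.from_pretrained("./outputs/from_scratch/best_model")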

Expected behavior
The saved model loads with no exception raised.

Desktop (please complete the following information):

  • Linux