
Can't load previously trained GPT-2 Language generation model #1527

Open
timmartin opened this issue May 16, 2023 · 0 comments · May be fixed by #1528
Describe the bug
I trained a GPT-2 model from scratch using LanguageModelingModel and saved it to disk. When I then started a new process and tried to load the saved model, loading failed with:

RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
    size mismatch for transformer.wte.weight: copying a param with shape torch.Size([375, 768]) from checkpoint, the shape in current model is torch.Size([10000, 768]).
    size mismatch for lm_head.weight: copying a param with shape torch.Size([375, 768]) from checkpoint, the shape in current model is torch.Size([10000, 768]).
    You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
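The shapes in the traceback suggest the checkpoint was written with a 375-entry vocabulary, while the reloading code constructs a model with a 10000-entry one. As a sanity check (a sketch of my own, assuming a config.json was saved next to the weights at the path from the repro below), the saved config can be inspected directly:

from transformers import GPT2Config

# If this prints 375, the checkpoint and its saved config agree, and the
# 10000-entry shape comes from whatever config the loader builds instead.
saved_config = GPT2Config.from_pretrained("./outputs/from_scratch/best_model")
print(saved_config.vocab_size)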

To Reproduce
Generate a model using the train_new_lm.py script shipped in the examples directory. Try to load the model with:

from simpletransformers.language_modeling import LanguageModelingModel

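# reloading the saved checkpoint raises the size-mismatch RuntimeError shown above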
model = LanguageModelingModel(
    "gpt2",
    "./outputs/from_scratch/best_model",
)
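For comparison (this cross-check is my assumption, not something verified in the report), loading the same directory directly with Hugging Face transformers should build the model from the saved config.json, so the embedding shapes line up:

from transformers import AutoTokenizer, GPT2LMHeadModel

# from_pretrained on a directory picks up the config.json saved with the
# checkpoint, so the model is built with the checkpoint's own vocab size.
model = GPT2LMHeadModel.from_pretrained("./outputs/from_scratch/best_model")
tokenizer = AutoTokenizer.from_pretrained("./outputs/from_scratch/best_model")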

Expected behavior
The saved model loads with no exception raised.

Desktop (please complete the following information):

  • Linux