
Loading a fine tuned Seq2Seq MarianMT model gives wrong predictions #1571

Closed
ziweizh24 opened this issue May 1, 2024 · 3 comments

Comments

@ziweizh24

I initialized and trained the following model:

model = Seq2SeqModel(
    encoder_decoder_type="marian",
    encoder_decoder_name="Helsinki-NLP/opus-mt-en-mul",
    args=model_args,
    use_cuda=True,
)

After training, model.predict(['this is a test']) gives the desired output.
However, when I load the model back to make predictions, the output is off:

from transformers import MarianMTModel, MarianTokenizer

my_model = MarianMTModel.from_pretrained('outputs/best_model')
tokenizer = MarianTokenizer.from_pretrained('outputs/best_model')

translated = my_model.generate(**tokenizer(['this is a test'], return_tensors="pt", padding=True))
[tokenizer.decode(t, skip_special_tokens=True) for t in translated]

Anything I missed?

@ThilinaRajapakse
Owner

Do you get any warnings when you reload the model? (Set up logging if you haven't: logging.basicConfig(level=logging.INFO))

Does it work as expected if you reload the model with Simple Transformers and use model.predict()?

@ziweizh24
Author

> Do you get any warnings when you reload the model? (Set up logging if you haven't: logging.basicConfig(level=logging.INFO))
>
> Does it work as expected if you reload the model with Simple Transformers and use model.predict()?

I did get a warning saying that not all weights were initialized when loading the model with MarianMTModel.from_pretrained('outputs/best_model').
Could you say a bit more about how to reload the model (PATH='outputs/best_model/') with Simple Transformers (I assume it would use Seq2SeqModel)? Is Seq2SeqModel.from_pretrained(<PATH>) supported?

@ThilinaRajapakse
Owner

To load with ST, you'd do:

model = Seq2SeqModel(
    encoder_decoder_type="marian",
    encoder_decoder_name="<PATH>",
    args=model_args,
    use_cuda=True,
)

In theory, Seq2SeqModel.from_pretrained(<PATH>) should also work, since ST uses a Hugging Face model under the hood. I don't remember exactly, but Marian encoder-decoder models may be a special case where this doesn't work (because of how the encoder and the decoder are set up).
