
Loading a fine tuned Seq2Seq MarianMT model gives wrong predictions #1571

Closed
ziweizh24 opened this issue May 1, 2024 · 3 comments

Comments

@ziweizh24

I initialized and trained the following model:

model = Seq2SeqModel(
    encoder_decoder_type="marian",
    encoder_decoder_name="Helsinki-NLP/opus-mt-en-mul",
    args=model_args,
    use_cuda=True,
)

After training, model.predict(['this is a test']) gives the desired output.
However, when I load the model back to make predictions, the output is off:

from transformers import MarianMTModel, MarianTokenizer

my_model = MarianMTModel.from_pretrained('outputs/best_model')
tokenizer = MarianTokenizer.from_pretrained('outputs/best_model')

translated = my_model.generate(**tokenizer(['this is a test'], return_tensors="pt", padding=True))
[tokenizer.decode(t, skip_special_tokens=True) for t in translated]

Anything I missed?

@ThilinaRajapakse
Owner

Do you get any warnings when you reload the model? (Set up logging if you haven't: logging.basicConfig(level=logging.INFO))

Does it work as expected if you reload the model with Simple Transformers and use model.predict()?

@ziweizh24
Author

> Do you get any warnings when you reload the model? (Set up logging if you haven't: logging.basicConfig(level=logging.INFO))
>
> Does it work as expected if you reload the model with Simple Transformers and use model.predict()?

I did get a warning saying that not all weights were initialized when loading the model with MarianMTModel.from_pretrained('outputs/best_model').
Could you say a bit more about how to reload the model (PATH='outputs/best_model/') with Simple Transformers (I assume it would use Seq2SeqModel)? Is Seq2SeqModel.from_pretrained(<PATH>) supported?

@ThilinaRajapakse
Owner

To load with ST, you'd do:

model = Seq2SeqModel(
    encoder_decoder_type="marian",
    encoder_decoder_name="<PATH>",
    args=model_args,
    use_cuda=True,
)

In theory, Seq2SeqModel.from_pretrained(<PATH>) should also work, since ST uses a Hugging Face model under the hood. I don't remember exactly, but Marian encoder-decoder models may be a special case where this doesn't work (because of how the encoder and the decoder are set up).
