This project fine-tunes the mBART model to correct grammatical and syntactical errors in Bengali sentences, aiming for high accuracy in transforming incorrect sentences into their correct forms. The full training process can be found in the Notebook, and a live Hugging Face demo of the fine-tuned model is available.
- Model: mBART Large 50
- Description: A multilingual transformer model capable of understanding and generating 50 different languages.
- Source: BNSEC Data Repository
- Size: 1.3 Million sentence pairs
- Characteristics: Contains a mix of correct and grammatically incorrect Bengali sentences.
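A dataset of this shape is typically distributed as tab-separated (incorrect, correct) sentence pairs. As a minimal sketch (the TSV layout and field order are assumptions, not confirmed by the BNSEC repository), a loader might look like:

```python
def load_pairs(lines):
    """Parse tab-separated (incorrect, correct) sentence pairs,
    skipping blank or malformed lines.

    Note: the two-column TSV layout is an assumption about the
    BNSEC data format, used here only for illustration."""
    pairs = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2 and all(parts):
            pairs.append((parts[0], parts[1]))
    return pairs


# Usage example with a well-formed and a malformed row:
rows = ["ভুল বাক্য\tসঠিক বাক্য", "malformed line"]
print(load_pairs(rows))  # only the first row survives
```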
- Optimizer: AdamW
- Learning Rate: 1e-5
- Batch Size: 128
- Training Steps: 6500
- GPU: Google Colab A100
- Epochs: 1
- Max Token Length: 32
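The hyperparameters above map directly onto a Hugging Face `Seq2SeqTrainer` configuration. This is a sketch only: the checkpoint name `facebook/mbart-large-50`, the `bn_IN` language code, and the `train_dataset` variable are assumptions about the setup, not taken from the repository.

```python
from transformers import (MBartForConditionalGeneration,
                          MBart50TokenizerFast,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

# Assumed checkpoint; the mBART-50 tokenizer uses language codes
# such as "bn_IN" for Bengali source and target text.
tokenizer = MBart50TokenizerFast.from_pretrained(
    "facebook/mbart-large-50", src_lang="bn_IN", tgt_lang="bn_IN")
model = MBartForConditionalGeneration.from_pretrained(
    "facebook/mbart-large-50")

args = Seq2SeqTrainingArguments(
    output_dir="mbart-bn-gec",        # hypothetical output path
    per_device_train_batch_size=128,
    learning_rate=1e-5,               # AdamW is the Trainer default
    max_steps=6500,                   # takes precedence over epoch count
    fp16=True,                        # mixed precision on the A100
)

# `train_dataset` is assumed to be a tokenized dataset of sentence
# pairs truncated to the 32-token maximum listed above.
trainer = Seq2SeqTrainer(model=model, args=args,
                         train_dataset=train_dataset,
                         tokenizer=tokenizer)
trainer.train()
```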
| Metric | Training | Post-Training Testing |
|---|---|---|
| BLEU | 0.805 | 0.443 |
| CER | 0.053 | 0.159 |
| WER | 0.101 | 0.406 |
| METEOR | 0.904 | 0.655 |
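CER and WER (lower is better) are both ratios of edit distance to reference length, computed over characters and words respectively. A minimal self-contained implementation, for readers who want to reproduce these two metrics without a metrics library:

```python
def levenshtein(ref, hyp):
    # Classic dynamic-programming edit distance; works on
    # strings (characters) or lists (words) alike.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    # Character Error Rate: character edits / reference length.
    return levenshtein(ref, hyp) / max(len(ref), 1)

def wer(ref, hyp):
    # Word Error Rate: word edits / reference word count.
    ref_words, hyp_words = ref.split(), hyp.split()
    return levenshtein(ref_words, hyp_words) / max(len(ref_words), 1)


# Usage example (Latin text for clarity; the same code applies
# unchanged to Bengali strings):
print(cer("abcd", "abcf"))          # → 0.25
print(wer("a b c d", "a b x d"))    # → 0.25
```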