This repository contains a notebook for fine-tuning the T5-base model to generate titles from abstracts. The dataset used for this task is openly available on Kaggle: Medical paper title and abstract NLP dataset.
The dataset is provided as separate training and test sets in CSV format, which are used for fine-tuning and evaluation respectively.
- /dataset/train.csv: Used for training. It contains 6000 rows with abstracts of varying length. Only abstracts with more than 200 words are used; you can change this threshold in the notebook to suit your requirements.
- /dataset/test.csv: Used for testing. It contains 2000 rows with abstracts of varying length.
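The 200-word filter described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the column names `title` and `abstract`, the helper names, and counting words by splitting on whitespace are all assumptions, and a tiny in-memory CSV stands in for /dataset/train.csv.

```python
import csv
import io


def load_rows(csv_file):
    """Read a CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(csv_file))


def filter_long_abstracts(rows, min_words=200, column="abstract"):
    """Keep rows whose abstract has MORE than `min_words` words.

    Words are counted by splitting on whitespace (an assumption; the
    notebook may count differently). `column` is a hypothetical name.
    """
    return [row for row in rows if len(row[column].split()) > min_words]


# Tiny in-memory stand-in for /dataset/train.csv (assumed columns).
sample = io.StringIO(
    "title,abstract\n"
    "Short paper,only a few words\n"
    "Long paper," + " ".join(["word"] * 201) + "\n"
)

rows = load_rows(sample)
kept = filter_long_abstracts(rows)
print(f"{len(kept)} of {len(rows)} rows kept")
```

To read the real file, pass `open("dataset/train.csv", newline="", encoding="utf-8")` to `load_rows` and adjust `column` to match the actual header.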
To fine-tune the T5 model using this repository and the provided dataset, follow these steps:
- Open the Google Colab Notebook: Click the "Open in Colab" button to open the notebook in Colab.
- If the link does not work, clone this repository and open the notebook by clicking the "Open in Colab" button at the top of the notebook. Alternatively, you can open it manually in Colab.
- Run the Notebook: Follow the instructions provided within the Colab notebook. It will guide you through the steps for fine-tuning the T5 model.
Average BLEU score: 0.9913513513513513, which is close to the maximum of 1.
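For readers unfamiliar with the metric, an average BLEU score like the one reported above is typically the mean of per-example sentence-level scores. The sketch below is a plain-Python illustration under stated assumptions: whitespace tokenization, uniform weights over 1- to 4-grams, a single reference per candidate, and no smoothing. It is not the notebook's implementation, which may use a library with different tokenization or smoothing, so it will not necessarily reproduce the number above. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed BLEU for one candidate against one reference string."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or overlap == 0:
            # Without smoothing, any zero precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def average_bleu(references, candidates):
    """Mean sentence-level BLEU over paired references and candidates."""
    scores = [sentence_bleu(r, c) for r, c in zip(references, candidates)]
    return sum(scores) / len(scores)


refs = ["effects of aspirin on heart disease", "a study of sleep"]
hyps = ["effects of aspirin on heart disease", "unrelated generated title here"]
print(average_bleu(refs, hyps))
```

Note that without smoothing, any title shorter than four tokens scores 0 in this sketch, which is one reason library implementations usually offer smoothing options.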
I wish I could add citations, but neither the dataset nor the notebook provided any. If you find this work useful, please consider referencing this repository.