Persona-styled recipes generated from a list of ingredients

Anshumaan-Chauhan02/Recipe-Infusion

Project Description

This project introduces Recipe Infusion, a framework for generating style-infused recipes. The framework has two components: Recipe Generation and Style Infusion.

In the Recipe Generation component, a distilgpt2 model is fine-tuned on a processed custom dataset created by combining the RecipeBox and RecipeNLG data sources. The fine-tuned distilgpt2 model generates coherent and sensible recipes.

In the Style Infusion component, T5-small, a conditional generation model, is fine-tuned for style transfer. Because no parallel datasets exist for the selected celebrities' styles, the project uses back translation to create one: styled sentences are translated into another language and back again, and the resulting parallel dataset is used to train the T5 model. Once trained, the T5 model performs style transfer on the generated recipes, giving users recipe variations that reflect the style of a selected celebrity or other source.

Together, the two components combine recipe generation with style transfer. The project's results indicate that the approach is effective and could support more personalized and creative recipes.
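The back translation step described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names are hypothetical, the pairing direction (round-trip sentence as input, original styled sentence as target) is an assumption about how the parallel data feeds T5, and the `Helsinki-NLP/opus-mt-*` checkpoint names mentioned in the comments are assumed choices for the En-Fr and Fr-En MarianMT models.

```python
from typing import Callable, List, Tuple

# A translator maps a batch of sentences to a batch of translations.
Translator = Callable[[List[str]], List[str]]


def load_marian_translator(model_name: str) -> Translator:
    """Wrap a MarianMT checkpoint as a batch translation function.

    transformers is imported lazily so the rest of this sketch runs without it.
    Assumed checkpoints: "Helsinki-NLP/opus-mt-en-fr" and
    "Helsinki-NLP/opus-mt-fr-en" (not confirmed by the repository).
    """
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate(sentences: List[str]) -> List[str]:
        batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**batch)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return translate


def back_translate(
    styled: List[str], to_pivot: Translator, from_pivot: Translator
) -> List[Tuple[str, str]]:
    """Return (round_trip_sentence, original_styled_sentence) pairs.

    Assumption: the round trip through the pivot language flattens some of the
    style, so the pair can serve as (input, target) for style transfer training.
    """
    if not styled:
        return []
    pivot_sentences = to_pivot(styled)          # e.g. English -> French
    round_trip = from_pivot(pivot_sentences)    # e.g. French -> English
    return list(zip(round_trip, styled))
```

With real models, `to_pivot` and `from_pivot` would come from two calls to `load_marian_translator`; keeping the translators as plain callables also makes the pairing logic easy to check with stand-in functions.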

Table of Contents

  1. Dataset Information
  2. Dependencies
  3. Files
  4. How to Run

Dataset Information

Dependencies

  1. Numpy : Perform several mathematical evaluations in the preprocessing of the datasets

    pip install numpy

  2. Pandas : Loading/Processing/Storing of the different datasets

    pip install pandas

  3. Itertools : Easy iteration of large lists (part of the Python standard library, so no installation is needed)

  4. Sklearn : Cosine Similarity and TF-IDF

    pip install scikit-learn

  5. Transformers : DistilGPT2, T5-small, MarianMT (both model and tokenizers)

    pip install transformers

  6. SentencePiece : Used by MarianMT's tokenizer (Back Translation)

    pip install sentencepiece

  7. Evaluate : BLEU Score evaluation

    pip install evaluate

  8. Matplotlib: Plotting of the training curves

    pip install matplotlib
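To confirm that the dependencies above are importable before opening the notebooks, a small check like the one below may help. It is only a sketch: `missing_modules` is a hypothetical helper, and the list uses import names (for example `sklearn`), which can differ from the names used when installing.

```python
import importlib.util
from typing import Iterable, List

# Import names of the dependencies listed above.
REQUIRED_MODULES = [
    "numpy",
    "pandas",
    "itertools",
    "sklearn",
    "transformers",
    "sentencepiece",
    "evaluate",
    "matplotlib",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Prints an empty list when every dependency is available.
print(missing_modules(REQUIRED_MODULES))
```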

Files

  • RecipeDataset.ipynb :
    • Loading of both Recipes datasets
    • Preprocessing datasets to get into a common format
    • Performing statistical analysis on the data
    • Storing the final concatenated dataset
  • Statistics.ipynb :
    • Statistical analysis on the preprocessed datasets and the final concatenated dataset
  • Recipe_Generation_DistilGPT.ipynb :
    • Loading of the final recipe dataset
    • Data Preparation of the final dataset
    • Training of DistilGPT2 Model
    • Testing of the Finetuned (FT) model and baseline model
    • Evaluation of the models - BLEU Score and Perplexity
    • Generation of Recipe dataset for Style Transfer
    • Error Analysis on Adversarial inputs
  • Preprocess_TST_dataset.ipynb :
    • Loading the non-parallel data - Taylor and Trump
    • Preprocess the datasets
    • Extract statistical info about the dataset
  • Shakespeare_and_Scripts_Preprocessing.ipynb :
    • Loading the non-parallel data - Michael
    • Load the parallel data - Shakespeare
    • Preprocess the datasets
    • Extract some statistical info about the dataset
  • BackTranslation.ipynb :
    • Load the MarianMT models for Fr-En and En-Fr
    • Perform back translation to generate synthetic parallel data - Michael, Taylor and Trump
    • Store the parallel dataset
  • TST_Architecture.ipynb :
    • Load all the parallel datasets
    • Finetune a different T5-small model on each dataset
    • Generate styled recipes - Sentence-wise and Entire Recipe
    • Test the performance (Human Evaluation) on the styled recipes (Sentence-wise)
    • Check for style infusion on random sentences
  • Supplementary/Adversarial Inputs.xlsx
    • Adversarial examples for the model: 120 inputs for which the model's output deviates from the expected behavior and is of low quality
  • Supplementary/Sentence_Styled_Recipes.xlsx
    • Human evaluations of the styled recipes generated by the fine-tuned T5 model
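Recipe_Generation_DistilGPT.ipynb evaluates the models with BLEU and perplexity. The notebook's own computation is not reproduced here; as a minimal sketch, assuming the standard definition of perplexity (the exponential of the mean negative log-likelihood per token), it can be computed from per-token log-probabilities like this. The function name and input format are illustrative.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from natural-log probabilities the model assigned to each token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood per token.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)
```

Lower is better: a model that assigns every token a probability of 0.25 has a perplexity of 4, meaning it is as uncertain as a uniform choice among four tokens.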

How to Run

Except for training the LLMs, which was not feasible there due to computational limitations, all of the code was run in Google Colab. Follow the steps below to reproduce the project.

  1. Download all the .ipynb files and upload them to a new Google Drive folder named 'Project 685'
  2. Download all the recipe datasets and add them to the top-level folder 'Project 685'
  3. Run RecipeDataset.ipynb to produce the 'Final_dataset' file, which contains the preprocessed, concatenated dataset
  4. Run Statistics.ipynb to display statistics about the datasets [OPTIONAL]
  5. Run Recipe_Generation_DistilGPT.ipynb to get the fine-tuned recipe generation model and the generated recipes
  6. Download the Text Style Transfer datasets. Create a new sub-folder {persona}_TST (e.g. Taylor_TST)
  7. Upload the .zip datasets for Taylor and Trump to their respective sub-folders. For Shakespeare and Michael, add the unzipped .csv files to the top-level folder
  8. Run Preprocess_TST_dataset.ipynb and Shakespeare_and_Scripts_Preprocessing.ipynb to get datasets formatted for back translation
  9. Run BackTranslation.ipynb to get a parallel dataset for Taylor, Trump and Michael
  10. Run TST_Architecture.ipynb file to get the finetuned TST models and generate final outputs
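Before running the notebooks, it can be useful to confirm that the Drive folder matches the layout the steps above describe. The sketch below is one way to do that. The notebook names come from the Files section; `missing_items` is a hypothetical helper, and the Colab mount path in the comment is an assumption about where 'Project 685' ends up after mounting Drive.

```python
from pathlib import Path
from typing import List, Union

# Notebooks listed in the Files section.
NOTEBOOKS = [
    "RecipeDataset.ipynb",
    "Statistics.ipynb",
    "Recipe_Generation_DistilGPT.ipynb",
    "Preprocess_TST_dataset.ipynb",
    "Shakespeare_and_Scripts_Preprocessing.ipynb",
    "BackTranslation.ipynb",
    "TST_Architecture.ipynb",
]

# Sub-folders that hold the zipped Taylor and Trump datasets.
PERSONA_FOLDERS = ["Taylor_TST", "Trump_TST"]


def missing_items(project_dir: Union[str, Path]) -> List[str]:
    """Return the expected notebooks and sub-folders absent from project_dir."""
    root = Path(project_dir)
    missing = [name for name in NOTEBOOKS if not (root / name).is_file()]
    missing += [name + "/" for name in PERSONA_FOLDERS if not (root / name).is_dir()]
    return missing


# Example (assumed Colab path after mounting Google Drive):
# print(missing_items("/content/drive/MyDrive/Project 685"))
```

The check does not cover the dataset files themselves, since their file names are not listed in this README.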
