
Implementing a Denoising Diffusion Probabilistic Model (DDPM) in TensorFlow from scratch for Pokémon sprite synthesis


🔍 Project Overview

This project implements a conditioned Denoising Diffusion Probabilistic Model (DDPM) in TensorFlow from scratch for Pokémon generation, with the aim of understanding the mathematics and theory behind it. To achieve this goal, the Pokémon sprites dataset (Pokémon sprite images) is used.

This project has been developed for my Bachelor's Thesis in Data Science and Artificial Intelligence at Universidad Politécnica de Madrid (UPM).

NOTE: Since this project is for a Spanish college institution, the Jupyter notebooks' markdown cells and the thesis document are in Spanish 🇪🇸. However, the code and comments are in English 🇬🇧.
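
To give a flavour of what the notebooks build up to: the forward (noising) process of a DDPM gradually turns an image into Gaussian noise, and the U-Net is trained to undo it. The snippet below is only a minimal sketch of that closed-form forward step in TensorFlow; the linear schedule and the names (T, betas, alphas_bar, forward_diffusion) are illustrative assumptions and do not necessarily match src/model/diffusion.py.

    import tensorflow as tf

    # Sketch of the closed-form DDPM forward (noising) step:
    #   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    # The linear beta schedule and names are illustrative, not the repo's exact code.
    T = 1000
    betas = tf.linspace(1e-4, 0.02, T)          # linear noise schedule
    alphas_bar = tf.math.cumprod(1.0 - betas)   # cumulative product of (1 - beta_t)

    def forward_diffusion(x0, t):
        """Sample x_t ~ q(x_t | x_0) for a batch of images x0 and integer timesteps t."""
        noise = tf.random.normal(tf.shape(x0))
        a_bar = tf.reshape(tf.gather(alphas_bar, t), (-1, 1, 1, 1))  # broadcast over H, W, C
        xt = tf.sqrt(a_bar) * x0 + tf.sqrt(1.0 - a_bar) * noise
        return xt, noise  # the U-Net learns to predict `noise` from (xt, t, label)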

📂 Structure

The structure of the repository is as follows:

📦DiffusionScratch
 ┣ 📂.devcontainer
 ┣ 📂data
 ┃ ┣ 📂interim
 ┃ ┃ ┣ 📜image_paths.json
 ┃ ┃ ┗ 📜pokemon_dict_dataset.json
 ┃ ┣ 📂processed
 ┃ ┃ ┣ 📂pokemon_tf_dataset
 ┃ ┃ ┗ 📜pokedex_cleaned.csv
 ┃ ┗ 📂raw
 ┃ ┃ ┣ 📂sprites
 ┃ ┃ ┗ 📜pokedex.csv
 ┣ 📂docs
 ┃ ┣ 📂papers
 ┃ ┗ 📂study
 ┣ 📂figures
 ┃ ┣ 📂model_results_figures
 ┃ ┣ 📂notebook_figures
 ┃ ┗ 📂readme_figures
 ┣ 📂models
 ┃ ┗ 📜.gitkeep
 ┣ 📂notebooks
 ┃ ┣ 📜00-Intro-and-Analysis.ipynb
 ┃ ┣ 📜01-Dataset-Creation.ipynb
 ┃ ┣ 📜02-Diffusion-Model-Architecture.ipynb
 ┃ ┣ 📜03-Diffusion-Process.ipynb
 ┃ ┗ 📜04-Training-Diffusion-Model.ipynb
 ┣ 📂src
 ┃ ┣ 📂data
 ┃ ┃ ┣ 📜create_dataset.py
 ┃ ┃ ┣ 📜path_loader.py
 ┃ ┃ ┣ 📜preprocess.py
 ┃ ┃ ┗ 📜__init__.py
 ┃ ┣ 📂model
 ┃ ┃ ┣ 📜build_unet.py
 ┃ ┃ ┣ 📜diffusion.py
 ┃ ┃ ┗ 📜__init__.py
 ┃ ┣ 📂utils
 ┃ ┃ ┣ 📜utils.py
 ┃ ┃ ┗ 📜__init__.py
 ┃ ┣ 📂visualization
 ┃ ┃ ┣ 📜visualize.py
 ┃ ┃ ┗ 📜__init__.py
 ┃ ┗ 📜__init__.py
 ┣ 📜.gitattributes
 ┣ 📜.gitignore
 ┣ 📜config.ini
 ┣ 📜LICENSE
 ┣ 📜README.md
 ┗ 📜setup.py

🚀 Prerequisites

This project has dependencies outside the scope of Python, so you need to perform some additional setup steps.

It is recommended to use a Linux (Ubuntu) distribution for this project, since it is the most common OS for data science and artificial intelligence tasks and, for that reason, NVIDIA GPU configurations are easier to set up.

It is also the simplest way to configure and maintain the project code over time, since we will be using a Docker container, avoiding compatibility issues with the OS; if any update or upgrade causes a problem, it can be resolved by simply rebuilding the container.

However, you can also use Windows with WSL2 or MacOS. The requirements for each OS are as follows:

MacOS:
  • macOS 12.0 or later
  • Mac computer with Apple silicon or AMD GPUs
  • Python version 3.10 or later
  • Xcode command-line tools: xcode-select --install

Windows / Linux (Ubuntu, recommended):
  • Follow the configuration steps in the next section.

🔧 OS Configuration

1. NVIDIA GPU Configuration (Windows and Linux)


In order to use the GPU for training the model, you need to install the NVIDIA drivers, CUDA and cuDNN.

Even though the project is developed in TensorFlow, and therefore not every CUDA and cuDNN version is compatible with the TensorFlow version used, the system-wide NVIDIA drivers, CUDA and cuDNN should still be the most recent ones for the GPU to work properly.

1.1 Install NVIDIA drivers:

Windows:
  • Download the latest NVIDIA drivers for your GPU from the NVIDIA website
  • Install the .exe file and follow the instructions
  • Check the driver installation:
    nvidia-smi

Linux (Ubuntu):
  • Update and upgrade the system:
    sudo apt update && sudo apt upgrade
  • Remove previous NVIDIA installations:
    sudo apt autoremove nvidia* --purge
  • Check Ubuntu drivers devices:
    ubuntu-drivers devices
  • Install the recommended NVIDIA driver (its version is tagged with recommended):
    sudo apt-get install nvidia-driver-<driver_number>
  • Reboot the system:
    reboot
  • Check the driver installation:
    nvidia-smi

After these steps, when executing the nvidia-smi command, you should see output similar to the following:

user@user:~$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060 ...    Off |   00000000:01:00.0  On |                  N/A |
| N/A   41C    P8             15W /   70W |      73MiB /   6144MiB |     18%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

1.2 Install CUDA toolkit:

Download and install the CUDA toolkit following the instructions for your OS. If you have any issues, visit the CUDA installation guide:

- Windows: Install CUDA toolkit on Windows
- WSL2: Install CUDA toolkit on WSL2
- Ubuntu: Install CUDA toolkit on Ubuntu

After that, open a terminal and run the following command to check the CUDA installation:

  • For WSL2 and Ubuntu:

    sudo apt install nvidia-cuda-toolkit # to avoid any issues with the CUDA installation
    nvcc --version # to check the CUDA version
  • For Windows:

    nvcc --version # to check the CUDA version

1.3 Install cuDNN:

Install cuDNN following the instructions for your OS. If you have any issues, visit the cuDNN installation guide:

- Windows (WSL2): Install cuDNN on Windows
- Ubuntu: Install cuDNN on Ubuntu

2. Windows Subsystem for Linux (WSL2) Configuration


After installing the NVIDIA drivers, CUDA and cuDNN, if you are going to develop the project on Windows, you need to set up WSL2 to use the GPU for training the model. To do this, follow the steps below:

2.1 Conda Environment

We will use conda to manage the Python environment. You can install it following the Miniconda installation guide. After installing Miniconda, create a new environment with the following commands:

    # Create the environment
    conda create -n diffusion_env python=3.12 -y
    # Activate the environment
    conda activate diffusion_env

2.2 CUDA and cuDNN compatible versions

Since the model is implemented in TensorFlow, you need to install the versions of CUDA and cuDNN that are compatible with the TensorFlow version you are using. For more information, visit the TensorFlow versions compatibility. For this project, since we are using TensorFlow 2.16.1, we need to install CUDA 12.3 and cuDNN 8.9. To do so, just execute the following commands:

    # Install CUDA 12.3
    conda install nvidia/label/cuda-12.3.2::cuda-toolkit
    # Install cuDNN 8.9
    conda install -c conda-forge cudnn=8.9

And finally, set the environment variables to use the CUDA and cuDNN libraries every time the environment is activated:

    mkdir -p $CONDA_PREFIX/etc/conda/activate.d
    echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
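
As an optional sanity check (a sketch, not part of the repository), you can open a Python shell in the reactivated environment and confirm that TensorFlow detects the GPU and reports the CUDA/cuDNN versions it was built against:

    import tensorflow as tf

    print(tf.__version__)                           # expected: 2.16.1
    print(tf.sysconfig.get_build_info())            # CUDA/cuDNN versions TF was built with
    print(tf.config.list_physical_devices("GPU"))   # should list at least one GPU device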

2.3 External Dependencies

Once the environment is activated, you can install the external dependencies by running the following command:

pip install -e .

And you are ready to go!

3. Linux (Ubuntu) Configuration


After installing the NVIDIA drivers, CUDA and cuDNN, if you are going to develop the project on Ubuntu, you can follow the same steps as in the Windows Subsystem for Linux (WSL2) Configuration section. However, keeping in mind that you are working on a Linux distribution, it is recommended to use Docker to create a container with all the dependencies installed and avoid any compatibility and version issues.

WARNING: The Docker setup approach is not recommended for WSL2 or Windows, since there are many issues regarding CPU usage that make it unworkable (more info).

3.1 Install the NVIDIA Container Toolkit

Follow the NVIDIA Container Toolkit Guide

After installing the NVIDIA Container Toolkit, you can check the installation by running the following command:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

If you get an error when checking the installation, just follow the next steps:

# Restart the Docker service
sudo systemctl restart docker

# Open the Docker configuration file of nvidia-container-runtime
sudo nano /etc/nvidia-container-runtime/config.toml

# Set no-cgroups = true
...
no-cgroups = true
...

# Save and close the file and check the installation again
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

3.2 Pull the tensorflow-gpu-jupyter image (Optional)

This image contains all the correct dependencies for TensorFlow with CUDA and cuDNN installed, plus a Jupyter notebook server to develop the project (if you skip this step, the image will be pulled automatically in the next one). You can pull the image with the following command:

docker pull tensorflow/tensorflow:latest-gpu-jupyter

3.3 Build the container

Since the project has a Dev Container configuration file in the .devcontainer folder, you just need to open the project folder in VSCode and click on the Reopen in Container button that appears in the bottom right corner of the window. You can also do it at any time by opening the command palette with Ctrl+Shift+P and typing Reopen in Container.


[Figure: Pop-up VSCode message]
[Figure: Command palette]


This will pull the tensorflow-gpu-jupyter image if not pulled before and build a container using the custom Dockerfile for the project with all the dependencies needed.

In order to avoid possible issues with the container not detecting some library versions, run the following command in the container terminal to install the external dependencies declared in the setup.py file:

pip install -e .

Finally, when running any Jupyter notebook, choose the Python version that matches the one the image was built with. To check the Python version, just run the following command in the container terminal:

python --version

As of this writing, the image is built with Python 3.11.0rc1, so you need to select the Python 3.11.0 kernel in the Jupyter notebook.

And voilà! You have a container with all the dependencies installed and ready to go!

After that, if any issue or problem arises, just rebuild the container using the command palette and selecting the Rebuild Container option.

4. MacOS Configuration


Finally, if you are going to develop the project on MacOS, you can follow the steps below, based on TensorFlow Metal but adapted to the project dependencies:

4.1 Conda Environment

We will follow the same first steps as in the Windows Subsystem for Linux (WSL2) Configuration section, since we are going to use a conda environment to manage the dependencies. Therefore, install Miniconda following the Miniconda installation guide. After installing Miniconda, create a new environment with the following commands:

    # Create the environment
    conda create -n diffusion_env python=3.12 -y
    
    # Activate the environment
    conda activate diffusion_env

    # Install external dependencies
    pip install -e .

4.2 TensorFlow for MacOS

TensorFlow does not support GPU acceleration on MacOS through CUDA and cuDNN, so you need to install the Metal plugin for MacOS instead. To do so, just run the following command:

    pip install tensorflow-metal

Now you are ready to go!

📊 Data

As mentioned before, the dataset used in this project is the Pokémon sprite images from Kaggle.

The dataset contains over 10,000 Pokémon sprites in PNG format (half of them shiny variants) at 96x96 resolution, covering 898 Pokémon across different games, together with a CSV file of labels that may relate to their design. These aspects are analysed in more depth in the 00-Intro-and-Analysis.ipynb notebook.
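
Once the processed dataset has been created by the notebooks, it lives in data/processed/pokemon_tf_dataset as a saved tf.data.Dataset. As a rough sketch (the element structure is an assumption; 01-Dataset-Creation.ipynb defines the real one), it can be reloaded like this:

    import tensorflow as tf

    # Sketch: reload the processed dataset saved by 01-Dataset-Creation.ipynb.
    # The element structure is an assumption; check the notebook for the actual spec.
    dataset = tf.data.Dataset.load("data/processed/pokemon_tf_dataset")
    for element in dataset.take(1):
        print(element)  # inspect shapes and dtypes of one element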

🛠️ Usage

After following the steps described in the Prerequisites section, you can start using the project by running the notebooks in the notebooks folder, which contain the whole process of the project, from dataset creation to model training.

Before diving into the notebooks, take a look at the config.ini file in the root of the project and adapt it to your needs. This file contains all the hyperparameters for model training (see the sketch after the list below for one way to read it from Python). Once that is done, you can run the notebooks in the pre-established order, where:

  • 00-Intro-and-Analysis.ipynb: Introduces the project and analyses the Pokémon sprites dataset and pokedex.csv file.

  • 01-Dataset-Creation.ipynb: Gives multiple choices to create the dataset for the model and offers a raw dataset to customize the dataset creation process. Finally, it saves the dataset in the data/processed/pokemon_tf_dataset folder as a TensorFlow Dataset.

  • 02-Diffusion-Model-Architecture.ipynb: Defines the U-Net model architecture and explains the theory behind it.

  • 03-Diffusion-Process.ipynb: Defines and explains the diffusion functionalities for the model architecture (forward, reverse and sampling) and leaves the training process for the next notebook.

  • 04-Training-Diffusion-Model.ipynb: Defines and explains the training diffusion process and trains the model with the dataset created in the 01-Dataset-Creation.ipynb notebook.
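
As mentioned above, the hyperparameters live in config.ini. A minimal sketch of reading it with Python's standard configparser module is shown below; the section and key names are illustrative assumptions, so use the ones that actually appear in the repository's config.ini:

    import configparser

    # Sketch: read training hyperparameters from config.ini.
    # Section/key names ("hyperparameters", "batch_size", ...) are illustrative assumptions.
    config = configparser.ConfigParser()
    config.read("config.ini")

    batch_size = config.getint("hyperparameters", "batch_size", fallback=32)
    epochs = config.getint("hyperparameters", "epochs", fallback=100)
    learning_rate = config.getfloat("hyperparameters", "learning_rate", fallback=1e-4)
    print(batch_size, epochs, learning_rate)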

📚 Resources

  • Resources and tutorials that have been found useful for this project are located in the /docs folder.

  • Conda environment installation and management: Conda documentation.

  • Docker installation and management: Docker documentation.

  • NVIDIA GPU configuration: NVIDIA documentation, CUDA installation guide, cuDNN installation guide.

  • TensorFlow installation: TensorFlow documentation.

  • Git LFS to upload large files into the repository:

    Git Large File Storage (LFS) replaces large files such as datasets, models or weights with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise. For more info, visit: Git LFS repository.

    WARNING: Every account using Git Large File Storage receives 1 GiB of free storage and 1 GiB a month of free bandwidth, so in order to avoid any issues when uploading heavy files, it is recommended to upload them one at a time and not to commit other changes alongside them.

🌱 Contributing

If you wish to make contributions to this project, please initiate the process by opening an issue or submitting a pull request that encapsulates your proposed modifications.

🗞️ License

This project is licensed under the MIT License - see the LICENSE file for details.

👥 Contact

Should you have any inquiries or require assistance, please do not hesitate to contact Alejandro Pequeño Lizcano.

Gotta create 'em all!