Skip to content

Using the question-answer dataset on Hugging Face to fine-tune ChatGPT and compare the fine-tuned model with original ChatGPT.

Notifications You must be signed in to change notification settings

Lizhecheng02/Custom-ChatGPT

Repository files navigation

This Repo is for fine-tuning ChatGPT to build custom ChatGPT.

Python Environment

1. Install Packages

pip install -r requirements.txt

2. Set Api Key

create_api_key

  • Copy it into .env file

​ Set OPENAI_API_KEY="Your API KEY"

3. Create Data For Fine-tuning

Here we use the question-answer dataset related to Ubuntu from Hugging Face. We will select 100 data from the training set for fine-tuning testing, and you can also modify it in the code.

The dataset link is here.

python create_data.py

After running it, you will see a train_data.jsonl file appearing in this directory, which is the data format required for fine-tuning ChatGPT.

4. Fine-tune ChatGPT

python finetune.py

After running the above code, you will obtain a parameter called job id in the final output. You need to copy this parameter to your notepad because it will be used later on.

It looks like "ftjob-k66N8QIGu1ehgQmGRIbhXUuD".

Next, log in to your personal OpenAI interface, where you can monitor the progress of the fine-tuning process. The link is https://platform.openai.com/finetune. You need to wait for few minutes to see the beginning of fine-tuning, and then you can see the loss curve.

process

5. Test New Model

Once you see the fine-tuning process has completed, you can start using your own ChatGPT. You have two methods to use it: one is to directly use through the chat interface at https://platform.openai.com/playground; alternatively, you can also use the id of the new model for local testing.

  • Chat Interface

choose_model

  • Local Test
python test.py

During the running process, you will receive a new_model_id. You also need to save it because we will use this variable name in the next step.

6. Compare Two Models

Now we can test the differences between the original ChatGPT and the newly fine-tuned model. You need to pass the previously saved new_model_id into the compare.py file.

python compare.py

We select some data from the test set of the dataset we initially chose for training, and store the labeled answers, original ChatGPT responses, and responses from our fine-tuned model in a json file and a csv file together. After the run completes, we will obtain two files named results.json and results.csv.

Hope this Repo can help you.

About

Using the question-answer dataset on Hugging Face to fine-tune ChatGPT and compare the fine-tuned model with original ChatGPT.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages