Local Llama 3 Chat 🦙

A simple chat interface for running the Llama 3 model locally, using OpenVINO Runtime for inference, the transformers library for tokenization, and Flask for the web interface.

Quickstart with Docker
Requirements
Model Export
Getting Started
Export from HuggingFace

Quickstart with Docker

Install Docker.
Build the Docker image with the following command. The source files and model weights are pulled using git, so an active internet connection is required.
```
docker build -t chat-llama .
```
You can optionally pass the --no-cache flag to build with the latest upstream changes.
Start the container using:
```
docker run -p 5000:5000 chat-llama
```
This should start the Flask development server, available at http://localhost:5000.
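
To confirm that the container is serving requests, a small standard-library check along these lines may help. It is only a sketch: the helper name `server_is_up` is hypothetical and not part of this repository, and the default URL is the one given above.

```python
import urllib.error
import urllib.request


def server_is_up(url: str = "http://localhost:5000", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return False


if __name__ == "__main__":
    print("chat server reachable:", server_is_up())
```

If this reports `False` right after `docker run`, the server may still be loading the model; retrying after a short wait is usually enough to tell the two cases apart.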

Requirements

Python 3.11

Model Export

To download the original model weights from HuggingFace, visit the HuggingFace model page and accept the license. Once your request has been accepted, use huggingface-cli to log in to your HuggingFace account in your current runtime with the following command:

```
huggingface-cli login
```

To download the INT-4 quantized Meta-Llama-3-8B-Instruct model, already converted to the OpenVINO IR format, from HuggingFace, use the following command:
```
huggingface-cli download rajatkrishna/Meta-Llama-3-8B-Instruct-OpenVINO-INT4 --local-dir models/llama-3-instruct-8b
```
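
Before starting the server, it can be useful to check that the model directory contains the OpenVINO IR files. The sketch below assumes the export uses optimum-intel's usual file names (`openvino_model.xml` and `openvino_model.bin`); that file list and the helper name `missing_model_files` are assumptions, not something this repository specifies.

```python
from pathlib import Path

# Assumption: default file names written by optimum-intel for an exported causal LM.
EXPECTED_FILES = ("openvino_model.xml", "openvino_model.bin")


def missing_model_files(model_dir: str, expected=EXPECTED_FILES) -> list[str]:
    """Return the names from `expected` that are not regular files in `model_dir`."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("models/llama-3-instruct-8b")
    if missing:
        print("missing files:", missing)
    else:
        print("model directory looks complete")
```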

Getting Started

Clone the repository:
```
git clone https://github.com/rajatkrishna/llama3-openvino
```

Create a new virtual environment to avoid dependency conflicts:
```
python3 -m venv .env
source .env/bin/activate
```
Install the dependencies in requirements.txt:
```
pip install -r requirements.txt
```
Start the Flask server from the project root using:
```
python3 -m flask run
```

Export from HuggingFace

To export the meta-llama/Meta-Llama-3-8B-Instruct model quantized to INT-8 format yourself using optimum-intel CLI, install the requirements in requirements_export.txt:
```
pip install -r requirements_export.txt
```
Then run the following from the project root:
```
optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B-Instruct --weight-format int8 models/llama-3-instruct-8b
```

Alternatively, use the following steps to export the INT-4 quantized model using the Python API:

Import the dependencies:

>>> from optimum.intel.openvino import OVWeightQuantizationConfig, OVModelForCausalLM
>>> from transformers import AutoTokenizer

Load the model using the OVModelForCausalLM class. Set export=True to export the model on the fly.

>>> model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
>>> export_path = "models/llama-3-instruct-8b"
>>> q_config = OVWeightQuantizationConfig(bits=4, sym=True, group_size=128)
>>> model = OVModelForCausalLM.from_pretrained(model_name, export=True, quantization_config=q_config)
>>> model.save_pretrained(export_path)

Now use AutoTokenizer to save the tokenizer.

>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
>>> tokenizer.save_pretrained(export_path)
