Merge branch 'Mozilla-Ocho:main' into readme-instaling-a-llamafile
mofosyne committed May 13, 2024
2 parents 6918b30 + d4099fe commit 9503aea
Showing 176 changed files with 37,155 additions and 23,609 deletions.
45 changes: 0 additions & 45 deletions .github/workflows/ci.yml

This file was deleted.

2 changes: 2 additions & 0 deletions Makefile
@@ -28,13 +28,15 @@ install: llamafile/zipalign.1 \
llama.cpp/perplexity/perplexity.1 \
llama.cpp/llava/llava-quantize.1 \
o/$(MODE)/llamafile/zipalign \
o/$(MODE)/llamafile/tokenize \
o/$(MODE)/llama.cpp/main/main \
o/$(MODE)/llama.cpp/imatrix/imatrix \
o/$(MODE)/llama.cpp/quantize/quantize \
o/$(MODE)/llama.cpp/perplexity/perplexity \
o/$(MODE)/llama.cpp/llava/llava-quantize
mkdir -p $(PREFIX)/bin
$(INSTALL) o/$(MODE)/llamafile/zipalign $(PREFIX)/bin/zipalign
$(INSTALL) o/$(MODE)/llamafile/tokenize $(PREFIX)/bin/llamafile-tokenize
$(INSTALL) o/$(MODE)/llama.cpp/main/main $(PREFIX)/bin/llamafile
$(INSTALL) o/$(MODE)/llama.cpp/imatrix/imatrix $(PREFIX)/bin/llamafile-imatrix
$(INSTALL) o/$(MODE)/llama.cpp/quantize/quantize $(PREFIX)/bin/llamafile-quantize
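This hunk adds the new `tokenize` tool to the install set, where it is installed as `llamafile-tokenize` alongside the other renamed binaries. As a minimal sketch of exercising that install target from a source checkout (the usual documented build flow; `PREFIX` defaults to `/usr/local` per `build/config.mk` further down):

```sh
# Build the tree, then install the tools (now including llamafile-tokenize)
make -j8
sudo make install PREFIX=/usr/local
```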
38 changes: 20 additions & 18 deletions README.md
@@ -42,7 +42,7 @@ chmod +x llava-v1.5-7b-q4.llamafile
5. Run the llamafile. e.g.:

```sh
./llava-v1.5-7b-q4.llamafile -ngl 9999
./llava-v1.5-7b-q4.llamafile
```

6. Your browser should open automatically and display a chat interface.
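If the tab does not open on its own, the web UI is reachable at the server's default address (port 8080, the same one the curl example below targets); a small sketch:

```sh
# Open the chat UI manually if the browser didn't launch
open http://localhost:8080        # macOS
xdg-open http://localhost:8080    # most Linux desktops
```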
@@ -88,7 +88,7 @@ README](llama.cpp/server/README.md#api-endpoints).
<summary>Curl API Client Example</summary>

The simplest way to get started using the API is to copy and paste the
following curl command into your terminal.

```shell
curl http://localhost:8080/v1/chat/completions \
```

@@ -185,33 +185,35 @@ ChatCompletionMessage(content='There once was a programmer named Mike\nWho wrote
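The collapsed hunks above truncate both API client examples (the curl request and the Python client's ChatCompletionMessage output). Purely as a hedged sketch, not the README's verbatim payload, this is the shape of an OpenAI-compatible request the server accepts on that endpoint:

```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "LLaMA_CPP",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Write a limerick about llamas."}
        ]
      }'
```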
We also provide example llamafiles for other models, so you can easily
try out llamafile with different kinds of LLMs.

| Model | Size | License | llamafile |
| --- | --- | --- | --- |
| LLaVA 1.5 | 3.97 GB | [LLaMA 2](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) | [llava-v1.5-7b-q4.llamafile](https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4.llamafile?download=true) |
| Mistral-7B-Instruct | 5.15 GB | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) | [mistral-7b-instruct-v0.2.Q5\_K\_M.llamafile](https://huggingface.co/jartine/Mistral-7B-Instruct-v0.2-llamafile/resolve/main/mistral-7b-instruct-v0.2.Q5_K_M.llamafile?download=true) |
| Mixtral-8x7B-Instruct | 30.03 GB | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) | [mixtral-8x7b-instruct-v0.1.Q5\_K\_M.llamafile](https://huggingface.co/jartine/Mixtral-8x7B-Instruct-v0.1-llamafile/resolve/main/mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile?download=true) |
| WizardCoder-Python-34B | 22.23 GB | [LLaMA 2](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) | [wizardcoder-python-34b-v1.0.Q5\_K\_M.llamafile](https://huggingface.co/jartine/WizardCoder-Python-34B-V1.0-llamafile/resolve/main/wizardcoder-python-34b-v1.0.Q5_K_M.llamafile?download=true) |
| WizardCoder-Python-13B | 7.33 GB | [LLaMA 2](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) | [wizardcoder-python-13b.llamafile](https://huggingface.co/jartine/wizardcoder-13b-python/resolve/main/wizardcoder-python-13b.llamafile?download=true) |
| TinyLlama-1.1B | 0.76 GB | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) | [TinyLlama-1.1B-Chat-v1.0.Q5\_K\_M.llamafile](https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile?download=true) |
| Rocket-3B | 1.89 GB | [cc-by-sa-4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en) | [rocket-3b.Q5\_K\_M.llamafile](https://huggingface.co/jartine/rocket-3B-llamafile/resolve/main/rocket-3b.Q5_K_M.llamafile?download=true) |
| Phi-2 | 1.96 GB | [MIT](https://huggingface.co/microsoft/phi-2/resolve/main/LICENSE) | [phi-2.Q5\_K\_M.llamafile](https://huggingface.co/jartine/phi-2-llamafile/resolve/main/phi-2.Q5_K_M.llamafile?download=true) |

| Model | Size | License | llamafile | other quants |
| --- | --- | --- | --- | --- |
| LLaVA 1.5 | 3.97 GB | [LLaMA 2](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) | [llava-v1.5-7b-q4.llamafile](https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/llava-v1.5-7b-q4.llamafile?download=true) | [See HF repo](https://huggingface.co/jartine/llava-v1.5-7B-GGUF) |
| TinyLlama-1.1B | 2.05 GB | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) | [TinyLlama-1.1B-Chat-v1.0.F16.llamafile](https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/TinyLlama-1.1B-Chat-v1.0.F16.llamafile?download=true) | [See HF repo](https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF) |
| Mistral-7B-Instruct | 3.85 GB | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) | [mistral-7b-instruct-v0.2.Q4\_0.llamafile](https://huggingface.co/jartine/Mistral-7B-Instruct-v0.2-llamafile/resolve/main/mistral-7b-instruct-v0.2.Q4_0.llamafile?download=true) | [See HF repo](https://huggingface.co/jartine/Mistral-7B-Instruct-v0.2-llamafile) |
| Phi-3-mini-4k-instruct | 7.67 GB | [Apache 2.0](https://huggingface.co/jartine/Phi-3-mini-4k-instruct-llamafile/blob/main/LICENSE) | [Phi-3-mini-4k-instruct.F16.llamafile](https://huggingface.co/jartine/Phi-3-mini-4k-instruct-llamafile/resolve/main/Phi-3-mini-4k-instruct.F16.llamafile?download=true) | [See HF repo](https://huggingface.co/jartine/Phi-3-mini-4k-instruct-llamafile) |
| Mixtral-8x7B-Instruct | 30.03 GB | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) | [mixtral-8x7b-instruct-v0.1.Q5\_K\_M.llamafile](https://huggingface.co/jartine/Mixtral-8x7B-Instruct-v0.1-llamafile/resolve/main/mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile?download=true) | [See HF repo](https://huggingface.co/jartine/Mixtral-8x7B-Instruct-v0.1-llamafile) |
| WizardCoder-Python-34B | 22.23 GB | [LLaMA 2](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) | [wizardcoder-python-34b-v1.0.Q5\_K\_M.llamafile](https://huggingface.co/jartine/WizardCoder-Python-34B-V1.0-llamafile/resolve/main/wizardcoder-python-34b-v1.0.Q5_K_M.llamafile?download=true) | [See HF repo](https://huggingface.co/jartine/WizardCoder-Python-34B-V1.0-llamafile) |
| WizardCoder-Python-13B | 7.33 GB | [LLaMA 2](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) | [wizardcoder-python-13b.llamafile](https://huggingface.co/jartine/wizardcoder-13b-python/resolve/main/wizardcoder-python-13b.llamafile?download=true) | [See HF repo](https://huggingface.co/jartine/wizardcoder-13b-python) |
| LLaMA-3-Instruct-70B | 37.25 GB | [llama3](https://huggingface.co/jartine/Meta-Llama-3-8B-Instruct-llamafile/blob/main/Meta-Llama-3-Community-License-Agreement.txt) | [Meta-Llama-3-70B-Instruct.Q4\_0.llamafile](https://huggingface.co/jartine/Meta-Llama-3-70B-Instruct-llamafile/resolve/main/Meta-Llama-3-70B-Instruct.Q4_0.llamafile?download=true) | [See HF repo](https://huggingface.co/jartine/Meta-Llama-3-70B-Instruct-llamafile) |
| LLaMA-3-Instruct-8B | 5.37 GB | [llama3](https://huggingface.co/jartine/Meta-Llama-3-8B-Instruct-llamafile/blob/main/Meta-Llama-3-Community-License-Agreement.txt) | [Meta-Llama-3-8B-Instruct.Q5\_K\_M.llamafile](https://huggingface.co/jartine/Meta-Llama-3-8B-Instruct-llamafile/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_M.llamafile?download=true) | [See HF repo](https://huggingface.co/jartine/Meta-Llama-3-8B-Instruct-llamafile) |
| Rocket-3B | 1.89 GB | [cc-by-sa-4.0](https://creativecommons.org/licenses/by-sa/4.0/deed.en) | [rocket-3b.Q5\_K\_M.llamafile](https://huggingface.co/jartine/rocket-3B-llamafile/resolve/main/rocket-3b.Q5_K_M.llamafile?download=true) | [See HF repo](https://huggingface.co/jartine/rocket-3B-llamafile) |

Here is an example for the Mistral command-line llamafile:

```sh
./mistral-7b-instruct-v0.2.Q5_K_M.llamafile -ngl 9999 --temp 0.7 -p '[INST]Write a story about llamas[/INST]'
./mistral-7b-instruct-v0.2.Q5_K_M.llamafile --temp 0.7 -p '[INST]Write a story about llamas[/INST]'
```

And here is an example for WizardCoder-Python command-line llamafile:

```sh
./wizardcoder-python-13b.llamafile -ngl 9999 --temp 0 -e -r '```\n' -p '```c\nvoid *memcpy_sse2(char *dst, const char *src, size_t size) {\n'
./wizardcoder-python-13b.llamafile --temp 0 -e -r '```\n' -p '```c\nvoid *memcpy_sse2(char *dst, const char *src, size_t size) {\n'
```

And here's an example for the LLaVA command-line llamafile:

```sh
./llava-v1.5-7b-q4.llamafile -ngl 9999 --temp 0.2 --image lemurs.jpg -e -p '### User: What do you see?\n### Assistant:'
./llava-v1.5-7b-q4.llamafile --temp 0.2 --image lemurs.jpg -e -p '### User: What do you see?\n### Assistant:'
```
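In each paired command above, the before/after difference is just the removal of the explicit `-ngl 9999` GPU-offload flag. If you still want to request full GPU offload, a sketch reusing the same flag from the removed lines:

```sh
# -ngl (--n-gpu-layers) caps how many layers go to the GPU; 9999 means "as many as fit"
./mistral-7b-instruct-v0.2.Q5_K_M.llamafile -ngl 9999 --temp 0.7 \
  -p '[INST]Write a story about llamas[/INST]'
```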

As before, macOS, Linux, and BSD users will need to use the "chmod"
@@ -281,7 +283,7 @@ For Windows users, here's an example for the Mistral LLM:
```sh
curl -L -o llamafile.exe https://github.com/Mozilla-Ocho/llamafile/releases/download/0.6/llamafile-0.6
curl -L -o mistral.gguf https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf
./llamafile.exe -m mistral.gguf -ngl 9999
./llamafile.exe -m mistral.gguf
```

Windows users may need to change `./llamafile.exe` to `.\llamafile.exe`
@@ -438,7 +440,7 @@ llama.cpp command line interface, utilizing WizardCoder-Python-13B
weights:

```sh
llamafile -ngl 9999 \
llamafile \
-m wizardcoder-python-13b-v1.0.Q8_0.gguf \
--temp 0 -r '}\n' -r '```\n' \
-e -p '```c\nvoid *memcpy(void *dst, const void *src, size_t size) {\n'
```

@@ -589,7 +591,7 @@ that describes the changes, and mention it in your Hugging Face commit.

## Documentation

There's a man page for each of the llamafile programs installed when you
There's a manual page for each of the llamafile programs installed when you
run `sudo make install`. The command manuals are also typeset as PDF
files that you can download from our GitHub releases page. Lastly, most
commands will display that information when passing the `--help` flag.
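For example, a quick sketch of pulling those docs up after `sudo make install` (assuming the pages are installed under the same names as the binaries):

```sh
man llamafile        # manual for the main program
man zipalign         # or any of the other installed tools
llamafile --help     # the same information straight from the command
```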
14 changes: 8 additions & 6 deletions build/config.mk
@@ -2,7 +2,7 @@
#── vi: set noet ft=make ts=8 sw=8 fenc=utf-8 :vi ────────────────────┘

PREFIX = /usr/local
COSMOCC = .cosmocc/3.3.3
COSMOCC = .cosmocc/3.3.6
TOOLCHAIN = $(COSMOCC)/bin/cosmo

AR = $(TOOLCHAIN)ar
@@ -13,9 +13,9 @@ MKDEPS = $(COSMOCC)/bin/mkdeps
INSTALL = install

ARFLAGS = rcsD
CCFLAGS = -g -O3 -fexceptions
CPPFLAGS_ = -iquote. -mcosmo -DGGML_MULTIPLATFORM -Wno-attributes
TARGET_ARCH = -Xx86_64-mavx -Xx86_64-mtune=alderlake
CCFLAGS = -g -O3 -fexceptions -fsignaling-nans
CPPFLAGS_ = -iquote. -mcosmo -DGGML_MULTIPLATFORM -Wno-attributes -DLLAMAFILE_DEBUG
TARGET_ARCH = -Xx86_64-mavx -Xx86_64-mtune=znver4

TMPDIR = o//tmp
IGNORE := $(shell mkdir -p $(TMPDIR))
@@ -50,5 +50,7 @@ clean:; rm -rf o
.PHONY: distclean
distclean:; rm -rf o .cosmocc

.cosmocc/3.3.3:
build/download-cosmocc.sh $@ 3.3.3 e4d0fa63cd79cc3bfff6c2d015f1776db081409907625aea8ad40cefc1996d08
.cosmocc/3.3.6:
build/download-cosmocc.sh $@ 3.3.6 26e3449357f31b82489774ef5c2d502a711bb711d4faf99a5fd6c96328a1c205
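The pinned toolchain moves from cosmocc 3.3.3 to 3.3.6, and `build/download-cosmocc.sh` checks the new SHA-256 before unpacking it. A rough manual equivalent, assuming the script's usual cosmo.zip mirror and zip layout (the script itself stays the authoritative path):

```sh
# Hand-rolled sketch of what download-cosmocc.sh automates (mirror URL is an assumption)
curl -fLo cosmocc-3.3.6.zip https://cosmo.zip/pub/cosmocc/cosmocc-3.3.6.zip
echo "26e3449357f31b82489774ef5c2d502a711bb711d4faf99a5fd6c96328a1c205  cosmocc-3.3.6.zip" | sha256sum -c
mkdir -p .cosmocc/3.3.6 && unzip -q cosmocc-3.3.6.zip -d .cosmocc/3.3.6
```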


20 changes: 19 additions & 1 deletion llama.cpp/BUILD.mk
@@ -25,7 +25,11 @@ include llama.cpp/imatrix/BUILD.mk
include llama.cpp/quantize/BUILD.mk
include llama.cpp/perplexity/BUILD.mk

$(LLAMA_CPP_OBJS): private CCFLAGS += -DGGML_MULTIPLATFORM
$(LLAMA_CPP_OBJS): private \
CCFLAGS += \
-DNDEBUG \
-DGGML_MULTIPLATFORM \
-DGGML_USE_LLAMAFILE

o/$(MODE)/llama.cpp/ggml-alloc.o \
o/$(MODE)/llama.cpp/ggml-backend.o \
@@ -39,6 +43,20 @@ o/$(MODE)/llama.cpp/ggml-alloc.o \
o/$(MODE)/llama.cpp/common.o: private \
CCFLAGS += -Os

o/$(MODE)/llama.cpp/ggml-quants.o: private CXXFLAGS += -Os
o/$(MODE)/llama.cpp/ggml-quants-amd-avx.o: private TARGET_ARCH += -Xx86_64-mtune=sandybridge
o/$(MODE)/llama.cpp/ggml-quants-amd-avx2.o: private TARGET_ARCH += -Xx86_64-mtune=skylake -Xx86_64-mf16c -Xx86_64-mfma -Xx86_64-mavx2
o/$(MODE)/llama.cpp/ggml-quants-amd-avx512.o: private TARGET_ARCH += -Xx86_64-mtune=cannonlake -Xx86_64-mf16c -Xx86_64-mfma -Xx86_64-mavx2 -Xx86_64-mavx512f

o/$(MODE)/llama.cpp/ggml-vector.o: private CXXFLAGS += -Os
o/$(MODE)/llama.cpp/ggml-vector-amd-avx.o: private TARGET_ARCH += -Xx86_64-mtune=sandybridge
o/$(MODE)/llama.cpp/ggml-vector-amd-fma.o: private TARGET_ARCH += -Xx86_64-mtune=bdver2 -Xx86_64-mfma
o/$(MODE)/llama.cpp/ggml-vector-amd-f16c.o: private TARGET_ARCH += -Xx86_64-mtune=ivybridge -Xx86_64-mf16c
o/$(MODE)/llama.cpp/ggml-vector-amd-avx2.o: private TARGET_ARCH += -Xx86_64-mtune=skylake -Xx86_64-mf16c -Xx86_64-mfma -Xx86_64-mavx2
o/$(MODE)/llama.cpp/ggml-vector-amd-avx512.o: private TARGET_ARCH += -Xx86_64-mtune=cannonlake -Xx86_64-mf16c -Xx86_64-mfma -Xx86_64-mavx2 -Xx86_64-mavx512f
o/$(MODE)/llama.cpp/ggml-vector-amd-avx512bf16.o: private TARGET_ARCH += -Xx86_64-mtune=znver4 -Xx86_64-mf16c -Xx86_64-mfma -Xx86_64-mavx2 -Xx86_64-mavx512f -Xx86_64-mavx512vl -Xx86_64-mavx512bf16
o/$(MODE)/llama.cpp/ggml-vector-arm82.o: private TARGET_ARCH += -Xaarch64-march=armv8.2-a+fp16

$(LLAMA_CPP_OBJS): llama.cpp/BUILD.mk

.PHONY: o/$(MODE)/llama.cpp
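These per-object `TARGET_ARCH` rules compile the same quant and vector kernels several times with different x86-64 (and ARMv8.2) feature flags so the matching variant can be dispatched at run time. A sketch of building a couple of those objects directly, with paths taken from the rules above and `MODE` left empty as in a default build (hence the `o//` prefix):

```sh
make -j8 \
  o//llama.cpp/ggml-quants-amd-avx2.o \
  o//llama.cpp/ggml-vector-amd-avx512.o
```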
8 changes: 4 additions & 4 deletions llama.cpp/README.llamafile
@@ -9,23 +9,23 @@ LICENSE
ORIGIN

https://github.com/ggerganov/llama.cpp/pull/4406/
fa046eafbc70bf97dcf39843af0323f19a8c9ac3
2024-03-22
c780e75305dba1f67691a8dc0e8bc8425838a452
2024-05-07

LOCAL MODIFICATIONS

- Count the number of cores correctly on Intel's Alderlake architecture
- Remove MAP_POPULATE because it makes mmap(tinyllama) block for 100ms
- Refactor ggml.c, llama.cpp, and llava to use llamafile_open() APIs
- Unify main, server, and llava-cli into single llamafile program
- Make cuBLAS / hipBLAS optional by introducing tinyBLAS library
- Add support to main() programs for Cosmo /zip/.args files
- Introduce pledge() SECCOMP sandboxing to improve security
- Call exit() rather than abort() when GGML_ASSERT() fails
- Clamp bf16/f32 values before passing to K quantizers
- Make GPU logger callback API safer and less generic
- Write log to /dev/null when main.log fails to open
- Use _rand64() rather than time() as default seed
- Make main and llava-cli print timings on ctrl-c
- Make embeddings CLI program shell scriptable
- Avoid bind() conflicts on port 8080 w/ server
- Use runtime dispatching for matmul quants
- Remove operating system #ifdef statements
