Releases: nomic-ai/llama.cpp
b2245
b2023
Early return for zero size calls to get_tensor. (#5482)

* Early return for zero size calls to get_tensor.
  Signed-off-by: Adam Treat <treat.adam@gmail.com>
* Update ggml-kompute.cpp
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update ggml-kompute.cpp
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Add an early return to the get/set tensor when the size is null.
  Signed-off-by: Adam Treat <treat.adam@gmail.com>
* Early return after the assertions.
  Signed-off-by: Adam Treat <treat.adam@gmail.com>
* Since we do the early return in the generic backend now, there is no reason to do so here as well.
  Signed-off-by: Adam Treat <treat.adam@gmail.com>

---------

Signed-off-by: Adam Treat <treat.adam@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
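The change above orders the guard as: run the size/bounds assertions first, then return early when the requested size is zero, so the backend-specific copy path is never entered for an empty request. A minimal sketch of that pattern, using a hypothetical simplified tensor type (the real ggml tensor layout and get_tensor signature differ):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical minimal tensor type for illustration only.
struct Tensor {
    std::vector<uint8_t> data;
};

// Sketch of the guard: assertions run first, then a zero-size request
// returns early before any backend copy work is done.
void get_tensor(const Tensor &t, void *dst, size_t offset, size_t size) {
    assert(offset + size <= t.data.size());
    if (size == 0) {
        return; // early return after the assertions
    }
    std::memcpy(dst, t.data.data() + offset, size);
}
```

With this ordering, a zero-size call is a safe no-op even when the destination pointer is null, which matches the motivation for doing the check once in the generic backend rather than in each device backend.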
b2022
kompute : make partial tensor copies faster by syncing less data (#15) Signed-off-by: Jared Van Bortel <jared@nomic.ai>
b2021
kompute : do not list Intel GPUs as they are unsupported (#14) Signed-off-by: Jared Van Bortel <jared@nomic.ai>
b2020
kompute : disable GPU offload for Mixtral

We haven't implemented the necessary GPU kernels yet. Fixes this crash:

    ggml_vk_graph_compute: error: unsupported op 'ARGSORT'
    GGML_ASSERT: /home/jared/src/forks/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml-kompute.cpp:1508: !"unsupported op"

Signed-off-by: Jared Van Bortel <jared@nomic.ai>
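The fix above avoids the assertion by refusing to offload a graph that contains an op the backend cannot run (Mixtral's expert routing uses ARGSORT). A hedged sketch of that kind of capability check, with illustrative op names rather than the real ggml op enums or ggml-kompute API:

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical op-support check: before offloading a compute graph,
// verify every op has a GPU kernel; otherwise fall back to the CPU
// path instead of hitting an "unsupported op" assert at compute time.
bool graph_is_supported(const std::vector<std::string> &ops) {
    // Illustrative subset of ops with GPU kernels; not the real list.
    static const std::set<std::string> supported = {
        "ADD", "MUL", "MUL_MAT", "SOFT_MAX", "RMS_NORM",
    };
    for (const auto &op : ops) {
        if (supported.count(op) == 0) {
            return false; // e.g. "ARGSORT" -> keep the model on CPU
        }
    }
    return true;
}
```

Checking support up front turns a hard crash mid-inference into a graceful fallback at load time.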
b2019
common : remove llama_token_to_piece for compatibility with hack Signed-off-by: Jared Van Bortel <jared@nomic.ai>
b1782
Merge branch 'ceb/nomic-vulkan' into nomic
b1780
Merge branch 'ceb/nomic-vulkan' into nomic
b1720
Merge branch 'ceb/nomic-vulkan' into nomic
b1641
kompute : fix -Wunused-private-field warnings from clang Fixes nomic-ai/gpt4all#1722