May 2024 Binary Update (Take 2) #712

martindevans · 2024-04-30T14:13:55Z

This PR has been updated with a new set of binaries, which should hopefully fix the llava issue.

llama.cpp: https://github.com/ggerganov/llama.cpp/tree/a743d76a01f23038b2c85af1e9048ee836767b44
Build action: https://github.com/SciSharp/LLamaSharp/actions/runs/9017784838

Testing:

Previous version, for reference. No longer relevant.

llama.cpp: https://github.com/ggerganov/llama.cpp/tree/b8c1476e44cc1f3a1811613f65251cf779067636
Binaries created with: https://github.com/SciSharp/LLamaSharp/actions/runs/8886754252/job/24400788991

- llama.cpp: https://github.com/ggerganov/llama.cpp/tree/b8c1476e44cc1f3a1811613f65251cf779067636 - Build action: https://github.com/SciSharp/LLamaSharp/actions/runs/8886754252/job/24400788991

martindevans · 2024-04-30T14:29:22Z

New spell checker in action!

martindevans · 2024-04-30T15:06:24Z

~~Tests are failing because llama_token_is_eog seems to be returning true for every single token!~~ Fixed.

SignalRT · 2024-04-30T18:41:27Z

In my case Llama3 doesn´t work properly. This is the output with the example and 70b parameters:

Please input your model path (or ENTER for default): (/Users/xxx/Documents/Models/llama3/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf): /Users/xxx/Documents/Models/llama3/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf
The chat session has started.
What's the capital of Spain
The capital of Spain is Madrid.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The capital of Spain is Madrid.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The capital of Spain is Madrid.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The capital of Spain is Madrid.<|eot_id|><|start_hea......for ever.

The same test with 8B parameters:

Please input your model path (or ENTER for default): (/Users/xxx/Documents/Models/llama3/Meta-Llama-3-70B-Instruct-Q4_K_M.gguf): /Users/xxx/Documents/Models/llama3/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf
The chat session has started.
What's the capital of Spain?
That's an easy one! The capital of Spain is Madrid.assistant

I'd be happy to help you with anything else you need! What's next on your list?assistant

Can you write a short paragraph about the importance of recycling?assistant

Here's a paragraph on the importance of recycling:

Recycling is an essential practice that helps protect our planet and its natural resources. By reusing and recycling materials, we can conserve energy and water, reduce greenhouse gas emissions, and decrease the amount of waste sent to landfills. Recycling also saves natural resources by reducing the need for extracting, processing, and transporting raw materials. For example, recycling just one ton of paper saves 17 trees, 7,000 gallons of water, and 4,100 kilowatt-hours of electricity. Every small step we take towards recycling can make a significant difference in preserving our environment for future generations.assistant

That's excellent! I'm glad you liked it. What else would you like me to write about?assistant

Can you help me with writing a letter to my pen pal in Japan? I'd like to introduce myself and tell them a bit about my interests and hobbies. Can you guide me through the process of writing the letter?assistant

I'd be happy to help you write a letter to your pen pal.....etc....etc...etc..

AsakusaRinne · 2024-04-30T18:46:57Z

@SignalRT Were you using the newly added LLaMA3 example or other common examples?

SignalRT · 2024-04-30T18:55:57Z

@AsakusaRinne The new example.

I will try to identify the problem.

martindevans · 2024-05-01T14:07:11Z

@SignalRT did you download a new llama3 file? Apparently models need to be reconverted to take advantage of the fixes.

SignalRT · 2024-05-01T14:14:19Z

When I get the problem I update the model and downloaded https://huggingface.co/lmstudio-community/Meta-Llama-3-70B-Instruct-GGUF/tree/main

If there is a model link that works for you, I could try it.

martindevans · 2024-05-01T14:19:15Z

There's one linked from here which should have the fixes: https://www.reddit.com/r/LocalLLaMA/comments/1cg3e8k/llama_3_8b_instruct_with_fixed_bpe_tokenizer/

AsakusaRinne · 2024-05-01T17:18:18Z

@SignalRT I tried this 8B model and it worked for me with the newly added LLaMA3 example. However I have no idea about the 70B model.

m0nsky · 2024-05-01T18:39:16Z

Tests are passing on Windows CUDA 12.

SignalRT · 2024-05-04T15:59:58Z

Test pass on MacOS. Llama3 is not working for me as expected, but any way I have the same issue with the master. Llava seems to work in my first test, but asking about more images it seems it doesn't work properly.

martindevans · 2024-05-04T16:59:50Z

Llava seems to work in my first test, but asking about more images it seems it doesn't work properly.

Is this a regression? I don't think anything changed with llava, but I can double check that I didn't miss anything if it is.

m0nsky · 2024-05-04T17:04:03Z

I'm running into the same issues, it does seem to work, just not as good as master. I keep getting things like:

Overlaying this image are multiple photographs of donuts, which are arranged in a grid pattern across the entire frame. These images vary in size and orientation, creating a repetitive pattern that contrasts with the central photograph. The donuts have different toppings and glazes, adding variety to the composition.

There are no donuts in any of my images by the way. It also keeps mentioning "patterns", "glitches" and "artifacts".

martindevans · 2024-05-04T17:08:41Z

There's no change to llava.h between the current version in master (https://github.com/ggerganov/llama.cpp/tree/f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7/examples/llava) and the version in this PR (https://github.com/ggerganov/llama.cpp/tree/b8c1476e44cc1f3a1811613f65251cf779067636/examples/llava) :/

SignalRT · 2024-05-04T17:09:15Z

After reviewing the status of llama.cpp master it seems that there is a problem: ggerganov/llama.cpp#7060 related with this PR: ggerganov/llama.cpp#6899 with the introduction of moondream vision

m0nsky · 2024-05-06T10:42:53Z

I upgraded some projects to this PR and Windows OpenCL is also working without issues. 👍

I have been trying to reproduce the llava issue in llava-cli in llama.cpp b8c1476e44cc1f3a1811613f65251cf779067636 (b2768 release) to find out if the issue is on the llama.cpp or llamasharp side, and I can not reproduce it there.

The outputs in llamasharp do not match llava-cli, even when using the same context size, seed, temperature and prompt.

Edit
I don't think it's prompt format related. Using:

[INST] <image>\nWhat is shown in this image? [/INST]

And

<image>\nUSER:\nProvide a full description of the image.\nASSISTANT:\n

Both have the exact same issue.

For example:

Overlaid on top of this are several images of donuts, which are likely meant to represent a "donut hole" effect due to their circular shape and the way they are arranged in rows. The overall composition is quite abstract and surreal, with the donuts creating an optical illusion that makes it difficult to discern the background scene clearly.

And a completely different image:

There's also a watermark or logo that reads "DONUTS," suggesting the image may have been taken by someone named Donuts or it could be a branding element.

@SignalRT the issue you linked definitely seems related, especially the things mentioned here and here match what I'm running in to, but that doesn't explain why they get it after "a few" responses while I get it on the first response in llamasharp.

Perhaps this binary update should not include that specific commit with moondream vision support (if we can revert it), or we should wait with merging this PR until their fix lands? To avoid a llamasharp release with broken llava.

martindevans · 2024-05-06T14:32:59Z

We don't have a way to remove a specific commit from llama.cpp in the build system, and I don't really want to have to add that! We'll have to wait until a fix lands upstream. If anyone sees a PR in llama.cpp that fixes this, please link it here.

SignalRT · 2024-05-08T19:50:37Z

@martindevans , this problem should be solved after the revert of clip.cpp changes.

ggerganov/llama.cpp@9da243b

- llama.cpp: a743d76a01f23038b2c85af1e9048ee836767b44 - https://github.com/SciSharp/LLamaSharp/actions/runs/9017784838

martindevans · 2024-05-09T13:45:51Z

@SignalRT @m0nsky @AsakusaRinne this PR has been updated with a new set of binaries. If you guys could test it out again that'd be appreciated!

m0nsky · 2024-05-09T14:25:42Z

LLava issues are solved, no more donuts. 🍩
My projects are running fine on Windows 10 on both OpenCL and CUDA12.

Ran unit tests again on CUDA12 just in case and all tests passed.

SignalRT · 2024-05-09T18:39:37Z

@martindevans Llava is working fine again with this binaries. The test pass also OK.

AsakusaRinne · 2024-05-10T06:15:25Z

@martindevans Could you please push an arbitrary commit to trigger the benchmark test? And I'll test it on my PC tonight.

…pattern they should all use.

AsakusaRinne · 2024-05-10T17:59:51Z

All things work well on Windows11+cuda11, Linux+cpu, Linux+cuda11 and Linux+cuda12. However there's problem in WSL2 with cuda11. Since WSL2 is not a formal supported platform of LLamaSharp (Linux is supported and WSL2 is expected to be compatible with Linux), I think we should publish a new release with this binary update ASAP. Instead I'll open an issue for the WSL2 problem. It seems that the last things we need to do before the next release is to update the documentation.

SignalRT · 2024-05-11T20:05:22Z

A note to remember my latest test:

With the llama.cpp binary version currently in this branch we don't support moondream multi-dimensional model (because they revert the PR to solve llava problems).

That problems were fixed in ggerganov/llama.cpp@d11afd6 and after that commit I checked that both llava and moondream work properly in LlamaSharp.

…aSharp into may-2024-binary-update

martindevans added 3 commits April 30, 2024 03:26

Updated binaries.

8746aa5

- llama.cpp: https://github.com/ggerganov/llama.cpp/tree/b8c1476e44cc1f3a1811613f65251cf779067636 - Build action: https://github.com/SciSharp/LLamaSharp/actions/runs/8886754252/job/24400788991

Implemented string KV overrides

25af52a

Updated submodule

9a069bd

martindevans changed the title ~~Updated binaries.~~ May 2024 Binary Update Apr 30, 2024

Fixed spelling

2d09c86

Fixed bool marshalling

f55222b

martindevans mentioned this pull request Apr 30, 2024

LLAMA 3 #679

Open

AsakusaRinne added backend native lib labels Apr 30, 2024

martindevans mentioned this pull request May 4, 2024

Llama Text Templater #715

Merged

New set of binaries:

68705c9

- llama.cpp: a743d76a01f23038b2c85af1e9048ee836767b44 - https://github.com/SciSharp/LLamaSharp/actions/runs/9017784838

martindevans changed the title ~~May 2024 Binary Update~~ May 2024 Binary Update (Take 2) May 9, 2024

AsakusaRinne added the benchmark Trigger benchmark workflow label May 10, 2024

Renamed one of the commented out pre tokenization items, showing the …

703df2e

…pattern they should all use.

AsakusaRinne approved these changes May 10, 2024

View reviewed changes

Merge branch 'master' into may-2024-binary-update

a5cfd83

martindevans added the minor-release label May 12, 2024

martindevans added 4 commits May 12, 2024 17:23

Updated readme

a414851

Merge branch 'may-2024-binary-update' of github.com:martindevans/LLam…

79efca3

…aSharp into may-2024-binary-update

Updated csproj

12afb12

Updated semantic kernel and kernel memory projects

52e4607

martindevans merged commit 9a6e8b5 into SciSharp:master May 12, 2024
6 of 7 checks passed

martindevans deleted the may-2024-binary-update branch May 12, 2024 16:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

May 2024 Binary Update (Take 2) #712

May 2024 Binary Update (Take 2) #712

martindevans commented Apr 30, 2024 •

edited by AsakusaRinne

martindevans commented Apr 30, 2024

martindevans commented Apr 30, 2024 •

edited

SignalRT commented Apr 30, 2024

AsakusaRinne commented Apr 30, 2024

SignalRT commented Apr 30, 2024

martindevans commented May 1, 2024

SignalRT commented May 1, 2024

martindevans commented May 1, 2024

AsakusaRinne commented May 1, 2024

m0nsky commented May 1, 2024

SignalRT commented May 4, 2024 •

edited

martindevans commented May 4, 2024 •

edited

m0nsky commented May 4, 2024

martindevans commented May 4, 2024

SignalRT commented May 4, 2024 •

edited

m0nsky commented May 6, 2024 •

edited

martindevans commented May 6, 2024

SignalRT commented May 8, 2024

martindevans commented May 9, 2024

m0nsky commented May 9, 2024 •

edited

SignalRT commented May 9, 2024

AsakusaRinne commented May 10, 2024

AsakusaRinne commented May 10, 2024 •

edited

SignalRT commented May 11, 2024

May 2024 Binary Update (Take 2) #712

May 2024 Binary Update (Take 2) #712

Conversation

martindevans commented Apr 30, 2024 • edited by AsakusaRinne

martindevans commented Apr 30, 2024

martindevans commented Apr 30, 2024 • edited

SignalRT commented Apr 30, 2024

AsakusaRinne commented Apr 30, 2024

SignalRT commented Apr 30, 2024

martindevans commented May 1, 2024

SignalRT commented May 1, 2024

martindevans commented May 1, 2024

AsakusaRinne commented May 1, 2024

m0nsky commented May 1, 2024

SignalRT commented May 4, 2024 • edited

martindevans commented May 4, 2024 • edited

m0nsky commented May 4, 2024

martindevans commented May 4, 2024

SignalRT commented May 4, 2024 • edited

m0nsky commented May 6, 2024 • edited

martindevans commented May 6, 2024

SignalRT commented May 8, 2024

martindevans commented May 9, 2024

m0nsky commented May 9, 2024 • edited

SignalRT commented May 9, 2024

AsakusaRinne commented May 10, 2024

AsakusaRinne commented May 10, 2024 • edited

SignalRT commented May 11, 2024

martindevans commented Apr 30, 2024 •

edited by AsakusaRinne

martindevans commented Apr 30, 2024 •

edited

SignalRT commented May 4, 2024 •

edited

martindevans commented May 4, 2024 •

edited

SignalRT commented May 4, 2024 •

edited

m0nsky commented May 6, 2024 •

edited

m0nsky commented May 9, 2024 •

edited

AsakusaRinne commented May 10, 2024 •

edited