Update to readme and added application notes #168 #178

Open

mofosyne wants to merge 2 commits into main from readme-instaling-a-llamafile
Conversation

@mofosyne commented Jan 7, 2024

Issue Ticket: #168

Added recommended path convention for installation as well as application notes.

This commit is based on jart's recommendation regarding llamafile conventions.
Here is the quote it is based on:

> I want to enable people to integrate with llamafile any way they like.
> In terms of recommendations and guidance, I've been following
> TheBloke's naming convention when publishing llamafiles to Hugging
> Face https://huggingface.co/jartine I also always use the llamafile
> tag. So what I'd recommend applications do, is iterate all the files
> tagged llamafile on Hugging Face to present those as choices to the
> user for LLMs. Be sure to display which user is publishing them, and
> sort by heart count. Then, when you download them, feel free to put
> them in ~/.llamafile. Then, to show the users which models are
> installed, you just look for ~/.llamafile/*.llamafile.

@mofosyne commented Jan 8, 2024

Okay, revised the readmes based on your suggestion. I also spent some time studying how model naming conventions currently work in the field and how they're defined in llama.cpp. There are likely issues with the "Llamafile Naming Convention" section, but everything else should hopefully be addressed now.

@mofosyne commented Jan 8, 2024

If we settle on `<Model>-<Version>-<Parameters>-<Quantization>.llamafile`, we may want to adjust the file creation process to enforce this. Maybe have a quiz? Maybe extract as much as possible from the GGUF file format metadata?

At least according to https://github.com/ggerganov/ggml/blob/master/docs/gguf.md you can get the ggml_type (e.g. Q6_K or F32), but according to `gguf_tensor_info_t` a file can contain a mix of different tensor types, as per `gguf_tensor_info_t tensor_infos[header.tensor_count]`. In that case, we'd have to figure out how the naming scheme should handle multiple types in one model.
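A minimal sketch of how that extraction might look, assuming the relevant values have already been pulled out of the GGUF metadata into plain Python values (the function name and its inputs here are hypothetical, not actual GGUF keys):

```python
from collections import Counter

def suggest_llamafile_name(model: str, version: str, parameters: str,
                           tensor_types: list[str]) -> str:
    """Build a <Model>-<Version>-<Parameters>-<Quantization>.llamafile name.

    When a file mixes tensor types (e.g. mostly Q6_K with a few F32 tensors),
    fall back to labelling it with the most common type.
    """
    quantization = Counter(tensor_types).most_common(1)[0][0]
    return f"{model.replace(' ', '-')}-{version}-{parameters}-{quantization}.llamafile"

# Example: a mostly-Q6_K model with a couple of F32 tensors.
print(suggest_llamafile_name("TinyLlama", "v1.0", "1.1B",
                             ["Q6_K"] * 200 + ["F32"] * 2))
# -> TinyLlama-v1.0-1.1B-Q6_K.llamafile
```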

mofosyne pushed a commit to mofosyne/llamafile that referenced this pull request Jan 9, 2024
@mofosyne commented Jan 14, 2024

Balloob, founder of Home Assistant, on what he would require an LLM container to do:

> I would love to see a standardized API for local LLMs that is not just a 1:1 copy of the ChatGPT API. For example, as Home Assistant talks to a random model, we should be able to query that model to see what the model is capable of.

Is this achievable by adding key-value pairs to the GGUF? And maybe accessible via something like `llmbot.llamafile --get-metadata capabilities`?
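As a rough sketch of that idea (entirely hypothetical: a `capabilities` key and a `--get-metadata` flag do not exist in the GGUF spec or llamafile today), an application such as Home Assistant could branch on advertised capabilities like this:

```python
import json

# Hypothetical capability metadata that a llamafile could expose, e.g. via
# extra GGUF key-value pairs surfaced by a future "--get-metadata" style flag.
capabilities_json = """
{
  "chat": true,
  "function_calling": false,
  "constrained_grammar": true,
  "context_length": 4096
}
"""

capabilities = json.loads(capabilities_json)

# Only offer JSON-action automation when the model advertises some way of
# producing reliably structured output; otherwise fall back to plain chat.
if capabilities.get("function_calling") or capabilities.get("constrained_grammar"):
    print("Model can be asked for directly actionable JSON commands")
else:
    print("Plain chat only; skip automation features")
```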

> I want to see local LLMs with support for a feature similar or equivalent to OpenAI functions. We cannot include all possible information in the prompt, and we need to allow LLMs to take actions to be useful. Constrained grammars do look like a possible alternative. Creating a prompt to write JSON is possible, but it needs quite an elaborate prompt, and even then the LLM can make errors. We want to make sure that all JSON coming out of the model is directly actionable without having to ask the LLM what they might have meant for a specific value.

Having a recommended way to easily constrain output to JSON would help in the application notes.
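One way this could work today is the GBNF grammar support inherited from llama.cpp. The sketch below assumes a llamafile is already serving the llama.cpp-style HTTP API on localhost:8080 and that its /completion endpoint accepts a GBNF `grammar` field, as the upstream llama.cpp server does; the grammar and prompt themselves are illustrative only:

```python
import json
import urllib.request

# GBNF grammar that only admits a flat {"action": ..., "value": ...} object,
# so whatever the model emits is directly actionable without guesswork.
ACTION_GRAMMAR = r'''
root   ::= "{" ws "\"action\":" ws string "," ws "\"value\":" ws string ws "}"
string ::= "\"" [a-zA-Z0-9_. ]* "\""
ws     ::= [ \t\n]*
'''

payload = {
    "prompt": "Turn off the living room lights. Respond with a JSON action.",
    "n_predict": 64,
    "grammar": ACTION_GRAMMAR,
}

req = urllib.request.Request(
    "http://localhost:8080/completion",  # llamafile's built-in server
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

print(result["content"])  # e.g. {"action": "light.turn_off", "value": "living room"}
```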

> As a user of Home Assistant, I would want to easily be able to try out different AI models with a single click from the user interface.
>
> Home Assistant allows users to install add-ons, which are Docker containers + metadata. This is how users today install Whisper or Piper for STT and TTS. Both these engines have a wrapper that speaks Wyoming, our voice assistant standard for integrating such engines, among other things. (https://github.com/rhasspy/rhasspy3/blob/master/docs/wyoming.md)
>
> If we rely on just the ChatGPT API to allow interacting with a model, we wouldn't know what capabilities the model has, and so can't know what features to use to get valid JSON actions out. Can we pass our function definitions, or should we extend the prompt with instructions on how to generate JSON?

@mofosyne force-pushed the readme-instaling-a-llamafile branch from 8f310c4 to 131432e on April 5, 2024
@mofosyne commented Apr 5, 2024

Just did a rebase to keep this PR up to date with main

@mofosyne commented Apr 5, 2024

While rebasing ggerganov/llama.cpp#4858, I decided to review my naming convention proposal and noticed that Mixtral has a new naming approach for their models, like 8x7B to indicate 8 experts of 7B parameters each.

I've added this to both the llama.cpp default filename PR and to the readme notes in this repo's PR.

@jart force-pushed the main branch 2 times, most recently from 622924c to 9cf7363 on April 30, 2024
@mofosyne commented

ggerganov/llama.cpp#7165 is now merged in, so `<Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf` is now more canonical.
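For illustration, a sketch of how an application could parse that convention back into its parts (the regex and the assumption that the `<ExpertsCount>x` segment is optional for non-MoE models are mine, not something specified by the merged PR):

```python
import re

# <Model>-<Version>-<ExpertsCount>x<Parameters>-<Quantization>.gguf
# The "<ExpertsCount>x" segment only appears for mixture-of-experts models.
GGUF_NAME = re.compile(
    r"^(?P<model>.+)-(?P<version>v[\d.]+)-"
    r"(?:(?P<experts>\d+)x)?(?P<parameters>[\d.]+[KMBT])-"
    r"(?P<quantization>\w+)\.gguf$"
)

for name in ("Mixtral-v0.1-8x7B-Q4_K_M.gguf", "TinyLlama-v1.0-1.1B-Q6_K.gguf"):
    m = GGUF_NAME.match(name)
    print(name, "->", m.groupdict() if m else "no match")
```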

@mofosyne force-pushed the readme-instaling-a-llamafile branch from 9503aea to 3206f27 on May 13, 2024
@mofosyne commented

Rebased to be on top of the latest changes and squashed all the other fixup commits. Did another review to make sure the doc matches the now-merged change to llama.cpp's convert.py.

@mofosyne commented May 18, 2024

Updated to use https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#gguf-naming-convention as the canonical reference for the llamafile filename convention.

On a side note, what generates `<!-- README_llamafile.md-provided-files start -->`, which I see occasionally in Hugging Face model cards?
