noco-ai edited this page Mar 4, 2024 · 23 revisions

Overview

Spell Book is a project to create a UI for interacting with different types of AI models. It focuses on LLMs and on using them in conjunction with other AI models to build applications.

Installing

The easiest way to install Spell Book is to follow the directions in the Spell Book Docker README.md.

  • Default Username: admin
  • Default Password: admin

Features

AI Assistant

The AI assistant integrates large language models such as Llama 2 and Mixtral for engaging in dynamic conversations.

  • Realtime Code and Markdown Rendering: Renders code in highlighted blocks that can be easily copied out of the UI.
  • Function Calling and Model Routing: Questions asked in a chat session can invoke a function. These functions can call a third-party API or another AI model to accomplish tasks outside an LLM's abilities, such as generating music or artwork.
  • Conversation Customization: Users can modify conversation settings to adjust generation and routing preferences, enhancing personalization.
  • Conversation Management: Features include the ability to save, access, and delete conversations, with each digital ally maintaining its own list for better organization.
  • Speech Recognition and TTS: Supports voice interactions and can provide vocal responses through xTTS.
  • Content Regeneration and Editing: Allows for regenerating or editing conversation turns to correct the dialogue or adjust the flow as needed.
  • Shortcut and Pinning Functions: Enables quick access to chat abilities or language models via shortcuts and allows for function or skill pinning for efficient future routing.

Digital Allies

Digital Allies let you save different preset configurations for interacting with LLMs to customize the chat experience. Their primary features include:

  • Wake Phrase: Switch between allies by using their "wake phrase" at the beginning or end of a chat message. Combined with ASR, this allows for switching between allies and conversations quickly.
  • Ally Voice: Assign a voice to the ally that the UI will use in conjunction with xTTS to vocalize the LLM responses.
  • Generation Settings: Set default generation settings that match the task you are using the ally for.
  • Image Selection: Select avatar and background image for chat sessions with each ally. These images can be created with the Image Generation application.
  • Character Card: Define the personality of the ally with detailed system prompt instructions.
  • Conversation Tone: Give the model some input/output example pairs as in-context data to guide it toward the responses you are looking for.
  • Conversation Settings: Control the conversation settings used by default for new conversations with each ally.
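
The wake phrase behavior described above can be illustrated with a minimal sketch. The `Ally` shape, the function names, and the word-level matching strategy are all assumptions made for illustration; they are not taken from the Spell Book source, which may detect wake phrases differently.

```typescript
// Hypothetical ally shape; field names are assumptions, not Spell Book's schema.
interface Ally {
  name: string;
  wakePhrase: string;
}

interface WakeMatch {
  ally: Ally;
  message: string; // the chat message with the wake phrase removed
}

// Split text into whitespace-separated words.
function toWords(text: string): string[] {
  return text.split(/\s+/).filter((w) => w.length > 0);
}

// Lowercase a word and drop punctuation so "Merlin," matches "merlin".
// ASCII-only on purpose to keep the sketch short.
function normalizeWord(word: string): string {
  return word.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Return the first ally whose wake phrase starts or ends the message, or null.
function matchWakePhrase(message: string, allies: Ally[]): WakeMatch | null {
  const words = toWords(message);
  const normalized = words.map(normalizeWord);
  for (const ally of allies) {
    const phrase = toWords(ally.wakePhrase).map(normalizeWord);
    const n = phrase.length;
    if (n === 0 || n > words.length) {
      continue;
    }
    const atStart = phrase.every((w, i) => normalized[i] === w);
    const atEnd = phrase.every((w, i) => normalized[words.length - n + i] === w);
    if (atStart) {
      return { ally, message: words.slice(n).join(" ") };
    }
    if (atEnd) {
      return { ally, message: words.slice(0, words.length - n).join(" ") };
    }
  }
  return null;
}
```

With ASR in front of this function, a transcribed utterance such as "Hey Merlin, what's the weather?" would select the matching ally and pass the rest of the message on to the conversation.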

Book Library

  • Character Extract: Extracts information about characters detailed in the book
  • Location Extract: Extracts information about locations detailed in the book
  • Q/A & Quiz: Extracts Q/A pairs from the book and creates a quiz from them
  • Book Summary: Creates a summary of the book
  • Art Generation: Generates artwork for characters and locations in the book

Chat Abilities

Also known as function calling. When this is enabled, the UI attempts to decipher the user's intent and call a TypeScript class that can perform tasks such as generating an image or finding the real-time weather for a city. As of v0.3.0 the abilities include:

  • Bing News: Uses the Bing News API to search for current news stories, then attempts to read and summarize the news articles.
  • Current Weather: Uses the AccuWeather API to get the current conditions for a location by name.
  • Dynamic Functions: Lets you create simple TypeScript functions using GPT-4; the functions are stored and reused when you ask similar questions. The out-of-the-box database includes 35+ generated functions for common math problems, something small LLMs are terrible at. To use the skill, append 🧪 to the question and GPT-4 will try to generate a function to solve the problem instead of the language model handling it.
  • FTP Transfer: This chat ability can upload files from the workspace to a server using FTP.
  • Image Analyzer: The image analyzer ability lets you run classification and object detection on images in the chat session. If no file name is given the ability will use the last image in the chat session.
  • Image Generator: This ability allows users to generate images based on prompts. The quality and accuracy of the generated images are dependent on the provided negative prompt and settings such as steps, guidance scale, height, and width.
  • Language Translator: The translator ability uses the ALMA models to perform translation from one language to another. Specify the text you want to translate and the target language; the model reliably guesses the input language.
  • Music Generator: This ability uses the MusicGen models from Meta to create music clips up to 30 seconds based off a text prompt.
  • Telnyx SMS: The Telnyx SMS and MMS ability allows you to send outgoing SMS and MMS messages using the Telnyx API. This ability requires a Telnyx API key and a configured outgoing phone number.
  • Text to Speech: The text to speech ability can take a text string as input and output a wav file with a human voice of the input text.
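
As a rough illustration of how a message might be routed to an ability class, here is a minimal sketch. The `ChatAbility` interface, the two stub classes, and the keyword-based scoring are hypothetical stand-ins; the way Spell Book actually deciphers user intent is not described here and is likely more sophisticated.

```typescript
// Hypothetical interface; Spell Book's real ability classes may look quite different.
interface ChatAbility {
  name: string;
  // Keywords hinting that the user wants this ability (a stand-in for real intent detection).
  keywords: string[];
  run(input: string): string;
}

class WeatherAbility implements ChatAbility {
  name = "current_weather";
  keywords = ["weather", "forecast", "temperature"];
  run(input: string): string {
    // A real ability would call a weather API; this stub only echoes the request.
    return "[weather lookup requested for: " + input + "]";
  }
}

class ImageAbility implements ChatAbility {
  name = "image_generator";
  keywords = ["draw", "image", "picture"];
  run(input: string): string {
    // A real ability would call an image model; this stub only echoes the request.
    return "[image generation requested for: " + input + "]";
  }
}

// Pick the ability with the most keyword hits, or null if none match.
function routeMessage(message: string, abilities: ChatAbility[]): ChatAbility | null {
  const text = message.toLowerCase();
  let best: ChatAbility | null = null;
  let bestScore = 0;
  for (const ability of abilities) {
    const score = ability.keywords.filter((k) => text.indexOf(k) !== -1).length;
    if (score > bestScore) {
      best = ability;
      bestScore = score;
    }
  }
  return best;
}

// Run the matched ability, or hand the message to the language model when nothing matches.
function handleMessage(
  message: string,
  abilities: ChatAbility[],
  fallback: (message: string) => string
): string {
  const ability = routeMessage(message, abilities);
  return ability !== null ? ability.run(message) : fallback(message);
}
```

The fallback callback keeps the design honest about the common case: most chat messages match no ability and should simply go to the LLM.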

Image Generator

The image generator provides a simple UI for interacting directly with SD, SDXL and SDXL Turbo models. The UI is very simple at the moment, with small upgrades planned for each release until it is a more robust solution. Current features are:

  • Multi Model Generation: Send the same parameters and prompt to multiple running models so you can compare results side by side.
  • DALL-E Support: Use DALL-E 2 or DALL-E 3 with an OpenAI API key if your system can't run SD models locally.
  • Compel Support: Uses compel under the hood to allow for prompts longer than 77 tokens.

Sound Studio

Interact directly with ASR, TTS and music generation models with this app. Currently Whisper, T5, xTTS and Bark models are supported. This UI is useful for testing ASR voice samples for giving a voice to digital allies.

  • Waveform Rendering: UI renders the waveform for the files and allows for playback.
  • Download Files: Download generated music or TTS files for processing outside the UI.
  • Microphone or Upload: Use your microphone or upload sound files in WAV format.

LLM Explorer

LLM Explorer is a sandbox for testing LLMs for use in agent tasks and other tests where a rawer form of access to the models is desired. It has two tabs: Chat is useful for prototyping prompts, and Completion is useful for testing base models with no output alignment tuning.

  • Saved Sets: Save any number of sets in the chat sandbox for testing new models as they come out.
  • Multi Model Generation: Test a prompt against multiple LLMs to allow for easy comparison of outputs.
  • Completion Stats: Outputs stats such as tokens per second and total token counts to evaluate model speed and cost.
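
The completion stats mentioned above come down to simple arithmetic. The sketch below assumes the token counts and elapsed time are already known; the interface and field names are illustrative and not taken from Spell Book.

```typescript
// Hypothetical inputs; field names are assumptions for illustration.
interface CompletionTiming {
  promptTokens: number;
  completionTokens: number;
  elapsedMs: number; // wall-clock generation time in milliseconds
}

interface CompletionStats {
  totalTokens: number;
  tokensPerSecond: number;
}

// Throughput is measured on generated tokens only; the total includes the prompt.
function computeStats(timing: CompletionTiming): CompletionStats {
  const seconds = timing.elapsedMs / 1000;
  return {
    totalTokens: timing.promptTokens + timing.completionTokens,
    // Guard against division by zero for instantaneous or missing timings.
    tokensPerSecond: seconds > 0 ? timing.completionTokens / seconds : 0,
  };
}
```

For example, 200 generated tokens over 4 seconds with a 50-token prompt gives 50 tokens per second and 250 total tokens. Whether throughput should count prompt tokens is a design choice; this sketch counts only generated tokens.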

User Management

Create users and groups, with controls over which models, chat abilities and applications each group can access.
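
A group-based access check of this kind could be sketched as follows. The `Group` and `User` shapes and the `canAccess` function are hypothetical and only illustrate the idea; Spell Book's actual data model is not documented here.

```typescript
// Hypothetical data model; names are illustrative only.
interface Group {
  name: string;
  models: string[];
  abilities: string[];
  applications: string[];
}

interface User {
  username: string;
  groups: string[]; // names of the groups the user belongs to
}

type ResourceKind = "models" | "abilities" | "applications";

// A user may access a resource if any of their groups lists it.
function canAccess(
  user: User,
  groups: Group[],
  kind: ResourceKind,
  resource: string
): boolean {
  return groups.some(
    (g) => user.groups.indexOf(g.name) !== -1 && g[kind].indexOf(resource) !== -1
  );
}
```

Access here is additive: belonging to several groups grants the union of what each group allows.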

Application Manager

The application manager allows administrators to enable and disable applications and chat abilities for all users. The interface is simple: each application or chat ability has a description and an install/uninstall button. Some applications and chat abilities also have configuration values such as API keys or default models to route to.