
A serverless set of functions for evaluating whether incoming messages to an LLM system appear to contain prompt injection; uses cascading cosine similarity and ROUGE-L calculations against known good and bad prompts


♫ The Dream of the 90's ♫ is alive in Portland "a weird suite of Enterprise LLM tools" named after Nicktoons

Utility 3) Denzel Crocker: Serverless & Event-Driven inspection of messages for Prompt Injection; for use with Language Models

Denzel Crocker

Description:

A set of serverless functions designed to assist in monitoring inputs to language models: specifically, inspecting messages for prompt injection and routing them to the appropriate SQS bus

Rationale:

  1. Large Language Models are subject to various forms of prompt injection (indirect or otherwise); lightweight, step-wise alerting on prompts similar to a known baseline helps your application stay secure
  2. User experience, instrumentation, and metadata capture are crucial to the adoption of LLMs for orchestrating multi-modal agentic systems; a high cosine similarity with known bad prompts paired with a low ROUGE-L score against known good prompts allows for appropriate routing of messages
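The high-cosine / low-ROUGE-L decision rule above can be sketched as a small routing classifier. The threshold values and label names here are illustrative assumptions, not values from the repository (which reads its thresholds from environment variables):

```python
# Sketch of the cascading decision rule: a message is suspicious when it is
# close to known bad prompts (high cosine similarity) yet dissimilar from
# known good prompts (low ROUGE-L). Thresholds below are assumed defaults.
def classify_message(cos_sim_bad: float, rouge_l_good: float,
                     cos_threshold: float = 0.8,
                     rouge_threshold: float = 0.3) -> str:
    """Return a routing label based on the two cascading scores."""
    if cos_sim_bad >= cos_threshold and rouge_l_good <= rouge_threshold:
        return "suspected_injection"   # similar to bad, unlike good prompts
    if cos_sim_bad >= cos_threshold:
        return "needs_inspection"      # similar to both bad and good prompts
    return "clean"                     # route onward, e.g. to master prompt
```

Keeping the two scores as separate gates (rather than a single blended score) is what makes the check "cascading": the cheaper cosine pass filters first, and ROUGE-L only adjudicates the remainder.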

Intent:

The intent of FAIRIES.py is to spin up efficiently, calculate the values needed for evaluation, and inspect each message for prompt-injection attacks; thereafter routing messages to the appropriate SQS bus (e.g. for building a master prompt, alerting, further inspection, etc.)

The goal is to detect whether a message has high similarity with known bad prompts and low similarity with known good prompts, via cascading cosine similarity and ROUGE-L calculations.

The ROUGE-L value is calculated initially from the baseline and stored in memory. ROUGE-L is calculated for incoming messages only after comparing the cosine similarity of new messages in the dataframe to known bad prompts; when complete, the function spins down appropriately.
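For reference, ROUGE-L is the F-measure over the longest common subsequence (LCS) of two token sequences. A minimal self-contained sketch (the repository may instead use a library implementation such as `rouge-score`):

```python
def lcs_length(a: list, b: list) -> int:
    """Dynamic-programming longest common subsequence over token lists."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]

def rouge_l(candidate: str, reference: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

Computing this once against the baseline of known good prompts and caching the result in memory is what lets the function avoid rescoring the baseline on every invocation.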

Cosine similarity is used as a heuristic to detect similarity between inputs in incoming dataframes and known bad prompts (ostensibly to identify prompt injection); the ROUGE-L score is then used to compare the inputs more precisely with a baseline dataset of known good prompts, as a means of validating the assumption of the first function.
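The first-pass heuristic reduces to taking the maximum cosine similarity between a message's embedding vector and the embeddings of the known bad prompts. A minimal sketch, assuming embeddings are already available as numeric vectors (how the repository produces them is not specified here):

```python
import math

def cosine_similarity(u: list, v: list) -> float:
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def max_similarity_to_bad(message_vec: list, bad_prompt_vecs: list) -> float:
    """Heuristic first pass: closest match among known bad prompt vectors."""
    return max(cosine_similarity(message_vec, bad) for bad in bad_prompt_vecs)
```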

Based on the resulting calculations, messages are routed to the appropriate SQS bus. The functions make use of an open-source good/bad prompt dataset available on Hugging Face.
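The routing step itself is a single `send_message` call against the queue matching the classification label. A hedged sketch using boto3's SQS client; the environment-variable names and label set are assumptions, not the repository's actual configuration:

```python
import json
import os

# Hypothetical env-var names mapping classification labels to SQS queue URLs;
# the repository's actual variable names may differ.
QUEUE_URLS = {
    "clean": os.environ.get("GOOD_PROMPT_QUEUE_URL", ""),
    "suspected_injection": os.environ.get("ALERT_QUEUE_URL", ""),
    "needs_inspection": os.environ.get("INSPECTION_QUEUE_URL", ""),
}

def route_message(sqs_client, message: str, label: str) -> dict:
    """Send `message` to the SQS queue matching its classification label.

    `sqs_client` is expected to be a boto3 SQS client, e.g.
    `boto3.client("sqs")`; it is injected here so the function can be
    exercised without AWS credentials.
    """
    return sqs_client.send_message(
        QueueUrl=QUEUE_URLS[label],
        MessageBody=json.dumps({"message": message, "label": label}),
    )
```

Injecting the client also makes the eventual Lambda handler easy to test locally with a stub object exposing the same `send_message` signature.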

Note: this is mostly conceptual and still needs logging and additional error handling; it assumes the use of environment variables rather than hard-coded values for the cosine similarity & ROUGE-L thresholds
