
pathwaycom/llm-app: Build your LLM App in 30 lines of code

Pathway's LLM (Large Language Model) Apps allow you to quickly put into production AI applications that use the most up-to-date knowledge available in your data sources. You can directly run a 24/7 service that answers natural language queries about an ever-changing private document knowledge base, or run an LLM-powered data transformation pipeline on a data stream.

The Python application examples provided in this repo are ready to use. They can be run as Docker containers and expose an HTTP API to the frontend. To allow quick testing and demos, most app examples also include an optional Streamlit UI which connects to this API. The apps rely on the Pathway framework for data source synchronization, for serving API requests, and for all low-latency data processing. The apps connect to document data sources on S3, Google Drive, SharePoint, etc. with no infrastructure dependencies (such as a vector database) that would need a separate setup.
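
Once an example app is running, querying it is a single HTTP request. Here is a minimal sketch using Python's requests library (the port, path, and payload fields are assumptions for illustration; check the chosen example's README.md for the exact interface):

import requests

# Hypothetical endpoint and payload shape; the actual host, port, and
# fields depend on the app example you run (see its README.md).
response = requests.post(
    "http://localhost:8080/",
    json={"user": "demo-user", "query": "How do I connect a Google Drive folder?"},
    timeout=60,
)
print(response.text)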

Quick links - 👀 Why use Pathway LLM Apps? 🚀 Watch it in action 📚 How it works 🌟 Application examples 🏁 Get Started 💼 Showcases 🛠️ Troubleshooting 👥 Contributing ⚙️ Hosted Version 💡 Need help?

Why use Pathway LLM Apps?

  1. Simplicity - Simplify your AI pipeline by consolidating capabilities into one platform. No need to integrate and maintain separate modules for your Gen AI app: Vector Database (e.g. Pinecone/Weaviate/Qdrant) + Cache (e.g. Redis) + API Framework (e.g. Fast API).
  2. Real-time data syncing - Sync both structured and unstructured data from diverse sources, enabling real-time Retrieval Augmented Generation (RAG).
  3. Easy alert setup - Configure alerts for key business events with simple configurations. Ask a question, and get updated when new info is available.
  4. Scalability - Handle heavy data loads and usage without degradation in performance. Metrics help track usage and scalability. Learn more about the performance of the underlying Pathway data processing framework.
  5. Monitoring - Provide visibility into model behavior via monitoring, tracing errors, anomaly detection, and replay for debugging. Helps with response quality.
  6. Security - Designed for Enterprise, with capabilities like Personally Identifiable Information (PII) detection, content moderation, permissions, and version control. Pathway apps can run in your private cloud with local LLMs.
  7. Unification - Cover the back end, embedding, retrieval, and LLM tech stack with a single, unified application logic.

Watch it in action

Analysis of live document streams

Effortlessly extract and organize unstructured data from PDFs, docs, and more into SQL tables - in real-time.

(Check out gpt_4o_multimodal_rag to see the whole pipeline in action. You may also check out unstructured-to-sql for a minimal example that also works with non-multimodal models.)

Automated real-time knowledge mining and alerting

Monitor streams of changing documents, get real-time alerts when answers change.

Using incremental vector search, only the most relevant context is automatically passed into the LLM for analysis, minimizing token use - even when thousands of documents change every minute. This is real-time RAG taken to a new level 😊.

For the code, see the drive_alert app example. You can find more details in a blog post on alerting with LLM-App.

How it works

The default contextful app example launches an application that connects to a source folder with documents stored in AWS S3 or locally on your computer. The app stays in sync with updates to your documents, building a "vector index" in real time using the Pathway package. It waits for user queries arriving as HTTP REST requests, then uses the index to find relevant documents and responds in natural language using the OpenAI API or Hugging Face models. This way, it provides answers that are always based on the freshest and most accurate real-time data.
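
The shape of this pipeline can be sketched in a few lines of Pathway code. This is a simplified illustration rather than the exact contextful source: the embed and answer functions below are toy placeholders for the real OpenAI calls, and the folder path, port, and schemas are assumptions.

import pathway as pw
from pathway.stdlib.ml.index import KNNIndex

EMBED_DIM = 8  # toy dimensionality; a real app would use the embedding model's (e.g. 1536)

# Toy placeholders standing in for the OpenAI (or Hugging Face) calls the real app makes.
def embed(text: str) -> list[float]:
    return [float(ord(c)) for c in text[:EMBED_DIM].ljust(EMBED_DIM)]

def answer(query: str, docs: tuple) -> str:
    return f"LLM answer to {query!r} given {len(docs)} context document(s)"

class DocSchema(pw.Schema):
    doc: str

class QuerySchema(pw.Schema):
    query: str

# 1. Stay in sync with a folder of jsonlines documents (S3 works analogously).
docs = pw.io.jsonlines.read("./data/pathway-docs/", schema=DocSchema, mode="streaming")
docs += docs.select(vector=pw.apply(embed, pw.this.doc))

# 2. Maintain an incrementally updated vector index over the documents.
index = KNNIndex(docs.vector, docs, n_dimensions=EMBED_DIM)

# 3. Accept user queries as HTTP REST requests.
queries, respond = pw.io.http.rest_connector(host="0.0.0.0", port=8080, schema=QuerySchema)
queries += queries.select(vector=pw.apply(embed, pw.this.query))

# 4. For each query, retrieve the nearest documents and ask the LLM.
context = queries + index.get_nearest_items(queries.vector, k=3).select(docs=pw.this.doc)
respond(context.select(result=pw.apply(answer, pw.this.query, pw.this.docs)))

pw.run()

Because the index is maintained incrementally, edits to any source document automatically flow through to subsequent answers without rebuilding anything.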

This application template can also be combined with streams of fresh data, such as news feeds or status reports, either through REST or a technology like Kafka. It can also be combined with extra static data sources and user-specific contexts, to provide more relevant answers and reduce LLM hallucination.
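
For instance, a stream of fresh news items in Kafka could be ingested alongside the document folder using Pathway's Kafka connector. In this sketch the broker settings, topic name, and schema are hypothetical:

import pathway as pw

class NewsSchema(pw.Schema):
    headline: str
    body: str

# Hypothetical broker settings; adjust to your cluster.
rdkafka_settings = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "llm-app-news",
    "auto.offset.reset": "earliest",
}

news = pw.io.kafka.read(
    rdkafka_settings,
    topic="news-feed",
    format="json",
    schema=NewsSchema,
)

# Map the stream into the same shape as the document table, so it can be
# embedded and indexed together with the folder documents.
news_docs = news.select(
    doc=pw.apply(lambda h, b: f"{h}: {b}", pw.this.headline, pw.this.body)
)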

Read more about the implementation details and how to extend this application in our blog article.

Instructional videos

▶️ Building an LLM Application without a vector database - by Jan Chorowski

▶️ Let's build a real-world LLM app in 11 minutes - by Pau Labarta Bajo

Advanced Features

LLM Apps built with Pathway can also include the following capabilities:

  • Local Machine Learning models - Pathway LLM Apps can run with local LLMs and embedding models, without making API calls outside of the user's organization (see the sketch after this list).
  • Multiple live data sources - Pathway LLM Apps can connect to live data sources of diverse types (news feeds, APIs, data streams in Kafka, and others).
  • Extensible enterprise logic - user permissions, user session handling, and a data security layer can all be embedded in your application logic by integrating with your enterprise SSO, AD Domains, LDAP, etc.
  • Live knowledge graphs - the Pathway framework enables concept mining, organizing data and metadata as knowledge graphs, and knowledge-graph-based indexes, kept in sync with live data sources.
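
As an illustration of the first point, a local embedding model can replace API-based embeddings in the pipeline sketched above. Here is a minimal sketch using the sentence-transformers package (the model choice is an assumption; any local embedding model fits the same pattern):

import pathway as pw
from sentence_transformers import SentenceTransformer

# The model runs fully inside your infrastructure; no external API calls.
model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def embed(text: str) -> list[float]:
    # encode() returns a numpy array; convert to a plain list for Pathway.
    return model.encode(text).tolist()

class DocSchema(pw.Schema):
    doc: str

docs = pw.io.jsonlines.read("./data/pathway-docs/", schema=DocSchema, mode="streaming")
docs += docs.select(vector=pw.apply(embed, pw.this.doc))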

To learn more about advanced features see: Features for Organizations, or reach out to the Pathway team.

Application Examples

Pick one that is closest to your needs.

Example app (template) Description
demo-question-answering The question-answering pipeline that uses the GPT model of your choice to answer queries about a set of documents. You can also try it on the Pathway Hosted Pipelines website.
demo-document-indexing The real-time document indexing pipeline that provides monitoring of several kinds of data sources and health-check endpoints. It is available on the Pathway Hosted Pipelines website.
contextless This simple example calls OpenAI ChatGPT API but does not use an index when processing queries. It relies solely on the given user query. We recommend it to start your Pathway LLM journey.
contextful This default example of the app will index the jsonlines documents located in the data/pathway-docs directory. These indexed documents are then taken into account when processing queries.
contextful-s3 This example operates similarly to the contextful mode. The main difference is that the documents are stored and indexed from an S3 bucket, allowing the handling of a larger volume of documents. This can be more suitable for production environments.
unstructured Process unstructured documents such as PDF, HTML, DOCX, PPTX, and more. Visit unstructured-io for the full list of supported formats.
local This example runs the application using Hugging Face Transformers, which eliminates the need for the data to leave the machine. It provides a convenient way to use state-of-the-art NLP models locally.
unstructured-to-sql This example extracts the data from unstructured files and stores it into a PostgreSQL table. It also transforms the user query into an SQL query which is then executed on the PostgreSQL table.
alert Ask questions and get alerted whenever the response changes. Pathway is always listening for changes: whenever new relevant information is added to the stream (local files in this example), the LLM decides whether the response differs substantially and, if so, notifies the user with a Slack message.
drive-alert The alert example on steroids. Whenever relevant information on Google Docs is modified or added, get real-time alerts via Slack. See the tutorial.
contextful-geometric The contextful example, extended to optimise token use in queries. It asks the same question with a geometrically increasing number of documents given as context, until the model finds an answer (see the sketch below this table).
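
The idea behind the geometric strategy can be sketched in plain Python. The ask_llm helper and its "returns None when it cannot answer" convention are hypothetical stand-ins; the real example integrates this loop into the Pathway pipeline:

# Hypothetical LLM call: assume it returns None when the model cannot
# answer the question from the given context documents.
def ask_llm(question: str, context_docs: list[str]) -> str | None:
    ...  # stub; stands in for an OpenAI / Hugging Face call

def answer_geometric(question, ranked_docs, start=1, factor=2, max_docs=64):
    """Retry with a geometrically growing context until the LLM finds an answer."""
    n = start
    while n <= min(max_docs, len(ranked_docs)):
        result = ask_llm(question, ranked_docs[:n])
        if result is not None:
            return result  # answered with only n documents: tokens saved
        n *= factor  # grow the context geometrically and retry
    return None

Sending the smallest sufficient context first keeps token costs low for easy questions, while hard questions still eventually see the full document list.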

Get Started

Prerequisites

  1. Make sure that Python 3.10 or above is installed on your machine.
  2. Download and install Pip to manage project packages.
  3. [Required only if you use OpenAI models] Create an OpenAI account and generate a new API Key: log into the OpenAI website and navigate to the API Key management page.
  4. [Important if you use Windows] The examples only support Unix-like systems (such as Linux, macOS, and BSD). If you are a Windows user, we highly recommend leveraging Windows Subsystem for Linux (WSL) or Dockerizing the app to run it as a container.
  5. [Required only if you run the examples with Docker] Download and install Docker.

Now, follow the steps to install and get started with one of the provided examples. You can pick any example that you find interesting - if not sure, pick contextful.

Alternatively, you can also take a look at the application showcases.

Clone the repository

This is done with the git clone command followed by the URL of the repository:

git clone https://github.com/pathwaycom/llm-app.git

Run the chosen example

Each example contains a README.md with instructions on how to run it.

Bonus: Build your own Pathway-powered LLM App

Want to learn more about building your own app? See the step-by-step guide: Building an llm-app tutorial.

Or,

Simply add llm-app to your project's dependencies and copy one of the examples to get started!

Showcases

  • Python sales - Find real-time sales data with an AI-powered Python API using ChatGPT and the LLM (Large Language Model) App.

  • Dropbox Data Observability - See how to get started chatting with your Dropbox files and gain data observability.

Troubleshooting

Please check out our Q&A to get solutions for common installation problems and other issues.

Raise an issue

To provide feedback or report a bug, please raise an issue on our issue tracker.

Contributing

Anyone who wishes to contribute to this project, whether documentation, features, bug fixes, code cleanup, testing, or code reviews, is very much encouraged to do so.

To join, just raise your hand on the Pathway Discord server (#get-help) or the GitHub discussion board.

If you are unfamiliar with how to contribute to GitHub projects, here is a Get Started Guide. A full set of contribution guidelines, along with templates, is in progress.

Coming Soon

  • Templates for retrieving context via graph walks.
  • Easy setup for model drift monitoring.
  • Templates for model A/B testing.
  • Real-time OpenAI API observability.

☁️ Hosted Version ☁️

Please see cloud.pathway.com for hosted services. You can quickly set up variants of the unstructured app, which connect live data sources on Google Drive and SharePoint to your Gen AI app.

Need help?

Interested in building your own Pathway LLM App with your data source, stack, and custom use cases? Connect with us to get help with:

  • Connecting your own live data sources to your LLM application (e.g. Google or Microsoft Drive documents, Kafka, databases, APIs, ...).
  • Getting your LLM application up and running in popular cloud platforms such as Azure and AWS.
  • Developing knowledge graph use cases.
  • End-to-end solution implementation.

Reach us at contact@pathway.com or via Pathway's website.

Supported and maintained by

Pathway

See Pathway's offering for AI applications