
ThinkAi


ThinkAi is an LLM app with Retrieval Augmented Generation (RAG) that talks Philosophy. It is built using InstructGPT embeddings, Chroma's vector search, LangChain text splitters for chunking, Meta's bart-large-cnn model for summarization, and OpenAI's gpt-3.5-turbo model for structuring the final response. This is wrapped in a NextJS web app hosted entirely on AWS (AWS Amplify, AWS Elastic Beanstalk, and AWS EC2); UI code here - ThinkAi UI

LLMs are trained on the open internet, which makes it hard to know whether a generated response comes from a reliable source or is a product of hallucination. RAG adds external knowledge and forces the model to generate its response from that context. This enables more factual consistency, improves the reliability of generated responses, and helps mitigate hallucination. This is exactly what ThinkAi aims to achieve.

Performance evaluation of ThinkAi against ChatGPT here: ThinkAi v. ChatGPT

System Architecture:

(system architecture diagram)

Contents

  1. Basic User Flow
  2. Data Collection
  3. Create Embeddings
  4. Summarize each article

Basic User Flow:

Here is how ThinkAi processes each user query:

  • The user queries the web client.
  • Chroma DB embeds the query.
  • The closest documents to the query are pulled using Chroma's vector search.
  • The summaries for these documents are pulled from a preprocessed JSON file and concatenated to serve as context for the prompt.
  • An API call prompts gpt-3.5-turbo with this context, and the response is sent back to the user (see the sketch below).

(user query flow diagram)
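A minimal sketch of this query flow, assuming a prebuilt Chroma collection keyed by article URL, a summaries JSON produced during preprocessing, and the pre-1.0 openai SDK; the collection and file names here are illustrative, not the project's actual code:

```python
# Minimal sketch of the query path: embed the query, pull the 3 closest
# articles from Chroma, fetch their precomputed summaries, and prompt
# gpt-3.5-turbo with that context.
import json

import chromadb
import openai

client = chromadb.PersistentClient(path="chroma_db")        # assumed local store
collection = client.get_collection("sep_articles")          # hypothetical collection name

with open("summaries.json") as f:                            # url -> summary, built offline
    summaries = json.load(f)

def answer(query: str) -> str:
    # Chroma embeds the query and returns the nearest articles via vector search.
    results = collection.query(query_texts=[query], n_results=3)
    urls = results["ids"][0]

    # Concatenate the precomputed summaries to form the prompt context.
    context = "\n\n".join(summaries[u] for u in urls if u in summaries)

    # Ask gpt-3.5-turbo to structure the final response from the context.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response["choices"][0]["message"]["content"]
```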

Code for all the preprocessing steps - COMING SOON!

Preprocessing: Data Collection

All the data was collected from articles published and owned by the Stanford Encyclopedia of Philosophy. Using BeautifulSoup, all scraped pages were dumped into files, which were then cleaned to remove noise such as HTML tags and media, and structured into JSON files.

(data collection diagram)
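A rough sketch of this scraping and cleaning step, assuming requests + BeautifulSoup; the example URL and the output layout are illustrative, not the project's exact pipeline:

```python
# Sketch: fetch a page, strip HTML noise, and store the cleaned text as JSON.
import json

import requests
from bs4 import BeautifulSoup

def scrape_article(url: str) -> dict:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    # Remove non-text noise (scripts, styles, images, figures) before extraction.
    for tag in soup(["script", "style", "img", "figure"]):
        tag.decompose()

    return {
        "url": url,
        "title": soup.title.get_text(strip=True) if soup.title else "",
        "text": " ".join(soup.get_text(separator=" ").split()),
    }

# Example entry URL; the real pipeline iterates over many SEP articles.
article = scrape_article("https://plato.stanford.edu/entries/plato/")
with open("articles.json", "w") as f:
    json.dump([article], f)
```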

Preprocessing: Create Embeddings

Using Chroma DB, embeddings are created over the entire text of each article, indexed by the article's URL. For a given query, Chroma embeds the query and then, using vector search, pulls out the top 3 closest articles.

(embeddings and vector search diagram)
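A sketch of building that collection. The README mentions InstructGPT embeddings; text-embedding-ada-002 is used below as an assumed stand-in OpenAI embedding model, and the articles.json layout follows the scraping sketch above:

```python
# Sketch: build a Chroma collection over full article texts, indexed by URL.
import json

import chromadb
from chromadb.utils import embedding_functions

# Assumed embedding setup (OpenAI embeddings via Chroma's helper).
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="YOUR_OPENAI_API_KEY",
    model_name="text-embedding-ada-002",
)

client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection(
    "sep_articles", embedding_function=openai_ef
)

with open("articles.json") as f:
    articles = json.load(f)

collection.add(
    ids=[a["url"] for a in articles],         # the article URL acts as the index
    documents=[a["text"] for a in articles],  # embeddings cover the full text
)

# At query time, the 3 nearest articles come back by vector similarity.
hits = collection.query(query_texts=["What is virtue ethics?"], n_results=3)
```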

Preprocessing: Summarize each article

There are many summarization models; however, the maximum input size is around 1024 tokens (~800 words), which does not fit this use case since each article is at least a couple of pages long. One way to work around this is to chunk the text, summarize each chunk individually, and combine the results. Chunking can break the continuity of the text, so each chunk overlaps the previous one by a few tokens to preserve the flow. This has proven to extract the main context of the article. Each JSON file is broken into chunks of 1000 tokens using LangChain's text splitters, the chunks are individually summarized using Meta's BART model published on HuggingFace, and the individual summaries are combined by simple concatenation.

(summarization pipeline diagram)
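A sketch of this chunk-and-summarize step, assuming LangChain's TokenTextSplitter and the HuggingFace transformers pipeline; the overlap size and generation lengths are assumed values, not the project's exact settings:

```python
# Sketch: split an article into ~1000-token chunks with a small overlap,
# summarize each chunk with bart-large-cnn, and concatenate the summaries.
from langchain.text_splitter import TokenTextSplitter
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=100)  # overlap value assumed

def summarize_article(text: str) -> str:
    chunks = splitter.split_text(text)
    partial = [
        summarizer(chunk, max_length=150, min_length=30, truncation=True)[0]["summary_text"]
        for chunk in chunks
    ]
    # Simple concatenation of the per-chunk summaries.
    return " ".join(partial)
```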

The generated summaries for all articles are stored in a JSON file for fast access.
