Semantic Search & Retrieval Augmented Generation Research and Implementations 🧠🔍

This repository is dedicated to collecting blog posts and articles on the implementation of state-of-the-art techniques in semantic search, dense retrieval, and retrieval augmented generation (RAG). From theoretical exploration to hands-on implementation in real-world scenarios, this repository serves as a central hub for researchers, practitioners, and enthusiasts in the field.

Who is this Repository For? 🎓

Whether you are a seasoned researcher, a software engineer, a student stepping into the world of dense retrieval, or a business leader looking to understand the practical implications of these technologies, this repository provides valuable insights and resources tailored to your needs.

  • Scaling the Instagram Explore recommendations system
    Preview: Explore is one of the largest recommendation systems on Instagram. We leverage machine learning to make sure people are always seeing content that is the most interesting and relevant to them. Using more advanced machine learning models, like Two Towers neural networks, we’ve been able to make the Explore recommendation system even more scalable and flexible. (See the two-tower sketch after this list.)

  • Search: Query Matching via Lexical, Graph, and Embedding Methods
    Preview: Search and recommendations have a lot in common. They help users learn about new products, and need to retrieve and rank millions of products in a very short time (<150ms). They’re trained on similar data, have content and behavioral-based approaches, and optimize for engagement (e.g., click-through rate) and revenue (e.g., conversion, gross merchandise value).

  • Broad and Ambiguous Search Queries
    Preview: A typical approach for processing search queries is to retrieve a set of matching documents and then rank them with a relevance scoring function. This simple approach generally works well for unambiguous, specific search queries. But sometimes this approach breaks down. When a search query is broad (e.g., “shirts”), it isn’t clear how to decide which matching results are the most relevant ones. (See the retrieve-then-rank sketch after this list.)

  • LambdaMART in Depth by Doug Turnbull
    Preview: LambdaMART is a classic. It’s the endlessly tinkerable classic car of ranking algorithms. If you can grok the algorithm, you can play with the model architecture, coming up with your own variations on this learning-to-rank staple... (See the LambdaMART training sketch after this list.)

  • How LambdaMART works - optimizing product ranking goals by Doug Turnbull
    Preview: Learning to Rank optimizes search relevance using machine learning. If you bring to Learning to Rank training data – documents labeled as relevant/irrelevant for queries – you’ll get out a ranking function optimizing search closer to what users want...

  • Save space with byte-sized vectors
    Preview: Elasticsearch is introducing a new type of vector in 8.6! This vector has 8-bit integer dimensions, where each dimension has a range of [-128, 127]. This is 4x smaller than the current vector with 32-bit float dimensions, which can result in substantial space savings... (See the int8 quantization sketch after this list.)

  • 5 Reasons why you should remove stop words (at indexing time)
    Preview: While storage feels almost infinite, and cheap, to boot, it does still cost money to store stopwords. Let’s consider a simple index. Each posting consumes 8 bytes for document ID, 4 bytes for position... (See the posting-size arithmetic after this list.)

  • 10 Reasons why you shouldn’t remove stop words
    Preview: So there shouldn’t be any harm in removing them from the data we are indexing in search engines and from the queries that are sent to these to retrieve relevant results, right?...

  • Serving Large Language Models to improve Search Relevance at leboncoin
    Preview: In this post, we describe this first iteration towards improved search relevance. By the end of the post you will know how we successfully deployed large neural networks in production, facing highly restrictive conditions specific to the search engine industry, to facilitate users’ contact and improve their search experience on leboncoin...

  • Improving Search Ranking with Few-Shot Prompting of LLMs
    Preview: This blog post explores using large language models (LLMs) to generate labeled data for training ranking models. Distilling the knowledge and power of generative models with billions of parameters to ranking models with a few million parameters... (See the few-shot labeling prompt after this list.)

  • eBay’s Blazingly Fast Billion-Scale Vector Similarity Engine
    Preview: The Similarity Engine's use cases include item-to-item similarity for text and image modality and user-to-item personalized recommendations based on a user’s historical behavior data...

  • Ask like a human: Implementing semantic search on Stack Overflow
    Preview: Our hypothesis is that if our semantic search produces high-quality results, technologists looking for answers will use our search instead of a search engine or conversational AI. Our forthcoming semantic search functionality is the first step in a continuous experimental process that will involve a lot of data science, iteration, and most importantly: our users. We’re excited to embark upon this adventure together with our community and can’t wait for you to experience our new semantic search.

  • Retrieval Augmented Generation: Streamlining the creation of intelligent natural language processing models
    Preview: RAG looks and acts like a standard seq2seq model, meaning it takes in one sequence and outputs a corresponding sequence. There is an intermediary step though, which differentiates and elevates RAG above the usual seq2seq methods. Rather than passing the input directly to the generator, RAG instead uses the input to retrieve a set of relevant documents, in our case from Wikipedia. (See the minimal RAG sketch after this list.)

  • Should you use OpenAI's embeddings? Probably not, and here's why.
    Preview: If you do go with OpenAI, one word of advice: make sure you don’t spend $50M embedding the whole internet, become successful and then depend on OpenAI’s API to run your search engine!

  • Llama from scratch
    Preview: I want to provide some tips from my experience implementing a paper. I'm going to cover implementing a dramatically scaled-down version of Llama for training TinyShakespeare. This post is heavily inspired by Karpathy's Makemore series, which I highly recommend.

  • The little search engine that couldn’t
    Preview: A couple of ex-Googlers set out to create the search engine of the future. They built something faster, simpler, and ad-free. So how come you’ve never heard of Neeva?

  • Patterns for Building LLM-based Systems & Products
    Preview: This write-up is about practical patterns for integrating large language models (LLMs) into systems & products. We’ll build on academic research, industry resources, and practitioner know-how, and distill them into key ideas and practices.

  • The Secret Sauce behind 100K context window in LLMs: all tricks in one place
    Preview: tldr; techniques to speed up training and inference of LLMs to use large context window up to 100K input tokens during training and inference: ALiBi positional embedding, Sparse Attention, FlashAttention, Multi-Query attention, Conditional computation, and 80GB A100 GPUs. (See the ALiBi bias sketch after this list.)

  • Emerging Architectures for LLM Applications
    Preview: Large language models are a powerful new primitive for building software. But since they are so new—and behave so differently from normal computing resources—it’s not always obvious how to use them.

  • Data Agents
    Preview: Today we’re incredibly excited to announce the launch of a big new capability within LlamaIndex: Data Agents...

  • Building Advanced Query Engine and Evaluation with LlamaIndex and W&B
    Preview: If you are trying to leverage an LLM to build a powerful search engine, a chatbot or any other LLM-based system over your data, LlamaIndex should be one of the first options to explore. Why is that? Well, you can literally build a search engine on top of your own data in just a few lines of code. And even that's just scratching the surface here. (See the LlamaIndex quickstart after this list.)

  • Delving Deeper Into LlamaIndex: An Inside Look
    Preview: In my previous piece about LlamaIndex, we walked through a simple use case of indexing a GitHub repository to be able to ask questions about it. I thought it would be interesting to delve deeper and try to understand what it does in a bit more detail.

  • LlamaIndex: the ultimate LLM framework for indexing and retrieval
    Preview: LlamaIndex, previously known as the GPT Index, is a remarkable data framework aimed at helping you build applications with LLMs by providing essential tools that facilitate data ingestion, structuring, retrieval, and integration with various application frameworks.
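
Code Sketches from the Reading List 🛠️

The sketches below make a few techniques from the reading list concrete. They are minimal illustrations under stated assumptions, not the implementations described in the linked posts.

The Instagram Explore post mentions Two Towers neural networks. Here is a toy two-tower retrieval model trained with in-batch negatives in PyTorch; the layer sizes, feature dimensions, and temperature are illustrative assumptions, not Meta's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """Maps one side's raw features to a unit-length embedding."""
    def __init__(self, in_dim: int, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim)
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

user_tower, item_tower = Tower(in_dim=32), Tower(in_dim=48)
opt = torch.optim.Adam(
    list(user_tower.parameters()) + list(item_tower.parameters()), lr=1e-3
)

# One training step on a toy batch of (user, engaged-item) pairs;
# every other item in the batch acts as an in-batch negative.
users, items = torch.randn(256, 32), torch.randn(256, 48)
scores = user_tower(users) @ item_tower(items).T  # [256, 256] similarities
labels = torch.arange(256)                        # diagonal entries are positives
loss = F.cross_entropy(scores / 0.05, labels)     # 0.05 = softmax temperature
opt.zero_grad(); loss.backward(); opt.step()
```

At serving time the item tower's embeddings are precomputed and indexed, so a request only runs the user tower plus an approximate nearest-neighbor lookup; that separation is what makes the architecture scalable.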
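
The "Broad and Ambiguous Search Queries" preview describes the usual retrieve-then-rank pipeline. A minimal lexical sketch using the rank_bm25 package (the toy corpus and query are assumptions) shows why a broad query like "shirts" gives little separation between matching results:

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

corpus = [
    "striped cotton shirts for men",
    "slim fit dress shirts",
    "graphic tees and casual tops",
]
bm25 = BM25Okapi([doc.split() for doc in corpus])

# Retrieve and rank: score every document against the query, sort descending.
query = "shirts"  # broad query: the top matches are nearly interchangeable
scores = bm25.get_scores(query.split())
for score, doc in sorted(zip(scores, corpus), reverse=True):
    print(f"{score:.3f}  {doc}")
```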
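
Doug Turnbull's two LambdaMART posts describe learning-to-rank training. LightGBM ships a LambdaMART-style lambdarank objective; here is a minimal sketch with synthetic judgments (the features, labels, and group sizes are all made up):

```python
import numpy as np
import lightgbm as lgb  # pip install lightgbm

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))      # one feature vector per (query, doc) pair
y = rng.integers(0, 4, size=1000)   # graded relevance judgments, 0-3
group = [50] * 20                   # 20 queries with 50 candidate docs each

# LightGBM's lambdarank objective is a LambdaMART-style gradient-boosted ranker.
ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=100)
ranker.fit(X, y, group=group)

# Rank a new query's candidates by sorting predicted scores descending.
candidates = rng.normal(size=(50, 5))
order = np.argsort(-ranker.predict(candidates))
print(order[:10])
```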
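
The Elasticsearch byte-vector post stores dimensions as 8-bit integers in [-128, 127]. A generic scalar-quantization sketch in NumPy (the max-absolute-value scaling below is one common scheme, not necessarily Elasticsearch's exact method) shows where the 4x saving comes from:

```python
import numpy as np

vec = np.random.randn(768).astype(np.float32)

# Scale so the largest-magnitude dimension lands at +/-127, then round.
scale = 127.0 / np.abs(vec).max()
q = np.clip(np.round(vec * scale), -128, 127).astype(np.int8)

print(vec.nbytes)  # 3072 bytes as float32
print(q.nbytes)    # 768 bytes as int8 -> the 4x saving
approx = q.astype(np.float32) / scale  # dequantize; small rounding error remains
```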
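
The stop-word indexing-cost argument can be made concrete using the preview's figures of 8 bytes per document ID and 4 bytes per position; the corpus size and term frequency below are hypothetical:

```python
BYTES_PER_POSTING = 8 + 4        # document ID + position, per the post

docs = 10_000_000                # hypothetical corpus size
occurrences_per_doc = 50         # hypothetical frequency of "the" per document
postings = docs * occurrences_per_doc

gib = postings * BYTES_PER_POSTING / 2**30
print(f'Postings for "the" alone: ~{gib:.1f} GiB')  # ~5.6 GiB
```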
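
The few-shot ranking post distills LLM relevance judgments into training data for small rankers. This sketch covers the prompt construction only; the example pairs and the 0-3 grading scale are assumptions, and the actual LLM call is left abstract:

```python
FEW_SHOT = """Rate how relevant the document is to the query on a 0-3 scale.

Query: waterproof hiking boots
Document: Men's leather hiking boots with a waterproof membrane.
Relevance: 3

Query: waterproof hiking boots
Document: Cotton summer sandals in assorted colors.
Relevance: 0
"""

def build_prompt(query: str, document: str) -> str:
    """Assemble the few-shot labeling prompt for one (query, document) pair."""
    return f"{FEW_SHOT}\nQuery: {query}\nDocument: {document}\nRelevance:"

# The LLM's parsed 0-3 answer becomes one labeled row for training a small
# ranking model (e.g. the lambdarank sketch above).
print(build_prompt("trail running shoes", "Lightweight trail running shoes."))
```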
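
The RAG preview describes retrieving documents and conditioning generation on them. Here is a minimal retrieve-then-generate sketch using sentence-transformers for the retrieval half; the model name is a common default, and the generator call is omitted:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

docs = [
    "RAG retrieves relevant documents and conditions generation on them.",
    "Two-tower models embed queries and items into the same space.",
    "BM25 is a purely lexical relevance scoring function.",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(docs, normalize_embeddings=True)

query = "How does retrieval augmented generation work?"
q_emb = model.encode([query], normalize_embeddings=True)[0]
top = np.argsort(-(doc_emb @ q_emb))[:2]  # cosine similarity, top-2 documents

prompt = "Context:\n" + "\n".join(docs[i] for i in top) + f"\n\nQuestion: {query}"
# `prompt` would now go to a seq2seq generator (BART in the original RAG
# paper); that call is omitted here.
print(prompt)
```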
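
Of the long-context techniques listed in the 100K-context post, ALiBi is the simplest to show: a static, head-specific linear penalty added to attention logits. A NumPy sketch, assuming a head count that is a power of two:

```python
import numpy as np

def alibi_bias(n_heads: int, seq_len: int) -> np.ndarray:
    """Per-head linear attention bias from the ALiBi paper (causal case)."""
    # Geometric slope sequence: 2^(-8/n), 2^(-16/n), ... for n heads.
    slopes = 2.0 ** (-8.0 * np.arange(1, n_heads + 1) / n_heads)
    # Distance of each key position behind each query position.
    pos = np.arange(seq_len)
    distance = pos[:, None] - pos[None, :]    # [seq, seq], entry (i, j) = i - j
    bias = -slopes[:, None, None] * distance  # [heads, seq, seq]
    # The bias is added to attention logits before the softmax; future
    # positions would still be masked out separately for causal attention.
    return bias

print(alibi_bias(n_heads=8, seq_len=4)[0])
```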
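
Finally, the LlamaIndex posts refer to building a query engine over your own data in a few lines. Modulo version churn (imports moved to llama_index.core around v0.10), the quickstart looks roughly like this; it assumes files in a local ./data directory and credentials for the default embedding model and LLM:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Ingest local files, embed them, and build an in-memory vector index.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Ask a question over your own data; retrieval and synthesis happen inside.
query_engine = index.as_query_engine()
print(query_engine.query("What does this repository collect?"))
```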
