Retrieval and Retrieval-augmented LLMs
Updated Jun 6, 2024 - Python
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
Model to classify and categorize user complaints into categories for specific departments using LLMs.
Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
Adapted BERTopic pipeline for Topic Modeling the arXiv dataset
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
Local-GenAI-Search is a generative search engine based on Llama 3, LangChain, and Qdrant that answers questions based on your local files
Backend for the AI-copilot
The project's goal is to help job seekers understand the basic qualifications for specific jobs and evaluate the suitability of their skills for those positions. Additionally, the program aims to assist recruiters in enhancing their resume selection processes by analyzing and understanding job advertisements ....
🍊 PAUSE (Positive and Annealed Unlabeled Sentence Embedding), accepted by EMNLP'2021 🌴
This study investigates the effectiveness of three Transformers (BERT, RoBERTa, XLNet) in handling data sparsity and cold-start problems in recommender systems. We present a Transformer-based hybrid recommender system that predicts missing ratings and extracts semantic embeddings from user reviews to mitigate these issues.
Quality and speed comparison of Word2vec, Sentence-BERT, BM25, and the IVFFlat index
Data and scripts for training the open source PDF questionnaire extraction component for Harmony Kaggle competition using natural language processing (NLP)
PyTorch implementation of a self-training approach for short text clustering
An implementation of the TaxRetrievalBenchmark task for the 🤗 Massive Text Embedding Benchmark (MTEB) framework.
Embedding Representation for Indonesian Sentences!
Convert MUSE from TensorFlow to PyTorch and ONNX
1 line for thousands of state-of-the-art NLP models in hundreds of languages. The fastest and most accurate way to solve text problems.
ColBERT humor dataset for the task of humor detection, containing 200,000 jokes/news
A custom cross-encoder that predicts diseases from input symptoms