An open source implementation of CLIP.
Updated Jun 8, 2024 - Jupyter Notebook
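The entries below revolve around CLIP-style zero-shot classification: an image embedding is compared against one text embedding per candidate label, and the most similar label wins. A minimal sketch of that idea, using random NumPy vectors as stand-ins for real encoder outputs (a real pipeline would get embeddings from a CLIP image/text encoder such as open_clip):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    """Return the label whose text embedding is most similar to the image."""
    # L2-normalise so the dot product equals cosine similarity.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                            # cosine similarity per label
    probs = np.exp(sims) / np.exp(sims).sum()   # softmax over candidate labels
    return labels[int(np.argmax(probs))], probs

rng = np.random.default_rng(0)
labels = ["a photo of a cat", "a photo of a dog"]
text_embs = rng.normal(size=(2, 8))             # stand-in text embeddings
image_emb = text_embs[0] + 0.1 * rng.normal(size=8)  # "image" near label 0
best, probs = zero_shot_classify(image_emb, text_embs, labels)
print(best)  # -> "a photo of a cat"
```

Because no classifier is trained on the labels themselves, swapping in a new label set is as cheap as encoding new prompt strings.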
Cybertron: the home planet of the Transformers in Go
Unofficial Go (Golang) bindings for the Hugging Face Inference API
[ICML 2024] Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Video Foundation Models & Data for Multimodal Understanding
Examples and tutorials on using SOTA computer vision models and techniques. Learn everything from old-school ResNet, through YOLO and object-detection transformers like DETR, to the latest models like Grounding DINO and SAM.
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
FastViT base model for use with Autodistill.
Evaluate custom and Hugging Face text-to-image / zero-shot image classification models like CLIP, SigLIP, DFN5B, and EVA-CLIP. Metrics include zero-shot accuracy, linear probe, image retrieval, and KNN accuracy.
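Of the metrics listed above, KNN accuracy is the simplest to state: classify each test embedding by majority vote of its k nearest labelled neighbours, then report the fraction classified correctly. A hedged sketch with synthetic clusters standing in for real encoder features (a real evaluation would use embeddings from CLIP, SigLIP, etc.):

```python
from collections import Counter
import numpy as np

def knn_accuracy(train_x, train_y, test_x, test_y, k=3):
    """KNN accuracy: majority vote over the k nearest training embeddings."""
    correct = 0
    for x, y in zip(test_x, test_y):
        dists = np.linalg.norm(train_x - x, axis=1)   # Euclidean distances
        nearest = np.argsort(dists)[:k]               # indices of k closest
        vote = Counter(train_y[nearest].tolist()).most_common(1)[0][0]
        correct += int(vote == y)
    return correct / len(test_y)

rng = np.random.default_rng(0)
# Two well-separated synthetic clusters; KNN should recover the labels.
train_x = np.concatenate([rng.normal(0, 0.1, (10, 4)), rng.normal(3, 0.1, (10, 4))])
train_y = np.array([0] * 10 + [1] * 10)
test_x = np.concatenate([rng.normal(0, 0.1, (5, 4)), rng.normal(3, 0.1, (5, 4))])
test_y = np.array([0] * 5 + [1] * 5)
acc = knn_accuracy(train_x, train_y, test_x, test_y)
print(acc)  # well-separated clusters -> 1.0
```

Unlike a linear probe, KNN needs no training step, which makes it a quick sanity check on embedding quality.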
GPT-4o (with Vision) module for use with Autodistill.
Scripts, algorithms, and files for rule-based and ML-based binary classification of regulatory vs. non-regulatory sentences in EU legislative documents, plus code for evaluating the accuracy of these approaches
[CVPR 2024] Multi-Aspect Vision Language Pretraining
[CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training"
🥂 Gracefully face hCaptcha challenge with MoE(ONNX) embedded solution.
Comparing zero-shot and fine-tuned text classification transformer models across different review datasets
Some useful Colab files
Transformers such as T5 and MarianMT enable effective understanding and generation of complex programming code; consequently, they can help in the data-security field. Let's see how!
[ NeurIPS 2023 R0-FoMo Workshop ] Official Codebase for "Estimating Uncertainty in Multimodal Foundation Models using Public Internet Data"
LLMs for Low Resource Languages in Multilingual, Multimodal and Dialectal Settings
From-scratch PyTorch implementation of CLIP (Radford et al., 2021), trained on Flickr8k + Flickr30k
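The training objective behind CLIP (Radford et al., 2021) is a symmetric contrastive loss: within a batch, matching image-text pairs sit on the diagonal of a similarity matrix, and cross-entropy is applied along both rows and columns. A NumPy sketch of that loss, with random embeddings standing in for encoder outputs (this illustrates the paper's objective, not any particular repository's code):

```python
import numpy as np

def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over image->text and text->image logits."""
    # Normalise embeddings so similarities are cosines.
    I = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    T = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = (I @ T.T) / temperature      # (N, N) scaled similarity matrix

    def xent_diag(l):
        # Cross-entropy where row i's target class is column i (the match).
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        idx = np.arange(len(l))
        return -log_probs[idx, idx].mean()

    # Average the image->text and text->image directions.
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

rng = np.random.default_rng(0)
imgs = rng.normal(size=(4, 16))
txts = imgs + 0.05 * rng.normal(size=(4, 16))  # well-aligned pairs
loss = clip_loss(imgs, txts)
print(float(loss))  # near zero, since matched pairs dominate the diagonal
```

With well-aligned pairs the diagonal logits dominate and the loss approaches zero; mismatched batches drive it up, which is what pushes the two encoders into a shared embedding space during training.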