vision-transformer

Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.

speech multimodal rag edge-ai vector-database vision-transformer llm-inference

Updated Jun 6, 2024
Python

SalvatoreRa / tutorial

Tutorials on machine learning, artificial intelligence, and data science, with math explanations and reusable code (in Python and R)

python nlp data-science machine-learning natural-language-processing image bioinformatics tutorial r computer-vision deep-learning graph biology tutorials artificial-intelligence convolutional-neural-networks streamlit streamlit-webapp vision-transformer

Updated Jun 6, 2024
Jupyter Notebook

larsklei / AnomalyDetection-VT-ADL

A simple and private project to implement the ideas behind the paper "VT-ADL: A Vision Transformer Network for Image Anomaly Detection and Localisation" by Mishra, Vera et al.

deep-learning unsupervised-learning anomaly-detection tensorflow2 vision-transformer

Updated Jun 6, 2024
Python

Seq2SeqSharp is a fast, flexible, tensor-based deep neural network framework written in .NET (C#). Its features include automatic differentiation, multiple network types (Transformer, LSTM, BiLSTM, and so on), multi-GPU support, cross-platform support (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.

image translation deep-learning neural-network gpu text machine-translation cuda transformer lstm seq2seq sequence-to-sequence tensor encoder-decoder attention-model transformer-encoder transformer-architecture vision-transformer

Updated Jun 6, 2024
C#

Apsurt / omni-geo-ai

Omni Geoguessr AI: a Vision Transformer model integrated with Geoguessr for automated geographic location prediction and gameplay using Street View panoramas.

python machine-learning ai computer-vision deep-learning geolocation streetview image-recognition geography google-maps-api location-prediction geoguessr vision-transformer automated-gameplay city-recognition climate-classification elevation-detection

Updated Jun 6, 2024
Python

ViTAE-Transformer / ViTAE-Transformer-Remote-Sensing

A comprehensive list [SAMRS@NeurIPS'23, RVSA@TGRS'22, RSP@TGRS'22] of our research on remote sensing, including papers, code, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP

deep-learning remote-sensing classification object-detection transfer-learning semantic-segmentation change-detection self-supervised-learning vision-transformer

Updated Jun 6, 2024
TeX

OpenGVLab / InternVideo

Video Foundation Models & Data for Multimodal Understanding

benchmark action-recognition video-understanding video-data self-supervised multimodal video-dataset open-set-recognition video-retrieval video-question-answering masked-autoencoder temporal-action-localization contrastive-learning spatio-temporal-action-localization zero-shot-retrieval video-clip vision-transformer zero-shot-classification foundation-models instruction-tuning

Updated Jun 5, 2024
Python

Blaizzy / mlx-vlm

MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.

mlx vision-framework apple-silicon vision-transformer llm vision-language-model llava local-ai idefics paligemma

Updated Jun 6, 2024
Python

davide-coccomini / MINTIME-Multi-Identity-size-iNvariant-TIMEsformer-for-Video-Deepfake-Detection

Code for the video deepfake detector from "MINTIME: Multi-Identity Size-Invariant Video Deepfake Detection", published in IEEE Transactions on Information Forensics and Security.

transformers pytorch multi-identity deepfakes efficientnet deepfake-detection vision-transformer timesformer multi-face forgerynet size-invariant video-deepfake

Updated Jun 5, 2024
Jupyter Notebook

IDT-ITI / T-TAME

Scripts and trained models from our paper: M. Ntrougkas, N. Gkalelis, V. Mezaris, "T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers", IEEE Access, 2024. DOI:10.1109/ACCESS.2024.3405788.

deep-learning cnn attention-mechanism explainable-ai xai model-interpretability vision-transformer