eisneim / nanoVLM Public


A simple multi-modal vision-language model that describes an image using only keywords.

Apache-2.0 license


nanoVLM

A simple multi-modal vision-language model that describes an image using only keywords.

!! Currently a WORK IN PROGRESS

Roadmap

image dataset preparation ☑
text dataset preparation ◻︎
nano language model ◻︎
OpenCLIP ViT-B/32 projection layer ◻︎
supervised vs. instruction fine-tuning ◻︎
usage examples ◻︎
export to ONNX ◻︎
add WASM for JavaScript support ◻︎
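
The projection layer on the roadmap is not implemented yet, so the following is only a rough sketch of what such a layer typically does: it maps a single image embedding from the vision encoder into a few "prefix" vectors in the language model's embedding space. The class name `ProjectionLayer`, the sizes `LM_DIM` and `NUM_PREFIX_TOKENS`, and the use of NumPy are assumptions made for illustration, not the repository's design. The only fixed number is 512, the image-embedding width of OpenCLIP ViT-B/32.

```python
import numpy as np

# 512 is the image-embedding width of OpenCLIP ViT-B/32.
# LM_DIM and NUM_PREFIX_TOKENS are hypothetical placeholders.
CLIP_DIM = 512
LM_DIM = 256
NUM_PREFIX_TOKENS = 4


class ProjectionLayer:
    """Linear map from one image embedding to a few language-model prefix vectors."""

    def __init__(self, clip_dim, lm_dim, num_tokens, seed=0):
        rng = np.random.default_rng(seed)
        self.num_tokens = num_tokens
        self.lm_dim = lm_dim
        # Small random initialisation; a real layer would be trained.
        self.weight = rng.normal(0.0, 0.02, size=(clip_dim, num_tokens * lm_dim))
        self.bias = np.zeros(num_tokens * lm_dim)

    def __call__(self, image_embeddings):
        # (batch, clip_dim) -> (batch, num_tokens * lm_dim)
        flat = image_embeddings @ self.weight + self.bias
        # -> (batch, num_tokens, lm_dim), ready to prepend to text token embeddings
        return flat.reshape(-1, self.num_tokens, self.lm_dim)


projector = ProjectionLayer(CLIP_DIM, LM_DIM, NUM_PREFIX_TOKENS)
fake_image_embeddings = np.ones((2, CLIP_DIM))  # stand-in for encoder output
prefix_tokens = projector(fake_image_embeddings)
print(prefix_tokens.shape)  # (2, 4, 256)
```

In a trained model the weights would be learned while the vision encoder stays frozen or is fine-tuned; which of the two this project will choose is not stated here.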
