A simple multi-modal vision-language model that describes an image using only keywords.

eisneim/nanoVLM

nanoVLM

A simple multi-modal vision-language model that describes an image with only keywords.

!! Currently a WORK IN PROGRESS

Roadmap

  • image dataset preparation ☑
  • text dataset preparation ◻︎
  • nano language model ◻︎
  • OpenCLIP ViT-B/32 projection layer ◻︎
  • supervised vs. instruction fine-tuning ◻︎
  • usage examples ◻︎
  • export to ONNX ◻︎
  • add WASM for JavaScript support ◻︎
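
The projection layer on the roadmap is not implemented yet. As a rough illustration of the idea only, the sketch below maps one image embedding to a few "prefix" vectors in a language model's embedding space using a single linear layer. It is not this repository's code: `LM_DIM`, `NUM_PREFIX`, `init_projection`, and `project` are hypothetical names and values chosen for the example, the weights are random rather than trained, and the input is a stand-in vector rather than a real OpenCLIP output. The only fixed number is 512, the image embedding width of CLIP ViT-B/32.

```python
import random

CLIP_DIM = 512    # image embedding width of CLIP ViT-B/32
LM_DIM = 64       # hypothetical hidden width of the nano language model
NUM_PREFIX = 4    # hypothetical number of visual prefix tokens


def init_projection(in_dim, out_dim, seed=0):
    """Return a random weight matrix (out_dim rows x in_dim cols) and a zero bias."""
    rng = random.Random(seed)
    scale = in_dim ** -0.5  # keep projected values at a moderate magnitude
    weight = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(image_embedding, weight, bias, num_prefix, lm_dim):
    """Apply the linear layer, then split the result into num_prefix vectors."""
    flat = [sum(w * x for w, x in zip(row, image_embedding)) + b
            for row, b in zip(weight, bias)]
    return [flat[i * lm_dim:(i + 1) * lm_dim] for i in range(num_prefix)]


weight, bias = init_projection(CLIP_DIM, NUM_PREFIX * LM_DIM)
fake_embedding = [0.01] * CLIP_DIM  # stand-in for a real image embedding
prefix_tokens = project(fake_embedding, weight, bias, NUM_PREFIX, LM_DIM)
print(len(prefix_tokens), len(prefix_tokens[0]))
```

In a design like this, the prefix vectors would be placed in front of the text token embeddings so the language model can condition its keywords on the image. Whether nanoVLM ends up using a single linear layer, an MLP, or something else is still open.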
