-
I don't think it will be possible to generate completely satisfactory captions until better captioning models come out. You can try tweaking the prompt further to include or exclude specific aspects of the image, but there is no guarantee that the model will follow your instructions exactly. You can also try using the |
-
Hi there, I'm currently trying CogVLM and LLaVA 1.6 on their UI demos with some realistic images of a female model.
I'm not totally satisfied with the results I get from these two models. For example, this is one of my images:
This is the prompt : describe this image with only unique keywords separate by ",". Exclude from the description " separate " " unique " " keywords " "description " "image" and write maximum 30 keywords. Include also all physical attributes of the woman.
I get this :
" red hair, white dress, kitchen, open shelving, glass jars, wooden countertop, stainless steel appliances, natural light, open window, dishware, utensils, wooden spatula, cutting board, countertop, cabinet, sink, oven, microwave, slim body, curvy hips, long legs "
It feels like it's too detailed about the room and misses some important elements like "woman" and "in the kitchen" or "kitchen background", as well as the spatial placement, like "woman in the middle", etc.
I use the comma-separated keyword format because otherwise I get a huge description that is way too long and detailed.
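Since the model may not follow the prompt exactly, one possible workaround is to clean up the comma-separated output afterwards. The following is only a rough sketch: `clean_caption`, `required`, `banned`, and `max_keywords` are hypothetical names made up for illustration, not part of CogVLM, LLaVA, or any captioning tool. It strips stray quotes and spaces, drops duplicates and unwanted keywords, puts the main-subject keywords first, and caps the total count.

```python
def clean_caption(raw, required=(), banned=(), max_keywords=30):
    """Normalize a comma-separated keyword caption (hypothetical helper)."""
    banned_set = {b.lower() for b in banned}
    seen = set()
    result = []
    # Required keywords go first so the main subject is never dropped.
    candidates = list(required) + raw.split(",")
    for item in candidates:
        # Remove surrounding whitespace and stray double quotes.
        keyword = item.strip().strip('"').strip().lower()
        if not keyword or keyword in banned_set or keyword in seen:
            continue
        seen.add(keyword)
        result.append(keyword)
    # Cap the caption at the requested number of keywords.
    return ", ".join(result[:max_keywords])


raw = '" red hair, white dress, kitchen, countertop, countertop, sink "'
print(clean_caption(raw, required=("woman", "kitchen background"), banned=("sink",)))
# woman, kitchen background, red hair, white dress, kitchen, countertop
```

This does not make the model describe the subject any better; it only guarantees that keywords you already know you want (like "woman") are present and that the room details you don't care about can be filtered out by hand.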
I would like to know if you guys have some good prompts for clean captioning of the main elements of the images.
Thanks!