
llava broke in new version v0.1.33 #4163

Closed
VideoFX opened this issue May 5, 2024 · 14 comments
Assignees
Labels
bug Something isn't working

Comments

@VideoFX

VideoFX commented May 5, 2024

What is the issue?

Ollama v0.1.33
Intel Core i9 14900K 64GB ram
Nvidia RTX 4070

llava only works for the first inference attempt. All attempts afterwards produce strange descriptions unrelated to the image, almost as if it's looking at a different picture.

This also happens with llava:13b. It will work the first time after loading. After that, broken.

This also happens on other windows machines with different Intel and Nvidia combinations.

I have updated Ollama, and redownloaded the llava models.

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.1.33

@VideoFX VideoFX added the bug Something isn't working label May 5, 2024
@jmorganca jmorganca self-assigned this May 5, 2024
@jmorganca
Member

Sorry you hit this issue. Looking into it now

@jmorganca
Member

Related issue in llama.cpp: ggerganov/llama.cpp#7060

@brucecai-2001

The same issue on Ollama for macOS: llava 7b and 13b fail on the second attempt.

@dlvoy

dlvoy commented May 5, 2024

I noticed the same after updating to v0.1.33; reverting to v0.1.32 fixed the issue, so it has to be some kind of regression.

I have a simple Python script that asks "What is on the picture?" over a collection of photos taken with my phone. For each photo there is one API request to the generate endpoint:

From response comparisons:

  • the first response is accurate
  • the next few responses are along the lines of: "The image appears to be a distorted or abstract representation. It seems to feature multiple layers of overlapping shapes", "The image you've shared appears to be a collage of various pictures", "The image shows a collection of messages written on post-it notes"...
  • the rest of the responses are total hallucinations ("The image shows a comical representation of the "Göbekli Tepe,")

I do not know the inner workings of Ollama, but it seems that previous images or context are somehow preserved for subsequent queries.

  • OS: Win10 22H2
  • GPU: 3090 Ti
  • Model: llava:34b (hash: 3d2d24f46674)
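The per-photo request described above can be sketched like this (a minimal illustration, not dlvoy's actual script: the helper name is mine, while the model tag and prompt come from the comment). Images go to /api/generate as base64-encoded strings, and no `context` field is sent, so each request should be independent of earlier ones:

```python
import base64

def build_generate_payload(image_path, prompt="What is on the picture?",
                           model="llava:34b"):
    """Build the JSON body for one POST to Ollama's /api/generate endpoint.

    Images are passed as a list of base64-encoded strings. The `context`
    field is deliberately omitted, so each photo is queried on its own.
    """
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "prompt": prompt,
        "images": [encoded],
        "stream": False,
    }
```

If previous images still bleed into later answers with payloads like this, the state must be kept server-side, which is what this issue suggests.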

@DuckyBlender

DuckyBlender commented May 5, 2024

For me, the first response is empty 95% of the time; if you follow up the question, it works. Running v0.1.33 and the moondream model.

@jdavid82

jdavid82 commented May 6, 2024

I guess llava is now broken, as of yesterday, for everyone around the world who wants to try it. If there is documentation on how to install an older version, please kindly point me to it.

@dlvoy

dlvoy commented May 6, 2024

If there is documentation on how to install an older version then please kindly point me to it

For me, downloading the previous installer (https://github.com/ollama/ollama/releases/tag/v0.1.32) and running it installed the previous version.
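For reference, the direct download link for that installer follows GitHub's release-asset URL pattern. The Windows asset name below (OllamaSetup.exe) is an assumption; verify it on the release page linked above before downloading:

```shell
# Build the direct link to the v0.1.32 Windows installer.
# Asset name assumed to be OllamaSetup.exe; check the release page if it differs.
VERSION="v0.1.32"
URL="https://github.com/ollama/ollama/releases/download/${VERSION}/OllamaSetup.exe"
echo "Download with: curl -L -o OllamaSetup.exe ${URL}"
```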

@skye0402

skye0402 commented May 6, 2024

Same here. 0.1.32 worked, 0.1.33 doesn't. Using llava:13b-v1.6. Running on Nvidia T4 (16GB).

@jmorganca
Member

Hi there, this should be fixed by patch #4164 for now, and we'll help hunt down why this broke more broadly in llama.cpp in the meantime. It will be fixed in the next release, 0.1.34, which should be out very soon.

@TheMasterFX

Is this really fixed? I can't confirm.
I use ollama 0.1.34 and OpenWebUI v0.1.124.
Here is my conversation with two pictures:
[two screenshots omitted]
It seems like it mixed the context of the first image with the second image. Am I using it wrong? Or is this more an issue with OpenWebUI providing the context to LLaVA?

@dlvoy

dlvoy commented May 10, 2024

v0.1.34 fixed it - but I am using ollama directly via ollama-python lib

@VideoFX
Author

VideoFX commented May 10, 2024

Yeah, fixed for me so far using various Python methods, including ollama-python. It works in Open WebUI as well, BUT it is also true that the context can confuse it, so my advice is to start a new chat when using Open WebUI, and beware of the growing chat context influencing the outputs.
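In API terms, "make a new chat" just means sending a messages list that contains only the current turn. A sketch following the message shape of Ollama's /api/chat endpoint (the helper function is hypothetical; treat the field details as assumptions):

```python
import base64

def new_chat_messages(img_bytes, question="What is in this image?"):
    """Messages list for a brand-new chat turn against Ollama's /api/chat.

    Starting a fresh chat means sending only this one user message, with no
    earlier turns in the list, so nothing from a previous image or reply
    can bleed into the answer.
    """
    return [{
        "role": "user",
        "content": question,
        "images": [base64.b64encode(img_bytes).decode("ascii")],
    }]
```

A growing chat, by contrast, appends every earlier turn to this list, which is exactly the context that can confuse the model.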

@jdavid82

That means it's not really fixed, because I didn't have this problem in the previous version:

import os

import ollama

folder_path = 'images'
filtered_folder = 'filtered'
rejected_folder = 'rejected'

# Collect and sort every image in the input folder
image_files = sorted(
    os.path.join(folder_path, name)
    for name in os.listdir(folder_path)
    if name.endswith(('.jpg', '.png', '.jpeg'))
)

for image_file in image_files:
    with open(image_file, 'rb') as file:
        img = file.read()

    score = 0
    total_checks = 1   # raise this to query the model more than once per image
    min_score = 1      # number of "yes" answers needed to keep the image
    print(f"Evaluating: {image_file}")

    for _ in range(total_checks):
        output = ollama.generate(
            model="llava",
            prompt="Are there any dogs in this picture? Answer with just yes or no.",
            images=[img],
        )
        print(output['response'])
        if 'yes' in output['response'].lower():
            score += 1
            if score >= min_score:
                break

    print(f"Final Score: {score}")
    # Move the image into filtered/ or rejected/ depending on the verdict
    target_folder = filtered_folder if score >= min_score else rejected_folder
    os.makedirs(target_folder, exist_ok=True)
    os.rename(image_file, os.path.join(target_folder, os.path.basename(image_file)))

This code was working in the previous version; now it only works for the first image, after that it's no longer accurate.
The Ollama API doesn't seem to have a method for clearing the context either.
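For what it's worth, the generate endpoint only carries state forward if you echo the `context` field back, and the `keep_alive` parameter can ask the server to unload the model after replying. A sketch of a fully cold-start request body (parameter semantics assumed from the Ollama REST API; the helper name is mine, and whether this actually works around the bug is untested):

```python
import base64
import json

def fresh_request_body(img_bytes, prompt, model="llava"):
    """JSON body for one self-contained /api/generate call.

    No `context` is sent, and `keep_alive: 0` asks the server to unload the
    model right after answering, so the next call starts from a cold state.
    """
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(img_bytes).decode("ascii")],
        "stream": False,
        "keep_alive": 0,
    })
```

If the corruption persists even with bodies like this, the leaked state lives inside the server's image-processing path rather than in anything the client sends.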

@amonpaike

amonpaike commented May 17, 2024

@jmorganca
On Windows, native Ollama v0.1.38: only the first image is interpreted; subsequent ones are given a description of the first.
