Bug Report: lorax-launcher failed with --source "s3" for model_id "mistralai/Mistral-7B-Instruct-v0.2" #473

Open
donjing opened this issue May 17, 2024 · 1 comment
donjing commented May 17, 2024

System Info

lorax_version: "a7e8175"
Python 3.10.8
Platform: ml.g5.16xlarge (AWS)

When deploying the Docker container with source "s3" and model_id "mistralai/Mistral-7B-Instruct-v0.2" (lorax-launcher --port 8080 --source "s3"), the launcher fails with the following error message:

2024-05-14T20:25:15.424-07:00
huggingface_hub.utils._errors.EntryNotFoundError: No .safetensors weights found for model mistralai/Mistral-7B-Instruct-v0.2

2024-05-14T20:25:15.424-07:00
Error: DownloadError
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 123, in download_weights
    _download_weights(model_id, revision, extension, auto_convert, source, api_token)
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/weights.py", line 447, in download_weights
    model_source.weight_files()
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/s3.py", line 222, in weight_files
    return weight_files_s3(self.bucket, self.model_id, self.revision, extension)
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/s3.py", line 156, in weight_files_s3
    pt_filenames = weight_s3_files(bucket, model_id, extension=".bin")
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/sources/s3.py", line 86, in weight_s3_files
    raise EntryNotFoundError(

Based on the error message above, the model_id (mistralai/Mistral-7B-Instruct-v0.2) was passed in correctly. However, the top-level folder name of the Mistral model is models--mistralai--Mistral-7B-Instruct-v0.2. Thus, the root cause of this bug is in the weight_s3_files function (link) below:

def weight_s3_files(bucket: Any, model_id: str, extension: str = ".safetensors") -> List[str]:
    """Get the weights filenames from s3"""
    model_files = bucket.objects.filter(Prefix=model_id)
    filenames = [f.key.removeprefix(model_id).lstrip("/") for f in model_files if f.key.endswith(extension)]
    if not filenames:
        raise EntryNotFoundError(
            f"No {extension} weights found for model {model_id}",
            None,
        )
    return filenames

In the line model_files = bucket.objects.filter(Prefix=model_id), model_files comes back empty because the prefix built from model_id (mistralai/Mistral-7B-Instruct-v0.2) doesn't match any key under the folder models--mistralai--Mistral-7B-Instruct-v0.2.
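The mismatch can be reproduced without S3 at all, since prefix filtering is a plain "key starts with prefix" comparison. This is a minimal sketch: the object keys are hypothetical examples that follow the folder name given in this report, not a listing from a real bucket.

```python
# Hypothetical object keys, laid out under the folder name described in this report.
keys = [
    "models--mistralai--Mistral-7B-Instruct-v0.2/model-00001-of-00003.safetensors",
    "models--mistralai--Mistral-7B-Instruct-v0.2/config.json",
]

model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# Emulate Prefix=model_id: keep only keys that start with the raw model_id.
matched = [k for k in keys if k.startswith(model_id)]

print(matched)  # []
```

No key starts with "mistralai/", so the list is empty and the EntryNotFoundError branch is taken.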

A possible fix is to convert model_id to the folder name, as the get_s3_model_local_dir function (link) does, before filtering the S3 bucket.
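The proposed fix could be sketched roughly as follows. This is an illustration under assumptions, not the maintainers' patch: model_id_to_s3_prefix is a hypothetical helper whose mapping is inferred from the single folder name in this report, FileNotFoundError stands in for huggingface_hub's EntryNotFoundError so the snippet has no third-party dependencies, and FakeBucket is a tiny stand-in for the bucket object so the logic can run without AWS access.

```python
from dataclasses import dataclass
from typing import Any, List


def model_id_to_s3_prefix(model_id: str) -> str:
    """Hypothetical helper: map 'org/name' to 'models--org--name'."""
    return "models--" + model_id.replace("/", "--")


def weight_s3_files(bucket: Any, model_id: str, extension: str = ".safetensors") -> List[str]:
    """Get the weight filenames from S3, filtering by the converted folder name."""
    prefix = model_id_to_s3_prefix(model_id)
    model_files = bucket.objects.filter(Prefix=prefix)
    filenames = [
        f.key.removeprefix(prefix).lstrip("/")
        for f in model_files
        if f.key.endswith(extension)
    ]
    if not filenames:
        # Stand-in for EntryNotFoundError in the original function.
        raise FileNotFoundError(f"No {extension} weights found for model {model_id}")
    return filenames


# --- Minimal fake bucket, for illustration only ---
@dataclass
class FakeObject:
    key: str


class FakeObjects:
    def __init__(self, keys: List[str]) -> None:
        self._keys = keys

    def filter(self, Prefix: str) -> List[FakeObject]:
        # Same semantics as S3 prefix filtering: key must start with Prefix.
        return [FakeObject(k) for k in self._keys if k.startswith(Prefix)]


class FakeBucket:
    def __init__(self, keys: List[str]) -> None:
        self.objects = FakeObjects(keys)


bucket = FakeBucket([
    "models--mistralai--Mistral-7B-Instruct-v0.2/model-00001-of-00003.safetensors",
    "models--mistralai--Mistral-7B-Instruct-v0.2/config.json",
])
files = weight_s3_files(bucket, "mistralai/Mistral-7B-Instruct-v0.2")
print(files)  # ['model-00001-of-00003.safetensors']
```

Any code that later downloads these files would presumably need the same prefix conversion, so that the returned relative filenames are joined back onto the right folder.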

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Dockerfile:

ARG VERSION
FROM ghcr.io/predibase/lorax:$VERSION

COPY sagemaker_entrypoint.sh entrypoint.sh
RUN chmod +x entrypoint.sh

ENTRYPOINT ["./entrypoint.sh"]

sagemaker_entrypoint.sh:

#!/bin/bash

if [[ -z "${HF_MODEL_ID}" ]]; then
  echo "HF_MODEL_ID must be set"
  exit 1
fi
export MODEL_ID="${HF_MODEL_ID}"

if [[ -n "${HF_MODEL_REVISION}" ]]; then
  export REVISION="${HF_MODEL_REVISION}"
fi

if [[ -n "${SM_NUM_GPUS}" ]]; then
  export NUM_SHARD="${SM_NUM_GPUS}"
fi

if [[ -n "${HF_MODEL_QUANTIZE}" ]]; then
  export QUANTIZE="${HF_MODEL_QUANTIZE}"
fi

if [[ -n "${HF_MODEL_TRUST_REMOTE_CODE}" ]]; then
  export TRUST_REMOTE_CODE="${HF_MODEL_TRUST_REMOTE_CODE}"
fi

if [[ -z "${ADAPTER_BUCKET}" ]]; then
  echo "Warning: ADAPTER_BUCKET not set. Only able to load local or HuggingFace Hub models."
else
  export PREDIBASE_MODEL_BUCKET="${ADAPTER_BUCKET}"
fi

lorax-launcher --port 8080 --source "s3"

Expected behavior

lorax-launcher should be able to filter the S3 bucket for the Mistral base model and find its .safetensors files.

@tgaddair tgaddair added the bug Something isn't working label May 23, 2024
@magdyksaleh magdyksaleh self-assigned this May 23, 2024
magdyksaleh (Collaborator) commented

Will fix
