Path Handling Issue with Gesture Recognizer Model in Mediapipe #5400

Closed
justsonghua opened this issue May 13, 2024 · 4 comments
Assignees: kuaashish
Labels: os:windows (MediaPipe issues on Windows), platform:python (MediaPipe Python issues), task:gesture recognition (hand gesture recognition)

Comments


justsonghua commented May 13, 2024

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

Yes

OS Platform and Distribution

Windows 11 23H2 (22631.3527)

MediaPipe Tasks SDK version

No response

Task name (e.g. Image classification, Gesture recognition etc.)

Gesture recognition

Programming Language and version (e.g. C++, Python, Java)

Python

Describe the actual behavior

Mediapipe tries to load gesture_recognizer.task from the conda virtual environment's folder instead of the specified path.

Describe the expected behaviour

Mediapipe should correctly load and use the gesture recognizer model from the specified path, regardless of special characters in the directory name.

Standalone code/steps you may have used to try to get what you need

import cv2
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import time

import os
from pathlib import Path


# Set model directory and change working directory
model_dir = Path(r"D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos")
os.chdir(model_dir)

# Set model path
print("Current working directory:", os.getcwd())
model_path = Path("gesture_recognizer.task")

# Get absolute path and check if the file exists
absolute_model_path = os.path.abspath(model_path)

if not os.path.exists(absolute_model_path):
    print("Model file does not exist:", absolute_model_path)
else:
    print("Model file found:", absolute_model_path)


# Initialize hand detection
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
hands = mp_hands.Hands(
    static_image_mode=False,
    max_num_hands=2,
    min_detection_confidence=0.75,
    min_tracking_confidence=0.5
)

# Define gesture recognition callback
def gesture_result_callback(result, image, timestamp):
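    # Note: in LIVE_STREAM mode MediaPipe passes `image` as an mp.Image, so
    # the OpenCV drawing below would need a writable copy of image.numpy_view().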
    if result is not None and result.gestures:
        print('Gesture recognized:', result.gestures)
        cv2.putText(image, f'Gesture: {result.gestures}', (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2, cv2.LINE_AA)


# Print the absolute_model_path
print("Using model path:", absolute_model_path)


# Initialize gesture recognizer
base_options = python.BaseOptions(model_asset_path=absolute_model_path)
options = vision.GestureRecognizerOptions(base_options=base_options, running_mode=vision.RunningMode.LIVE_STREAM, result_callback=gesture_result_callback)
recognizer = vision.GestureRecognizer.create_from_options(options)


# Initialize webcam
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("Ignoring empty frame")
        break

    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame_rgb = cv2.flip(frame_rgb, 1)
    results = hands.process(frame_rgb)

    # Note: recognize_async expects an mp.Image; passing the raw NumPy frame
    # here would raise a TypeError once model loading succeeds.
    recognizer.recognize_async(frame_rgb, int(time.time() * 1000))

    frame = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2BGR)

    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_drawing.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)

    cv2.imshow("MediaPipe Hands and Gesture Recognition", frame)

    if cv2.waitKey(5) & 0xFF == 27:
        break

cap.release()
cv2.destroyAllWindows()

Other info / Complete Logs

Current working directory: D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos
Model file found: D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task
Using model path: D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task
Traceback (most recent call last):
  File "D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\demo_002.py", line 61, in <module>
    recognizer = vision.GestureRecognizer.create_from_options(options)
  File "C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages\mediapipe\tasks\python\vision\gesture_recognizer.py", line 340, in create_from_options
    return cls(
  File "C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages\mediapipe\tasks\python\vision\core\base_vision_task_api.py", line 70, in __init__
    self._runner = _TaskRunner.create(graph_config, packet_callback)
RuntimeError: Unable to open file at C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages/D:\%DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task, errno=22
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

I suspect the issue might be due to the % character in my file path, but I don't understand why the problem persists even after I set the absolute path. The path remains unchanged until it's passed into python.BaseOptions(); after that, it is suddenly resolved relative to the conda virtual environment's folder.
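A workaround that may be worth trying here (a sketch, not verified on this setup): read the model file into memory and hand MediaPipe the raw bytes via BaseOptions.model_asset_buffer, so the native layer never has to resolve the Windows path at all.

# Sketch, assuming absolute_model_path and the imports from the script above;
# model_asset_buffer takes the model contents as raw bytes.
with open(absolute_model_path, "rb") as f:
    model_data = f.read()

base_options = python.BaseOptions(model_asset_buffer=model_data)
options = vision.GestureRecognizerOptions(
    base_options=base_options,
    running_mode=vision.RunningMode.LIVE_STREAM,
    result_callback=gesture_result_callback,
)
recognizer = vision.GestureRecognizer.create_from_options(options)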

kuaashish self-assigned this issue and unassigned ayushgdev on May 14, 2024
kuaashish added the os:windows, platform:python, and task:gesture recognition labels on May 14, 2024
justsonghua (Author) commented


I tried creating a new path and moving the file into it. As you can see, I removed the % from the file path:

Current working directory: D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos
Model file found: D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task
Using model path: D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task

But I'm still getting the same error message:

Traceback (most recent call last):
  File "D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\demo_002.py", line 61, in <module>
    recognizer = vision.GestureRecognizer.create_from_options(options)
  File "C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages\mediapipe\tasks\python\vision\gesture_recognizer.py", line 340, in create_from_options
    return cls(
  File "C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages\mediapipe\tasks\python\vision\core\base_vision_task_api.py", line 70, in __init__
    self._runner = _TaskRunner.create(graph_config, packet_callback)
RuntimeError: Unable to open file at C:\_CodeEnv\miniconda3\envs\hiwi.mediapipe\lib\site-packages/D:\DokiDoki\M.Sc._EAAS\HiWi.Job\Projects\wode.demos\gesture_recognizer.task, errno=22
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.

It seems the issue isn't related to the % in the original file path after all, but I still don't know why it throws this error at runtime.
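One detail that may matter (an untested sketch): the traceback shows the site-packages directory being prepended to the absolute Windows path, which suggests the native layer is not recognizing "D:\..." as absolute. Normalizing the path to forward slashes before handing it to BaseOptions is one thing to try; the working script later in this thread also uses a forward-slash path.

from pathlib import Path

# Hypothetical fix: convert backslashes to forward slashes (e.g. "D:/DokiDoki/...")
# so the path survives MediaPipe's native path handling; absolute_model_path is
# the variable from the script above.
model_path_posix = Path(absolute_model_path).as_posix()
base_options = python.BaseOptions(model_asset_path=model_path_posix)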

kuaashish (Collaborator) commented

Hi @justsonghua,

It appears you are using our legacy hand solution, based on the provided code. This solution has been upgraded and is now part of the new Gesture Recognition Task API, and support for the legacy hand solution has ended. Please try the updated Python example for the new Task API available here. For a general overview, visit our overview page.

Apart from this, we cannot do much about this issue. If you encounter any problems with the new Task API, please report them here for further assistance.

Thank you!!

kuaashish added the stat:awaiting response label on May 16, 2024
justsonghua (Author) commented

# Created by Songhua at 14.May.2024

import cv2
import mediapipe as mp
from mediapipe.framework.formats import landmark_pb2
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
import time
import numpy as np

# Initialize Mediapipe modules
mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles

# Initialize gesture recognizer
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerResult = mp.tasks.vision.GestureRecognizerResult
VisionRunningMode = mp.tasks.vision.RunningMode

# Initialize variables
current_frame = None
gesture_text = "None"
current_result = None


# Function to update gesture text

# 1 Hand Only
def update_gesture_text(result: GestureRecognizerResult, output_image: mp.Image, timestamp_ms: int):
    global gesture_text, current_result
    if result is not None and result.gestures:
        gesture_text = result.gestures[0][0].category_name
    else:
        gesture_text = "None"
    current_result = result


# Function to display results on the frame
def display_result(frame):
    global gesture_text
    cv2.putText(frame, gesture_text, (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 1, cv2.LINE_AA)


# Function to draw bounding box on the frame
def draw_bounding_box(frame, result: GestureRecognizerResult):
    if result is not None and result.hand_landmarks:
        for hand_landmarks in result.hand_landmarks:
            x_coords = [landmark.x * frame.shape[1] for landmark in hand_landmarks]
            y_coords = [landmark.y * frame.shape[0] for landmark in hand_landmarks]
            x_min, x_max = int(min(x_coords)), int(max(x_coords))
            y_min, y_max = int(min(y_coords)), int(max(y_coords))
            cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)

# Configuration for gesture recognizer
model_path = 'D:/DokiDoki/M.Sc._EAAS/HiWi.Job/Projects/wode.demos/gesture_recognizer.task'
base_options = python.BaseOptions(model_asset_path=model_path)
options = vision.GestureRecognizerOptions(
    base_options=base_options,
    running_mode=VisionRunningMode.LIVE_STREAM,
    result_callback=update_gesture_text
)
recognizer = vision.GestureRecognizer.create_from_options(options)


# Initialize webcam

# for index in range(3):
#     cap = cv2.VideoCapture(index)
#     if cap.isOpened():
#         print(f"Camera index {index} is available")
#         cap.release()

camera_index = 2  # Initialize webcam index
cap = cv2.VideoCapture(camera_index)

timestamp = 0

while cap.isOpened():
    # Capture frame-by-frame
    ret, frame = cap.read()

    if not ret:
        print("Ignoring empty frame")
        break

    timestamp += 1
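    # recognize_async requires monotonically increasing timestamps (nominally
    # in milliseconds); a simple per-frame counter satisfies the monotonic rule.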

    # Flip the frame horizontally for a mirrored view
    frame = cv2.flip(frame, 1)

    # Convert the frame to mp.Image format
    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame)

    # Send live image data to perform gesture recognition
    recognizer.recognize_async(mp_image, timestamp)

    # Display the frame with recognition result
    display_result(frame)

    # Draw bounding box on the frame
    draw_bounding_box(frame, current_result)

    cv2.imshow("MediaPipe Model", frame)

    # Exit on ESC key
    if cv2.waitKey(5) & 0xFF == 27:
        break

# Release the webcam resource
cap.release()
cv2.destroyAllWindows()

So, with the new API, it works now.

But there's a new problem: it only recognizes one hand. Can this model (gesture_recognizer.task) recognize only a single hand?
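For reference, GestureRecognizerOptions accepts a num_hands option, which defaults to 1; that default would explain the single-hand behavior. A minimal sketch of requesting two hands, reusing base_options from the script above:

options = vision.GestureRecognizerOptions(
    base_options=base_options,
    running_mode=VisionRunningMode.LIVE_STREAM,
    num_hands=2,  # default is 1
    result_callback=update_gesture_text,
)
recognizer = vision.GestureRecognizer.create_from_options(options)

Note that the callback would then receive one entry per detected hand in result.gestures, so reading only result.gestures[0][0] would still show a single hand's label.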

The google-ml-butler bot removed the stat:awaiting response label on May 23, 2024
justsonghua (Author) commented


https://github.com/google-ai-edge/mediapipe-samples/blob/main/examples/gesture_recognizer/raspberry_pi/recognize.py

I found this demo, and my code can now recognize both hands.
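For anyone following along, a minimal sketch of a callback that labels every detected hand, assuming the Task API result layout where result.gestures holds one list of categories per hand:

def update_gesture_text(result: GestureRecognizerResult, output_image: mp.Image, timestamp_ms: int):
    global gesture_text, current_result
    # Take the top-scoring category for each detected hand, not just the first.
    names = [hand[0].category_name for hand in (result.gestures or [])]
    gesture_text = ", ".join(names) if names else "None"
    current_result = result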
