feat: add vision #412

Open: wants to merge 27 commits into main


angelala3252 (Collaborator) commented Jul 20, 2023

What kind of change does this PR introduce?
This PR tests several vision LLMs and compares their performance with that of Transformers Agent, with the goal of creating a ReplayStrategy.

Summary
Responds to issue #353

Checklist

  • My code follows the style guidelines of OpenAdapt
  • I have performed a self-review of my code
  • If applicable, I have added tests to prove my fix is functional/effective
  • I have linted my code locally prior to submission
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (e.g. README.md, requirements.txt)
  • New and existing unit tests pass locally with my changes

How can your code be run and tested?

First, run

poetry update
poetry install

Then, to set up a Modal token (needed to run the Modal files), run

modal token new

Currently, I am only testing with local images from my own machine, so to test this yourself, change the image path at the bottom of each file to an image on your computer (inside the OpenAdapt directory), and adjust the prompts to fit that image.
Finally, run any of the model files with: modal run <model_file>.py

Other information

The LaVIN model cannot be run at the moment because I encountered an error when running its demo. I have opened an issue in the LaVIN GitHub repo: luogen1996/LaVIN#17

angelala3252 marked this pull request as draft July 20, 2023 18:30
angelala3252 (Collaborator, Author) commented:

I am currently building a vision dataset from the screenshots in a recording and their associated window states.

angelala3252 (Collaborator, Author) commented Aug 21, 2023

Example of data:
Image: 210.jpg (screenshot)

The corresponding window state entry in the JSON file: {"id": 210, "window_state": {"title": "Sticky Notes", "left": 150, "top": 0, "width": 1298, "height": 1010}}

So far I have about 200 image/window-state pairs, and I can keep adding to the dataset if needed.
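A minimal sketch of how these pairs could be loaded for evaluation. It assumes (hypothetically) that the JSON file is a list of entries shaped like the one above and that each screenshot is saved as <id>.jpg in one directory, as with 210.jpg for id 210. load_pairs and Sample are illustrative names, not code from this PR.

```python
import json
from pathlib import Path
from typing import Iterator, NamedTuple


class Sample(NamedTuple):
    """One screenshot paired with its window bounds (screen coordinates)."""
    image_path: Path
    title: str
    left: int
    top: int
    right: int
    bottom: int


def load_pairs(json_path: Path, image_dir: Path, ext: str = ".jpg") -> Iterator[Sample]:
    """Yield a Sample for every JSON entry that has a matching screenshot.

    Hypothetical layout: json_path holds a list of
    {"id": ..., "window_state": {...}} entries, and the screenshot for an
    entry is stored as <id><ext> in image_dir (e.g. 210.jpg for id 210).
    """
    entries = json.loads(json_path.read_text(encoding="utf-8"))
    for entry in entries:
        image_path = image_dir / f"{entry['id']}{ext}"
        if not image_path.exists():
            continue  # no screenshot recorded for this id; skip it
        ws = entry["window_state"]
        # window_state stores width/height; convert to right/bottom edges
        yield Sample(
            image_path=image_path,
            title=ws["title"],
            left=ws["left"],
            top=ws["top"],
            right=ws["left"] + ws["width"],
            bottom=ws["top"] + ws["height"],
        )
```

For the 210.jpg entry above, this yields a Sample with left=150, top=0, right=1448 (150 + 1298) and bottom=1010. Converting to right/bottom edges up front matches the rectangle layout used by the meta and data fields, so window bounds from both formats can be compared directly.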

angelala3252 (Collaborator, Author) commented:

Example of data including the data and meta fields:
Image: 1 (screenshot)

Window state:
[{"id": 1, "window_state": {"title": "Sticky Notes", "left": 1442, "top": 0, "width": 400, "height": 400, "meta": {"class_name": "ApplicationFrameWindow", "friendly_class_name": "Dialog", "texts": ["Sticky Notes"], "control_id": 0, "rectangle": {"left": 1442, "top": 0, "right": 1842, "bottom": 400}, "is_visible": true, "is_enabled": true, "control_count": 1, "is_keyboard_focusable": true, "has_keyboard_focus": false, "automation_id": ""}, "data": {"class_name": "ApplicationFrameWindow", "friendly_class_name": "Dialog", "texts": ["Sticky Notes"], "control_id": 0, "rectangle": {"left": 1442, "top": 0, "right": 1842, "bottom": 400}, "is_visible": true, "is_enabled": true, "control_count": 1, "children": [{"class_name": "Windows.UI.Core.AppWindow", "friendly_class_name": "Pane", "texts": ["Title"], "control_id": 0, "rectangle": {"left": 1451, "top": 1, "right": 1833, "bottom": 391}, "is_visible": true, "is_enabled": true, "control_count": 1, "children": [{"class_name": "Windows.UI.Input.InputSite.WindowClass", "friendly_class_name": "Pane", "texts": [""], "control_id": 0, "rectangle": {"left": 0, "top": 0, "right": 0, "bottom": 0}, "is_visible": true, "is_enabled": true, "control_count": 1, "children": [{"class_name": "NamedContainerAutomationPeer", "friendly_class_name": "GroupBox", "texts": ["Yellow note window. 
"], "control_id": null, "rectangle": {"left": 1451, "top": 1, "right": 1833, "bottom": 391}, "is_visible": true, "is_enabled": true, "control_count": 11, "children": [{"class_name": "Button", "friendly_class_name": "Button", "texts": ["New note"], "control_id": null, "rectangle": {"left": 1451, "top": 1, "right": 1491, "bottom": 41}, "is_visible": true, "is_enabled": true, "control_count": 0}, {"class_name": "Button", "friendly_class_name": "Button", "texts": ["Menu"], "control_id": null, "rectangle": {"left": 1753, "top": 1, "right": 1793, "bottom": 41}, "is_visible": true, "is_enabled": true, "control_count": 0}, {"class_name": "Button", "friendly_class_name": "Button", "texts": ["Close note"], "control_id": null, "rectangle": {"left": 1793, "top": 1, "right": 1833, "bottom": 41}, "is_visible": true, "is_enabled": true, "control_count": 0}, {"class_name": "NamedContainerAutomationPeer", "friendly_class_name": "GroupBox", "texts": ["Note Editor"], "control_id": null, "rectangle": {"left": 1451, "top": 41, "right": 1833, "bottom": 342}, "is_visible": true, "is_enabled": true, "control_count": 1, "children": [{"class_name": "RichEditBox", "friendly_class_name": "Edit", "texts": [""], "control_id": null, "rectangle": {"left": 1451, "top": 41, "right": 1833, "bottom": 342}, "is_visible": true, "is_enabled": true, "control_count": 2, "children": [{"class_name": "ScrollViewer", "friendly_class_name": "Pane", "texts": [""], "control_id": null, "rectangle": {"left": 0, "top": 81, "right": 382, "bottom": 382}, "is_visible": true, "is_enabled": true, "control_count": 0}, {"class_name": "TextBlock", "friendly_class_name": "Static", "texts": ["Take a note..."], "control_id": null, "rectangle": {"left": 15, "top": 94, "right": 115, "bottom": 118}, "is_visible": true, "is_enabled": true, "control_count": 0}]}]}, {"class_name": "InkCanvas", "friendly_class_name": "Pane", "texts": [""], "control_id": null, "rectangle": {"left": 1451, "top": 41, "right": 1833, "bottom": 342}, 
"is_visible": true, "is_enabled": true, "control_count": 0}, {"class_name": "ToggleButton", "friendly_class_name": "Button", "texts": ["Bold"], "control_id": null, "rectangle": {"left": 1456, "top": 348, "right": 1494, "bottom": 386}, "is_visible": true, "is_enabled": true, "control_count": 0}, {"class_name": "ToggleButton", "friendly_class_name": "Button", "texts": ["Italic"], "control_id": null, "rectangle": {"left": 1509, "top": 348, "right": 1547, "bottom": 386}, "is_visible": true, "is_enabled": true, "control_count": 0}, {"class_name": "ToggleButton", "friendly_class_name": "Button", "texts": ["Underline"], "control_id": null, "rectangle": {"left": 1562, "top": 348, "right": 1600, "bottom": 386}, "is_visible": true, "is_enabled": true, "control_count": 0}, {"class_name": "ToggleButton", "friendly_class_name": "Button", "texts": ["Strikethrough"], "control_id": null, "rectangle": {"left": 1615, "top": 348, "right": 1652, "bottom": 386}, "is_visible": true, "is_enabled": true, "control_count": 1, "children": [{"class_name": "Image", "friendly_class_name": "Image", "texts": [""], "control_id": null, "rectangle": {"left": 1622, "top": 356, "right": 1645, "bottom": 379}, "is_visible": true, "is_enabled": true, "control_count": 0}]}, {"class_name": "ToggleButton", "friendly_class_name": "Button", "texts": ["Toggle Bullets"], "control_id": null, "rectangle": {"left": 1668, "top": 348, "right": 1706, "bottom": 386}, "is_visible": true, "is_enabled": true, "control_count": 0}, {"class_name": "Button", "friendly_class_name": "Button", "texts": ["Add Image"], "control_id": null, "rectangle": {"left": 1721, "top": 348, "right": 1759, "bottom": 386}, "is_visible": true, "is_enabled": true, "control_count": 0}]}]}]}, {"class_name": "ApplicationFrameInputSinkWindow", "friendly_class_name": "Pane", "texts": [""], "control_id": 0, "rectangle": {"left": 1451, "top": 41, "right": 1833, "bottom": 391}, "is_visible": true, "is_enabled": true, "control_count": 0}]}, "window_id": 
0}}]
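The data field above is a nested accessibility tree: each control has a friendly_class_name, texts, a rectangle and optional children, while meta repeats most of the top-level window's fields without the children. The sketch below, which assumes only the field names visible in this example, flattens that tree into labelled bounding boxes (for instance as candidate targets for a vision model) and checks that window_state's left/top/width/height agree with meta's rectangle. iter_controls and window_state_matches_meta are hypothetical helpers, not part of OpenAdapt. Note that some nested rectangles (e.g. the ScrollViewer at left 0, top 81 and the TextBlock at left 15, top 94) appear to use a different origin than their screen-space parents; this sketch reports rectangles as recorded and does not normalize them.

```python
from typing import Any, Dict, Iterator, Tuple

# (left, top, right, bottom), the layout used by the "rectangle" fields
Box = Tuple[int, int, int, int]


def iter_controls(node: Dict[str, Any], depth: int = 0) -> Iterator[Tuple[int, str, str, Box]]:
    """Depth-first walk over a window's "data" tree.

    Yields (depth, friendly_class_name, label, box) for each visible control
    whose rectangle has a positive area. Children of skipped controls are
    still visited, so zero-size wrapper panes do not hide their contents.
    """
    rect = node.get("rectangle", {})
    box: Box = (rect.get("left", 0), rect.get("top", 0),
                rect.get("right", 0), rect.get("bottom", 0))
    has_area = box[2] > box[0] and box[3] > box[1]
    if node.get("is_visible", False) and has_area:
        # Join the non-empty entries of "texts", trimming stray whitespace
        label = " ".join(t.strip() for t in node.get("texts", []) if t.strip())
        yield depth, node.get("friendly_class_name", ""), label, box
    for child in node.get("children", []):
        yield from iter_controls(child, depth + 1)


def window_state_matches_meta(window_state: Dict[str, Any]) -> bool:
    """True if left/top/width/height agree with meta["rectangle"]."""
    rect = window_state["meta"]["rectangle"]
    return (
        rect["left"] == window_state["left"]
        and rect["top"] == window_state["top"]
        and rect["right"] == window_state["left"] + window_state["width"]
        and rect["bottom"] == window_state["top"] + window_state["height"]
    )
```

For the Sticky Notes entry above, window_state_matches_meta returns True (1442 + 400 = 1842 for right, 0 + 400 = 400 for bottom), and the walk skips the zero-size InputSite pane while still reaching the buttons nested beneath it.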

angelala3252 (Collaborator, Author) commented:

Dataset of images and window states here

angelala3252 marked this pull request as ready for review August 31, 2023 11:40