Adding image decoder to CoCa #467

Draft
wants to merge 11 commits into
base: main

Conversation

iejMac
Contributor

@iejMac iejMac commented Mar 15, 2023

No description provided.

@iejMac
Contributor Author

iejMac commented Mar 15, 2023

OK, the current state is the bare-minimum version that gets things roughly working. That means:

  • We tokenize the image using VQGAN (I think this is correct; I still need to write some decoding code to check that we're tokenizing correctly)
  • We create an image decoder transformer that is just like the text decoder transformer and predicts the next image token autoregressively
  • We calculate the loss
  • The code is as decent as I could make it in one sitting. It still needs improvement
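
To make the "predict the next image token and calculate the loss" step concrete, here is a minimal sketch in plain Python. It is not the code in this PR: the function name `next_token_loss` and the token layout (a start token at position 0 followed by VQGAN codebook indices) are assumptions made for illustration, and a real implementation would do the same thing on batched tensors.

```python
import math


def next_token_loss(logits, tokens):
    """Mean cross-entropy for predicting tokens[t + 1] from logits[t].

    logits: list of T score lists, each of length V (unnormalized).
    tokens: list of T + 1 integer ids. Assumed layout: position 0 is a
            start token and the rest are VQGAN codebook indices.
    """
    targets = tokens[1:]  # shift by one: position t predicts token t + 1
    assert len(logits) == len(targets)
    total = 0.0
    for scores, target in zip(logits, targets):
        # Numerically stable log-sum-exp over the vocabulary.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the correct next token.
        total += log_z - scores[target]
    return total / len(targets)
```

With uniform scores over a vocabulary of size V the loss is log(V), which is a handy sanity check for an untrained decoder.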

@iejMac iejMac marked this pull request as draft March 17, 2023 03:24
@iejMac
Contributor Author

iejMac commented Mar 18, 2023

TODO:

Code:

  • CoCa generation code should be modality-agnostic: it should be able to generate images and text based on the shape (or parameters) of the input
  • Create a start_of_image token !!!
  • BIG cleanup. Can we avoid depending on taming-transformers and omegaconf?
  • Config cleanup + update old CoCa configs
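
A rough sketch of the first two items, assuming the start_of_image id is simply the first id past the VQGAN codebook and that generation is driven by a caller-supplied step function. Both `add_start_token` and `generate` are hypothetical names, not functions from this PR, and greedy decoding is used only to keep the example short.

```python
def add_start_token(image_tokens, codebook_size):
    """Prepend a start_of_image id to a sequence of codebook indices."""
    # Assumption: the first id not used by the codebook is free.
    start_of_image = codebook_size
    return [start_of_image] + list(image_tokens)


def generate(step_fn, start_token, num_tokens):
    """Greedy autoregressive loop that is agnostic to the modality.

    step_fn(prefix) returns a list of scores over the vocabulary
    (hypothetical interface). The start token and the output length
    are the only things that differ between text and image generation.
    """
    seq = [start_token]
    for _ in range(num_tokens):
        scores = step_fn(seq)
        # Pick the highest-scoring id as the next token.
        seq.append(max(range(len(scores)), key=scores.__getitem__))
    return seq[1:]  # drop the start token
```

Under this layout the image decoder's embedding table and output head need `codebook_size + 1` entries, and an image would be generated by calling `generate` with the start_of_image id and the number of tokens in the VQGAN grid.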

Model:

  • Train something at B/32 scale
  • Drop out the text conditioning 10% of the time, as suggested by Katherine (either put nothing in cross-attention or use some learned sequence)
  • Axial positional embeddings, as suggested by lucidrains
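
The last two items could look roughly like the sketch below. The names `maybe_drop_text`, `axial_positions`, and `axial_embedding` are made up for illustration, and the summed row-plus-column form is only one common way to build axial positional embeddings (concatenation is another); nothing here is confirmed by the PR.

```python
import random


def maybe_drop_text(text_embs, null_embs, p_drop=0.1, rng=random):
    """Swap the text conditioning for a null sequence with prob. p_drop.

    null_embs stands in for either an empty sequence or a learned one.
    """
    return null_embs if rng.random() < p_drop else text_embs


def axial_positions(height, width):
    """Map each flattened token index to its (row, col) grid position."""
    return [(i // width, i % width) for i in range(height * width)]


def axial_embedding(row_table, col_table, height, width):
    """Sum one per-row and one per-column vector for every position.

    row_table has `height` vectors and col_table has `width` vectors,
    so only height + width vectors are stored instead of height * width.
    """
    return [
        [r + c for r, c in zip(row_table[row], col_table[col])]
        for row, col in axial_positions(height, width)
    ]
```

Dropping the conditioning per sample during training is what later allows classifier-free-guidance-style sampling, since the model has seen both conditioned and unconditioned inputs.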

@iejMac
Contributor Author

iejMac commented Apr 2, 2023

https://arxiv.org/abs/2303.13455
