Adding image decoder to CoCa #467

iejMac · 2023-03-15T02:16:23Z

No description provided.

iejMac · 2023-03-15T16:19:44Z

OK, the current state is the bare-minimum version that gets things roughly working. That means:

We tokenize the image using VQGAN (I think this is correct, but I still need to write some decoding code to check that we're tokenizing correctly)
We create an image decoder transformer, which is just like the text decoder transformer, and predict the next image token autoregressively
We calculate the loss
The code is as decent as I could make it in one sitting. It still needs improvement
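
For readers following along, the next-token objective described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code in this PR: the function name `next_token_loss` and the array shapes are assumptions, and the actual implementation works on PyTorch tensors inside the CoCa model.

```python
import numpy as np

def next_token_loss(logits, tokens):
    """Mean cross-entropy for predicting token t+1 from position t.

    logits: float array of shape (batch, seq_len, vocab), decoder outputs.
    tokens: int array of shape (batch, seq_len), e.g. VQGAN codebook ids.
    Both the name and the shapes are assumptions made for this sketch.
    """
    # Position t predicts token t+1, so drop the last logit and the first token.
    pred = logits[:, :-1, :]
    target = tokens[:, 1:]
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = pred - pred.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability assigned to each target token.
    picked = np.take_along_axis(log_probs, target[..., None], axis=-1).squeeze(-1)
    return float(-picked.mean())
```

With all-zero logits the prediction is uniform, so the loss equals the log of the vocabulary size, which is a handy sanity check early in training.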

iejMac · 2023-03-18T00:55:07Z

iejMac · 2023-03-18T21:44:31Z

TODO:

CoCa generation code should be modality-agnostic: it should be able to generate images or text based on the shape (or parameters) of the input
Create some start_of_image token !!!
Big cleanup. Can we avoid adding a dependency on taming-transformers and omegaconf?
Config cleanup + update old CoCa configs
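
One possible way to handle the start_of_image item is to reserve the first id outside the VQGAN codebook and prepend it to every image token sequence. The sketch below assumes that layout. `add_start_of_image` is a hypothetical helper, not something in this PR, and `n_embed` is taken to be the codebook size read from the VQGAN config.

```python
import numpy as np

def add_start_of_image(image_tokens, n_embed):
    """Prepend a start-of-image id to each sequence of codebook ids.

    image_tokens: int array of shape (batch, seq_len) with ids in [0, n_embed).
    n_embed: codebook size; the id n_embed itself is assumed to be unused by
    the codebook, so it is reserved here as the start-of-image token.
    """
    batch = image_tokens.shape[0]
    soi = np.full((batch, 1), n_embed, dtype=image_tokens.dtype)
    return np.concatenate([soi, image_tokens], axis=1)
```

Under this assumption the image decoder's embedding table and output layer would need n_embed + 1 entries rather than n_embed.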

Train something at B/32 scale
Drop out the text conditioning 10% of the time, as suggested by Katherine (either put nothing in cross-attention or use some learned sequence)
Axial positional embeddings, as suggested by lucidrains
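
The text-conditioning dropout item could look roughly like the sketch below, which swaps a sample's text embeddings for a "null" sequence with probability 0.1. Everything here is an assumption for illustration: the helper name `drop_text_conditioning`, the shapes, and the choice of a null sequence (zeros or a learned parameter) are not taken from this PR.

```python
import numpy as np

def drop_text_conditioning(text_embs, null_emb, drop_prob=0.1, rng=None):
    """Randomly replace per-sample text conditioning with a null sequence.

    text_embs: float array of shape (batch, seq_len, dim) fed to cross-attention.
    null_emb:  float array of shape (seq_len, dim); zeros or a learned sequence.
    Returns the mixed embeddings and the boolean mask of dropped samples.
    """
    rng = np.random.default_rng() if rng is None else rng
    # One independent drop decision per sample in the batch.
    drop = rng.random(text_embs.shape[0]) < drop_prob
    # Broadcast the per-sample mask over the sequence and feature axes.
    out = np.where(drop[:, None, None], null_emb[None, :, :], text_embs)
    return out, drop
```

Training with the conditioning dropped part of the time is also what makes classifier-free guidance possible at sampling time, since the model learns both conditional and unconditional predictions.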

iejMac · 2023-04-02T17:24:22Z

iejMac added 5 commits March 15, 2023 02:15

Adding image decoder to CoCa

055eaf7

tokenizer in CoCa

7836c44

progress in train.py - diff preproc

bb677a5

update loss

44eab43

ok it trains

b1fbef2

iejMac added 4 commits March 16, 2023 06:21

add decode method

0bb8f37

image-generation-lsos-weight

99bfb1c

grad checkpointing + freeze

44631ba

save progress

bd90187

iejMac marked this pull request as draft March 17, 2023 03:24

tradeoff VQGAN memory usage for samples/s + get n_embed from config

b9ffe5e

add method to tok

1b92112