
Nice. #6

Open
arthurwolf opened this issue Mar 5, 2024 · 5 comments

Comments


arthurwolf commented Mar 5, 2024

Really cool project.

I'm working on something similar (structurally at least), a manga-to-anime pipeline.
It involves a lot of different steps/models, similar to this project:

  • Pre-processing (alignment, upscaling, coloring).
  • Separating pages into panels.
  • Ordering the panels into the right reading order (this took far more effort than I expected...).
  • Segmentation (using segment-anything).
  • Extracting bubbles, bubble tails and their vectors, faces, bodies, and backgrounds. Most of that required training custom models.
  • Assigning a character identity to each face/body.
  • Making a naive association between faces and bubbles.
  • Reading the text of the bubbles.
  • Feeding all that data to GPT-4V and asking it to "read" each panel: I tell it what happened in previous panels, which bubble is associated with which face, etc., and ask it to "understand" what is happening in the panel and to "deduce" associations between the items, the tone of voice, and so on. I tried "just" asking GPT-4V to read manga pages without all the steps above, and it was terrible at it. But with all the provided info (which easily produces 10k-token prompts, just for the text), it gets much better. It's sort of "pre-chewing" the work for it.
  • That's where I am now; the next step is generating voice (what I'm working on at the moment, with bark/whisper/other models), then sound effects, then animation and special effects, and finally assembling all of that into video.
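The panel-ordering step above can be sketched as a geometric sort, assuming panels arrive as bounding boxes: cluster panels into rows by vertical overlap, read rows top-to-bottom, and read panels right-to-left within a row (manga convention). This is a minimal illustration, not the author's actual implementation; real layouts with overlapping or diagonal panels are what make this step so much harder than it looks.

```python
# Naive manga panel ordering sketch. Boxes are (x, y, w, h) tuples.
# Panels whose vertical spans overlap substantially are treated as one
# row; rows read top-to-bottom, panels within a row read right-to-left.

def order_panels(boxes):
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):  # scan by top edge
        for row in rows:
            ax, ay, aw, ah = row[0]  # row anchor box
            x, y, w, h = box
            overlap = min(ay + ah, y + h) - max(ay, y)
            if overlap > 0.5 * min(ah, h):
                row.append(box)
                break
        else:
            rows.append([box])
    ordered = []
    for row in rows:
        # manga reads right-to-left: sort by right edge, descending
        ordered.extend(sorted(row, key=lambda b: -(b[0] + b[2])))
    return ordered
```

A fixed overlap threshold like the 0.5 used here breaks down on irregular layouts, which is where most of the real effort goes.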
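The "naive association between faces and bubbles" mentioned above could look something like this: assign each bubble to the nearest face by centroid distance. The function names and box format are illustrative assumptions; the described pipeline also extracts bubble tail vectors, which would give a much stronger signal than plain distance.

```python
# Naive bubble-to-face association sketch: nearest face by centroid
# distance. Boxes are (x, y, w, h) tuples; purely illustrative.
import math

def centroid(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def associate_bubbles(bubbles, faces):
    """Return {bubble_index: face_index} mapping by nearest centroid."""
    mapping = {}
    for bi, bubble in enumerate(bubbles):
        b = centroid(bubble)
        mapping[bi] = min(
            range(len(faces)),
            key=lambda fi: math.dist(b, centroid(faces[fi])),
        )
    return mapping
```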

I'll be looking closer into your project, in particular how it's organized, thanks a lot for sharing.
I'd be curious if you have any insights on how you'd do manga reading if you had to.

Cheers!

(Attachments: masked, panel, prompt.json, prompt.txt, reading.json, response.txt, result.json, 6253, 6254.)

arthurwolf (Author) attached: page-3461-ids, page-3462-ids, page-3463-ids, page-3464-ids, page-3465-ids.



noco-ai commented Mar 10, 2024

Hello! Your approach looks good to me, and it sounds like your hard work is paying off. If I were working on this particular project, I would experiment with fine-tuning LLaVA once you have a solid dataset, to see if it gives better results than OpenAI's models. I have yet to see anyone share a fine-tune of LLaVA for a specific task, so I am curious how well it would work. If you are posting your progress anywhere, please share the link; I am interested to see it in action once you have it all working.
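For context on the fine-tuning suggestion, LLaVA's training data is typically expressed as an image path plus a human/gpt conversation in JSON, with an `<image>` token marking where the image is injected. The panel path and annotation text below are made-up placeholders; only the overall field layout follows the LLaVA conversation format.

```python
import json

# One hypothetical training example in the LLaVA conversation format:
# an image reference plus alternating human/gpt turns. The "<image>"
# token marks where the panel is spliced into the prompt.
example = {
    "id": "panel-0001",
    "image": "panels/panel-0001.png",
    "conversations": [
        {
            "from": "human",
            "value": "<image>\nDescribe what happens in this manga panel.",
        },
        {
            "from": "gpt",
            "value": "Two characters argue; the speaker on the left shouts the first bubble.",
        },
    ],
}

print(json.dumps(example, indent=2))
```

A dataset is then just a JSON list of such records, one per annotated panel.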

arthurwolf (Author) commented:

Thanks for the feedback.

I'll soon have about two comic books' worth of data, which I think would be enough to start fine-tuning LLaVA, but I have two issues there: 1. this is all very new, and there are no "easy guides" to fine-tuning most things, LLaVA even less so; it's all very cryptic and assumes a high technical level. And 2. my assumption is that even fine-tuning LLaVA would require a lot more compute than I can afford.

I've tried an alternative: getting my data into the LLaVA training dataset for the next LLaVA version. I've opened GitHub issues and sent some emails, but so far no answer. I hope I can make it happen; I think it'd benefit not just me, but the model itself too.

About posting progress: I'm considering starting a YouTube channel with some updates; I'll post about it here if/when that happens.

Cheers, and thanks again.
