
Nice. #6

Open
arthurwolf opened this issue Mar 5, 2024 · 5 comments

Comments


arthurwolf commented Mar 5, 2024

Really cool project.

I'm working on something similar (structurally at least), a manga-to-anime pipeline.
It involves a lot of different steps/models, similar to this project:

  • Pre-processing (alignment, upscaling, coloring).
  • Separating pages into panels.
  • Ordering the panels into the right reading order (this took far more effort than I expected...).
  • Segmentation (using segment-anything).
  • Extracting bubbles, bubble tails and their vectors, faces, bodies, and backgrounds. Most of that required training custom models.
  • Assigning a character identity to each face/body.
  • Making a naive association between faces and bubbles.
  • Reading the text of the bubbles.
  • Feeding all that data to GPT-4V and asking it to "read" each panel: I tell it what happened in previous panels, which bubble is associated with which face, etc., and ask it to "understand" what is happening in the panel and to "deduce" associations between the items, the tone of voice, and so on. I tried "just" asking GPT-4V to read manga pages without all the steps above, and it was terrible at it. But with all the provided info (which easily produces 10k-token prompts, just for the text), it gets much better. It's sort of "pre-chewing" the work for it.
  • That's where I am now; the next step is generating voice (what I'm working on at the moment, with bark/whisper/other models), then sound effects, then animation and special effects, and finally assembling all of that into video.
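The panel-ordering step above can be sketched as a geometric sort, assuming panels arrive as bounding boxes: cluster panels into rows by vertical overlap, read rows top-to-bottom, and read panels right-to-left within a row (manga convention). This is a minimal illustration, not the author's actual implementation; real layouts with overlapping or diagonal panels are what make this step so much harder than it looks.

```python
# Naive manga panel ordering sketch. Boxes are (x, y, w, h) tuples.
# Panels whose vertical spans overlap substantially are treated as one
# row; rows read top-to-bottom, panels within a row read right-to-left.

def order_panels(boxes):
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):  # scan by top edge
        for row in rows:
            ax, ay, aw, ah = row[0]  # row anchor box
            x, y, w, h = box
            overlap = min(ay + ah, y + h) - max(ay, y)
            if overlap > 0.5 * min(ah, h):
                row.append(box)
                break
        else:
            rows.append([box])
    ordered = []
    for row in rows:
        # manga reads right-to-left: sort by right edge, descending
        ordered.extend(sorted(row, key=lambda b: -(b[0] + b[2])))
    return ordered
```

A fixed overlap threshold like the 0.5 used here breaks down on irregular layouts, which is where most of the real effort goes.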
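The "naive association between faces and bubbles" mentioned above could look something like this: assign each bubble to the nearest face by centroid distance. The function names and box format are illustrative assumptions; the described pipeline also extracts bubble tail vectors, which would give a much stronger signal than plain distance.

```python
# Naive bubble-to-face association sketch: nearest face by centroid
# distance. Boxes are (x, y, w, h) tuples; purely illustrative.
import math

def centroid(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def associate_bubbles(bubbles, faces):
    """Return {bubble_index: face_index} mapping by nearest centroid."""
    mapping = {}
    for bi, bubble in enumerate(bubbles):
        b = centroid(bubble)
        mapping[bi] = min(
            range(len(faces)),
            key=lambda fi: math.dist(b, centroid(faces[fi])),
        )
    return mapping
```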

I'll be looking closer into your project, in particular how it's organized, thanks a lot for sharing.
I'd be curious if you have any insights on how you'd do manga reading if you had to.

Cheers!

(Attachments: masked, panel, prompt.json, prompt.txt, reading.json, response.txt, result.json, 6253, 6254.)

arthurwolf (Author) attached: page-3461-ids, page-3462-ids, page-3463-ids, page-3464-ids, page-3465-ids.



noco-ai commented Mar 10, 2024

Hello! Your approach looks good to me, and it sounds like your hard work is paying off. If I were working on this particular project, I would experiment with fine-tuning LLaVA once you have a solid dataset, to see if it gives better results than OpenAI's models. I have yet to see anyone share a fine-tune of LLaVA for a specific task, so I am curious how well it would work. If you are posting your progress anywhere, please share the link; I am interested to see it in action once you have it all working.
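For context on the fine-tuning suggestion, LLaVA's training data is typically expressed as an image path plus a human/gpt conversation in JSON, with an `<image>` token marking where the image is injected. The panel path and annotation text below are made-up placeholders; only the overall field layout follows the LLaVA conversation format.

```python
import json

# One hypothetical training example in the LLaVA conversation format:
# an image reference plus alternating human/gpt turns. The "<image>"
# token marks where the panel is spliced into the prompt.
example = {
    "id": "panel-0001",
    "image": "panels/panel-0001.png",
    "conversations": [
        {
            "from": "human",
            "value": "<image>\nDescribe what happens in this manga panel.",
        },
        {
            "from": "gpt",
            "value": "Two characters argue; the speaker on the left shouts the first bubble.",
        },
    ],
}

print(json.dumps(example, indent=2))
```

A dataset is then just a JSON list of such records, one per annotated panel.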

arthurwolf (Author) commented:

Thanks for the feedback.

I'll soon have about two comic books' worth of data, which I think would be enough to start fine-tuning LLaVA, but I have two issues there: 1. this is all very new, and there are no "easy guides" to fine-tuning most things, LLaVA even less so; it's all very cryptic and assumes a high technical level. And 2. my assumption is that even fine-tuning LLaVA would require a lot more compute than I can afford.

I've tried an alternative: getting my data into the LLaVA training dataset for the next LLaVA version. I've opened GitHub issues and sent some emails, but so far no answer. I hope I can make it happen; I think it'd benefit not just me, but the model itself too.

About posting progress: I'm considering starting a YouTube channel with some updates; I'll post about it here if/when that happens.

Cheers, and thanks again.
