Self-frankenmerge support? #7012
Replies: 3 comments 2 replies
-
I came here to ask the same, after seeing this reddit thread and qrios' comment there. Since this kind of layer duplication seems to help in some cases, it would be a good improvement to have.
-
There was initial work on this started here: #5741
-
I might be wrong about this, but a quick glance at #5741 suggests it is something different: it's about merging (potentially self-merging) models, producing a bigger model (in terms of disk space and VRAM). What I was thinking of is keeping the original model's size on disk and in RAM, but using some sort of metadata to evaluate it with repeated layers. Say we have a model with 5 layers, and we define the mapping 1-4,2-5 (= 8 layers in total). The model gets loaded into RAM and still takes the space of only 5 layers, yet they are evaluated in the order 1-2-3-4-2-3-4-5. The whole point is to save the precious (V)RAM required for inference.
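As a minimal sketch of the idea above (nothing here exists in llama.cpp; the function name and range syntax are hypothetical), a mapping string like "1-4,2-5" could be expanded into the actual layer evaluation order without touching the stored weights:

```python
def expand_layer_map(mapping: str) -> list[int]:
    """Hypothetical helper: expand a range-style mapping like "1-4,2-5"
    into the layer evaluation order [1, 2, 3, 4, 2, 3, 4, 5]."""
    order = []
    for part in mapping.split(","):
        start, end = (int(x) for x in part.split("-"))
        order.extend(range(start, end + 1))
    return order

print(expand_layer_map("1-4,2-5"))  # [1, 2, 3, 4, 2, 3, 4, 5]
```

The weights for layers 1-5 would be loaded once; the expanded list only controls which layer is applied at each step.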
-
I've noticed that some people seem to be getting good results by interleaving models with themselves, effectively duplicating layers. As far as I understand, these are actually the same weights, so there is no new information there, yet such a frankenmerge still takes more (V)RAM than strictly necessary. Would it make sense to implement this inside the ggml lib? I'm thinking of something like a command-line parameter, or perhaps additional metadata in the GGUF file, containing a list like [1,2,3,4,5,3,4,5,6,7,5,6,7,8,9,10]. This example defines a 16-layer model, but in memory it would only take the space of 10 distinct layers (which the inference loop would reference indirectly through the list). Obviously performance would be on par with a 16-layer model, but it would still be usable where only a 10-layer model fits. What do you think?
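To illustrate the memory argument in the proposal above (a toy sketch only; `apply_layer`, `weights`, and `forward` are stand-ins, not real ggml/llama.cpp APIs), a forward pass driven by an explicit layer-order list performs 16 evaluation steps while storing only the 10 distinct weight sets:

```python
# The proposed per-model metadata: 16 evaluation steps over 10 distinct layers.
layer_order = [1, 2, 3, 4, 5, 3, 4, 5, 6, 7, 5, 6, 7, 8, 9, 10]

# One weight set per *distinct* layer index; repeated indices share storage.
weights = {i: f"layer_{i}_weights" for i in sorted(set(layer_order))}

def apply_layer(hidden, w):
    # Stand-in for evaluating one real transformer block.
    return hidden + 1

def forward(hidden, order, weights):
    # Each step looks the weights up by index, so duplicated entries in
    # `order` reuse the same tensors instead of duplicated copies.
    for idx in order:
        hidden = apply_layer(hidden, weights[idx])
    return hidden

print(len(layer_order), "evaluation steps using", len(weights), "distinct layers")
# 16 evaluation steps using 10 distinct layers
```

Compute cost scales with the length of `layer_order`, while (V)RAM scales with the number of distinct indices, which is exactly the trade-off described above.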