
Training code #138

Open

sartimo opened this issue Mar 18, 2024 · 1 comment

sartimo commented Mar 18, 2024

Hi

Where can I find the code needed to train the initial model and produce the model files?


spydaz commented Apr 5, 2024

> Hi
>
> Where can I find the code needed to train the initial model and produce the model files?

Yes, the ability to instantiate a fresh model from the code and train a new model (after generating the initial config, i.e. loading with no weights) is what I'm after, as I would like to train a model from scratch with my own data. I have a trained tokenizer, but I would also like a script to train a new tokenizer.
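
For the tokenizer part, here is a minimal sketch using the Hugging Face `tokenizers` library; the corpus path, vocab size, and special tokens are placeholders, not anything from this repo:

```python
# Minimal sketch: train a new BPE tokenizer from scratch with the
# Hugging Face `tokenizers` library. The corpus file, vocab size, and
# special tokens below are placeholders; adjust for your own data.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=32000,
    special_tokens=["<unk>", "<s>", "</s>", "<pad>"],
)

# Train on one or more plain-text files, then save to a single JSON.
tokenizer.train(files=["my_corpus.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```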

This is needed to create a new model with a new tokenization process, i.e. multimodal input, so there should be the ability to select which input pre-processors / feature extractors are available. Speech input should be auto-tokenized: transcribed to text, then to token IDs. Images should likewise come back as token IDs; for images, the processor would process the image and convert it to token IDs.
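
The speech leg of that pipeline (audio, to transcription, to token IDs) could be sketched like this; the Whisper checkpoint, audio file, and the stand-in text tokenizer are all assumptions:

```python
# Minimal sketch of the speech path: audio -> Whisper transcription ->
# token IDs for the language model. The model names and the audio file
# are placeholders; any Whisper checkpoint and text tokenizer would do.
from transformers import AutoTokenizer, pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
text = asr("speech_sample.wav")["text"]  # placeholder audio file

lm_tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in text tokenizer
token_ids = lm_tokenizer(text)["input_ids"]
print(text, token_ids[:10])
```
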
In training, the image may have relevance as a reference, as it should attach a description of the image to the prompt (a man sitting on a bench; a malignant tumour with lesions), so again it would be tokenized to words. The speech needs to be transcribed, and the images need to return their learned captions. When training, the PEFT would be applied to the transformer, the image PEFT applied to the image processor, and the sound PEFT applied to the Whisper processor, etc.
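
On the PEFT point, attaching an adapter to the transformer is typically done with the `peft` library; here is a minimal LoRA sketch, where the base checkpoint and target modules are assumptions rather than this repo's actual model:

```python
# Minimal sketch: apply a LoRA adapter (one form of PEFT) to a causal
# language model with the `peft` library. The base checkpoint and
# target_modules are assumptions; set them for your own architecture.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # fused attention projection in GPT-2
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights will train
```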

We need only a text output, as later we can create a wrapper for generation of sound and of images, using the same speech library but with diffusers to generate an output. The training process should use the diffusers to learn the images as well as their descriptions via captioning; hence, for later generation, any pre-captioned image should be able to be regenerated, or at least a representation of it. For sound input and generation, speech output is obviously no problem, as the same library that takes speech in can also output speech, but we also need a sound generator for our other generated outputs, e.g. to generate the sound of a sparrow (another form of diffuser).
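The image side of that wrapper would presumably sit on the `diffusers` library; a minimal text-to-image sketch, where the checkpoint is an assumption:

```python
# Minimal sketch: regenerate an image from its caption with the
# `diffusers` library. The checkpoint is an assumption; any
# text-to-image pipeline is driven the same way.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

image = pipe("a man sitting on a bench").images[0]
image.save("regenerated.png")
```
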

Hence we need the starting point: the training script to create a model from the code?
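
In the absence of an official script, a minimal from-scratch entry point might look like the following sketch with `transformers`, where every size, path, and the dataset are placeholders, and the small GPT-2-style config is just a stand-in for this repo's architecture:

```python
# Minimal sketch of a from-scratch training entry point: build a fresh
# config (no pretrained weights), instantiate the model, and train it.
# All sizes, paths, and the dataset are placeholders.
from transformers import (
    DataCollatorForLanguageModeling, GPT2Config, GPT2LMHeadModel,
    PreTrainedTokenizerFast, Trainer, TrainingArguments,
)
from datasets import load_dataset

# Load the tokenizer trained earlier (see the tokenizer sketch above).
tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
tokenizer.pad_token = "<pad>"

config = GPT2Config(vocab_size=tokenizer.vocab_size, n_layer=6, n_head=8, n_embd=512)
model = GPT2LMHeadModel(config)  # fresh random weights, nothing pretrained

ds = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]
ds = ds.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=8, num_train_epochs=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("out")  # writes the model files (config + weights)
```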
