About memory missing location information #23

Open
LzhinFdu opened this issue May 10, 2024 · 2 comments

Comments


LzhinFdu commented May 10, 2024

I noticed that memory retrieval and update happen before 'apply_rotary_pos_emb' is applied. Could the lack of positional information in the memory confuse the model about the order of historical information?
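
To make the concern concrete, here is a minimal pure-Python sketch of the linear compressive-memory update and retrieval as described in the Infini-attention paper, using sigma(x) = ELU(x) + 1. It is an illustration under assumptions, not this repository's code: the helper names (`elu_plus_one`, `update_memory`, `retrieve`, `build_memory`) and the toy sizes are hypothetical, and the keys and queries are assumed to reach the memory without rotary embeddings, as the question states. Because the update is a plain sum over tokens, feeding the same key/value pairs in reverse order produces the same memory, which is the order-blindness being asked about. Whether that actually hurts the model in practice is not something this sketch can show.

```python
import math
import random


def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, which is x + 1 for x > 0 and exp(x) otherwise.
    return x + 1.0 if x > 0 else math.exp(x)


def update_memory(memory, norm, keys, values):
    # Linear update: M += sigma(K)^T V and z += sum over tokens of sigma(K_t).
    for k, v in zip(keys, values):
        sk = [elu_plus_one(x) for x in k]
        for i, ski in enumerate(sk):
            norm[i] += ski
            for j, vj in enumerate(v):
                memory[i][j] += ski * vj
    return memory, norm


def retrieve(memory, norm, query):
    # Retrieval: A = sigma(Q) M / (sigma(Q) z).
    sq = [elu_plus_one(x) for x in query]
    denom = sum(a * b for a, b in zip(sq, norm))
    d_v = len(memory[0])
    return [
        sum(sq[i] * memory[i][j] for i in range(len(sq))) / denom
        for j in range(d_v)
    ]


def build_memory(keys, values, d_k, d_v):
    # Start from an empty memory and absorb all tokens in the given order.
    memory = [[0.0] * d_v for _ in range(d_k)]
    norm = [0.0] * d_k
    return update_memory(memory, norm, keys, values)


# Toy data (assumption: keys and queries carry no rotary position encoding).
rng = random.Random(0)
d_k, d_v, n_tokens = 4, 3, 6
keys = [[rng.gauss(0, 1) for _ in range(d_k)] for _ in range(n_tokens)]
values = [[rng.gauss(0, 1) for _ in range(d_v)] for _ in range(n_tokens)]
query = [rng.gauss(0, 1) for _ in range(d_k)]

forward = retrieve(*build_memory(keys, values, d_k, d_v), query)
backward = retrieve(*build_memory(keys[::-1], values[::-1], d_k, d_v), query)

# The two retrievals agree up to floating-point rounding.
order_invariant = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
    for a, b in zip(forward, backward)
)
print(order_invariant)
```

The sketch covers only the plain linear update. The paper's delta-rule variant depends on what the memory already holds when each segment arrives, so it would not be order-invariant across segments in the same way.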

@Lazy3valuation

From the README: "Can train 'infinite' context -- check train.gemma.infini.noclm.1Mseq.sh with 1x H100 80G (with AdamW optimizer, No gradient checkpointing)". However, I can train it on 12 GB with 8-bit quantization and a segment size of 400.

@LzhinFdu
Author

I can also get training to run. However, the current training results are not very good, so I'm trying to train further.


2 participants