attend_dtype not used #531

Open
zhixuan-lin opened this issue Mar 18, 2024 · 1 comment

@zhixuan-lin

Here it seems that the hard-coded bfloat16 is used instead of attend_dtype; query is also not cast. I'd guess the correct behavior is to cast both query and self.embedding to attend_dtype?
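
A minimal sketch of the fix being suggested here, assuming the usual Flax-style Embed.attend shape (the names follow the issue, not the exact MaxText source):

    import jax.numpy as jnp

    # Hypothetical sketch, not the actual MaxText code: cast BOTH
    # operands to attend_dtype instead of hard-coding bfloat16 for
    # the embedding table only.
    def attend(query: jnp.ndarray,
               embedding: jnp.ndarray,
               attend_dtype: jnp.dtype) -> jnp.ndarray:
      # attend_dtype is jnp.float32 when cfg.logits_dot_in_fp32 is
      # set, so the logits dot product actually runs in fp32.
      return jnp.dot(query.astype(attend_dtype),
                     embedding.astype(attend_dtype).T)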

@rwitten
Collaborator

rwitten commented Mar 19, 2024

Yes, weird. @khatwanimohit can you take a look? I'm not sure what this is meant to represent, and the upstream flag is also odd given that it's orphaned:

attend_dtype=jnp.float32 if cfg.logits_dot_in_fp32 else cfg.dtype, # for logit training stability

I think we should figure out:
(a) does doing the dot in fp32 help convergence (using the 1B runs)? (see the sketch after this list)
(b) does @ZhiyuLi-goog/MLPerf care?
(c) what does Anselm Levskaya think?
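
A synthetic illustration of the concern in (a), not from the issue: bfloat16 keeps only ~8 mantissa bits, so a large logits-style dot product rounds away precision that float32 retains (the vectors and sizes below are arbitrary):

    import jax
    import jax.numpy as jnp

    # Compare the same dot product in float32 vs. bfloat16.
    q = jax.random.normal(jax.random.PRNGKey(0), (4096,), dtype=jnp.float32)
    e = jax.random.normal(jax.random.PRNGKey(1), (4096,), dtype=jnp.float32)

    fp32 = jnp.dot(q, e)
    bf16 = jnp.dot(q.astype(jnp.bfloat16), e.astype(jnp.bfloat16))

    # The bf16 result is visibly off from the fp32 reference.
    print(fp32, bf16.astype(jnp.float32))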

We should make the code consistent and as simple as possible. Also, why is our pylint/pytype not raising alarms on this? Unused vars are bad.
