Hey Meta.

I noticed in the LLaMA 1 paper it states:

> **2.2 Architecture**
> Following recent work on large language models, our network is based on the transformer architecture (Vaswani et al., 2017). We leverage various improvements that were subsequently proposed, and used in different models such as PaLM. Here are the main difference with the original architecture, and where we were found the inspiration for this change (in bracket):
Except I don't see a "difference" in that paper indicating the model is decoder-only.

I noticed in the Llama 2 paper it states:

> **2.2 Training Details**
> We adopt most of the pretraining setting and model architecture from Llama 1. We use the standard transformer architecture (Vaswani et al., 2017), apply pre-normalization using RMSNorm (Zhang and Sennrich, 2019), use the SwiGLU activation function (Shazeer, 2020), and rotary positional embeddings (RoPE, Su et al. 2022).
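For context, my understanding of the RMSNorm item in that list (per Zhang and Sennrich, 2019) is a drop-in replacement for LayerNorm inside each transformer block, with no mean-centering and no bias term. A minimal sketch of what it computes, not the repo's exact code:

```python
import torch

class RMSNorm(torch.nn.Module):
    """Root-mean-square normalization (Zhang and Sennrich, 2019):
    like LayerNorm, but without mean-centering or a bias, only a
    learned per-feature scale."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the RMS of the features, then rescale.
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```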
These publications lead me to believe LLaMA 1 and Llama 2 are encoder-decoder models based on the original 2017 transformer architecture, yet the code in this repo reads as if the model is decoder-only, which is stated explicitly for the new Llama 3. Can you confirm what the LLaMA 1 and Llama 2 architectures are, and perhaps document that in this repo?
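For reference, what reads as decoder-only to me is that the attention code in this repo applies a causal mask, and I don't see an encoder stack or cross-attention layers anywhere; an encoder-decoder model would have both. A minimal sketch of such a causal mask, my illustration rather than the repo's exact code:

```python
import torch

def causal_mask(seqlen: int) -> torch.Tensor:
    # Additive attention mask: position i may attend only to positions <= i.
    # -inf above the diagonal zeroes out those attention weights after softmax.
    mask = torch.full((seqlen, seqlen), float("-inf"))
    return torch.triu(mask, diagonal=1)

print(causal_mask(4))
# tensor([[0., -inf, -inf, -inf],
#         [0., 0.,   -inf, -inf],
#         [0., 0.,   0.,   -inf],
#         [0., 0.,   0.,   0.]])
```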