
Mismatches between ViT-H/14 in AIM and ViT-H/14 in MAE #7

Open
TonyLianLong opened this issue Mar 3, 2024 · 0 comments
AIM-600M (ml-aim/aim/torch/models.py, lines 176 to 185 at 0b1dea9):

def aim_600M(img_size: Union[int, Tuple[int, int]] = 224, **kwargs: Any) -> AIM:
    preprocessor, trunk, head = _aim(
        img_size=img_size,
        patch_size=14,
        embed_dim=1536,
        num_blocks=24,
        num_heads=12,
        **kwargs,
    )
    return AIM(preprocessor, trunk, head)


MAE ViT-H/14:

def vit_huge_patch14(**kwargs):
    model = VisionTransformer(
        patch_size=14, embed_dim=1280, depth=32, num_heads=16, mlp_ratio=4, qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
    return model

https://github.com/facebookresearch/mae/blob/efb2a8062c206524e35e47d04501ed4f544c0ae8/models_vit.py#L70-L74

The two models have very different embedding dimensions, depths, and numbers of heads, so the architectures are not interchangeable. However, in Tab. 6 of the paper, both works are listed under the same entry in the "Arch." column. Are the two architectures in fact different, as the code suggests? If so, it would probably help to clarify this in the paper, e.g. in terms of the number of parameters (a rough estimate is sketched below).
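For reference, here is a rough back-of-the-envelope parameter count I did for the two trunk configurations (counting only the attention and MLP weights with mlp_ratio=4, and ignoring biases, norms, embeddings, and heads; the helper function is my own, not from either repo). It suggests the two configurations land in a similar ~0.6-0.7B range, which may be why the paper groups them:

def approx_trunk_params(embed_dim: int, depth: int, mlp_ratio: int = 4) -> float:
    # Per block: QKV (3*d^2) + attention output projection (d^2) + MLP (2 * mlp_ratio * d^2).
    per_block = (3 + 1 + 2 * mlp_ratio) * embed_dim**2
    return depth * per_block / 1e9  # in billions

print(approx_trunk_params(1536, 24))  # AIM-600M trunk:  ~0.68B
print(approx_trunk_params(1280, 32))  # MAE ViT-H/14:    ~0.63B

If this estimate is roughly right, the totals are comparable even though the shapes (width vs. depth) are quite different.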
