
Mismatches between ViT-H/14 in AIM and ViT-H/14 in MAE #7

Open
TonyLianLong opened this issue Mar 3, 2024 · 0 comments
AIM-600M (ml-aim/aim/torch/models.py, lines 176 to 185 at 0b1dea9):

def aim_600M(img_size: Union[int, Tuple[int, int]] = 224, **kwargs: Any) -> AIM:
    preprocessor, trunk, head = _aim(
        img_size=img_size,
        patch_size=14,
        embed_dim=1536,
        num_blocks=24,
        num_heads=12,
        **kwargs,
    )
    return AIM(preprocessor, trunk, head)


MAE ViT-H/14:

def vit_huge_patch14(**kwargs):
    model = VisionTransformer(
        patch_size=14, embed_dim=1280, depth=32, num_heads=16, mlp_ratio=4, qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
    return model

https://github.com/facebookresearch/mae/blob/efb2a8062c206524e35e47d04501ed4f544c0ae8/models_vit.py#L70-L74

The two models have very different embedding dimensions, depths, and numbers of heads, so the architectures are not interchangeable. However, in Tab. 6 of the paper, both works are listed under the same entry in the "Arch." column. Are the two architectures in fact different, as the code suggests? If so, it would probably help to clarify this in the paper, e.g. in terms of the number of parameters (a rough estimate is sketched below).
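For reference, here is a rough back-of-the-envelope parameter count I did for the two trunk configurations (counting only the attention and MLP weights with mlp_ratio=4, and ignoring biases, norms, embeddings, and heads; the helper function is my own, not from either repo). It suggests the two configurations land in a similar ~0.6-0.7B range, which may be why the paper groups them:

def approx_trunk_params(embed_dim: int, depth: int, mlp_ratio: int = 4) -> float:
    # Per block: QKV (3*d^2) + attention output projection (d^2) + MLP (2 * mlp_ratio * d^2).
    per_block = (3 + 1 + 2 * mlp_ratio) * embed_dim**2
    return depth * per_block / 1e9  # in billions

print(approx_trunk_params(1536, 24))  # AIM-600M trunk:  ~0.68B
print(approx_trunk_params(1280, 32))  # MAE ViT-H/14:    ~0.63B

If this estimate is roughly right, the totals are comparable even though the shapes (width vs. depth) are quite different.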
