[optim] Noisy benchmarks #1988

janeyx99 · 2023-10-15T22:55:41Z

Over the past 2-3 weeks, the following configs have been bouncing up and down:

DALLE2_pytorch, Adam, cuda, amsgrad, maximize
DALLE2_pytorch, Adam, cuda, default
DALLE2_pytorch, Adam, cuda, foreach
DALLE2_pytorch, Adam, cuda, fused, amsgrad, maximize
DALLE2_pytorch, Adam, cuda, fused, capturable, amsgrad
DALLE2_pytorch, AdamW, cuda, amsgrad, maximize
DALLE2_pytorch, AdamW, cuda, default
DALLE2_pytorch, AdamW, cuda, foreach
DALLE2_pytorch, AdamW, cuda, fused, amsgrad, maximize
DALLE2_pytorch, AdamW, cuda, fused, capturable, amsgrad
DALLE2_pytorch, RAdam, cuda, default
DALLE2_pytorch, RAdam, cuda, foreach
detectron2_maskrcnn, Adam, cuda, differentiable
doctr_det_predictor, Adam, cuda, no_foreach
hf_BigBird, Rprop, cuda, default
hf_Reformer, Adadelta, cuda, differentiable
hf_T5_large, Rprop, cuda, differentiable
mobilenet_v2, RAdam, cuda, no_foreach
mobilenet_v3_large, Rprop, cuda, default
phlippe_densenet, Adamax, cuda, no_foreach
phlippe_densenet, NAdam, cuda, no_foreach
shufflenet_v2_x1_0, Adagrad, cuda, no_foreach
shufflenet_v2_x1_0, NAdam, cuda, no_foreach
speech_transformer, Adadelta, cuda, differentiable
stable_diffusion_unet, RAdam, cuda, differentiable
stable_diffusion_unet, RAdam, cuda, no_foreach
Super_SloMo, RMSprop, cuda, differentiable
timm_efficientnet, Adadelta, cuda, no_foreach
timm_vision_transformer_large, RAdam, cuda, differentiable
timm_vision_transformer_large, RAdam, cuda, no_foreach
tts_angular, Rprop, cuda, maximize

Raw copy paste below:

timm_vision_transformer_large, RAdam, cuda, differentiable: +79.81755%
DALLE2_pytorch, Adam, cuda, foreach: -31.41948%
DALLE2_pytorch, AdamW, cuda, default: -34.05249%
DALLE2_pytorch, AdamW, cuda, default: +34.76710%
DALLE2_pytorch, AdamW, cuda, foreach: -32.74721%
Super_SloMo, RMSprop, cuda, differentiable: -33.06181%
DALLE2_pytorch, Adam, cuda, fused, amsgrad, maximize: +45.06670%
DALLE2_pytorch, Adam, cuda, fused, capturable, amsgrad: +36.11691%
DALLE2_pytorch, AdamW, cuda, foreach: +49.40952%
DALLE2_pytorch, AdamW, cuda, fused, amsgrad, maximize: +38.32376%
timm_vision_transformer_large, RAdam, cuda, no_foreach: -44.00085%
timm_vision_transformer_large, RAdam, cuda, differentiable: -44.24940%
hf_T5_large, Rprop, cuda, differentiable: +30.94747%
Super_SloMo, RMSprop, cuda, differentiable: +52.56252%
hf_Reformer, Adadelta, cuda, differentiable: +32.41173%
timm_efficientnet, Adadelta, cuda, no_foreach: +36.42987%
phlippe_densenet, Adamax, cuda, no_foreach: +37.26340%
phlippe_densenet, NAdam, cuda, no_foreach: +45.02834%
DALLE2_pytorch, Adam, cuda, default: -30.66293%
shufflenet_v2_x1_0, NAdam, cuda, no_foreach: -34.01592%
tts_angular, Rprop, cuda, maximize: -31.97153%
doctr_det_predictor, Adam, cuda, no_foreach: +40.54418%
timm_vision_transformer_large, RAdam, cuda, no_foreach: +76.79410%
timm_vision_transformer_large, RAdam, cuda, differentiable: +79.21604%
tts_angular, Rprop, cuda, maximize: +46.56646%
shufflenet_v2_x1_0, NAdam, cuda, no_foreach: +48.30327%
DALLE2_pytorch, Adam, cuda, default: +35.45820%
timm_vision_transformer_large, RAdam, cuda, no_foreach: -43.40771%
timm_vision_transformer_large, RAdam, cuda, differentiable: -44.11391%
stable_diffusion_unet, RAdam, cuda, no_foreach: -47.72116%
DALLE2_pytorch, Adam, cuda, fused, capturable, amsgrad: +37.85595%
DALLE2_pytorch, AdamW, cuda, foreach: +37.25520%
DALLE2_pytorch, AdamW, cuda, fused, capturable, amsgrad: +36.33095%
DALLE2_pytorch, RAdam, cuda, foreach: +40.75257%
stable_diffusion_unet, RAdam, cuda, no_foreach: +92.67768%
DALLE2_pytorch, Adam, cuda, fused, amsgrad, maximize: +31.76727%
timm_vision_transformer_large, RAdam, cuda, no_foreach: +78.50101%
speech_transformer, Adadelta, cuda, differentiable: -32.16910%
mobilenet_v2, RAdam, cuda, no_foreach: -31.10531%
hf_BigBird, Rprop, cuda, default: -30.36015%
speech_transformer, Adadelta, cuda, differentiable: +47.25762%
mobilenet_v2, RAdam, cuda, no_foreach: +42.18837%
timm_vision_transformer_large, RAdam, cuda, no_foreach: -44.11201%
DALLE2_pytorch, AdamW, cuda, foreach: +32.51113%
mobilenet_v3_large, Rprop, cuda, default: +58.95230%
detectron2_maskrcnn, Adam, cuda, differentiable: +44.64110%
shufflenet_v2_x1_0, Adagrad, cuda, no_foreach: -34.88800%
DALLE2_pytorch, Adam, cuda, foreach: +33.90456%
DALLE2_pytorch, AdamW, cuda, foreach: +38.89779%
DALLE2_pytorch, AdamW, cuda, fused, amsgrad, maximize: +33.92788%
DALLE2_pytorch, RAdam, cuda, foreach: +44.08831%
shufflenet_v2_x1_0, Adagrad, cuda, no_foreach: +56.42979%
stable_diffusion_unet, RAdam, cuda, no_foreach: -47.85513%
stable_diffusion_unet, RAdam, cuda, differentiable: -48.20885%
DALLE2_pytorch, Adam, cuda, amsgrad, maximize: +32.97666%
DALLE2_pytorch, Adam, cuda, fused, amsgrad, maximize: +39.28866%
DALLE2_pytorch, Adam, cuda, fused, capturable, amsgrad: +44.78136%
DALLE2_pytorch, AdamW, cuda, default: +37.12394%
DALLE2_pytorch, AdamW, cuda, amsgrad, maximize: +37.40143%
DALLE2_pytorch, AdamW, cuda, foreach: +37.27161%
DALLE2_pytorch, AdamW, cuda, fused, amsgrad, maximize: +43.04382%
DALLE2_pytorch, AdamW, cuda, fused, capturable, amsgrad: +40.58533%
DALLE2_pytorch, RAdam, cuda, default: +35.76401%
DALLE2_pytorch, RAdam, cuda, foreach: +35.42852%
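To go from the raw paste to a list of configs that swing in both directions, a small script along these lines could help. This is only a sketch, not the tooling that produced the numbers above: the helper names `parse_deltas` and `bouncing_configs` are made up for illustration, and it assumes every entry has the form `<config>: <signed percent>%` as in the paste.

```python
import re
from collections import defaultdict

# Assumed line shape: "<model>, <optimizer>, <device>, <flags...>: +12.34567%"
LINE_RE = re.compile(r"^(?P<config>.+):\s*(?P<delta>[+-]\d+(?:\.\d+)?)%$")


def parse_deltas(raw):
    """Group the signed percentage changes by config string."""
    deltas = defaultdict(list)
    for line in raw.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # skip blank or unrelated lines
            deltas[match["config"]].append(float(match["delta"]))
    return dict(deltas)


def bouncing_configs(deltas):
    """Return configs that were reported both up and down at least once."""
    return sorted(
        config
        for config, values in deltas.items()
        if any(v > 0 for v in values) and any(v < 0 for v in values)
    )


# A few entries copied from the paste above, used here as sample input.
sample = """\
timm_vision_transformer_large, RAdam, cuda, differentiable: +79.81755%
timm_vision_transformer_large, RAdam, cuda, differentiable: -44.24940%
DALLE2_pytorch, AdamW, cuda, default: -34.05249%
DALLE2_pytorch, AdamW, cuda, default: +34.76710%
mobilenet_v3_large, Rprop, cuda, default: +58.95230%
"""

deltas = parse_deltas(sample)
for config in bouncing_configs(deltas):
    print(config, deltas[config])
```

Configs that only ever moved in one direction (such as the `mobilenet_v3_large` entry in the sample) are left out of the result, which separates a one-off shift from the repeated up-and-down pattern.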
