shardulbee/memory-alignment-tests

memory-alignment-tests

I just learned about memory alignment from "Production Twitter on One Machine? 100Gbps NICs and NVMe are fast" by Tristan Hume and from this @mitchellh thread, and I wanted to run an experiment to see how much of a difference it makes.

First round of experiments

I tested:

  • Iterating over an array containing struct elements spanning 3 cache lines, explicitly tagged with repr(align(64))
  • Iterating over an array containing struct elements spanning 3 cache lines, not tagged with repr(align(64))
  • Iterating over an array containing struct elements spanning 7 bytes
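
The three element types above could be declared roughly as follows. This is a minimal sketch, not the repository's actual code: the names `Explicit`, `Implicit`, and `Small` are hypothetical, and it assumes a 64-byte cache line, so "3 cache lines" is taken to mean 192 bytes.

```rust
use std::mem::{align_of, size_of};

// Assumption: 64-byte cache lines (common on x86-64, not universal).
const CACHE_LINE: usize = 64;

// Hypothetical type: 3 cache lines wide, alignment forced to 64 bytes.
#[allow(dead_code)]
#[repr(align(64))]
#[derive(Clone, Copy)]
struct Explicit([u8; 3 * CACHE_LINE]);

// Hypothetical type: same size, but alignment left to the compiler.
// A byte array only requires alignment 1.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Implicit([u8; 3 * CACHE_LINE]);

// Hypothetical type: 7 bytes, so consecutive elements drift across
// cache line boundaries.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Small([u8; 7]);

fn main() {
    println!("Explicit: size {} align {}", size_of::<Explicit>(), align_of::<Explicit>());
    println!("Implicit: size {} align {}", size_of::<Implicit>(), align_of::<Implicit>());
    println!("Small:    size {} align {}", size_of::<Small>(), align_of::<Small>());
}
```

The only difference between `Explicit` and `Implicit` here is the alignment the type demands; the size is identical.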

Results, using hyperfine

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| explicit | 3.2 ± 0.3 | 2.9 | 4.5 | 1.02 ± 0.10 |
| implicit | 3.1 ± 0.1 | 2.9 | 4.5 | 1.00 |
| no-alignment | 30.2 ± 0.3 | 29.8 | 32.8 | 9.68 ± 0.43 |

I found that there is no real benefit to explicitly tagging structs whose size is already a multiple of the cache line size: the explicit and implicit cases are within noise of each other (3.2 ± 0.3 ms vs. 3.1 ± 0.1 ms). I'll have to learn more about when alignment is actually useful.
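
One possible explanation, offered as a hypothesis rather than something measured here: if the allocator happens to hand back a buffer that already starts on a 64-byte boundary, then elements whose size is a multiple of 64 all start on a line boundary whether or not the type asks for it. The sketch below checks where a `Vec`'s buffer starts. The type names and the helper `offset_in_line` are hypothetical, and a 64-byte cache line is assumed.

```rust
// Hypothetical element types, 192 bytes each (3 lines of an assumed 64 bytes).
#[allow(dead_code)]
#[repr(align(64))]
#[derive(Clone, Copy)]
struct Explicit([u8; 192]);

#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Implicit([u8; 192]);

/// Distance in bytes from the previous 64-byte boundary to the first element.
fn offset_in_line<T>(items: &[T]) -> usize {
    items.as_ptr() as usize % 64
}

fn main() {
    let explicit = vec![Explicit([0; 192]); 1024];
    let implicit = vec![Implicit([0; 192]); 1024];

    // Guaranteed to be 0: the Vec must honor the type's 64-byte alignment.
    println!("explicit offset: {}", offset_in_line(&explicit[..]));
    // Not guaranteed: depends on what the allocator returns for this request.
    println!("implicit offset: {}", offset_in_line(&implicit[..]));
}
```

If the second offset also prints 0 on a given machine, the two layouts are identical in memory there, which would be consistent with the matching timings.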

Still, it's gratifying to see that the no-alignment case is significantly slower, at roughly 9.7× the implicit case.
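
To get a feel for how often a 7-byte element actually crosses a cache line boundary, here is a small sketch that counts such elements in a packed array. It assumes 64-byte cache lines, an array that starts on a line boundary, and an element size of at least 1; the function name `straddling` is made up for illustration. It says nothing about how much of the measured slowdown comes from these crossings.

```rust
// Assumption: 64-byte cache lines.
const CACHE_LINE: usize = 64;

/// Counts the elements of a packed array (starting on a cache line boundary)
/// whose first and last bytes fall in different cache lines.
/// `elem_size` must be at least 1.
fn straddling(elem_size: usize, count: usize) -> usize {
    (0..count)
        .filter(|&i| {
            let first = i * elem_size; // offset of the element's first byte
            let last = first + elem_size - 1; // offset of its last byte
            first / CACHE_LINE != last / CACHE_LINE
        })
        .count()
}

fn main() {
    for size in [7, 8, 64] {
        println!(
            "size {:>2}: {} of 64 elements cross a line boundary",
            size,
            straddling(size, 64)
        );
    }
}
```

Sizes that divide 64 evenly never cross a boundary under these assumptions, while a 7-byte element does so about once per cache line.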

Follow-up

I ran a second round of experiments similar to the one above, but instead of testing only 24-byte (3 cache lines) structs and 7-byte structs, I tried every struct size from 1 byte to 128 bytes. Here are the results:

| Struct size [bytes] | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---:|---:|---:|---:|---:|
| 1 | 287.7 ± 10.6 | 265.8 | 305.0 | 31.30 ± 2.66 |
| 2 | 184.7 ± 3.3 | 179.8 | 192.5 | 20.09 ± 1.58 |
| 3 | 229.2 ± 1.0 | 227.7 | 232.3 | 24.93 ± 1.91 |
| 4 | 198.6 ± 0.3 | 198.1 | 199.6 | 21.61 ± 1.66 |
| 5 | 180.7 ± 0.2 | 180.4 | 181.0 | 19.66 ± 1.51 |
| 6 | 168.4 ± 0.2 | 167.9 | 169.0 | 18.32 ± 1.40 |
| 7 | 159.9 ± 0.3 | 159.3 | 160.9 | 17.40 ± 1.33 |
| 8 | 45.3 ± 0.6 | 44.8 | 49.1 | 4.93 ± 0.38 |
| 9 | 52.1 ± 0.2 | 51.7 | 52.6 | 5.67 ± 0.43 |
| 10 | 57.7 ± 0.2 | 57.4 | 58.2 | 6.28 ± 0.48 |
| 11 | 82.1 ± 0.2 | 81.7 | 82.5 | 8.93 ± 0.68 |
| 12 | 84.4 ± 0.2 | 84.1 | 84.7 | 9.18 ± 0.70 |
| 13 | 86.8 ± 1.4 | 86.2 | 94.7 | 9.45 ± 0.74 |
| 14 | 88.1 ± 0.2 | 87.6 | 89.0 | 9.58 ± 0.73 |
| 15 | 89.5 ± 0.3 | 89.1 | 90.3 | 9.74 ± 0.75 |
| 16 | 29.5 ± 0.2 | 29.2 | 30.1 | 3.20 ± 0.25 |
| 17 | 29.6 ± 0.3 | 28.9 | 30.7 | 3.22 ± 0.25 |
| 18 | 33.0 ± 0.2 | 32.7 | 33.9 | 3.60 ± 0.28 |
| 19 | 48.3 ± 0.3 | 48.0 | 49.8 | 5.26 ± 0.40 |
| 20 | 51.4 ± 0.7 | 51.0 | 55.3 | 5.59 ± 0.43 |
| 21 | 54.2 ± 0.2 | 53.8 | 55.0 | 5.90 ± 0.45 |
| 22 | 56.8 ± 0.2 | 56.3 | 57.8 | 6.18 ± 0.47 |
| 23 | 59.0 ± 0.2 | 58.7 | 59.5 | 6.41 ± 0.49 |
| 24 | 33.8 ± 0.2 | 33.4 | 34.9 | 3.68 ± 0.28 |
| 25 | 32.5 ± 0.2 | 32.1 | 33.1 | 3.53 ± 0.27 |
| 26 | 35.5 ± 0.2 | 35.2 | 36.7 | 3.86 ± 0.30 |
| 27 | 46.4 ± 0.5 | 46.0 | 49.2 | 5.05 ± 0.39 |
| 28 | 48.6 ± 0.2 | 48.3 | 49.9 | 5.29 ± 0.41 |
| 29 | 50.5 ± 0.1 | 50.3 | 50.8 | 5.50 ± 0.42 |
| 30 | 52.7 ± 0.2 | 52.4 | 53.6 | 5.73 ± 0.44 |
| 31 | 54.4 ± 0.2 | 54.2 | 55.5 | 5.92 ± 0.45 |
| 32 | 29.0 ± 0.3 | 28.6 | 30.3 | 3.16 ± 0.24 |
| 33 | 28.2 ± 0.2 | 27.9 | 28.6 | 3.07 ± 0.24 |
| 34 | 30.6 ± 0.2 | 30.3 | 31.3 | 3.33 ± 0.26 |
| 35 | 39.2 ± 0.3 | 38.8 | 40.2 | 4.26 ± 0.33 |
| 36 | 41.1 ± 0.2 | 40.7 | 42.3 | 4.47 ± 0.34 |
| 37 | 43.0 ± 0.3 | 42.6 | 43.8 | 4.68 ± 0.36 |
| 38 | 44.6 ± 0.2 | 44.4 | 45.0 | 4.86 ± 0.37 |
| 39 | 46.4 ± 0.4 | 46.1 | 48.3 | 5.05 ± 0.39 |
| 40 | 26.2 ± 0.3 | 25.8 | 28.0 | 2.85 ± 0.22 |
| 41 | 25.6 ± 0.2 | 25.2 | 26.2 | 2.78 ± 0.21 |
| 42 | 27.5 ± 0.2 | 27.2 | 27.9 | 3.00 ± 0.23 |
| 43 | 34.5 ± 0.3 | 34.2 | 36.4 | 3.76 ± 0.29 |
| 44 | 36.3 ± 0.2 | 35.8 | 37.3 | 3.95 ± 0.30 |
| 45 | 37.8 ± 0.2 | 37.6 | 38.7 | 4.11 ± 0.32 |
| 46 | 39.5 ± 0.2 | 39.1 | 40.3 | 4.29 ± 0.33 |
| 47 | 41.1 ± 0.6 | 40.6 | 44.7 | 4.47 ± 0.35 |
| 48 | 24.3 ± 0.4 | 24.0 | 26.7 | 2.65 ± 0.21 |
| 49 | 23.9 ± 0.2 | 23.6 | 24.5 | 2.60 ± 0.20 |
| 50 | 25.6 ± 0.3 | 25.3 | 28.8 | 2.79 ± 0.22 |
| 51 | 31.5 ± 0.2 | 31.1 | 32.5 | 3.43 ± 0.26 |
| 52 | 33.0 ± 0.2 | 32.6 | 33.8 | 3.59 ± 0.28 |
| 53 | 34.5 ± 0.2 | 34.1 | 35.7 | 3.75 ± 0.29 |
| 54 | 35.8 ± 0.3 | 35.4 | 37.0 | 3.89 ± 0.30 |
| 55 | 37.2 ± 0.2 | 36.8 | 38.0 | 4.04 ± 0.31 |
| 56 | 22.8 ± 0.1 | 22.6 | 23.4 | 2.48 ± 0.19 |
| 57 | 22.5 ± 0.2 | 22.2 | 23.2 | 2.44 ± 0.19 |
| 58 | 24.0 ± 0.2 | 23.7 | 24.9 | 2.61 ± 0.20 |
| 59 | 29.2 ± 0.4 | 28.8 | 31.3 | 3.18 ± 0.25 |
| 60 | 30.5 ± 0.2 | 30.1 | 31.1 | 3.31 ± 0.25 |
| 61 | 31.8 ± 0.2 | 31.4 | 32.7 | 3.46 ± 0.27 |
| 62 | 33.0 ± 0.2 | 32.7 | 33.6 | 3.59 ± 0.28 |
| 63 | 34.2 ± 0.2 | 33.9 | 34.8 | 3.72 ± 0.29 |
| 64 | 9.5 ± 0.2 | 9.1 | 10.4 | 1.04 ± 0.08 |
| 65 | 11.3 ± 0.2 | 10.8 | 11.8 | 1.23 ± 0.10 |
| 66 | 12.2 ± 0.4 | 11.6 | 14.1 | 1.33 ± 0.11 |
| 67 | 16.5 ± 0.7 | 15.8 | 19.9 | 1.79 ± 0.16 |
| 68 | 17.5 ± 0.4 | 17.1 | 19.7 | 1.91 ± 0.15 |
| 69 | 19.3 ± 1.0 | 18.4 | 24.1 | 2.10 ± 0.19 |
| 70 | 20.3 ± 0.5 | 19.8 | 23.3 | 2.21 ± 0.18 |
| 71 | 21.4 ± 0.3 | 21.0 | 24.2 | 2.33 ± 0.18 |
| 72 | 10.4 ± 0.5 | 9.8 | 13.5 | 1.14 ± 0.10 |
| 73 | 11.0 ± 0.2 | 10.4 | 11.5 | 1.19 ± 0.09 |
| 74 | 11.3 ± 0.5 | 10.7 | 16.4 | 1.22 ± 0.11 |
| 75 | 14.4 ± 0.1 | 14.2 | 15.0 | 1.57 ± 0.12 |
| 76 | 15.7 ± 0.2 | 15.4 | 16.6 | 1.70 ± 0.13 |
| 77 | 16.9 ± 0.2 | 16.6 | 18.2 | 1.84 ± 0.14 |
| 78 | 18.3 ± 0.4 | 17.8 | 21.3 | 1.99 ± 0.16 |
| 79 | 19.4 ± 0.5 | 19.0 | 21.9 | 2.11 ± 0.17 |
| 80 | 10.1 ± 0.3 | 9.6 | 12.2 | 1.10 ± 0.09 |
| 81 | 10.8 ± 0.2 | 10.4 | 12.6 | 1.18 ± 0.09 |
| 82 | 11.4 ± 0.2 | 10.9 | 12.3 | 1.24 ± 0.10 |
| 83 | 13.2 ± 0.2 | 12.9 | 14.2 | 1.44 ± 0.11 |
| 84 | 14.3 ± 0.1 | 14.1 | 15.0 | 1.56 ± 0.12 |
| 85 | 15.4 ± 0.2 | 15.1 | 16.2 | 1.67 ± 0.13 |
| 86 | 16.6 ± 0.4 | 16.3 | 19.4 | 1.80 ± 0.14 |
| 87 | 17.6 ± 0.2 | 17.2 | 18.5 | 1.91 ± 0.15 |
| 88 | 11.2 ± 0.2 | 10.9 | 12.1 | 1.22 ± 0.10 |
| 89 | 11.3 ± 0.2 | 10.9 | 11.9 | 1.22 ± 0.10 |
| 90 | 12.2 ± 0.2 | 11.9 | 13.4 | 1.33 ± 0.10 |
| 91 | 15.5 ± 0.1 | 15.3 | 16.0 | 1.69 ± 0.13 |
| 92 | 16.6 ± 0.2 | 16.3 | 17.5 | 1.81 ± 0.14 |
| 93 | 17.5 ± 0.2 | 17.2 | 18.0 | 1.91 ± 0.15 |
| 94 | 18.6 ± 0.2 | 18.3 | 19.5 | 2.03 ± 0.16 |
| 95 | 19.5 ± 0.2 | 19.3 | 20.1 | 2.12 ± 0.16 |
| 96 | 11.4 ± 0.1 | 11.2 | 11.7 | 1.24 ± 0.10 |
| 97 | 11.5 ± 0.2 | 11.2 | 12.8 | 1.26 ± 0.10 |
| 98 | 12.4 ± 0.2 | 12.1 | 13.6 | 1.35 ± 0.11 |
| 99 | 16.0 ± 0.8 | 15.3 | 18.4 | 1.74 ± 0.16 |
| 100 | 17.0 ± 0.5 | 16.3 | 18.6 | 1.85 ± 0.15 |
| 101 | 17.7 ± 0.7 | 17.1 | 20.5 | 1.93 ± 0.17 |
| 102 | 19.0 ± 0.7 | 18.1 | 21.9 | 2.06 ± 0.17 |
| 103 | 19.6 ± 0.5 | 19.0 | 22.3 | 2.13 ± 0.17 |
| 104 | 11.6 ± 0.2 | 11.4 | 12.7 | 1.26 ± 0.10 |
| 105 | 12.1 ± 0.5 | 11.5 | 14.2 | 1.31 ± 0.11 |
| 106 | 13.5 ± 0.9 | 12.4 | 17.3 | 1.46 ± 0.15 |
| 107 | 16.3 ± 1.0 | 15.3 | 20.1 | 1.78 ± 0.17 |
| 108 | 17.2 ± 1.1 | 16.1 | 20.4 | 1.87 ± 0.19 |
| 109 | 17.3 ± 0.5 | 16.9 | 19.9 | 1.89 ± 0.15 |
| 110 | 18.1 ± 0.2 | 17.8 | 19.2 | 1.97 ± 0.15 |
| 111 | 19.0 ± 0.6 | 18.6 | 22.0 | 2.07 ± 0.17 |
| 112 | 12.0 ± 0.4 | 11.6 | 14.0 | 1.30 ± 0.11 |
| 113 | 11.9 ± 0.2 | 11.6 | 12.4 | 1.30 ± 0.10 |
| 114 | 13.1 ± 0.8 | 12.4 | 16.5 | 1.42 ± 0.14 |
| 115 | 15.4 ± 0.3 | 15.1 | 17.4 | 1.68 ± 0.13 |
| 116 | 16.4 ± 0.5 | 16.0 | 18.9 | 1.78 ± 0.15 |
| 117 | 17.7 ± 1.0 | 16.8 | 20.5 | 1.93 ± 0.19 |
| 118 | 17.8 ± 0.3 | 17.5 | 19.3 | 1.94 ± 0.15 |
| 119 | 18.6 ± 0.2 | 18.3 | 19.4 | 2.02 ± 0.16 |
| 120 | 12.1 ± 0.1 | 11.8 | 12.7 | 1.31 ± 0.10 |
| 121 | 12.1 ± 0.3 | 11.7 | 14.6 | 1.32 ± 0.11 |
| 122 | 13.0 ± 0.5 | 12.6 | 16.1 | 1.41 ± 0.12 |
| 123 | 15.5 ± 0.5 | 15.1 | 18.4 | 1.69 ± 0.14 |
| 124 | 16.1 ± 0.1 | 15.8 | 16.4 | 1.75 ± 0.13 |
| 125 | 16.9 ± 0.2 | 16.6 | 18.4 | 1.84 ± 0.14 |
| 126 | 18.9 ± 1.1 | 17.4 | 23.6 | 2.05 ± 0.19 |
| 127 | 19.5 ± 1.0 | 18.1 | 22.3 | 2.12 ± 0.20 |
| 128 | 9.2 ± 0.7 | 8.3 | 11.5 | 1.00 |
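
A size sweep like the one in this table can be sketched with a const-generic element type, so that each struct size is its own concrete type. This is only an illustration under assumptions: `Elem` and `checksum` are hypothetical names, the per-element work here is a trivial read rather than the CPU-intensive operation used in the experiment, and `N` must be at least 1.

```rust
use std::mem::size_of;

/// Hypothetical element type: exactly N bytes wide, alignment 1.
#[derive(Clone, Copy)]
struct Elem<const N: usize>([u8; N]);

/// Reads the first byte of every element and sums them, so the loop has an
/// observable result. Requires N >= 1.
fn checksum<const N: usize>(items: &[Elem<N>]) -> u64 {
    items.iter().map(|e| e.0[0] as u64).sum()
}

fn main() {
    let small = vec![Elem::<7>([1; 7]); 1000];
    let line = vec![Elem::<64>([1; 64]); 1000];

    println!("Elem<7>:  {} bytes each, checksum {}", size_of::<Elem<7>>(), checksum(&small[..]));
    println!("Elem<64>: {} bytes each, checksum {}", size_of::<Elem<64>>(), checksum(&line[..]));
}
```

Because `N` is a compile-time constant, covering all 128 sizes this way needs either a macro or one binary per size; which approach the repository uses is not stated here.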

After running this, though, I'm not sure the experiment actually isolates what I'm trying to test. When the struct size is 1 byte, we allocate a vector of 1_572_864 bytes, which is 1_572_864 elements. The CPU-intensive operation that runs on each element of the vector is therefore called more times for smaller structs, which could explain the increased time. That said, the best performance comes when the array elements are 64 or 128 bytes wide, which corresponds to an integer multiple of a cache line. So there is something here, but I need to do a better job of isolating the thing I'm trying to measure. I'm not sure how to fix this yet, so I guess I need to read up more on cache lines. Until next time!
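
To make the element-count concern concrete, the sketch below converts a few rows of the table into time per element. It assumes the total buffer stays at 1_572_864 bytes for every struct size, which is only stated above for the 1-byte case, and the function names are hypothetical. Note also that hyperfine measures whole-process time, so startup and allocation are included in these figures.

```rust
/// Buffer size quoted for the 1-byte case; assumed constant across all sizes.
const TOTAL_BYTES: usize = 1_572_864;

/// Number of whole elements of `struct_size` bytes that fit in the buffer.
fn element_count(struct_size: usize) -> usize {
    TOTAL_BYTES / struct_size
}

/// Mean wall-clock time per element in nanoseconds, from a mean in milliseconds.
fn ns_per_element(mean_ms: f64, struct_size: usize) -> f64 {
    mean_ms * 1e6 / element_count(struct_size) as f64
}

fn main() {
    // (struct size in bytes, mean in ms), copied from the table above.
    for (size, mean_ms) in [(1, 287.7), (8, 45.3), (64, 9.5), (128, 9.2)] {
        println!(
            "size {:>3}: {:>9} elements, {:>8.1} ns per element",
            size,
            element_count(size),
            ns_per_element(mean_ms, size)
        );
    }
}
```

Comparing per-element time instead of total time is one way to separate "more elements" from "worse layout", though a cleaner experiment would hold the element count fixed and vary only the stride.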
