
network utilization #58

Open
zhengpeirong opened this issue May 18, 2024 · 3 comments

Comments

@zhengpeirong

zhengpeirong commented May 18, 2024

Let's calculate the transfer time theoretically.

llama3 8B

The original experiment data is here.
Since the transfer is full-duplex, there's no interference between uplink and downlink.
So, we can take the larger of the two, 510 kB, as the transfer data volume to calculate the transfer time.

$$\frac{510{,}000 \times 8\ \text{bit}}{1\ \text{Gbps}} = 4.08\ \text{ms}$$ $$\frac{4.08\ \text{ms}}{199.60\ \text{ms}} \approx 2\%$$

So, the average transfer time should be 4.08 ms. However, your measured result is 199.60 ms, roughly 50 times higher.
So, the network utilization ratio is merely 2%.
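
For reference, a minimal Python sketch of the same calculation (the helper name `utilization` is mine; the numbers are the ones quoted above):

```python
def utilization(payload_bytes, bandwidth_bps, measured_ms):
    """Ideal wire time for one token's payload vs. the measured transfer time."""
    theoretical_ms = payload_bytes * 8 / bandwidth_bps * 1000
    return theoretical_ms, theoretical_ms / measured_ms

# Llama 3 8B, 2 devices: 510 kB per token over 1 Gbps, 199.60 ms measured
t, u = utilization(510_000, 1e9, 199.60)
print(f"theoretical {t:.2f} ms, utilization {u:.1%}")  # ~4.08 ms, ~2%
```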

llama2 7B

For comparison, I summarize the results for a similar model (llama2 7B) on different devices:
[image]

VMs

In this discussion, the network bandwidth is 20 Gbps (reference here).
[image]

$$\frac{590{,}000 \times 8\ \text{bit}}{20\ \text{Gbps}} = 0.236\ \text{ms}$$ $$\frac{0.236\ \text{ms}}{7.62\ \text{ms}} \approx 3\%$$

So, the network utilization ratio is merely 3%.
Similarly, we can calculate the result of 4 VMs to be 6%.
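
Plugging the VM numbers into the same hypothetical helper from the sketch above:

```python
# 2 VMs: 590 kB per token over 20 Gbps, 7.62 ms measured transfer time
t, u = utilization(590_000, 20e9, 7.62)
print(f"theoretical {t:.3f} ms, utilization {u:.1%}")  # ~0.236 ms, ~3%
```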

Raspberry Pi

[image]

Also, the results for the Raspberry Pi cluster work out to 9.0%, 48.0%, and 14.1% for 2, 4, and 8 devices respectively.

  • llama2 13B
    23.9%, 25.75%, 9.8%
  • llama2 70B
    8.5%

Summary

I think the network utilization, averaging around 11% and ranging from 2% to 48%, is under-optimized.
Further development of the code could ensure stable and high network utilization.

Originally posted by @zhengpeirong in #41 (reply in thread)

@b4rtaz
Owner

b4rtaz commented May 18, 2024

The test that you are referring to is very unlucky. Two Raspberry Pi 5 devices + a cheap switch achieve 80.11 ms / token. I think you underestimate the impact of the setup quality on the result. In Google Cloud I achieved 8.56 ms / token (Llama 7B Q40), but the network there is probably faster than Gigabit Ethernet.

The next thing is the transfer characteristics: a node calculates the result of its own slice, then synchronizes the result. So it looks like this:

[image]

So most of the time there is no transfer at all, and then suddenly some data to transfer appears. In this case the latency has a huge impact on the final transfer time.
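
To illustrate that point, here is a rough back-of-the-envelope model (mine, not the project's code; the 0.5 ms round-trip latency is an assumption for a cheap switch plus the kernel network stack): when the payload per synchronization is small, the fixed latency dominates the total transfer time.

```python
# Toy model: per-synchronization time = fixed latency + payload / bandwidth.
# latency_ms = 0.5 is an assumed value, not a measured one.
def sync_time_ms(payload_bytes, bandwidth_bps=1e9, latency_ms=0.5):
    wire_ms = payload_bytes * 8 / bandwidth_bps * 1000
    return latency_ms + wire_ms

# A 16 kB burst: ~0.13 ms on the wire, but ~0.63 ms in total -> ~20% utilization
print(sync_time_ms(16_000))
```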

@zhengpeirong
Author

zhengpeirong commented May 19, 2024

> The test that you are referring to is very unlucky. Two Raspberry Pi 5 devices + a cheap switch achieve 80.11 ms / token.

80.11 ms means 15% network utilization, which is not high.

> I think you underestimate the impact of the setup quality on the result. In Google Cloud I achieved 8.56 ms / token (Llama 7B Q40), but the network there is probably faster than Gigabit Ethernet.

I have already included this factor: the network bandwidth is 20 Gbps, and the result is calculated in the issue content above.

> The next thing is the transfer characteristics: a node calculates the result of its own slice, then synchronizes the result. So it looks like this:
> [image]
> So most of the time there is no transfer at all, and then suddenly some data to transfer appears. In this case the latency has a huge impact on the final transfer time.

This seems simplified, because the transfer process is actually composed of two steps: the transfer time should begin when the source node starts sending, and it should end when the destination node finishes receiving. So the statistics, collected only on the root node, deviate from the actual value.
[image]

Even if we build the analysis on these statistics, the network bandwidth should still be enough: 1 Gbps >> hidden_dim * 8 bits!

@zhengpeirong
Author

I am using `tcpdump` to capture the actual packet rate and analyze the real bandwidth utilization. However, I only have 4 Raspberry Pis, so I may need your help to run the 8-device experiment, which usually shows the highest latency, and to share the logs. Thank you.
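
A minimal sketch of how such a capture could be post-processed (assuming scapy is installed; the capture file name is a placeholder):

```python
from scapy.all import rdpcap  # pip install scapy

packets = rdpcap("dllama.pcap")  # placeholder: a tcpdump capture of the worker traffic
if packets:
    total_bytes = sum(len(p) for p in packets)
    duration_s = float(packets[-1].time - packets[0].time)
    if duration_s > 0:
        print(f"average throughput: {total_bytes * 8 / duration_s / 1e6:.1f} Mbps")
```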
