add debug functionality for per chip sizes and bytes #625
base: main
Conversation
Force-pushed from 0a1a790 to 8ee7739
Overall LGTM, some minor comments.
MaxText/inference_microbenchmark.py
Outdated
_WARMUP_ITERS = 2


def debug_kv_cache(kv_cache):
  singler_kv_cache = kv_cache["cache"]["decoder"]["layers_0"]["self_attention"]["AttentionOp_0"]
Nit: Is this supposed to be "single" or something else?
added "single_layer"
MaxText/inference_microbenchmark.py
Outdated
singler_kv_cache = kv_cache["cache"]["decoder"]["layers_0"]["self_attention"]["AttentionOp_0"]
for cache_key in singler_kv_cache.keys():
  cache_element = singler_kv_cache[cache_key]
  print(f"{cache_key}:")
Nit: It would be helpful to print out the variable name as well. You can do this in f-strings by adding an =, like this: print(f"{cache_key=}")
Done
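(For readers skimming the thread: a minimal sketch of the = specifier suggested above, with a made-up cache_key value.)

cache_key = "cached_ar_key"  # hypothetical value for illustration
print(f"{cache_key}:")   # prints: cached_ar_key:
print(f"{cache_key=}")   # prints: cache_key='cached_ar_key'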
MaxText/inference_microbenchmark.py
Outdated
print(f"{cache_key}:") | ||
if type(cache_element) == flax.linen.spmd.LogicallyPartitioned: | ||
cache_element = cache_element.value | ||
jax.debug.print(" shape: {shape}", shape=cache_element.shape) |
Nit: This is a dense series of lines; some whitespace would help make it more readable. A small thing related to density that you can take or leave: in jax.debug.print() you can skip the variable naming if you are only printing one variable, like this: jax.debug.print(" sharding: {}", cache_element.sharding)
Done
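(A quick sketch of the two jax.debug.print formatting styles discussed above, on a toy array rather than the PR's cache element.)

import jax
import jax.numpy as jnp

x = jnp.ones((4, 2))
jax.debug.print(" shape: {shape}", shape=x.shape)  # named field
jax.debug.print(" shape: {}", x.shape)             # positional; fine when printing one value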
MaxText/inference_microbenchmark.py
Outdated
@@ -227,6 +255,8 @@ def main(config):
  vocab = token_utils.load_vocab(metadata.path, metadata.extra_ids)

  decode_state = engine.init_decode_state()
  debug_kv_cache(decode_state)
Do we want to run this twice in the script?
We can make this optional. I was also checking decode_state, which was sharded correctly.
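(A sketch of one way to gate the call, assuming a hypothetical enable_kv_cache_debug config flag that is not part of this PR.)

decode_state = engine.init_decode_state()
if getattr(config, "enable_kv_cache_debug", False):  # hypothetical flag
  debug_kv_cache(decode_state)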
Force-pushed from 8ee7739 to e3c5fb3
Force-pushed from 4bdd2c5 to 357b36e
Some nits
MaxText/inference_microbenchmark.py
Outdated
print(f"{cache_key=}") | ||
if isinstance(cache_element, flax.linen.spmd.LogicallyPartitioned): | ||
cache_element = cache_element.value | ||
jax.debug.print(" shape: {}", cache_element.shape) |
Nit: these really shouldn't be jax.debug.print calls, because you aren't running them inside a jit; you can just use print.
Done
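(For context on this nit, a toy sketch of when each call matters; not the PR's code.)

import jax
import jax.numpy as jnp

@jax.jit
def f(x):
  print("trace time:", x)            # runs once during tracing and shows a tracer
  jax.debug.print("runtime: {}", x)  # shows the concrete value on every call
  return x * 2

f(jnp.arange(3))

x = jnp.arange(3)
print("eager:", x)  # outside jit, plain print already sees concrete values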
MaxText/max_utils.py
Outdated
@@ -87,6 +87,26 @@ def summarize_size_from_pytree(params):
  return num_params, num_bytes, num_bytes / num_params


def calculate_total_params_across_chip(params):
Sorry, what does this mean? I wonder if there is a clearer name (and possibly a docstring?).
Added a docstring.
MaxText/max_utils.py
Outdated
@@ -87,6 +87,26 @@ def summarize_size_from_pytree(params):
  return num_params, num_bytes, num_bytes / num_params


def calculate_total_params_across_chip(params):
  def calculate_sizes_per_chip(arr):
    return [np.prod(shard.data.shape) for shard in arr.addressable_shards]
np.prod(shard.data.shape) could be shard.data.size?
Done
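(Putting this thread's suggestions together: a simplified sketch of the helper with a docstring and shard.data.size swapped in, reconstructed from the snippets above rather than taken from the PR's exact code.)

import jax

def calculate_total_params_across_chip(params):
  """Sums the physical (per-shard) parameter counts materialized across all
  addressable chips; this exceeds the logical array size when an array is
  replicated rather than sharded."""
  def calculate_sizes_per_chip(arr):
    return sum(shard.data.size for shard in arr.addressable_shards)
  sizes = jax.tree_util.tree_map(calculate_sizes_per_chip, params)
  return jax.tree_util.tree_reduce(lambda x, y: x + y, sizes)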
MaxText/max_utils.py
Outdated
sizes_across_chips = jax.tree_util.tree_map(calculate_sizes_per_chip, params)
num_chips = len(sizes_across_chips)
total_sizes_across_chips = jax.tree_util.tree_reduce(lambda x, y: x + y, sizes_across_chips)
sizes_per_chip = total_sizes_across_chips / num_chips
This is INCREDIBLY paranoid code: because we're SPMD, calculating this for any one chip is adequate.
But no worries if you're paranoid!
I took a pass and changed it to a normal paranoia level. But let me share some context on why I had this incredibly paranoid code in the first place.
If you recall, a couple of weeks ago I mentioned some memory issues that were affecting the JetStream serving batch size. One of them came down to prefill_result initializing both a prefill cache and a generate cache. The prefill cache was properly sharded, but no sharding constraint was applied to the generate cache, so the generate cache was copied onto every TPU chip.
This was confirmed with the utils in this PR. For example, see ar_key's physical sizes/bytes versus prefill's below:
cached_ar_key:
  shape: (1024, 32, 1, 128)
  sharding: NamedSharding(mesh=Mesh('data': 1, 'fsdp': 1, 'fsdp_transpose': 1, 'sequence': 1, 'tensor': 8, 'autoregressive': 1), spec=PartitionSpec())
  total_logical_sizes: 4194304
  total_logical_bytes: 8388608
  n_chips: 8
  total_physical_sizes_across_chips: 33554432
  total_physical_bytes_across_chip: 67108864
cached_ar_value:
  ...... (same as cached_ar_key)
cached_prefill_key:
  shape: (1024, 32, 1, 128)
  sharding: NamedSharding(mesh=Mesh('data': 1, 'fsdp': 1, 'fsdp_transpose': 1, 'sequence': 1, 'tensor': 8, 'autoregressive': 1), spec=PartitionSpec(None, 'tensor'))
  total_logical_sizes: 4194304
  total_logical_bytes: 8388608
  n_chips: 8
  total_physical_sizes_across_chips: 4194304
  total_physical_bytes_across_chip: 8388608
cached_prefill_value:
  ...... (same as cached_prefill_key)
logits:
  shape: (1, 1, 32000)
  sharding: NamedSharding(mesh=Mesh('data': 1, 'fsdp': 1, 'fsdp_transpose': 1, 'sequence': 1, 'tensor': 8, 'autoregressive': 1), spec=PartitionSpec())
  total_logical_sizes: 32000
  total_logical_bytes: 128000
  n_chips: 8
  total_physical_sizes_across_chips: 256000
  total_physical_bytes_across_chip: 1024000
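(The replication effect shown above is easy to reproduce; a sketch using toy arrays, assuming the host's device count evenly divides the sharded dimension.)

import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec

mesh = Mesh(np.array(jax.devices()), axis_names=("tensor",))
x = jnp.ones((1024, 128))

# PartitionSpec() places no constraint on any axis, so every chip gets a full copy.
replicated = jax.device_put(x, NamedSharding(mesh, PartitionSpec()))
# PartitionSpec(None, "tensor") splits dim 1 across the 'tensor' mesh axis.
sharded = jax.device_put(x, NamedSharding(mesh, PartitionSpec(None, "tensor")))

def physical_size(arr):
  return sum(shard.data.size for shard in arr.addressable_shards)

print(x.size)                     # logical size
print(physical_size(replicated))  # num_devices * logical size
print(physical_size(sharded))     # equal to the logical size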
MaxText/max_utils.py
Outdated
return total_sizes_across_chips, sizes_per_chip, num_chips


def calculate_total_bytes_across_chip(params):
Some similar feedback as above applies here.
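(In the same spirit, a hypothetical bytes variant, sketched here rather than taken from the PR, would swap size for nbytes.)

def calculate_total_bytes_across_chip(params):
  def bytes_per_chip(arr):
    return sum(shard.data.nbytes for shard in arr.addressable_shards)
  per_leaf = jax.tree_util.tree_map(bytes_per_chip, params)
  return jax.tree_util.tree_reduce(lambda x, y: x + y, per_leaf)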