add alibi position embedding and support baichuan #54

Open
qyccc wants to merge 48 commits into main
Conversation

@qyccc commented Dec 16, 2023

This adds the ALiBi positional-embedding method and a flash-attention version of it (implemented with Triton). It also supports Baichuan model training by porting over the implementation from baichuan-inc/Baichuan2-13B-Base.
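For context, ALiBi replaces learned or rotary position embeddings with a static, head-specific linear bias added to the attention scores. Below is a minimal sketch of how the slopes and bias tensor are typically computed, following the ALiBi paper and the Baichuan2 reference code; the helper names are illustrative and not necessarily the ones used in this PR:

```python
import math
import torch

def get_alibi_slopes(num_heads):
    """Head-specific slopes: a geometric sequence starting at 2^(-8/n)
    when the head count n is a power of two (per the ALiBi paper)."""
    def power_of_2_slopes(n):
        start = 2 ** (-(2 ** -(math.log2(n) - 3)))
        return [start * (start ** i) for i in range(n)]

    if math.log2(num_heads).is_integer():
        return power_of_2_slopes(num_heads)
    # Non-power-of-two head counts: interleave slopes from the
    # surrounding powers of two, as in the reference implementation.
    closest = 2 ** math.floor(math.log2(num_heads))
    return (power_of_2_slopes(closest)
            + power_of_2_slopes(2 * closest)[0::2][: num_heads - closest])

def build_alibi_bias(num_heads, seq_len):
    """bias[h, i, j] = slope_h * (j - i); added to the attention scores
    before the softmax (future positions are removed by the causal mask)."""
    slopes = torch.tensor(get_alibi_slopes(num_heads))
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]               # (seq, seq), value j - i
    return slopes[:, None, None] * rel[None, :, :]  # (heads, seq, seq)
```

Because the bias depends only on the head index and the relative distance j - i, a Triton flash-attention kernel can recompute it tile by tile instead of materializing the full (heads, seq, seq) tensor, which is what makes the flash-attention variant of ALiBi practical at long sequence lengths.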

@CLAassistant commented Dec 16, 2023

CLA assistant check
All committers have signed the CLA.

@li-yi-dong (Collaborator)
Cool! It may take some time to review 🙃

@li-yi-dong self-assigned this Dec 19, 2023
@li-yi-dong (Collaborator) left a review comment

I'll review again once you resolve the comments.

examples/Baichuan_13_standalone.sh (outdated, resolved)
megatron/arguments.py (outdated, resolved)
megatron/fused_kernels/__init__.py (outdated, resolved)
megatron/model/transformer.py (outdated, resolved)
megatron/model/transformer.py (outdated, resolved)
@@ -471,10 +542,24 @@ def __init__(self, init_method,
        self.core_attention = CoreAttention(self.layer_number,
                                            self.attn_mask_type)
        self.checkpoint_core_attention = args.recompute_granularity == 'selective'

        self.apply_query_key_layer_scaling = args.apply_query_key_layer_scaling
        world_size = mpu.get_tensor_model_parallel_world_size()
Collaborator:

tensor_parallel_size

@qyccc (Author), quoting "tensor_parallel_size":

Sorry, I didn't get it. Do you mean the variable name should be tensor_parallel_size?
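If the suggestion is simply to rename the local variable for clarity, the change would look something like the line below (a guess at the reviewer's intent, not a confirmed diff):

```python
# Hypothetical rename reflecting the review comment: the value is the
# tensor-model-parallel group size, not the global world size.
tensor_parallel_size = mpu.get_tensor_model_parallel_world_size()
```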

megatron/model/transformer.py (outdated, resolved)
megatron/model/transformer.py (outdated, resolved)
megatron/model/transformer.py (outdated, resolved)
megatron/tokenizer/tokenizer.py (outdated, resolved)
@qyccc (Author) commented Dec 20, 2023

@li-yi-dong Thanks for your time and careful review. I have made the necessary changes and addressed the comments you mentioned. Please take another look at the updated version at your convenience.

megatron/training.py (outdated, resolved)
megatron/training.py (outdated, resolved)
@li-yi-dong (Collaborator) left a review comment

Big thanks for your efforts and patience. I added some comments to resolve.

megatron/model/transformer.py (outdated, resolved)
@@ -1222,11 +1286,106 @@ def set_input_tensor(self, input_tensor):
        forward_step_func"""
        self.input_tensor = input_tensor

    def _build_alibi_tensor(self, tensor, max_seq_len, num_attention_heads):
Collaborator:

Place this func together with alibi_mask_func.

@qyccc (Author) replied Jan 2, 2024:

This func requires the internal variable first_run, so it cannot be placed in the utils.
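For readers following the thread, the caching pattern being described looks roughly like the sketch below (modeled on the Baichuan2 reference code; the attribute names and the build_alibi_bias helper from the earlier sketch are illustrative, not the PR's exact code). Because the method mutates per-instance state, it naturally lives on the model class rather than in a stateless utils module:

```python
import torch

class AlibiMixin:
    """Sketch of on-module caching of the ALiBi mask, assuming the
    first_run / max_cache_pos attributes mentioned in the thread."""

    def __init__(self):
        self.first_run = True
        self.max_cache_pos = 0
        self.alibi_mask = None

    def _build_alibi_tensor(self, tensor, max_seq_len, num_attention_heads):
        # Rebuild only on the first call or when a longer sequence arrives;
        # otherwise slice the cached mask to the requested length.
        if self.first_run or max_seq_len > self.max_cache_pos:
            self.first_run = False
            self.max_cache_pos = max_seq_len
            # build_alibi_bias is the illustrative helper sketched earlier.
            self.alibi_mask = build_alibi_bias(
                num_attention_heads, max_seq_len
            ).to(device=tensor.device, dtype=tensor.dtype)
        return self.alibi_mask[:, :max_seq_len, :max_seq_len]
```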

@qyccc requested a review from li-yi-dong on January 3, 2024, 03:25