
[GCU] Support llama for GCU #8445

Merged
merged 1 commit into PaddlePaddle:develop on May 17, 2024

Conversation

EnflameGCU
Contributor

PR types

New features

PR changes

Models

Description

Support llama for GCU


paddle-bot bot commented May 15, 2024

Thanks for your contribution!


codecov bot commented May 15, 2024

Codecov Report

Attention: Patch coverage is 36.00000%, with 16 lines in your changes missing coverage. Please review.

Project coverage is 54.29%. Comparing base (5170664) to head (32d66ef).
Report is 1 commit behind head on develop.

File | Patch % | Missing lines
paddlenlp/transformers/llama/fusion_ops.py | 10.00% | 9 ⚠️
paddlenlp/transformers/llama/modeling.py | 54.54% | 5 ⚠️
paddlenlp/generation/utils.py | 50.00% | 1 ⚠️
paddlenlp/utils/tools.py | 50.00% | 1 ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8445      +/-   ##
===========================================
- Coverage    55.42%   54.29%   -1.14%     
===========================================
  Files          617      617              
  Lines        96286    96340      +54     
===========================================
- Hits         53367    52303    -1064     
- Misses       42919    44037    +1118     

☔ View full report in Codecov by Sentry.

@@ -1528,7 +1535,7 @@ def forward(
                 attention_mask, (batch_size, seq_length), cache_length, inputs_embeds.dtype
             )  # [bs, 1, seq_len, seq_len]
         is_casual = False
-        if self.config.use_flash_attention:
+        if self.config.use_flash_attention and get_env_device() != "gcu":
Collaborator

What exactly is different about GCU in the attention-mask handling here?

Contributor Author

Because of how the use_flash_attention kernel is implemented, even in the is_casual case it needs an attention_mask in the same dtype as the input, rather than None or a bool-typed mask.
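
A minimal sketch of the distinction described above. The helper below is hypothetical (not the PR's code); only get_env_device and the [bs, 1, seq_len, seq_len] shape come from the diff:

import paddle

def build_causal_mask(batch_size, seq_len, dtype, device):
    # Hypothetical helper: on GCU the flash-attention kernel expects a dense,
    # additive mask in the same dtype as the inputs even for the purely causal
    # case, whereas other backends may pass None (or a bool mask) and let the
    # kernel apply causal masking internally.
    if device != "gcu":
        return None
    causal = paddle.tril(paddle.ones([seq_len, seq_len], dtype="bool"))
    # 0 where attention is allowed, a large negative value where it is masked.
    mask = paddle.where(
        causal,
        paddle.zeros([seq_len, seq_len], dtype=dtype),
        paddle.full([seq_len, seq_len], -1e4, dtype=dtype),
    )
    # Broadcast to [bs, 1, seq_len, seq_len], matching the shape comment in the diff.
    return mask.unsqueeze(0).unsqueeze(0).expand([batch_size, 1, seq_len, seq_len])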

@@ -297,6 +303,7 @@ def do_generation():
     parser = get_eval_parser()
     args = parser.parse_args()
     paddle.set_default_dtype(args.dtype)
+    paddle.set_device(args.device)
Collaborator

set_device is already called at the start of training; why is it set again here?

Contributor Author

eval.py is run standalone for testing, so there is no training startup point here. That said, even without this line the default should be the current device, so it can be removed.
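
For context, a runnable sketch of the standalone eval entry point being discussed; get_eval_parser here is a minimal stand-in, the real flags live in the PR's eval script:

import argparse
import paddle

def get_eval_parser():
    # Minimal stand-in for the PR's parser.
    parser = argparse.ArgumentParser()
    parser.add_argument("--dtype", default="float16")
    parser.add_argument("--device", default="gpu")  # e.g. "gpu", "cpu", or "gcu"
    return parser

if __name__ == "__main__":
    args = get_eval_parser().parse_args()
    paddle.set_default_dtype(args.dtype)
    # Standalone eval has no training startup that already calls set_device,
    # which is why the PR sets the device here as well.
    paddle.set_device(args.device)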

@@ -934,7 +941,7 @@ def forward(
                 sin.cast(value_states.dtype) if sin.dtype != value_states.dtype else sin,
             )
         else:
-            cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
+            cos, sin, _ = self.rotary_emb(value_states, seq_len=kv_seq_len)
Contributor

Do we really have to add the cos_sin optimization? It is a large code change, and it will slow down other devices by adding overhead for no reason.

Alternatively, build a cos_sin yourselves only when you actually need it.

Contributor Author

This is mainly because the kernel implementation follows the paper / vLLM and uses a sin/cos layout different from the one here. As for the overhead on other devices: first, most of the table computation happens only during initialization; second, following the suggestion in the first comment, we will do the computation only on the specific device, so other devices merely get one extra None returned.
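
A minimal sketch of the interface change under discussion. The class and the fused cos_sin layout are illustrative; only the three-value return mirrors the diff above:

import paddle

class RotaryEmbeddingSketch(paddle.nn.Layer):
    # Illustrative only: when fuse_cos_sin is False (other devices), the third
    # return value is just None, so the only extra cost is the extra tuple slot.
    def __init__(self, dim, base=10000, fuse_cos_sin=False):
        super().__init__()
        self.fuse_cos_sin = fuse_cos_sin
        inv_freq = 1.0 / (base ** (paddle.arange(0, dim, 2, dtype="float32") / dim))
        self.register_buffer("inv_freq", inv_freq)

    def forward(self, x, seq_len):
        t = paddle.arange(seq_len, dtype="float32")
        freqs = paddle.outer(t, self.inv_freq)        # [seq_len, dim/2]
        emb = paddle.concat([freqs, freqs], axis=-1)  # [seq_len, dim]
        cos, sin = paddle.cos(emb), paddle.sin(emb)
        # Device-specific fused table; for other devices this stays None.
        cos_sin = paddle.stack([cos, sin], axis=-1) if self.fuse_cos_sin else None
        return cos.cast(x.dtype), sin.cast(x.dtype), cos_sin

Callers then unpack three values, as in the diff: cos, sin, _ = self.rotary_emb(value_states, seq_len=kv_seq_len).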

@wawltor merged commit d9dcd9a into PaddlePaddle:develop on May 17, 2024
8 of 11 checks passed
Contributor

ZHUI commented May 20, 2024

https://xly.bce.baidu.com/paddlepaddle/Paddle-NLP/newipipe/detail/10720664/job/26276076

The rope interface change in this PR seems to have broken the auto-parallel code.
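
A toy illustration of the failure mode this points at (hypothetical code, not the actual auto-parallel modeling):

def rotary_emb(seq_len):
    # After this PR the rotary embedding returns three values instead of two.
    return "cos", "sin", None

try:
    cos, sin = rotary_emb(128)      # an un-updated caller, e.g. the auto-parallel path
except ValueError as err:
    print(err)                      # too many values to unpack (expected 2)

cos, sin, _ = rotary_emb(128)       # callers updated in this PR are fine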
