
Sequence Parallelism #22

Open · 6 tasks

xrsrke opened this issue Oct 25, 2023 · 1 comment
xrsrke commented Oct 25, 2023

Implement distributed attention following the approach in LightSeq, Colossal-AI, or DeepSpeed's sequence parallelism; we have not decided which one yet.

import torch

from pipegoose.nn.sequence_parallel.attention import DistributedAttention

# embed_dim, num_heads, q, k, v, and parallel_context are assumed to be defined.
local_attention = torch.nn.MultiheadAttention(embed_dim, num_heads)
attention = DistributedAttention(local_attention, parallel_context)
outputs = attention(q, k, v)

# MultiheadAttention returns (output, attention weights), so compare against the
# first element, and use allclose rather than == for floating-point tensors.
assert torch.allclose(outputs, local_attention(q, k, v)[0])
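
A minimal sketch of how such a wrapper could work, assuming the inputs are sharded along the sequence dimension. The class name, the process-group argument, and the all-gather of keys/values are illustrative assumptions only, not pipegoose's, LightSeq's, or DeepSpeed's actual implementation (those exchange shards via all-to-all or point-to-point communication instead of an all-gather):

import torch
import torch.distributed as dist
from torch import nn


class NaiveDistributedAttention(nn.Module):
    """Illustrative only: each rank's query shard attends over all-gathered keys/values."""

    def __init__(self, local_attention: nn.Module, group: dist.ProcessGroup):
        super().__init__()
        self.local_attention = local_attention
        self.group = group

    def forward(self, q, k, v):
        # q, k, v: [local_seq_len, batch, embed_dim], sharded along the sequence dim.
        world_size = dist.get_world_size(self.group)

        # Gather the full keys and values so this rank can attend over the whole sequence.
        k_shards = [torch.empty_like(k) for _ in range(world_size)]
        v_shards = [torch.empty_like(v) for _ in range(world_size)]
        dist.all_gather(k_shards, k.contiguous(), group=self.group)
        dist.all_gather(v_shards, v.contiguous(), group=self.group)
        k_full = torch.cat(k_shards, dim=0)
        v_full = torch.cat(v_shards, dim=0)

        # Local attention: this rank's query shard against the complete keys/values.
        output, _ = self.local_attention(q, k_full, v_full)

        # Concatenating `output` across ranks along the sequence dim gives the full result.
        return output

On each rank this would wrap the same local torch.nn.MultiheadAttention and take the sequence-parallel process group; the sanity check after the TODO list shows why the per-rank outputs concatenate to the full attention output.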

TODOs

  • Take all the Triton kernels from LightSeq and structure them in a modular way: do not call a kernel directly, but go through a middle-man function.
  • Sequence parallelism scheduler.
  • Send and receive the query and key shards.
  • Calculate local attention.
  • Obtain the complete attention output (see the sanity check after this list).
  • Activation checkpointing.
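
To make the last two attention steps concrete, here is a small single-process sanity check in plain PyTorch (no pipegoose, no torch.distributed; the names are placeholders): softmax is computed per query row, so a query shard attending over the full keys and values produces exactly its rows of the full attention output, and concatenating the shards recovers the complete result.

import torch


def attention(q, k, v):
    # q: [q_len, dim], k and v: [kv_len, dim]
    scores = (q @ k.T) / k.size(-1) ** 0.5
    return torch.softmax(scores, dim=-1) @ v


seq_len, dim, world_size = 8, 16, 2
q, k, v = (torch.randn(seq_len, dim) for _ in range(3))

reference = attention(q, k, v)

# Simulate sequence parallelism: each "rank" owns a contiguous shard of the queries
# and attends over the complete keys/values it has received from the other ranks.
local_outputs = [attention(q_shard, k, v) for q_shard in q.chunk(world_size, dim=0)]

# Concatenating the local outputs recovers the complete attention output.
assert torch.allclose(torch.cat(local_outputs, dim=0), reference, atol=1e-6)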

Reading

xrsrke self-assigned this Oct 25, 2023
@3outeille (Collaborator) commented

on it
