
torch.compile + CUDA Graph optimization for bs=1 #272

Open
YJYJLee wants to merge 4 commits into main
Conversation

@YJYJLee (Contributor) commented on Jan 18, 2024

PR for a PyTorch blog post.

Summary:
This post is the fourth part of a multi-part blog series on how to accelerate generative AI models with pure, native PyTorch. In this post, we focus on speeding up FAIR's Seamless M4T-v2 model using CUDA Graphs and the native PyTorch optimization torch.compile: a 2x speedup for the text decoder module and a 30x speedup for the vocoder module yield a 2.7x end-to-end inference speedup, with no loss of accuracy.

End-to-end Inference Speedup
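
For context, torch.compile's `mode="reduce-overhead"` is the setting that enables CUDA Graph capture in the inductor backend. The sketch below is a minimal, hypothetical illustration of that pattern at batch size 1; it is not code from this PR or from the Seamless M4T-v2 model (the `Decoder` module, its dimensions, and the warm-up loop are placeholder assumptions).

```python
# Minimal sketch (assumptions, not the Seamless M4T-v2 code): compile a small
# decoder-style module with CUDA Graphs enabled and run it at batch size 1.
# Requires a CUDA-capable GPU.
import torch


class Decoder(torch.nn.Module):
    """Placeholder stand-in for a text decoder step."""

    def __init__(self, dim: int = 1024, vocab: int = 10000):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)
        self.out = torch.nn.Linear(dim, vocab)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(torch.relu(self.proj(x)))


model = Decoder().cuda().eval()

# "reduce-overhead" asks torch.compile to capture CUDA Graphs, removing
# per-kernel launch overhead -- the dominant cost at batch size 1.
compiled = torch.compile(model, mode="reduce-overhead")

x = torch.randn(1, 1, 1024, device="cuda")  # batch size 1
with torch.inference_mode():
    for _ in range(3):
        compiled(x)        # warm-up calls trigger compilation and graph capture
    out = compiled(x)      # later calls replay the captured CUDA Graph
```

At batch size 1 each decode step is dominated by kernel-launch overhead rather than compute, so replaying one captured graph per step is where this kind of optimization gets most of its gain.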

@facebook-github-bot added the CLA Signed label on Jan 18, 2024
@YJYJLee changed the title from "Pytorch blog" to "torch.compile + CUDA Graph optimization for bs=1" on Jan 18, 2024
Labels: CLA Signed
Projects: None yet
Development: no linked issues
2 participants