Activity · pytorch/pytorch — 2024-06-07

2024-06-07 16:59 UTC — Howard Huang (H-Huang) created branches gh/H-Huang/125/{orig,base,head}:

  [pipelining] pipelining.rst updates
  Pull Request: https://github.com/pytorch/pytorch/pull/128228

  Update on "[pipelining] Remove num_microbatches from stage"
  This is similar to https://github.com/pytorch/pytorch/pull/127979, but instead of removing `num_microbatches` from the schedule, we remove it from `PipelineStage`. This also means that during `PipelineSchedule` init we need to set up the buffers for the stage(s).

2024-06-07 16:58 UTC — Andrew James (amjames) pushed to gh/amjames/21/{orig,head}:

  Update triton pin to improve throughput w/ assert
  Triton #3868 (https://github.com/triton-lang/triton/pull/3868) adds a no-return annotation to generated calls to `__assertFail`. This prevents unnecessary register reservation and should help address #120452.
  Pull Request: https://github.com/pytorch/pytorch/pull/126098

2024-06-07 16:53 UTC — pytorchmergebot pushed to main:

  [c10d][TCPStore] make TCPStore server use libuv by default (#127957)

  Summary: This PR switches the default TCPStore server backend to a new implementation that utilizes libuv (https://github.com/libuv/libuv) for significantly lower initialization time and better scalability. We hope this improvement will give users a much shorter startup time in large-scale jobs. Eventually, we hope to fully replace the old TCPStore backend implementation with the libuv one.

  What it changes: This PR changes the underlying TCPStore server backend to libuv unless users explicitly specify the old TCPStore server. The change should not be noticeable to users, apart from significantly faster TCPStore startup in large-scale jobs. One thing to note: the initialization approach where the user passes in a socket is not yet supported by the libuv backend. We plan to support it as a next step, but chose to disable it before it is fully tested. If you initialize TCPStore this way, see the fallback options below to keep using the old TCPStore server.

  Fallback (remaining on the old TCPStore server) — there are three ways:
  1. If directly instantiating a TCPStore object, pass `use_libuv=False` to use the old backend, e.g. `store = torch.distributed.TCPStore(..., use_libuv=False)`.
  2. Or, specify the TCPStore backend option in `init_method` when calling the default ProcessGroup init, e.g. `torch.distributed.init_process_group(..., init_method="{YOUR_RENDEZVOUS_METHOD}://{YOUR_HOSTNAME}:{YOUR_PORT}?use_libuv=0")`.
  3. Or, set the environment variable `USE_LIBUV` to `"0"` when launching.

  These three approaches are listed in order of precedence. That is, if the user specifies `use_libuv=0` in `init_method` and also sets the environment variable `USE_LIBUV="1"`, the former takes effect and the old (non-libuv) TCPStore backend is instantiated.

  Operating system compatibility: From the CI signals, we believe the new implementation behaves the same as the old TCPStore server on all supported platforms. If you notice any behavioral discrepancy, please file an issue with the `oncall: distributed` label.

  Test plan: `pytest test/distributed/test_store.py` and `test/distributed/elastic/utils/distributed_test.py`. Note: `TestMultiThreadedWait::test_wait` is a broken test that has been there for some time.

  TODO:
  1. Update the docs at https://pytorch.org/docs/stable/distributed.html#distributed-key-value-store and https://pytorch.org/docs/stable/distributed.html#tcp-initialization.
  2. Make torch elastic rendezvous use the libuv TCPStore as well. See `torch/distributed/elastic/rendezvous/c10d_rendezvous_backend.py`.
  3. Test whether the libuv backend works with socket-based initialization. Change `LibUvTCPStoreTest::test_take_over_listen_socket`.

  Differential Revision: D58259591. Pull Request: https://github.com/pytorch/pytorch/pull/127957. Approved by: kurman. ghstack dependencies: #127956.

2024-06-07 16:46 UTC — pytorchmergebot pushed to main:

  [pipelining] fix LoopedBFS (#127796)

  Issues: Two issues currently need to be fixed in LoopedBFS:
  1. The wrap-around send operation to the looped-around stage blocks and causes a hang. For some reason this does not surface on a single node, but it does surface as a hang on multi-host runs.
  2. When microbatches are popped off in `backward_one_chunk`, the `bwd_chunk_id` automatically starts from 0. This works for 1F1B and interleaved 1F1B, but for LoopedBFS we want to start popping at `num_microbatches - 1`. The same may need to be fixed for GPipe.

  Changes:
  - Update the LoopedBFS implementation to share `_step_microbatches` with `Interleaved1F1B`.
  - Share the tests between the two schedules for varying num_microbatches, local_stages, and world_sizes.
  - Update `backward_one_chunk` to optionally take a `bwd_chunk_id` argument.

  Pull Request: https://github.com/pytorch/pytorch/pull/127796. Approved by: wconstab.

2024-06-07 16:46 UTC — pytorchmergebot pushed to main:

  [AOTInductor] [Tooling] Update NaN and INF Checker for AOTInductor (#127574)

  Summary:
  1. Integrate the NaN and INF checker with the existing config, controllable by an environment variable.
  2. Move the injection point of the NaN & INF checker earlier, which prevents buffers from being freed before the check.
  3. Inject the debugging code at the kernel level, which avoids trying to read buffers that are fused in place into a single kernel.

  Test plan (debugging utility; check via existing tests with the env vars set):

    TORCHINDUCTOR_NAN_ASSERTS=1 TORCHINDUCTOR_MAX_AUTOTUNE=0 python test/inductor/test_aot_inductor.py -k AOTInductorTestNonABICompatibleCuda.test_seq_non_abi_compatible_cuda

  Reviewed by: ColinPeppler. Differential Revision: D57989176. Pull Request: https://github.com/pytorch/pytorch/pull/127574. Approved by: chenyang78, desertfire.

2024-06-07 16:42 UTC — pytorchmergebot pushed to gh/chunyuan-w/14/{orig,head,base}:

  [AOTI] align data_size of the constants
  Pull Request: https://github.com/pytorch/pytorch/pull/127610

2024-06-07 16:39 UTC — Xuehai Pan (XuehaiPan) pushed to gh/XuehaiPan/42/orig and gh/XuehaiPan/43/orig:

  [BE] enable UFMT for `torch/storage.py`
  Pull Request: https://github.com/pytorch/pytorch/pull/127706

  [BE] enable UFMT for top-level files `torch/*.py`
  Pull Request: https://github.com/pytorch/pytorch/pull/127707
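The TCPStore commit message above describes three ways to fall back to the old backend and states they apply in order of precedence (constructor argument, then `init_method` query parameter, then the `USE_LIBUV` environment variable). As a hedged illustration only, that resolution order can be sketched in plain Python; the helper `resolve_use_libuv` and its signature are hypothetical, written to mirror the precedence described in the commit message, not PyTorch's actual internal logic:

```python
import os
from urllib.parse import parse_qs, urlparse


def resolve_use_libuv(init_method=None, use_libuv_arg=None, env=os.environ):
    """Hypothetical sketch of the precedence described in PR #127957:
    1. an explicit TCPStore(use_libuv=...) argument,
    2. a ?use_libuv= query parameter in init_method,
    3. the USE_LIBUV environment variable,
    falling back to the new default (libuv enabled)."""
    if use_libuv_arg is not None:      # 1. direct constructor argument wins
        return bool(use_libuv_arg)
    if init_method is not None:        # 2. then the init_method query string
        params = parse_qs(urlparse(init_method).query)
        if "use_libuv" in params:
            return params["use_libuv"][0] != "0"
    if "USE_LIBUV" in env:             # 3. finally the environment variable
        return env["USE_LIBUV"] != "0"
    return True                        # libuv is now the default


# The commit message's example: use_libuv=0 in init_method takes
# precedence over USE_LIBUV="1", so the old backend is selected.
print(resolve_use_libuv(
    init_method="tcp://localhost:29500?use_libuv=0",
    env={"USE_LIBUV": "1"},
))  # → False
```

This only models the documented precedence; the actual backend selection happens inside `torch.distributed` and may differ in detail.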