
Add fake impl for aten.unique_dim #126561

Closed
wants to merge 6 commits

Conversation

@a-gardner1 (Contributor) commented May 17, 2024

Follow-up to #113118 and #124306.

Developed in coordination with the solution to microsoft/onnxscript#1547

This PR adds the missing fake tensor implementation for aten.unique_dim, thus enabling tracing and compilation of torch.unique when dim is not None.

Local testing has proceeded with the following simple script (provided that one has checked out the changes in microsoft/onnxscript#1547):

    import torch
    import onnx
    import onnxruntime as ort
    import logging
    import numpy as np
    onnx_program = torch.onnx.dynamo_export(
        lambda x: torch.unique(x,
                               dim=0,
                               return_inverse=True),
        torch.arange(10),
        export_options=torch.onnx.ExportOptions(
            dynamic_shapes=True,
            diagnostic_options=torch.onnx.DiagnosticOptions(
                verbosity_level=logging.DEBUG)))
    onnx_program.save("torch_unique.onnx")
    onnx_inputs = onnx_program.adapt_torch_inputs_to_onnx(torch.arange(10))
    onnx_outputs = onnx_program(*onnx_inputs)
    loaded_onnx_program = onnx.load("torch_unique.onnx")
    onnx.checker.check_model(loaded_onnx_program)
    ort_session = ort.InferenceSession("torch_unique.onnx")
    inputs = np.random.randint(0, 10, 10)
    print(f"Inputs: {inputs}")
    outputs = ort_session.run(None,
                              {
                                  "l_x_": inputs
                              })
    print(f"Outputs: {outputs}")
    print("Success")

cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang

pytorch-bot bot commented May 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126561

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 1 Unrelated Failure

As of commit b1455fa with merge base 3f28906:

NEW FAILURE - The following job has failed:

  • pull / linux-focal-cuda12.4-py3.10-gcc9 / build (gh)
    /var/lib/jenkins/workspace/aten/src/ATen/cuda/CUDASparseDescriptors.h:119:68: error: ‘cusparseStatus_t cusparseCreateBsrsm2Info(bsrsm2Info**)’ is deprecated: The routine will be removed in the next major release [-Werror=deprecated-declarations]

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

  • pull / linux-focal-cuda12.4-py3.10-gcc9-sm86 / build (gh) (#127104)
    /var/lib/jenkins/workspace/aten/src/ATen/cuda/CUDASparseDescriptors.h:119:68: error: ‘cusparseStatus_t cusparseCreateBsrsm2Info(bsrsm2Info**)’ is deprecated: The routine will be removed in the next major release [-Werror=deprecated-declarations]

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla bot commented May 17, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

if dim is None:
    arg_dim = arg
else:
    arg_dim = arg.new_empty((arg.shape[dim],))

Contributor

While I can believe you get correct results this way, this is written in a needlessly circuitous way. When writing a meta, you should prefer to directly allocate the result tensors; do not allocate intermediate tensors that are not actually returned.
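
For instance, a standalone sketch of the suggested shape, with concrete stand-ins (arg for the input tensor, nnz for the data-dependent unique count):

    import torch

    arg = torch.randint(0, 3, (6, 4))
    dim, nnz = 0, 3
    # Allocate the returned tensor directly, sized by nnz along dim,
    # rather than first materializing an intermediate arg_dim tensor.
    values = arg.new_empty(*arg.shape[:dim], nnz, *arg.shape[dim + 1:])
    print(values.shape)  # torch.Size([3, 4])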

Contributor Author

Addressed in e040b23

    arg_dim = arg
else:
    arg_dim = arg.new_empty((arg.shape[dim],))

if (nnz := arg.unique_memo) is None:

Contributor

The memo strategy needs to be adjusted here, because the unique count along one dimension differs from the counts along other dimensions (and from the flattened unique). You are currently sharing the memo for everything.
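
A concrete eager-mode illustration of the mismatch (values chosen only for illustration):

    import torch

    x = torch.tensor([[0, 1], [0, 1], [2, 3]])
    print(torch.unique(x).numel())          # 4 unique scalar values
    print(torch.unique(x, dim=0).shape[0])  # 2 unique rows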

Contributor Author

I'll admit I haven't dug deep enough to understand what is going on with the memos yet.

Are the memos even necessary? This comment implies to me that they may not be needed for unique.

Contributor

I think the simplest resolution here is to just not use the memo for unique dim

Contributor Author

Adopted the suggested resolution in e040b23

@ezyang (Contributor) left a comment

It also needs tests, including for the memoization problem I reported.

@a-gardner1

This comment was marked as resolved.

@mikaylagawarecki added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label May 21, 2024
if dim is None:
    ret = [arg.new_empty((nnz,))]
else:
    ret = [arg.new_empty(*arg.shape[:dim], nnz, *arg.shape[dim + 1:])]

if return_inverse:

Contributor Author

CPU and CUDA differ for unique_dim in how they handle return_inverse and return_counts: CPU ignores the arguments and always returns both, whereas CUDA does not.

I'm not sure whether that distinction matters at this level of abstraction, but if it does, I assume we should favor the CUDA behavior?

Contributor

Hmm, this sounds like an eager mode bug. But we have the ability to query the device using fake_device, so you can give the exact behavior that eager has (and yes, it would be good to do that now).
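
Roughly, a sketch of branching on the mirrored device (fake_device is the device a FakeTensor stands in for; the branch bodies are placeholders, not the actual impl):

    import torch
    from torch._subclasses.fake_tensor import FakeTensorMode

    fake_mode = FakeTensorMode()
    fx = fake_mode.from_tensor(torch.arange(6).reshape(3, 2))
    if fx.fake_device.type == "cpu":
        pass  # mirror CPU eager: always allocate inverse and counts
    else:
        pass  # mirror CUDA eager: honor return_inverse / return_counts
    print(fx.fake_device)  # cpu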

Contributor Author

To clarify, this difference arises from their respective source implementations (CPU vs. CUDA). One can note that the return_inverse and return_counts arguments are unused in _unique_dim_cpu_template.

Perhaps this is still something to do with eager mode; I don't know enough about the inner workings and dispatches to rule that out myself.

@a-gardner1 (Contributor Author)

> It also needs tests, including for the memoization problem I reported.

Since the memoization issue is sidestepped by e040b23, I believe the existing tests in test/test_ops.py should suffice. I have confirmed that they cover the fake impls modified by this PR. Let me know if you think more is required.

@ezyang (Contributor) commented May 22, 2024

I'm hoping an xfail test starts succeeding, otherwise a manual test will be needed
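
If a manual test does end up being needed, a minimal sketch could be the following; the ShapeEnv argument allow_dynamic_output_shape_ops is assumed to be the right switch for exercising the new impl rather than raising DynamicOutputShapeException:

    import torch
    from torch._subclasses.fake_tensor import FakeTensorMode
    from torch.fx.experimental.symbolic_shapes import ShapeEnv

    fake_mode = FakeTensorMode(shape_env=ShapeEnv(allow_dynamic_output_shape_ops=True))
    x = fake_mode.from_tensor(torch.randint(0, 5, (6, 3)))
    with fake_mode:
        values, inverse = torch.unique(x, dim=0, return_inverse=True)
    # The unique-count dimension is data-dependent (an unbacked symint).
    print(values.shape, inverse.shape)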

@a-gardner1 marked this pull request as draft May 22, 2024 19:16
@@ -2522,8 +2522,8 @@ def map_to_fake(e):
                 or name in sometimes_dynamic_output_op_test
             )
             self.assertTrue(
-                mode.shape_env is None
-                or not mode.shape_env.allow_dynamic_output_shape_ops
+                fake_mode.shape_env is None
+                or not fake_mode.shape_env.allow_dynamic_output_shape_ops

Contributor Author

> I'm hoping an xfail test starts succeeding, otherwise a manual test will be needed

Previously, a DynamicOutputShapeException was raised when dim was not None, but it was caught here:

    except torch._subclasses.fake_tensor.DynamicOutputShapeException:

Traceback
Traceback (most recent call last):
  File "test/test_ops.py", line 2480, in run_with_fake_mode_and_verify
    res_fake = op(input, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/testing/_internal/opinfo/core.py", line 1132, in __call__
    return self.op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/_jit_internal.py", line 502, in fn
    return if_false(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/_jit_internal.py", line 502, in fn
    return if_false(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/functional.py", line 996, in _return_output
    output, _, _ = _unique_impl(input, sorted, return_inverse, return_counts, dim)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/functional.py", line 902, in _unique_impl
    output, inverse_indices, counts = _VF.unique_dim(
                                      ^^^^^^^^^^^^^^^
  File "torch/utils/_stats.py", line 20, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "torch/_subclasses/fake_tensor.py", line 973, in __torch_dispatch__
    return self.dispatch(func, types, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/_subclasses/fake_tensor.py", line 1362, in dispatch
    return self._cached_dispatch_impl(func, types, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/_subclasses/fake_tensor.py", line 1065, in _cached_dispatch_impl
    output = self._dispatch_impl(func, types, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/_subclasses/fake_tensor.py", line 1642, in _dispatch_impl
    op_impl_out = op_impl(self, func, *args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/_subclasses/fake_impls.py", line 258, in dyn_shape
    raise DynamicOutputShapeException(func)
torch._subclasses.fake_tensor.DynamicOutputShapeException: aten.unique_dim.default

However, it was erroneously handled because the incorrect mode was used in the exception handler. Switching to fake_mode instead of mode caused two failures and two errors prior to the change in this PR.

The exception is no longer raised because an implementation for unique_dim can be found.

Contributor

nice catch!

@a-gardner1 marked this pull request as ready for review May 22, 2024 20:12
@a-gardner1 requested a review from mruberry as a code owner May 22, 2024 20:12
@ezyang (Contributor) commented May 28, 2024

ayyy, just have to remove the xfail now

@ezyang (Contributor) commented May 29, 2024

@pytorchbot merge

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label May 29, 2024
@pytorchmergebot (Collaborator)

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: Raised by workflow job

@ezyang added the topic: not user facing (topic category) label May 29, 2024
@ezyang (Contributor) commented May 29, 2024

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: Raised by workflow job

Failing merge rule: Core Maintainers

@ezyang (Contributor) commented May 31, 2024

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: Raised by workflow job

Failing merge rule: Core Maintainers

@ezyang (Contributor) commented May 31, 2024

@pytorchbot merge -i

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged while ignoring the following 3 checks: pull / linux-focal-cuda12.4-py3.10-gcc9-sm86 / build, pull / linux-focal-cuda12.4-py3.10-gcc9 / build, trunk / macos-13-py3-arm64 / test (default, 1, 3, macos-m1-stable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

@pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@ezyang (Contributor) commented Jun 1, 2024

@pytorchbot merge -f "only unrelated failures"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

Labels
ciflow/trunk (Trigger trunk jobs on your pull request), Merged, oncall: pt2, open source, topic: not user facing (topic category), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)