
Add fake impl for aten.unique_dim #126561

Closed
wants to merge 6 commits

Conversation

@a-gardner1 (Contributor) commented May 17, 2024

Follow-up to #113118 and #124306.

Developed in coordination with the solution to microsoft/onnxscript#1547

This PR adds the missing fake tensor implementation for aten.unique_dim, thus enabling tracing and compilation of torch.unique when dim is not None.

Local testing has proceeded with the following simple script (provided that one has checked out the changes in microsoft/onnxscript#1547):

    import torch
    import onnx
    import onnxruntime as ort
    import logging
    import numpy as np
    onnx_program = torch.onnx.dynamo_export(
        lambda x: torch.unique(x,
                               dim=0,
                               return_inverse=True),
        torch.arange(10),
        export_options=torch.onnx.ExportOptions(
            dynamic_shapes=True,
            diagnostic_options=torch.onnx.DiagnosticOptions(
                verbosity_level=logging.DEBUG)))
    onnx_program.save("torch_unique.onnx")
    onnx_inputs = onnx_program.adapt_torch_inputs_to_onnx(torch.arange(10))
    onnx_outputs = onnx_program(*onnx_inputs)
    loaded_onnx_program = onnx.load("torch_unique.onnx")
    onnx.checker.check_model(loaded_onnx_program)
    ort_session = ort.InferenceSession("torch_unique.onnx")
    inputs = np.random.randint(0, 10, 10)
    print(f"Inputs: {inputs}")
    outputs = ort_session.run(None,
                              {
                                  "l_x_": inputs
                              })
    print(f"Outputs: {outputs}")
    print("Success")

cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang

pytorch-bot bot commented May 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126561

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 1 Unrelated Failure

As of commit b1455fa with merge base 3f28906:

NEW FAILURE - The following job has failed:

  • pull / linux-focal-cuda12.4-py3.10-gcc9 / build (gh)
    /var/lib/jenkins/workspace/aten/src/ATen/cuda/CUDASparseDescriptors.h:119:68: error: ‘cusparseStatus_t cusparseCreateBsrsm2Info(bsrsm2Info**)’ is deprecated: The routine will be removed in the next major release [-Werror=deprecated-declarations]

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

  • pull / linux-focal-cuda12.4-py3.10-gcc9-sm86 / build (gh) (#127104)
    /var/lib/jenkins/workspace/aten/src/ATen/cuda/CUDASparseDescriptors.h:119:68: error: ‘cusparseStatus_t cusparseCreateBsrsm2Info(bsrsm2Info**)’ is deprecated: The routine will be removed in the next major release [-Werror=deprecated-declarations]

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla bot commented May 17, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

if dim is None:
    arg_dim = arg
else:
    arg_dim = arg.new_empty((arg.shape[dim],))

Contributor

While I can believe you get correct results this way, this is written in a needlessly circuitous way. When writing a meta, you should prefer to directly allocate the result tensors; do not allocate intermediate tensors that are not actually returned.
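
For instance, a standalone sketch of the suggested shape, with concrete stand-ins (arg for the input tensor, nnz for the data-dependent unique count):

    import torch

    arg = torch.randint(0, 3, (6, 4))
    dim, nnz = 0, 3
    # Allocate the returned tensor directly, sized by nnz along dim,
    # rather than first materializing an intermediate arg_dim tensor.
    values = arg.new_empty(*arg.shape[:dim], nnz, *arg.shape[dim + 1:])
    print(values.shape)  # torch.Size([3, 4])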

Contributor Author

Addressed in e040b23

    arg_dim = arg
else:
    arg_dim = arg.new_empty((arg.shape[dim],))

if (nnz := arg.unique_memo) is None:

Contributor

The memo strategy needs to be adjusted here, because the unique count along one dimension differs from the counts along other dimensions (and from the flattened unique). You are currently sharing the memo for everything.
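
A concrete eager-mode illustration of the mismatch (values chosen only for illustration):

    import torch

    x = torch.tensor([[0, 1], [0, 1], [2, 3]])
    print(torch.unique(x).numel())          # 4 unique scalar values
    print(torch.unique(x, dim=0).shape[0])  # 2 unique rows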

Contributor Author

I'll admit I haven't dug deep enough to understand what is going on with the memos yet.

Are the memos even necessary? This comment implies to me that they may not be needed for unique.

Contributor

I think the simplest resolution here is to just not use the memo for unique dim

Contributor Author

Adopted the suggested resolution in e040b23

@ezyang (Contributor) left a comment

It also needs tests, including for the memoization problem I reported.

@a-gardner1

This comment was marked as resolved.

@mikaylagawarecki added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label May 21, 2024
if dim is None:
    ret = [arg.new_empty((nnz,))]
else:
    ret = [arg.new_empty(*arg.shape[:dim], nnz, *arg.shape[dim + 1:])]

if return_inverse:

Contributor Author

CPU and CUDA differ for unique_dim in how they handle return_inverse and return_counts: CPU ignores the arguments and always returns both, whereas CUDA does not.

I'm not sure whether that distinction matters at this level of abstraction, but if it does, I assume we should favor the CUDA behavior?

Contributor

Hmm, this sounds like an eager mode bug. But we have the ability to query the device using fake_device, so you can give the exact behavior that eager has (and yes, it would be good to do that now).
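
Roughly, a sketch of branching on the mirrored device (fake_device is the device a FakeTensor stands in for; the branch bodies are placeholders, not the actual impl):

    import torch
    from torch._subclasses.fake_tensor import FakeTensorMode

    fake_mode = FakeTensorMode()
    fx = fake_mode.from_tensor(torch.arange(6).reshape(3, 2))
    if fx.fake_device.type == "cpu":
        pass  # mirror CPU eager: always allocate inverse and counts
    else:
        pass  # mirror CUDA eager: honor return_inverse / return_counts
    print(fx.fake_device)  # cpu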

Contributor Author

To clarify, this difference arises from their respective source implementations (CPU vs. CUDA). One can note that the return_inverse and return_counts arguments are unused in _unique_dim_cpu_template.

Perhaps this is still something to do with eager mode; I don't know enough about the inner workings and dispatches to rule that out myself.

@a-gardner1 (Contributor Author)

> It also needs tests, including for the memoization problem I reported.

Since the memoization issue is sidestepped by e040b23, I believe the existing tests in test/test_ops.py should suffice. I have confirmed that they cover the fake impls modified by this PR. Let me know if you think more is required.

@ezyang (Contributor) commented May 22, 2024

I'm hoping an xfail test starts succeeding, otherwise a manual test will be needed
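
If a manual test does end up being needed, a minimal sketch could be the following; the ShapeEnv argument allow_dynamic_output_shape_ops is assumed to be the right switch for exercising the new impl rather than raising DynamicOutputShapeException:

    import torch
    from torch._subclasses.fake_tensor import FakeTensorMode
    from torch.fx.experimental.symbolic_shapes import ShapeEnv

    fake_mode = FakeTensorMode(shape_env=ShapeEnv(allow_dynamic_output_shape_ops=True))
    x = fake_mode.from_tensor(torch.randint(0, 5, (6, 3)))
    with fake_mode:
        values, inverse = torch.unique(x, dim=0, return_inverse=True)
    # The unique-count dimension is data-dependent (an unbacked symint).
    print(values.shape, inverse.shape)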

@a-gardner1 marked this pull request as draft May 22, 2024 19:16
@@ -2522,8 +2522,8 @@ def map_to_fake(e):
                 or name in sometimes_dynamic_output_op_test
             )
             self.assertTrue(
-                mode.shape_env is None
-                or not mode.shape_env.allow_dynamic_output_shape_ops
+                fake_mode.shape_env is None
+                or not fake_mode.shape_env.allow_dynamic_output_shape_ops

Contributor Author

> I'm hoping an xfail test starts succeeding, otherwise a manual test will be needed

Previously, a DynamicOutputShapeException was raised when dim was not None, but it was caught here:

    except torch._subclasses.fake_tensor.DynamicOutputShapeException:

Traceback
Traceback (most recent call last):
  File "test/test_ops.py", line 2480, in run_with_fake_mode_and_verify
    res_fake = op(input, *args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/testing/_internal/opinfo/core.py", line 1132, in __call__
    return self.op(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/_jit_internal.py", line 502, in fn
    return if_false(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/_jit_internal.py", line 502, in fn
    return if_false(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/functional.py", line 996, in _return_output
    output, _, _ = _unique_impl(input, sorted, return_inverse, return_counts, dim)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/functional.py", line 902, in _unique_impl
    output, inverse_indices, counts = _VF.unique_dim(
                                      ^^^^^^^^^^^^^^^
  File "torch/utils/_stats.py", line 20, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "torch/_subclasses/fake_tensor.py", line 973, in __torch_dispatch__
    return self.dispatch(func, types, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/_subclasses/fake_tensor.py", line 1362, in dispatch
    return self._cached_dispatch_impl(func, types, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/_subclasses/fake_tensor.py", line 1065, in _cached_dispatch_impl
    output = self._dispatch_impl(func, types, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/_subclasses/fake_tensor.py", line 1642, in _dispatch_impl
    op_impl_out = op_impl(self, func, *args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/_subclasses/fake_impls.py", line 258, in dyn_shape
    raise DynamicOutputShapeException(func)
torch._subclasses.fake_tensor.DynamicOutputShapeException: aten.unique_dim.default

However, it was erroneously handled because the incorrect mode was used in the exception handler. Switching to fake_mode instead of mode caused two failures and two errors prior to the change in this PR.

The exception is no longer raised because an implementation for unique_dim can be found.

Contributor

nice catch!

@a-gardner1 marked this pull request as ready for review May 22, 2024 20:12
@a-gardner1 requested a review from mruberry as a code owner May 22, 2024 20:12
@ezyang (Contributor) commented May 28, 2024

ayyy, just have to remove the xfail now

@ezyang (Contributor) commented May 29, 2024

@pytorchbot merge

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label May 29, 2024
@pytorchmergebot (Collaborator)

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: Raised by workflow job

@ezyang added the topic: not user facing (topic category) label May 29, 2024
@ezyang (Contributor) commented May 29, 2024

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: Raised by workflow job

Failing merge rule: Core Maintainers

@ezyang (Contributor) commented May 31, 2024

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: Raised by workflow job

Failing merge rule: Core Maintainers

@ezyang (Contributor) commented May 31, 2024

@pytorchbot merge -i

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged while ignoring the following 3 checks: pull / linux-focal-cuda12.4-py3.10-gcc9-sm86 / build, pull / linux-focal-cuda12.4-py3.10-gcc9 / build, trunk / macos-13-py3-arm64 / test (default, 1, 3, macos-m1-stable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

@pytorchmergebot (Collaborator)

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@ezyang (Contributor) commented Jun 1, 2024

@pytorchbot merge -f "only unrelated failures"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here.

Labels
ciflow/trunk (Trigger trunk jobs on your pull request), Merged, oncall: pt2, open source, topic: not user facing (topic category), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)