Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use cython to speed-up wrap angle #16326

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

maxnoe
Copy link
Member

@maxnoe maxnoe commented Apr 23, 2024

Description

This is a follow-up to #13510, #13497 and #16222 to further improve up 13479.

I now use cython.ufunc to get a large speedup, factors of 2 to 3 for larger arrays, but faster in all cases I tested compared to current main (which already includes the improvements of #16222).

I'll open a similar one for the checks in Latitude which remain a bottleneck.

I'd also suggest to keep open #13497 #13479 for the time being, it's still true that these functions are bottlenecks, even with the recent improvements.

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

Copy link

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@maxnoe
Copy link
Member Author

maxnoe commented Apr 23, 2024

Here is a plot of the improvements, generated using the code below
wrap_at_performance

from pathlib import Path
import astropy.units as u
from astropy.units import Quantity
import numpy as np
from astropy.coordinates.angles import Longitude, Latitude, Angle
import json

rng = np.random.default_rng(0)

shapes = (1, 100, 10000, (500, 500))

p_nan = (0, 1e-2, 0.1, 0.5)
wrap = Angle(180, u.deg)

results = []

for shape in shapes:
    for p in p_nan:
        values = rng.uniform(-720, 720, np.prod(shape))
        if p > 0:
            values[rng.choice(values.size, int(np.ceil(p * values.size)), replace=False)] = np.nan

        values = values.reshape(shape)
        angle = Angle(values, u.deg)

        print('total', shape, 'nans', np.count_nonzero(np.isnan(values)))
        result = get_ipython().run_line_magic("timeit", "-o angle.wrap_at(wrap)")
        results.append([{"shape": shape, "p_nan": p, "mean": result.average, "std": result.stdev}])

Path("performance.json").write_text(json.dumps(results))

@maxnoe
Copy link
Member Author

maxnoe commented Apr 23, 2024

@neutrinoceros How can I run the same benchmarks as you did in #16222?

@neutrinoceros
Copy link
Contributor

you need to apply the benchmark label :)

@maxnoe
Copy link
Member Author

maxnoe commented Apr 23, 2024

Could you add it? I don't have the permissions to modify labels

@neutrinoceros
Copy link
Contributor

Neither do I, actually 😅
Let's ask @pllim

@maxnoe
Copy link
Member Author

maxnoe commented Apr 23, 2024

The test failure looks unrelated at first glance... any idea?

https://github.com/astropy/astropy/actions/runs/8798691509/job/24146244116?pr=16326

@neutrinoceros
Copy link
Contributor

Looks like macOS runners were just migrated to AMR64 and we hit #16319 and #16320, so yeah, definitely unrelated to this PR

@pllim pllim added this to the v7.0.0 milestone Apr 23, 2024
@pllim pllim mentioned this pull request Apr 23, 2024
10 tasks
@pllim
Copy link
Member

pllim commented Apr 23, 2024

I'd also suggest to keep open #13497 for the time being

That is a merged PR, there is nothing to keep open. Was there a typo?

@maxnoe
Copy link
Member Author

maxnoe commented Apr 23, 2024

Sorry, I meant to tag this issue here: #13479 which is just a switch of the last two numbers, which is funny that it turns out to be related issues / PRs

@pllim
Copy link
Member

pllim commented Apr 23, 2024

Needs:

  • change log
  • fix pre-commit check
  • identify a coordinates maintainer that is well-versed in Cython to review

Thanks!

@pllim
Copy link
Member

pllim commented Apr 23, 2024

Looks like macOS runners were just migrated to ARM64

We can skip those tests in OSX or downgrade that job to older mac x86 runners. But we can discuss in a different issue.

@pllim
Copy link
Member

pllim commented Apr 23, 2024

Re: benchmark label -- I added it. When it completes, you will see a summary in the log and also option to download artifacts.

@pllim
Copy link
Member

pllim commented Apr 23, 2024

Hmm is this related or just VM being flaky?

| Change   | Before [e9ba0dba]    | After [9eb6abd7]    |   Ratio | Benchmark (Parameter)          |
|----------|----------------------|---------------------|---------|--------------------------------|
| +        | 44.1±0.1μs           | 84.6±40μs           |    1.92 | units.time_quantity_times_unit |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
Error: PERFORMANCE DECREASED.

@maxnoe
Copy link
Member Author

maxnoe commented Apr 23, 2024

84.6±40μs

The uncertainty is 50 % of the value...

@maxnoe
Copy link
Member Author

maxnoe commented Apr 23, 2024

Now I am very confused... I checked also creating an AltAz instance, which was our original bottleneck, and that is slower here than on main... how can that be if the function itself is faster ...

@maxnoe
Copy link
Member Author

maxnoe commented Apr 23, 2024

Scratch that, I must have fooled myself somehow with switching branches and editable installs...

cdef dtype wrap_angle_floor
wrap_angle_floor = wrap_angle - period

if dtype is float or dtype is double:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity as I haven't seen the cython ufunc stuff before, are these checks going to be done for every single value? Or does this get optimised out?

Copy link
Member Author

@maxnoe maxnoe Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checks on fused dtypes are essentially like if constexpr, the function is compiled for each member of the fused type separately

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astrofrog
Copy link
Member

I wonder if the cython wrapping function could be more generally useful in other parts of astropy - for example for folding time series. Perhaps we should think of whether it can be written with more general variable names and placed in e.g. utils or similar, for use in different sub-packages? (only if that generalization does not impact performance).

Copy link
Contributor

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that it makes the readonly case an explicit error, which is something I missed when I last worked on this.
I'm assuming that perf regressions aren't real (for now), so I'm ignoring this aspect in this review.

wrap_angle = wrap_angle.to_value(self.unit).astype(
self_angle.dtype, copy=COPY_IF_NEEDED
)
if not self_angle.data.readonly:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to write this logic in a flatter style, such as

if self_angle.data.readonly and needs_wrapping(self_angle, wrap_angle, a360).any():
    raise ValueError(...)

wrap_at(...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice suggestion, will change


np.import_array()

ctypedef fused dtype:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtype feels unexpected here since it has a different meaning in numpy, so I'd suggest using something more idiomatic in templates, like T

Suggested change
ctypedef fused dtype:
ctypedef fused T:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how dtype is used in other astropy cython code, e.g. here: https://github.com/astropy/astropy/blob/main/astropy/stats/_stats.pyx

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well the file you linked uses it to fuse numpy data types, which in my view is a little different. But in any case this is just a nit. If nobody else finds this confusing feel free to ignore this suggestion.

if (angle >= wrap_angle_floor) and (angle < wrap_angle):
return angle

cdef int wraps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is int guaranteed to be 4 bytes long in Cython ? I'm worried that using platform-dependent-sized types might lead to very subtle errors in the distant future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the native c type, but I really don't expect issues here. It would mean that a value wraps more often than 2 * pi * int max, which seems impossible...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. I've been bitten by this kind of problem before which is why I may have jumped on this one a bit quickly. Agreed that it's unlikely enough that it doesn't matter in practice !

if NUMPY_LT_2_0:
# Ensure ndim>=1 so that comparison is done using the angle dtype.
self_angle = self_angle[np.newaxis]
a360 = np.asarray(u.degree.to(self.unit, 360.0), dtype=self_angle.dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure that np.asarray would avoid a copy in scenarios where it's not needed. Is this any better ?

Suggested change
a360 = np.asarray(u.degree.to(self.unit, 360.0), dtype=self_angle.dtype)
a360 = u.degree.to_value(self.unit, 360.0).astype(self_angle.dtype, copy=COPY_IF_NEEDED)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the recommended way as of the numpy 2.0 migration guide: https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the asarray is needed to make it something that has astype in the first place.

Also, units don't have to_value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


In [7]: u.degree.to(u.rad, 360.0)
Out[7]: 6.283185307179586

In [8]: u.degree.to(u.rad, 360.0).astype(np.float32)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[8], line 1
----> 1 u.degree.to(u.rad, 360.0).astype(np.float32)

AttributeError: 'float' object has no attribute 'astype'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: this will always copy since we will convert a python float to the wished dtype

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh my, 3 mistakes in one line of suggestion, this must be my new record 🎉
(seriously though, thanks for checking, and sorry I wasn't careful enough here)

@saimn
Copy link
Contributor

saimn commented Apr 24, 2024

Please add a setup_package.py file to define NPY_NO_DEPRECATED_API and avoid NPY_1_7_API_VERSION warnings (see other subpackages for examples).

@saimn
Copy link
Contributor

saimn commented Apr 24, 2024

I'm not expert with Angle but while doing quick tests to play with cython I'm just wondering if we need ufunc/fused types:

  • from what I can see Angle always converts values to float64 (which makes sense for physical units such as angles), so why supporting int variants ?
  • wrap_angle and period are scalar so no need for array/broadcast etc.
  • seems in that case that a simple loop on a memoryview would be faster than the ufunc without needing the extra features (and smaller .so file I guess)

@astrofrog
Copy link
Member

astrofrog commented Apr 24, 2024

For fun: if we don't think we need the fused dtypes, we could also just use a plain C extension which compiles and runs faster: https://gist.github.com/astrofrog/136d15ebd650b4465060e314fccd7988 - runs about twice as fast (as Cython) for large arrays and about 10x faster for small arrays. In practice actually need to do the whole thing with the outer/inner loop to be able to deal with non-contiguous arrays and endian etc but just thought I'd try it out as an experiment (I did not actually write a single line of that module, it was an experiment with AI-generated code!). If this really is a bottleneck in coordinates, it might be worth writing the C ourselves - in a number of projects I've worked on before I've found you can often typically get 2x better performance when dealing with large arrays. To be clear though, we can still merge this PR with Cython, this is more of a 'in future if it's still a bottleneck' comment.

@maxnoe
Copy link
Member Author

maxnoe commented Apr 26, 2024

Thanks @astrofrog, I remember dabbling around with writing a ufunc in C by hand a year ago or so, but ran into all the edge cases we test for

  • wrapping angle needs to be the same type as input to fullfill test edgecases like wrap_angle(np.float32(2 * np.pi), np.float32(2 * np.pi))
  • Needs to work with integers

I am happy to give it a try again though, but I am no expert in writing c numpy code by hand. At the time I asked @mhvk if we could do a pair programming session, but we never followed up on it

@mhvk
Copy link
Contributor

mhvk commented Apr 30, 2024

I have looked at the cython implementations and find them quite simple, which is good, but still they make the code less clear and add some 100kiB of binary library. I wonder whether the speed benefits are actually worth it for the larger question of wanting to speed up coordinate transformations, etc. In particular, from #13479, just initiating an Angle is already substantially slower than a Quantity, which really seems rather odd (though that is probably best discussed there).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants