Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure proper repair of wheels #20

Open
agriyakhetarpal opened this issue Apr 25, 2024 · 24 comments
Open

Ensure proper repair of wheels #20

agriyakhetarpal opened this issue Apr 25, 2024 · 24 comments

Comments

@agriyakhetarpal
Copy link
Contributor

I notice that the wheels currently have hardcoded platform tags that do not check for GLIBC version constraints or the macOS version constraints, and therefore might not be repaired fully, i.e., additional tags can be added for the Linux wheels and macOS wheels can better reflect the minimum required OSX version.

A quick experiment with delocate locally led me to find out that the minum required macOS version is macOS X 11.7 for both M-series and Intel wheels, and not 11.0 and 10.9 respectively. On my local machine (macOS), it is not possible to use auditwheel because of hard-coded checks to stop usage outside of Linux platforms, therefore, I tested using the cross-platform alternative repairwheel, which granted additional tags to some of the Linux wheels:

  1. The Linux x86_64 wheel additionally conforms to GLIBC 2.5, i.e., manylinux_2_5 (even though it's been EOLed for some while12 now). The wheel is correctly marked to have no additional dependencies, being a statically linked binary.
  2. This was similar for the Linux i686 wheel.
  3. For aarch64 and armv7l, it added the linux_1_1_aarch64 and the linux_2_17_armv7l tags for some reason, which won't be accepted on PyPI. I think that is a bug on repairwheel's side. auditwheel might be more robust in this case.

For Windows wheels, delvewheel showcased that no external DLLs are required or need to be mangled, as expected, and did not modify the wheel.

I would like to propose that an additional step in the make_wheels.py file after the wheels have been downloaded can do this action, and repairwheel can be added as a dependency in pyproject.toml. It would also make sense to provide an option through the command-line argument parser to not repair the wheels if needed, but I would argue that it's most likely better to have repaired wheels by default. Owing to how the binary inside the wheel can change during linkage, I would recommend this as an additional step instead of hardcoding the new tags in the values of the ZIG_PYTHON_PLATFORMS dictionary (those can then remain the same, or be updated).

Footnotes

  1. https://manylinuxinspector.joerick.me/

  2. https://mayeut.github.io/manylinux-timeline/

@whitequark
Copy link
Collaborator

I would like to propose that an additional step in the make_wheels.py file after the wheels have been downloaded can do this action, and repairwheel can be added as a dependency in pyproject.toml.

This is broadly speaking desirable but I have some implementation concerns. If you prototype this I can look into it.

@agriyakhetarpal
Copy link
Contributor Author

agriyakhetarpal commented Apr 25, 2024

Sure! I had a revelation – instead of making the make_wheels.py script longer, it would be much better to set up some CI in this repository and use actual GitHub Actions runners, since repairwheel does not have a stable v1 release. I would be happy to do so for all three platforms, additional architectures won't bring an issue when repairing wheels as long as the platform remains the same.

@agriyakhetarpal
Copy link
Contributor Author

So, would you be okay with GitHub-hosted CI, and with the general proposition to add CI to this repository?

@whitequark
Copy link
Collaborator

No, I don't think that will work, because it's important that wheels uploaded to PyPI are reproducible by anyone who wants, with the exact same SHA checksum.

So if we use repairwheel it needs to be fixed to a specific commit that is checked into the repo, either using PDM lockfiles, or just a normal pyproject.toml dependency.

As for CI, I'm not concerned about having it, and I don't want it to become a maintenance overhead.

@whitequark
Copy link
Collaborator

Also, repairwheel must produce exact same, bit-for-bit, output for the exact same input, or it can't be used at all.

@agriyakhetarpal
Copy link
Contributor Author

agriyakhetarpal commented Apr 25, 2024

Also, repairwheel must produce exact same, bit-for-bit, output for the exact same input, or it can't be used at all.

As far as I understand, repairwheel will produce a deterministic output every time given any input files.

No, I don't think that will work, because it's important that wheels uploaded to PyPI are reproducible by anyone who wants, with the exact same SHA checksum.

This will be much more difficult, however... since all four of these tools, on any platform, will do wheel unpack, copy additional files into the wheel (not here, though), and rename them to add more platform tags the wheel is compatible with:

shasum wheelhouse/ziglang-0.12.0-py3-none-macosx_10_9_x86_64.whl
73ad7c6edcc8e04cd5a819130b3c230d11bd87b3  wheelhouse/ziglang-0.12.0-py3-none-macosx_10_9_x86_64.whl

and

shasum dist/ziglang-0.12.0-py3-none-macosx_11_7_x86_64.whl
ac2b140d30c409beebd6438f64eb0ec59927ebaa  dist/ziglang-0.12.0-py3-none-macosx_11_7_x86_64.whl

i.e., the macOS wheels before and after have different checksums, and this shall similarly be so for Linux. I checked by renaming the file into the previous name again, and the checksum remained the same as the newly modified one, so, the wheel was definitely unzipped and zipped back again during repair.

Adding CI indeed brings maintenance overhead. One option that we have still remains, and should work for all of the cases, is hardcoding the new tags (at least the delocate ones are correct for macOS, I'll have to run this in CI on my fork to confirm the Linux ones, and the Windows wheels don't need them). I assume from ziglang/zig#14542 that Zig is making three or four releases a year at most, so manually verifying whether the hardcoded tags are correct will be a seasonal effort for maintainers here (but I will be happy to verify them, too, if you wish to ping me). Renaming the wheels is essentially the same as repairing them, in this case, thanks to the static binaries.

@agriyakhetarpal
Copy link
Contributor Author

agriyakhetarpal commented Apr 26, 2024

repairwheel is matching outputs with auditwheel, no worries there (my previous complaint was wrong) but auditwheel has another bug, where it's adding the manylinux_2_5_x86_64 and manylinux1_x86_64 platform tags to the

  1. the i386-linux
  2. x86-linux
  3. aarch64-linux
  4. armv7a-linux

wheels, which seems all incorrect to me – I'll file an issue on their tracker.

The workflow run, here: https://github.com/agriyakhetarpal/zig-pypi/actions/runs/8841793735 (PR linked above) has the required output, so I think this should be the diff (the i386-linux and x86-linux keys are already correct with MUSL 1.1 and GLIBC 2.12, and x86_64-linux can receive extra tags):

 ZIG_PYTHON_PLATFORMS = {
     'x86_64-windows': 'win_amd64',
     'x86-windows':    'win32',
-    'x86_64-macos':   'macosx_10_9_x86_64',
-    'aarch64-macos':  'macosx_11_0_arm64',
+    'x86_64-macos':   'macosx_11_7_x86_64',
+    'aarch64-macos':  'macosx_11_7_arm64',
     'i386-linux':     'manylinux_2_12_i686.manylinux2010_i686.musllinux_1_1_i686',
     # renamed i386 to x86 since v0.11.0, i386 was last supported in v0.10.1
     'x86-linux':      'manylinux_2_12_i686.manylinux2010_i686.musllinux_1_1_i686',
-    'x86_64-linux':   'manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64',
+    'x86_64-linux':   'manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.manylinux_2_5_x86_64.manylinux1_x86_64',
     'aarch64-linux':
         'manylinux_2_17_aarch64.manylinux2014_aarch64.musllinux_1_1_aarch64',
     'armv7a-linux':   'manylinux_2_17_armv7l.manylinux2014_armv7l.musllinux_1_1_armv7l',

While I may not be totally sure about Linux, with my macOS machine I can confirm whether delocate is correct with the macOS tags, and yes, it is:

Here's what otool says:

macOS arm64 wheel
$ otool -l dist/ziglang-0.12.0-py3-none-macosx_11_7_arm64/ziglang/zig
dist/ziglang-0.12.0-py3-none-macosx_11_7_arm64/ziglang/zig:
Load command 0
      cmd LC_SEGMENT_64
  cmdsize 72
  segname __PAGEZERO
   vmaddr 0x0000000000000000
   vmsize 0x0000000100000000
  fileoff 0
 filesize 0
  maxprot 0x00000000
 initprot 0x00000000
   nsects 0
    flags 0x0
Load command 1
      cmd LC_SEGMENT_64
  cmdsize 792
  segname __TEXT
   vmaddr 0x0000000100000000
   vmsize 0x00000000082cf5e4
  fileoff 0
 filesize 137164260
  maxprot 0x00000005
 initprot 0x00000005
   nsects 9
    flags 0x0
Section
  sectname __text
   segname __TEXT
      addr 0x00000001000008f8
      size 0x0000000004dc8384
    offset 2296
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x80000400
 reserved1 0
 reserved2 0
Section
  sectname __stubs
   segname __TEXT
      addr 0x0000000104dc8c7c
      size 0x0000000000002fa0
    offset 81562748
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x80000408
 reserved1 0 (index into indirect symbol table)
 reserved2 12 (size of stubs)
Section
  sectname __stub_helper
   segname __TEXT
      addr 0x0000000104dcbc1c
      size 0x0000000000000c54
    offset 81574940
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x80000400
 reserved1 0
 reserved2 0
Section
  sectname __const
   segname __TEXT
      addr 0x0000000104dcc870
      size 0x00000000029b6170
    offset 81578096
     align 2^4 (16)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Section
  sectname __cstring
   segname __TEXT
      addr 0x00000001077829e0
      size 0x00000000003c973b
    offset 125315552
     align 2^4 (16)
    reloff 0
    nreloc 0
     flags 0x00000002
 reserved1 0
 reserved2 0
Section
  sectname __gcc_except_tab
   segname __TEXT
      addr 0x0000000107b4c11c
      size 0x0000000000001bdc
    offset 129286428
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Section
  sectname __ustring
   segname __TEXT
      addr 0x0000000107b4dcf8
      size 0x0000000000000142
    offset 129293560
     align 2^1 (2)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Section
  sectname __unwind_info
   segname __TEXT
      addr 0x0000000107b4de3c
      size 0x0000000000139604
    offset 129293884
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Section
  sectname __eh_frame
   segname __TEXT
      addr 0x0000000107c87440
      size 0x00000000006481a4
    offset 130577472
     align 2^3 (8)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Load command 2
      cmd LC_SEGMENT_64
  cmdsize 232
  segname __DATA_CONST
   vmaddr 0x00000001082d0000
   vmsize 0x000000000000a9f0
  fileoff 137166848
 filesize 43504
  maxprot 0x00000003
 initprot 0x00000003
   nsects 2
    flags 0x10
Section
  sectname __got
   segname __DATA_CONST
      addr 0x00000001082d0000
      size 0x0000000000009738
    offset 137166848
     align 2^3 (8)
    reloff 0
    nreloc 0
     flags 0x00000006
 reserved1 1016 (index into indirect symbol table)
 reserved2 0
Section
  sectname __mod_init_func
   segname __DATA_CONST
      addr 0x00000001082d9738
      size 0x00000000000012b8
    offset 137205560
     align 2^3 (8)
    reloff 0
    nreloc 0
     flags 0x00000009
 reserved1 0
 reserved2 0
Load command 3
      cmd LC_SEGMENT_64
  cmdsize 712
  segname __DATA
   vmaddr 0x00000001082dc000
   vmsize 0x00000000005de948
  fileoff 137216000
 filesize 5629748
  maxprot 0x00000003
 initprot 0x00000003
   nsects 8
    flags 0x0
Section
  sectname __la_symbol_ptr
   segname __DATA
      addr 0x00000001082dc000
      size 0x0000000000001fc0
    offset 137216000
     align 2^3 (8)
    reloff 0
    nreloc 0
     flags 0x00000007
 reserved1 5855 (index into indirect symbol table)
 reserved2 0
Section
  sectname __const
   segname __DATA
      addr 0x00000001082ddfc0
      size 0x000000000047e348
    offset 137224128
     align 2^4 (16)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Section
  sectname __data
   segname __DATA
      addr 0x000000010875c310
      size 0x00000000000de2d0
    offset 141935376
     align 2^4 (16)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Section
  sectname __thread_vars
   segname __DATA
      addr 0x000000010883a5e0
      size 0x0000000000000150
    offset 142845408
     align 2^0 (1)
    reloff 0
    nreloc 0
     flags 0x00000013
 reserved1 0
 reserved2 0
Section
  sectname __thread_data
   segname __DATA
      addr 0x000000010883a730
      size 0x0000000000000004
    offset 142845744
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x00000011
 reserved1 0
 reserved2 0
Section
  sectname __thread_bss
   segname __DATA
      addr 0x000000010883a738
      size 0x00000000000001e8
    offset 0
     align 2^3 (8)
    reloff 0
    nreloc 0
     flags 0x00000012
 reserved1 0
 reserved2 0
Section
  sectname __bss
   segname __DATA
      addr 0x000000010883a920
      size 0x0000000000077728
    offset 0
     align 2^4 (16)
    reloff 0
    nreloc 0
     flags 0x00000001
 reserved1 0
 reserved2 0
Section
  sectname __common
   segname __DATA
      addr 0x00000001088b2050
      size 0x00000000000088f8
    offset 0
     align 2^4 (16)
    reloff 0
    nreloc 0
     flags 0x00000001
 reserved1 0
 reserved2 0
Load command 4
      cmd LC_SEGMENT_64
  cmdsize 72
  segname __LINKEDIT
   vmaddr 0x00000001088bc000
   vmsize 0x0000000001dfc000
  fileoff 142852096
 filesize 31429984
  maxprot 0x00000001
 initprot 0x00000001
   nsects 0
    flags 0x0
Load command 5
            cmd LC_DYLD_INFO_ONLY
        cmdsize 48
     rebase_off 142852096
    rebase_size 182880
       bind_off 143034976
      bind_size 312
  weak_bind_off 143035288
 weak_bind_size 312248
  lazy_bind_off 143347536
 lazy_bind_size 4984
     export_off 143352520
    export_size 2924208
Load command 6
      cmd LC_FUNCTION_STARTS
  cmdsize 16
  dataoff 146276728
 datasize 0
Load command 7
      cmd LC_DATA_IN_CODE
  cmdsize 16
  dataoff 146276728
 datasize 0
Load command 8
     cmd LC_SYMTAB
 cmdsize 24
  symoff 146276728
   nsyms 226954
  stroff 149935480
 strsize 24006729
Load command 9
            cmd LC_DYSYMTAB
        cmdsize 80
      ilocalsym 0
      nlocalsym 179448
     iextdefsym 179448
     nextdefsym 47201
      iundefsym 226649
      nundefsym 305
         tocoff 0
           ntoc 0
      modtaboff 0
        nmodtab 0
   extrefsymoff 0
    nextrefsyms 0
 indirectsymoff 149907992
  nindirectsyms 6871
      extreloff 0
        nextrel 0
      locreloff 0
        nlocrel 0
Load command 10
          cmd LC_LOAD_DYLINKER
      cmdsize 32
         name /usr/lib/dyld (offset 12)
Load command 11
       cmd LC_MAIN
   cmdsize 24
  entryoff 71432336
 stacksize 33554432
Load command 12
      cmd LC_SOURCE_VERSION
  cmdsize 16
  version 0.0
Load command 13
      cmd LC_BUILD_VERSION
  cmdsize 32
 platform 1
    minos 11.7.1
      sdk 14.0
   ntools 1
     tool 5
  version 0.0
Load command 14
     cmd LC_UUID
 cmdsize 24
    uuid 8190EC2C-A7F2-3ECE-B209-55605CBB2084
Load command 15
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name /usr/lib/libSystem.B.dylib (offset 24)
   time stamp 2 Thu Jan  1 05:30:02 1970
      current version 1336.0.0
compatibility version 1.0.0
Load command 16
      cmd LC_CODE_SIGNATURE
  cmdsize 16
  dataoff 173942224
 datasize 339856
macOS amd64 wheel
$ otool -l dist/ziglang-0.12.0-py3-none-macosx_11_7_x86_64/ziglang/zig

returns

dist/ziglang-0.12.0-py3-none-macosx_11_7_x86_64/ziglang/zig:
Load command 0
      cmd LC_SEGMENT_64
  cmdsize 72
  segname __PAGEZERO
   vmaddr 0x0000000000000000
   vmsize 0x0000000100000000
  fileoff 0
 filesize 0
  maxprot 0x00000000
 initprot 0x00000000
   nsects 0
    flags 0x0
Load command 1
      cmd LC_SEGMENT_64
  cmdsize 792
  segname __TEXT
   vmaddr 0x0000000100000000
   vmsize 0x00000000088e8eb0
  fileoff 0
 filesize 143560368
  maxprot 0x00000005
 initprot 0x00000005
   nsects 9
    flags 0x0
Section
  sectname __text
   segname __TEXT
      addr 0x0000000100000940
      size 0x0000000005ad32b5
    offset 2368
     align 2^6 (64)
    reloff 0
    nreloc 0
     flags 0x80000400
 reserved1 0
 reserved2 0
Section
  sectname __stubs
   segname __TEXT
      addr 0x0000000105ad3bf6
      size 0x000000000000181e
    offset 95239158
     align 2^1 (2)
    reloff 0
    nreloc 0
     flags 0x80000408
 reserved1 0 (index into indirect symbol table)
 reserved2 6 (size of stubs)
Section
  sectname __stub_helper
   segname __TEXT
      addr 0x0000000105ad5414
      size 0x0000000000000a42
    offset 95245332
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x80000400
 reserved1 0
 reserved2 0
Section
  sectname __const
   segname __TEXT
      addr 0x0000000105ad5e80
      size 0x000000000282c190
    offset 95248000
     align 2^6 (64)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Section
  sectname __cstring
   segname __TEXT
      addr 0x0000000108302010
      size 0x00000000003c8ecb
    offset 137371664
     align 2^4 (16)
    reloff 0
    nreloc 0
     flags 0x00000002
 reserved1 0
 reserved2 0
Section
  sectname __gcc_except_tab
   segname __TEXT
      addr 0x00000001086caedc
      size 0x0000000000001b1c
    offset 141340380
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Section
  sectname __ustring
   segname __TEXT
      addr 0x00000001086cca00
      size 0x000000000000017c
    offset 141347328
     align 2^4 (16)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Section
  sectname __unwind_info
   segname __TEXT
      addr 0x00000001086ccb7c
      size 0x00000000000ef430
    offset 141347708
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Section
  sectname __eh_frame
   segname __TEXT
      addr 0x00000001087bbfb0
      size 0x000000000012cf00
    offset 142327728
     align 2^3 (8)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Load command 2
      cmd LC_SEGMENT_64
  cmdsize 232
  segname __DATA_CONST
   vmaddr 0x00000001088e9000
   vmsize 0x0000000000002828
  fileoff 143560704
 filesize 10280
  maxprot 0x00000003
 initprot 0x00000003
   nsects 2
    flags 0x10
Section
  sectname __got
   segname __DATA_CONST
      addr 0x00000001088e9000
      size 0x0000000000001570
    offset 143560704
     align 2^3 (8)
    reloff 0
    nreloc 0
     flags 0x00000006
 reserved1 1029 (index into indirect symbol table)
 reserved2 0
Section
  sectname __mod_init_func
   segname __DATA_CONST
      addr 0x00000001088ea570
      size 0x00000000000012b8
    offset 143566192
     align 2^3 (8)
    reloff 0
    nreloc 0
     flags 0x00000009
 reserved1 0
 reserved2 0
Load command 3
      cmd LC_SEGMENT_64
  cmdsize 792
  segname __DATA
   vmaddr 0x00000001088ec000
   vmsize 0x00000000005df750
  fileoff 143572992
 filesize 5595428
  maxprot 0x00000003
 initprot 0x00000003
   nsects 9
    flags 0x0
Section
  sectname __la_symbol_ptr
   segname __DATA
      addr 0x00000001088ec000
      size 0x0000000000002028
    offset 143572992
     align 2^3 (8)
    reloff 0
    nreloc 0
     flags 0x00000007
 reserved1 1715 (index into indirect symbol table)
 reserved2 0
Section
  sectname __const
   segname __DATA
      addr 0x00000001088ee030
      size 0x0000000000474a18
    offset 143581232
     align 2^4 (16)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Section
  sectname __data
   segname __DATA
      addr 0x0000000108d62a50
      size 0x00000000000df158
    offset 148253264
     align 2^4 (16)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Section
  sectname __static_data
   segname __DATA
      addr 0x0000000108e41bc0
      size 0x0000000000000410
    offset 149167040
     align 2^6 (64)
    reloff 0
    nreloc 0
     flags 0x00000000
 reserved1 0
 reserved2 0
Section
  sectname __thread_vars
   segname __DATA
      addr 0x0000000108e41fd0
      size 0x0000000000000150
    offset 149168080
     align 2^0 (1)
    reloff 0
    nreloc 0
     flags 0x00000013
 reserved1 0
 reserved2 0
Section
  sectname __thread_data
   segname __DATA
      addr 0x0000000108e42120
      size 0x0000000000000004
    offset 149168416
     align 2^2 (4)
    reloff 0
    nreloc 0
     flags 0x00000011
 reserved1 0
 reserved2 0
Section
  sectname __thread_bss
   segname __DATA
      addr 0x0000000108e42128
      size 0x00000000000001e8
    offset 0
     align 2^3 (8)
    reloff 0
    nreloc 0
     flags 0x00000012
 reserved1 0
 reserved2 0
Section
  sectname __bss
   segname __DATA
      addr 0x0000000108e42310
      size 0x0000000000080298
    offset 0
     align 2^4 (16)
    reloff 0
    nreloc 0
     flags 0x00000001
 reserved1 0
 reserved2 0
Section
  sectname __common
   segname __DATA
      addr 0x0000000108ec25b0
      size 0x00000000000091a0
    offset 0
     align 2^4 (16)
    reloff 0
    nreloc 0
     flags 0x00000001
 reserved1 0
 reserved2 0
Load command 4
      cmd LC_SEGMENT_64
  cmdsize 72
  segname __LINKEDIT
   vmaddr 0x0000000108ecc000
   vmsize 0x0000000001da9000
  fileoff 149172224
 filesize 31100415
  maxprot 0x00000001
 initprot 0x00000001
   nsects 0
    flags 0x0
Load command 5
            cmd LC_DYLD_INFO_ONLY
        cmdsize 48
     rebase_off 149172224
    rebase_size 183464
       bind_off 149355688
      bind_size 328
  weak_bind_off 149356016
 weak_bind_size 133024
  lazy_bind_off 149489040
 lazy_bind_size 5024
     export_off 149494064
    export_size 2924536
Load command 6
      cmd LC_FUNCTION_STARTS
  cmdsize 16
  dataoff 152418600
 datasize 0
Load command 7
      cmd LC_DATA_IN_CODE
  cmdsize 16
  dataoff 152418600
 datasize 65600
Load command 8
     cmd LC_SYMTAB
 cmdsize 24
  symoff 152484200
   nsyms 231136
  stroff 156193352
 strsize 24079287
Load command 9
            cmd LC_DYSYMTAB
        cmdsize 80
      ilocalsym 0
      nlocalsym 183611
     iextdefsym 183611
     nextdefsym 47220
      iundefsym 230831
      nundefsym 305
         tocoff 0
           ntoc 0
      modtaboff 0
        nmodtab 0
   extrefsymoff 0
    nextrefsyms 0
 indirectsymoff 156182376
  nindirectsyms 2744
      extreloff 0
        nextrel 0
      locreloff 0
        nlocrel 0
Load command 10
          cmd LC_LOAD_DYLINKER
      cmdsize 32
         name /usr/lib/dyld (offset 12)
Load command 11
       cmd LC_MAIN
   cmdsize 24
  entryoff 82274208
 stacksize 33554432
Load command 12
      cmd LC_SOURCE_VERSION
  cmdsize 16
  version 0.0
Load command 13
      cmd LC_BUILD_VERSION
  cmdsize 32
 platform 1
    minos 11.7.1
      sdk 14.0
   ntools 1
     tool 5
  version 0.0
Load command 14
     cmd LC_UUID
 cmdsize 24
    uuid EE6641C7-D65A-3EB5-9A0A-1B119D63ACC9
Load command 15
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name /usr/lib/libSystem.B.dylib (offset 24)
   time stamp 2 Thu Jan  1 05:30:02 1970
      current version 1336.0.0
compatibility version 1.0.0

Therefore, both of them were likely built on a macOS 14 Sonoma device with MACOSX_DEPLOYMENT_TARGET set to 11.7.1, and are therefore incompatible with ABIs for OSX 10.9 and 11.0, as noted in minos and sdk sections for the thirteenth load command.

I'll revise this in a PR for now – so, it's up to you if you wish to keep this issue open, since a restricted set of platform tags provides benefits in other ways, such as lesser problems for those with older Linux systems (because they won't be able to download the wheel at all). I'm by no means an expert on debugging GLIBC compatibility issues, and a very novice user of Zig, so I can only answer as much about the Python packaging ecosystem more.

agriyakhetarpal added a commit to agriyakhetarpal/zig-pypi that referenced this issue Apr 26, 2024
This commit, as noted in ziglang#20, makes sure that the
correct minimum macOS version is asserted
at the time when someone tries to download
and install the wheels via pip or another
package or dependency management tool.
whitequark pushed a commit that referenced this issue Apr 26, 2024
This commit, as noted in #20, makes sure that the
correct minimum macOS version is asserted
at the time when someone tries to download
and install the wheels via pip or another
package or dependency management tool.
@whitequark
Copy link
Collaborator

because they won't be able to download the wheel at all

I think what will happen is that unless we yank and re-publish all the wheels (which raises questions about whether Zig has changed the deployment target--I suspect it did) it'll just result in an older wheel being downloaded.

@whitequark
Copy link
Collaborator

This will be much more difficult, however... since all four of these tools, on any platform, will do wheel unpack, copy additional files into the wheel (not here, though), and rename them to add more platform tags the wheel is compatible with:

As long as this process is completely deterministic that's fine, but if it's not I'm not going to use it since it breaks reproducibility.

so manually verifying whether the hardcoded tags are correct will be a seasonal effort for maintainers here

I don't have time to do this but I can ping you about it.

@agriyakhetarpal
Copy link
Contributor Author

As long as this process is completely deterministic that's fine, but if it's not I'm not going to use it since it breaks reproducibility

Fair enough – I confirmed with a few more tests and it does break reproducibility, since repairing the same wheel multiple times displays a different SHA-256 checksum each time, regardless of whether I use repairwheel or delocate.

I don't have time to do this but I can ping you about it.

Much appreciated :) I am happy to help out with every new release of Zig.

@whitequark
Copy link
Collaborator

Fair enough – I confirmed with a few more tests and it does break reproducibility, since repairing the same wheel multiple times displays a different SHA-256 checksum each time, regardless of whether I use repairwheel or delocate.

I suspected this would be the case if something unpacks and repacks the archive, since most people are quite blasé about implementing that and don't care much about timestamps or file ordering.

@agriyakhetarpal
Copy link
Contributor Author

agriyakhetarpal commented Apr 26, 2024

I suspected this would be the case if something unpacks and repacks the archive, since most people are quite blasé about implementing that and don't care much about timestamps or file ordering.

Yes, it's most certainly doing that – in cases where libraries are needed to be vendored into the wheel, it has to be unzipped and a hidden .dylibs folder gets created inside it where all of them are stored (and this is by default wheel.libs for Windows). However, I did have an idea for cases like the Zig wheels here, I wonder if it would be possible to do a three-way copy, or a dry-run mode, i.e.,

  1. Duplicate the wheel A with wheel B
  2. Look at the duplicate to see if it needs shared dependencies to be copied
  3. If no, and it's just that some new tags can be added, then keep A and therefore the original hash with it, just rename it and therefore you get to keep the hash. Otherwise, repair the wheel into wheel C modifying the hash, and discarding A and B.

We should be able to do this on our own with a shell script or two, but I feel that ensuring reproducibility is important in general – if these wheel repair tools can offer this use case upstream, it would be highly beneficial. I'll put in a feature request there.

@whitequark
Copy link
Collaborator

We should be able to do this on our own with a shell script or two,

This repo is supposed to work on Windows (not Cygwin or MinGW), so I don't think a shell script cuts it.

I'm also not completely sold on the approach of "repairing" the wheel, vs. producing a correct one in first place.

@agriyakhetarpal
Copy link
Contributor Author

This repo is supposed to work on Windows (not Cygwin or MinGW), so I don't think a shell script cuts it.

Ah, apologies if I was terse. With this, I meant having a PowerShell script as well, since it's now the default shell on Windows. A general Python script would be a better cross-platform alternative, in any case.

I'm also not completely sold on the approach of "repairing" the wheel, vs. producing a correct one in first place.

I agree, this is just one of the solutions. I think that out of all of the solutions, it's still better to dry-run repair, and adjust platform tags manually, at least not until this is added as a feature. Therefore, I can offer no better than that at this time.

Although, I wonder if you could version the script with tags for this repository. That would ensure a higher level of reproducibility. For example, changes like #21 have bumped the macOS minimum version, and if someone now wants to build a bundle of macOS wheels for older Zig versions, say, 0.7.0, they'll receive this new constraint too – but Zig 0.7.0 might be compatible with older versions of macOS, too say, 10.15, (I haven't checked though) and having 11.7 here renders the wheel to be incorrectly tagged). Asking users to check out a particular tag is better than asking them to check out an older commit. That way, you can specify lower and upper bounds for Zig wheels for a particular version of the script, i.e., highlighting what they can build wheels for – the layperson's time machine.

This might be too much to maintain, though, so I understand if you don't wish to do this and would prefer responding to users individually if someone runs into this problem and then chooses to report it.

@whitequark
Copy link
Collaborator

The problem with tagging is that e.g. I backfilled the 32-bit wheels for 0.11.0 when you asked for it...

@agriyakhetarpal
Copy link
Contributor Author

Ah, well, as long as there are not a lot of users for the repository (and if there won't be many), tags are not needed and it's possible to edit the script and publish the wheels manually :) (and many thanks for uploading those!)

@whitequark
Copy link
Collaborator

Yeah, it's kind of an error-prone process in first place.

One thing that can be done with CI is OIDC based "trusted publishing". I think PyPI did not have it originally when I came up wit this project or I'd have used it. In that case, there is a transparency record published in a log, referring to the git repo, CI workflow, CI workflow logs, etc every time a package is published.

@agriyakhetarpal
Copy link
Contributor Author

Yeah, it's kind of an error-prone process in first place.

One thing that can be done with CI is OIDC based "trusted publishing". I think PyPI did not have it originally when I came up wit this project or I'd have used it. In that case, there is a transparency record published in a log, referring to the git repo, CI workflow, CI workflow logs, etc every time a package is published.

That works, too. Additionally, please consider signing the binaries with Sigstore, that way they are verifiable (if you publish the relevant signature files to GitHub Releases). I would be happy to set CI up for you for both of these things.

@whitequark
Copy link
Collaborator

I think we can start with a workflow responding to workflow_dispatch event where the workflow itself uses both Sigstore and Trusted Publishing, with the pushing of wheels still being manual. Then we can consider something more automated after that.

I agree that reproducibility is probably not enough and I'd really rather not be on a critical path to accurate delivery of opaque binaries.

@agriyakhetarpal
Copy link
Contributor Author

Thank you for your comments. I agree workflow_dispatch event makes sense – I shall be on to this in a couple of days.

@jvolkman
Copy link

Fair enough – I confirmed with a few more tests and it does break reproducibility, since repairing the same wheel multiple times displays a different SHA-256 checksum each time, regardless of whether I use repairwheel or delocate.

(Hi - author of repairwheel here)

Are you seeing this when repairing a single wheel multiple times (i.e., repairing an already-repaired wheel), or are you seeing different output when repairing the same original wheel multiple times?

If it's the latter I'd definitely be interested in more details. repairwheel takes steps to ensure the reproducibility of outputs by sorting zip entries and clearing timestamps. The repos' own CI tests ensure that a repaired wheel's checksum is equivalent whether the repair operation is run on linux, windows, or macos.

@agriyakhetarpal
Copy link
Contributor Author

agriyakhetarpal commented May 1, 2024

Hi, @jvolkman! Thanks for chiming in – I just tried to build the wheel here again with python make_wheels.py --platform x86_64-linux – just an experiment with a single wheel:

c7ae866b8a76a568e2d5cfd31fe89cdb629bdd161fdd5018b29a4a0a17045cad https://ziglang.org/download/0.12.0/zig-linux-x86_64-0.12.0.tar.xz
0c777f1d5d9be32d0edffc79956d03ed35f868a8db7b0552da0984182aa10ff0 dist/ziglang-0.12.0-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl

I then copied this wheel to a particular directory, repaired it with repairwheel into a different one.

First attempt
pipx run repairwheel ziglang-0.12.0-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl -o wheelhouse
Wheel ziglang-0.12.0-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl has multiple tags; using first (py3-none-musllinux_1_1_x86_64)

ziglang-0.12.0-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl
is consistent with the following platform tag: "linux_1_1_x86_64".

The wheel references no external versioned symbols from
system-provided shared libraries.

The wheel requires no external shared libraries! :)
Wrote /Users/agriyakhetarpal/Desktop/zig-pypi/wheelhouse/ziglang-0.12.0-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.musllinux_1_1_x86_64.linux_1_1_x86_64.whl

The SHA-256 checksum of the wheel:

shasum -a 256 wheelhouse/ziglang-0.12.0-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.musllinux_1_1_x86_64.linux_1_1_x86_64.whl
0db5afefea085ea235e4537d3d1c870e9c1071a53f484445e0bd528c03b5f70d  wheelhouse/ziglang-0.12.0-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.musllinux_1_1_x86_64.linux_1_1_x86_64.whl

and I repaired the same wheel, again, this time into a different directory (not repairing an already repaired wheel):

Second time
pipx run repairwheel ziglang-0.12.0-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl -o wheelhouse1
Wheel ziglang-0.12.0-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl has multiple tags; using first (py3-none-musllinux_1_1_x86_64)

ziglang-0.12.0-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.musllinux_1_1_x86_64.whl
is consistent with the following platform tag: "linux_1_1_x86_64".

The wheel references no external versioned symbols from
system-provided shared libraries.

The wheel requires no external shared libraries! :)
Wrote /Users/agriyakhetarpal/Desktop/zig-pypi/wheelhouse1/ziglang-0.12.0-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.musllinux_1_1_x86_64.linux_1_1_x86_64.whl

and the checksum is indeed the same:

0db5afefea085ea235e4537d3d1c870e9c1071a53f484445e0bd528c03b5f70d  wheelhouse1/ziglang-0.12.0-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.musllinux_1_1_x86_64.linux_1_1_x86_64.whl

So, yes, you're right – I was indeed repairing an already-repaired wheel multiple times, and repairing the same unrepaired wheel is deterministic, which is good – resolves many of the problems here.

However, as a by-product, the issue that I noticed is about the wheel is receiving a linux_1_1_x86_64 tag upon repair – this does not seem to be valid? It does not have the musl- prefix to conform to MUSL 1.1 (which is already present in the unrepaired wheel). A similar bug with an unwanted tag can be noticed when I build the aarch64 Linux wheels and then proceed to repair them, it adds the linux_1_1_aarch64 tag. I was going to file a bug report this week about this, good to see that you came to the pedestal yourself, haha!

@agriyakhetarpal
Copy link
Contributor Author

Same thing with the armv7l wheels – it adds the linux_2_17_armv7l tag upon repair. I am happy to file a detailed bug report on repairwheel's issue tracker with more details, at least, for organisational purposes in case someone else stumbles into this – and contribute as well, if this is something that can be fixed without changing any vendored libraries that repairwheel could be using internally.

@jvolkman
Copy link

jvolkman commented May 1, 2024

Got it. Good to hear about the reproducible repair.

I'd guess the other issue is just that repairing musl wheels isn't currently supported, and the failure manifests as that ancient linux tag being added. The problem is that auditwheel determines that a wheel is musl and what its musl version is by looking at /lib/libc.musl* (here), which doesn't exist when repairing cross platform. And so a glibc repair is currently assumed.

I don't have a good solution, but it would be nice to support musl wheels somehow. Feel free to open an issue to track that if you'd like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants