Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Support Dict popitem #2701

Closed
wants to merge 7 commits into from

Conversation

jayzhan211
Copy link
Contributor

close #2355
close #2542

@jayzhan211 jayzhan211 requested review from a team as code owners May 17, 2024 08:03
@jayzhan211
Copy link
Contributor Author

jayzhan211 commented May 17, 2024

@gabrieldemarmiesse , I open another PR since the previous one is quite messy for fixing conflict

I also rewrite it differently from your previous comment, which I think it looks better!

var key = item[].key
var value = item[].value
_ = self.pop(key)
return DictEntry[K, V](key, value)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

item[] is not valid at this point, since it is deleted.

Copy link
Contributor

@gabrieldemarmiesse gabrieldemarmiesse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks simpler indeed. I'm wondering if it's future-proof though. The code is correct, but I'm afraid the compiler will not be happy with it later on.

for item in reversed(self.items()):
var key = item[].key
var value = item[].value
_ = self.pop(key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid that in the future this will cause issue when the compiler is stricter, no? I would expect the compiler to prevent us from changing an object during iteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take a look what python does

a = {'a': 1, 'b': 2}
for k, v in a.items():
    print(k, v)
    item = a.popitem()
    print(item)

Result

a 1
('b', 2)
Traceback (most recent call last):
  File "/Users/bytedance/mojo/sample.py", line 2, in <module>
    for k, v in a.items():
RuntimeError: dictionary changed size during iteration

I agree we may return error for modifying dict

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dang this is way harder than I anticipated...
Maybe let's go back to using optionals?

@JoeLoser
Copy link
Collaborator

Please ping me when the changelog entry is updated to adhere to the markdown formatting used by CI now (this matches what we use internally to make imports easier).

@@ -18,6 +18,8 @@ what we publish.

### ⭐️ New

- `Dict` now support `popitem`, which remove and return the last dict item. ([PR #2701](https://github.com/modularml/mojo/pull/2701) by [@jayzhan211](https://github.com/jayzhan211))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `Dict` now support `popitem`, which remove and return the last dict item. ([PR #2701](https://github.com/modularml/mojo/pull/2701) by [@jayzhan211](https://github.com/jayzhan211))
- `Dict` now support `popitem`, which removes and returns the last item in the `Dict`. ([PR #2701](https://github.com/modularml/mojo/pull/2701) by [@jayzhan211](https://github.com/jayzhan211))

_ = self.pop(key.value()[])
return DictEntry[K, V](key.value()[], val.value()[])

raise "KeyError"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion Let's give a more descriptive error like Python gives.

KeyError: 'popitem(): dictionary is empty'

Perhaps just s/dictionary/Dict for our error message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KeyError: 'popitem(): dictionary is empty'

This looks nice

@jayzhan211
Copy link
Contributor Author

Done! @JoeLoser

@msaelices
Copy link
Contributor

msaelices commented May 23, 2024

Ups, sorry, I did not know of this PR and I implemented this one with the same method: #2794

Not sure if you want to take a look. I would guess my implementation could be slightly faster O(1)

Maybe you can be inspired by my PR.

@gabrieldemarmiesse
Copy link
Contributor

gabrieldemarmiesse commented May 23, 2024

Maybe I'm mistaken but the implementation in this PR is O(1) too unless the reverse is simpler than I thought. We don't loop over all the keys here, just one.

@msaelices
Copy link
Contributor

Maybe I'm mistaken but the implementation in this PR is O(1) too unless the reverse is simpler than I thought. We don't loop over all the keys here, just one.

Yes, you are right. I missed the break line

@jayzhan211 jayzhan211 force-pushed the dict-popitem-v2 branch 2 times, most recently from 2038a94 to 08a59a7 Compare May 25, 2024 08:02
@jayzhan211 jayzhan211 requested a review from JoeLoser May 25, 2024 08:08
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@JoeLoser
Copy link
Collaborator

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label May 27, 2024
@modularbot
Copy link
Collaborator

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

@modularbot modularbot added merged-internally Indicates that this pull request has been merged internally merged-externally Merged externally in public mojo repo labels May 27, 2024
@modularbot
Copy link
Collaborator

Landed in 051cdd0! Thank you for your contribution 🎉

modularbot pushed a commit that referenced this pull request May 28, 2024
[External] [stdlib] Support `Dict.popitem()`

Implement `Dict.popitem()` which removes and
returns the last item in the `Dict`.

Fixes #2355

---------

Co-authored-by: Jay Zhan <jayzhan211@gmail.com>
Closes #2701
MODULAR_ORIG_COMMIT_REV_ID: c93387c953e05a447e6467270c54093b85e743b0
@modularbot modularbot closed this May 28, 2024
@Mogball
Copy link
Collaborator

Mogball commented May 28, 2024

FYI @jayzhan211 @JoeLoser, this PR introducing nondeterminism into test_dict.mojo

mojo /Users/jeff/Documents/modular/Kernels/test/stdlib/collections/test_dict.mojo
Test test_basic ...PASS
Test test_multiple_resizes ...PASS
Test test_big_dict ...PASS
Test test_compact ...PASS
Test test_compact_with_elements ...PASS
Test test_pop_default ...PASS
Test test_key_error ...PASS
Test test_iter ...PASS
Test test_iter_keys ...PASS
Test test_iter_values ...PASS
Test test_iter_values_mut ...PASS
Test test_iter_items ...PASS
Test test_dict_copy ...PASS
Test test_dict_copy_add_new_item ...PASS
Test test_dict_copy_delete_original ...PASS
Test test_dict_copy_calls_copy_constructor ...PASS
Test test_dict_update_nominal ...PASS
Test test_dict_update_empty_origin ...PASS
Test test_dict_update_empty_new ...PASS
Test test_mojo_issue_1729 ...PASS
Test test dict or ...PASS
Test test dict popteim ...get: wrong variant type
Please submit a bug report to https://github.com/modularml/mojo/issues and include the crash backtrace along with all the relevant source codes.
Stack dump:
0.      Program arguments: mojo /Users/jeff/Documents/modular/Kernels/test/stdlib/collections/test_dict.mojo
 #0 0x00000001043a10b0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/Users/jeff/Documents/modular/.derived/build-relwithdebinfo/bin/mojo+0x1000c90b0)
 #1 0x000000010439f210 llvm::sys::RunSignalHandlers() (/Users/jeff/Documents/modular/.derived/build-relwithdebinfo/bin/mojo+0x1000c7210)
 #2 0x00000001043a1750 SignalHandler(int) (/Users/jeff/Documents/modular/.derived/build-relwithdebinfo/bin/mojo+0x1000c9750)
 #3 0x00000001ab1b2a24 (/usr/lib/system/libsystem_platform.dylib+0x18042ea24)
 #4 0xffff8002a81b8510 
 #5 0x00000001047c1608 M::KGEN::ExecutionEngine::runProgram(llvm::StringRef, llvm::StringRef, llvm::function_ref<M::ErrorOrSuccess (void*)>) (/Users/jeff/Documents/modular/.derived/build-relwithdebinfo/bin/mojo+0x1004e9608)
 #6 0x00000001042f8270 executeMain(mlir::ModuleOp, mlir::SymbolTable const&, M::KGEN::ExecutionEngine*, M::LLCL::Runtime&, llvm::ArrayRef<char const*>) (/Users/jeff/Documents/modular/.derived/build-relwithdebinfo/bin/mojo+0x100020270)
 #7 0x00000001042f7cb8 run(M::State const&) (/Users/jeff/Documents/modular/.derived/build-relwithdebinfo/bin/mojo+0x10001fcb8)
 #8 0x00000001042df774 main (/Users/jeff/Documents/modular/.derived/build-relwithdebinfo/bin/mojo+0x100007774)
 #9 0x00000001aae2bf28 
[1]    44318 trace trap  mojo 

@gabrieldemarmiesse
Copy link
Contributor

Thanks @Mogball I documented it here: #2866 we can continue the discussion there

modularbot pushed a commit that referenced this pull request May 31, 2024
…(Dict.items())` (#40974)

[External] [stdlib] Fix UB in `reversed(Dict.values())` and
`reversed(Dict.items())`

Finally found the culprit in the flakyness that plagued us since a few
week in the `test_reversed.mojo`.

### The actual bug:

When iterating over a list in reverse order, we should start at
`len(my_list) - 1` not at `len(my_list)`.
That triggered an out of bounds access and thus was undefined behavior.

### The effect on our CI
As you know, we have been seeing flakyness lately. It was documented a
number of times and always related to `reverse`:
* #2866 (comment)
* #2369

### Why was it passing sometimes?
This is because there were `Optional[...]` in the List. Thus if the flag
of the `Optional` says that no element is present, it's just skipped
(the dict doesn't have an entry at this index). So the list of the Dict
would often look like this: `["a", "b", "c", "d"] None`
but the last `None` is actually memory that we don't have access to.
Sometimes it's then skipped in the iteration making the tests pass.
Sometimes it would cause segfaults because the test dict worked with
strings. Sometimes we would get `wrong variant type` since we don't know
what happens to the memory between None check and access.

### Why wasn't it found earlier?

First of all, our Dict implementation is too complexe for what it does
and thus is very good at hiding bugs.

Well we did have `debug_assert` before getting the element of the
`List`, but this `debug_assert` looked like this in the dict iterator:
```mojo
            @parameter
            if forward:
                debug_assert(
                    self.index < self.src[]._reserved, "dict iter bounds"
                )
            else:
                debug_assert(self.index >= 0, "dict iter bounds")
```
So one bound was checked when reading in one direction and the other
bound was checked in the other direction. A better `debug_assert` would
have been
```mojo
debug_assert(0 <= self.index < self.src[]._reserved, "dict iter bounds")
```
When I worked on my PR #2718 the
condition `self.index < self.src[]._reserved` didn't trigger anything
since it was in the wrong branch, it was never executed.

Also before, `__get_ref` didn't have any bounds checks, even when
assertions were enabled.

A recent commit
8d0870e
adds `unsafe_get()` in List and make `__get_ref` use it. It also adds
`debug_assert` to `unsafe_get()`, which means that now `__get_ref` has
bounds checks if run with assertions enabled. This allowed me to catch
the out of bounds access when updating
#2718 making the fail
deterministic and debuggable.

Since we have this, the `debug_assert` in `dict.mojo` isn't necessary
anymore.

### Consequences on ongoing work:
* This fix have been also added to
#2718
* The PR #2701 that we did with
@jayzhan211 was actually correct. It was just using
`reverse(Dict.items())` which was buggy at the time. After the fix is
merged, we can re-revert this PR.
* #2794 is not necessary anymore
since the implementation by @jayzhan211 was correct.
* The real cause of #2866 was
found, the issue has already been closed though.
* #2369 can be closed for good.
* #2832 can be closed for good.

### Closing thoughts
* We really need to run the unit tests with assertions enabled and add
assertions whenever necessary
* The dict implementation is a bit too complicated. For example,
`self._reserved` is the length of the internal list. There is no need to
store the length of the list twice. Let's drop this variable and use
`len(self._entries)` instead. I guess this is a relic of the time when
`List` wasn't completely flushed out. If had done so, it would have been
ovious that we can't do `my_list.__get_ref(len(my_list))`
* Iterating manually over a list like this is bug-prone. The
implementation we have especially is, since
```mojo
                @parameter
                if forward:
                    self.index += 1
                else:
                    self.index -= 1
```
is done twice in the code, it should only be done once. While there is
no bug, code duplication and complexity hides bugs.
* We should iterate over the list with a list iterator, not with a
custom-made iterator. This will remove a lot of code in the `dict.mojo`.

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2896
MODULAR_ORIG_COMMIT_REV_ID: b65009dc51f1e3027f91b5b61a5b7003cb022b87
modularbot pushed a commit that referenced this pull request Jun 7, 2024
[External] [stdlib] Support `Dict.popitem()`

Implement `Dict.popitem()` which removes and
returns the last item in the `Dict`.

Fixes #2355

---------

Co-authored-by: Jay Zhan <jayzhan211@gmail.com>
Closes #2701
MODULAR_ORIG_COMMIT_REV_ID: c93387c953e05a447e6467270c54093b85e743b0
modularbot pushed a commit that referenced this pull request Jun 7, 2024
…(Dict.items())` (#40974)

[External] [stdlib] Fix UB in `reversed(Dict.values())` and
`reversed(Dict.items())`

Finally found the culprit in the flakyness that plagued us since a few
week in the `test_reversed.mojo`.

### The actual bug:

When iterating over a list in reverse order, we should start at
`len(my_list) - 1` not at `len(my_list)`.
That triggered an out of bounds access and thus was undefined behavior.

### The effect on our CI
As you know, we have been seeing flakyness lately. It was documented a
number of times and always related to `reverse`:
* #2866 (comment)
* #2369

### Why was it passing sometimes?
This is because there were `Optional[...]` in the List. Thus if the flag
of the `Optional` says that no element is present, it's just skipped
(the dict doesn't have an entry at this index). So the list of the Dict
would often look like this: `["a", "b", "c", "d"] None`
but the last `None` is actually memory that we don't have access to.
Sometimes it's then skipped in the iteration making the tests pass.
Sometimes it would cause segfaults because the test dict worked with
strings. Sometimes we would get `wrong variant type` since we don't know
what happens to the memory between None check and access.

### Why wasn't it found earlier?

First of all, our Dict implementation is too complexe for what it does
and thus is very good at hiding bugs.

Well we did have `debug_assert` before getting the element of the
`List`, but this `debug_assert` looked like this in the dict iterator:
```mojo
            @parameter
            if forward:
                debug_assert(
                    self.index < self.src[]._reserved, "dict iter bounds"
                )
            else:
                debug_assert(self.index >= 0, "dict iter bounds")
```
So one bound was checked when reading in one direction and the other
bound was checked in the other direction. A better `debug_assert` would
have been
```mojo
debug_assert(0 <= self.index < self.src[]._reserved, "dict iter bounds")
```
When I worked on my PR #2718 the
condition `self.index < self.src[]._reserved` didn't trigger anything
since it was in the wrong branch, it was never executed.

Also before, `__get_ref` didn't have any bounds checks, even when
assertions were enabled.

A recent commit
8d0870e
adds `unsafe_get()` in List and make `__get_ref` use it. It also adds
`debug_assert` to `unsafe_get()`, which means that now `__get_ref` has
bounds checks if run with assertions enabled. This allowed me to catch
the out of bounds access when updating
#2718 making the fail
deterministic and debuggable.

Since we have this, the `debug_assert` in `dict.mojo` isn't necessary
anymore.

### Consequences on ongoing work:
* This fix have been also added to
#2718
* The PR #2701 that we did with
@jayzhan211 was actually correct. It was just using
`reverse(Dict.items())` which was buggy at the time. After the fix is
merged, we can re-revert this PR.
* #2794 is not necessary anymore
since the implementation by @jayzhan211 was correct.
* The real cause of #2866 was
found, the issue has already been closed though.
* #2369 can be closed for good.
* #2832 can be closed for good.

### Closing thoughts
* We really need to run the unit tests with assertions enabled and add
assertions whenever necessary
* The dict implementation is a bit too complicated. For example,
`self._reserved` is the length of the internal list. There is no need to
store the length of the list twice. Let's drop this variable and use
`len(self._entries)` instead. I guess this is a relic of the time when
`List` wasn't completely flushed out. If had done so, it would have been
ovious that we can't do `my_list.__get_ref(len(my_list))`
* Iterating manually over a list like this is bug-prone. The
implementation we have especially is, since
```mojo
                @parameter
                if forward:
                    self.index += 1
                else:
                    self.index -= 1
```
is done twice in the code, it should only be done once. While there is
no bug, code duplication and complexity hides bugs.
* We should iterate over the list with a list iterator, not with a
custom-made iterator. This will remove a lot of code in the `dict.mojo`.

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes #2896
MODULAR_ORIG_COMMIT_REV_ID: b65009dc51f1e3027f91b5b61a5b7003cb022b87
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported-internally Signals that a given pull request has been imported internally. merged-externally Merged externally in public mojo repo merged-internally Indicates that this pull request has been merged internally
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants