[Bug]: "Unique rows (HashSet)" has a bug and drops records #3908
Comments
.take-issue |
I have taken a look, and indeed it is what it is. The way we do it is use …
hansva added a commit to hansva/hop that referenced this issue on Jun 5, 2024
I made that option the default and added a warning to the docs on possible collisions.
hansva added a commit that referenced this issue on Jun 6, 2024
Use compare using values by default in Unique rows hashset #3908
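For readers wondering why comparing by values matters, here is a minimal sketch of the difference between deduplicating on hash codes alone and deduplicating on the stored values. It is an illustration only and not Hop's actual code: the class `UniqueRowsSketch` and both method names are hypothetical, and rows are modelled as plain strings. The strings "Aa" and "BB" are used because they are a well-known pair with the same `String.hashCode()` value (2112).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not taken from the Apache Hop source.
public class UniqueRowsSketch {

    // Remembers only each row's hash code. Two different rows that share
    // a hash code are indistinguishable, so the second one is dropped.
    static List<String> uniqueByHashOnly(List<String> rows) {
        Set<Integer> seenHashes = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (seenHashes.add(row.hashCode())) {
                out.add(row);
            }
        }
        return out;
    }

    // Remembers the row values themselves. HashSet falls back to equals()
    // when hash codes match, so colliding but different rows are both kept.
    static List<String> uniqueByValue(List<String> rows) {
        Set<String> seenRows = new HashSet<>();
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (seenRows.add(row)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are different strings with the same hash code.
        List<String> rows = List.of("Aa", "BB", "Aa");
        System.out.println(uniqueByHashOnly(rows)); // "BB" is wrongly dropped
        System.out.println(uniqueByValue(rows));    // only the repeated "Aa" is dropped
    }
}
```

The trade-off is memory: keeping the values costs more than keeping one integer per row, which is presumably why a hash-only mode exists at all.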
Apache Hop version?
2.8
Java version?
openjdk version "11.0.21" 2023-10-17
Operating system
Windows
What happened?
"Unique rows (HashSet)" seems to drop records even if they only appear once.
Steps to reproduce the error:
Generate 60k records, then add a sequence and one column with random fake data.
Then calculate a SHA256 checksum over it. Since the input includes the sequence number (1 to 60k), the checksums must all be unique.
But still, the "Unique rows (HashSet)" seems to consider one row a duplicate, and only returns 59,999 records.
Test pipeline attached
Unique_Hash_Faulty.zip
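A back-of-the-envelope estimate suggests that losing one row out of 60k is plausible if uniqueness is decided on a 32-bit hash code alone. The sketch below assumes, without having checked the Hop source, that the transform reduces each row to a Java `int` hash code and that those codes are spread roughly uniformly; the class and method names are hypothetical.

```java
// Hypothetical illustration of the birthday bound; not Apache Hop code.
public class CollisionEstimate {

    // Expected number of colliding pairs among n hashes of the given bit
    // width, assuming the hashes are uniformly distributed.
    static double expectedCollidingPairs(long n, int bits) {
        double pairs = n * (n - 1) / 2.0;
        return pairs / Math.pow(2, bits);
    }

    // Poisson approximation of the chance of at least one collision.
    static double probabilityOfCollision(long n, int bits) {
        return 1.0 - Math.exp(-expectedCollidingPairs(n, bits));
    }

    public static void main(String[] args) {
        long rows = 60_000; // row count from the reproduction steps
        System.out.printf("expected colliding pairs: %.3f%n",
                expectedCollidingPairs(rows, 32));
        System.out.printf("P(at least one collision): %.3f%n",
                probabilityOfCollision(rows, 32));
    }
}
```

Under those assumptions the expected number of colliding pairs is a little over 0.4, which corresponds to roughly a one-in-three chance of at least one collision per run, so a result of 59,999 rows would not be surprising.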
Issue Priority
Priority: 3
Issue Component
Component: Hop Gui