[Bug] Incorrect Handling of String Encoding in MultiTableCommittable Serialization #3027

Closed
2 tasks done
Pandas886 opened this issue Mar 15, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@Pandas886
Contributor

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

0.7

Compute Engine

Flink 1.17

Minimal reproduce step

Issue description

When MultiTableCommittableSerializer serializes metadata that contains non-ASCII characters (e.g., Chinese table names), incorrect byte-buffer handling can drop characters or throw a BufferOverflowException.

Steps to reproduce

The issue arises when serializing or deserializing strings that contain multibyte characters. For instance, processing metadata for tables with Chinese names can throw an exception or produce incorrect results.
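
One way this kind of failure can arise, offered as an assumption since the serializer source is not quoted in this issue, is sizing a ByteBuffer from String.length() (a count of UTF-16 code units) and then writing the UTF-8 bytes, which are longer for Chinese text. The class and method names below are hypothetical and not part of Paimon; this is a minimal sketch of the mismatch, not the project's code.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Paimon.
class LengthMismatchDemo {

    // Sizes the buffer from the char count (assumed bug pattern),
    // then writes the UTF-8 bytes. Returns true if the write overflowed.
    static boolean overflowsWhenSizedByCharCount(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        // 4 bytes for the length prefix plus one byte per char: too small
        // whenever a char needs more than one byte in UTF-8.
        ByteBuffer buffer = ByteBuffer.allocate(4 + s.length());
        try {
            buffer.putInt(s.length()).put(utf8);
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // ASCII name: one byte per char, so the buffer is large enough.
        System.out.println(overflowsWhenSizedByCharCount("orders"));
        // Three Chinese characters (as Unicode escapes): 3 chars, 9 UTF-8 bytes.
        System.out.println(overflowsWhenSizedByCharCount("\u8ba2\u5355\u8868"));
    }
}
```

With an ASCII name the char count and byte count agree, which would explain why the problem only shows up for multibyte table or database names.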

What doesn't meet your expectations?

The deserialization process should correctly convert byte arrays back to strings, accommodating multibyte characters without errors or data loss.
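
The expected round trip can be sketched as follows. This is an illustration under stated assumptions rather than the actual fix: the class Utf8LengthPrefixed and its methods are hypothetical, and the layout (a 4-byte length prefix followed by the bytes, for database then table) is assumed for the example. The key point is that the prefix stores the encoded byte length and both directions use an explicit UTF-8 charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Paimon.
class Utf8LengthPrefixed {

    // Writes each string as [int byteLength][UTF-8 bytes].
    static byte[] write(String database, String table) {
        byte[] db = database.getBytes(StandardCharsets.UTF_8);
        byte[] tb = table.getBytes(StandardCharsets.UTF_8);
        // Allocate from the encoded byte lengths, not String.length().
        return ByteBuffer.allocate(4 + db.length + 4 + tb.length)
                .putInt(db.length).put(db)
                .putInt(tb.length).put(tb)
                .array();
    }

    // Reads the two strings back in the same order they were written.
    static String[] read(byte[] serialized) {
        ByteBuffer buffer = ByteBuffer.wrap(serialized);
        String database = readString(buffer);
        String table = readString(buffer);
        return new String[] {database, table};
    }

    private static String readString(ByteBuffer buffer) {
        // The prefix is a byte count, so exactly that many bytes are consumed.
        byte[] bytes = new byte[buffer.getInt()];
        buffer.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Calling Utf8LengthPrefixed.read(Utf8LengthPrefixed.write(db, table)) should return the original names unchanged, whether they are ASCII or Chinese.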

Anything else?

Here is the stacktrace from the exception:

Caused by: java.nio.BufferOverflowException
	at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:189)
	at java.nio.ByteBuffer.put(ByteBuffer.java:859)
	at org.apache.paimon.flink.sink.MultiTableCommittableSerializer.serialize(MultiTableCommittableSerializer.java:70)
	at org.apache.paimon.flink.sink.MultiTableCommittableSerializer.serialize(MultiTableCommittableSerializer.java:36)
	at org.apache.flink.core.io.SimpleVersionedSerialization.writeVersionAndSerialize(SimpleVersionedSerialization.java:59)
	at org.apache.flink.core.io.SimpleVersionedSerializerTypeSerializerProxy.serialize(SimpleVersionedSerializerTypeSerializerProxy.java:97)
	at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.serialize(StreamElementSerializer.java:165)
	at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.serialize(StreamElementSerializer.java:43)

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@Pandas886 Pandas886 added the bug Something isn't working label Mar 15, 2024
Pandas886 added a commit to Pandas886/incubator-paimon that referenced this issue Mar 15, 2024