UTF-8 streams might break in current stream implementation #151

Open
lazydogP opened this issue Apr 11, 2024 · 4 comments
@lazydogP

I'm using this package in a Node.js environment and calling cohere.chatStream to generate long Chinese texts. However, the replacement character (�) appears at random places in the text delivered by the 'stream-end' event. The following code may be converting incomplete UTF-8 chunks, which the stream yields in parts, into strings:

// Buffer is present in Node.js environment
if (typeof Buffer !== "undefined") {
    bufferChunk += Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
}
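A minimal sketch of how this line could produce the symptom, assuming `bufferChunk` is a string accumulator (the issue does not show its declaration). The sample text and the chunk boundary are hypothetical, chosen so that the split lands inside a character:

```javascript
// "你好" encodes to 6 bytes in UTF-8: e4 bd a0 e5 a5 bd
const bytes = Buffer.from("你好", "utf8");

// Hypothetical chunk boundary that falls inside the first character.
const chunks = [bytes.subarray(0, 2), bytes.subarray(2)];

// Assumption: the accumulator is a string, as the `+=` in the snippet suggests.
let bufferChunk = "";
for (const chunk of chunks) {
  // `+=` coerces each Buffer to a string via toString(), which decodes as UTF-8,
  // so every chunk is decoded on its own and partial sequences are lost.
  bufferChunk += Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);
}

console.log(bufferChunk.includes("\uFFFD")); // true
console.log(bufferChunk === "你好"); // false
```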

While Basic Latin characters require only 1 byte in UTF-8, other characters, such as CJK characters, need more bytes to encode (typically 3). This means a character can be split across chunks, and decoding each chunk on its own turns the partial bytes into replacement characters.
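One possible way to avoid the split, sketched here as an illustration and not as the fix the maintainers have planned, is to decode with a stateful `TextDecoder` in streaming mode, which holds back an incomplete trailing sequence until the next chunk arrives. The helper name `decodeChunks` is hypothetical:

```javascript
// Hypothetical helper: decodes a sequence of byte chunks into one string.
function decodeChunks(chunks) {
  const decoder = new TextDecoder("utf-8");
  let text = "";
  for (const chunk of chunks) {
    // { stream: true } makes the decoder buffer an incomplete trailing
    // sequence instead of emitting U+FFFD for it.
    text += decoder.decode(chunk, { stream: true });
  }
  // A final call without arguments flushes anything still buffered.
  text += decoder.decode();
  return text;
}

// Same hypothetical split as a chunk boundary inside the first character.
const source = Buffer.from("你好", "utf8");
const parts = [source.subarray(0, 2), source.subarray(2)];

console.log(decodeChunks(parts) === "你好"); // true
```

Node's built-in `string_decoder` module provides a `StringDecoder` class with similar buffering behavior, which may suit code paths that only run in Node.js.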

@billytrend-cohere
Collaborator

Hey @lazydogP many thanks for this interesting find! We have repro'd it and will have a fix for you asap.

@billytrend-cohere
Collaborator

@lazydogP we're planning to move to SSE to fix this issue for you! thanks for your patience

@danny-avila

> @lazydogP we're planning to move to SSE to fix this issue for you! thanks for your patience

Is this resolved yet?

@danny-avila

This is still happening on the latest version. It seems we can't rely on the stream-end event to avoid the issue:

danny-avila/LibreChat@fe93f3a


4 participants