
[Bug] Response should not be discarded even the channel is unwritable #8159

Open
3 tasks done
redlsz opened this issue May 17, 2024 · 4 comments · May be fixed by #8160

Comments

@redlsz
Contributor

redlsz commented May 17, 2024

Before Creating the Bug Report

  • I found a bug, not just asking a question, which should be created in GitHub Discussions.

  • I have searched the GitHub Issues and GitHub Discussions of this repository and believe that this is not a duplicate.

  • I have confirmed that this bug belongs to the current repository, not other repositories of RocketMQ.

Runtime platform environment

centos

RocketMQ version

develop

JDK Version

1.8

Describe the Bug

org.apache.rocketmq.remoting.netty.NettyRemotingServer.NettyServerHandler
[screenshot of the referenced code]

org.apache.rocketmq.proxy.remoting.activity.AbstractRemotingActivity#writeResponse
[screenshot of the referenced code]

The response should not be discarded even if the channel is temporarily unwritable. Instead, we should write the response into the outbound buffer and pause reading from the channel.
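A minimal sketch of the behavior requested here, assuming a Netty-based server; the class name is illustrative and this is not the actual broker/proxy code:

```java
import io.netty.channel.Channel;
import org.apache.rocketmq.remoting.protocol.RemotingCommand;

// Sketch only, not the actual RocketMQ implementation: queue the response even
// when the channel is temporarily unwritable, and pause reads until Netty's
// outbound buffer drains.
final class WriteThroughResponder {

    void writeResponse(Channel channel, RemotingCommand response) {
        // Netty's ChannelOutboundBuffer holds the message until the socket can accept it.
        channel.writeAndFlush(response);

        // Back-pressure: once the high watermark is exceeded, stop reading new
        // requests from this connection so the backlog cannot grow without bound.
        if (!channel.isWritable()) {
            channel.config().setAutoRead(false);
        }
    }
}
```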

Steps to Reproduce

/

What Did You Expect to See?

/

What Did You See Instead?

/

Additional Context

No response

@drpmma
Contributor

drpmma commented May 20, 2024

I understand that the purpose of isWritable is primarily to protect the server from memory overflow and to apply backpressure, acting as a sort of fail-fast mechanism. It serves a definite purpose, and I am more inclined towards keeping this piece of logic.
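For context, the fail-fast shape being defended looks roughly like the following; this is an illustrative reduction, not the exact RocketMQ code:

```java
import io.netty.channel.Channel;
import org.apache.rocketmq.remoting.protocol.RemotingCommand;

// Illustrative reduction of the fail-fast check under discussion.
final class FailFastResponder {

    void writeResponse(Channel channel, RemotingCommand response) {
        if (!channel.isWritable()) {
            // The client is not draining fast enough: drop the response instead of
            // letting the outbound buffer grow and risk an OOM on the server.
            return;
        }
        channel.writeAndFlush(response);
    }
}
```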

@lizhimins
Member

lizhimins commented May 20, 2024

Not responding mainly aims to prevent backpressure from causing server-side OOM errors. Furthermore, there is a safeguard in NettyServerHandler#channelWritabilityChanged to ensure that an excess of responses isn't dispatched (a similar approach is adopted by other products). In most cases, connections originate from the client. By not reading and processing requests from the client, the server can remove the channel through an application-layer scheduled scan task and release the memory. From the above issue, it's not entirely clear what problem this is intended to solve. Is it possible to adjust the server watermark to address this?
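Regarding the watermark suggestion, a sketch of raising the per-channel write-buffer watermark on a Netty server bootstrap; the numbers are placeholders, not RocketMQ defaults:

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelOption;
import io.netty.channel.WriteBufferWaterMark;

final class WatermarkTuning {

    // Sketch: raise the per-channel outbound watermark so the channel stays
    // writable longer before responses are rejected. 512 KiB / 1 MiB are
    // placeholder values, not RocketMQ defaults.
    static void applyTo(ServerBootstrap bootstrap) {
        bootstrap.childOption(ChannelOption.WRITE_BUFFER_WATER_MARK,
                new WriteBufferWaterMark(512 * 1024, 1024 * 1024));
    }
}
```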


@redlsz
Contributor Author

redlsz commented May 23, 2024

@drpmma @lizhimins I think that a custom NettyServerHandler#channelWritabilityChanged can achieve the purpose of back-pressure. Once the high watermark of the write buffer is reached, reading data from this channel will be suspended, which is expected to prevent the buffer from growing indefinitely.
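A sketch of that back-pressure handler, written here against a plain Netty ChannelDuplexHandler rather than the actual RocketMQ handler:

```java
import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;

// Sketch: suspend reads while the outbound buffer is above the high watermark
// and resume them once it drains below the low watermark again.
final class BackPressureHandler extends ChannelDuplexHandler {

    @Override
    public void channelWritabilityChanged(ChannelHandlerContext ctx) throws Exception {
        ctx.channel().config().setAutoRead(ctx.channel().isWritable());
        ctx.fireChannelWritabilityChanged();
    }
}
```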

@lizhanhui
Contributor

This is a place where we can refine further instead of making a simple keep-or-discard binary decision:

  1. Consider the request timestamp and deadline, since reads may have been suspended for some time: if the request has already missed its deadline, the request and its response should be discarded accordingly;
  2. Channel exposes a method Channel#bytesBeforeWritable(); if the byte count is only slightly over the watermark and below a configurable max-bytes-per-connection, we should keep the response to improve QoS in case the client is temporarily busy, for instance due to GC;
  3. If the outbound in-flight bytes within the Channel exceed the hard limit of the channel buffer, treat it as an amplification DDoS attack and discard the response (a combined sketch follows below).
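A sketch of how those three rules could combine; `deadlineMillis`, `maxBytesPerConnection`, and `hardLimitBytes` are hypothetical knobs, not existing RocketMQ configuration:

```java
import io.netty.channel.Channel;
import org.apache.rocketmq.remoting.protocol.RemotingCommand;

// Sketch of the three-way decision above, with hypothetical configuration values.
final class RefinedWritePolicy {

    private final long maxBytesPerConnection; // rule 2: tolerated overshoot past the watermark
    private final long hardLimitBytes;        // rule 3: anything beyond this is treated as abuse

    RefinedWritePolicy(long maxBytesPerConnection, long hardLimitBytes) {
        this.maxBytesPerConnection = maxBytesPerConnection;
        this.hardLimitBytes = hardLimitBytes;
    }

    void writeResponse(Channel channel, RemotingCommand response, long deadlineMillis) {
        // Rule 1: a response that has already missed its deadline is useless to the client.
        if (System.currentTimeMillis() > deadlineMillis) {
            return;
        }
        if (channel.isWritable()) {
            channel.writeAndFlush(response);
            return;
        }
        long backlog = channel.bytesBeforeWritable();
        // Rule 3: an outbound backlog far beyond the watermark looks like an
        // amplification attack (or a dead client); discard the response.
        if (backlog > hardLimitBytes) {
            return;
        }
        // Rule 2: only slightly over the watermark (e.g. a client GC pause); keep
        // the response queued and pause reads until the buffer drains.
        if (backlog <= maxBytesPerConnection) {
            channel.writeAndFlush(response);
            channel.config().setAutoRead(false);
        }
    }
}
```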
