Performance issues for production usage #251
Hey @zhp007, the setup, configuration, and results all look normal to me.
What are the results for other percentiles? The Go runtime can cause latency fluctuations, so it might be worth tuning the GC parameters.
The EmbeddedClient/DMap implementation is thread-safe; you can create any number of goroutines with the same client instance. Two CPUs should be enough to get a rough idea of the performance characteristics.
Currently, there are no other configuration options to improve performance. Note that you have two replicas, and Olric tries to fetch all accessible values from the cluster before returning the result to the client. This is the Last Write Wins (LWW) policy: it compares the timestamps in the returned DMap entries, and the most up-to-date result wins. It's not possible to turn off this behavior explicitly. We could implement that quickly, but disabling LWW would decrease consistency. See this: Line 281 in 81e1254
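The LWW idea can be illustrated roughly like this (a simplified sketch, not Olric's actual code; the `entry` type and its fields are invented for illustration):

```go
package main

import "fmt"

// entry mimics a DMap entry returned by a replica; the type and
// field names here are illustrative, not Olric's internal types.
type entry struct {
	Value     []byte
	Timestamp int64 // write time, e.g. in nanoseconds
}

// lastWriteWins picks the entry with the newest timestamp among the
// copies fetched from the owner and its replicas. It assumes at
// least one entry was returned.
func lastWriteWins(entries []entry) entry {
	winner := entries[0]
	for _, e := range entries[1:] {
		if e.Timestamp > winner.Timestamp {
			winner = e
		}
	}
	return winner
}

func main() {
	replies := []entry{
		{Value: []byte("stale"), Timestamp: 100},
		{Value: []byte("fresh"), Timestamp: 250},
	}
	fmt.Printf("%s\n", lastWriteWins(replies).Value) // prints "fresh"
}
```

This also shows why a Get touches every replica: the winner cannot be known until all reachable copies have been compared.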
The other thing to know: if you request a key/value pair that does not belong to the node, the node finds the partition owner, fetches the pair from it, and returns it to the client. There is no redirection message in the protocol; the node works as a reverse proxy. See this: Line 317 in 81e1254
@buraksezer Thanks for the quick reply!
Set P50 is 0.8ms, P90 is 1.7ms, P95 is 2.3ms. We also tried ReplicaCount=1; there is no change on Set, while Get shows better performance.
If we create a worker pool of goroutines with the same client, each worker having its own DMap with the same name, and use the pool to process incoming requests rather than a single DMap, would that improve performance?
Won't the owner always win in this case, or am I missing something? It would be great if we could have an option to read from the owner/primary only.
Yes, I understand that the additional hop and data transfer can increase overhead/latency.
I think these numbers are pretty good. Olric is implemented in Go.
I have never tried such a thing before, but an increasing number of context switches may reduce performance at some point.
It depends on what happened in the cluster before you ran the request. If you add and remove nodes frequently, you may encounter such anomalies. Only the owner node has read-write rights on its partitions, but it applies the LWW policy to return the most up-to-date result. Olric is an AP store, which means it always chooses availability over consistency.
When a node goes down, a new partition owner is assigned immediately and starts processing incoming requests. There is no active anti-entropy mechanism in Olric. It only tries to read keys from members (using the routing table, based on a consistent hashing algorithm) and applies read-repair if it's enabled.
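The routing idea — every node hashing a key to the same fixed partition, then looking up that partition's current owner — can be sketched as follows. This is simplified: Olric uses a consistent hashing library internally, and the routing-table layout here is invented:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const partitionCount = 271 // matches the config discussed in this thread

// partitionID maps a key onto one of the fixed partitions. With a
// fixed partition count, every node computes the same ID for a key,
// so no coordination is needed to locate data.
func partitionID(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64() % partitionCount
}

func main() {
	// routingTable is an invented stand-in: partition ID -> owner address.
	// In Olric this table is maintained by the cluster coordinator.
	routingTable := make(map[uint64]string, partitionCount)
	nodes := []string{"pod-0:3320", "pod-1:3320", "pod-2:3320"}
	for p := uint64(0); p < partitionCount; p++ {
		routingTable[p] = nodes[p%uint64(len(nodes))]
	}

	key := "some-uuid"
	owner := routingTable[partitionID(key)]
	fmt.Println("partition:", partitionID(key), "owner:", owner)
}
```

When the owner of a partition changes, only the table entry changes; the key-to-partition mapping stays stable.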
It may improve performance in some cases, but I think the increased number of parallel network calls may decrease overall performance for some workloads. This should be carefully designed and tested.
Yeah, this enables LWW implicitly.
We plan to use Olric in production. We built our cache service with Olric as an embedded Go library.
Olric embedded servers are accessed through a gRPC endpoint. Requests are directed to this endpoint and then evenly distributed among the servers using load balancing.
Each server creates one Olric instance with the following config:
Config env: "wan"
PartitionCount: 271
ReplicaCount: 2
ReplicationMode: AsyncReplicationMode
Each server creates one EmbeddedClient from the Olric instance, and one DMap from the client. The gRPC get and set requests are handled by this one DMap via its Get and Put operations.
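In Go, that setup corresponds roughly to the following sketch against Olric's embedded API (the DMap name is made up, and the exact signatures should be checked against the Olric version in use; this is a configuration sketch, not verified code):

```go
package main

import (
	"context"
	"log"

	"github.com/buraksezer/olric"
	"github.com/buraksezer/olric/config"
)

func main() {
	// The config described above: "wan" environment preset,
	// 271 partitions, two replicas, async replication.
	c := config.New("wan")
	c.PartitionCount = 271
	c.ReplicaCount = 2
	c.ReplicationMode = config.AsyncReplicationMode

	db, err := olric.New(c)
	if err != nil {
		log.Fatal(err)
	}
	// Start serves the node; it blocks, so run it in a goroutine.
	go func() {
		if err := db.Start(); err != nil {
			log.Fatal(err)
		}
	}()

	// One EmbeddedClient per server, one DMap per client,
	// as described above. "cache" is an illustrative name.
	client := db.NewEmbeddedClient()
	dm, err := client.NewDMap("cache")
	if err != nil {
		log.Fatal(err)
	}

	ctx := context.Background()
	if err := dm.Put(ctx, "key", "value"); err != nil {
		log.Fatal(err)
	}
}
```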
Cluster setup for the testing:
3 pods, each with 4G memory and 2 CPUs.
Load testing scenario:
Key: UUID, value: Random 16 bytes
Test flow: Set a key/value pair -> wait 10ms -> Get the same key/value pair
The flow is executed 1000 times per second
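One iteration of that flow can be expressed as a small driver function. The `kv` interface and `fakeKV` below are hypothetical stand-ins for the gRPC cache client, included only so the sketch is self-contained:

```go
package main

import (
	"fmt"
	"time"
)

// kv is a stand-in for the gRPC cache client used in the load test.
type kv interface {
	Set(key, value string) error
	Get(key string) (string, error)
}

// fakeKV is a trivial in-memory implementation for the sketch.
type fakeKV map[string]string

func (f fakeKV) Set(k, v string) error        { f[k] = v; return nil }
func (f fakeKV) Get(k string) (string, error) { return f[k], nil }

// runOnce executes one iteration of the described flow:
// Set a key/value pair -> wait 10ms -> Get the same pair,
// returning the Set and Get latencies.
func runOnce(c kv, key, value string) (setLat, getLat time.Duration, err error) {
	t0 := time.Now()
	if err = c.Set(key, value); err != nil {
		return 0, 0, err
	}
	setLat = time.Since(t0)

	time.Sleep(10 * time.Millisecond)

	t1 := time.Now()
	if _, err = c.Get(key); err != nil {
		return setLat, 0, err
	}
	return setLat, time.Since(t1), nil
}

func main() {
	c := fakeKV{}
	set, get, err := runOnce(c, "uuid-1", "0123456789abcdef")
	fmt.Println(set, get, err)
}
```

In the real test this would be driven at 1000 iterations per second, with the latencies fed into a percentile aggregator.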
Result:
P99 set is 4.5ms, P99 get is 6ms. It is much higher than we expected.
Any issues or suggestions for our usage and setup?
Will creating more than one EmbeddedClient and/or DMap in each server help?
Any other config settings or tunings we need to care about?
Any other performance tuning suggestions?
Thanks in advance!