RE: [GUIDE] Optimise your RPC server for better performance

gtg (70) in #witness-category • 8 years ago

Upvoting for the effort of making things better, but I can't agree with the tips. The name lookup, while part of the benchmark, is crazy long and irrelevant. The nginx config is a generic use-case optimization. We are using nginx as a reverse proxy, so sendfile on is irrelevant.
Network-level optimizations are what the devices in front of your RPC node are for.

time_namelookup:    0.004
time_connect:       0.004
time_appconnect:    0.087
time_pretransfer:   0.087
time_redirect:      0.000
time_starttransfer: 0.088
time_total:         0.088

That's my endpoint under load, but queried from a VIP zone. No TCP tweaking on the host. 128GB RAM with RAID0 NVMe.
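For anyone reading those numbers: curl reports each of these timers as time elapsed since the start of the request, so the per-phase cost is the difference between neighbouring values. A minimal sketch of that arithmetic, using the figures above (the phase names are my own labels, not curl's; time_redirect is left out because it is already a duration rather than a timestamp):

```python
# Cumulative curl timings copied from the output above (seconds since start).
timings = {
    "time_namelookup": 0.004,
    "time_connect": 0.004,
    "time_appconnect": 0.087,
    "time_pretransfer": 0.087,
    "time_starttransfer": 0.088,
    "time_total": 0.088,
}


def phase_durations(t):
    """Turn cumulative curl timestamps into per-phase durations (seconds)."""
    # Phase labels are illustrative names, not curl terminology.
    order = [
        ("dns", "time_namelookup"),
        ("tcp_connect", "time_connect"),
        ("tls_handshake", "time_appconnect"),
        ("pretransfer", "time_pretransfer"),
        ("server_wait", "time_starttransfer"),
        ("transfer", "time_total"),
    ]
    durations = {}
    previous = 0.0
    for name, key in order:
        # Each phase is the gap between this timestamp and the previous one.
        durations[name] = round(t[key] - previous, 3)
        previous = t[key]
    return durations


if __name__ == "__main__":
    for name, seconds in phase_durations(timings).items():
        print(f"{name:14s} {seconds * 1000:6.1f} ms")
```

Read this way, almost all of the 88 ms sits between time_connect and time_appconnect, which is the TLS handshake.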

BTW, yeah, I know that my node's performance for the general public is currently not as good as it used to be, but I'm still trying to serve some high-frequency requests to service providers (despite multiple notices of deprecation / making that endpoint obsolete).
I will switch the endpoint sometime in May to what I currently have under test. Same hardware, new software, you will see the difference :-)

someguy123 (69) 8 years ago

The name lookup part is strange, yes, but it was staying between 400ms and 600ms regardless of that until I made the optimisations.

I found that one of the most common reasons for RPCs being slow was too many connections. I saw some IPs making 100s of connections regardless of the nginx rate limiting. I also saw an ungodly number of TIME_WAIT (waiting to close) connections that were not being cleaned up.
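If you want to check this on your own node, one rough way is to count sockets per TCP state from /proc/net/tcp. This is only a sketch: it assumes a Linux host, covers IPv4 only (IPv6 sockets are listed in /proc/net/tcp6), and maps just a few of the kernel's state codes.

```python
from collections import Counter

# A few of the kernel's TCP state codes as printed (in hex) in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Field 3 is the "st" column: the socket state as a two-digit hex code.
        counts[TCP_STATES.get(fields[3].upper(), "OTHER")] += 1
    return counts


if __name__ == "__main__":
    with open("/proc/net/tcp") as f:
        for state, n in count_tcp_states(f.read()).most_common():
            print(f"{state:12s} {n}")
```

A TIME_WAIT count far above ESTABLISHED would match the pattern described here.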

I copied the sendfile on part from our Privex config. I was a little confused as to why it was there too, but I left it in as it didn't seem to be hurting anything.

If you take a look at the graphs, you can see the insane number of open connections that my RPC node and the minnowsupport one were suffering from. This is partly due to asshats using get_block over HTTP rather than websockets, thus opening 100s of connections (each of which Linux takes 4 minutes to close by default, which is why the TIME_WAIT optimisation is needed). This slows down public RPC nodes because the network scheduler has to deal with 2000 connections even though fewer than 300 of them are actually active.
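For API users who have to stay on HTTP, one way to avoid this is to reuse a single keep-alive connection for a run of get_block calls instead of opening a fresh one per block. A minimal sketch using only the Python standard library; the hostname rpc.example.com is a placeholder, and the JSON-RPC request shape is an assumption that depends on the node's API version:

```python
import http.client
import json


def get_block_request(block_num, request_id):
    """Build a JSON-RPC 2.0 body for get_block (request shape is an assumption)."""
    return {
        "jsonrpc": "2.0",
        "method": "get_block",
        "params": [block_num],
        "id": request_id,
    }


def fetch_blocks(conn, block_nums, path="/"):
    """Fetch several blocks over ONE HTTP connection instead of one per block."""
    responses = []
    for request_id, block_num in enumerate(block_nums, start=1):
        body = json.dumps(get_block_request(block_num, request_id))
        conn.request("POST", path, body=body,
                     headers={"Content-Type": "application/json"})
        response = conn.getresponse()
        # Read the whole body before the next request so the socket can be reused.
        responses.append(json.loads(response.read()))
    return responses


if __name__ == "__main__":
    # Hypothetical endpoint; replace with a real RPC node.
    conn = http.client.HTTPSConnection("rpc.example.com", timeout=10)
    try:
        for reply in fetch_blocks(conn, range(1, 4)):
            print(reply)
    finally:
        conn.close()
```

Whether the connection actually stays open also depends on the server and any proxy in front of it allowing keep-alive.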

gtg (70) 8 years ago

Yeah... I was considering disabling get_block entirely and using a separate, smaller and much more robust instance for that (pre-jussi times), but there was also trouble with vops. I'm planning improvements for June; there's no point in wasting time on temporary solutions.

adasq (63) 8 years ago

"This is partly due to asshats using get_block over HTTP rather than websockets"

@someguy123 As a witness you have a strong knowledge base about what goes on behind the scenes, so perhaps you have some preferences or advice aimed at API users?

I mean, people using the API do not worry about performance, but perhaps your hints (such as the one quoted above) could make a difference? I'm thinking of a simple list of dos and don'ts.

omitaylor (56) 8 years ago

Agreed. For people using an API, what would be helpful?

gtg (70) 8 years ago

Upvoted myself because those comments from voting bots are super annoying. Nobody is scrolling below that trash.

abit (70) 8 years ago

So what's the tl;dr?

gtg (70) 8 years ago

Hardware firewall in front to deal with the network-level hassle, nginx with SSL termination, jussi for caching, then specialized nodes; appbase+rocksdb, enterprise NVMes, and 640kB should be enough for everybody ;-)
Soon in your blockchain (June, after my vacation)
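To make the "caching, then specialized nodes" part concrete, here is a toy illustration of the general idea: a layer that routes each method to a dedicated backend and serves repeated identical requests from a short-lived cache. This is not how jussi is implemented; the class, the backend names and the TTL are all made up for the example.

```python
import time


class CachingRouter:
    """Toy cache-and-route layer (hypothetical, for illustration only)."""

    def __init__(self, backends, routes, default_backend,
                 ttl_seconds=3.0, clock=time.monotonic):
        self.backends = backends              # backend name -> callable(method, params)
        self.routes = routes                  # method name -> backend name
        self.default_backend = default_backend
        self.ttl = ttl_seconds
        self.clock = clock
        self._cache = {}                      # (method, params) -> (expires_at, result)

    def call(self, method, params=()):
        # Params must be hashable once converted to a tuple (e.g. ints, strings).
        key = (method, tuple(params))
        now = self.clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                     # still fresh: no backend call
        backend_name = self.routes.get(method, self.default_backend)
        result = self.backends[backend_name](method, params)
        self._cache[key] = (now + self.ttl, result)
        return result


if __name__ == "__main__":
    calls = []

    def block_node(method, params):
        calls.append((method, tuple(params)))
        return {"block": params[0]}

    router = CachingRouter({"blocks": block_node}, {"get_block": "blocks"}, "blocks")
    router.call("get_block", [1])
    router.call("get_block", [1])             # served from the cache
    print(len(calls), "backend call(s) for 2 requests")
```

The point of the sketch is only that identical requests arriving close together reach the backend once, and that each kind of request can go to a node tuned for it.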

abit (70) 8 years ago

Thanks. Unfortunately I'm unable to upvote at the moment.
