Building a high-availability steemd node for web APIs
If you're interested in building a steemd node for use with a web application, this is meant to serve as your guide. I will attempt to repost this guide occasionally with updates as requirements/steps change (since editing doesn't work past 12 hrs).
Running a load balanced steemd node for the web apis
Running a node for the web is slightly different than a node you'd use for mining or as a witness. We need all of the features available through the API to serve out every piece of information possible. Many mining/witness nodes turn most of these options off to reduce load on the server.
For a web node, we want all of these enabled! The APIs this guide will expose are as follows:
database_api
login_api
market_history_api
tags_api
follow_api
network_broadcast_api
(thanks for the tip @rainman)
In the end, your nodes will run as a load balanced websocket server on port 80 (or 443 if you install an SSL certificate, which is not covered here). Both steemstats' and piston's APIs follow these conventions.
Current hardware requirements:
As of 8/8, the hardware requirements are as follows. I will attempt to recreate this post with best practices over the coming months as requirements increase.
Dual web node:
- 4 vCPU
- 16gb RAM (may exceed this soon)
- Lots of bandwidth to spare (my node uses between 4mb-25mb+ a second)
If you choose to run just a single node, you can effectively cut those requirements in half.
Assumptions
- You have a Linux server
- The Linux OS is already installed
- You have remote access to the server
- You have permissions to install software (you may need to add sudo to some of these commands)
- You have a basic understanding of system administration
Building steemd, twice.
This configuration runs two instances of steemd for load balancing and failover purposes. If one of the nodes goes down, the other node should still be available to take requests. The commands below build only the steemd application (thanks to @rainman for the tip here).
If you're interested in compiling the code faster, replace the 4 in -j4 with the number of vCPUs your machine has.
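As a small sketch of the tip above, the -j value can be derived from the CPU count instead of being hard-coded. The helper name make_command is hypothetical, and the snippet only assembles the command; it assumes you run it yourself from inside the cloned steem1 or steem2 folder.

```python
import os


def make_command(target="steemd"):
    """Build the make invocation with one job per available CPU."""
    # os.cpu_count() can return None on unusual platforms, so fall back to 1.
    jobs = os.cpu_count() or 1
    return ["make", f"-j{jobs}", target]


# Prints something like "make -j4 steemd", depending on the machine.
print(" ".join(make_command()))
```

Passing the resulting list to subprocess.run(..., check=True) would start the build, assuming cmake has already been run in that folder.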
Building node #1
cd ~
git clone https://github.com/steemit/steem.git steem1
cd steem1
git submodule update --init --recursive
cmake -DCMAKE_BUILD_TYPE=Release CMakeLists.txt
make -j4 steemd
Building node #2
cd ~
git clone https://github.com/steemit/steem.git steem2
cd steem2
git submodule update --init --recursive
cmake -DCMAKE_BUILD_TYPE=Release CMakeLists.txt
make -j4 steemd
Configuring steemd
Before we start running anything, we need to configure our two steemd nodes. Listed below are two configuration examples, one for each node, where the only difference is the rpc-endpoint port number.
steemd node #1
~/steem1/programs/steemd/witness_node_data_dir/config.ini
rpc-endpoint = 127.0.0.1:5090
seed-node=52.38.66.234:2001
seed-node=52.37.169.52:2001
seed-node=52.26.78.244:2001
seed-node=192.99.4.226:2001
seed-node=46.252.27.1:1337
seed-node=81.89.101.133:2001
seed-node=52.4.250.181:39705
seed-node=85.214.65.220:2001
seed-node=104.199.157.70:2001
seed-node=104.236.82.250:2001
seed-node=104.168.154.160:40696
seed-node=162.213.199.171:34191
seed-node=seed.steemed.net:2001
seed-node=steem.clawmap.com:2001
seed-node=seed.steemwitness.com:2001
seed-node=steem-seed1.abit-more.com:2001
enable-plugin = account_history
enable-plugin = follow
enable-plugin = market_history
enable-plugin = private_message
enable-plugin = tags
public-api = database_api login_api market_history_api tags_api follow_api network_broadcast_api
steemd node #2
~/steem2/programs/steemd/witness_node_data_dir/config.ini
rpc-endpoint = 127.0.0.1:5091
seed-node=52.38.66.234:2001
seed-node=52.37.169.52:2001
seed-node=52.26.78.244:2001
seed-node=192.99.4.226:2001
seed-node=46.252.27.1:1337
seed-node=81.89.101.133:2001
seed-node=52.4.250.181:39705
seed-node=85.214.65.220:2001
seed-node=104.199.157.70:2001
seed-node=104.236.82.250:2001
seed-node=104.168.154.160:40696
seed-node=162.213.199.171:34191
seed-node=seed.steemed.net:2001
seed-node=steem.clawmap.com:2001
seed-node=seed.steemwitness.com:2001
seed-node=steem-seed1.abit-more.com:2001
enable-plugin = account_history
enable-plugin = follow
enable-plugin = market_history
enable-plugin = private_message
enable-plugin = tags
public-api = database_api login_api market_history_api tags_api follow_api network_broadcast_api
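Since the two files differ only in the rpc-endpoint port, one way to avoid copy/paste drift is to derive the second config from the first. The following is a minimal sketch under that assumption; the function name with_rpc_port is hypothetical, and it expects exactly one rpc-endpoint line in the text it is given.

```python
import re

# Matches a line such as "rpc-endpoint = 127.0.0.1:5090" and captures
# everything up to and including the colon before the port.
RPC_LINE = re.compile(r"(?m)^(rpc-endpoint\s*=\s*[^:\s]+:)\d+[ \t]*$")


def with_rpc_port(config_text, port):
    """Return config_text with the rpc-endpoint port replaced by `port`."""
    new_text, count = RPC_LINE.subn(rf"\g<1>{port}", config_text)
    if count != 1:
        raise ValueError(f"expected exactly one rpc-endpoint line, found {count}")
    return new_text


node1 = (
    "rpc-endpoint = 127.0.0.1:5090\n"
    "seed-node=seed.steemed.net:2001\n"
    "enable-plugin = tags\n"
)
# Only the rpc-endpoint line changes; seed-node ports are left alone.
print(with_rpc_port(node1, 5091))
```

In practice you would read the node #1 config.ini from disk, call with_rpc_port(text, 5091), and write the result to the node #2 path shown above.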
Downloading a snapshot of the blockchain
@fydel offers up a snapshot of the blockchain to help you get sync'd faster. You'll need to download this and place it in the appropriate folders to get started.
You need to do this for each individual node you are running, in both steem1 and steem2 folders.
Automatically starting steemd on boot
It's important to ensure your node is running 24/7. If you're running Ubuntu, @steemed wrote a guide that walks through configuring steemd to start automatically.
You'll have to create two of these, one for each steem node you're setting up. I'd recommend naming them as follows:
/etc/init/steem1.conf
/etc/init/steem2.conf
Once you have the startup scripts created, running start steem1 and start steem2 should start both of your nodes. If you'd like to monitor the progress of both nodes simultaneously, you can use:
tail -f path/to/steem1/programs/steemd/debug.log -f path/to/steem2/programs/steemd/debug.log
You will see the nodes replaying the blockchain and once they are ready, you will see lines like this appear:
2163510ms th_a application.cpp:439 handle_block ] Got 2 transactions from network on block 3913580
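If you'd rather detect that line from a script than watch the log by eye, a small parser is enough. This is a sketch that assumes the log wording matches the sample line above exactly; the function name parse_block_line is hypothetical.

```python
import re

# Pattern based on the sample debug.log line shown above.
BLOCK_LINE = re.compile(
    r"handle_block\s*\]\s*Got (\d+) transactions from network on block (\d+)"
)


def parse_block_line(line):
    """Return (block_number, transaction_count), or None for other lines."""
    match = BLOCK_LINE.search(line)
    if match is None:
        return None
    return int(match.group(2)), int(match.group(1))


sample = (
    "2163510ms th_a application.cpp:439 handle_block ] "
    "Got 2 transactions from network on block 3913580"
)
print(parse_block_line(sample))  # (3913580, 2)
```

Feeding each new line of debug.log through parse_block_line and watching for the first non-None result is one way to tell that the replay has finished and live blocks are arriving.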
As more scripts for different distros are created, I'll start adding links to them here or in the next iteration of this guide.
Configuring nginx as your load balancer
I won't go into installing nginx, as you should probably have a basic understanding of how to do this yourself. If you're looking for a package, nginx provides a package for most popular distros.
What we will need though is to configure nginx a little bit. First up, the basic nginx configuration:
nginx config:
/etc/nginx/nginx.conf
events {
    worker_connections 768;
}
http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;
    gzip on;
    gzip_disable "msie6";
    limit_req_zone $binary_remote_addr zone=ws:10m rate=1r/s;
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
Most of this should already exist in your nginx configuration, but note the limit_req_zone line towards the bottom. This is a measure to help prevent overloading your node by setting up some request throttling.
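To get a feel for what rate=1r/s means together with the burst=5 value used on the limit_req line in the vhost file, here is a simplified model of the throttling. It is a sketch, not nginx's code: it assumes leaky-bucket accounting in which each accepted request adds one to an "excess" counter that drains at `rate` per second, and a request is rejected once the excess would go above `burst`. The class name LimitReqModel is hypothetical, and the delaying of queued burst requests that nginx performs is not modelled.

```python
class LimitReqModel:
    """Simplified per-client model of rate limiting with a burst allowance."""

    def __init__(self, rate, burst):
        self.rate = rate      # requests per second that drain from the bucket
        self.burst = burst    # how much excess is tolerated before rejecting
        self.excess = 0.0
        self.last = None      # time of the last accepted request, in seconds

    def allow(self, now):
        """Return True if a request arriving at time `now` is accepted."""
        if self.last is None:
            # The first request from a client starts with an empty bucket.
            self.last = now
            return True
        drained = self.rate * (now - self.last)
        excess = max(self.excess - drained + 1.0, 0.0)
        if excess > self.burst:
            return False      # rejected requests do not change the state
        self.excess = excess
        self.last = now
        return True


limiter = LimitReqModel(rate=1, burst=5)
# Seven requests arriving at the same instant from one address.
print([limiter.allow(0.0) for _ in range(7)])
```

Under this model a client can send one request plus five burst requests at once, and anything beyond that is turned away until the bucket drains.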
One more file needs to be added to finish off this configuration, the actual file inside of /etc/nginx/sites-enabled. If this server isn't going to be used for anything else, remove all of the default configurations from that folder and add the following:
nginx vhost config:
/etc/nginx/sites-enabled/default.conf
upstream websockets {
    server 127.0.0.1:5090;
    server 127.0.0.1:5091;
}
server {
    listen 80;
    server_name _;
    root /var/www/html/;
    keepalive_timeout 65;
    keepalive_requests 100000;
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    location ~ ^(/|/ws) {
        limit_req zone=ws burst=5;
        access_log off;
        proxy_pass http://websockets;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_next_upstream error timeout invalid_header http_500;
        proxy_connect_timeout 2;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
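Before reloading nginx, it can help to confirm that both steemd instances are actually listening on the ports named in the upstream block. The following is a minimal sketch of such a check; the function names are hypothetical, and the 2 second timeout simply mirrors proxy_connect_timeout. A successful TCP connect only shows the port is open, not that the node has finished syncing.

```python
import socket

# The two rpc-endpoint addresses from the steemd configs above.
UPSTREAMS = [("127.0.0.1", 5090), ("127.0.0.1", 5091)]


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def upstream_status(upstreams=UPSTREAMS):
    """Map each "host:port" to whether something is accepting connections."""
    return {f"{host}:{port}": is_listening(host, port) for host, port in upstreams}


print(upstream_status())
```

If one entry reports False, nginx should still be able to serve requests from the other node, which is the failover behaviour this setup is aiming for.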
Reload nginx with service nginx reload and your server should now be responding on port 80 with the steemd nodes you just created. If you load it with a web browser, or just curl http://localhost, it should return:
11 eof_exception: End Of File
stringstream
{}
th_a sstream.cpp:109 peek
{"str":""}
th_a json.cpp:478 from_string
(which is totally ok, as your web browser isn't issuing the proper request)
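For comparison, here is a sketch of what a proper request might look like. It assumes the node accepts JSON-RPC over plain HTTP POST in addition to websockets, and that the "call" form with an API name, method name, and parameter list is accepted; both are assumptions you should verify against your own node. The helper names build_request and call_node are hypothetical, and get_dynamic_global_properties is used only as an example method on database_api.

```python
import json
import urllib.request


def build_request(api, method, params=(), request_id=1):
    """Serialize a JSON-RPC request body in the assumed "call" form."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "call",
        "params": [api, method, list(params)],
    })


def call_node(url, api, method, params=()):
    """POST the request to the node and return the decoded JSON response."""
    body = build_request(api, method, params).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Show the request body without touching the network.
print(build_request("database_api", "get_dynamic_global_properties"))
```

With the node running, call_node("http://localhost", "database_api", "get_dynamic_global_properties") would send that body through nginx to one of the two steemd instances.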
Congrats!
You're now running a fully featured web node that will return content, following information, tags, account history and plenty of other information! Go forth, build awesome things, and help make the steem community even greater! :)
Something missing?
If you know of something that should be included in this guide, please let me know. I'm looking to help build a comprehensive guide so others can start hosting their own web APIs.
This is awesome!
Some tips:
Thank you so much, that's awesome.
I'll get these included in the appropriate sections and then get my servers modified for the same.
'Build twice' sounds really funny :) Apparently, you don't really like the --data-dir option
I didn't realize that was a thing! Now I wish I could go back and edit this post lol.
That's a great point lol
Thanks, @jesta! This will be helpful to all devs! Quick question - why clone the repo twice and build separately? Can we instead just build once and duplicate the whole folder, all built? This will save time on building.
You absolutely could (in fact I did that as well). I just wanted to outline a bulletproof way to make it work. I've had issues with the blockchain getting corrupted when I cp -r the whole folder to another location, and didn't want others to encounter the same problem :)
Haha, maybe the config and make puts the absolute path into the artifacts? Anyway, love nginx and steemstats!! Thanks for all the hard work
What about taking a snapshot of the final VPS, then spawning a few instances of it and load balancing those? Then there'll be multiple nginx nodes, and the setup will have even higher availability!! =D This way nginx itself is no longer a single point of failure.
I was actually doing this while I was mining a few weeks ago, it worked pretty well. And yes, this would increase availability even further, though it would get pricey pretty fast!
You also have to consider that if you use multiple nginx nodes and proxy them to different servers, all of the bandwidth will be multiplied by the proxy. Not a problem though if you're using internal networking, since most providers don't charge for it.
This is what we developers are looking for. Many thanks to @jesta! It deserves a thousand upvotes.
This is amazing and something I've been wanting to explore further. It should be listed as official documentation to encourage people to set up their own APIs. I'm spreading the word!
Great walk-through! I must say that I'm impressed by Steemit so far, but some of the documentation is a bit lacking. These kinds of posts really help.
It is, I've searched high and low for some things and haven't found a lot. But, that's the kind of stuff I thrive on: deconstructing something and learning how it works.
Hopefully what I've learned will help others get involved!
this is awesome. are you in the bay area? if so, would love for you to join our meetup in a few weeks. we're investors in nginx as well :)
https://steemit.com/steem/@ntomaino/silicon-valley-steem-meetup
I'm not unfortunately, I'm in the LA area. I'd love to join but that would require either a long, long drive or a flight :)
Thank you for posting this. I don't have the requisite skills to use it but I'm sure I will be benefitting from things that people make using your post:)
I usually don't upvote like this, but I use your tools all the time.
They are extremely helpful - Without them, I couldn't even follow people.
Here is my blind upvote - I just assume that this post is good.
Hahaha, hey... at least you're being honest :)
This post is primarily to get the instructions out there, so I can reference how my nodes are setup to other people getting involved. Many people use this.piston.rocks and steem.steemstats.com to power their own websites - and we'd like to see others creating nodes as well!
Why clone and build twice? What I do is
This should save some build time.
I ran into issues with blockchain corruption and the dreaded:
Starting chain with 0 blocks...
I seem to get this error every time I cp a witness_node_data_dir around. I didn't know how common that would be, so to avoid people saying "hey this broke" I decided to go the long route and just have people create it twice.
But it's totally possible to do it this way, and it would save time. If you run into errors like I did, just start over and compile from scratch ;)
Yeah, that makes sense. Thanks.