[steemtools] automatic failover for witness nodes

in #witness-category, 8 years ago (edited)

I was told "code or it didn't happen" when my node gracefully failed over about an hour ago, so here it is. I'm a bit tired, so I apologize if there are gaps in the documentation.

https://github.com/aaroncox/witness-failover

This script is something I've been tinkering with over the past few weeks as part of my infrastructure as a witness. If you're planning on using this, I'm willing to answer questions, but I'm not willing to set this up for you. You need a basic understanding of infrastructure, Docker, and witness management to really be effective with this code. I also make no claims that this will work in the future or on your system, so please use it responsibly!

Either that or just use it to learn from - and build your own tools based on its principles.

witness node configuration

This setup requires three systems, as described below:

  • Witness Node #1, with Signing Private Key #1
  • Witness Node #2, with Signing Private Key #2
  • This script, running someplace besides the witness nodes, potentially a workstation.

In this scenario, we're going to assume your account currently has Signing Key #1 active.

To enable the failover trap,

  • Edit the .env file and fill out your witness account, the active WIF private key*, and all of your preferred witness properties.
  • Put the public signing key of Witness Node #2 into the configuration as steem_backup.
  • Check how many total_missed blocks you currently have, add 1 or 2 to that number, and put the result in the .env file as threshold.
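As a rough illustration of the configuration steps above, here's a minimal Python sketch that reads the settings from the environment and works out a threshold. Only steem_backup and threshold are names from this post; steem_account and steem_wif are hypothetical placeholders, so check the repository's .env file for the real variable names.

```python
import os


def load_settings(env=os.environ):
    """Read failover settings from a mapping of environment variables.

    Only `steem_backup` and `threshold` are names used in the post;
    `steem_account` and `steem_wif` are hypothetical placeholders.
    """
    settings = {
        "account": env["steem_account"],      # hypothetical variable name
        "wif": env["steem_wif"],              # hypothetical variable name
        "backup_key": env["steem_backup"],    # public signing key of node #2
        "threshold": int(env["threshold"]),   # total_missed value that fires the trap
    }
    # Steem public keys start with "STM"; a private key here would be a mistake.
    if not settings["backup_key"].startswith("STM"):
        raise ValueError("steem_backup should be a public key (STM...)")
    return settings


def suggest_threshold(current_total_missed, margin=1):
    """Return the current total_missed count plus a small margin (1 or 2)."""
    return current_total_missed + margin
```

For example, a witness that has missed 12 blocks so far and wants to tolerate one more miss before failing over would set threshold to suggest_threshold(12, 2), which is 14.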

*Note: Yes, this requires your private active key, which means you need to ensure this system is secure. This is a valid reason to use a second account for witnessing, or to make sure all of your liquid funds are locked in savings accounts. (Originally I had linked from here to a post I made about feed_publish operations and permissions. I was tired, that had nothing to do with this)

piston + steemtools + twilio

Once configured and running, this script performs the following actions every interval:

  • Every 30 seconds, checks the defined witness to see how many total_missed blocks have been reported.
  • If total_missed >= threshold (threshold is a number you set), the script triggers.
  • Once triggered, it uses @furion's steemtools, built on @xeroc's piston, to issue a witness_update transaction to change your signing_key to a different key.
  • It then also uses Twilio (a paid SMS service) to send out an SMS message letting you know the trap has triggered. If you don't want to set up a Twilio account, you can edit the code and remove the requirement. I'm very happily paying $0.0075 USD for it to notify me once in a blue moon of issues occurring :)
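The steps above boil down to a small polling loop. Here's a minimal sketch of that logic, not the actual code from the repository. The three callables (get_total_missed, fail_over, notify) are hypothetical stand-ins for the witness lookup, the witness_update broadcast, and the SMS notification, so no piston, steemtools, or twilio calls are assumed here.

```python
import time


def run_trap(get_total_missed, fail_over, notify, threshold,
             interval=30, sleep=time.sleep):
    """Poll until total_missed reaches the threshold, then fire once and return.

    All three callables are hypothetical placeholders supplied by the caller:
      get_total_missed() -> int   current total_missed for the witness
      fail_over()                 broadcasts the signing key change
      notify(message)             sends the alert (SMS in the post)
    """
    while True:
        missed = get_total_missed()
        if missed >= threshold:
            fail_over()
            notify("Witness failover triggered at %d missed blocks" % missed)
            return missed  # one-shot: the trap does not re-arm itself
        sleep(interval)
```

Passing the actions in as functions keeps the trap easy to dry-run with fake readings before pointing it at a real account.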

With the witness_update command triggered, it will broadcast a new signing key to your account (that of Witness Node #2 in this example), which will automatically fail your witness server over to the backup.
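To make that concrete, here's a sketch of what the witness_update operation carries, based on my understanding of the Steem operation's fields (owner, url, block_signing_key, props, fee). The helper name build_witness_update and all the example values are made up for illustration, and this only builds the payload. Signing and broadcasting are left to whatever library you use.

```python
def build_witness_update(owner, url, backup_signing_key, props, fee="0.000 STEEM"):
    """Build the payload of a witness_update operation as a plain dict.

    Hypothetical helper for illustration; it does not sign or broadcast.
    The failover itself is just the change of block_signing_key.
    """
    return {
        "owner": owner,                           # witness account name
        "url": url,                               # witness announcement URL
        "block_signing_key": backup_signing_key,  # public key of Witness Node #2
        "props": dict(props),                     # your preferred witness properties
        "fee": fee,
    }


# Example with placeholder values only.
op = build_witness_update(
    owner="examplewitness",
    url="https://example.com/witness-post",
    backup_signing_key="STM7examplebackupkey",
    props={
        "account_creation_fee": "3.000 STEEM",
        "maximum_block_size": 65536,
        "sbd_interest_rate": 0,
    },
)
```

Everything except block_signing_key should match what your witness already publishes, which is why the .env file asks for all of your witness properties.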

It's a one-shot failover script that terminates itself

Think of it like a mouse trap - you start running it and once it's triggered, you're going to have to go in and reset the variables and start it again. This is not meant to be completely automated, though it could be taken that far over time. I haven't spent an incredible amount of time on that direction, as this works for now.

I'm releasing this code with The Unlicense, so you're free to do whatever you want with this. I also make no claims as to this actually working, as there are many dependencies that could break along the way. If you're embarking on the "Adventure of Automatic Failover" for witnessing, please make sure you know what you're doing :)

Resteemed..... And tweeted!

Why not just put it in a loop to make it "reset"?

You could, but who's to say that the other server would be online again if the backup went down? It might just ping-pong between two dead servers and spam witness_update operations until you pay attention to it.

It could definitely be taken that direction though with a bit more effort. Effort that I haven't yet put in :)
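If someone did want to take it that direction, the ping-pong problem could be limited with a guard along these lines. This is only a sketch of the idea and not part of the script. The function name, the cap, and the target_is_healthy flag are all hypothetical, and how you'd actually determine that the other node is healthy is the hard part.

```python
def next_action(failovers_so_far, max_failovers, target_is_healthy):
    """Decide whether a self-resetting trap may fail over again.

    Hypothetical guard: returns "failover" or "stop".
    """
    if failovers_so_far >= max_failovers:
        return "stop"      # cap reached: stop spamming witness_update, wait for a human
    if not target_is_healthy:
        return "stop"      # never switch to a node that may also be dead
    return "failover"
```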

A better plan would be to have a script run on the witness server that actually checks for steemd running - or just open a port in the firewall and have the failover script check that it can connect.

Steemd could be running, but not ready to accept connections / witness blocks. So you'd get false positives about it being "active" if you were just checking for a running process. The node takes a good 10 minutes to spin up and during that time, it's going to appear to be running.

On the other hand, with the open port... If you wanted to check connections to steemd, you also would have to enable one of the RPC endpoints or plugins for that to work. You probably don't want to use any of those settings for a witness node. I'd also rather not open any ports on the firewall or connect to it at all.
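For reference, the kind of port check being suggested would look roughly like the sketch below, using only the Python standard library. The host and port are whatever you'd choose to expose. Note that a successful connection only shows that something is listening, not that the node is synced and ready to sign blocks.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or unresolvable: treat as down.
        return False
```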

IMHO this is a better solution than those options. I had a node completely melt down and I missed 2 blocks in 2 minutes, then it failed over to the running node.

To each his own though!

Thank you for sharing this! I'm sure this will help a lot of witnesses sleep well. :)

If I ever reach the point where I'll need a backup witness node, I'll definitely use this script! Thanks, @jesta

The Unlicense! I am so pleased to see another user using this licence. Creative work is a highly speculative endeavour, and copyright does not in fact help any up-and-coming artist, because they don't have the money to pay lawyers. Even if they did, it is questionable what benefit this has to the economy as a whole.

So the whole concept, in my opinion, should be scrapped. False attribution is the only crime that connects with this, it is clearly fraud.

The whole concept of licensing is what you're talking about right?

I got a bit lost in your train of thought :)

It is predicated on a claim of exclusive distribution rights for something that costs practically nothing to reproduce, in a market where subjectivity dictates price. Art can be expensive to produce but worth nothing because nobody wants it.

And more importantly, it does not benefit new artists, who by definition are generally poor. Ownership of copyright is not bound to the creator, and this overly empowers the owners to suppress competition and to decide which artists get access to their distribution system, which allows censorship.

Thus, copyright is not for artists at all, but for curators. The internet renders the need for these distribution controllers obsolete. Artists do not depend on these businesses anymore, and the privilege has been blatantly abused for political ends. In fact, this is why citation is such an integral component of Steem's architecture, and why the greatest number of changes have been made to counter methods of exploiting it.