SBDS to MySQL in Docker
Steem Blockchain Data Service in MySQL on Docker
What Will I Learn?
We'll walk through the steps to create a script that deploys a cloud server at DigitalOcean with flexible block storage, running the Steem Blockchain Data Service (SBDS) and MySQL in docker.
- Deploy a Storage Volume to hold the data
- Deploy a CentOS server to run SBDS and MySQL in Docker
Requirements
You will need an account at DigitalOcean to deploy a cloud server and related storage. It is also presumed that you are comfortable in the shell, know how to connect to the server via SSH, can perform basic systems administration on Linux, and have some experience with docker.
Difficulty
While I have simplified nearly everything herein, this is all still pretty complex. Even so, you can literally copy/paste this script and be up and running in minutes.
- Intermediate
Tutorial Contents
Here's everything we'll cover in this tutorial.
- Deploy SBDS
- Overview
- Initial Setup
- Prepare Script
- Create Droplet and Volume
- Confirm Script Execution
- How this script works
- Initial config
- Mount volume
- Yum install
- Docker install and config
- Start MySQL
- MySQL Config
- Git clone SBDS
- Start SBDS
- Additional commands
- Conclusion
- The Script
Deploy SBDS
Overview
At the time of this writing, the size of the MySQL database of the entire blockchain is 417GB, and growing. To handle this growth, we rely on Block Storage at DigitalOcean to deploy flexible storage at any size for $0.10/GB per month and increase that size at any time later as the blockchain grows.
We'll walk through the steps to deploy a cloud server at DigitalOcean using Block Storage running the Steem Blockchain Data Service and MySQL in docker.
Initial Setup
We need to ensure we deploy our droplet in a region that supports block storage. The supported regions for volumes are:
- NYC1
- NYC3
- SFO2
- FRA1
- SGP1
- TOR1
- BLR1
- LON1
- AMS3
To run SBDS reliably you will need at least 4GB of memory, and unlike steemd nodes, SBDS is multithreaded, so it will use ALL the CPU you throw at it.
I'd give an estimate of the time it takes to catch up to the current block, but in my testing it varies a great deal depending on many factors, including the node we connect to for grabbing blocks. In general it always takes many hours, and it may easily take a few days to complete.
Prepare Script
Start by opening your favorite text editor, such as Sublime Text or just Notepad, and grab The Script from the end of this post.
The first modification is to change the mysql_password in your copy of the script. Make that change and save your copy of the script. We'll modify this a bit more shortly.
mysql_password="mystrongcomplexpassword"
Create Droplet and Volume
We start by clicking the Create button and selecting Droplets.
Then, we select CentOS and 7.4 x64 from the drop-down:
As we look to Choose a size, you'll notice there are Standard Droplets and Optimized Droplets available. We can expand our instance later, but only within the same droplet type: you cannot deploy a Standard droplet and later convert it to an Optimized one, or vice versa.
Next, click Add Volume to add block storage.
Given the current size of the Steem blockchain, I recommend using at least 500GB.
Next, we'll pick a region.
As you'll see, only the supported regions are selectable with all others grayed out.
The volume name handling here presumes a lot, and those presumptions may be wrong for your account. By default, DigitalOcean names a volume volume-{region}-{volume_index}, where the index is essentially the count of volumes you have in that region. Our approach works in most scenarios, but if you have issues mounting your volume, just look at the actual name displayed in DigitalOcean once it is created.
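Once the volume is attached, you can also confirm the exact name from the device path on the droplet itself. This is a minimal sketch; the extract_volume_name helper is hypothetical, just for illustration:

```shell
# Given a device id like "scsi-0DO_Volume_volume-tor1-01", strip the prefix
# to recover the exact volume name DigitalOcean assigned.
extract_volume_name() {
  echo "$1" | sed 's/^scsi-0DO_Volume_//'
}

# On the droplet you would list the real devices like this:
#   ls /dev/disk/by-id/ | grep '^scsi-0DO_Volume_'
extract_volume_name "scsi-0DO_Volume_volume-tor1-01"   # prints volume-tor1-01
```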
That said, we look to the following line in our script setting the volume_name variable:
volume_name="volume-tor1-01"
There are two parts to this name: the region (tor1 here) and the index of the volume (01 in this case) for that region.
If you do not have any volumes in the region you are using, this should deploy with the 01 index. If you have existing volumes in that region, it will increment to the next number. Update the volume_name accordingly.
Next, we check the User data box and paste in our modified script:
Now, we can scroll down and click the Create button.
The page will refresh and bring us back to the list of droplets, and you'll see this one building. Once it's complete, the IP address will appear and you can click the Copy link to grab it and paste it into your SSH client, such as PuTTY.
Confirm Script Execution
The deploy will take a few minutes and you should check for the install email from DigitalOcean to get the password. Once you are logged in, check the script execution status by tailing the log we created.
tail -fn5 /var/log/sbds_install.log
Somewhere between a few hours and a few days, you'll finally see:
END install
Just sit back patiently until everything fully syncs up.
How this script works
There are a lot of moving parts in this script so we'll walk through each step, along with a few tweaks along the way that you can make to fit your specific environment better.
Initial config
Everything we're doing here takes a LONG time, like up to days sometimes, so we want some way to keep track of our progress. The opening line of this script starts the logging to /var/log/sbds_install.log and we can follow along there to check in on the script. You'll see more of this logging in the script itself, but I'll exclude it here.
echo "$(date) set variables" >> /var/log/sbds_install.log
Then, we just set a few basic variables
mysql_password="mystrongcomplexpassword"
volume_name="volume-tor1-01"
Mount volume
Next, we need to format and mount the block storage. This entire command is the same that DigitalOcean provides in their Config instructions for the volume, with some variable replacement for our full path.
sudo mkfs.ext4 -F /dev/disk/by-id/scsi-0DO_Volume_$volume_name
sudo mkdir -p /mnt/$volume_name;
sudo mount -o discard,defaults /dev/disk/by-id/scsi-0DO_Volume_$volume_name /mnt/$volume_name;
echo /dev/disk/by-id/scsi-0DO_Volume_$volume_name /mnt/$volume_name ext4 defaults,nofail,discard 0 0 | sudo tee -a /etc/fstab
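To double-check what the mount step did, it can help to reconstruct the fstab entry from the variable. The fstab_line helper below is hypothetical, just a sketch of the line the script appends:

```shell
volume_name="volume-tor1-01"

# Rebuild the exact /etc/fstab line the script appends for a given volume.
fstab_line() {
  echo "/dev/disk/by-id/scsi-0DO_Volume_$1 /mnt/$1 ext4 defaults,nofail,discard 0 0"
}

fstab_line "$volume_name"

# On the droplet itself, verify the mount and the fstab entry with:
#   df -h "/mnt/$volume_name"
#   grep "$volume_name" /etc/fstab
```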
Yum install
Then, we use yum to install a few things.
yum -y install docker git wget
Docker install and config
Next, we need to start docker. This is where it gets a little tricky.
We need to update the docker configuration to use the block storage we've mounted rather than the usual /var/lib/docker path. This is set in /etc/docker/daemon.json.
In order to automate the update of this path, we need to do a little magic to the string. I'm not going to walk through exactly what's happening here because you might start hyperventilating (I did a little).
This first line rewrites the string containing the volume name slightly so our replacement using sed on the next line works properly.
docker_volume_name=`echo $volume_name | sed -e "s/-/\\\\\-/g"`
sed -i -e "s/{}/{\"graph\": \"\/mnt\/$docker_volume_name\"}/g" /etc/docker/daemon.json
In short, it makes /etc/docker/daemon.json look like this.
# cat /etc/docker/daemon.json
{"graph": "/mnt/volume-tor1-01"}
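If the sed escaping feels fragile, an alternative sketch (my suggestion, not what the script does) builds the JSON with printf, where hyphens in the volume name need no special handling:

```shell
volume_name="volume-tor1-01"

# Build the daemon.json content without any escaping gymnastics.
daemon_json() {
  printf '{"graph": "/mnt/%s"}' "$1"
}

daemon_json "$volume_name"   # prints {"graph": "/mnt/volume-tor1-01"}

# As root on the droplet you would then write it into place:
#   daemon_json "$volume_name" > /etc/docker/daemon.json
```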
Then we start docker and enable it to start automatically.
systemctl start docker
systemctl enable docker
Start MySQL
Now that docker is running, we can start mysql. Here's where we can start to make changes specific to our environment.
If you are running just a 4GB environment, then the standard start should work fine.
docker run -d --name steem_mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=$mysql_password -e MYSQL_DATABASE=steem mysql
This runs mysql in a detached container named steem_mysql, using the password we specified in the variables at the start.
MySQL Config
Here's another way to start mysql while also setting some configuration variables to optimize for our environment.
docker run -d --name steem_mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=$mysql_password -e MYSQL_DATABASE=steem mysql --innodb_buffer_pool_size=12G --innodb_log_buffer_size=1G --innodb_log_file_size=4G
Now, this isn't exactly a mysql performance tuning tutorial, so I'm giving the high-level points here.
We want to set the innodb_buffer_pool_size to roughly 80% of memory. The data and indexes are cached in the buffer pool, so setting this high allows us to use as much of the available memory as possible.
--innodb_buffer_pool_size=12G
The innodb_log_buffer_size is the buffer for uncommitted transactions; if set too low, you'll often see very high I/O.
--innodb_log_buffer_size=1G
The log file size is a trade off between performance (set higher) and recovery time (set lower).
--innodb_log_file_size=4G
This is barely scratching the surface on these configurations, or the many others not mentioned here. Obviously, figure out what fits your environment best.
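As a rough starting point for the 80% figure above, you can compute a whole-GB innodb_buffer_pool_size from the droplet's total memory. This helper is just a sketch, a rule of thumb rather than a tuning recommendation:

```shell
# Convert MemTotal (in kB, as /proc/meminfo reports it) into a whole-GB
# buffer pool size at roughly 80% of memory.
buffer_pool_gb() {
  echo $(( $1 * 8 / 10 / 1048576 ))
}

# On a live server:
#   buffer_pool_gb "$(awk '/MemTotal/ {print $2}' /proc/meminfo)"
buffer_pool_gb 16777216   # a 16GB droplet -> prints 12
```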
Git clone SBDS
Now that mysql is running, we want to download SBDS. We first cd to the block storage, as we'll want to store this there as well.
cd /mnt/$volume_name
git clone https://github.com/steemit/sbds.git
cd /mnt/$volume_name/sbds
git fetch origin pull/81/head
git checkout -b fixes-for-tables FETCH_HEAD
After we clone, we check out a pull request that addresses some real issues with SBDS that have not yet made it into the master branch. Basically, a few fields are set to TEXT by default but actually need MEDIUMTEXT, and this fixes all that for us. I have tested this extensively, and it clearly resolved the errors I was seeing before running this branch.
Start SBDS
After we get the code pulled down, we need to run it.
Because MySQL is running in docker, we need to look at the docker networking to figure out how to connect to MySQL. This command calls docker inspect to get the IP we can connect on and sets it to the mysql_ip variable.
mysql_ip=`docker inspect --format "{{ .NetworkSettings.IPAddress }}" steem_mysql`
Next, we update the SBDS config to use our MySQL database, rather than the default.
sed -i -e "s/sqlite\:\/\/\/\/tmp\/sqlite\.db/mysql\:\/\/root\:$mysql_password\@$mysql_ip:3306\/steem/g" /mnt/$volume_name/sbds/Dockerfile
The public node specified in the config was retired on January 6, 2018 so this just updates that to a different server.
sed -i -e "s/steemd\.steemitdev\.com/api\.steemit\.com/g" /mnt/$volume_name/sbds/Dockerfile
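Because those sed expressions are heavily escaped, you can sanity-check the database-URL substitution against a sample string before touching the real Dockerfile. The sample DATABASE_URL line here is illustrative, not copied from the actual Dockerfile:

```shell
mysql_password="mystrongcomplexpassword"
mysql_ip="172.17.0.2"

# Apply the same substitution the script runs on the Dockerfile, but to a
# plain string, so you can eyeball the resulting connection URL.
rewrite_db_url() {
  echo "$1" | sed -e "s/sqlite\:\/\/\/\/tmp\/sqlite\.db/mysql\:\/\/root\:$mysql_password\@$mysql_ip:3306\/steem/g"
}

rewrite_db_url "DATABASE_URL sqlite:////tmp/sqlite.db"
# prints DATABASE_URL mysql://root:mystrongcomplexpassword@172.17.0.2:3306/steem
```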
Then, we just move to the SBDS directory, build, and run.
cd /mnt/$volume_name/sbds
docker build -t sbds .
docker run --name steem_sbds -p 8080:8080 -p 9191:9191 --link steem_mysql:mysql sbds
Additional commands
Once everything is up and running, here are a few commands you might find useful.
First of all, if you are connecting remotely, you can just use the public IP of this server and the standard port 3306. However, if you want to connect locally, we need to look to the docker networking config as we touched on earlier.
If we run this command, it will return the IP that we can then use to connect with mysql:
# docker inspect --format "{{ .NetworkSettings.IPAddress }}" steem_mysql
172.17.0.2
# mysql -h 172.17.0.2 -u root -p
Enter password:
You can check to make sure you're not behind by getting your last block from MySQL.
SELECT MAX(block_num) AS lastblock FROM sbds_core_blocks;
Then see how close you are to the head_block_number here: https://api.steemjs.com/getDynamicGlobalProperties
I pretty reliably stay about 20 blocks behind, which is exactly 1 minute and more than reasonable.
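To script that check rather than eyeball it, you can extract head_block_number from the API response and subtract. This is a hypothetical sketch; head_block is just a grep-based helper, and jq would be cleaner if you have it installed:

```shell
# Pull head_block_number out of a getDynamicGlobalProperties JSON response.
head_block() {
  echo "$1" | grep -o '"head_block_number":[0-9]*' | grep -o '[0-9]\+$'
}

# On a live server (assumes curl, plus the mysql client from the script):
#   head=$(head_block "$(curl -s https://api.steemjs.com/getDynamicGlobalProperties)")
#   last=$(mysql -h 172.17.0.2 -u root -p"$mysql_password" -N -e \
#     "SELECT MAX(block_num) FROM steem.sbds_core_blocks")
#   echo "behind by $(( head - last )) blocks"
head_block '{"head_block_number":20410475,"time":"2018-04-06T00:00:00"}'   # prints 20410475
```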
You can see the active docker containers as follows.
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5d213dddabc8 sbds "/sbin/my_init" 2 weeks ago Up 2 days 0.0.0.0:8080->8080/tcp, 0.0.0.0:9191->9191/tcp steem_sbds
da2401c28bad mysql "docker-entrypoint.sh" 3 weeks ago Up 4 days 0.0.0.0:3306->3306/tcp steem_mysql
If you run this but one or both containers are missing, try this.
docker ps -a
This will show all containers, whether running or not.
If you find that you've fallen behind on blocks, you can stop SBDS and restart to try to resolve.
docker stop steem_sbds
docker start steem_sbds
You can stop and start MySQL the same way, though it has run very reliably for me, so I presume you'll have few issues. If you do decide to stop MySQL, stop SBDS first; otherwise it will keep trying to write to the database.
If you have any issues at all, you can take a look at the docker logs.
You can actively follow the logs like this.
docker logs --tail=1 --follow steem_sbds
To dump all of the logs to a file so you can parse through, use this command.
docker logs steem_sbds > ./sbds.log 2>&1
Conclusion
I've had a lot of fun learning about SBDS and running into countless issues. I plan to continue documenting everything I find and sharing with the community to support anyone who wants to have a local copy of the blockchain in MySQL.
Please comment or catch me on steemit.chat with your experiences or issues you run into!
The Script
#!/bin/bash
###############
## VARIABLES ##
###############
echo "$(date) set variables" >> /var/log/sbds_install.log
mysql_password="mystrongcomplexpassword"
volume_name="volume-tor1-01"
##################
## MOUNT VOLUME ##
##################
echo "$(date) mount volume" >> /var/log/sbds_install.log
sudo mkfs.ext4 -F /dev/disk/by-id/scsi-0DO_Volume_$volume_name
sudo mkdir -p /mnt/$volume_name;
sudo mount -o discard,defaults /dev/disk/by-id/scsi-0DO_Volume_$volume_name /mnt/$volume_name;
echo /dev/disk/by-id/scsi-0DO_Volume_$volume_name /mnt/$volume_name ext4 defaults,nofail,discard 0 0 | sudo tee -a /etc/fstab
#################
## YUM INSTALL ##
#################
echo "$(date) yum install" >> /var/log/sbds_install.log
yum -y install docker git wget
##################
## START DOCKER ##
##################
echo "$(date) start docker" >> /var/log/sbds_install.log
docker_volume_name=`echo $volume_name | sed -e "s/-/\\\\\-/g"`
sed -i -e "s/{}/{\"graph\": \"\/mnt\/$docker_volume_name\"}/g" /etc/docker/daemon.json
systemctl start docker
systemctl enable docker
####################################
## INSTALL MYSQL CLIENT UTILITIES ##
####################################
echo "$(date) mysql utilities" >> /var/log/sbds_install.log
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum install -y mysql-community-client
#################
## START MYSQL ##
#################
echo "$(date) start mysql" >> /var/log/sbds_install.log
docker run -d --name steem_mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=$mysql_password -e MYSQL_DATABASE=steem mysql
###############
## GIT CLONE ##
###############
echo "$(date) git clone" >> /var/log/sbds_install.log
cd /mnt/$volume_name
git clone https://github.com/steemit/sbds.git
cd /mnt/$volume_name/sbds
git fetch origin pull/81/head
git checkout -b fixes-for-tables FETCH_HEAD
################
## START SBDS ##
################
echo "$(date) build/run sbds" >> /var/log/sbds_install.log
mysql_ip=`docker inspect --format "{{ .NetworkSettings.IPAddress }}" steem_mysql`
sed -i -e "s/sqlite\:\/\/\/\/tmp\/sqlite\.db/mysql\:\/\/root\:$mysql_password\@$mysql_ip:3306\/steem/g" /mnt/$volume_name/sbds/Dockerfile
sed -i -e "s/steemd\.steemitdev\.com/api\.steemit\.com/g" /mnt/$volume_name/sbds/Dockerfile
cd /mnt/$volume_name/sbds
docker build -t sbds .
docker run --name steem_sbds -p 8080:8080 -p 9191:9191 --link steem_mysql:mysql sbds
echo "$(date) END install" >> /var/log/sbds_install.log
Posted on Utopian.io - Rewarding Open Source Contributors
Thanks again for putting together such helpful information. This is the way I think people should make tutorials. This is even better than your previous post and that one was very helpful.
I had a question about your mention that the size of the blockchain was currently 417GB in mysql. Do you get that from the amount of space used on /mnt/, from the size of /_data/, or from a mysql query? My /_data/ directory is currently using 388GB, and I want to make sure that some of the issues I had early on didn't result in missing information.
Our discussion in my earlier post formed much of this post so thanks again for sharing everything!
So I'm one folder deeper, in _data/steem/, running this command specifically.
I'm also surprised I'm at 419GB (already up 2GB!) and you're at 388GB, so here are the exact tables that are at least a GB in size.
I don't have a good explanation for why we'd see different sizes, so I'm pretty curious about this.
I just created a new digitalocean droplet (the 16GB droplet) to try and see what the database size should be, and I'm just following the script you show in this post. Hopefully the database size will end up being the same as yours (currently around 419G) or mine (currently around 370G) and not be some new number. Right now the mysql database size per the size of _data/steem on the new droplet is at 5G, and I plan to periodically check on the progress of the database size over the next few days.
One thing I noticed is that "mysql -h 172.17.0.2 -u root -p" gives the error "-bash: mysql: command not found" and I wanted to bring that to your attention because I think it means you'll want to adjust the script you show in your post.
One other comment I wanted to make regarding tests with droplets and volumes: the storage volumes created along with a droplet aren't actually deleted when the droplet is destroyed; destroying a volume is a separate step. I actually had three $50/month volumes that I wasn't using but was still being charged for, which I incorrectly assumed were deleted when I deleted the corresponding droplet. Fortunately, I noticed them and deleted them after only a few days.
I'll be interested to see where the new droplet comes out, like you said, hopefully not a third new size.
And good catch! I've added INSTALL MYSQL CLIENT UTILITIES to the script here now so that'll be installed as well.
Sorry about the extra volumes; I'm glad you caught that early before the cost added up too much. DigitalOcean gets expensive quickly when you get to the larger sizes.
Both my servers are caught up to block 20410475. The _data/steem on the most recent server shows as 437G and the _data/steem on my other server (the one I suspect to be wrong) shows as 377G. It would be nice if your server was at 437G around block 20410475 too, but I'm thinking you probably aren't, because you were at 419G four days ago.
So, I was running some queries, joining between sbds_tx_comments and sbds_tx_votes, and found that there was no index on permlink in the sbds_tx_votes table, so my join was pretty slow.
First of all, adding an index to a 46GB table is basically impossible, so I just created a new duplicate table with no data and added an index on that field, then inserted a few thousand records to compare the performance against the primary table. The difference was significant.
So, now I'm carefully re-inserting the entire sbds_tx_votes into my new sbds_tx_votes_copy table with the additional index.
I stopped SBDS to avoid new writes while this is running, and I have a new table now too! So, it'll probably be a day or two before I'm back up to the head block again, and my numbers might be a little different now with the index, but I'll post an update when that's back up.
And I think I might throw in an ALTER TABLE sbds_tx_votes ADD INDEX... in my script so this is implemented at the start.
Back up to sync, so I wanted to get some updated totals for you.
So, I'm a little confused now. My total size is now lower than yours from two days ago!
Here are the biggest tables. I thought adding an index to sbds_tx_votes would have more of an impact on size, but when I still had both tables they were very close, so I'm not sure if that has any impact here.
Yes, I'm confused by things too. Here is my current data for my most recent server, around block number 20475678. I was very careful to follow the script with this server.
Note: I tried comparing this server with the other server I have running sbds to see if I could identify why one was bigger than the other, and I notice that the data contained in the 'raw' field in the 'sbds_core_blocks' table doesn't seem to match on both servers. For example, the 'raw' field for 'block_num' 20470001 on one server is different than the same raw field on the other server.
Speaking of expensive, this hit my bank account yesterday:
I was spinning up massive instances and playing around this month, obviously far more than I thought. This definitely was not in the budget so looks like this will be an interesting month.
Hopefully my mistake is a lesson for others to be aware of their usage and avoid surprises like this.
Wow that is a lot of money to spend playing around. Hopefully it will be well worth it for you in the long run as this opens up so many possibilities. Thanks for the reminder about how costs can quickly add up because I'm spending at a $260/month pace, but I should be able to cut that back after my latest droplet test gives us another number for the total database size. So far after 18 hours the new droplet is at 159G so if everything continues to go well, then we may have another number for total steem mysql database size by tomorrow night.
The difference is actually larger, since that 388GB is now 370G. The issue is most likely on my end, but I don't think I did anything that should cause data to be missing, other than having steem_mysql shut down on its own a bunch of times.
Are your tests still staying current with the head_block_number? My new test that was at 169G about 18 hours ago stopped growing at 177G and I can't get it to pick back up again. Also, my other droplet that was typically within 20 blocks of the current block is now 10,000+ blocks behind. Are you experiencing similar issues?
Generally, I stay within about 20 blocks, but sometimes I'll see exactly what you're describing.
I just stop SBDS, restart, and watch the logs:
It'll sit on 'finding' for quite a while, but soon enough you'll see it writing new blocks.
I'm trying to write a monitoring script that can be called by cron to do all of this automatically.
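As a starting point for that monitoring script, here's a minimal sketch I'm considering; the is_running helper and log path are hypothetical, and it assumes the container names from this tutorial:

```shell
# True if container name $2 appears in the running-container list $1
# (one name per line, as "docker ps --format '{{.Names}}'" outputs it).
is_running() {
  echo "$1" | grep -q "^$2\$"
}

# From cron, e.g. */5 * * * * /root/sbds_monitor.sh, you would run:
#   running="$(docker ps --format '{{.Names}}')"
#   is_running "$running" steem_mysql || docker start steem_mysql
#   is_running "$running" steem_sbds  || docker start steem_sbds
is_running "steem_mysql" steem_mysql && echo "steem_mysql is up"
```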
A monitoring script sounds like a great idea. I'd be happy to help test it out. Various issues I've experienced are the block height stalling, or one or both of steem_mysql and steem_sbds stopping.
That's exactly what I'm seeing, and every time I fix it, the solution is something that could have been scripted. I'll definitely ping you when I have something together for testing.
Thank you for the contribution. It has been approved.
You can contact us on Discord.
[utopian-moderator]
Wow. Mind blown here
I appreciate that! If you dive into the madness feel free to comment here with questions.
Upvoted ☝ Have a great day!
Awesome tutorial. I tried sbds with mysql before and found an issue with the fields being truncated because of the size of the data. Did you have problems with this? I forked the sbds repo in order to replace the normal text fields with long texts. Look at my changes:
https://github.com/Guitlle/sbds/commit/d30c7d67205a34f7df6318b078950a192dcf8396
Can you check if you are getting truncation warnings in the logs?
I had this exact same problem, but then I found this pull request that changes the fields to MEDIUMTEXT and resolves everything. In my script I use that pull request instead of the master branch.