SBDS to MySQL in Docker
Steem Blockchain Data Service in MySQL on Docker
What Will I Learn?
We'll walk through the steps to create a script that deploys a cloud server at DigitalOcean with flexible block storage, running the Steem Blockchain Data Service (SBDS) and MySQL in docker.
- Deploy a Storage Volume to hold the data
- Deploy a CentOS server to run SBDS and MySQL in Docker
Requirements
You will need an account at DigitalOcean to deploy a cloud server and related storage. It is also presumed that you are comfortable in the shell, know how to connect to the server via SSH, can perform basic systems administration on Linux, and have some experience with docker.
Difficulty
While I have simplified nearly everything herein, this is all still pretty complex. Even so, you can literally copy/paste this script and be up and running in minutes.
- Intermediate
Tutorial Contents
Here's everything we'll cover in this tutorial.
- Deploy SBDS
- Overview
- Initial Setup
- Prepare Script
- Create Droplet and Volume
- Confirm Script Execution
- How this script works
- Initial config
- Mount volume
- Yum install
- Docker install and config
- Start MySQL
- MySQL Config
- Git clone SBDS
- Start SBDS
- Additional commands
- Conclusion
- The Script
Deploy SBDS
Overview
At the time of this writing, the size of the MySQL database of the entire blockchain is 417GB, and growing. To handle this growth, we rely on Block Storage at DigitalOcean to deploy flexible storage at any size for $0.10/GB per month and increase that size at any time later as the blockchain grows.
We'll walk through the steps to deploy a cloud server at DigitalOcean using Block Storage running the Steem Blockchain Data Service and MySQL in docker.
Initial Setup
We need to ensure we deploy our droplet in a region that supports block storage. The supported regions for volumes are:
- NYC1
- NYC3
- SFO2
- FRA1
- SGP1
- TOR1
- BLR1
- LON1
- AMS3
To run SBDS reliably you will need at least 4GB of memory, and unlike steemd nodes, SBDS is multithreaded, so it will use ALL the CPU you throw at it.
I'd give an estimate of the time it takes to catch up to the current block, but in my testing it varies a great deal depending on many factors, including the node we connect to for grabbing blocks. In general it always takes many hours, and it may easily take a few days to complete.
Prepare Script
Start by opening your favorite text editor, such as Sublime Text or just Notepad, and grab The Script from the end of this post.
The first modification is to change the mysql_password in your copy of the script. Make that change and save your copy of the script. We'll modify this a bit more shortly.
mysql_password="mystrongcomplexpassword"
Create Droplet and Volume
We start by clicking the Create button and selecting Droplets.
Then, we select CentOS and 7.4 x64 from the drop-down:
As we look to Choose a size, you'll notice there are Standard Droplets and Optimized Droplets available. We can expand our instance later, but only within the same droplet type: you cannot deploy a Standard droplet and later convert it to an Optimized one, or vice versa.
Next, click Add Volume to add block storage.
Given the current size of the Steem blockchain, I recommend using at least 500GB.
Next, we'll pick a region.
As you'll see, only the supported regions are selectable with all others grayed out.
The volume name handling here presumes a lot, and those presumptions may be wrong for your account. By default, DigitalOcean names a volume volume-{region}-{volume_index}, where the index is essentially the count of volumes you have in that region. Our approach works in most scenarios, but if you have issues mounting your volume, just look at the actual name displayed in DigitalOcean once it is created.
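Once the volume is attached, you can also confirm the exact name from the device path on the droplet itself. This is a minimal sketch; the extract_volume_name helper is hypothetical, just for illustration:

```shell
# Given a device id like "scsi-0DO_Volume_volume-tor1-01", strip the prefix
# to recover the exact volume name DigitalOcean assigned.
extract_volume_name() {
  echo "$1" | sed 's/^scsi-0DO_Volume_//'
}

# On the droplet you would list the real devices like this:
#   ls /dev/disk/by-id/ | grep '^scsi-0DO_Volume_'
extract_volume_name "scsi-0DO_Volume_volume-tor1-01"   # prints volume-tor1-01
```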
That said, we look to the following line in our script setting the volume_name variable:
volume_name="volume-tor1-01"
There are two parts to this name: the region (tor1 here) and the index of the volume (01 in this case) for that region.
If you do not have any volumes in the region you are using, this should deploy with the 01 index. If you have existing volumes in that region, it will increment to the next number. Update the volume_name accordingly.
Next, we check the User data box and paste in our modified script:
Now, we can scroll down and click the Create button.
The page will refresh and bring us back to the list of droplets, and you'll see this one building. Once it's complete, the IP address will appear and you can click the Copy link to grab it and paste it into your SSH client, such as PuTTY.
Confirm Script Execution
The deploy will take a few minutes and you should check for the install email from DigitalOcean to get the password. Once you are logged in, check the script execution status by tailing the log we created.
tail -fn5 /var/log/sbds_install.log
Somewhere between a few hours and a few days, you'll finally see:
END install
Just sit back patiently until everything fully syncs up.
How this script works
There are a lot of moving parts in this script so we'll walk through each step, along with a few tweaks along the way that you can make to fit your specific environment better.
Initial config
Everything we're doing here takes a LONG time, like up to days sometimes, so we want some way to keep track of our progress. The opening line of this script starts the logging to /var/log/sbds_install.log and we can follow along there to check in on the script. You'll see more of this logging in the script itself, but I'll exclude it here.
echo "$(date) set variables" >> /var/log/sbds_install.log
Then, we just set a few basic variables
mysql_password="mystrongcomplexpassword"
volume_name="volume-tor1-01"
Mount volume
Next, we need to format and mount the block storage. This entire command is the same that DigitalOcean provides in their Config instructions for the volume, with some variable replacement for our full path.
sudo mkfs.ext4 -F /dev/disk/by-id/scsi-0DO_Volume_$volume_name
sudo mkdir -p /mnt/$volume_name;
sudo mount -o discard,defaults /dev/disk/by-id/scsi-0DO_Volume_$volume_name /mnt/$volume_name;
echo /dev/disk/by-id/scsi-0DO_Volume_$volume_name /mnt/$volume_name ext4 defaults,nofail,discard 0 0 | sudo tee -a /etc/fstab
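To double-check what the mount step did, it can help to reconstruct the fstab entry from the variable. The fstab_line helper below is hypothetical, just a sketch of the line the script appends:

```shell
volume_name="volume-tor1-01"

# Rebuild the exact /etc/fstab line the script appends for a given volume.
fstab_line() {
  echo "/dev/disk/by-id/scsi-0DO_Volume_$1 /mnt/$1 ext4 defaults,nofail,discard 0 0"
}

fstab_line "$volume_name"

# On the droplet itself, verify the mount and the fstab entry with:
#   df -h "/mnt/$volume_name"
#   grep "$volume_name" /etc/fstab
```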
Yum install
Then, we use yum to install a few things.
yum -y install docker git wget
Docker install and config
Next, we need to start docker. This is where it gets a little tricky.
We need to update the docker configuration to use the block storage we've mounted rather than the usual /var/lib/docker path. This is set in /etc/docker/daemon.json.
In order to automate the update of this path, we need to do a little magic to the string. I'm not going to walk through exactly what's happening here because you might start hyperventilating (I did a little).
This first line rewrites the string containing the volume name slightly so our replacement using sed on the next line works properly.
docker_volume_name=`echo $volume_name | sed -e "s/-/\\\\\-/g"`
sed -i -e "s/{}/{\"graph\": \"\/mnt\/$docker_volume_name\"}/g" /etc/docker/daemon.json
In short, it makes /etc/docker/daemon.json look like this.
# cat /etc/docker/daemon.json
{"graph": "/mnt/volume-tor1-01"}
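If the sed escaping feels fragile, an alternative sketch (my suggestion, not what the script does) builds the JSON with printf, where hyphens in the volume name need no special handling:

```shell
volume_name="volume-tor1-01"

# Build the daemon.json content without any escaping gymnastics.
daemon_json() {
  printf '{"graph": "/mnt/%s"}' "$1"
}

daemon_json "$volume_name"   # prints {"graph": "/mnt/volume-tor1-01"}

# As root on the droplet you would then write it into place:
#   daemon_json "$volume_name" > /etc/docker/daemon.json
```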
Then we start docker and enable it to start automatically.
systemctl start docker
systemctl enable docker
Start MySQL
Now that docker is running, we can start mysql. Here's where we can start to make changes specific to our environment.
If you are running just a 4GB environment, then the standard start should work fine.
docker run -d --name steem_mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=$mysql_password -e MYSQL_DATABASE=steem mysql
This runs mysql in a detached container named steem_mysql, using the password we specified in the variables at the start.
MySQL Config
Here's another way to start mysql while also setting some configuration variables to optimize for our environment.
docker run -d --name steem_mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=$mysql_password -e MYSQL_DATABASE=steem mysql --innodb_buffer_pool_size=12G --innodb_log_buffer_size=1G --innodb_log_file_size=4G
Now, this isn't exactly a mysql performance tuning tutorial, so I'm giving the high-level points here.
We want to set the innodb_buffer_pool_size to roughly 80% of memory. The data and indexes are cached in the buffer pool, so setting this high allows us to use as much of the available memory as possible.
--innodb_buffer_pool_size=12G
The innodb_log_buffer_size is the buffer for uncommitted transactions; if set too low, you'll often see very high I/O.
--innodb_log_buffer_size=1G
The log file size is a trade off between performance (set higher) and recovery time (set lower).
--innodb_log_file_size=4G
This is barely scratching the surface on these configurations, or the many others not mentioned here. Obviously, figure out what fits your environment best.
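As a rough starting point for the 80% figure above, you can compute a whole-GB innodb_buffer_pool_size from the droplet's total memory. This helper is just a sketch, a rule of thumb rather than a tuning recommendation:

```shell
# Convert MemTotal (in kB, as /proc/meminfo reports it) into a whole-GB
# buffer pool size at roughly 80% of memory.
buffer_pool_gb() {
  echo $(( $1 * 8 / 10 / 1048576 ))
}

# On a live server:
#   buffer_pool_gb "$(awk '/MemTotal/ {print $2}' /proc/meminfo)"
buffer_pool_gb 16777216   # a 16GB droplet -> prints 12
```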
Git clone SBDS
Now that mysql is running, we want to download SBDS. We first cd to the block storage, as we'll want to store this there as well.
cd /mnt/$volume_name
git clone https://github.com/steemit/sbds.git
cd /mnt/$volume_name/sbds
git fetch origin pull/81/head
git checkout -b fixes-for-tables FETCH_HEAD
After we clone, we check out a pull request that addresses some real issues with SBDS that have not yet made it into the master branch. Basically, a few fields are set to TEXT by default but actually need MEDIUMTEXT, and this fixes all that for us. I have tested this extensively, and it clearly resolved the errors I was seeing before running this branch.
Start SBDS
After we get the code pulled down, we need to run it.
Because MySQL is running in docker, we need to look at the docker networking to figure out how to connect to MySQL. This command calls docker inspect to get the IP we can connect on and sets it to the mysql_ip variable.
mysql_ip=`docker inspect --format "{{ .NetworkSettings.IPAddress }}" steem_mysql`
Next, we update the SBDS config to use our MySQL database, rather than the default.
sed -i -e "s/sqlite\:\/\/\/\/tmp\/sqlite\.db/mysql\:\/\/root\:$mysql_password\@$mysql_ip:3306\/steem/g" /mnt/$volume_name/sbds/Dockerfile
The public node specified in the config was retired on January 6, 2018 so this just updates that to a different server.
sed -i -e "s/steemd\.steemitdev\.com/api\.steemit\.com/g" /mnt/$volume_name/sbds/Dockerfile
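Because those sed expressions are heavily escaped, you can sanity-check the database-URL substitution against a sample string before touching the real Dockerfile. The sample DATABASE_URL line here is illustrative, not copied from the actual Dockerfile:

```shell
mysql_password="mystrongcomplexpassword"
mysql_ip="172.17.0.2"

# Apply the same substitution the script runs on the Dockerfile, but to a
# plain string, so you can eyeball the resulting connection URL.
rewrite_db_url() {
  echo "$1" | sed -e "s/sqlite\:\/\/\/\/tmp\/sqlite\.db/mysql\:\/\/root\:$mysql_password\@$mysql_ip:3306\/steem/g"
}

rewrite_db_url "DATABASE_URL sqlite:////tmp/sqlite.db"
# prints DATABASE_URL mysql://root:mystrongcomplexpassword@172.17.0.2:3306/steem
```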
Then, we just move to the SBDS directory, build, and run.
cd /mnt/$volume_name/sbds
docker build -t sbds .
docker run --name steem_sbds -p 8080:8080 -p 9191:9191 --link steem_mysql:mysql sbds
Additional commands
Once everything is up and running, here are a few commands you might find useful.
First of all, if you are connecting remotely, you can just use the public IP of this server and the standard port 3306. However, if you want to connect locally, we need to look to the docker networking config as we touched on earlier.
If we run this command, it will return the IP that we can then use to connect with mysql:
# docker inspect --format "{{ .NetworkSettings.IPAddress }}" steem_mysql
172.17.0.2
# mysql -h 172.17.0.2 -u root -p
Enter password:
You can check to make sure you're not behind by getting your last block from MySQL.
SELECT MAX(block_num) AS lastblock FROM sbds_core_blocks;
Then see how close you are to the head_block_number here: https://api.steemjs.com/getDynamicGlobalProperties
I pretty reliably stay about 20 blocks behind, which is exactly 1 minute and more than reasonable.
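To script that check rather than eyeball it, you can extract head_block_number from the API response and subtract. This is a hypothetical sketch; head_block is just a grep-based helper, and jq would be cleaner if you have it installed:

```shell
# Pull head_block_number out of a getDynamicGlobalProperties JSON response.
head_block() {
  echo "$1" | grep -o '"head_block_number":[0-9]*' | grep -o '[0-9]\+$'
}

# On a live server (assumes curl, plus the mysql client from the script):
#   head=$(head_block "$(curl -s https://api.steemjs.com/getDynamicGlobalProperties)")
#   last=$(mysql -h 172.17.0.2 -u root -p"$mysql_password" -N -e \
#     "SELECT MAX(block_num) FROM steem.sbds_core_blocks")
#   echo "behind by $(( head - last )) blocks"
head_block '{"head_block_number":20410475,"time":"2018-04-06T00:00:00"}'   # prints 20410475
```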
You can see the active docker containers as follows.
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5d213dddabc8 sbds "/sbin/my_init" 2 weeks ago Up 2 days 0.0.0.0:8080->8080/tcp, 0.0.0.0:9191->9191/tcp steem_sbds
da2401c28bad mysql "docker-entrypoint.sh" 3 weeks ago Up 4 days 0.0.0.0:3306->3306/tcp steem_mysql
If you run this but one or both containers are missing, try this.
docker ps -a
This will show all containers, whether running or not.
If you find that you've fallen behind on blocks, you can stop SBDS and restart to try to resolve.
docker stop steem_sbds
docker start steem_sbds
You can stop and start MySQL the same way, though it has run very reliably for me, so I presume you'll have few issues. If you do decide to stop MySQL, stop SBDS first; otherwise it will keep trying to write to the database.
If you have any issues at all, you can take a look at the docker logs.
You can actively follow the logs like this.
docker logs --tail=1 --follow steem_sbds
To dump all of the logs to a file so you can parse through, use this command.
docker logs steem_sbds > ./sbds.log 2>&1
Conclusion
I've had a lot of fun learning about SBDS and running into countless issues. I plan to continue documenting everything I find and sharing with the community to support anyone who wants to have a local copy of the blockchain in MySQL.
Please comment or catch me on steemit.chat with your experiences or issues you run into!
The Script
#!/bin/bash
###############
## VARIABLES ##
###############
echo "$(date) set variables" >> /var/log/sbds_install.log
mysql_password="mystrongcomplexpassword"
volume_name="volume-tor1-01"
##################
## MOUNT VOLUME ##
##################
echo "$(date) mount volume" >> /var/log/sbds_install.log
sudo mkfs.ext4 -F /dev/disk/by-id/scsi-0DO_Volume_$volume_name
sudo mkdir -p /mnt/$volume_name;
sudo mount -o discard,defaults /dev/disk/by-id/scsi-0DO_Volume_$volume_name /mnt/$volume_name;
echo /dev/disk/by-id/scsi-0DO_Volume_$volume_name /mnt/$volume_name ext4 defaults,nofail,discard 0 0 | sudo tee -a /etc/fstab
#################
## YUM INSTALL ##
#################
echo "$(date) yum install" >> /var/log/sbds_install.log
yum -y install docker git wget
##################
## START DOCKER ##
##################
echo "$(date) start docker" >> /var/log/sbds_install.log
docker_volume_name=`echo $volume_name | sed -e "s/-/\\\\\-/g"`
sed -i -e "s/{}/{\"graph\": \"\/mnt\/$docker_volume_name\"}/g" /etc/docker/daemon.json
systemctl start docker
systemctl enable docker
####################################
## INSTALL MYSQL CLIENT UTILITIES ##
####################################
echo "$(date) mysql utilities" >> /var/log/sbds_install.log
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum install -y mysql-community-client
#################
## START MYSQL ##
#################
echo "$(date) start mysql" >> /var/log/sbds_install.log
docker run -d --name steem_mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=$mysql_password -e MYSQL_DATABASE=steem mysql
###############
## GIT CLONE ##
###############
echo "$(date) git clone" >> /var/log/sbds_install.log
cd /mnt/$volume_name
git clone https://github.com/steemit/sbds.git
cd /mnt/$volume_name/sbds
git fetch origin pull/81/head
git checkout -b fixes-for-tables FETCH_HEAD
################
## START SBDS ##
################
echo "$(date) build/run sbds" >> /var/log/sbds_install.log
mysql_ip=`docker inspect --format "{{ .NetworkSettings.IPAddress }}" steem_mysql`
sed -i -e "s/sqlite\:\/\/\/\/tmp\/sqlite\.db/mysql\:\/\/root\:$mysql_password\@$mysql_ip:3306\/steem/g" /mnt/$volume_name/sbds/Dockerfile
sed -i -e "s/steemd\.steemitdev\.com/api\.steemit\.com/g" /mnt/$volume_name/sbds/Dockerfile
cd /mnt/$volume_name/sbds
docker build -t sbds .
docker run --name steem_sbds -p 8080:8080 -p 9191:9191 --link steem_mysql:mysql sbds
echo "$(date) END install" >> /var/log/sbds_install.log
Posted on Utopian.io - Rewarding Open Source Contributors
Thanks again for putting together such helpful information. This is the way I think people should make tutorials. This is even better than your previous post and that one was very helpful.
I had a question about your mention that the size of the blockchain was currently 417GB in mysql. Do you get that from the amount of space used on /mnt/, from the size of /_data/, or from a mysql query? My /_data/ directory is currently using 388GB, and I want to make sure that some of the issues I had early on didn't result in missing information.
Our discussion in my earlier post formed much of this post so thanks again for sharing everything!
So I'm one folder deeper, in _data/steem/, running this command specifically.
I'm also surprised I'm at 419GB (already up 2GB!) and you're at 388GB, so here are the exact tables that are at least a GB in size.
I don't have a good explanation for why we'd see different sizes, so I'm pretty curious about this.
I just created a new digitalocean droplet (the 16GB droplet) to try and see what the database size should be, and I'm just following the script you show in this post. Hopefully the database size will end up being the same as yours (currently around 419G) or mine (currently around 370G) and not be some new number. Right now the mysql database size per the size of _data/steem on the new droplet is at 5G, and I plan to periodically check on the progress of the database size over the next few days.
One thing I noticed is that "mysql -h 172.17.0.2 -u root -p" gives the error "-bash: mysql: command not found" and I wanted to bring that to your attention because I think it means you'll want to adjust the script you show in your post.
One other comment I wanted to make regarding tests with droplets and volumes: the storage volumes created along with a droplet aren't actually deleted when the droplet is destroyed; destroying a volume is a separate step. I actually had three $50/month volumes that I wasn't using but was still being charged for, which I incorrectly assumed were deleted when I deleted the corresponding droplet. Fortunately, I noticed them and deleted them after only a few days.
I'll be interested to see where the new droplet comes out, like you said, hopefully not a third new size.
And good catch! I've added INSTALL MYSQL CLIENT UTILITIES to the script here now so that'll be installed as well.
Sorry about the extra volumes; I'm glad you caught that early before the cost added up too much. DigitalOcean gets expensive quickly when you get to the larger sizes.
Both my servers are caught up to block 20410475. The _data/steem on the most recent server shows as 437G and the _data/steem on my other server (the one I suspect to be wrong) shows as 377G. It would be nice if your server was at 437G around block 20410475 too, but I'm thinking you probably aren't, because you were at 419G four days ago.
So, I was running some queries, joining between sbds_tx_comments and sbds_tx_votes, and found that there was no index on permlink in the sbds_tx_votes table, so my join was pretty slow.
First of all, adding an index to a 46GB table is basically impossible, so I just created a new duplicate table with no data and added an index on that field, then inserted a few thousand records to compare the performance against the primary table. The difference was significant.
So, now I'm carefully re-inserting the entire sbds_tx_votes into my new sbds_tx_votes_copy table with the additional index.
I stopped SBDS to avoid new writes while this is running, and I have a new table now too! So, it'll probably be a day or two before I'm back up to the head block again, and my numbers might be a little different now with the index, but I'll post an update when that's back up.
And I think I might throw in an ALTER TABLE sbds_tx_votes ADD INDEX... in my script so this is implemented at the start.
Back up to sync, so I wanted to get some updated totals for you.
So, I'm a little confused now. My total size is now lower than yours from two days ago!
Here are the biggest tables. I thought adding an index to sbds_tx_votes would have more of an impact on size, but when I still had both tables they were very close, so I'm not sure if that has any impact here.
Yes, I'm confused by things too. Here is my current data for my most recent server, around block number 20475678. I was very careful to follow the script with this server.
Note: I tried comparing this server with the other server I have running sbds to see if I could identify why one was bigger than the other, and I notice that the data contained in the 'raw' field in the 'sbds_core_blocks' table doesn't seem to match on both servers. For example, the 'raw' field for 'block_num' 20470001 on one server is different than the same raw field on the other server.
Speaking of expensive, this hit my bank account yesterday:
I was spinning up massive instances and playing around this month, obviously far more than I thought. This definitely was not in the budget so looks like this will be an interesting month.
Hopefully my mistake is a lesson for others to be aware of their usage and avoid surprises like this.
Wow that is a lot of money to spend playing around. Hopefully it will be well worth it for you in the long run as this opens up so many possibilities. Thanks for the reminder about how costs can quickly add up because I'm spending at a $260/month pace, but I should be able to cut that back after my latest droplet test gives us another number for the total database size. So far after 18 hours the new droplet is at 159G so if everything continues to go well, then we may have another number for total steem mysql database size by tomorrow night.
The difference is actually larger, since that 388GB is now 370G. The issue is most likely on my end, but I don't think I did anything that should cause data to be missing, other than having steem_mysql shut down on its own a bunch of times.
Are your tests still staying current with the head_block_number? My new test that was at 169G about 18 hours ago stopped growing at 177G and I can't get it to pick back up again. Also, my other droplet that was typically within 20 blocks of the current block is now 10,000+ blocks behind. Are you experiencing similar issues?
Generally, I stay within about 20 blocks, but sometimes I'll see exactly what you're describing.
I just stop SBDS, restart, and watch the logs:
It'll sit on 'finding' for quite a while, but soon enough you'll see it writing new blocks.
I'm trying to write a monitoring script that can be called by cron to do all of this automatically.
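As a starting point for that monitoring script, here's a minimal sketch I'm considering; the is_running helper and log path are hypothetical, and it assumes the container names from this tutorial:

```shell
# True if container name $2 appears in the running-container list $1
# (one name per line, as "docker ps --format '{{.Names}}'" outputs it).
is_running() {
  echo "$1" | grep -q "^$2\$"
}

# From cron, e.g. */5 * * * * /root/sbds_monitor.sh, you would run:
#   running="$(docker ps --format '{{.Names}}')"
#   is_running "$running" steem_mysql || docker start steem_mysql
#   is_running "$running" steem_sbds  || docker start steem_sbds
is_running "steem_mysql" steem_mysql && echo "steem_mysql is up"
```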
A monitoring script sounds like a great idea. I'd be happy to help test it out. Various issues I've experienced are the block height stalling, or one or both of steem_mysql and steem_sbds stopping.
That's exactly what I'm seeing, and every time I fix it, the solution is something that could have been scripted. I'll definitely ping you when I have something together for testing.
Thank you for the contribution. It has been approved.
You can contact us on Discord.
[utopian-moderator]
Wow. Mind blown here
I appreciate that! If you dive into the madness feel free to comment here with questions.
Upvoted ☝ Have a great day!
Awesome tutorial. I tried sbds with mysql before and found an issue with the fields being truncated because of the size of the data. Did you have problems with this? I forked the sbds repo in order to replace the normal text fields with long texts. Look at my changes:
https://github.com/Guitlle/sbds/commit/d30c7d67205a34f7df6318b078950a192dcf8396
Can you check if you are getting truncation warnings in the logs?
I had this exact same problem, but then I found this pull request that changes the fields to MEDIUMTEXT and resolves everything. In my script I use that pull request instead of the master branch.