Thanks again for putting together such helpful information. This is the way I think people should make tutorials. This is even better than your previous post and that one was very helpful.
I had a question about your mention that the size of the blockchain was currently 417GB in mysql. Do you get that from the amount of space used on /mnt/ or from the size of /_data/ or from a mysql query. My /_data/ directory is currently using 388GB, and I want to make sure that some of the issues I had early on didn't result in missing information.
Our discussion in my earlier post formed much of this post so thanks again for sharing everything!
So I'm one folder deeper in
_data/steem/
running this command specifically:I'm also surprised I'm at 419GB (already up 2GB!) and you're at 388GB so here's the exact tables that are at least a GB in size.
I don't have a good explanation for why we'd see different sizes, so I'm pretty curious about this.
I just created a new digitalocean droplet (the 16GB droplet) to try and see what the database size should be, and I'm just following the script you show in this post. Hopefully the database size will end up being the same as yours (currently around 419G) or mine (currently around 370G) and not be some new number. Right now the mysql database size per the size of _data/steem on the new droplet is at 5G, and I plan to periodically check on the progress of the database size over the next few days.
One thing I noticed is that "mysql -h 172.17.0.2 -u root -p" gives the error "-bash: mysql: command not found" and I wanted to bring that to your attention because I think it means you'll want to adjust the script you show in your post.
One other comment I wanted to make regarding tests with droplets and volumes is that the storage volumes that are created when creating a droplet aren't actually deleted when the droplet is destroyed and there is a separate step to destroy a volume. I actually had three $50/month volumes that I wasn't using but still being charged for which I incorrectly assumed were deleted when I deleted the corresponding droplet. Fortunately, however, I noticed them and deleted them after only a few days.
I'll be interested to see where the new droplet comes out, like you said, hopefully not a third new size.
And good catch! I've added
INSTALL MYSQL CLIENT UTILITIES
to the script here now so that'll be installed as well.Sorry about the extra volumes, I'm glad you caught that early before the cost added up too much. DigitalOcean gets expensive quickly when you get to the larger sizes.
Both my servers are caught up to block 20410475. The _data/steem on the most recent server shows as 437G and the _data/steem on my other server (the one I suspect to be wrong) shows as 377G. it would be nice if your server was at 437G around block 20410475 too, but I'm thinking you probably aren't because you were at 419G four days ago.
So, I was running some queries, joining between
sbds_tx_comments
andsbds_tx_votes
and found that there was no index onpermlink
in thesbds_tx_votes
table so my join was pretty slow.First of all, adding an index to a 46GB table is basically impossible so I just created a new duplicate table with no data and added an index on that field, then inserted a few thousand records to compare the performance against the primary table. The difference was significant.
So, now I'm carefully re-inserting the entire
sbds_tx_votes
into my newsbds_tx_votes_copy
table with the additional index.I stopped SBDS to avoid new writes while this is running, and I have a new table now too! So, it'll probably be a day or two before I'm back up to the head_block again, and my numbers might be a little different now with the index, but I'll post an update when that's back up.
And I think I might throw in an
ALTER TABLE sbds_tx_votes ADD INDEX...
in my script so this is implemented at the start.Back up to sync so I wanted to get some updated totals for you.
So, I'm a little confused now. My total size is now lower than yours from two days ago!
Here's the biggest tables. I thought adding an index to
sbds_tx_votes
would have more of an impact on size, but when I still had both tables they were very close so I'm not sure if that has any impact here.Yes, I'm confused by things too. Here is my current data for my most recent server around block number 20475678. I was very careful to follow the script with this server.
Note: I tried comparing this server with the other server I have running sbds to see if I could identify why one was bigger than the other, and I notice that the data contained in the 'raw' field in the 'sbds_core_blocks' table doesn't seem to match on both servers. For example, the 'raw' field for 'block_num' 20470001 on one server is different than the same raw field on the other server.
Speaking of expensive, this hit my bank account yesterday:
I was spinning up massive instances and playing around this month, obviously far more than I thought. This definitely was not in the budget so looks like this will be an interesting month.
Hopefully my mistake is a lesson for others to be aware of their usage and avoid surprises like this.
Wow that is a lot of money to spend playing around. Hopefully it will be well worth it for you in the long run as this opens up so many possibilities. Thanks for the reminder about how costs can quickly add up because I'm spending at a $260/month pace, but I should be able to cut that back after my latest droplet test gives us another number for the total database size. So far after 18 hours the new droplet is at 159G so if everything continues to go well, then we may have another number for total steem mysql database size by tomorrow night.
The difference is actually larger since 388GB is 370G. The issue is most likely on my end, but I don't think I did anything that should cause data to be missing other than have steem_mysql shut down on its own a bunch of times.