Block.ops - An Analysis Tool - The First Completed Analysis!

Some frantic coding over the last three days. Lots of progress to report!



For the last year I have carried out a monthly analysis of Steem blockchain activity by application, i.e. by the different websites and applications through which you can post to the Steem blockchain (often termed dApps).

My aim now is to build a tool that can automate such complex analyses of Steem data, providing both historical time series and rapidly updatable real-time results. Beyond the dApp analysis, there are many other projects for which such a system could be useful.

This tool is Block.ops.

You can read all about Block.ops (including the project aims, technology stack, roadmap and how to contribute) in the introductory post here:
https://steemit.com/utopian-io/@miniature-tiger/block-ops-an-analysis-tool-1537300276791


Repository

https://github.com/miniature-tiger/block.ops


New Features

Mapping and Storing Payout Operations

Each block pulled from the blockchain during the data loading process can contain a host of different operations. These operations need to be translated, mapped and inserted into MongoDB.

My first contribution tackled the "comment" operation, since it is the primary operation for the market share analysis and many others. This commit handles the payout operations: author_payout, benefactor_payout and curator_payout. These are written to MongoDB using the "upsert" option, which creates a new record if the corresponding comment has not yet been processed. This allows blocks to be processed in any order.
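The sketch below shows why the upsert makes processing order irrelevant. It is a minimal in-memory illustration, not the Block.ops code. It assumes records are keyed by author and permlink. The helper names (`applyUpsert`, `processComment`, `processAuthorPayout`) and the payout field layout are hypothetical. Against a real database, the equivalent call in the MongoDB Node.js driver would be `collection.updateOne(filter, { $set: fields }, { upsert: true })`.

```javascript
// Hypothetical sketch: an in-memory stand-in for MongoDB's
// updateOne(filter, { $set: fields }, { upsert: true }).
// Records are keyed by author and permlink; all field names are illustrative.

function applyUpsert(store, filter, fields) {
  const key = `${filter.author}/${filter.permlink}`;
  // Upsert: start from the existing record, or from the filter if none exists yet.
  const existing = store.get(key) || { ...filter };
  // Like $set, overwrite only the fields supplied by this operation.
  const updated = { ...existing, ...fields };
  store.set(key, updated);
  return updated;
}

// Comment operation: sets comment-level fields (here just the application).
function processComment(store, op) {
  return applyUpsert(store, { author: op.author, permlink: op.permlink }, {
    application: op.application,
  });
}

// Author payout operation: sets only the (hypothetical) payout fields.
function processAuthorPayout(store, op) {
  return applyUpsert(store, { author: op.author, permlink: op.permlink }, {
    author_payout: { sbd: op.sbd, steem: op.steem, vests: op.vests },
  });
}

// The payout arrives before the comment, yet both end up on one record.
const store = new Map();
processAuthorPayout(store, { author: "alice", permlink: "hello", sbd: 1.5, steem: 0, vests: 100 });
processComment(store, { author: "alice", permlink: "hello", application: "app-a" });
console.log(store.get("alice/hello"));
```

Running the two operations in either order produces the same merged record. Processing blocks out of sequence relies on exactly this property.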

The data structure for a single comment within MongoDB now looks like this:
[Image: data structure of a single comment record in MongoDB]

The code changes are here:
https://github.com/miniature-tiger/block.ops/commit/120fc51dbd489fbde4f1870a8734ce0cdcb01dbe

Patch for Setup

A patch was required for the blockDates setup following the hardfork 20 issues. The large number of missing blocks had made the estimate of the first block of each day inaccurate. The workaround estimation technique has been improved and should now be robust.
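The post does not describe the improved estimation technique. The sketch below is therefore a swapped-in alternative, not the Block.ops method: a binary search over block numbers using block timestamps. It relies only on timestamps increasing with block number, so missing blocks cannot skew it the way a fixed 3-second-per-block estimate can. `firstBlockAtOrAfter` and `fakeBlockTime` are hypothetical. In practice `getBlockTime` would be an API lookup of a block's timestamp, which is not shown here.

```javascript
// Hypothetical sketch: find the first block whose timestamp is at or after `target`.
// Binary search uses only the fact that timestamps increase with block number,
// so gaps (missed blocks) do not distort the result.

function firstBlockAtOrAfter(target, low, high, getBlockTime) {
  // Invariant: the answer lies in [low, high]; assumes getBlockTime(high) >= target.
  while (low < high) {
    const mid = Math.floor((low + high) / 2);
    if (getBlockTime(mid) < target) {
      low = mid + 1; // mid is too early, so the answer is after it
    } else {
      high = mid; // mid qualifies, but an earlier block might too
    }
  }
  return low;
}

// Simulated chain: 3-second blocks, but with a one-hour outage after block 1000.
const chainStart = Date.UTC(2018, 8, 11, 20, 0, 0); // 11 Sep 2018, 20:00 UTC
function fakeBlockTime(n) {
  const outageMs = n > 1000 ? 3600 * 1000 : 0;
  return chainStart + n * 3000 + outageMs;
}

const midnight = Date.UTC(2018, 8, 12, 0, 0, 0); // 12 Sep 2018, 00:00 UTC
const first = firstBlockAtOrAfter(midnight, 1, 20000, fakeBlockTime);
console.log(first, new Date(fakeBlockTime(first)).toISOString());
```

In this simulation the search returns block 3600. A naive estimate of four hours divided by 3 seconds would instead give block 4800, overshooting by the 1200 blocks lost in the outage.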

The code changes are in the same commit as the payout operations (see Mapping and Storing Payout Operations above).

Enhancing Block Operations Loop

The first version of the block operations loop (used to pull blocks of data from the blockchain) ran from an opening block number to a closing block number, processing every block in between. The opening and closing block numbers are based on date parameters. This approach has now been enhanced as follows:

  • The loop queries MongoDB to find out which of the scheduled blocks have previously been processed (each processed block number is recorded in MongoDB with either an OK or an Error status).
  • An array of blockNumbers is returned to the block operations loop, listing blocks with Error status and blocks that have not previously been processed.
  • The loop then processes only the blocks listed in the array. Blocks with OK status are not reprocessed.
  • To avoid passing an overly large array of blockNumbers, the process is split into segments (currently 1000 blocks each, though the segment size is a parameter).
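The selection step described in these bullets can be sketched as follows. This is a minimal illustration under stated assumptions. A plain `Map` from block number to "OK" or "Error" stands in for the MongoDB status records. The function names `blocksToProcess` and `segments` are hypothetical. The scheduled range is walked in segments of `segmentSize` blocks.

```javascript
// Hypothetical sketch of the selection step in the block operations loop.
// `statuses` stands in for the MongoDB records: blockNumber -> "OK" | "Error".

function blocksToProcess(openBlock, closeBlock, statuses) {
  const pending = [];
  for (let n = openBlock; n <= closeBlock; n++) {
    // Keep blocks never seen before or previously marked "Error"; skip "OK".
    if (statuses.get(n) !== "OK") {
      pending.push(n);
    }
  }
  return pending;
}

// Walk the scheduled range in segments so no single array of block numbers grows too large.
function* segments(openBlock, closeBlock, segmentSize, statuses) {
  for (let start = openBlock; start <= closeBlock; start += segmentSize) {
    const end = Math.min(start + segmentSize - 1, closeBlock);
    yield blocksToProcess(start, end, statuses);
  }
}

// Example: blocks 1-1999 were run earlier (block 1203 failed); 2000-2500 were never run.
const statuses = new Map();
for (let n = 1; n <= 1999; n++) statuses.set(n, n === 1203 ? "Error" : "OK");

for (const batch of segments(1, 2500, 1000, statuses)) {
  // An empty batch simply means every block in that segment is already OK.
  console.log(batch.length, batch.slice(0, 3));
}
```

In this example the first segment yields nothing, the second yields only blocks 1203 and 2000, and the third yields the 500 unseen blocks. This is the behaviour needed when re-running filloperations after a failure.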

This approach allows for the following cases:

  • A complete day of blocks is run through the block operations loop using the filloperations command, and one block fails to process (typically due to a connection problem). Once the loop completes, the filloperations command can simply be re-entered with the same parameters, and the loop will pick up and process the one outstanding block.
  • An analysis is first trialled on a single day and then needs to be run for the whole week. The loop will skip over the previously processed day of blocks, even if those blocks fall in the middle of the week being analysed.

These report statuses illustrate the approach:
[Images: report statuses 1 and 2]

The code changes are here:
https://github.com/miniature-tiger/block.ops/commit/c714b57e6455ac320082701d19e962ef472fb41b

Function to List Sample Comments

A smaller change is the addition of a function that lists comments for a given application, so that the data structure of individual records in MongoDB can be checked. This is useful for understanding the data structure when creating an analysis.

The code changes are in the above commit (see Enhancing Block Operations Loop above).

Improvements to Market Share by dApps Reporting

With the block operations loop pulling data from the blockchain, and operations being translated into records for insertion into MongoDB, the next two commits turn to the analysis side, and in particular to the market share analysis.

The first of these commits tackles the selection of data within MongoDB and reworks the MongoDB aggregation.

  • Data for the analysis is matched to a chosen date range.
  • A double grouping aggregation is employed to capture the number of distinct authors by application. Records are first grouped by application and author and then by application. This allows a count across distinct authors within each application.
  • The final aggregation now includes posts, author numbers and post payouts by application (payouts are held separately in SBD, Steem and Vests, and will require translation to STU).
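The double grouping can be written as a MongoDB aggregation pipeline using the standard `$match` and `$group` stages. In the Node.js driver, this pipeline would be passed to `collection.aggregate(pipeline)`. The sketch below is not the pipeline from the commit. All field names (`created`, `application`, `author`, `payout_sbd` and so on) are assumptions. It also includes an in-memory equivalent of the two `$group` stages so the logic can be checked without a database. The sample records in that part are made up.

```javascript
// Hypothetical field names throughout; the real Block.ops schema may differ.
function marketSharePipeline(startDate, endDate) {
  return [
    // 1. Keep only records created within the chosen date range.
    { $match: { created: { $gte: startDate, $lt: endDate } } },
    // 2. First grouping: one document per (application, author) pair.
    { $group: {
      _id: { application: "$application", author: "$author" },
      posts: { $sum: 1 },
      sbd: { $sum: "$payout_sbd" },
      steem: { $sum: "$payout_steem" },
      vests: { $sum: "$payout_vests" },
    } },
    // 3. Second grouping: one document per application. Each input document is
    //    a distinct author, so counting them gives the number of distinct authors.
    { $group: {
      _id: "$_id.application",
      authors: { $sum: 1 },
      posts: { $sum: "$posts" },
      sbd: { $sum: "$sbd" },
      steem: { $sum: "$steem" },
      vests: { $sum: "$vests" },
    } },
  ];
}

// In-memory equivalent of the two $group stages (SBD only), for checking the logic.
function groupByAppAndAuthor(records) {
  const byPair = new Map();
  for (const r of records) {
    const key = JSON.stringify([r.application, r.author]);
    const g = byPair.get(key) || { application: r.application, posts: 0, sbd: 0 };
    g.posts += 1;
    g.sbd += r.payout_sbd;
    byPair.set(key, g);
  }
  const byApp = new Map();
  for (const g of byPair.values()) {
    const a = byApp.get(g.application) || { application: g.application, authors: 0, posts: 0, sbd: 0 };
    a.authors += 1; // one (application, author) group = one distinct author
    a.posts += g.posts;
    a.sbd += g.sbd;
    byApp.set(g.application, a);
  }
  return [...byApp.values()];
}

const sample = [
  { application: "app-a", author: "alice", payout_sbd: 1 },
  { application: "app-a", author: "alice", payout_sbd: 2 },
  { application: "app-a", author: "bob", payout_sbd: 0.5 },
  { application: "app-b", author: "alice", payout_sbd: 3 },
];
console.log(groupByAppAndAuthor(sample));
```

Counting documents in the second stage only works because the first stage has already collapsed each author's posts into a single document per application.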

The code changes are here:
https://github.com/miniature-tiger/block.ops/commit/84110c840ba54bed2c8d86574308ccd0c547ae04

Postprocessing and CSV Exporting

I decided to move the post-processing of data into a separate file and process, distinct from the MongoDB data collation and aggregation.

The process for the market share analysis can now be illustrated as follows:

  • Data loading: Extract blocks from Steem AppBase for the required date period (on a loop) ---> Translate each operation in each block into a record / record update (on a loop) ---> Store each record in MongoDB
  • Analysis: Extract the required data from MongoDB, including aggregation as necessary ---> Process the data / results ---> Export the results

This commit adds:

  • Post-processing for the market share analysis, including data formatting, collation of an 'other' category, and ranking statistics.
  • A data export facility to convert the final results to CSV format (also included in the postprocessing.js file).
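A rough sketch of these post-processing and export steps is shown below. It is not the contents of postprocessing.js. The post does not say how the 'other' category is defined, so the "everything outside the top N by posts" rule is an assumption. The function names, column names and sample numbers are also hypothetical or made up. CSV fields are quoted following the common RFC 4180 convention: a field containing a comma, quote or line break is wrapped in double quotes, with inner quotes doubled.

```javascript
// Hypothetical post-processing: ranking, "other" collation and market share.
function postProcess(rows, topN) {
  // Rank applications by number of posts, largest first.
  const sorted = [...rows].sort((a, b) => b.posts - a.posts);
  const top = sorted.slice(0, topN);
  const rest = sorted.slice(topN);
  // Collapse everything outside the top N into a single "other" row.
  // Note: summing authors here double-counts anyone who used several small apps.
  if (rest.length > 0) {
    top.push({
      application: "other",
      posts: rest.reduce((sum, r) => sum + r.posts, 0),
      authors: rest.reduce((sum, r) => sum + r.authors, 0),
    });
  }
  const totalPosts = top.reduce((sum, r) => sum + r.posts, 0);
  return top.map((r, i) => ({
    rank: r.application === "other" ? "" : i + 1, // "other" is left unranked
    application: r.application,
    posts: r.posts,
    authors: r.authors,
    share: totalPosts === 0 ? 0 : r.posts / totalPosts,
  }));
}

// Quote a field only when it contains a comma, double quote or line break.
function csvField(value) {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Header row followed by one line per result row, in the given column order.
function toCsv(rows, columns) {
  const lines = [columns.join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => csvField(row[c])).join(","));
  }
  return lines.join("\n");
}

// Made-up input rows, in the shape the aggregation step might produce.
const results = postProcess([
  { application: "app-a", posts: 600, authors: 300 },
  { application: "app-b", posts: 250, authors: 120 },
  { application: "app-c", posts: 100, authors: 40 },
  { application: "app-d", posts: 50, authors: 30 },
], 2);
console.log(toCsv(results, ["rank", "application", "posts", "authors", "share"]));
```

Keeping the CSV writer separate from the ranking logic means the same results can later be exported in other formats without touching the analysis.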

So now the process can be run pretty much all the way through for an example analysis. These results are the market share statistics and rankings by application for all posts and comments made on 12 September 2018.

[Image: market share statistics and rankings by application, 12 September 2018]

Progress!

The code changes are here:
https://github.com/miniature-tiger/block.ops/commit/ba22ec4047c22b2353fc8316fff2b3da6753e634


GitHub Account

My account on github is here:
https://github.com/miniature-tiger


Thank you for your contribution. A few thoughts:

  • +1 for the commit messages, they are really descriptive.
  • I see a lot of console messages; you should try a different logging method if you intend to use this in production.

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, click here.


Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]

Thanks @codingdefined!

Do you know of any good resources, i.e. reading materials, on logging methods? Or do you have a preferred tool or approach that you can recommend? It's something I've not worked with at all so far.
