Steem transaction signing in a nutshell

This article is for developers who are trying to implement transaction signing for the Steem (or BitShares) blockchain in their favorite language. It gives a brief introduction to what transactions look like, how they are constructed and, most importantly, how they are signed so that they can be included in a block.

I am writing this article because it took me almost 2 years to have it fully implemented in python-steem and python-graphene. If it wasn't for those 2 years, we wouldn't be able to use piston today.
Further, I hope that this tutorial allows people to quickly get a picture of what is going on so that they can dig deeper.

What's transaction signing?

Transaction signing, in general, starts with an intention of the user to do something. This intention affects his account and thus needs the authorization of that account by means of a valid (in our case cryptographic) signature. In contrast to physical paper signatures, this digital signature needs to be tied to the actual intention (the transaction), otherwise it could be copied over to other transactions and used multiple times. For that reason, every transaction is signed independently, and a signature is only valid for that particular transaction.

This tutorial shows how the signature is derived given a particular intention/transaction.

Let's get started

In our case, we start off with a simple intention, and that is:

Upvote the blog post @xeroc/piston

Operation

Intentions on Steem (and other Graphene-based blockchains) are called operations. In our case, the operation has the form:

['vote',
   {'author': 'xeroc',
    'permlink': 'piston',
    'voter': 'xeroc',
    'weight': 10000}]

We can clearly identify

  • the type of the operation (vote)
  • the author and permlink that identify the post (xeroc, piston)
  • the voter (also xeroc as I vote for my own post)
  • the weight of the vote (10000 which corresponds to 100%)

Transaction

In the next step, we encapsulate this (and possibly other) operations into a transaction. The purpose of this step is to allow multiple actions to be performed consecutively by appending multiple operations, to attach the required signatures, and to add an expiration and the TaPOS parameters (see below). In our case (one vote operation), it takes the following form:

tx = {'ref_block_num': 36029,
      'ref_block_prefix': 1164960351,
      'expiration': '2016-08-08T12:24:17',
      'operations': [['vote',
                      {'author': 'xeroc',
                       'permlink': 'piston',
                       'voter': 'xeroc',
                       'weight': 10000}]],
      'extensions': [],
      'signatures': [],
      }

We notice that our operation is now part of the transaction (as part of the operations array) and that we now have a field for our signatures and an expiration. The expiration allows transactions to expire if they are not included in a block by that time. Usually that date is about 30 seconds in the future.
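A minimal sketch of how such an expiration string could be produced, assuming the 30-second window mentioned above. The helper name make_expiration is made up for this illustration and is not part of python-steem.

```python
from datetime import datetime, timedelta, timezone


def make_expiration(now=None, seconds=30):
    """Return an expiration string in the format used by the transaction.

    Hypothetical helper: 'now' defaults to the local clock in UTC.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now + timedelta(seconds=seconds)).strftime("%Y-%m-%dT%H:%M:%S")


# Example with a fixed point in time, so the result is reproducible
print(make_expiration(datetime(2016, 8, 8, 12, 23, 47)))  # 2016-08-08T12:24:17
```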

Let's discuss the ref_block_* parameters a little: The ref_block_num refers to a particular block in the past by storing the last two bytes (the lower 16 bits) of its block number. The ref_block_prefix, on the other hand, is obtained from the block id of that particular reference block. It is one unsigned integer (4 bytes) of the block id, read not from the first position but at an offset of 4 bytes. This would be the corresponding Python code to obtain the current head block and calculate those two parameters:

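import struct
from binascii import unhexlify
# noderpc is assumed to be an open RPC connection to a Steem API node
dynBCParams = noderpc.get_dynamic_global_properties()
ref_block_num = dynBCParams["head_block_number"] & 0xFFFF
ref_block_prefix = struct.unpack_from("<I", unhexlify(dynBCParams["head_block_id"]), 4)[0]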

The purpose of these two parameters is to prevent replay attacks in the case of a fork. Once two chains have forked, the two parameters identify two different blocks. A transaction signed for one chain will therefore be rejected on the other, because the block it references does not exist there.
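To make the byte arithmetic concrete without an RPC connection, here is a small sketch that applies the same two lines to a hand-written input. The block number and block id below are made up for illustration (chosen so that the result matches the example transaction); they are not taken from the real chain.

```python
import struct
from binascii import unhexlify

# Hypothetical head block data, shaped like the node's reply
props = {
    "head_block_number": 3837117,
    "head_block_id": "003a8cbd5fe26f450000000000000000deadbeef",
}

# Lower 16 bits of the block number
ref_block_num = props["head_block_number"] & 0xFFFF

# Little-endian uint32 read at byte offset 4 of the block id
ref_block_prefix = struct.unpack_from(
    "<I", unhexlify(props["head_block_id"]), 4)[0]

print(ref_block_num, ref_block_prefix)  # 36029 1164960351
```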

Serialization

Before we can start signing, we first need to define what is actually signed. Since JSON objects don't have a particular order and are unnecessarily big in size, we first perform what is called a serialization. Technically speaking, if you serialize a (JSON) object, you end up with a simple binary vector (read: string) that contains the very same information but has a very strict structure as to where things need to be placed and how. It is essentially just a different representation of the content that makes signing easier and makes the signature unique and matching, independent of how the JSON keys are sorted.
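A small sketch of why hashing the JSON text directly would not work: two objects with identical content but different key order produce different text, and therefore different hashes. The two dictionaries are just the vote data from above written in two different orders.

```python
import hashlib
import json

a = {"voter": "xeroc", "author": "xeroc", "permlink": "piston", "weight": 10000}
b = {"weight": 10000, "permlink": "piston", "author": "xeroc", "voter": "xeroc"}

# Same content as far as JSON is concerned ...
same_content = (a == b)

# ... but different text, and therefore different hashes
hash_a = hashlib.sha256(json.dumps(a).encode("utf-8")).hexdigest()
hash_b = hashlib.sha256(json.dumps(b).encode("utf-8")).hexdigest()

print(same_content, hash_a == hash_b)  # True False
```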

The nice thing about signed transactions in Steem is that the API nodes accept them in their JSON format, so that we can read what they do in plain text. That said, a valid signed transaction takes the form:

{'expiration': '2016-08-09T10:06:15',
 'extensions': [],
 'operations': [['vote',
                 {'author': 'piston',
                  'permlink': 'xeroc',
                  'voter': 'xeroc',
                  'weight': 10000}]],
 'ref_block_num': 61828,
 'ref_block_prefix': 3564338418,
 'signatures': ['201b35c7f94d2ae56d940863a8db37edff78e3b8f4935b6c6fc131a04b92d0f9596c368128ab298c446143915e35996a9644314fff88b6a6438946403ec7249a24']}

Note, however, that the signatures have been derived from the binary, serialized representation of the transaction, not from this JSON form!

We start by giving our operations numbers instead of calling them vote, comment, transfer, ...

# Operation types
operations = {}
operations["vote"] = 0
# ....

These numbers are defined in the steemd source code or in python-steem.

Now let's get started by serializing the first two entries as per the definition of a transaction:

buf = b""

# ref_block_num
buf += struct.pack("<H", tx["ref_block_num"])

# ref_block_prefix
buf += struct.pack("<I", tx["ref_block_prefix"])

In this case, < denotes that we are storing integers in their little-endian form. H is an unsigned integer of length 2 (uint16) while I is an unsigned integer of length 4 (uint32).
We simply append them to our newly created serialization buffer buf.
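As a quick check of the byte order, the two values from the example transaction can be packed on their own. This is only a sketch for inspecting the bytes; the variable name demo is arbitrary.

```python
import struct
from binascii import hexlify

demo = b""
demo += struct.pack("<H", 36029)       # ref_block_num as uint16
demo += struct.pack("<I", 1164960351)  # ref_block_prefix as uint32

print(hexlify(demo).decode("ascii"))   # bd8c5fe26f45

# For comparison: big-endian packing would give 8cbd instead of bd8c
print(hexlify(struct.pack(">H", 36029)).decode("ascii"))  # 8cbd
```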

Next in our list of elements for the serialization is the expiration time. The difficulty here is that the time is represented differently in the JSON object above and in the serialized form used for signing. That's why we first convert it into a Unix timestamp, which we can then represent as an unsigned integer (uint32).

import time
from calendar import timegm

# expiration
timeformat = '%Y-%m-%dT%H:%M:%S%Z'
buf += struct.pack("<I", timegm(time.strptime((tx["expiration"] + "UTC"), timeformat)))
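Worked through for the expiration of the example transaction, the conversion looks like this. It is a standalone sketch that repeats the same calls on a fixed input so the intermediate timestamp is visible.

```python
import struct
import time
from binascii import hexlify
from calendar import timegm

expiration = "2016-08-08T12:24:17"
timeformat = "%Y-%m-%dT%H:%M:%S%Z"

# Seconds since 1970-01-01 00:00:00 UTC
timestamp = timegm(time.strptime(expiration + "UTC", timeformat))
print(timestamp)  # 1470659057

# The same value as a little-endian uint32
print(hexlify(struct.pack("<I", timestamp)).decode("ascii"))  # f179a857
```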

Next, we add the operations, one at a time. For the deserializer to know how many will follow, we first add the number of operations as a varint-encoded integer:

buf += bytes(varint(len(tx["operations"])))
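The varint helper is used here without being shown. A minimal sketch, assuming the usual base-128 encoding (7 payload bits per byte, least significant group first, high bit set on every byte except the last); the version in python-graphene may differ in detail.

```python
def varint(n):
    """Encode a non-negative integer as a base-128 varint (sketch)."""
    data = b""
    while n >= 0x80:
        # Emit the low 7 bits and flag that more bytes follow
        data += bytes([(n & 0x7F) | 0x80])
        n >>= 7
    data += bytes([n])
    return data


print(varint(1).hex())    # 01
print(varint(300).hex())  # ac02
```

Small values such as the number of operations or a short string length therefore take exactly one byte.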

We now go through all of our operations (we just have the vote operation in our example), and the first thing we add to our serialization buffer is an id that identifies our operation. We have collected all of them above such that operations["vote"]=0. This id is also varint encoded.

for op in tx["operations"]:

    # op[0] == "vote"
    buf += varint(operations[op[0]])

All of the above is basically the same independent of the operations themselves. Now we need to distinguish how they are encoded/serialized, and for the sake of simplicity, I focus on the vote operation only.
It contains (as operation data) the following elements in that order:

  1. voter,
  2. author,
  3. permlink, and
  4. weight.

The first three are all represented as length-prefixed strings. That means:

  1. add the length of the string encoded as varint to the buffer, then
  2. add the string itself to the buffer

This we can do for the voter, the author and the permlink. The weight is a short signed integer of length 2 (int16). It is signed rather than unsigned because we need to distinguish between upvotes and downvotes. Also note that 100% is represented as +10000 and -100% as -10000, and that we only use integers and no floats (the back-end doesn't know floats at all!).

    if op[0] == "vote":
        opdata = op[1]
        buf += (varint(len(opdata["voter"])) +
                bytes(opdata["voter"], "utf-8"))
        buf += (varint(len(opdata["author"])) +
                bytes(opdata["author"], "utf-8"))
        buf += (varint(len(opdata["permlink"])) +
                bytes(opdata["permlink"], "utf-8"))
        buf += struct.pack("<h", int(opdata["weight"]))
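The two building blocks used for the vote data can be tried out in isolation. The sketch below uses a hypothetical helper pack_string and a simple base-128 varint as an assumption about the encoding; it takes the length of the encoded bytes, which for plain ASCII names and permlinks is the same as the string length used above.

```python
import struct


def varint(n):
    """Base-128 varint encoding (sketch)."""
    data = b""
    while n >= 0x80:
        data += bytes([(n & 0x7F) | 0x80])
        n >>= 7
    return data + bytes([n])


def pack_string(s):
    """Hypothetical helper: varint length prefix followed by the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return varint(len(raw)) + raw


print(pack_string("xeroc").hex())       # 057865726f63
print(pack_string("piston").hex())      # 06706973746f6e

# Signed 16-bit weight: upvote and downvote at full strength
print(struct.pack("<h", 10000).hex())   # 1027
print(struct.pack("<h", -10000).hex())  # f0d8
```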

Serialization result

So much for adding operations to our serialization. Let's take a look at the binary form:

First we serialize the TaPOS parameter ref_block_num (36029)

bd8c..............................................................

and ref_block_prefix (1164960351) and obtain

....5fe26f45......................................................

Then we add our expiration 2016-08-08T12:24:17

............f179a857..............................................

Afterwards, we need to append the number of operations (01)

....................01............................................

And go through our operation:

the operation id (0)

......................00..........................................

the voter (xeroc, length 5)

........................057865726f63..............................

the author (xeroc, length 5)

....................................057865726f63..................

the permlink (piston, length 6)

................................................06706973746f6e....

and the weight of the vote (+10000)

..............................................................1027

The end result for our serialized transaction looks like this:

bd8c5fe26f45f179a8570100057865726f63057865726f6306706973746f6e1027
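Putting the steps together, the following sketch reproduces exactly the hex string above for the example transaction. The function and helper names are made up for this illustration, and only the vote operation is handled. Like the walkthrough so far, it stops after the operations and does not serialize the extensions field; see Correction 1 further down for the extra 0x00 byte that belongs at the end.

```python
import struct
import time
from binascii import hexlify
from calendar import timegm

OPERATION_IDS = {"vote": 0}


def varint(n):
    """Base-128 varint encoding (sketch)."""
    data = b""
    while n >= 0x80:
        data += bytes([(n & 0x7F) | 0x80])
        n >>= 7
    return data + bytes([n])


def pack_string(s):
    """Varint length prefix followed by the UTF-8 bytes."""
    raw = s.encode("utf-8")
    return varint(len(raw)) + raw


def serialize_vote_tx(tx):
    """Serialize TaPOS fields, expiration and vote operations (no extensions)."""
    buf = struct.pack("<H", tx["ref_block_num"])
    buf += struct.pack("<I", tx["ref_block_prefix"])
    expiration = timegm(
        time.strptime(tx["expiration"] + "UTC", "%Y-%m-%dT%H:%M:%S%Z"))
    buf += struct.pack("<I", expiration)
    buf += varint(len(tx["operations"]))
    for name, data in tx["operations"]:
        if name != "vote":
            raise NotImplementedError(name)
        buf += varint(OPERATION_IDS[name])
        buf += pack_string(data["voter"])
        buf += pack_string(data["author"])
        buf += pack_string(data["permlink"])
        buf += struct.pack("<h", int(data["weight"]))
    return buf


tx = {"ref_block_num": 36029,
      "ref_block_prefix": 1164960351,
      "expiration": "2016-08-08T12:24:17",
      "operations": [["vote", {"author": "xeroc", "permlink": "piston",
                               "voter": "xeroc", "weight": 10000}]],
      "extensions": [],
      "signatures": []}

print(hexlify(serialize_vote_tx(tx)).decode("ascii"))
# bd8c5fe26f45f179a8570100057865726f63057865726f6306706973746f6e1027
```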

ECC Signing

Now comes the hard part: the actual signing of the serialized transaction. Going forward, we need:

  • The serialized buffer
  • The chain id (to distinguish different instances of the graphene chain and also prevent replay attacks)
  • The private keys you want to use for signing (whether those keys are sufficient to actually authorize the transaction is left to the developer to ensure)

In the case of STEEM, the chain id is a 256-bit sequence of zeros (the one and only :P). We take this chain id in binary form and append the serialized buffer to obtain the message. Instead of signing the actual message, we sign its SHA256 hash. This hash of the message is called the digest.

import hashlib
from binascii import hexlify, unhexlify

# Signing
chainid = "0" * (256 // 4)  # 64 hex digits = 32 zero bytes
message = unhexlify(chainid) + buf
digest = hashlib.sha256(message).digest()
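For a feeling of the sizes involved, here is the same computation as a standalone sketch. The buffer is the hex string from the serialization section taken verbatim, so it does not yet contain the extensions byte discussed in Correction 1 below; the resulting digest is therefore only illustrative and is not the one behind the signature shown earlier.

```python
import hashlib
from binascii import unhexlify

# Serialized example transaction as printed in the previous section
serialized = unhexlify(
    "bd8c5fe26f45f179a8570100057865726f63057865726f6306706973746f6e1027")

chain_id = bytes(32)  # 256 zero bits
message = chain_id + serialized
digest = hashlib.sha256(message).digest()

print(len(message), len(digest))  # 65 32
```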

Now we take all private keys and sign our transaction with each of them. Each private key results in a single signature that needs to be added to the signatures key of the original transaction. In our case, we just work with a single private key, represented in WIF. We obtain the actual binary private key from the WIF (for the sake of simplicity, I use the PrivateKey class from steembase.account):

import ecdsa
from steembase.account import PrivateKey

wifs = ["5JLw5dgQAx6rhZEgNN5C2ds1V47RweGshynFSWFbaMohsYsBvE8"]
sigs = []
for wif in wifs:
    p = bytes(PrivateKey(wif))  # binary representation of private key

Fortunately for us, we don't need to do all the signing by hand, but can use the ecdsa Python package. We obtain a SigningKey by loading our private key properly:

    sk = ecdsa.SigningKey.from_string(p, curve=ecdsa.SECP256k1)

Now we implement a loop, which is crucial because the back-end only accepts canonical signatures and we have no way of knowing in advance whether the signature that is going to be produced will be canonical. We therefore derive a deterministic k parameter for the ECDSA signing, and while generating this parameter we add our loop counter to the digest before hashing. This results in a new deterministic k each round, which in turn results in either a canonical or a non-canonical signature.

    cnt = 0
    i = 0
    while True:
        cnt += 1
        # Deterministic k
        #
        k = ecdsa.rfc6979.generate_k(
            sk.curve.generator.order(),
            sk.privkey.secret_multiplier,
            hashlib.sha256,
            hashlib.sha256(digest + bytes([cnt])).digest())

The signature is generated by using the proper ECDSA signing call for digests:

        # Sign message
        #
        sigder = sk.sign_digest(
            digest,
            sigencode=ecdsa.util.sigencode_der,
            k=k)

Now we represent the signature in its r and s values and verify that it is canonical. If it is, then we break the loop and continue:

        # Reformatting of signature
        #
        r, s = ecdsa.util.sigdecode_der(sigder, sk.curve.generator.order())
        signature = ecdsa.util.sigencode_string(r, s, sk.curve.generator.order())

        # Make sure signature is canonical!
        # (compare with ==, not "is": identity of integers is not guaranteed)
        lenR = sigder[3]
        lenS = sigder[5 + lenR]
        if lenR == 32 and lenS == 32:
            # ........

Once we have ensured that the signature is canonical, we derive the so-called recovery parameter. It simplifies the verification of the signature as it links the signature to a single unique public key. Without this parameter, the back-end would need to test multiple public keys instead of just one. So we derive this parameter, add 4 and 27 to stay compatible with other protocols, and have now obtained our signature.

            # Derive the recovery parameter
            #
            i = recoverPubkeyParameter(digest, signature, sk.get_verifying_key())
            i += 4   # compressed
            i += 27  # compact
            break
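The canonical test in the loop looks at the two length bytes of the DER encoding. The same condition can be expressed on the raw 64-byte r||s string, which may be handy when no DER form is at hand. This is a sketch under that assumption; the function name is_canonical is chosen here for illustration, and the function only mirrors the length test from this article.

```python
def is_canonical(signature):
    """Return True if r and s would each take exactly 32 bytes in DER.

    'signature' is the 64-byte string r || s (32 bytes each).
    """
    if len(signature) != 64:
        raise ValueError("expected 64 bytes (r followed by s)")

    def der_length(value):
        # DER drops leading zero bytes and prepends 0x00 if the top bit is set
        stripped = value.lstrip(b"\x00")
        if not stripped:
            return 1
        return len(stripped) + (1 if stripped[0] & 0x80 else 0)

    r, s = signature[:32], signature[32:]
    return der_length(r) == 32 and der_length(s) == 32


ok = bytes([0x01]) + bytes(31)         # top bit clear, no leading zero byte
too_long = bytes([0x80]) + bytes(31)   # top bit set, DER needs 33 bytes
print(is_canonical(ok + ok))           # True
print(is_canonical(too_long + ok))     # False
```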

Having derived a valid canonical signature, we format it in its hexadecimal representation and add it to our transaction's signatures. Note that we do not only add the signature, but also the recovery parameter. This kind of signature is then called a compact signature.

    tx["signatures"].append(
        hexlify(
            struct.pack("<B", i) +
            signature
        ).decode("ascii")
    )
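To see this layout in an actual signature, the sketch below takes the signature string from the signed transaction shown in the serialization section apart again: one header byte, followed by 32 bytes for r and 32 bytes for s. Subtracting 27 and 4 from the header byte gives back the recovery parameter.

```python
from binascii import unhexlify

# Signature copied from the signed example transaction above
sig_hex = ("201b35c7f94d2ae56d940863a8db37edff78e3b8f4935b6c6fc131a04b92d0f9"
           "596c368128ab298c446143915e35996a9644314fff88b6a6438946403ec7249a24")

raw = unhexlify(sig_hex)
header, r, s = raw[0], raw[1:33], raw[33:65]

# Undo the two offsets added during signing
recovery_id = header - 27 - 4

print(len(raw), header, recovery_id)  # 65 32 1
```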

Done.

Once you have derived your new tx including the signatures, you could use python-steem to verify your transaction and its signature by doing something like this:

from steembase import transactions
tx2 = transactions.Signed_Transaction(**tx)
tx2.deriveDigest("STEEM")
pubkeys = [PrivateKey(p).pubkey for p in wifs]
tx2.verify(pubkeys, "STEEM")

If it doesn't throw an exception, you did everything right!

The full source code that we have developed in this example can be found on GitHub.

Have fun coding.

Correction 1

John White has discovered a flaw in the description above.
This flaw is caused by the extensions attribute:

tx = {'ref_block_num': 36029,
      'ref_block_prefix': 1164960351,
      'expiration': '2016-08-08T12:24:17',
      'operations': [['vote',
                      {'author': 'xeroc',
                       'permlink': 'piston',
                       'voter': 'xeroc',
                       'weight': 10000}]],
      'extensions': [],
      'signatures': [],
      }

In the OP, we did not serialize the extensions, so our
serialized sequence was too short. Since we do not make use of
extensions here, and the serialization is a length-prefixed array,
we can fix the flaw by simply appending 0x00 to the serialization:

    buf += varint(len([]))

Everything else should stay the same.

That was really helpful, thanks :)

I upvoted although I don't have a fucking clue what you are talking about....

Oh and I need a developer over at Steem.Gifts if you or anyone else is interested! - Sorry for the promo but I desperately need help building this site in time for Steemit's 1st Annual Secret Santa Gift Exchange

There is a small typo at the 2nd to last word of the article. :)

fixed

Thanks for reading all the way through :D

"fun" - LOL I see what you did there!

Thanks so much for writing this @xeroc. Definitely one to bookmark!

I have a bookmark folder for Steem -> API.

I think, so far, it's filled with every new post @xeroc does. I should just link to your blog instead. Thank you for providing so many resources for this community. There are now over 50 tools for Steemit because of your efforts.

Bookmarked! Thank you, @xeroc!

This is helpful, I thought about writing this, I guess you beat me to it :) great work...

very good comentar you

Thanks for writing this @xeroc, big ups.