How We Estimate Curation Rewards — and Why We Just Switched Our Default Model

Estimating Curation Rewards on Steem: Weight vs. rshares — A Data-Driven Decision

For the Steem developer community. This is a research write-up: real measurements from VoteBroker's live curation data, the model question behind them, and the decision we made.


The question every curator's tool has to answer

When you upvote a post on Steem, you eventually earn a curation reward in SP. The interesting engineering problem is: can you estimate that reward before the payout lands?

A good estimate turns curation from guesswork into something you can reason about — which post, what weight, what timing. VoteBroker has shown this estimate to users for a while. But there has always been a quiet ambiguity underneath it: which share basis do you use?

When the post pays out, your slice of the curation pool is proportional to your contribution divided by the total contribution. The catch is that "contribution" can be read two ways:

estimate_weight   = pool * 0.20 * (my_weight  / sum_weight)  / sbd_per_steem
estimate_rshares  = pool * 0.20 * (my_rshares / sum_rshares) / sbd_per_steem

  • rshares is the raw influence your vote adds to the post's reward (it drives post payout size).
  • weight is the time-decayed curation weight the chain actually books for reward distribution — the value shaped by the reverse-auction penalty in the early window.
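
As a minimal sketch, the two formulas above differ only in which share they plug in, so they can be one function. The function name, the zero-guard, and all input numbers below are assumptions for illustration; they are not VoteBroker's code or real chain values.

```python
def estimate_curation(pool, my_share, total_share, sbd_per_steem,
                      curation_factor=0.20):
    """Shared formula from the text: pool * 0.20 * share / sbd_per_steem.

    my_share / total_share is either (my_weight / sum_weight) or
    (my_rshares / sum_rshares); everything else is identical.
    """
    # Guard (an assumption of this sketch): no votes or no usable price
    # means there is nothing sensible to estimate.
    if total_share <= 0 or sbd_per_steem <= 0:
        return 0.0
    return pool * curation_factor * (my_share / total_share) / sbd_per_steem


# Made-up inputs for a single vote.
pool = 10.0
sbd_per_steem = 0.25

estimate_weight = estimate_curation(
    pool, my_share=50.0, total_share=1000.0, sbd_per_steem=sbd_per_steem)
estimate_rshares = estimate_curation(
    pool, my_share=80.0, total_share=1000.0, sbd_per_steem=sbd_per_steem)

print(f"weight basis:  {estimate_weight:.4f}")
print(f"rshares basis: {estimate_rshares:.4f}")
```

With these invented inputs the same vote is estimated at 0.40 on the weight basis and 0.64 on the rshares basis, which is the kind of gap the rest of this post measures against real payouts.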

Mechanistically, weight is the basis the blockchain uses to split the curation pool. rshares is the more intuitive number and lines up neatly with external dashboards, so it's tempting to trust it. We didn't want to argue theory — we wanted the chain to tell us which one predicts reality.


The experiment: log both, compare against ground truth

For every live vote, VoteBroker records both estimates at cast time. When the post pays out, we read the actual curation_reward operation from the chain and store the realized SP next to the two predictions. No backfill, no simulation — same posts, same votes, ground truth from the blockchain.

That gives us a clean, per-vote error series. The current evidence base: n = 534 realized votes with both estimates and a confirmed payout.
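
One way to picture the per-vote log is a record that holds both predictions and, once the post has paid out, the realized reward. This is a hedged sketch: the `VoteRecord` type, its field names, and the helper functions are hypothetical and do not describe VoteBroker's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VoteRecord:
    """One cast vote with both estimates and, later, the realized reward."""
    author: str
    permlink: str
    est_weight: Optional[float]          # weight-based estimate at cast time
    est_rshares: Optional[float]         # rshares-based estimate at cast time
    realized_sp: Optional[float] = None  # filled in after the payout


def scorable(records: List[VoteRecord]) -> List[VoteRecord]:
    """Keep only votes with both estimates and a confirmed payout."""
    return [
        r for r in records
        if r.est_weight is not None
        and r.est_rshares is not None
        and r.realized_sp is not None
    ]


def error_series(records: List[VoteRecord]) -> Tuple[List[float], List[float]]:
    """Return (weight errors, rshares errors) as prediction minus truth."""
    rows = scorable(records)
    weight_errors = [r.est_weight - r.realized_sp for r in rows]
    rshares_errors = [r.est_rshares - r.realized_sp for r in rows]
    return weight_errors, rshares_errors
```

Filtering to rows where all three values exist keeps both models scored on exactly the same votes, which is what makes the comparison fair.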

We score the two models on three standard error metrics:

  • MAE — mean absolute error (the typical miss)
  • RMSE — root mean squared error (punishes large misses much harder)
  • MAPE — mean absolute percentage error (the typical miss, relative to the reward)
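
For reference, the three metrics can be computed from paired predictions and realized rewards in a few lines. This is a plain-Python sketch using the standard definitions; the function name is ours, and skipping zero-reward rows in MAPE is an assumption of the sketch, not necessarily what VoteBroker does.

```python
import math


def score(predicted, actual):
    """Return MAE, RMSE and MAPE (in percent) for paired values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")

    errors = [p - a for p, a in zip(predicted, actual)]
    n = len(errors)

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # MAPE divides by the actual value, so it is undefined when the
    # realized reward is zero. Assumption: skip those rows.
    ratios = [abs(e) / abs(a) for e, a in zip(errors, actual) if a != 0]
    mape = 100.0 * sum(ratios) / len(ratios) if ratios else float("nan")

    return {"mae": mae, "rmse": rmse, "mape": mape}
```

Calling `score(weight_estimates, realized)` and `score(rshares_estimates, realized)` on the same realized series gives one row per model, in the shape of the table in the next section.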

The results

Model      MAE ↓     RMSE ↓    MAPE ↓
weight     0.0074    0.0282    32.5 %
rshares    0.0115    0.0234    71.9 %

(n = 534; lower is better on every column.)

Read that carefully, because it's more interesting than a clean sweep:

  • weight wins MAE by ~36 % (0.0074 vs 0.0115). On the typical vote, weight is clearly closer to the truth.
  • weight wins MAPE decisively (32.5 % vs 71.9 %). Relative to the reward size, the rshares error is about 2.2× as large.
  • rshares wins RMSE (0.0234 vs 0.0282). This is the honest catch.

So the two models disagree, and the metric you pick decides the winner.


Why RMSE flips — and why we don't optimize for it here

RMSE squares the errors before averaging, so a handful of large misses dominate the score. That rshares wins RMSE while losing MAE points to a specific pattern: weight is closer on the everyday vote but takes a few larger hits on outlier posts (whale pile-ons, unusual vote distributions), where rshares happens to land closer.
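
A toy example shows how such a flip can happen. The error values below are synthetic, chosen only to reproduce the pattern; they are not VoteBroker data. One model is nearly exact on nine votes and badly off on one, the other is mediocre on all ten.

```python
import math


def mae(abs_errors):
    """Mean of the absolute errors."""
    return sum(abs_errors) / len(abs_errors)


def rmse(abs_errors):
    """Square root of the mean squared error."""
    return math.sqrt(sum(e * e for e in abs_errors) / len(abs_errors))


# Synthetic absolute errors for ten votes (illustration only).
mostly_tight = [0.001] * 9 + [0.05]  # nine small misses, one large outlier
evenly_off = [0.01] * 10             # the same moderate miss on every vote

print(f"MAE  tight={mae(mostly_tight):.4f}  even={mae(evenly_off):.4f}")
print(f"RMSE tight={rmse(mostly_tight):.4f}  even={rmse(evenly_off):.4f}")
```

The first model wins MAE (0.0059 vs 0.0100) but loses RMSE (about 0.0158 vs 0.0100), because the single 0.05 miss contributes almost all of its squared error.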

For our use case — estimating the curation yield of a normal curation decision — the right objective is typical-case accuracy, not worst-case-outlier accuracy. You're making hundreds of ordinary votes, not betting the account on the one viral post. MAE and MAPE describe that reality; RMSE over-weights the rare blow-ups.

Two of three metrics favor weight, the two that match our objective favor it strongly, and — crucially — weight is also the mechanism the chain actually uses to distribute curation rewards. The data didn't overturn the mechanism; it confirmed it. That alignment of theory and measurement is what made the call comfortable.


The decision

weight is now VoteBroker's default curation model. New users start on it automatically.

rshares stays available as a legacy / research option. We keep computing it on every vote and showing it side-by-side, because (a) it's the right number to compare against external dashboards, and (b) the RMSE result is a genuine signal that there's structure in the outliers worth understanding. As more outcomes accumulate, we'll keep re-scoring both — the comparison is wired into the dashboard, not a one-off.

We also held ourselves to a threshold before flipping the default: don't switch on a handful of payouts. 534 realized votes clear that bar; early on, with a few dozen samples, the metrics were too noisy to trust.


What this looks like in the product

  • The pending-curation view now marks which model is active, with both totals still visible.
  • The Research Lab shows the live MAE / RMSE / MAPE comparison so the decision stays auditable as data grows — including a "winner" only once the sample is large enough to mean something.
  • Switching a user to rshares is a setting, not a redeploy.
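
The "winner only once the sample is large enough" rule can be sketched as a small gate in front of the comparison. The function name and the `min_samples` default of 200 are placeholders chosen for this example; the post does not state the threshold VoteBroker uses. The scores are the ones from the results table.

```python
def declare_winner(scores, n, metric="mae", min_samples=200):
    """Return the model with the lowest value for `metric`, or None.

    min_samples=200 is a hypothetical threshold for illustration only.
    """
    if n < min_samples:
        return None  # too few settled votes to name a winner
    return min(scores, key=lambda model: scores[model][metric])


# Values copied from the results table (n = 534).
scores = {
    "weight": {"mae": 0.0074, "rmse": 0.0282, "mape": 32.5},
    "rshares": {"mae": 0.0115, "rmse": 0.0234, "mape": 71.9},
}

print(declare_winner(scores, n=534, metric="mae"))
print(declare_winner(scores, n=534, metric="rmse"))
print(declare_winner(scores, n=30, metric="mae"))
```

On the table's numbers this names weight for MAE and rshares for RMSE, and returns None for a 30-vote sample, so the metric stays an explicit input rather than a hidden default.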

Takeaways for fellow Steem devs

  1. Curation reward distribution is weight-based, not rshares-based. If you're estimating curation SP, start from my_weight / sum_weight.
  2. Pick your error metric on purpose. MAE/MAPE for typical-case tools; RMSE if outliers are the thing you actually care about. They can name different winners on the same data.
  3. Log predictions against on-chain ground truth. The chain is the only judge that matters. Side-by-side logging turned a theoretical debate into a one-line decision.

💬 Questions for the community:

  • How do you estimate curation rewards in your tooling — weight, rshares, or something else?
  • Anyone modeling the early reverse-auction window explicitly? We suspect that's where the RMSE outliers live.
  • Curious about the per-bucket timing data behind these votes? Say so below and we'll do a follow-up.

A technical write-up from VoteBroker development. Metrics are from our live curation data and reflect the current sample (n = 534); they will move as more votes settle.