Towards a reputation system suitable for SteemSTEM

lemouth (72)in #utopian-io • 7 years ago

Although several reputation systems for Steem have been proposed (some of them very recently), I believe that none of them is suitable as a reputation metric for our upcoming SteemSTEM app (to be officially released very soon).

[Image credits: @hightouch]

Therefore, I decided to build one myself. Here I share the code and its ingredients, and would love to read comments or suggestions for potential improvements. The code is open (available from this GitHub repository) and can be used by anyone freely.

What will SteemSTEM do with such reputation indicators for now? Well, I don’t know (yet!), but it was fun to develop ;)

A good reputation metric for the SteemSTEM community must, in my opinion, include two ingredients: an authorship component and an engagement component, contributing in equal parts. Indeed, we absolutely need both authors who provide amazing contributions and readers who question authors and sustain worthwhile discussions. I suspect that holds true for any community, by the way.

AUTHORSHIP INDICATOR

The authorship metric is built from a few key principles.

Each SteemSTEM vote on a post at x% gives x reputation points at the time of the vote. The SteemSTEM curation team scours Steem to find the best STEM content. As all these blogs contributed to what SteemSTEM has become today, it makes sense they all enter any given reputation indicator.

It would be weird to assign a large reputation score to someone who contributed a lot two years ago but then left Steem. However, the reputation of that person should still be non-negligible. After all, this person has left a trace in our memories. For this reason, I decided to introduce reputation points whose value varies with time. After a given time (the authorship point half-life), one point loses half its value. After twice that time, the remaining 0.5 point is worth only 0.25 point, and so on. A simple exponential decay.
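As a minimal sketch of this decay rule (the function name and its arguments are illustrative and not taken from the actual code), the value today of points earned some time ago could be computed as:

```python
def decayed_value(points, age_seconds, half_life_seconds):
    """Value today of `points` earned `age_seconds` ago.

    Assumption: a plain exponential decay, where the value is halved
    every `half_life_seconds`.
    """
    return points * 0.5 ** (age_seconds / half_life_seconds)


# A 100% vote gives 100 points at the time of the vote.
half_life = 10.0  # arbitrary time unit, for illustration only
print(decayed_value(100, 0.0, half_life))        # 100.0 (just earned)
print(decayed_value(100, half_life, half_life))  # 50.0 (one half-life later)
print(decayed_value(100, 2 * half_life, half_life))  # 25.0 (two half-lives later)
```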

As SteemSTEM strives to push for quality as much as possible, we prefer someone writing one excellent post a week (thus supported very strongly) over someone writing 5 good posts a week (thus supported five times moderately). For this reason, the reputation score today (i.e. accounting for the fact that each gained point has lost value with time) is divided by the square root of the number of posts. The square root tames the effect when a very large number of posts is reached.
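The following sketch illustrates the square-root division. It assumes that the "number of posts" is the number of SteemSTEM-supported posts of the author (one vote per post); the function name and the input format are hypothetical.

```python
import math


def authorship_raw_score(votes, now, half_life):
    """Un-normalized authorship score of a single author.

    `votes` is a list of (vote_time, vote_percent) pairs, one per
    supported post (assumption). Each vote gives `vote_percent` points
    that decay exponentially; the sum is then divided by the square
    root of the number of posts.
    """
    if not votes:
        return 0.0
    total = sum(pct * 0.5 ** ((now - t) / half_life) for t, pct in votes)
    return total / math.sqrt(len(votes))


# One post voted at 100% versus five posts voted at 40% each, all
# voted right now (no decay yet).
one_excellent = authorship_raw_score([(0.0, 100)], now=0.0, half_life=1.0)
five_good = authorship_raw_score([(0.0, 40)] * 5, now=0.0, half_life=1.0)
print(one_excellent, five_good)
```

In this example the five posts collect 200 points in total but end up with a lower score than the single strongly supported post, as the total is divided by the square root of 5.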

Finally, we may want to remove individuals from the algorithm, like the team, blacklisted people, bots, etc. Moreover, the total amount of reputation points is fixed to a given value so that each score is renormalized at the end of the day.
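The exclusion and renormalization steps could look like the sketch below. It assumes that excluded accounts are dropped before the normalization, so that the remaining scores sum up to the fixed total; the function and argument names are illustrative only.

```python
def normalize_scores(raw_scores, excluded, total_points):
    """Drop excluded accounts and rescale the remaining scores.

    `raw_scores` maps account names to un-normalized scores. After the
    rescaling, the kept scores sum up to `total_points`.
    """
    kept = {name: s for name, s in raw_scores.items() if name not in excluded}
    total = sum(kept.values())
    if total == 0:
        # Nothing to share out: avoid dividing by zero.
        return {name: 0.0 for name in kept}
    return {name: total_points * s / total for name, s in kept.items()}


# Hypothetical raw scores, with one bot to be removed.
scores = normalize_scores(
    {"alice": 3.0, "bob": 1.0, "somebot": 10.0},
    excluded={"somebot"},
    total_points=1000,
)
print(scores)
```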

RESULTS FROM THE STEEMSTEM AUTHORSHIP INDICATOR

I adopted an authorship half-life of 3.5 months and excluded all team members (management and curators), bots and blacklisted authors from the run. The total number of available authorship reputation points is normalized to 1000.

The top 30 most reputed SteemSTEM authors of all time, out of 2662 authors, are (with their score):

  1 abigail-dantes            6.305
  2 chloroform                6.244
  3 egotheist                 5.918
  4 scienceblocks             5.884
  5 steemit-italia            5.691
  6 lordneroo                 5.632
  7 zen-art                   5.216
  8 nonzerosum                5.089
  9 highonthehog              4.931
 10 conficker                 4.902
 11 nikolanikola              4.893
 12 effofex                   4.808
 13 anaestrada12              4.804
 14 tomastonyperez            4.555
 15 deathbatter               4.466
 16 samminator                4.456
 17 hidden84                  4.385
 18 agmoore                   4.361
 19 romulexx                  4.357
 20 jfermin70                 4.266
 21 answerswithjoe            4.159
 22 dysfunctional             4.093
 23 elvigia                   4.047
 24 n4zrizulkafli             4.021
 25 alexander.alexis          4.017
 26 dedicatedguy              3.993
 27 lupafilotaxia             3.958
 28 anasav                    3.909
 29 scienceangel              3.856
 30 irelandscape              3.809

The code has been run on Sep 24th at 10:11:25 AM.

ENGAGEMENT INDICATOR

Here, I track every single comment to any SteemSTEM-supported post and give reputation points to the comment author.

First, if the comment length is smaller than N characters, it is considered spammy and no point is given. Moreover, if the comment has been posted more than W weeks after the SteemSTEM vote, no point is given. I want meaningful comments that help illustrate that supported posts are interesting during the time in which they are hot or trending (on the #steemstem tag).

If nonzero, the score is given by the square root of the comment length. Once again, the square root makes a large difference between smallish and average comments, but tames the difference once a given length is crossed. This is the only way I have found so far to deduce the score, and I am only partially satisfied with it. But at least, it provides some level of quantification of the engagement of the readers.
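The two cuts and the square-root score can be sketched as follows. The names are hypothetical, and the comment length is assumed here to be the plain character count of the comment body.

```python
import math


def comment_points(body, comment_time, vote_time, min_length, max_delay):
    """Engagement points earned by a single comment, before any decay.

    `min_length` plays the role of N (in characters) and `max_delay`
    the role of W, expressed in the same time unit as the timestamps.
    """
    if len(body) < min_length:
        return 0.0  # too short: considered spammy
    if comment_time - vote_time > max_delay:
        return 0.0  # posted too long after the SteemSTEM vote
    return math.sqrt(len(body))


# With N = 100: a 100-character comment earns 10 points and a
# 400-character one only 20, so a 4 times longer comment is worth twice as much.
print(comment_points("x" * 100, 0, 0, min_length=100, max_delay=14))
print(comment_points("x" * 400, 0, 0, min_length=100, max_delay=14))
```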

As with the authorship indicator, any earned engagement point loses value with time, the score today is divided by the square root of the number of comments and some individuals can be removed from the algorithm.

The final score is normalized as in the authorship case, the total number of available points being fixed to a given value (taken to be the same, as engagement and authorship are considered equally important).

RESULTS FROM THE STEEMSTEM ENGAGEMENT INDICATOR

I adopted an engagement half-life of 1.75 months, and excluded comments whose length is smaller than 100 characters (N=100). I fixed W to 2 weeks. I excluded all team members (management and curators), bots and blacklisted authors from the run. The total number of available engagement points is 1000.

The top 30 most engaging SteemSTEM comment authors of all time, out of 23134 comment authors, are (with their score):

  1 erh.germany               2.886
  2 agmoore                   2.764
  3 steemit-italia            2.623
  4 amestyj                   2.621
  5 abigail-dantes            2.357
  6 scienceblocks             2.175
  7 fran.frey                 2.149
  8 insight-out               2.114
  9 rudyardcatling            2.096
 10 lupafilotaxia             2.079
 11 dedicatedguy              2.031
 12 samminator                1.979
 13 tsoldovieri               1.950
 14 alexander.alexis          1.921
 15 cyprianj                  1.853
 16 herbayomi                 1.847
 17 tomastonyperez            1.833
 18 jamalgayoni               1.756
 19 steepup                   1.726
 20 alexdory                  1.682
 21 kimberlylane              1.678
 22 synick                    1.665
 23 olamseu                   1.656
 24 emperorhassy              1.628
 25 lucylin                   1.625
 26 osariemen                 1.611
 27 ied                       1.576
 28 egotheist                 1.575
 29 delpilar                  1.526
 30 chireerocks               1.487

The code has been run on Sep 24th at 10:11:25 AM.

FINAL REPUTATION INDICATOR

The final reputation is given by the average of the two above metrics. The top 25 (with the score) is given by

  1 abigail-dantes            4.331
  2 steemit-italia            4.157
  3 scienceblocks             4.029
  4 egotheist                 3.746
  5 chloroform                3.579
  6 agmoore                   3.563
  7 lordneroo                 3.489
  8 nonzerosum                3.247
  9 erh.germany               3.245
 10 samminator                3.217
 11 tomastonyperez            3.194
 12 conficker                 3.151
 13 effofex                   3.115
 14 lupafilotaxia             3.019
 15 dedicatedguy              3.012
 16 alexander.alexis          2.969
 17 anaestrada12              2.959
 18 nikolanikola              2.911
 19 cyprianj                  2.795
 20 tsoldovieri               2.734
 21 zen-art                   2.723
 22 jfermin70                 2.605
 23 alexdory                  2.598
 24 highonthehog              2.576
 25 amestyj                   2.572

The code has been run on Sep 24th at 10:11:25 AM.
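The averaging can be cross-checked against the numbers listed above with a short sketch. How an account missing from one of the two indicators is treated is not stated in this post; counting it as 0 there is an assumption made for the illustration, and the function name is hypothetical.

```python
def combine(authorship, engagement):
    """Average the authorship and engagement scores of each account.

    Assumption: an account absent from one indicator counts as 0 there.
    """
    names = set(authorship) | set(engagement)
    return {
        name: 0.5 * (authorship.get(name, 0.0) + engagement.get(name, 0.0))
        for name in names
    }


# Scores taken from the two top-30 lists above.
final = combine(
    {"abigail-dantes": 6.305, "steemit-italia": 5.691},
    {"abigail-dantes": 2.357, "steemit-italia": 2.623},
)
print(round(final["abigail-dantes"], 3), round(final["steemit-italia"], 3))
```

Rounded to three decimals, this gives back the 4.331 and 4.157 of the final ranking.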

MORE ABOUT THE CODE

The code can be obtained from the following GitHub repository. It is programmed in Python 3 and requires steem-python.

I am not happy with the way the engagement indicator is computed, because I need to get the information on each post separately, which takes an enormous amount of time. For this reason, the information is saved into a file once the SteemSTEM upvote on a post is older than two weeks (as any later comment would bring 0 points). This requires removal of the 'null' author from the algorithm, which is used to track posts without a single comment.
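To illustrate the caching idea only, here is a sketch in which the comments of a post are written to a file once the post is older than the comment time limit. The file layout (one JSON record per line) and all the names are assumptions and do not reflect the actual format of the saved file. In particular, a post without comments is stored here with an empty list instead of a 'null' author.

```python
import json
import os


def is_settled(vote_time, now, comment_timelimit):
    """True once no later comment on the post can earn points."""
    return now - vote_time > comment_timelimit


def load_cache(filename):
    """Read the saved posts back as {post_id: list_of_comments}."""
    cache = {}
    if os.path.exists(filename):
        with open(filename) as f:
            for line in f:
                line = line.strip()
                if line:
                    record = json.loads(line)
                    cache[record["post"]] = record["comments"]
    return cache


def append_to_cache(filename, post_id, comments):
    """Save the comments of a settled post (one JSON record per line)."""
    with open(filename, "a") as f:
        f.write(json.dumps({"post": post_id, "comments": comments}) + "\n")
```

On a later run, only the posts that are absent from the cache need to be queried again.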

To run it, it is sufficient to complete the setup part of the code,

## Setup
half_life_vote     = 3.5*30*24*3600.     # 3.5 months - authorship point half-life
half_life_comment  = 1.75*30*24*3600.    # 1.75 months - engagement point half-life
comment_timelimit  = 14*24*3600.         # 2 weeks - the W number
comment_spam_limit = 100                 # minimum number of characters for a comment to be valid (N)
comment_filename   = 'comments_data.txt' # where to save the treated comments
load_backup        = True                # use the file with the saved comments
normalized_rep     = 1000                # score normalization

## Exclusions
team      = ['null']
bots      = []
blacklist = []

and execute the program.

#development #steemstem #steemdev


helo (70) 7 years ago

Very important development for the Steem ecosystem. I can see a use case for a sorting algorithm for an article feed to replace "trending".
Try to commit often to the code base instead of making one huge commit.
Some variable names are too generic (y, x, V_, c_, d_, cd_). Try explicit naming; it will help the next person trying to understand the code.
This post shows very well the outcome and the why, but not the how. Show off your code, it's a dev post after all.

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, click here.

Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]


lemouth (72) 7 years ago

Hey, I didn't know you would review this. Thanks for the comments. We are now several of us working on an improved version of this. I am sure all of this will be accounted for!


utopian-io (71) 7 years ago

Thank you for your review, @helo!

So far this week you've reviewed 7 contributions. Keep up the good work!


tarazkp (80) 7 years ago (edited)

I am not sure what your final usage of this is.

It seems heavily dependent on the curation team, not the STEM community but that might fit the use case.

A few days ago @heimindanger posted about the Dlive case and a comment he made against them. Anyone who upvoted that comment was then blacklisted for votes at the suggestion of one of the Dlive team (redjepi). This seems to be a risk of sorts here too, where the personal relationships between poster and curator will affect decisions heavily in both the positive and negative, regardless of content quality. Does anyone who questions STEM or gives any negative view then suffer at the hands of vindictive curators?

Even if a post itself is brilliant and attracts high engagement from the STEM community, it isn't factored in unless it gets a vote, and the size of that vote is dictated by the curation team itself.

I think the curators/management should be included in the calculations for transparency purposes also as they factor into the voting and quality of STEM posts.

My concerns are things for example like:

The top 30 most reputed SteemSTEM authors of all time, out of 2662 authors

15 deathbatter 4.466

This person who I found interesting got support from their first post by the looks and all stem posts but hasn't posted at all in 2 months. They have only been registered for 4. Shouldn't reputation also come with some length of track record? Perhaps the degradation takes this into account but they are still number 15 of all time and they managed that in the first 2 of their 4 months with the last 2 being absent.

I am not a coder and this is far out of my area of expertise so perhaps it suits the purpose it is intended for.

What I do like is the degradation which should be applied to witness votes to stop inactive witnesses still holding top 50 positions. I also like that there are many people working on possible solutions to what is the useless Steem rep system. Having many people think from different perspectives means that there is a chance that something decent is developed.

Questioning and experimentation are good, right?


tking77798 (62) 7 years ago

If a reputation system like this takes it'll be an excellent motivator for engagement and quality posts. Awesome work!

My only worry is that it might devalue the contribution of longstanding members that post occasionally or members taking a break from SteemSTEM. Maybe if there was a limit to how much an old post's value could decay (like 1/8 or 1/16 of its original value)? Then again, I wouldn't want to discourage new members so maybe it's not the best idea.


alexander.alexis (63) 7 years ago (edited)

Yeah I had a similar thought about 'historical' authors, whose rep like lemouth said should be non-negligible; but then how can you do that when you introduce the half-life? Perhaps a different metric altogether for historical authors? Perhaps a tally should be taken every week or month, and then those tallies get tallied?

In other words, say steemSTEM has existed for 10 weeks. A historical member has been active for the first 5 weeks and then quit. A newer member has been active for the last 3 weeks. Both have contributed equally while they were active, getting the maximum score of 1 for every week of activity. So the historical contributor would get 5 out of 10 and the newer member would get 3.

I'm basically trying not to exclude the St. Anselms of steemSTEM :D I mean, the rep of people like JTM who are seldom publishing now will be effectively 0 in the long term by the current metric. He specifically is excluded by the code since he's a founder, but the future might give us non-founder examples.

And now I'm just typing away to increase my score. Nothing to see here. Move along. Just finger-dancing on the keyboard... la la la la la... :D

A long letterhead automatically affixed to all my comments now seems like a great idea...


lemouth (72) 7 years ago

Tagging @tking7798, @tarazkp and @justtryme90 as this may be relevant as an answer to their own comments.

This "max loss" thingie does not change much for the historical author, and actually, the good idea would be to introduce a way to back-trace long term authorship. For the engagement part, I think we all agree we should focus on the recent stuff.

So how to do that in practice? Well I don't know.

I am dividing by the root of the number of posts. Maybe this should be modified?
Or what about a time-dependent half-life? Every time a vote is cast, it acts on the half-life of the future points to be earned. In this way, the more one is active, the longer the points last. If no activity is recorded during, maybe, N days, then the half-life decreases progressively to go back to the nominal value.

Any thoughts?


tarazkp (80) 7 years ago

For the engagement part, I think we all agree we should focus on the recent stuff.

Yep, that makes sense.

Other than that, I can't add much to the technical conversation :)


alexander.alexis (63) 7 years ago (edited)

If the point of the reputation score is only to decide how to vote on the latest post, then it does make sense to privilege the recent stuff. I think I was thinking of it more like an overall badge of merit (basically, a replacement of the steem rep number, since you're making an independent website) and felt awkward about discounting the historical contribution of valuable members who for one reason or another are not as present, currently, on the platform.


lemouth (72) 7 years ago

For the moment, the score is just something funny. Like you said, a kind of badge of merit. This is by no means affecting any future vote as the rules for votes are independent (and clearly stated).

The decaying reputation is more to have a way to slowly remove users that have left. There is a difference between not being present and having left, which is why the half-life is large. However, this may be improved (cf. the above comment by @tarazkp).
