Thoughts about authenticity on the Steem blockchain - Volume #1: Identity

in #steemexclusive, 2 years ago (edited)

image.png

Pixabay license from Peter H. at source.

Introduction

Recently @the-gorilla posted a series of articles that revealed some apparently deceptive behavior by some highly influential personalities in the Steem ecosystem. I'm not going to revisit the details of that series, but you can read about it here:

And I recommend that you do. You can also read the replies from @simonnwigwe, here:

I don't want to spend much time recapping these articles because I have a feeling that this post is going to be pretty long, even without digging into the background. In a couple sentences, @the-gorilla was spot-checking the legitimacy of a vote from one of the "booming" accounts, and it led to a whole group of interconnected accounts that were publishing some amount of inauthentic content on the blockchain. The exact amount of inauthentic content is open for debate, but there was - at a minimum - some plagiarism from two very influential Steemizens.

The problem of influential Steemizens posting inauthentic content is bad enough, but there is also circumstantial evidence that multiple identities may have been assumed in order to direct contest prizes and other engagement rewards from one or more contest hosts to him/herself under an alternate account.

After reading this series of posts, I've been thinking about this from a process perspective. It's clearly not scalable for a single person to go through this exercise at large scale. This sort of thing is the reason why banks and financial institutions hire entire departments to root out fraud. So, how can this sort of problem be dealt with on a decentralized blockchain? From a comment that I posted on the thread, here are some of the topics that occurred to me at the time:

  • Of course I'm back to the rewards algorithm, and the ease of overvaluing a post. If Steem has an Achilles Heel, I think this is it.
  • Recognizing authentic content, vs authentic people (looking for inclusion of relevant quotes and links between accounts, articles and sites can help with this?)
  • The "ideal" whale's job is self-contradictory: be selective and also spread rewards widely
  • Time, scale, measuring and automating.
  • How does something like Dunbar's number apply to curation?
  • This is not unique to us. All social media has content farms. I think the typical response is building an AI system to detect as much as possible before people need to be involved.
  • Ways to use the following/follower graph as a hint about authenticity?
  • Contests, in particular, create incentives that favor Sybil posting.
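
As a rough illustration of the follower-graph bullet above, one possible signal is how much the follower sets of two accounts overlap. The sketch below rests on assumptions: the follower lists are hypothetical sample data that would have to be collected by some other means, the names `jaccard` and `flag_similar_pairs` are made up for this example, and the 0.8 threshold is arbitrary. A high overlap would only be a hint that a pair deserves a human look, not proof of common ownership.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_similar_pairs(followers: dict, threshold: float = 0.8) -> list:
    """Return (account, account, score) for every pair at or above the threshold."""
    flagged = []
    for x, y in combinations(sorted(followers), 2):
        score = jaccard(followers[x], followers[y])
        if score >= threshold:
            flagged.append((x, y, round(score, 2)))
    return flagged


# Hypothetical sample data, not real accounts.
followers = {
    "alice": {"u1", "u2", "u3", "u4", "u5"},
    "alice-alt": {"u1", "u2", "u3", "u4", "u5"},
    "bob": {"u4", "u9", "u10"},
}
print(flag_similar_pairs(followers))
```

Legitimate accounts in the same community can also share most of their followers, so a signal like this would need to be combined with other evidence.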

And I've had some more thoughts after that. When I started this article, I thought it would be a stand-alone post, but I'm already at 2,000 words and nowhere near complete coverage, so let's call this "Volume #1". Here we go....

Identity: Steem is not a KYC blockchain and The Honor System doesn't scale

Challenge

There has always been something ironic about the way that Steemizens begin to break into the ecosystem and build a reputation. For as long as I've been here, Steemizens have asked newcomers to post Verification photos (which I never did, I don't think). However, one or both founders and many of the early adopters were advocates of Steem as a censorship-resistant place where anonymous people can have a voice.

So, there is a natural tension here between the ability to know someone is who they say they are, and the ability for people to preserve their anonymity. By definition, without Know Your Customer (KYC) or some other identity solution, we simply can't know who the other players are. As @the-gorilla revealed in excruciating detail, the verification photos can be gamed without much difficulty.

IMO, there's nothing inherently wrong with a single person using multiple accounts... After all, I operate the accounts @remlaps and @remlaps-lite. I initially did that when some microblogging applications were created, and I wanted to separate my blog style content from shorter content. In fact, in the early days, when proof of work mining was possible, it was mandatory for people to create numerous accounts in order to compete as a miner. Heck, even professional novelists use ghost writers and/or pseudonyms. (funny side-note: Until his alter-ego became public, work by Stephen King writing as Richard Bachman was highly discounted in comparison to work by Stephen King writing as Stephen King).

But, when someone uses multiple accounts in order to divert rewards and prizes in a deceptive manner then it becomes more of a problem. Deceptively posting from multiple accounts could be considered to be a form of a Sybil attack. Here's what the Steem Whitepaper says about Sybil attacks:

Centralized websites prevent spam through rate limiting and some form of ID verification. Even something as simple as reCAPTCHA is sufficient to limit the creation of fake accounts. If someone abuses their account then centralized websites are free to block the account.

In a decentralized system there is no direct way to ban users nor centralized provider able to host a reCAPTCHA and enforce rate limiting of accounts. In fact, the inability to censor users is one of the main selling points of blockchain technology.


So, flaw or feature, this tension is designed into the blockchain. When contests and incentive programs are devised, the project's sponsor must account for it. Whatever the facts of the particular scenario that @the-gorilla uncovered, if incentives are misaligned, some sort of antisocial behavior will always emerge.

Note
This also applies to abuse-detection. If I provide a reward in upvotes for the discovery of plagiarism, it won't be long before a new class of "abuse detective" emerges who posts plagiarized content with one account (or more) and collects rewards for "discovering" it with another.

Opportunity

As I said in the comment above, this problem is not unique to the Steem blockchain. There is vigorous research and development in the area of identity, and even decentralized identity. So, the opportunity is for an entrepreneur to build on that body of research and apply it to the Steem blockchain.

I am reminded of an early service that was available on the blockchain, SteemVerify. Obviously, since they're not still here, the pricing model didn't work. Also, NFTs weren't a thing yet, but that service used to give people a SteemVerify badge in exchange for a fee and some sort of verification action.

Presumably, someone could have contacted the folks at SteemVerify (before it went belly-up) in order to confirm an account verification. (We see something similar with Twitter's $8 per month blue checks.)

So, my thought is that entrepreneurs could revisit this model and make verification into a competitive product. One possible avenue for this is the use of self-sovereign identity with verifiable credentials or other decentralized identity solutions. Then, people running contests could require their participants to be verified by a trusted verification service. Over time, the ones who do it well would remain and the ones who don't would fade away.

Of course, another option is for people running contests to allow participants to enter as many times as they'd like. If I can submit 5 posts under my own name, I would have less incentive to run any alternate accounts. The rules would be fair, but the drawback here is that the contest judges might have more work to do. This, in turn, could be mitigated through the use of an "entry fee". Participants could be asked to burn some amount of STEEM and/or SBD as a contest entry fee. The burn would have to come out of the wallet, not as a beneficiary reward (beneficiary rewards fall in the category of "easy come/easy go" for someone who is running multiple accounts).
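
A contest host could check the "out of wallet" entry fee mechanically. The sketch below is a minimal illustration under assumptions: it works on a list of dictionaries shaped like Steem `transfer` operations (`from`, `to`, `amount`, `memo`), which the host would have to pull from the entrant's account history by some other means. The function name `paid_entry_fee` and the convention of putting a contest id in the memo are hypothetical. It relies on the fact that tokens transferred to the @null account are burned, and it ignores beneficiary settings because those are not transfer operations.

```python
from decimal import Decimal

BURN_ACCOUNT = "null"  # tokens transferred to @null on Steem are burned


def paid_entry_fee(transfers, entrant, contest_id, fee, symbol="STEEM"):
    """True if `entrant` burned at least `fee` of `symbol` from their wallet
    with the (hypothetical) contest id mentioned in the memo."""
    total = Decimal("0")
    for op in transfers:
        value, unit = op["amount"].split()
        if (
            op["from"] == entrant
            and op["to"] == BURN_ACCOUNT
            and unit == symbol
            and contest_id in op["memo"]
        ):
            total += Decimal(value)
    return total >= Decimal(fee)


# Hypothetical sample data in the shape of transfer operations.
transfers = [
    {"from": "alice", "to": "null", "amount": "2.000 STEEM", "memo": "entry: contest-42"},
    {"from": "bob", "to": "carol", "amount": "5.000 STEEM", "memo": "entry: contest-42"},
]
print(paid_entry_fee(transfers, "alice", "contest-42", "1.000"))  # True
print(paid_entry_fee(transfers, "bob", "contest-42", "1.000"))    # False
```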

Another idea for promoters is to learn a lesson from the "proof of life" concept by linking the promotion to some sort of information that only became available recently, maybe a newspaper headline, weather event, or sporting event for example. Of course, this can't stop all deceptive split personalities, but it adds additional friction to the process for someone who is trying to operate multiple accounts, because it takes away the ability to recycle and spin content that was created two or three years ago.
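
The "proof of life" idea can be sketched as a simple freshness check. This is only an illustration: the function name `is_fresh_entry`, the challenge phrase, and the dates are all hypothetical, and a real promotion would need a less easily copied requirement than a literal phrase match.

```python
from datetime import datetime, timezone


def is_fresh_entry(body, created, challenge_phrase, announced_at):
    """True if the entry was created at or after the announcement
    and mentions the challenge phrase (case-insensitive)."""
    return created >= announced_at and challenge_phrase.lower() in body.lower()


# Hypothetical contest data.
announced = datetime(2023, 6, 10, 12, 0, tzinfo=timezone.utc)
phrase = "Record rainfall in Springfield"

new_post = "My take on the record rainfall in Springfield this week..."
old_post = "A recycled essay about gardening."

print(is_fresh_entry(new_post, datetime(2023, 6, 11, tzinfo=timezone.utc), phrase, announced))
print(is_fresh_entry(old_post, datetime(2023, 6, 11, tzinfo=timezone.utc), phrase, announced))
```

A check like this only adds friction. Old content can still be edited to mention the phrase, so it complements human judging rather than replacing it.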

In addition, one of the anti-abuse techniques that I've heard about in the finance industry is that employees are rotated through departments every year or two in order to prevent them from building the kinds of knowledge and relationships that enable embezzlement. Similarly, engagement programs and contests could be changed frequently, in order to create an obstacle to long-lasting abuse strategies.

Finally, lots of small contests and promotions might be better than a small number of large contests or promotions. The smaller the rewards, the less likely it is to be worth the effort to participate with multiple accounts. The larger the reward:time ratio, the more likely that deceptive behavior will occur.
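
To make the reward:time ratio concrete, here is a toy model with made-up numbers. The formula (prize times chance of winning, divided by hours of effort per extra entry) and the threshold are simplifying assumptions for illustration, not measurements from the blockchain.

```python
def reward_time_ratio(prize, win_probability, hours_per_entry):
    """Expected prize per hour of effort for one additional entry."""
    return prize * win_probability / hours_per_entry


def worth_gaming(prize, win_probability, hours_per_entry, hourly_threshold):
    """True if an extra (alternate-account) entry pays more per hour
    than the threshold at which the effort becomes attractive."""
    return reward_time_ratio(prize, win_probability, hours_per_entry) > hourly_threshold


# Hypothetical numbers: one large contest versus one of many small ones.
large = dict(prize=100, win_probability=0.1, hours_per_entry=1)
small = dict(prize=10, win_probability=0.1, hours_per_entry=1)

print(worth_gaming(**large, hourly_threshold=5))
print(worth_gaming(**small, hourly_threshold=5))
```

Under these assumed numbers, only the large contest clears the threshold, which is the intuition behind preferring many small promotions.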

In the end, it's important for 1.) the people who design contests, promotions, and challenges, to take the blockchain's decentralized nature into account and to carefully avoid creating antisocial incentive structures which lead to large rewards; and 2.) curators to closely monitor the contests that they support.

Conclusion

In the Steem Whitepaper, we are told,

The challenge faced by Steem is deriving an algorithm for scoring individual contributions that most community members consider to be a fair assessment of the subjective value of each contribution. In a perfect world, community members would cooperate to rate each other's contribution and derive a fair compensation. In the real world, algorithms must be designed in such a manner that they are resistant to intentional manipulation for profit. Any widespread abuse of the scoring system could cause community members to lose faith in the perceived fairness of the economic system. (emphasis mine)

And

Eliminating “abuse” is not possible and shouldn’t be the goal. Even those who are attempting to “abuse” the system are still doing work. Any compensation they get for their successful attempts at abuse or collusion is at least as valuable for the purpose of distributing the currency as the make-work system employed by traditional Bitcoin mining or the collusive mining done via mining pools. All that is necessary is to ensure that abuse isn’t so rampant that it undermines the incentive to do real work in support of the community and its currency. (emphasis mine)

So, we can look at all episodes like this as opportunities to re-balance things in the ecosystem. Re-balancing can be done at the social layer by behavior changes from participants in the ecosystem, or at the algorithmic layer by the witnesses and development community.

Something that became clear to me after the launch of the STEEM WATCHER community, and was reinforced by the posts from @the-gorilla, is that it's not enough to say, "we eliminated abuse cases X, Y, and Z". More importantly, if we're not going to lurch about in a long-running and unproductive game of "Whack-a-mole", we (the community) must come up with some sort of way to measure the types and amounts of abuse that are happening on the blockchain, and use decentralized levers in order to drive for an overall rate-reduction over the course of time.
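
One way to approach measurement without reviewing every post is to review a random sample and report the abuse rate with an uncertainty range. The sketch below is a minimal illustration under assumptions: `posts` is any list of post identifiers, `is_abusive` stands in for a human or automated review step, and the function names are made up here. It uses the standard Wilson score interval for a proportion.

```python
import math
import random


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("sample is empty")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_abuse_rate(posts, is_abusive, sample_size, seed=0):
    """Review a random sample and return (rate, (low, high))."""
    if not posts:
        raise ValueError("no posts to sample")
    rng = random.Random(seed)
    sample = rng.sample(posts, min(sample_size, len(posts)))
    hits = sum(1 for post in sample if is_abusive(post))
    return hits / len(sample), wilson_interval(hits, len(sample))


# Hypothetical data: 1,000 post ids, where the stand-in reviewer flags every tenth one.
posts = list(range(1000))
rate, (low, high) = estimate_abuse_rate(posts, lambda p: p % 10 == 0, sample_size=200)
print(f"estimated abuse rate: {rate:.1%} (range {low:.1%} to {high:.1%})")
```

Repeating the same sampling procedure at regular intervals would show whether the estimated rate is trending up or down, even though no single estimate is exact.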

Clearly, a perfect solution is not possible, but creators of contests, challenges, and other promotional efforts can take certain steps to reduce the likelihood of misbehavior. This article proposed a number of solutions for addressing identity-based antisocial behavior. These include:

  • Requiring verification from a reputable decentralized service (assuming that entrepreneurs make such a service available)
  • Permitting multiple entries per participant
  • Requiring an "entrance fee" by burning STEEM and/or SBD "out of wallet", not just in the form of beneficiary rewards
  • Designing promotions that incorporate the use of recently available information to prevent the recycling & spinning of historical content.
  • Changing the nature of contests, challenges, etc. frequently to make it difficult to establish long-term abuse protocols.
  • Limiting the size of contest rewards so that the risk of reputational damage outweighs the potential for deceptive reward gathering.

As a community, it seems to me that our immediate goal should be to start seeking best practices from among these (and other) options in order to minimize the blockchain abuse potential. In order to accomplish this, individual promoters of contests, challenges, and the like should describe how they're dealing with the potential for identity-related abuse in their contests, and curators should include that information in their decision process when choosing whether or not to support a contest, challenge, or other promotion.

This post has also highlighted an entrepreneurial opportunity for decentralized identity services to establish themselves on the Steem blockchain.

That's it for Volume #1, and I probably won't have more on the topic for at least another week. When time allows, I hope to follow up with one or more additional posts that delve into other aspects of the problem - such as plagiarism, spam, the rewards algorithm, automation, and measurement.


Thank you for your time and attention.

As a general rule, I up-vote comments that demonstrate "proof of reading".




Steve Palmer is an IT professional with three decades of professional experience in data communications and information systems. He holds a bachelor's degree in mathematics, a master's degree in computer science, and a master's degree in information systems and technology management. He has been awarded 3 US patents.


image.png

Pixabay license, source

Reminder


Visit the /promoted page and #burnsteem25 to support the inflation-fighters who are helping to enable decentralized regulation of Steem token supply growth.


I've been thinking about this a lot since my series of articles and I'll jump around your article with various thoughts...

if we're not going to lurch about in a long-running and unproductive game of "Whack-a-mole"

I've long thought that the abuse being found is exactly this. Not just the concept that it's a never ending game of trying to stop something from popping up but also that the mole that pops up is often the same mole that has already been whacked down - often an improved version.

So it feels as though abuse on the platform is inevitable and there's no knowing what difference the whack-a-mole makes.

In reply to one of the posts, @jpegg wrote:

What if you was the only real living person on Steem, watching (never sleeping) bots activity :P

Which led me to wonder, if we filtered out all of the bots, all of the plagiarism and all of the duplicate accounts, how many users would actually be left? Would it be me and a handful of others?

And then there's the question of "so what?" The creators knew that this had the potential to be a problem and perhaps unsurprisingly it is. As a community, we allow users to post utter shit every day in order to get their vote from upvu. We "allow" it because these users are seen as investors. So why don't we perceive "miners" in the same way and just accept that there are users who want to mine the currency without a meaningful contribution to the community that's behind it?

The concept of miners is normally neglected when we talk about Steemit's userbase. We have bloggers and investors. With community in mind, we know that these already conflict. When we introduce miners, the community has another conflict. And much like investors, miners need to do something to mine the STEEM coin. Not (always) in the traditional way of getting more machine processing power, but in the form of human processing power - getting people to allow them to sign up on their behalf, entering contests, etc.

As a community, do we need to simply accept that (whether we like it or not), these profiles exist and filter out their shit in the same way that we do from voting bots?

And if we do ignore it, then what is the consequence? Investors already take a huge chunk of the reward pool and that is "accepted". If miners take a larger and larger percentage of contests, genuine users will stop entering and either leave or simply run a traditional blog.

(I'm just typing thoughts as they come to me)

Are we spending too much time thinking about and trying to solve a problem which is unsolvable? In taking action, are we delaying the parasite taking over the entire system? Or do we need to accept that they already have and live on our island far away from them?

I don't know and I honestly wish that I was ignorant to it all.

As a community, we allow users to post utter shit every day in order to get their vote from upvu. We "allow" it because these users are seen as investors.

This type of "investment" has been toxic to Steem as a social blockchain. The original whitepaper envisioned holding Steem Power as an incentive to improve the ecosystem. Instead many large SP holders have tended to operate like slumlords, trying to extract as much value as possible while waiting for somebody else to improve things around them. From the beginning the norm should have been "no self-votes", and delegating to a bot that votes for you based on your delegation is essentially a self-vote. Instead the norm we have is "don't piss off the big accounts".

The whitepaper also envisioned people of equivalent wealth policing each other, but that isn't really happening. So (with one exception), at the upper tiers of stakeholders, we basically have a proof-of-stake blockchain masquerading as a social media platform.

As the whitepaper said, it's still doing the work of distributing the token, so it's really not harmful to the blockchain... but, I agree that it's toxic at the social layer. The problem is, downvote wars are also toxic at the social layer, so it's almost a matter of "pick your poison".

We've talked about the self-vote phenomenon before, and that aspect doesn't really trouble me. I only see over-valued and under-valued. If we find a way to get the values right, I don't care who does the voting. Campaigns against self-voting (IMO) will just spur the creation of Sybil accounts.

As I replied to @the-gorilla, I'll have more to say about this in a future post. Hopefully next week.

So (with one exception), at the upper tiers of stakeholders, we basically have a proof-of-stake blockchain masquerading as a social media platform.

Yes. It crowds out genuine effort. At the top end you have a purely inflationary high-APR DeFi chain that wants plausible deniability. And at the bottom end it's easier to churn out spam or plagiarized posts than it is to write genuinely good content (which has a high chance of getting lost in the shuffle anyway).

We've talked about the self-vote phenomenon before, and that aspect doesn't really trouble me. I only see over-valued and under-valued.

I think that it's a useful rule of thumb to assume that people aren't reliably good at evaluating the quality of their creative output. And humans are better at understanding bright-line rules than at making difficult judgments like what a post is worth, so I think it would be a beneficial norm to have even if it isn't 100% of the solution. I could also be on board with a norm like putting a rewards cap on posts that get auto-voted on (I think there can be good arguments for things like UBI, but it doesn't make sense for people to be making as much as the big accounts are making with upvu posts).

Campaigns against self-voting (IMO) will just spur the creation of Sybil accounts.

Sure, there's no single perfect solution. I just think the chain would work better if it had some more robust norms than "whatever big accounts can get away with".

I think that it's a useful rule of thumb to assume that people aren't reliably good at evaluating the quality of their creative output.

Yeah, I agree with this. I guess the reason I don't focus on self-voting is that a very low percentage of accounts actually have enough stake to over-value their posts without help.

the norm should have been "no self-votes", and delegating to a bot that votes for you based on your delegation is essentially a self-vote. Instead the norm we have is "don't piss off the big accounts".

I agree and similar to the ease with which self-voting could be banned, I can't imagine it being difficult to make it impossible to vote for somebody who's delegated to you.

These voting bots are hugely popular and embraced by the Korean community and I don't think they'd consider using the platform in any other way. Which is the roundabout way of saying: "we know nothing will ever change" 🤷‍♂️

As a community, we allow users to post utter shit every day in order to get their vote from upvu. We "allow" it because these users are seen as investors.

You're getting ahead of me here. Plagiarism/spam/AI content is probably my next post. But yeah, when I was highlighting this:

Any widespread abuse of the scoring system could cause community members to lose faith in the perceived fairness of the economic system.

and this:

All that is necessary is to ensure that abuse isn’t so rampant that it undermines the incentive to do real work in support of the community and its currency.

I have to say that both excerpts reminded me more of the Trending page than the situation that launched this conversation.

The blockchain has built-in ways to reward investors with interest for staking (or for holding SBDs) and with curation rewards. In principle, if we want to increase ROI for investors, it should be done by changing those parameters, not by diverting author rewards to investor/spammers at the social layer.

If we want to understand why the number of comments per block has fallen from >1 to 0.4 during the last year or two, part of it is almost certainly the bear phase of the crypto market, but I suspect that the dominance of bidbots is also a major factor.

So it feels as though abuse on the platform is inevitable and there's no knowing what difference the whack-a-mole makes.

This is why I touched on measurement in the conclusion. We really need a way to know whether we're making progress or not - and to be able to change course if we're not. On one hand, it's not possible to measure perfectly, but on the other hand I think that estimates are possible. I'll probably come back to that in a future post, too.

This is why I touched on measurement in the conclusion. We really need a way to know whether we're making progress or not - and to be able to change course if we're not.

In addition to my other comment, this might be a useful tool with regards to measurement...

https://steemit.com/hive-151113/@steemwatcher.com/weekly-statistics-of-steemwatcher-portal-or-or-07-06-2023

In principle, if we want to increase ROI for investors, it should be done by changing those parameters, not by diverting author rewards to investor/spammers at the social layer.

I'd love to see something like that happen - where the "Savings" element took a higher proportion of the reward pool than the author rewards. I guess the downside then is that there's no incentive to power up. I've not really thought this through and my brain's refusing to operate at the moment. Stupid brain.

On one hand, it's not possible to measure perfectly, but on the other hand I think that estimates are possible.

There are so many challenges with this, I don't know how feasible it would be. The Steem Watchers record the cases that they find but I believe the reward incentive to each watcher clouds the motivation to find more. E.g. if they receive $x for finding 6 plagiarised posts, what's the incentive for finding 50? Best to save them for the next report.

There's also the increased difficulty when people start translating content to hide their plagiarism and increasingly complex scams.

I think there was once a plan to write a bot which automatically assesses the originality of each post. @alexmove had something running when papi.mati was running the anti-plagiarism campaign. I don't know if what he developed can be repurposed as some kind of measuring tool?

In principle, if we want to increase ROI for investors, it should be done by changing those parameters, not by diverting author rewards to investor/spammers at the social layer.

I'd love to see something like that happen - where the "Savings" element took a higher proportion of the reward pool than the author rewards. I guess the downside then is that there's no incentive to power up.

If people could ever get past the terror of SBD prices being below $1 USD, doing conversions and arbitrage would be a way for investors to make money with STEEM while keeping it liquid, reducing their incentive to find "passive income" solutions that manipulate the reward pool.

Hi. @the-gorilla

Services like zeroGPT could be used to determine AI generation, as well as to determine non-uniqueness. The problem is volumes - to check ALL posts, huge volumes are needed to pay for the use of these services.

If you check selectively, then some well-thought-out structure for the checking is necessary. For now, that is the open question. A small number of posts can be checked automatically, for example one selected post per author per month.

The problem is volumes - to check ALL posts, huge volumes are needed to pay for the use of these services.

I think that's the main problem with every approach to fight abuse on Steemit. There's got to be a human element at which point scaling it up becomes a big issue.


We can deceive other people but never ourselves; that is impossible. The point here is the support that they take away from other people who really need it, and the saddest thing is that the people who do this do not have such a need. Analyzing the way in which we human beings act is pretty depressing. And to think that we are created in the image of GOD; what a sadness that so much evil prevails.

I completely agree with you, there are accounts that do a lot of duplicates to collect awards. They are all almost absolutely traceable. But it takes a certain amount of time. I have met dual and extended accounts that enjoy the support of booming.

They are all almost absolutely traceable. But it takes a certain amount of time.

Right. It's very time consuming, and you can often not be 100% sure.

This will take me time to read; until then I'll resteem the post. It sounds like important stuff to me.