Thoughts about authenticity on the Steem blockchain - Volume #2: Measurement

in #steemexclusive, last year (edited)

Fighting abuse without measuring it might be a case of putting the cart before the horse.


Introduction

Image: Pixabay license from christels at source

In the post, Thoughts about authenticity on the Steem blockchain - Volume #1: Identity, I wrote some thoughts about the authenticity of identities on the Steem blockchain. In this post, I'll follow that up with some additional thoughts about measuring the blockchain's inauthentic content.

To be honest, I started this post 3 or 4 times and deleted it every time because it's a frustrating topic. To put it bluntly, if the top-tier stakeholders aren't interested in solving the problem, then it probably can't be solved. And I don't see much evidence that many of the top-tier stakeholders are really motivated to solve it. So, is this something that I should really spend time writing about? I don't know. But, whatever the top-tier stakeholders might think about it - I still think it's an important topic, so here we go.

Let's start with some questions:

  • Does the amount of inauthentic content impact the price of STEEM?
  • Overall, in raw volume and as a percentage, how much inauthentic content is there?
  • STEEM WATCHER has now been operating for somewhere around a year. How has the blockchain's authenticity profile changed in that time? Is it better, worse or the same?
  • How can we measure these things?

I would argue that if we can't answer these questions, then we can't possibly make real progress against inauthentic content. So, how do we answer these questions (and others)? Let's talk about it.

Question 1: Does the amount of inauthentic content impact the price of STEEM?

Seems like a no-brainer to answer, "yes", right? To me, it does. It seems intuitively obvious that inauthentic content would drive down the price of STEEM and organic/authentic content would drive it up.

But, upon further reflection I'm not entirely convinced. Three of the metrics that I do keep my eye on are the price ratio of STEEM vs. BTC, STEEM vs. ETH, and STEEM vs. HIVE. From these metrics, it's not obvious to me whether the quality of our social-media content actually impacts the price of STEEM at all.

Basically, STEEM vs. BTC and STEEM vs. ETH have been trending downwards for a long time. My guess is that this is because of STEEM's inflation rate more than anything else. We'll get a better feel about whether that's right or not starting in 2024.

On the other hand, the ratio of STEEM vs. HIVE has been fairly steady for a while, even though the two blockchains almost certainly have very different authenticity and content profiles.

I have also read academic papers that indicate a relationship that operates in reverse. The higher STEEM's price goes, the more users we have, and the lower the price goes, the fewer users.

So, at this point my guess is this: Up until some minimum content threshold, STEEM's price is driven by speculation and inflation. There may be a point where content quality will take over in the driver's seat, but neither STEEM nor HIVE has reached that point yet (and it may not exist at all). However, that is just a guess. Which leads to the next question.

Question 2: Overall, in raw volume and as a percentage, how much inauthentic content is there?

In order to answer the first question, we need to be able to answer this question first. How much inauthentic content do we have in raw numbers and in comparison to authentic organic content?

My experience with moderating a very small community is that classifying content manually is far too time-consuming to be feasible at scale. Imperfect though they might be, there are tools for automated plagiarism detection and AI detection, but they cost money, and development would be needed in order to plug them into the Steem chain.

To answer this question, we need a way to classify every post - or a random sample of posts - as authentic or inauthentic, and then track the numbers over time. I asked this question second, but I think it needs to be the first one we try to answer.
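
As a rough illustration of the random-sample approach, the sketch below estimates the inauthentic share of a batch of posts and attaches a 95% Wilson score interval, so that changes over time can be compared against sampling noise. It is a sketch under stated assumptions, not a finished tool: `classify` is a hypothetical stand-in for whatever plagiarism or AI detector eventually gets plugged in, and `posts` is assumed to be a list that has already been fetched from the chain.

```python
import math
import random


def wilson_interval(flagged, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("sample must contain at least one post")
    p = flagged / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


def estimate_inauthentic_share(posts, classify, sample_size, seed=None):
    """Estimate the share of inauthentic posts from a simple random sample.

    posts       -- list of posts (any objects the classifier understands)
    classify    -- hypothetical callable returning True for inauthentic posts
    sample_size -- number of posts to draw without replacement
    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    sample = rng.sample(posts, min(sample_size, len(posts)))
    flagged = sum(1 for post in sample if classify(post))
    low, high = wilson_interval(flagged, len(sample))
    return flagged / len(sample), low, high
```

Running this once a day on a fresh sample and recording the three numbers would give a simple time series. Two days whose intervals overlap heavily should probably not be read as a real change.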

Without an answer to this question, we can never know if any of our anti-abuse initiatives are having an effect. For example, one of the blockchain's preeminent anti-abuse initiatives is STEEM WATCHER.

Question 3: STEEM WATCHER has now been operating for somewhere around a year. How has the blockchain's authenticity profile changed in that time? Is it better, worse or the same?

I came across a post today where an author had apparently reposted the same content from several months before. One of STEEM WATCHER's abuse detectives noticed it and said so in the comments. The author agreed, replaced the title of the post with, "Mistake", and deleted the post. All good, except that the vacuous post is still valued at $25.

Over the years, I have made no secret of the fact that I'm not a big fan of the downvote. It can lead to intimidation and bullying, retaliation, and the fear that downvote wars gin up can drive people away from the blockchain. So, I get the desire to be conservative with the use of downvoting. However, I'm not sure what the value is of diverting blockchain rewards to abuse detectives who offer consequence-free guidance not to abuse the blockchain. In fact, if there's not enough abuse on the blockchain to support the abuse detectives this can even establish an incentive where abuse detectives would use one account to create inauthentic content and a second account to report on it.

The blockchain has sent a lot of rewards to the STEEM WATCHER abuse detectives during these last 11 or so months. Has the blockchain received a positive return on investment? I think the top-tier stakeholders might want to know the answer to this.

Question 4: How can we measure these things?

In SoK: The Ghost Trilemma, Sulagna Mukherjee, Srivatsan Ravi, Paul Schmitt, and Barath Raghavan argued that there are three fundamental aspects to identity that cannot be known simultaneously in a decentralized environment. These are: sentience, location, and uniqueness.

The article also makes this point:

Due to its inherently adversarial nature, the problem of identifying bots and bad actors online is better addressed than before through advances in adversarial machine learning.

This reinforces the point in my previous article that we're never going to have a perfectly authentic platform, and that we need to focus on continuous improvement and optimization. However, it also suggests a path forward on measurement.

As alluded to earlier, I don't think there is a manual solution to the problem. I think it needs to be automated, even knowing that it cannot be perfect. Once upon a time, we had the @cheetah account running around the blockchain with some sort of automated plagiarism checks. The account would accuse people of being plagiarists and spammers, and it was backed by enough downvote power to drain all but the worst of the worst.

On the plus side, it was automated. On the negative side, the methodology was opaque and often seemed unfair. Also on the negative side, it was entirely controlled by a single person or a small group. How can we do better? Here are some thoughts:

Guiding principles

  • Be transparent
  • Be decentralized
  • Focus just on measurement. Separate out the enforcement problem.
  • Praise in public, criticize in private.

A decentralized, open source, abuse sentinel

I would propose creation of a Steem community to build our abuse-measurement capability in public. The community would need to be backed by enough stake to reward meaningful participation. Then, the participants could be rewarded with blockchain rewards in similar fashion to how STEEM WATCHER rewards their abuse detectives, but participants would be posting about progress made in developing an open source tool for measuring Steem's on-chain authenticity.

This decentralized tool would let anyone download the code and run their own sentinel account, and it would initially provide the following capabilities:

  • Plug into one or more anti-plagiarism tools
  • Plug into one or more AI-detection tools
  • Track author identity scores over time in 3 dimensions:
    • sentience - Does the author exhibit a pattern of organic, human-like behavior?
    • location - Does the author reveal clues about posting from authentic locations?
    • uniqueness - Does the author distinguish themselves from other accounts?
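
To make the three-dimensional tracking a little more concrete, here is a minimal sketch of how a sentinel might store per-author scores over time. Everything in it is an assumption for illustration: the class names, the 0-to-1 score range, and the `weights` parameter (a hypothetical way for an operator to emphasize one dimension over the others) are not part of any existing Steem library.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class IdentityScore:
    """One observation of an author's scores, each assumed to lie in [0, 1]."""
    day: date
    sentience: float
    location: float
    uniqueness: float

    def __post_init__(self):
        for name in ("sentience", "location", "uniqueness"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    def composite(self, weights=(1.0, 1.0, 1.0)):
        """Weighted average of the three dimensions (weights are hypothetical)."""
        ws, wl, wu = weights
        total = ws + wl + wu
        if total <= 0:
            raise ValueError("weights must sum to a positive number")
        weighted = ws * self.sentience + wl * self.location + wu * self.uniqueness
        return weighted / total


@dataclass
class AuthorHistory:
    """Chronological list of score observations for one account."""
    author: str
    scores: list = field(default_factory=list)

    def record(self, score):
        self.scores.append(score)
        self.scores.sort(key=lambda s: s.day)  # keep oldest first

    def trend(self, weights=(1.0, 1.0, 1.0)):
        """Change in composite score from the earliest to the latest observation."""
        if len(self.scores) < 2:
            return 0.0
        return self.scores[-1].composite(weights) - self.scores[0].composite(weights)
```

Keeping the raw dimensions separate and only combining them at query time means two sentinels could share the same observations while still weighting them differently.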

The tool would also let each sentinel prioritize different aspects of authenticity in order to distinguish their own account from the other sentinels. Each sentinel could fund their activity in two ways:

  1. Set up a subscription service and communicate with their clients on specific queries via encrypted memos or encrypted data in custom_json transactions. As far as possible, queries and replies about individual posts or accounts would be kept private.
  2. Post daily summary reports indicating the blockchain's currently observed level of authenticity. Of course, these reports would be eligible for blockchain rewards from the voters.

Anyone could participate, but for the most part, it seems to me that the people who would want to run sentinels would be the top-tier stakeholders and the web site operators. Top-tier stakeholders would be motivated to do it in order to protect the value of their investments. Web site operators would want to use it in order to create the most attractive web site possible.

I am intentionally light on details here because this is something that I think the working community should develop. My vision would be to develop it using Python, but that might also require updates to the steem-python library. Maybe there is a better choice? People have been collaborating online in Open Source code development for decades now. There's no reason that we can't do it here on the Steem blockchain. What we need is the right combination of financial backing, skills, and motivation, and I think that a Steem community is the perfect place to create that combination.

Conclusion

One thing that I should note is that this problem is far from unique to Steem. For example, I recently saw this post from Twitter's Elon Musk:

image.png

Musk's thought that,

Only subscription works at scale.

gives rise to some possible enforcement mechanisms, such as charging subscription fees for community memberships and auto-muting non-members or the use of paid verification services (as proposed in Thoughts about authenticity on the Steem blockchain - Volume #1: Identity). Of course, I also have a lot of other thoughts about enforcement, which is where this eventually leads.

Right now, the one and only enforcement tool that we have is the downvote. The existence of a more objective measurement system like the one described above might remove some of the negative behavior that has been observed in conjunction with downvotes in the past.

As I said earlier in this post, however, I'm not a fan of the downvote. So, my highest hope would be that we examine the blockchain rewards mechanism and come up with a solution that incentivizes self-regulation. There might always be a need for downvotes, but hopefully we could minimize it.

Still, all of that is a possible topic for a future post. For this post, I mainly just wanted to highlight the need for measurement of authenticity so that we can know whether or not we're making progress and how (if) it impacts the price of the underlying STEEM cryptocurrency.

What are your thoughts on the topic?


Thank you for your time and attention.

As a general rule, I up-vote comments that demonstrate "proof of reading".




Steve Palmer is an IT professional with three decades of professional experience in data communications and information systems. He holds a bachelor's degree in mathematics, a master's degree in computer science, and a master's degree in information systems and technology management. He has been awarded 3 US patents.


Image: Pixabay license, source

Reminder


Visit the /promoted page and #burnsteem25 to support the inflation-fighters who are helping to enable decentralized regulation of Steem token supply growth.


Does the amount of inauthentic content impact the price of STEEM?

In my opinion, the amount of unoriginal content does not affect the price of STEEM. The determining factor is the popularity of the project. If the project is popular and has many visitors, it starts to become interesting to businesses.

For example, I'm sure that your post has much less impact on the price of the token than a meme copied from somewhere that went viral and was viewed by 50 million people.

Overall, in raw volume and as a percentage, how much inauthentic content is there?

We cannot know now, but I am convinced that the share of original content does not exceed 20%.

Steem Watchers has now been operating for somewhere around a year. How has the blockchain's authenticity profile changed in that time? Is it better, worse or the same?

Good question. If a person posts a stolen photo outside of the community once a day to get votes from a bid bot, what's to stop them from continuing to do so?

How can we measure these things?

Google can help us in part. Its algorithm identifies the original source and ranks that page higher. Thus, page traffic can serve as an indirect measure of the quality (originality) of the content.

Unfortunately, we don't even have access to the visitor counter on this front-end.

the share of original content does not exceed 20%

This is a shocking assessment, as it means that 80% of the content is crap as a result. OMG
Shocking also because I trust you on that assessment, because you hang around the whole Steem more than I do. I've already blocked out quite a lot from my "haze".
All the more important that this "phenomenon" is analysed objectively and dryly. How to act? I don't know... :-(
I thought my "application" would be a small sign for "love it, leave it or change it" - but under the given circumstances this thinking is obviously (once again) completely without a chance... :-(

I came to this conclusion not because plagiarism often happens to me. In fact, you and I live in a certain bubble. We spend almost 100% of our time in well-moderated communities we know. There is practically no plagiarism among the posts we find here. However, the Steem blockchain is much more than that. There are a lot of users who post unoriginal content outside of communities every day in order to get guaranteed upvotes. In addition, there are a number of applications that use blockchain to publish various content. For example, if you look at what is being published with the tag #iweb3, you will see many short posts. These are tweets that are duplicated by the Wormhole3 service in the blockchain for the purpose of receiving rewards. There are many such examples.

Therefore, I think that the proportion of moderated content written for readers is small. But I see no reason to give up. Quality decides many things. If we are talking about readers, it is unlikely that any of them will ever find a duplicate tweet from a search engine. The reader is most likely to find a post written by a real person for a real reader.

I believe that we should improve what we can improve without getting distracted by what we cannot change. There will be a chance to change something more, it is worth using it. So step by step we will be able to move forward.

Good point about the view counter. Steemit had one for a while, but they took it away. Recently, I've been wondering if they might consider publishing daily top-20 lists of most viewed posts in various languages. Either that or maybe add the view counter as a premium/subscription service...

I don't know if this will be helpful, but one thing I've been thinking about is that the reward system can be thought of as an exchange: the blockchain is giving a person a certain number of Steem and SBDs in exchange for the post being made public on the chain. The "prices" of these exchanges aren't taken into account by the price-tracking tools the way that a Steem-for-BTC trade on a public exchange would be, but people can informally do it. If they go to the trending page and see a low quality post there, and they see that lots and lots of Steem and SBDs are given to that post they develop the intuition that the value of a Steem is 1/Nth of the "true value" of a low quality post. I am under the impression that the bulk of rewards flow to a relative minority of posts, so it might be the case that you can maximize the efficiency of quality analysis by incorporating the volume of rewards covered by the analyzed posts into a metric.

Interesting. I hadn't thought of it that way, but it makes sense.

I agree with you that a large percentage of rewards probably goes to a small percentage of posts (and authors), so that might help with the analysis. Similarly, I have thought about comparing median vote vs. total payout as an indicator of quality.
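
The reward-weighting idea from this exchange could be sketched as follows. The function compares the plain share of flagged posts with the share of rewards that those posts collected. This is a minimal sketch under assumptions: each post is represented as a hypothetical `(payout, is_inauthentic)` pair, and the flag is assumed to come from some upstream classifier.

```python
def count_share(posts):
    """Fraction of posts flagged as inauthentic, ignoring rewards."""
    posts = list(posts)
    if not posts:
        raise ValueError("no posts to analyze")
    return sum(1 for _, flagged in posts if flagged) / len(posts)


def reward_weighted_share(posts):
    """Fraction of total payout that went to posts flagged as inauthentic.

    posts -- iterable of (payout, is_inauthentic) pairs; both fields are
             hypothetical and would come from chain data plus a classifier.
    """
    posts = list(posts)
    total = sum(payout for payout, _ in posts)
    if total <= 0:
        raise ValueError("total payout must be positive")
    flagged_payout = sum(payout for payout, flagged in posts if flagged)
    return flagged_payout / total
```

If the bulk of rewards really does flow to a small minority of posts, then analyzing only the highest-paid posts would cover most of the reward-weighted figure at a fraction of the classification cost.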

I admire your way of approaching or wanting to approach every topic scientifically and with figures (i.e. facts that can hardly be manipulated). I am often far too emotionally and morally affected when I think: "Argh, no, the Steem Watchers haven't changed anything, everything here is and remains full of "abuse/abusers" and is ruining our Steem!" This is often unfair, because of course there are also many, many very good, original contents. And as you say, we are not the only ones with this problem. Accordingly, a factual, neutral analysis is needed! It should not even be necessary to mention that this should be supported by the large and larger stakeholders.
Your call to work TOGETHER on such an initiative, using the possibilities of open source programs, hopefully does not come to nothing (it would not be the first time...). Unfortunately, I cannot contribute competence or know-how when it comes to development. But the motivation to support the initiative, which is just as much in demand, is there... :-))

It should not even be necessary to mention that this should be supported by the large and larger stakeholders.

Thanks for the feedback! Even if it's not this method, I sure hope the large & larger stakeholders prioritize some technique to start demonstrating continuous improvement.

I remember cheetah; the account was set up to catch and warn spammers. After the ownership of this platform changed, a lot of things changed. The changes are positive.

In the past, when I joined content writing sites, my earnings were based on views, like 1 cent for 1 view and a dollar for 100 views. I'm talking about other writing platforms that you may be familiar with. I worked for bubblews, blogjobs and many other platforms when I was just a college student. In 2014 I started my journey as an online content writer, and the source of my views was my Twitter account, which was created in 2013. I think we are already connected on Twitter. I'm lucky to have you there!

Later I found steemit, which pays in its native crypto coin or token, STEEM. I also came across steemit's step-sister site, known as hive, but some STEEM haters there stopped my writing practice because I'm active on steemit. I was almost abused there on hive, and last month my account, holding 1200+ HIVE, was hacked and all of its assets were stolen by the hacker. Sorry that I wrote a big essay on my writing journey. It made me nostalgic.

TEAM 5 CURATORS

This post has been upvoted through steemcurator08. We support quality posts anywhere and with any tags. Curated by: @o1eh

There are so many interesting facts in this article that just knowing that someone is thinking about them encourages me a bit.

In fact, until recently my view was that in Steem, moderation and checks only existed within the communities, as a kind of bubble where what happens outside the communities is ignored.

A large percentage of the users who write outside the communities do it with the purpose of abusing, using the krsuccess tag or any other automatic voting system without supervision.

On the other hand some time ago I reviewed the work of SteemWatchers and it seemed to me that the only thing they did was to mention the abusers and get paid for it without affecting the abusers or the abuse.

I think we need an automatic system similar to @cheetah with the characteristics you have described, and a community of developers like utopian was in its time. We should also look at what happens outside the communities, because this affects us all, and be responsible with the automatic votes granted through tags or witness votes, as these are currently one of the biggest incentives for users without SP to abuse the system and fill the blockchain with junk content.

Elon Musk is not a man I would trust, I don't like this man, well none, but this one less :)