How my bot learned Korean

in #curation, 8 years ago

I've spent the past several weeks creating and tuning an Autonomous Curation Daemon, better known as a "voting bot." Over the past couple of weeks, I noticed that some of my biggest curation rewards were for posts that I never would have voted for manually. Here are several posts that it has voted for (note: I do not speak Korean):

https://steemit.com/kr/@twinbraid/7guiuv-01
https://steemit.com/kr/@sochul/extreme-trial-competition-october-23-2016-part-3
https://steemit.com/kr/@twinbraid/08
https://steemit.com/suwonsamsung/@rudxor8/20161020
https://steemit.com/kr/@yoonjang0707/2016-10-09-10-14
https://steemit.com/kr/@sanghkaang/3-part-1

Notice anything about these? All of those posts are in Korean. Let me reiterate: I know exactly 0 Korean words. Why did my bot vote for them? Only one conclusion is possible:



By Sirrob01 [CC0], via Wikimedia Commons

My bot speaks Korean.

How is this possible, given that I don't speak Korean?

Well, my bot is different from many of the other voting bots on Steem. Many voting bots have an algorithm that works like this:
if author_reputation > 50 AND time_since_posting > 27 AND num_votes > 10
THEN vote
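
As a minimal sketch, the fixed-threshold rule above could look like this in Python. The function name and the dictionary keys are assumptions made for illustration; the three thresholds are the ones from the rule, in whatever units such a bot uses.

```python
def should_vote(post, min_reputation=50, min_age=27, min_votes=10):
    """Return True only when a post clears all three fixed thresholds.

    `post` is a plain dict with hypothetical keys; a real bot would fill
    these values in from the blockchain.
    """
    return (
        post["author_reputation"] > min_reputation
        and post["time_since_posting"] > min_age
        and post["num_votes"] > min_votes
    )


# Made-up example post, for illustration only.
example = {"author_reputation": 62, "time_since_posting": 30, "num_votes": 14}
print(should_vote(example))
```

The thresholds never change, which is exactly what separates this kind of bot from one that adapts.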

This is simple, it's efficient, it's easy to program, and fundamentally there's nothing wrong with it. But my bot is different. I am not disclosing the details of my algorithm yet, but essentially my bot looks at each new post on Steem and predicts how much that post will earn. If the prediction is high enough (and a few other conditions are met, like the author is not on my blacklist and @cheetah hasn't commented on it), my bot votes for the post. Nothing surprising there.
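
The actual algorithm is not disclosed, so the following is only a sketch of the decision flow described above: apply the veto conditions, then vote if the predicted payout clears a threshold. The predictor, the threshold value, and the dictionary keys are all hypothetical stand-ins.

```python
def decide_vote(post, predict_payout, blacklist, threshold=5.0):
    """Vote only if no veto applies and the predicted payout is high enough.

    `predict_payout` is any callable mapping a post to an expected payout.
    The threshold of 5.0 is an arbitrary placeholder, not a real setting.
    """
    if post["author"] in blacklist:
        return False  # author is on the blacklist
    if "cheetah" in post["commenters"]:
        return False  # @cheetah has commented on the post
    return predict_payout(post) >= threshold


def toy_predictor(post):
    """Placeholder predictor: longer posts get higher predictions."""
    return len(post["body"]) / 100.0


# Made-up post, for illustration only.
post = {"author": "alice", "commenters": [], "body": "x" * 800}
print(decide_vote(post, toy_predictor, blacklist={"mallory"}))
```

Passing the predictor in as an argument keeps the veto logic separate from whatever model produces the prediction.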

The magic part is what comes next: after the post pays out, the bot goes back and checks to see if its original prediction was good. If the prediction was too high, the bot remembers this; in the future, it is less likely to vote for posts that look like this one. On the other hand, if the prediction was too low (i.e., the post outperformed the bot's expectations), the bot remembers this and makes sure to vote more often for posts like this in the future.
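
One generic way to implement this kind of feedback loop is an online linear model that nudges its weights after every payout. This is a sketch of that general technique, not the bot's actual algorithm; the class name, the learning rate, and the numeric feature vector are assumptions.

```python
class OnlinePayoutModel:
    """Linear payout predictor updated one post at a time."""

    def __init__(self, n_features, learning_rate=0.01):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict(self, features):
        """Predicted payout: weighted sum of the post's features."""
        return sum(w * x for w, x in zip(self.weights, features))

    def update(self, features, actual_payout):
        """Shift the weights toward the observed payout.

        A positive error (post outperformed the prediction) raises future
        predictions for similar posts; a negative error lowers them.
        """
        error = actual_payout - self.predict(features)
        self.weights = [
            w + self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        return error


model = OnlinePayoutModel(n_features=2)
features = [1.0, 2.0]           # made-up feature values
before = model.predict(features)
model.update(features, 10.0)    # the post paid out more than predicted
after = model.predict(features)
print(before, after)
```

Because the model only ever sees numeric features and payouts, nothing in the update step depends on the language a post is written in.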

It's a type of learning algorithm called unsupervised reinforcement learning. The "reinforcement" part means that it keeps track of what it's done in the past, and does more of the things that helped and less of the things that hurt. The "unsupervised" part means that I didn't tell it what to look for; I just told it how to look. It learned what to look for all on its own.

Put these two together, and you have a bot that can "learn Korean." If certain aspects of Korean posts have paid out well in the past, my bot will vote for more posts with those aspects - and in so doing, perhaps successfully vote for some Korean posts. Of course, my bot doesn't actually speak Korean. It just knows how to find successful posts...

Credit where credit's due: my bot is written in Python 3.5 and makes liberal use of @xeroc's piston.steem framework, and wouldn't be functional without @furion's steemtools library. Thanks guys! You're both awesome!


Did you use the archived blockchain for training?

This is why I really think that steemit is going to shine. It may be an immature social media platform, but it's a fantastic AI incubator. Not sure if you already saw it, but this old post of mine is relevant.
https://steemit.com/blog/@remlaps/a-far-fetched-prediction

I hadn't seen your post, thanks for pointing me to it. Good thoughts!

I did use the archived blockchain; I trained my model on about a month's worth of posts before the bot went live. I think a week of data would have been fine, but all the data was there - why not use it?

The cool thing is that all the data is still out there: I can simulate any voting algorithm I want. I have a lot of work to do before this is operational, but I'd love to use this to publish a set of benchmarks that people can use to evaluate their own curation (manual or automated). The Steemwhales.com curator trending lists already play this role crudely, but because you can't tell which of those is autonomous or whether their algorithms are adaptive, they don't really give you the information of a standardized benchmark.
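
A rough sketch of what simulating a voting rule against archived posts might look like. The post records and the scoring metric (mean final payout of the posts a rule would have voted for) are assumptions for illustration; real curation rewards also depend on vote timing and stake, which this ignores.

```python
def backtest(posts, rule):
    """Replay a voting rule over archived posts and summarize the result.

    `posts` is a list of dicts that each carry a final "payout";
    `rule` is any callable that returns True to vote for a post.
    """
    voted = [p for p in posts if rule(p)]
    if not voted:
        return {"votes": 0, "mean_payout": 0.0}
    total = sum(p["payout"] for p in voted)
    return {"votes": len(voted), "mean_payout": total / len(voted)}


# Made-up archive records, for illustration only.
archive = [
    {"num_votes": 12, "payout": 8.0},
    {"num_votes": 3, "payout": 1.0},
    {"num_votes": 20, "payout": 4.0},
]

result = backtest(archive, lambda p: p["num_votes"] > 10)
print(result)
```

Running several rules through the same function over the same archive gives numbers that can be compared directly, which is the point of a standardized benchmark.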

Anyway, I'm rambling. I like your thoughts.

"The cool thing is that all the data is still out there: I can simulate any voting algorithm I want."

It's a perfect training ground for a genetic algorithm.

So which user is your "Autonomous Curation Daemon"?

Almost all votes on top-level posts by my account @biophil are automated.

So @biophil is in fact a cyborg! I think I'd prefer if I knew if accounts were robots or not...

Yeah, it would be nice to have a little star by bot accounts, wouldn't it?

Amazing how well you are using code to make Steemit better.