The Gridcoin Fireside #6 - Gridcoin 4.0.3.0: The Scraper. This one is kind of a big deal.


Image coming soon!

The Gridcoin Fireside #6

Gridcoin 4.0.3.0 - The Scraper


The Gridcoin Fireside is a participatory podcast brought to you by the Gridcoin community. It is intended to introduce and explore DLT, cryptocurrency, and crypto-economics from a science-oriented perspective.

The audience is welcome to participate through voice comms or via the Discord text chat.

Join us every Thursday at 8:00pm EDT on the Gridcoin Discord server.

Catch up on past episodes here.



Recorded on May 17th


Listen to or download the episode here.


This week we discuss the release of Gridcoin 4.0.3.0 and the implementation of the scraper. With the scraper, Gridcoin becomes a combined DPoS/PoS protocol in which semi-decentralized "scraper nodes" fetch off-chain data, form consensus on it, and integrate it into a fully decentralized blockchain via unique "superblocks".
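For listeners who want a concrete mental model before diving into the episode, here is a heavily simplified sketch of that flow. It is not Gridcoin's implementation (which is C++ and far more involved); every name in it is made up, and the 60% agreement figure is taken from the discussion in the comments below.

```python
# Heavily simplified mental model only -- not Gridcoin's actual C++ code.
# All names here (manifest_hash, converge, the example stats) are made up.
import hashlib
import json
import math
from collections import Counter

def manifest_hash(stats: dict) -> str:
    """Deterministic hash of one scraper's view of the off-chain BOINC stats."""
    return hashlib.sha256(json.dumps(stats, sort_keys=True).encode()).hexdigest()

def converge(manifests: dict, ratio: float = 0.6):
    """Return the stats a supermajority of scrapers agree on, or None.

    `manifests` maps scraper id -> the stats that scraper published.
    """
    hashes = {sid: manifest_hash(stats) for sid, stats in manifests.items()}
    winning_hash, votes = Counter(hashes.values()).most_common(1)[0]
    if votes < math.ceil(len(manifests) * ratio):
        return None  # no convergence, so no superblock this cycle
    agreed = next(s for sid, s in manifests.items() if hashes[sid] == winning_hash)
    return {"contract": agreed, "contract_hash": winning_hash}

# Example: three of four scrapers publish identical stats, so they converge
# and the agreed statistics become the superblock payload.
stats = {"project_a": {"cpid_1": 120.0, "cpid_2": 80.0}}
superblock = converge({"s1": stats, "s2": stats, "s3": stats, "s4": {"project_a": {}}})
print(superblock["contract_hash"])
```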

This is a very significant development for Gridcoin and potentially for the larger blockchain space. The potential here should not be understated.


Time - Subject
1:10 - Polls: CSG and DrugDiscovery@Home Removal Polls
8:15 - Status of Gridcoin Community and Protocol
17:15 - Gridcoin 4.0.3.0: The Scraper Replacement of the "Neural Net"
23:00 - How Unique is the Scraper as a Tool?
28:15 - A Combined DPoS/PoS Protocol for Off-chain Data Collection and Integration
30:55 - Scraper Security Mechanisms
41:00 - How Do You Become a Scraper Node?
46:30 - The Scraper, BOINC GDPR'd Projects, and Possibilities Looking Forward
54:30 - Introduction to How Scraper Nodes Form the Superblock
58:20 - Collaborating with Universities and Outreach


What is Gridcoin?

Gridcoin is a multi-incentive open-source blockchain. The first incentive, stake rewards, pays participants for securing the ledger through a proof-of-stake protocol. The second incentive, research rewards, pays participants for contributing computation to approved projects hosted on BOINC, a distributed computing infrastructure.

BOINC, the Berkeley Open Infrastructure for Network Computing, hosts major institutional computing projects such as IBM's World Community Grid, SETI@home, and LHC@home, which processes data from the Large Hadron Collider, alongside projects developed by students, enthusiasts, mathematicians, researchers, and citizen scientists.

Want to Learn More?

Website: https://gridcoin.us
Discord: https://discord.gg/jf9XX4a
Steemit: https://steemit.com/created/gridcoin
White Paper: https://gridcoin.us/assets/img/whitepaper.pdf

-- GitHub --

Codebase: https://github.com/gridcoin-community/Gridcoin-Research
Community: https://github.com/gridcoin-community


I think it would be great to have a scraper node application process: Fulfill these X criteria to become a candidate scraper.

One thing not covered in the discussion was geographic and hardware diversity for scraper nodes. I'm sure you realise why having all scrapers on AWS in the USA (for example) would be a bad idea; if AWS or the USA was offline for some time, then no superblocks would occur. Clearly this isn't as serious as blockchain full-node diversity, but 5 scrapers feels a little on the light side to me.

I think an enhancement may be to use what you described as the DPoS-type thinking, where, if we had more validated scrapers, a simple round-robin cycle could be used. Perhaps the maximum we should have active is 25, to prevent BOINC projects feeling like they are being DDoSed, and if we have 50 validated scrapers we just take turns in who provides the consensus for each SB.
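To sketch what that rotation might look like (purely hypothetical, nothing like this is implemented in Gridcoin today, and all names are made up), something as simple as this would let a larger pool of validated scrapers take turns:

```python
# Hypothetical sketch of the round-robin idea above; all names are made up.
def scrapers_for_superblock(validated: list, superblock_height: int, active: int = 5) -> list:
    """Pick which window of validated scrapers serves a given superblock."""
    n = len(validated)
    start = (superblock_height * active) % n
    return [validated[(start + i) % n] for i in range(active)]

# With 50 validated scrapers and 5 active at a time, the windows cycle every
# ten superblocks, so each scraper serves one superblock in every ten.
pool = [f"scraper_{i:02d}" for i in range(50)]
print(scrapers_for_superblock(pool, superblock_height=7))
```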

I would probably be happy to operate a scraper node. I'm not the most active in the community, but I've been around Gridcoin since the Classic days, and I've been a BOINCer for over a decade.

"One thing not covered in the discussion was geographic and hardware diversity for scraper nodes. I'm sure you realise why having all scrapers on AWS in the USA (for example) would be a bad idea; if AWS or the USA was offline for some time, then no superblocks would occur. Clearly this isn't as serious as blockchain full-node diversity, but 5 scrapers feels a little on the light side to me."

I can answer this for our current status. As of right now, all 5 scrapers are in unique locations and spread around the globe. At present we have scrapers covering US East, US West, the UK, Germany, and Australia. We also have a diverse group of service platforms hosting the scrapers, with at least 3 unique providers being utilized (to my knowledge). I agree more scrapers would be beneficial. Are there any other geographic zones we need to cover? Ideally we should cover more of Asia, but most node graphs seem to show a low percentage of our userbase in Asia (perhaps VPNs are masking this?).

OK, well, that's good to hear. We shouldn't try to include a geographic location at the expense of another key attribute, but Asia, South America, and Africa would ideally have some representation.

You don't really need that many more. Perhaps 10-15 is really far more than enough. 5 scrapers allow 2 to be down and require 3 to agree to converge; 10 scrapers allow 4 to be down and require 6 to agree to converge. Remember the whole algorithm is designed to essentially make the scrapers transparent. The nodes do not trust individual scrapers. 25 and up really starts to put load on the BOINC servers again, for essentially no benefit.
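Here is the same arithmetic written out, with the 60% convergence threshold from this discussion made explicit (the function name is made up):

```python
# Just the arithmetic from the comment above, using the 60% convergence
# threshold discussed in this thread; the function name is made up.
import math

def convergence_requirement(scraper_count: int, ratio: float = 0.6):
    """How many scrapers must agree, and how many may be offline."""
    required = math.ceil(scraper_count * ratio)
    return required, scraper_count - required

for n in (5, 10, 25):
    required, may_be_down = convergence_requirement(n)
    print(f"{n} scrapers: {required} must agree, {may_be_down} may be down")
# 5 scrapers: 3 must agree, 2 may be down
# 10 scrapers: 6 must agree, 4 may be down
# 25 scrapers: 15 must agree, 10 may be down
```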

We also cannot distribute the Einstein credentials to the entire network, or even to a subset unless those people are validated, as that is not in keeping with the intended use of the credentials. Hence the scrapers will always be run by trusted members of the community.

I am all for having more, to a point. I would be happy for you to run one if you want, @scalextrix.

For many people, myself included, downtime is not the fear. Collusion is the fear. As you state in the recording, this is blockchain. The rule is "trust no node". So if we must trust for now, which we must, trusting 5 people is incredibly scary, but understandable for a boot-strap. Trusting 10-15 is the absolute minimum in my eyes. Trusting 25 would be good. Down the road, as the tech develops, trusting even more would be best.

I would not worry about BOINC server load if we build an ecosystem that encourages BOINC development.

I would not worry about credentials if we encourage or incentivize entities to seek scraper status.

Additionally, the more entities that put reputation at risk by hosting scraping nodes, the fewer scraper nodes are needed to establish reliability. A university department is less likely to collude and risk damaging its reputation than an individual. This also depends on how blockchain law evolves.

This is one of these things we have to be thinking decades ahead on or we'll be building a system that can easily be replaced by one that is better.

Hmm... You are not trusting 5 people. The algorithm is built to cross-verify. Collusion is the real issue. Starting from the position that the scrapers are independent, it requires a bad actor to gain direct control of 60% of the scrapers and publish stats in such a way as to achieve convergence (i.e. matching hash and signature). Because of the hashed and signed nature of the messages, man-in-the-middle attacks will not work (the man in the middle does NOT have the private key to sign the messages properly).
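To illustrate that last point: the sketch below uses Python and Ed25519 keys from the `cryptography` package purely as stand-ins (Gridcoin's actual scraper code is C++ and uses its own key handling), but the principle is the same: a relay that alters a manifest cannot re-sign it.

```python
# Illustration of the signature point only. Gridcoin's scraper code is C++
# and uses its own keys; the Ed25519 keys and the `cryptography` package
# here are stand-ins chosen just to show why a relay cannot forge messages.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

scraper_key = Ed25519PrivateKey.generate()     # private key, held only by the scraper
scraper_pubkey = scraper_key.public_key()      # public key, known to every node

manifest = b'{"project_a": {"cpid_1": 1234.5}}'
signature = scraper_key.sign(manifest)         # scraper signs its published stats

# A man in the middle can alter the bytes in transit...
tampered = manifest.replace(b"1234.5", b"9999.9")

# ...but cannot produce a valid signature for the altered bytes,
# so verifying nodes reject the tampered manifest.
try:
    scraper_pubkey.verify(signature, tampered)
except InvalidSignature:
    print("tampered manifest rejected")

scraper_pubkey.verify(signature, manifest)     # the authentic manifest verifies
print("authentic manifest accepted")
```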

The sole issue here is the probability of non-independence of the nodes. Certainly the chance of "collusion" is higher with a low number of nodes, but getting beyond even 10 or 15, the chance of collusion among 60% of the nodes becomes exceedingly small.
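As a back-of-the-envelope illustration (not a claim about real scraper operators): if each of n scrapers were independently compromised with some probability p, the chance that 60% or more of them are compromised at once falls off quickly as n grows.

```python
# Back-of-the-envelope only: assume (purely for illustration) that each of
# n scrapers is independently compromised with probability p, and ask how
# likely it is that 60% or more are compromised at the same time.
import math

def p_collusion(n: int, p: float, ratio: float = 0.6) -> float:
    k_min = math.ceil(n * ratio)
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))

for n in (5, 10, 15):
    print(f"{n} scrapers: {p_collusion(n, p=0.10):.6f}")
```

With that made-up 10% per-scraper figure, the result drops from roughly 0.9% at 5 scrapers to about 0.015% at 10 and under 0.001% at 15, which is the shape of the argument being made. Real scrapers are of course not independent coin flips, which is exactly why the independence question matters.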

We need to publish a list of the scraper nodes to show people that they are truly independent.

Sorry for the necro-post, but collusion is about trust! Haha, I hear what you're saying. With 5 scraper nodes we are trusting that those 5 (actually just 3 of them) will not collude.

I understand the need for security through privacy, etc. The boot-strap as it stands is fine. Looking forward to building it out!

Hi, yes, I'm certainly not advocating for a lot of scrapers, or, as you say, we would be hurting projects again. My thinking is to perhaps have enough validated scrapers available "waiting in the wings" in case of catastrophe.
Not so much because I think it's a problem having 5, but I have seen enough "decentralize everything" rhetoric thrown at projects.

I'll drop you a PM on Slack; let me know what's required, and if I can comply I will be happy to support the network in this effort.

There is a difference between "decentralize everything" and "this is probably not decentralized enough". Having a core mechanism that can be easily corrupted with minimal effort or cost is fairly dangerous. Look at EOS and other DPoS systems, along with the words of incredibly reputable figures in the bitcoin and blockchain space and how they approach decentralization.

This is a mechanism that should be as decentralized as possible. It does not need to be 100% distributed, however.

To be clear, though, 5 is fine for a boot-strap. It's similar to how bitcoin started.

Scrapers waiting in the wings is a good idea.

I think you hit some hot topics for where I also hope this mechanism goes.

  1. More scrapers
  2. Some sort of rotation system based on some variables like location, OS, hardware, etc.
  3. Transparency and verified identification of scraper operators, so perhaps eventually institutions/entities getting involved. A mix between individuals and entities I think would be best.

I think there are dozens of possibilities for where we can take this mechanism, and I am very excited to see what people come up with: things like the gamification/incentive potential, or maybe second, third, or fourth sets of scraper nodes for collecting additional off-chain data/statistics from other distributed computing platforms (or other things, like solar energy production) to be incentivized.

The round robin of 50 scrapers sounds a lot like combining Brod's DWP proposal with this scraper mechanism, which I think holds potential but would need input from more technically minded people.

For now I think keeping things as they are is the most practical route, as this scraper, while minimal, lets us focus on the other things we need to fix. I see this as a boot-strap of what can become a fairly intricate system with some pretty neat effects. I would not feel comfortable with the mechanism as is if the network grows rapidly or if a significant amount of time passes (for now, let's say a year) without revisiting its operation and the points you brought up.

50 isn't a magic number I think we should reach; I'm just saying that if we had more quality providers than we need, it would be better to share the load than to turn offers away.

Hi all,

Since version 4.0.3 you can't have your data files in a folder whose path contains characters like "á" and the like.
After crying a river, I discovered the root cause.