RE: Call for more tools on the testnet!
The first and most important step is to establish a CI/CD system; everything else can wait. We do NOT have a CI/CD system, and unfortunately no one seems able to tackle this. Effort is getting muddled between static code analysis with a few tools and plain old nightly builds. None of the tools matter, or will ever work, without a build and deployment system in place for at least one asset/codebase, say STEEM.
A common, simple CI/CD pipeline of the kind used since the 2000s looks like the following:
https://gitlab.com/SteemCommunity/tinman/pipelines
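As a rough illustration of the gating such a pipeline enforces, here is a minimal Python sketch. It assumes three hypothetical stages (build, test, deploy) modeled as plain callables, not the actual STEEM build scripts or GitLab CI jobs. Each stage runs only if the previous one passed, and everything after a failure is skipped, so a broken build never reaches deployment.

```python
from typing import Callable, List, Tuple

# A stage is a (name, step) pair; step() returns True on success.
# The stage names used below are hypothetical placeholders.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order, stopping at the first failure and skipping the rest."""
    results: List[Tuple[str, str]] = []
    failed = False
    for name, step in stages:
        if failed:
            # An earlier stage failed, so this one never runs.
            results.append((name, "skipped"))
            continue
        ok = step()
        results.append((name, "passed" if ok else "failed"))
        failed = not ok
    return results


if __name__ == "__main__":
    # Hypothetical nightly run in which the test stage fails.
    nightly = [
        ("build", lambda: True),
        ("test", lambda: False),
        ("deploy", lambda: True),
    ]
    for name, status in run_pipeline(nightly):
        print(f"{name}: {status}")
```

In a real nightly setup, each callable would wrap the project's own build, test and deploy commands on a schedule; the sketch only models the ordering and stop-on-failure behavior between stages.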
List of tools:
Tinman : http://github.com/steemit/tinman
Jussi : https://github.com/steemit/jussi
Account creation Faucet : https://github.com/steemit/faucet
Once again, the single most important item is an automatic build system, which I have mentioned again and again :-)

There are actually quite extensive automated tests built into the Steem blockchain code, and they run automatically each time a pull request is merged into master. You can see this happening if you look at the PRs; it shows up as Jenkins-CI.
It would be great if more people did audits/reviews of these tests, and even submitted PRs if they noticed things were lacking.
Regarding CD, this is a bit more challenging with the way that the blockchain deployment is structured, as many changes require a hardfork. Even non-hardfork changes often require a lot of coordination to roll out. I'm sure there are some improvements that could be made here even within the constraints we have. Specific recommendations would be good things to post about.
Agreed, but this is not a modern CI. This is just testing.
The tests run on the STEEM codebase via Jenkins can be seen here: https://github.com/steemit/steem/tree/master/ciscripts
and builds can be triggered automatically like this: https://gitlab.com/SteemCommunity/steem/pipelines/33379067/builds
For the faucet, it's CircleCI: https://github.com/steemit/faucet/tree/master/.circleci
The point is, these tests are not enough.
What exactly makes it challenging?
I would encourage you to write GitHub issues for specific test cases that the existing tests do not cover.
Hardfork deployment requires a lot of coordination with exchanges. As can be seen with HF20, there are several major exchanges that still have not gotten back online after needing to upgrade.
Even non-consensus changes (such as 20.5) often require exchanges to upgrade and replay.
The TL;DR is that blockchain deployments, which succeed only with coordination among many different decentralized parties, are not well suited to frequent small deployments.
List of Challenges in deploying TESTNETs for HF