Gridcoin GPU mining (8): To the Edge and Beyond
So, you have finally set up your powerful, new, multi-GPU rig and you are ready to contribute some serious computing power to BOINC and computational science, storming the Top hosts list, earning a lot of Gridcoins and capturing the vaunted first place in a week or so, with a magnitude of 1000 or more. But alas, after two or three days, you realize that your BOINC credits don't add up, your RAC (Recent Average Credit) is too low and it seems like those Top hosts have 20-30% higher output, although you have already implemented all recommendations from Gridcoin GPU mining (6) and you are already running 24/7.
How is that possible? Worry not, because here is the latest Gridcoin GPU mining installment, backed by almost nine years of experience of running BOINC on GPUs (yes, BOINC had its first GPU applications long before Bitcoin even existed). This time we are going to the very edge of the computing envelope, squeezing out every bit of performance, no holds barred. Needless to say, I am not responsible for any damaged hardware, fried GPUs, overloaded power supplies, blown fuses, huge electric bills, system crashes etc. This article is about maximum performance only, at all costs. So kiss your GPU warranty goodbye and let's go.
The 7970s
I am still running the venerable AMD Tahitis (HD7970/R9 280X). Introduced back in late 2011, Tahitis still rule in affordable FP64 computing power (it's easy to achieve 1.3 TFLOPS in FP64 with Tahitis). In comparison, the newest AMD Vega is estimated to offer only about 0.8 TFLOPS in FP64 - if you need more, you'll have to buy professional graphics cards, which are of course much more expensive. So, the screenshots in this article (and the exact options shown in them) will sometimes apply only to AMD GPUs, but the underlying principles and guidelines are the same for all GPUs (both AMD and Nvidia).
Oldie but goldie - AMD Radeon HD 7970, original cooler, reference design (2012). Very reliable, very overclockable, full voltage control, plenty of excellent waterblocks. One of the best consumer GPUs for BOINC ever made.
1. Add BOINC folders to your antivirus exclusion list
Yes, I am aware this is going to raise many eyebrows right away. But the fact is that BOINC processes look suspicious to many security suites out there, especially on advanced or 'increased security' settings. BOINC and the BOINC projects' processes regularly access the Internet, download tasks, upload results, write to the HDD, and access CPUs, GPUs and their related drivers. Additionally, the BOINC projects' EXE files are often updated (as the science advances and computational algorithms are improved all the time). All this together is more than enough for many security suites to go ape. Many times, I have found BOINC stalled and computations interrupted, with my antivirus software patiently waiting for my decision on something like: "New milkyway_1.46_windows_x86_64_opencl_ati_101.exe is trying to access atikmdag.sys". And a stalled BOINC is really the best case scenario here - crashed drivers, frozen Windows or a nasty BSOD are other, less pleasant scenarios (which I have also experienced). So, add all BOINC folders to your antivirus exclusion list and spare yourself the trouble. But what if BOINC is tricked into executing some malware on my machine? As far as I know, such a thing has never happened so far, but I have to admit it's always a possibility, however remote. But this article is about max performance only - you can't have max performance, max security, max usability and max efficiency at the same time.
Both Gridcoin and BOINC folders are on my Microsoft Security Essentials exclusion list
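If you are not sure which folders to exclude, here is a minimal Python sketch that prints the usual candidates. It assumes default Windows install locations (BOINC program files, the BOINC data directory under ProgramData, and the Gridcoin data folder under your roaming profile); the function name `exclusion_candidates` is just an illustration, and you should adjust the paths if you installed anything elsewhere.

```python
import ntpath
import os


def exclusion_candidates(env=None):
    """Return folders worth adding to an antivirus exclusion list.

    Assumes default install locations; nothing is checked on disk.
    """
    env = os.environ if env is None else env
    program_files = env.get("ProgramFiles", r"C:\Program Files")
    program_data = env.get("ProgramData", r"C:\ProgramData")
    folders = [
        ntpath.join(program_files, "BOINC"),  # BOINC client binaries
        ntpath.join(program_data, "BOINC"),   # data dir: project apps and tasks
    ]
    app_data = env.get("APPDATA")
    if app_data:
        # Assumed default Gridcoin data folder in the roaming profile
        folders.append(ntpath.join(app_data, "GridcoinResearch"))
    return folders


if __name__ == "__main__":
    for folder in exclusion_candidates():
        print(folder)
```

The script only prints paths; adding them to the exclusion list is still done by hand in your security suite.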
2. Disable CrossFire/SLI
If you have more than one GPU in your system, chances are CrossFire/SLI is enabled automatically in your drivers. Disable it now. CrossFire/SLI is known to interfere with BOINC computations, causing computation errors at worst or just reducing the computing performance at best. But what about that newest shooter I am occasionally playing? Disabling CrossFire will seriously reduce my frame rates. Well, you can always enable CrossFire/SLI right before playing and turn it off afterwards. No restart is required (at least for CrossFire, can't be 100% sure for SLI), so it can be done on the fly.
AMD Crimson Radeon Settings: Preferences->Gaming->Global Settings->CrossFire Off
7970s still need CrossFire cables. Keep them connected and you can enable and disable CrossFire at will, through AMD drivers. Those cables are now deprecated; newer video cards just use PCI-E 3.0 for CrossFire communication.
3. Disable low power modes and energy saving stuff
Of course, all modern GPUs have energy saving technologies implemented, power efficiency being an important metric today. You don't need any of that stuff when running BOINC. Such technologies are also known to interfere with BOINC computations, causing BSODs and driver crashes at worst or reducing performance at best (and decreasing your Gridcoin earnings in turn). For 7970s, AMD calls this technology ULPS (Ultra-Low Power State). It is designed to turn off all GPUs, except the primary one which is used for video signal. Naturally, you want ALL your GPUs hard at BOINC work ALL the time, so turn off ULPS now. It's easiest to do it with MSI Afterburner:
MSI Afterburner: Settings->Disable ULPS. A restart is required and ULPS is gone.
Note: when you update your GPU drivers, ULPS will get re-enabled.
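Since a driver update can quietly bring ULPS back, a quick check after every update can save you some lost output. Below is a minimal, read-only Python sketch. It assumes that the driver stores the setting as a registry value named `EnableUlps` under the numbered subkeys of the standard display adapter class key, which is how it is commonly described for AMD drivers of this generation; verify on your own system before relying on it. The function names are my own.

```python
import sys

# Display adapter class key; each adapter is a numbered subkey (0000, 0001, ...).
DISPLAY_CLASS = (r"SYSTEM\CurrentControlSet\Control\Class"
                 r"\{4d36e968-e325-11ce-bfc1-08002be10318}")


def ulps_enabled(values):
    """Given {subkey: EnableUlps value or None}, return subkeys with ULPS on."""
    return sorted(name for name, value in values.items() if value == 1)


def read_enable_ulps():
    """Read EnableUlps for every adapter subkey (Windows only, read-only)."""
    import winreg  # only available on Windows

    found = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, DISPLAY_CLASS) as cls:
        index = 0
        while True:
            try:
                name = winreg.EnumKey(cls, index)
            except OSError:
                break  # no more subkeys
            index += 1
            try:
                with winreg.OpenKey(cls, name) as sub:
                    found[name] = winreg.QueryValueEx(sub, "EnableUlps")[0]
            except OSError:
                found[name] = None  # value missing or subkey not readable
    return found


if __name__ == "__main__" and sys.platform == "win32":
    for key in ulps_enabled(read_enable_ulps()):
        print(f"ULPS is enabled on adapter subkey {key}")
```

The sketch deliberately changes nothing. If it reports an adapter, disable ULPS again in MSI Afterburner and restart.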
4. Overclock and overvolt
OK, now we are definitely getting into 'voiding your warranty' territory. But for maximum performance, a serious overclock is a must. However, even with decent cooling, it's usually very difficult to increase the clocks by more than 5% before running into problems. For serious overclocking, you will have to increase your GPU voltage as well. I would say a 10-15% voltage increase is still within the 'safe zone' and it will enable you to increase the clocks by approximately the same amount (and your BOINC output will follow). Again, MSI Afterburner is probably the best utility to do this:
HD 7970, running at default Core Clock of 1000 MHz and default Core Voltage of 1175 mV. Just move those sliders to the right to overclock and overvolt. I need only 1225 mV to run them at 1150 MHz - 7970s are renowned for their great overclocking potential. Memory clocks are best left alone.
Note: To overvolt, you must first go to MSI Afterburner->Settings and unlock voltage control and voltage monitoring. Some video cards don't support voltage control - in that case, your overclocking headroom is severely limited and the clocks are best left alone.
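To get a feeling for what an overclock plus overvolt costs in power, here is a rough Python sketch. It uses the common rule of thumb that dynamic power scales with clock times voltage squared. That rule is an approximation I am bringing in for illustration (it ignores leakage and temperature), not something measured on these cards, and the function name is hypothetical. The default values are the stock 1000 MHz and 1175 mV from the screenshot caption.

```python
def relative_dynamic_power(clock_mhz, voltage_mv,
                           base_clock_mhz=1000.0, base_voltage_mv=1175.0):
    """Estimate dynamic power relative to stock, assuming P ~ f * V^2."""
    return (clock_mhz / base_clock_mhz) * (voltage_mv / base_voltage_mv) ** 2


# The 1150 MHz / 1225 mV setting mentioned in the caption
ratio = relative_dynamic_power(1150, 1225)
print(f"Estimated dynamic power vs stock: {ratio:.2f}x")
```

Under this approximation, a 15% higher clock with about 4% more voltage already lands at roughly a quarter more dynamic power, which is why the power limit becomes the next bottleneck.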
5. Increasing the power limits
This is only for hard core enthusiasts, willing to take any risks in order to squeeze every bit of performance out of their hardware. The fact is, even if you have applied all these tips so far, your BOINC computations will still be severely throttled. Your overclocked, overvolted GPU running many BOINC tasks in parallel now probably needs 100 watts over its default power draw to fully handle such a huge computational workload. But your GPU's internal power management won't let that happen. Your graphics card would pull close to 300 watts in that case, dangerously exceeding the limits of its thermal design, the power limits of the PCI Express slot and maybe even the limits of your PCIe power connectors. Therefore, the power draw is capped and your computations are throttled. So, it's time to take the final step and to remove this last obstacle in your quest for maximum performance.
Fortunately, all modern GPUs implement technologies for adjusting and controlling the power consumption. For HD 7970s it's called AMD PowerTune, later succeeded by Radeon WattMan (for the RX 400 series). Again, probably the easiest way to adjust this is to use MSI Afterburner:
Move that Power Limit slider all the way to the right. At zero, maximum power draw is 217 watts. At +20%, it's 260 watts. Data taken from VBE7 tool (BIOS Editor for Radeon HD 7000 series cards).
Default max is +20% and it's pre-programmed in your GPU BIOS, so you can't move that slider past +20. But 260 watts still isn't enough for the maximum GPU performance we are looking for - your computations will still be throttled. Therefore, you need to mod the BIOS (using the above-mentioned VBE7 utility) in order to increase the maximum PowerTune value to +50%, which allows for a whopping 326 watts. At +50% we can finally call it a victory - I have never experienced any throttling whatsoever at +50%. In my experience, your BOINC output will increase by at least 20% with this hack; it's indeed the very edge of computing performance. Needless to say, your power consumption will go through the roof and power efficiency will be ruinously low, completely off the charts. But if it's winter at your place, it's better to run BOINC, mine Gridcoins and help science than to waste energy on heating :)
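The wattages quoted here are simply the 217 W base cap scaled by the PowerTune percentage. The short Python sketch below reproduces that arithmetic and compares each cap against a 300 W delivery budget. The budget is my assumption: 75 W from the slot plus 75 W and 150 W from one 6-pin and one 8-pin connector, as on a reference card. Check your own card's connectors, and note that the function name is hypothetical.

```python
BASE_LIMIT_W = 217  # power cap at PowerTune 0%, as read from VBE7

# Assumed delivery budget for a reference card: slot + 6-pin + 8-pin
SPEC_BUDGET_W = 75 + 75 + 150


def power_cap(powertune_percent, base_w=BASE_LIMIT_W):
    """Return the power cap in watts for a given PowerTune offset in percent."""
    return base_w * (100 + powertune_percent) / 100


for pct in (0, 20, 50):
    cap = power_cap(pct)
    note = "over" if cap > SPEC_BUDGET_W else "within"
    print(f"PowerTune {pct:+d}%: {cap:.1f} W ({note} the {SPEC_BUDGET_W} W budget)")
```

The +20% and +50% results (260.4 W and 325.5 W) match the rounded 260 and 326 watts above, and only the +50% setting goes past the assumed connector budget.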
Now, I am not sure if all this will work with Nvidia GPUs, but you can easily try it out for yourself - install MSI Afterburner, increase the Power Limit to the max and run BOINC. Increased temperatures? Fans screaming? BOINC runtimes significantly decreased? It works. Go to Google, find a way to mod the BIOS/drivers/power circuitry/whateverittakes to increase the Power Limit even further. Mission accomplished.
How can I tell if my BOINC GPU computations are throttled? MSI Afterburner again. In my experience, if GPU utilization in MSI Afterburner never shows a full 100%, your computations are throttled and you need to increase your power limits if you want maximum performance.
My Windows 7 tray (I like to have a lot of things in tray). Red numbers = GPU temperatures. White numbers = GPU utilization percentages. Yes, there's some throttling, I reduce the power limits during the summer (otherwise I have to run my a/c all the time).
Warning 1: This PowerTune +50 hack stopped working when I upgraded to Windows 10. I never found out why; I guess AMD probably implemented some additional restrictions in the Win10 drivers, to prevent users from damaging their video cards with such crazy hacks. When I reverted to Win7, I was at +50 again.
Warning 2: Modding the BIOS can brick your video card, if done improperly.
Warning 3: Modding the BIOS will definitely void your warranty.
Warning 4: 300 watts per GPU is nuts. Without decent cooling, your GPU will have a rather short and unhappy life. Water cooling is a must, unless you live in Antarctica.
4 x HD7970, with EK-FC7970 waterblocks, backplates and a Quad Parallel bridge. Don't increase your power limits past +20, if you are air cooling.
Conclusion
Implemented everything? Enjoying your new performance? Receiving messages from other crunchers "Your RAC is very high, how is that possible"? Well done! But our BOINC adventures don't stop here. With your GPUs so extremely overclocked, you will most likely have to deal with another issue - the validity and integrity of your BOINC results. Because, when you crunch for science, only correct and validated results matter and everything else is discarded, with zero BOINC credits and zero Gridcoins earned in return. So, stay tuned for the next installments of Gridcoin GPU mining!
Excellent post! Vortac the GPU God once again showing all of us how it's done. Also, nice job on being the #1 host. Your 4 cards are generating 13% more RAC than 2nd place's 5 cards, that's nuts. Though maybe those 5 cards aren't the 7970s.
Ah, it's summer - off season for serious BOINC crunching and most crunchers are downclocking. During the winter I was able to reach 1.9 million RAC with that machine. Next winter I am aiming for 2 million :)
Ahh yes, that makes sense. It would be nice if there was an easy way to dump your heat outside in the summer.
Something like this, perhaps? I guess some aesthetics can be sacrificed :)
https://cryptocurrencytalk.com/topic/41967-boinc-hardware-porn-thread-show-your-rigs/?do=findComment&comment=221881
:-) that's pretty cool. boinc has taught me that there actually are benefits to living in a cold location, who woulda thought?