Some GCN GPUs in 16 games and 4 benchmarks
Oh wow this took me ages to do. But here I am actually writing the post on it. Honestly I am really tired and a bit depressed so I am, maybe, going to keep the typing to a minimum and just maybe let the results speak for themselves. Actually I will probably type a ton of crap anyway so just bear with me.
I tested my R9 280X, R9 290X, R9 380X, RX 570 and RX 590 in a load of games and a few benchmarks to evaluate how they performed against each other, while limited to 1000 MHz core and 192GB/s memory bandwidth. With some exceptions, more on that in a moment...
Okay, this is a purely synthetic test, honestly. For example, the Tahiti GPU powering the R9 280X was never designed to be limited to 192GB/s, so it will underperform when restricted like this. But why do it, you ask? Because this under-performing also helps me evaluate how much more efficient (especially regarding memory bandwidth) the newer chips are. Well, the ones with the same top-level structure.
There are a few notes though. Firstly, R9 290X wasn't tested at 192GB/s, because this would be completely pointless, as the Hawaii GPU has significantly more Render Back-Ends and Compute Units than the other chips. So I tested this card at reference clocks only. Secondly, RX 590 was tested even though it has 4 more CU than the 280X, 380X and 570. I tested this at the same core clock and memory bandwidth as I was curious to see how much difference the extra 256 SP and 16 Texturing Units actually made, nothing more.
I was unable to test the RX 590 at reference clocks because I was having issues keeping it GPU bound in my Test Suite. Yes, my Ryzen 3 1200 is now holding back my test PC. Yes, I could have tested the 590 in the main PC with the 2700X, but I am suffering a lot from anxiety, I just got the Radeon VII working, and I don't want to remove it again. Okay?
And lastly, I tested the three main comparisons (280X, 380X and 570) at their AMD-specified reference clock rates to measure product performance; this is not a synthetic result. Please keep in mind that even though I think almost all R9 380X cards launched with higher speeds, the AMD reference spec is quite low, especially compared to the 280X. This is what I tested with.
Oh look, I typed a huge amount... Anyway, let's move on to the results after I give you details on the test rig:
Ryzen 3 1200 quad-core CPU @ 3.7 GHz
16GB (2x8GB) 2400 MHz CL14
MSI B350M Mortar
Silverstone Strider 1000W Gold PSU
AMD Radeon Adrenalin Driver Version 19.4.3
Note: As I mentioned above, yes, I am aware the Ryzen 3 CPU here is a bit... weak. I mean, it was really cheap, but it is still a good CPU. Remember, before 2017, 4 threads was premium mid-range! (Thanks Intel). At least this one cost what it was worth. Anyway... I am making sure all results are GPU bound by analysing the D3D usage reported by the driver and operating system, and then re-testing any results I am suspicious of at significantly lower CPU clock rates to verify that the result doesn't change, or at least not by a huge amount. As you can see, most of my results are not particularly high FPS. That is because I am using settings that emphasise GPU workload. Also, I am looking to get a new CPU for my test PC, probably a Ryzen 5 3600. Please just bear with me until then :D
Secondary Note: Yes, most of the titles I benched were using the built-in benchmark. Why does this not "bother" me? Because this is not really a measure of in-game performance. This is a synthetic test of GPU rendering performance under artificial limitations to compare architectures. It doesn't matter where the scene is benched. That said, a couple of games I wanted to test didn't have a benchmark, so I benched those in-game with Afterburner.
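The re-test check described in the note above can be written down as a small sketch. This is only an illustration of the idea, not the exact procedure used for these results: the function name, the 3% tolerance and the FPS figures in the usage lines are all assumptions made up for the example.

```python
def looks_gpu_bound(fps_stock_cpu, fps_slow_cpu, tolerance=0.03):
    """Return True if slowing the CPU barely changed the average FPS.

    fps_stock_cpu: average FPS with the CPU at its normal clock.
    fps_slow_cpu:  average FPS for the same run with the CPU clocked lower.
    tolerance:     allowed relative change (0.03 = 3%, an assumed value).
    """
    if fps_stock_cpu <= 0:
        raise ValueError("fps_stock_cpu must be positive")
    relative_change = abs(fps_stock_cpu - fps_slow_cpu) / fps_stock_cpu
    return relative_change <= tolerance


# Made-up numbers, purely for illustration (not measurements from this post).
print(looks_gpu_bound(60.0, 59.4))  # small change: result looks GPU bound
print(looks_gpu_bound(60.0, 51.0))  # big drop: the CPU was holding it back
```

If the second kind of result shows up, the settings need to push more work onto the GPU before the number is worth comparing.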
Tertiary Note?: You will notice I am using Low, medium or "normal" textures. That is of course deliberate, as this is a measure of GPU render performance, not VRAM restriction performance. The R9 280X is the weakest link here with "only" 3GB of video memory, and that is actually becoming really tight even at 1080p, let alone 1440p. Textures don't have an enormous impact on rendering performance so the results are still relevant.
The first graph results are with all listed GPUs at 1000 MHz core speed, and memory data-rate adjusted to provide 192GB/s. For the 256-bit cards this is 6Gbps; for the 280X, which has a 384-bit memory interface, this is 4Gbps. This is the synthetic test.
The second graph shows the cards at reference speeds, with the 590 replaced by the 290X, to compare the GPUs at their "normal" operating frequency. This is less of a synthetic test.
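For anyone wanting to check the data rates, the arithmetic is just bus width times per-pin data rate, divided by 8 bits per byte. A minimal sketch follows; the function names are made up for this example.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s.

    bus_width_bits: memory interface width in bits (e.g. 256 or 384).
    data_rate_gbps: effective data rate per pin in Gbit/s.
    """
    return bus_width_bits * data_rate_gbps / 8


def data_rate_for(bus_width_bits, target_gb_s):
    """Per-pin data rate (Gbit/s) needed to hit a target bandwidth in GB/s."""
    return target_gb_s * 8 / bus_width_bits


print(bandwidth_gb_s(256, 6))   # 256-bit cards at 6Gbps   -> 192.0
print(bandwidth_gb_s(384, 4))   # 280X (384-bit) at 4Gbps  -> 192.0
print(data_rate_for(384, 192))  # rate the 280X needs      -> 4.0
```

The same formula gives the 280X its unrestrained 288GB/s at 6Gbps on the 384-bit bus.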
Sleeping Dogs gains almost 14% from GCN1 to GCN3 when bandwidth and clock are normalised. Tonga does have significantly better bandwidth efficiency due to the ability to compress frame-data on the fly, reducing bandwidth requirements in any given scene. Obviously, you are also seeing major Tessellation and primitive-rate gains thanks to doubled (and improved per clock) Geometry and Raster Engines.
RX 570 gains a very small amount, under 2%. This result was repeatable and measurable, but it is within margin of error, so make of it what you will. I think Polaris is running into another limitation here, likely compute throughput. I say that because with the 590 having the full 36 CU silicon enabled, we see an 11.2% increase in performance. That is almost linear scaling with the 12.5% more stream processors this GPU has enabled. So Tahiti was unable to feed its compute units, likely due to a lack of memory bandwidth in this test, but Polaris can feed more CUs thanks to its larger L2 cache and DCC. Gains!
Well, that sort of backs up my theory that Tahiti is bandwidth limited here. Once Tahiti's memory speeds are unrestrained, and the chip can pump 288GB/s, it actually manages to essentially match Tonga's result in the first test. This shows that Tonga does have major bandwidth efficiency gains. Oh, and it's interesting that at AMD's reference spec, the 380X is actually slower than the 280X, but not by much. The 290X leads the pack here with its comparatively enormous compute and bandwidth on tap, but the unrestrained RX 570 does get within 10%, at likely half the power consumption (but I didn't test power consumption. Yet).
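To put a number on "almost linear", the performance gain can be divided by the gain in compute units. A quick sketch, assuming the 32 CU (RX 570) versus 36 CU (RX 590) configurations and the 11.2% figure from above; the function name is made up for this example.

```python
def scaling_efficiency(perf_gain_pct, resource_gain_pct):
    """Fraction of the extra hardware that turned into extra performance.

    1.0 would be perfectly linear scaling; lower means something else
    (bandwidth, geometry, CPU) is limiting the result.
    """
    return perf_gain_pct / resource_gain_pct


cu_gain_pct = (36 / 32 - 1) * 100             # 12.5% more CUs on the 590
efficiency = scaling_efficiency(11.2, cu_gain_pct)
print(f"CU gain: {cu_gain_pct:.1f}%")         # CU gain: 12.5%
print(f"Scaling efficiency: {efficiency:.1%}")  # Scaling efficiency: 89.6%
```

So roughly nine tenths of the extra CUs show up as extra frames in this test.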
Batman: Arkham Knight
Nothing much to report here: small but definitely noticeable gains between the 3 major comparisons. The 590 is added to see what the full Polaris silicon can do at these speeds, too.
But once we set all GPUs to their "reference specification" clock rates, the situation changes a bit. The 280X and 380X are equal in performance, but that still counts as a gain in my book if you consider that the Tonga chip is working with significantly less memory bandwidth and slightly reduced engine clocks to achieve that result. Those are gains, but the consumer wouldn't really notice them. I always considered the 380X to be a sort of replacement for the 280X, and indeed, it was $70 cheaper. So consider it the same performance for a lower price, with higher efficiency and a gigabyte more video memory. RX 570 almost matches the 290X here. I guess Gameworks tessellation effects are less intensive on the Polaris silicon, with its in-hardware Primitive Discard and significantly higher core clocks.
Pretty linear scaling here with these GPUs. Tonga gains 15% over Tahiti at the same clocks and bandwidth, a decent jump. Tonga to Polaris is half as much, at just over 7%, but this is also a decent gain as both these chips feature the same Geometry Front-end. Remember that Polaris has a larger L2 Cache, over twice the size (2MB vs 768KB), and in-hardware Primitive Discard, along with superior DCC algorithms. The extra 4 CU on the 590 provide it with 8.5% higher performance here. That is not far off the 12.5% gain in CU count, so the scaling is reasonable.