Oh wow this took me ages to do. But here I am actually writing the post on it. Honestly I am really tired and a bit depressed so I am, maybe, going to keep the typing to a minimum and just maybe let the results speak for themselves. Actually I will probably type a ton of crap anyway so just bear with me.
I tested my R9 280X, R9 290X, R9 380X, RX 570 and RX 590 in a load of games and a few benchmarks to evaluate how they performed against each other, while limited to 1000 MHz core and 192GB/s memory bandwidth. With some exceptions, more on that in a moment...
Okay this is a pure synthetic test honestly, for example the Tahiti GPU powering R9 280X was never designed to be limited to 192GB/s, so it will underperform when restricted like this. But why do it, you ask? Because this under-performing also helps me evaluate how much more efficient (especially regarding memory bandwidth) the newer chips are. Well, the ones with the same top-level structure.
There are a few notes though. Firstly, R9 290X wasn't tested at 192GB/s, because this would be completely pointless, as the Hawaii GPU has significantly more Render Back-Ends and Compute Units than the other chips. So I tested this card at reference clocks only. Secondly, RX 590 was tested even though it has 4 more CU than the 280X, 380X and 570. I tested this at the same core clock and memory bandwidth as I was curious to see how much difference the extra 256 SP and 16 Texturing Units actually made, nothing more.
I was unable to test the RX 590 at reference clocks because I was having issues keeping it GPU bound in my Test Suite. Yes, my Ryzen 3 1200 is now holding back my test PC. Yes, I could have tested the 590 in the main PC with the 2700X but I am suffering a lot from anxiety and I just got the Radeon VII working and I don't want to remove it again. Okay?
And lastly, I tested the three main comparisons (280X, 380X and 570) at their AMD-specified reference clock rates to measure product performance; this is not a synthetic result. Please keep in mind that even though I think almost all R9 380X cards launched with higher speeds, the AMD reference spec is quite low, especially compared to the 280X. This is what I tested with.
Oh look, I typed a huge amount... Anyway, let's move on to the results after I give you details on the test rig:
Ryzen 3 1200 quad-core CPU @ 3.7 GHz
16GB (2x8GB) 2400 MHz CL14
MSI B350M Mortar
Silverstone Strider 1000W Gold PSU
AMD Radeon Adrenalin Driver Version 19.4.3
Note: As I mentioned above, yes I am aware the Ryzen 3 CPU here is a bit... weak. I mean it was really cheap, but still a good CPU. Remember back before 2017, 4 threads was premium mid-range! (Thanks Intel). At least this one cost what it was worth. Anyway... I am making sure all results are GPU bound by analysing the D3D Usage reported by the Driver and Operating System, and then re-testing any results I suspect with significantly lower CPU clock rates, to verify that the result doesn't change, or at least, not by a huge amount. As you can see, most of my results are not particularly high FPS. That is because I am using settings that emphasise GPU workload. Also, I am looking to get a new CPU for my test PC, probably a Ryzen 5 3600. Please just bear with me until then :D
Secondary Note: Yes, most of the titles I benched were using the built-in benchmark. Why does this not "bother" me? It is because this is not really a measure of their in-game performance. This is a synthetic test of GPU rendering performance under artificial limitations to compare architectures. It doesn't matter where the scene is benched. That said, a couple of games I wanted to test didn't have a benchmark so I benched those in-game with Afterburner.
Tertiary Note?: You will notice I am using low, medium or "normal" textures. That is of course deliberate, as this is a measure of GPU render performance, not VRAM restriction performance. The R9 280X is the weakest link here with "only" 3GB of video memory, and that is actually becoming really tight even at 1080p, let alone 1440p. Textures don't have an enormous impact on rendering performance so the results are still relevant.
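For anyone who wants to do the same kind of sanity check, the re-test idea from the note above can be written down as a tiny helper. This is only a sketch of the logic, not the exact procedure used for these results: the function name, the 5% tolerance and the FPS numbers in the example are all made up for illustration.

```python
def looks_gpu_bound(fps_stock_cpu: float, fps_slow_cpu: float,
                    tolerance: float = 0.05) -> bool:
    """Return True if slowing the CPU down barely changed the result.

    fps_stock_cpu: average FPS with the CPU at its normal clock.
    fps_slow_cpu:  average FPS with the CPU clocked significantly lower.
    tolerance:     largest relative FPS drop still treated as "no real change"
                   (0.05 = 5%, an arbitrary value chosen for this sketch).
    """
    relative_drop = (fps_stock_cpu - fps_slow_cpu) / fps_stock_cpu
    return relative_drop <= tolerance


# Hypothetical numbers, not measurements from this post:
print(looks_gpu_bound(60.0, 59.0))  # True: FPS barely moved, so the GPU is the limit
print(looks_gpu_bound(60.0, 48.0))  # False: FPS fell with the CPU clock, so CPU bound
```

If a result fails this kind of check, the fix is heavier GPU settings (or a faster CPU), then another re-test.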
The first graph results are with all listed GPUs at 1000 MHz core speed, and memory data-rate adjusted to provide 192GB/s. For the 256-bit cards this is 6Gbps; for the 280X, which has a 384-bit memory interface, this is 4Gbps. This is a synthetic test.
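If you want to check the bandwidth numbers yourself, peak memory bandwidth is just bus width times per-pin data rate, divided by 8 to go from bits to bytes. A minimal sketch (the function names are mine, just for illustration):

```python
def bandwidth_gb_per_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s from bus width (bits) and per-pin data rate (Gbps)."""
    return bus_width_bits * data_rate_gbps / 8  # 8 bits per byte


def data_rate_for_target(bus_width_bits: int, target_gb_per_s: float) -> float:
    """Per-pin data rate (Gbps) needed to hit a target bandwidth on a given bus width."""
    return target_gb_per_s * 8 / bus_width_bits


print(bandwidth_gb_per_s(256, 6.0))    # 192.0 -> the 256-bit cards at 6Gbps
print(bandwidth_gb_per_s(384, 4.0))    # 192.0 -> the 280X at 4Gbps
print(data_rate_for_target(384, 192))  # 4.0   -> data rate the 384-bit bus needs for 192GB/s
```

The same formula gives 288GB/s for a 384-bit bus at 6Gbps.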
The Second graph includes the cards, with 590 replaced with 290X, at reference speeds to compare the GPUs at their "normal" operating frequency. This is less of a synthetic test.
Sleeping Dogs gains almost 14% from GCN1 to GCN3, when bandwidth and clock are normalised. Tonga does have significantly better bandwidth efficiency due to the ability to compress frame-data on the fly, reducing bandwidth requirements in any given scene. Obviously, you are also seeing major Tessellation and primitive-rate gains thanks to doubled (and improved per clock) Geometry and Raster Engines.
RX 570 gains a very small amount, under 2%. This result was repeatable and measurable, but it is within the margin of error so make of it what you will. I think Polaris is running into another limitation here, likely Compute throughput. I say that because with the 590 having the full 36 CU enabled silicon, we see an 11.2% increase in performance. That is almost linear scaling with the 12.5% more stream processors this GPU has enabled. So Tahiti was unable to feed its compute units due to, likely, lack of memory bandwidth in this test, but Polaris can feed more CU due to its larger L2 cache and DCC. Gains!
Well, that sort of backs up my theory that Tahiti is bandwidth limited here. Once Tahiti's memory speeds are unrestrained, and the chip can pump 288GB/s, it actually manages to essentially match Tonga's result in the first test. This shows that Tonga does have major bandwidth efficiency gains. Oh, and it's interesting that at AMD's reference spec, the 380X is actually slower than the 280X, but not by much. The 290X here leads the pack with its comparatively enormous Compute and bandwidth on tap, but the un-restrained RX 570 does get within 10%, at likely half the power consumption (but I didn't test power consumption. Yet).
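To put a number on "almost linear", you can divide the performance gain by the gain in Compute Units. A quick sketch using the figures above (the function name is just for illustration):

```python
def scaling_efficiency(perf_gain_pct: float, resource_gain_pct: float) -> float:
    """Fraction of the extra hardware that turned into extra performance (1.0 = perfectly linear)."""
    return perf_gain_pct / resource_gain_pct


# 36 CU on the RX 590 vs 32 CU on the RX 570, same clocks and bandwidth.
cu_gain_pct = (36 - 32) / 32 * 100  # 12.5
efficiency = scaling_efficiency(11.2, cu_gain_pct)

print(cu_gain_pct)          # 12.5
print(f"{efficiency:.1%}")  # 89.6%
```

So roughly nine tenths of the extra Compute Units show up as extra frame-rate in this test.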
Batman: Arkham Knight
Nothing much to report here: small but definitely noticeable gains between the 3 major comparisons. And the 590 is added to see what the full Polaris silicon can do at these speeds, too.
But once we set all GPUs to their "reference specification" clock rates, the situation changes a bit. The 280X and 380X are equal in performance, but this is okay in my books for gains, if you consider that the Tonga chip is working with significantly less memory bandwidth and slightly reduced engine clocks to achieve that result. Those are gains, but the consumer wouldn't really notice them. I always considered the 380X to be a sort of replacement for the 280X, and indeed, it was $70 cheaper. So consider it the same performance, at a lower price, with higher efficiency and a gigabyte more video memory. RX 570 almost matches the 290X here. I guess Gameworks tessellation effects are less intensive on the Polaris silicon, with its in-hardware Primitive Discard and significantly higher core clocks.
Pretty linear scaling here with these GPUs. Tonga gains 15% over Tahiti at the same clocks and bandwidth, a decent jump. Tonga to Polaris is half as much, at just over 7%, but this is also a decent gain as both these chips feature the same Geometry Front-end. Remember that Polaris has a larger L2 Cache, over twice the size (2MB vs 768KB), and in-hardware Primitive Discard, along with superior DCC algorithms. The extra 4 CU on the 590 provide it with 8.5% higher performance here. That is fairly close to the % gain of extra CU so the scaling is reasonable.
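One thing to keep in mind when reading these generational numbers is that percentage gains multiply rather than add. A small sketch using the two rounded figures above (15% and roughly 7%); the helper name is mine:

```python
def compound_gain_pct(*gains_pct: float) -> float:
    """Combine successive percentage gains into one overall percentage gain."""
    factor = 1.0
    for gain in gains_pct:
        factor *= 1 + gain / 100  # each step multiplies the previous result
    return (factor - 1) * 100


# Tahiti -> Tonga (15%), then Tonga -> Polaris (about 7%), same clocks and bandwidth.
total = compound_gain_pct(15, 7)
print(f"{total:.2f}%")  # 23.05%
```

So the two steps together work out to about 23% from Tahiti to Polaris at the same clocks and bandwidth, a little more than just adding 15 and 7.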
Things are different now the GPUs are unrestrained. The 280X shows yet again that it can match or beat the 380X, simply by brute-forcing the bandwidth equation. 570 and 290X are evenly matched, decent gains for Polaris from architecture and of course raw clock speed.
In HITMAN we see a larger gain from GCN1 to GCN3, and a smaller one to GCN4, like we notice in other games. Of course, actually adding twice the physical hardware logic is the best approach, right? Also, the additional 256 stream processors and 16 texture units on the RX 590's GPU provide 9.6% more performance here.
It is worth noting here, that even with the Tonga chip in the 380X operating at its lower, AMD "reference specification", it can still provide more performance, despite the Tahiti GPU in R9 280X having 58% more raw memory bandwidth (and 3% higher GPU engine clock). This game is using a DX12 renderer, so maybe it has something to do with Tonga's superior hardware-feature support for DX12 (12.0 vs 11.2, Tahiti supports DX12 in API/Software only). The 290X's bigger Hawaii GPU muscles to the top here, but only leading over the Polaris 20 (PRO) chip RX 570 by 7.7%. Polaris's higher raw clock rate and architecture improvements are in action.
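For reference, here is where the 58% and 3% figures come from. The reference specs in this sketch are not listed anywhere in this post, so treat them as my assumption of AMD's published numbers: 280X at 1000 MHz with 6Gbps memory on a 384-bit bus, 380X at 970 MHz with 5.7Gbps memory on a 256-bit bus.

```python
def bandwidth_gb_per_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s from bus width (bits) and per-pin data rate (Gbps)."""
    return bus_width_bits * data_rate_gbps / 8


def advantage_pct(a: float, b: float) -> float:
    """How much larger a is than b, in percent."""
    return (a / b - 1) * 100


# Assumed reference specs (not stated in the post itself).
bw_280x = bandwidth_gb_per_s(384, 6.0)  # 288.0 GB/s
bw_380x = bandwidth_gb_per_s(256, 5.7)  # about 182.4 GB/s

bandwidth_adv = advantage_pct(bw_280x, bw_380x)
clock_adv = advantage_pct(1000, 970)

print(f"{bandwidth_adv:.1f}% more bandwidth")  # 57.9% more bandwidth
print(f"{clock_adv:.1f}% higher clock")        # 3.1% higher clock
```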
Metro 2033 Redux
One of my favourite games! In Metro 2033 Redux, there are nice and solid gains for each card over the previous. Nothing really noteworthy to point out but as you can see between the 3 GPUs with 32 CU (2048 SP, 128 TMU) and 32 ROP, there is a 21% "IPC" increase between GCN1 and GCN4, from various architectural and other improvements to the non compute/render parts of the GPU. AMD hasn't been doing nothing all this time apparently! (But I knew that already).
The actual product gains here are much smaller between 380X and 280X, as we saw before also. But I am looking at it architecturally, and Tonga is for sure more efficient whatever way you look at it. RX 570 pulls far ahead as expected from that 24% increase in clock rate. R9 290X actually strikes a draw with the RX 570 here. Metro 2033 Redux is quite heavy on the geometry/tessellation aspect so that is likely playing a part in that.
Middle-Earth: Shadow of Mordor
Here is Shadow of Mordor. I do think this game is bandwidth limited on the Tahiti, as we have seen before. You'll see what I mean in a moment...
A common occurrence is for the R9 280X, when using its reference memory speed, to match or beat the R9 380X even during the first test at 1 GHz and 192GB/s. This really does show that when we artificially limit Tahiti to 192GB/s it is actually bandwidth starved in many frames. And also that Tonga is significantly more efficient when working with just that memory bandwidth.
Rise of the Tomb Raider
Tahiti again showing either major geometry or bandwidth limitation. Probably both. Also this is DX12 so keep that inferior Hardware support in mind (it can make a difference). Otherwise, Polaris needs those four extra Compute Units to really make a difference over Tonga at the same speeds, but there is at least a small gain at 32 CU for GCN4.
Okay so from this we can see that even with significantly more memory bandwidth, Tahiti is still eclipsed by Tonga, and the latter is actually running at a lower core clock, too. Geometry performance is potentially important here. Either way, the R9 290X can muscle through the frames (or parts of the frame) that require comparatively less geometry performance, and more Compute, and produces a higher overall frame-rate because of that.