Mini-Tech Babble #12: Little Navi
Updated: Aug 15, 2020
For awesome-quality Navi 14 die shots, check out Fritzchen Fritz's work here, and Nemez's awesome work annotating the die which you can find here.
This is a pure Babble from my inner self. It might not even make sense at some points. Warning: contains typos. I'll fix them later. Maybe.
UPDATE 15-08-2020: Please read my Knowledge Update on RDNA's Primitive Shaders!
The little Guy
Nobody paid much attention to Little Navi. Well, at least not as much as they did to his bigger Brother, Middle-sized Navi (Navi 10) that first debuted in the RX 5700 XT. People are obsessed with the high performance, large-die size, performance-tier pushing parts that usually cost a fortune too. And that's okay, I love those too. As a tech enthusiast, it's awesome to see the boundaries of computing get pushed forward with each new architecture.
However, we must remember the Little Guys. I am talking about GPUs like Polaris (10/11) and Navi 14, and before those, the likes of Pitcairn, the Immortal GPU. It is the smaller, 'entry-level' processors that enable low-cost access to the latest technology, and it is they that enable everyone - regardless of budget - to access it.
Now, you may already be aware that I hold Polaris in very high esteem, much like I do Ryzen, especially with how they have enabled people on low incomes to break down the barrier to computing and PC gaming. Indeed, I am a big proponent of high value technology products because I feel they are the most progressive for the vast majority of the population.
That brings me to Navi 14.
But Sash, RX 5500 XT was kinda 'meh' on launch and the value wasn't even that great. You're an idiot, why are you typing this crap?
I am fully aware of that. But the purpose of this post is to talk about a GPU that often doesn't get a lot of attention, and well, I have one as my main GPU and I just wanted to talk about it. So you can just deal with that. CHUMP! That was a joke. :3
After my Polaris 30-based RX 590 burped and I decided to retire him permanently, I found myself using his somewhat ill-fated (value position) replacement, the RX 5500 XT, based on the tiny - and somewhat adorable - Navi 14 graphics processor.

Navi 14 is a bit different from the GPU that it 'replaces' in terms of performance level, and it actually is more of the successor for Polaris 10/20/30's little brother, Polaris 11/21 - which featured in the RX 460 and 560 cards. I actually touched on the subject of GPU succession when some dumb people threw their toys out of the pram complaining that RDNA (Navi) is crap because RX 5700 XT wasn't 'That much faster' than Vega 64.
I mean, RX 5700 XT's Navi 10 processor is designed to replace Vega 64, not succeed it. The job of succeeding the relatively fat (~500 mm²) Vega 10 will be down to the so-called 'Big Navi' that everyone is hyped about for the next few months. Anyway, on the subject of Navi 14, this little guy is actually most likely intended to succeed the Polaris 11 processor - also known as 'Baffin' - which serves on the RX 460 graphics card.
So in effect, what we have seen with RDNA is a pretty huge jump in performance, so much so that AMD has been able to push an entire tier of GPU up a notch - much like Nvidia did with the GK104-based GTX 680 in 2012. Navi 14; RX 5500 XT; for all intents and purposes, is an RX 660 at the same performance level as an RX 580.

Technical details of Little Navi
We really have to dive into the details of this little chip and compare it to its predecessors to get an understanding of the sort of market this little processor was built for. The post I mentioned above on GPU succession has a nice little layout of specifications that you can read; it is relevant to this subject.
Anyway, Navi 14 is a very small processor with design choices that make the chip cheaper to produce and improve yields, maximising margins in a market that already has very low margins (entry-level). Despite these constraints, Navi 14 achieves a full performance tier gain over GCN, being able to provide RX 580-like performance with fewer stream processors (though interestingly, more transistors - increasing clock speeds and upgrading internal caches eats into the transistor budget, along with new features such as the video engine and my belief that an RDNA WGP [2x CU] has significantly more transistors than two GCN CUs, additional Scalar unit, bigger caches, wow this bracket sentence is huge. It's 5.7b transistors for Polaris 10 and 6.4b for Navi 14, by the way, Navi 14 has 200m more transistors than even Hawaii [R9 290X]), half the memory interface width (GDDR6 memory helps), half the PCI-E lanes (4.0!) and only two primitive output units.
Bus Width to Graphics Memory
The memory interface on this GPU is only 128-bits wide. That is to say, it only needs four GDDR chips to occupy the interface to its fullest implementation - as GDDR memory chips have up to 32-bit wide access granularity, or 16-bit when a card is configured in 'clamshell mode'. The very fact that the bus is so narrow is a big indicator that this GPU was built to be cost effective - less complex PCBs due to fewer traces and fewer memory chips. A smaller bus also uses less power, and the physical connections (PHYs) on the chip occupy less space: about half as much space, as you might expect, as the 256-bit connection on Navi 10, but maybe a bit more than half the space of Polaris 10's, assuming we normalise for the process density. That is because I believe GDDR6 PHYs are slightly larger on chip than GDDR5 ones.
This contrasts with the RX 580's 256-bit interface, and is likely helping to offset the added cost of using GDDR6 instead of GDDR5. Obviously, I have to point out that the data-rate on Navi 14's standard GDDR6 rating - 14 Gbps - is significantly greater than the standard of 8 Gbps on the GDDR5 for Polaris 10. Almost twice the signals per second means that despite the bus being 50% the width, the effective data bandwidth is almost the same - netting all those space savings in the process.
I said almost; Navi 14 with 14 Gbps GDDR6 over a 128-bit bus produces a theoretical peak raw memory bandwidth of 224 GB/s. Polaris 10, with its 256-bit interface and 8 Gbps GDDR5, produces 256 GB/s in raw bandwidth; 32 GB/s more than Little Navi.
However, this brings me to the little extra section tacked on under the Graphics Memory section.
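If you want to check my maths, here is a quick sketch of the arithmetic in Python. The function names are just ones I made up for illustration; the formula is simply bus width (bits) times per-pin data rate (Gbps), divided by 8 bits per byte, and the chip count is bus width divided by the width each chip presents.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s: pins * Gbps per pin / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


def chips_to_fill_bus(bus_width_bits: int, bits_per_chip: int = 32) -> int:
    """Number of memory chips needed to populate the whole interface."""
    return bus_width_bits // bits_per_chip


navi14 = peak_bandwidth_gb_s(128, 14)    # 128-bit, 14 Gbps GDDR6
polaris10 = peak_bandwidth_gb_s(256, 8)  # 256-bit, 8 Gbps GDDR5

print(f"Navi 14:    {navi14:.0f} GB/s")
print(f"Polaris 10: {polaris10:.0f} GB/s")
print(f"Difference: {polaris10 - navi14:.0f} GB/s")
print(f"Chips on a 128-bit bus (32-bit each): {chips_to_fill_bus(128)}")
print(f"Chips on a 128-bit bus (clamshell, 16-bit each): {chips_to_fill_bus(128, 16)}")
```

These are raw peak figures only; they say nothing about compression or caching, which is where things get interesting.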
Navi 14 has new tricks to improve bandwidth efficiency and I wish I had more Cache.
Since GCN3 (Tonga/Fiji) AMD has followed Nvidia and implemented a lossless compression technology on their GPUs that essentially tries to minimise the amount of raw colour data sent to memory by grouping similar bits of data together - essentially compressing them and reducing the overall bits of data sent to memory. This allows additional traffic to occupy that saved space - increasing effective bandwidth.
This technology is in its 2nd generation on Polaris; it will almost certainly have been upgraded to a 3rd generation on RDNA1 (Navi 10 & 14) GPUs. Since this compression technology is baked into the silicon logic, upgrades cannot be back-ported to older chips; only newer ones will feature the improvements and upgrades.
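To give a feel for the general idea, here is a toy sketch in Python. To be very clear: this is NOT AMD's actual algorithm, which I don't have the details of. It is a made-up base-plus-delta scheme (all names and tile values are hypothetical) that just shows why similar neighbouring colour values take fewer bits to store without losing anything.

```python
def encode_tile(tile):
    """Toy lossless encoding: one base value plus small fixed-width deltas."""
    base = min(tile)
    deltas = [value - base for value in tile]
    delta_width = max(deltas).bit_length()  # 0 bits if every value is identical
    return base, delta_width, deltas


def decode_tile(base, deltas):
    """Exact inverse of encode_tile: nothing is lost."""
    return [base + delta for delta in deltas]


def stored_bits(tile, bits_per_value=8):
    """Bits written to memory: the smaller of raw and encoded size.

    Assumption: the flag/metadata saying which form was used is ignored here.
    """
    raw = bits_per_value * len(tile)
    _, delta_width, deltas = encode_tile(tile)
    encoded = bits_per_value + delta_width * len(deltas)
    return min(raw, encoded)


smooth = [200, 201, 201, 202, 200, 203, 202, 201]  # e.g. a bit of sky
noisy = [3, 250, 17, 128, 99, 240, 5, 180]         # e.g. a very busy texture

for name, tile in (("smooth", smooth), ("noisy", noisy)):
    base, _, deltas = encode_tile(tile)
    assert decode_tile(base, deltas) == tile  # lossless round trip
    print(f"{name}: {8 * len(tile)} raw bits -> {stored_bits(tile)} stored bits")
```

The smooth tile shrinks a lot while the noisy one falls back to raw, which is the basic reason the saving depends so heavily on what is being rendered.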
Polaris also did something very special for bandwidth efficiency: More L2 Cache. A big trend in more modern processor designs is increasing cache sizes and performance; the more data you can keep on-chip, the better performance is going to be; and the lower power consumption will be, because going off-die to an external memory uses a pretty significant amount more energy than accessing an internal cache.
Polaris 10 increased the L2 cache size for the 'mid-tier' GPU from 768KiB on Tonga to 2MiB: this allows more memory accesses to remain on-chip and thus frees up more bandwidth for use by heavier memory operations such as those often performed by the Render Back-Ends (ROPs). Now, Navi 14 also has a 2MiB L2 cache - but before you bite my head off and shout that this isn't an improvement, remember that Navi 14 replaces Polaris 11 (RX 460), which had just 1MiB of L2 cache - that is a 2X increase for this GPU 'class' (Little ones).
But it's not just L2 that got upgraded on Navi-based GPUs. Little Navi, just like Middle-sized Navi, features an all-new 'L1' cache that sits between the L0 cache within the Compute Units (which itself sits above the register files) and the L2 cache, essentially absorbing smaller requests so that they never need to access the L2 cache at all. This alone allows Navi 14's L2 cache to be used for heavier transactions in memory more often than Polaris's and means efficiency of bandwidth increases even more.
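Here is a little toy model in Python of what that extra level buys you. It is only a sketch under my own assumptions: simple LRU caches, two Compute Units with private L0s, an optional shared L1, and cache sizes that are tiny made-up numbers rather than the real Navi ones. All it does is count how many requests end up hitting the L2.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache that tracks addresses only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and evict the oldest."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        self.lines[addr] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)
        return False


def count_l2_requests(trace, num_cus, l0_lines, l1_lines):
    """Count requests that reach L2. l1_lines=0 models a design with no L1."""
    l0s = [LRUCache(l0_lines) for _ in range(num_cus)]
    l1 = LRUCache(l1_lines) if l1_lines else None
    l2_requests = 0
    for cu, addr in trace:
        if l0s[cu].access(addr):
            continue  # served by the CU's own L0
        if l1 is not None and l1.access(addr):
            continue  # served by the shared L1, L2 never sees it
        l2_requests += 1
    return l2_requests


# Two CUs repeatedly read the same four addresses (hypothetical access pattern).
trace = [(cu, addr) for _ in range(3) for addr in range(4) for cu in (0, 1)]

print("L2 requests without L1:", count_l2_requests(trace, 2, l0_lines=2, l1_lines=0))
print("L2 requests with L1:   ", count_l2_requests(trace, 2, l0_lines=2, l1_lines=8))
```

In this contrived pattern the tiny L0s thrash constantly, so without the L1 every single request lands on the L2; with it, only the first touch of each address does. Real workloads are nowhere near this tidy, but the direction of the effect is the same.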
Also, Navi 14's L2 cache will have likely seen some old-fashioned improvements to bandwidth internally, so that helps a lot too.
Okay, so with this memory bit out of the way, you can see that Navi 14 is truly equipped to replace the previous 'middle-tier' GPU (Polaris 10) in performance but with 'Little-tier' hardware specs - that is how improved RDNA is, and we have not seen that very often with a GCN GPU. (Polaris 11 wasn't able to beat Pitcairn in R9 270X viably, and since we never got a GCN5-based 'middle tier' GPU [or at least, it was never released to consumers] we had to deal with GCN4 for this class all the way up until Navi 10 pushed the middle tier up to GCN5's top tier, and the little tier to GCN4's middle tier. If you follow me? :D)
RX 5500XT only has 8 PCI-E lanes because AMD is greedy and they want you to buy a Gen4 motherboard so it's actually worth it!!!
Untangle the jimmies there, this is a joke, aimed mainly at the chumps who actually claimed this. It's true that Navi 14 - and the card based on it, RX 5500 XT - only has 8 PCI-E lanes, and it's also true that they are rated for Gen4.0 spec like all Navi-based GPUs so far, but the reason for there being half the normal expected amount really has nothing to do with greed or milking like those sensationalists like to believe.
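For reference, here is the raw link arithmetic as a small Python sketch. The function name is my own; the numbers are the published per-lane rates (8 GT/s for Gen3, 16 GT/s for Gen4, both using 128b/130b encoding), and the result is theoretical peak bandwidth per direction, before any protocol overhead.

```python
def pcie_bandwidth_gb_s(lanes: int, transfer_rate_gt_s: float) -> float:
    """Theoretical one-direction bandwidth in GB/s for a Gen3/Gen4 link.

    Both generations use 128b/130b encoding: 128 payload bits per 130 sent.
    """
    return lanes * transfer_rate_gt_s * (128 / 130) / 8


GEN3_GT_S = 8.0
GEN4_GT_S = 16.0

print(f"x16 Gen3: {pcie_bandwidth_gb_s(16, GEN3_GT_S):.2f} GB/s")
print(f"x8  Gen4: {pcie_bandwidth_gb_s(8, GEN4_GT_S):.2f} GB/s")
print(f"x8  Gen3: {pcie_bandwidth_gb_s(8, GEN3_GT_S):.2f} GB/s")
```

So eight Gen4 lanes carry the same peak bandwidth as sixteen Gen3 lanes, and the same card dropped into a Gen3 board gets half of that.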