(Tech Babble #17): RDNA2 (Navi 21 specifically), GPU L3, caching in, chiplets? Fabric?!

UPDATE!: I had more I wanted to type but I need to rest my squishy meatfingers as they are tired. I will make a Tech Babble #17.1 post soon! :D

Hello! I apologise for the title pun. It's been a while since I made a tech babble! Yep, it's been ages, in fact. But what better time to type a tech babble than when my new medication has just kicked in (I also just finished a bottle of Lucozade) and I'm literally right here thinking about this subject. Oh, I have #18 planned, too, as I talked to myself (as usual) pacing in my living room about GCN and transistor per TFLOP, compute density and logic use vs RDNA and why CDNA is based on GCN and how RDNA was built for games, similar in nature to my Vega, Shaders and GCN babble but with updated Sash Knowledge! Yes!

Okay, this one is about AMD's relatively new graphics architecture, RDNA2. It launched in the Radeon RX 6000 series a few months ago to a really good reception and, honestly, surprised the crap out of me with how performant and efficient it is compared to the competition of the NVIDIA GeForce RTX 30 series. I felt like, with the Radeon RX 6900XT, AMD was gunning for the top-dog for the first time in a long time and, I know it wasn't, but I giggle and smile if I think 6900XT was made as an answer to the last point in my Radeon Wishlist that AMD's official Twitter account liked and replied to back when Sash had a Twitter account (Big mistake).

So without further ado, let's type a babble about RDNA2's biggest hardware feature, for me, (it's not even the Hardware Accelerated Ray Tracing!); the L3 cache, or as AMD calls it, the 'Infinity Cache'. While N21, 22 and I believe 23 have the IFC (I will call it that or L3), this post is about Navi 21 specifically, as this is the Big Chungus Boi that socks it to Nvidia for the first time in a unrefreshingly long time.

AMD loves Caches.

This is not something GPUs are historically known for, but here we are, in 2021 with a GPU and an L3 cache. In fact, what is interesting to note is that AMD has been working towards really fast, effective on-chip caches for their processors for a long time, kicking off with Zen1; which made use of a really effective L3 cache system and is in large part, why it does so well despite the fabric signal-routing latency innate to the Zen processor designs. (Yes, Zen1's L3 cache isn't notably bigger than competing designs at the time, but it WAS faster and an example of AMD's focus on caches coming to fruition).

Zen2 introduced the chungus L3 cache with 16MiB per 4-core CCX block, a whopping 64MiB per AM4 package (4x16MiB 4-core CCX in 2x CCD), and this is when AMD really took its gloves off with caches. Why am I babbling about CPU Caches in a GPU l3 cache babble post? Well, because it's very important to note the focus upon which AMD has set their talent on these past few years. And that is, as any processor engineer will say is the holy grail (in a way); keeping more data on the silicon, closer to the execution cores is extremely important. Even more so, when you consider that AMD's chiplet/MCM approach adds additional routing and trace length (small but it does add up) latency to larger off-die memory pools like DDR/GDDR. Caches help offset that.

Wait, Sash, Navi 21 isn't a chiplet or MCM, so... Don't interrupt me! I'm flowing. I'll come to that in a moment! Anyway! The deal is, AMD's design philosphy is around caches and making them really, really big, and really, really fast. Moving back to Navi 21, one of AMD's slides from their launch material hits me as underscoring AMD's nod to its proud design of caches from the Zen-class CPU cores.

AMD has been touting this 'Infinity' solution for its processor designs for a while. This includes the all-purpose data fabric/interconnect (the well-known Infinity Fabric) and now, this L3 cache system on RDNA2, specifically here, Navi 21; the largest implementation of that architecture in silicon.

My ideas for this babble are literally flowing faster than my pathetic organic hand-based appendages can facilitate linguistic transcription onto this computing device via physical contact with the human interface device known as a keyboard. They are also cold, which doesn't help. Oh well, back on the subject. What I wanted to say, is... (let's make a sub heading).