This is annotated to the best of AW's knowledge. AW is not making any guarantee that it is 100% accurate, however AW is reasonably confident it is accurate. Please do your own research, or add a disclaimer before citing the diagram.
|| eria.chip.diag.Polaris 10
Polaris 10 (the Polaris 20 [570/580] and Polaris 30 variants should look identical to the naked eye) is the graphics processor found in many entry- to mid-level graphics cards in the Radeon RX 400 and RX 500 series. Die shot credit: Fritzchens Fritz.
|| 1. Compute Unit (Graphics Core)
The processing core of the GPU; this is where the numbers are 'crunched'. Each Compute Unit of this Polaris 10 chip contains 64 individual number-crunching pipelines, known as 'Stream Processors'. These 64 pipelines are arranged into 4 groups known as Vector Units or SIMD Units. Each group has 16 pipelines that work together to perform calculations on 16 individual 32-bit Floating Point numbers, or lower-precision Integer numbers (this chip handles up to 24-bit Integer multiplies at full rate; full 32-bit Integer multiplies run slower). Each Vector Unit of 16 pipelines works on 16 different numbers (inputs) but performs the same instruction/calculation on each of them, producing 16 different output numbers. This is called "Single Instruction, Multiple Data" (SIMD), and in 3D graphics it is used to work on vectors and matrices containing the co-ordinates of vertices in 3D space and to manipulate objects, for example rotating them. The Vector Units can also work on compute tasks that benefit from this type of massively parallel computational power. Each CU also contains a private L1 cache 16 KiB in size.
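The SIMD idea can be sketched in a few lines of Python (an illustrative model only, not AMD's actual hardware or ISA): a single "instruction" is applied across 16 data lanes at once.

```python
# Illustrative 16-lane SIMD model: one operation, 16 inputs, 16 outputs.
def vector_unit(op, lanes_a, lanes_b):
    """Apply the SAME operation to every lane pair (Single Instruction,
    Multiple Data). A real Vector Unit does this in hardware each cycle."""
    assert len(lanes_a) == len(lanes_b) == 16
    return [op(a, b) for a, b in zip(lanes_a, lanes_b)]

# 16 different inputs per operand, one multiply instruction:
a = [float(i) for i in range(16)]
b = [2.0] * 16
result = vector_unit(lambda x, y: x * y, a, b)  # 16 products at once
```

A scalar CPU loop would issue 16 separate multiply instructions here; the Vector Unit issues one instruction for all 16 lanes, which is where the throughput comes from.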
|| 2. Shared CU Cache (Shared with 3 CU)
A fast, small, on-chip memory block that is shared between 3 adjacent Compute Units. It stores data the Compute Units are actively working on, sitting one level after the register files and the L1 cache private to each CU.
|| 3. Shader Engine (GPU Substructure / Core Array with Raster/Geometry)
A Shader Engine is a collection of GPU processor cores, or Compute Units. These CUs are arranged together in groups of 9 on this Polaris 10 GPU and are wired into a Geometry Processor and Raster Engine in the front-end. When shading an image, the scheduler tries to balance the shading load evenly across all four Shader Engines.
|| 4. L2 Cache Partition (Fast on-chip memory)
Visible here is a large amount of SRAM (the blue blocks). This is very fast on-chip memory that acts as a cache in front of the GPU's main external graphics memory. This block represents one partition of the L2 cache; the full Polaris 10 chip has 2048 KB (2 MB).
|| 5. Front-End (GPU Command Processor / Scheduler, Geometry Engines, Tessellators, Raster Engines, Global Data Share)
This block contains many different logic circuits. Within this highlighted area is the GPU Command Processor: a central logic block that assigns and distributes work to the various processing cores and other hardware elements. Also located here are the Asynchronous Compute Engines (ACEs) and 'Hardware Schedulers' (HWS). Within this area is also a quartet of Geometry Processors, each with a Tessellator and Raster Engine next to it. The Geometry Processors are responsible for setting up primitives like triangles and vertices before they are shaded by the Compute Units. Transformation of the viewport window is likely also performed here. The Raster Engines are responsible for taking a primitive and turning it into a pixel grid of information that can be sent to the ROPs to have its final on-screen colour determined. The Global Data Share is a cache of information that is shared globally across the GPU.
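What a Raster Engine does can be illustrated with a toy rasterizer (a hypothetical sketch using edge functions, not the actual Polaris logic): given a triangle's three screen-space vertices, test which pixel centres fall inside it.

```python
# Toy rasterizer sketch: turn one triangle into a list of covered pixels.
def edge(ax, ay, bx, by, px, py):
    # Signed area of the parallelogram: positive when (px, py) lies on the
    # left side of the directed edge A -> B.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height):
    """Return the (x, y) pixels whose centres lie inside the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    covered = []
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel centre
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:  # inside all three edges
                covered.append((x, y))
    return covered

covered = rasterize(((0, 0), (4, 0), (0, 4)), 5, 5)  # small right triangle
```

Real hardware tests many pixels per clock in parallel and handles fill rules, clipping and sub-pixel precision, but the inside/outside edge test is the same basic idea.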
|| 6. Render Back-End (Pixel Pipeline)
The Render Back-End is one of the final stages in the graphics pipeline. This block is tasked with turning all the crunched 3D data into a final colour that represents a pixel to be displayed on the screen. The RBE highlighted here can work on 4 pixels per clock (4 ROPs). Polaris 10 has 8 of these blocks, for a total of 32 pixels per clock (32 ROPs).
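The ROP count translates directly into a theoretical pixel fill rate. A quick back-of-envelope calculation, assuming the 1266 MHz boost clock of the reference Radeon RX 480 (the clock is an assumption; it varies per card model):

```python
# Theoretical peak pixel fill rate from the figures above.
rops = 8 * 4            # 8 Render Back-Ends x 4 pixels per clock = 32 ROPs
clock_hz = 1266e6       # assumed boost clock (reference RX 480), not from the text
fill_rate = rops * clock_hz   # pixels per second
print(fill_rate / 1e9)        # ~40.5 Gigapixels/s
```

Real-world fill rate is usually lower, since the ROPs also contend with blending work and memory bandwidth.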
|| 7. 32-bit GDDR5 PHY (Physical wires to an external GDDR5 chip)
This is a physical connection to the traces on the PCB around the GPU chip that connect to its external memory packages. The block highlighted here connects to a single GDDR5 memory chip over a 32-bit wide data path. The full chip has 8 of these, which aggregate into a 256-bit memory interface across 8 GDDR5 chips. On many video card products featuring this GPU there are 8 such chips of 1 GB each, for 8 GB of total video memory. Data is striped across all 8 chips simultaneously when reading or writing, in order to extract maximum parallelism and memory bandwidth.
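The striping can be sketched as simple address interleaving (an illustrative model; the chunk size and mapping here are hypothetical, not the real Polaris address hash): consecutive chunks of the address space map to different memory channels, so a long linear transfer keeps all 8 chips busy at once.

```python
# Illustrative address interleaving across 8 memory channels.
NUM_CHANNELS = 8    # one channel per 32-bit GDDR5 PHY
CHUNK_BYTES = 256   # assumed interleave granularity, for illustration only

def channel_for(address):
    """Which GDDR5 chip services this byte address."""
    return (address // CHUNK_BYTES) % NUM_CHANNELS

# A linear 2 KiB read touches every channel exactly once, so all 8 chips
# can stream their chunk in parallel:
chunks = [channel_for(a) for a in range(0, 2048, CHUNK_BYTES)]
```

This is why the aggregate interface is quoted as 256-bit: eight 32-bit channels working in parallel, not one wide bus to a single chip.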
|| 8. Display PHY (Physical wires to external displays)
(75% Sure). This is the physical connection that carries pixel information out to the external displays (monitors) connected to the video card.
|| 9. PCI-E PHY (Physical Wires to Peripheral Component Interconnect - Express bus)
(90% Sure). This is the physical connection to the PCI-E interface of the video card. It is via these wires that the GPU communicates with the rest of the system, such as receiving frames ready to be rendered from the CPU in a video game.
|| 10. Display Engine / Display Controllers / Codec
(75% Sure). Within or around the highlighted area are the GPU's video engine and display controllers. This section contains dedicated fixed-function blocks that output signals to the display PHY for standards like HDMI and DisplayPort. Also within these blocks is fixed-function logic responsible for hardware video encoding and decoding of formats such as AVC (H.264) and HEVC (H.265).