(GPU) NVIDIA GeForce RTX 2060 SUPER

(Profile updated as of 12th August, 2019)

Here is my GPU profile for the RTX 2060 Super. I wanted RTX so I took the plunge with the second wave of Turing-based 20-series cards.


(Picture 1) The silicon die of TU106-410. It is surrounded by 8x 1024 MB GDDR6 SDRAM chips, which together make up the 256-bit interface. The image is from TechPowerUp's review of the card.

(Picture 2) The architectural block-diagram for TU106-410. Note the disabled TPC with its two Streaming Multi-Processors.

(Picture 3) Actual die shot of the TU106 GPU, taken using infrared imaging. The chip pictured is the '200' silicon from the RTX 2060, but I have annotated the single TPC (with its two SMs) that is laser-cut on the 2060 Super. Image credit goes to Fritzchens Fritz for the die shot, and the information on the GPC structure on the die is from this highly useful tool.

Graphics Card Information

Graphics Card: NVIDIA GeForce RTX 2060 SUPER

Graphics Card Manufacturer: NVIDIA

Graphics Card Release Date: July 9th, 2019

Graphics Card MSRP: $399 USD

Graphics Processor Codename: TU106-410

Graphics Processor Manufacturer: NVIDIA

Graphics Processor Implementation: Cut die

Graphics Interface: PCI-E 16x Gen3

Architecture: Turing (TU10x)

Lithography Process: TSMC 12nmFFN FinFET

Approximate die size: 445mm²

Sasha's GPU die Size Rating: Large

Approximate Transistor Count: 10,800 Million

Approximate Transistor Density: 24 Million / Square Millimetre
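
As a quick check, the density figure follows directly from the approximate transistor count and die size listed above. The sketch below is only that arithmetic; the variable names are my own and both inputs are approximate.

```python
# Approximate figures taken from the profile above.
transistors_millions = 10_800  # approximate transistor count, in millions
die_area_mm2 = 445             # approximate die size, in square millimetres

# Density in millions of transistors per square millimetre.
density = transistors_millions / die_area_mm2

print(f"{density:.1f} million transistors per mm²")
```

Rounded to a whole number this gives the 24 million per square millimetre quoted above.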

GPU Features

Double-speed FP16 Shading: Yes (FP16x2 Facilitated by Tensor Cores)

Asynchronous Compute Capability: Full

DirectX Hardware Support: DX12.1 (FL 12_1)

Dedicated DXR Acceleration on chip: Yes (RTX)

Variable-rate Shading: Yes (Adaptive Shading)

Adv. Geometry shading: Yes (Mesh Shading)

Adv. Geometry shading (Programmable/DX12 Mesh Shaders): Yes

AI/ML Acceleration: Yes (Tensor Cores)

Advanced Memory Management: No

Integer and Float Shader Co-execution: Yes

Tile-based Renderer: Yes

GPU Computing Resources

GPU Substructures: 3 Graphics Processing Clusters, 17 Texture Processing Clusters (18 TPC Full chip)

Graphics Cores: 34 Streaming Multi-processors (36 Full Chip)

Graphics Cores per Substructure: 2 per TPC, 2 x GPC with 12, 1 x GPC with 10

Total Stream Processors (ALU/Shaders): 2176 (float/Int) (2304 Full Chip) *

Stream Processors per Graphics Core: 64 Float32, 64 INT32

Graphics Core SIMD Structure: 4 x 16 Float32, 4 x 16 INT32

Total Special Execution Units: 544 Special Function Units (576 Full Chip), 544 Load/Store Units (576 Full Chip), 272 Tensor Cores (288 Full Chip), 34 Ray Tracing Cores (36 Full Chip), 68 FP64 CUDA Cores (72 Full Chip)

Special Execution Units per Graphics Core: 16 Special Function Units, 16 Load/Store Units, 8 Tensor Cores, 2 FP64 CUDA Cores, 1 Ray Tracing Core

Total Texturing Units: 136 (144 Full Chip)

Texturing Units per Graphics Core: 4

Pixel Pipelines (ROPs): 64 (8 x ROP Partitions with 8 Pixels per clock)

Level 2 shared on-chip cache: 4096 KB

Geometry/Tessellation Processors: 17 (18 Full Chip)

Raster Engines: 3
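
The unit totals in this section are all the per-SM counts multiplied by the number of SMs (34 enabled, 36 on the full chip). The sketch below reproduces that multiplication; the dictionary keys and the `totals` helper are names I made up for illustration, and the per-SM counts are the ones listed above.

```python
ENABLED_SMS = 34    # Streaming Multi-processors active on the RTX 2060 Super
FULL_CHIP_SMS = 36  # Streaming Multi-processors on a fully enabled TU106

# Per-SM unit counts as listed in this profile.
PER_SM = {
    "FP32 lanes": 64,
    "INT32 lanes": 64,
    "Special Function Units": 16,
    "Load/Store Units": 16,
    "Tensor Cores": 8,
    "FP64 CUDA Cores": 2,
    "Ray Tracing Cores": 1,
    "Texturing Units": 4,
}


def totals(sm_count):
    """Scale every per-SM unit count by the given number of SMs."""
    return {unit: count * sm_count for unit, count in PER_SM.items()}


enabled = totals(ENABLED_SMS)
full = totals(FULL_CHIP_SMS)

for unit in PER_SM:
    print(f"{unit}: {enabled[unit]} ({full[unit]} Full Chip)")
```

For example, 34 SMs with 64 FP32 lanes each gives 2176 stream processors, and 36 SMs gives 2304 for the full chip.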

GPU Memory Subsystem

Graphics Memory Type: GDDR6

Graphics Memory Standard Capacity: 8192 MB

Graphics Memory Composition: 8 x 1024 MB GDDR6 SDRAM Chips

Graphics Memory Access Granularity: 32-bit (4 bytes)

Graphics Memory Standard Clock Speed / Data Rate: 1750 MHz / 14000 MHz

Graphics Memory Full Interface Width: 256-bit (32 bytes per clock)

Graphics Memory Peak Memory Bandwidth: 448 GB/s
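
The capacity and bandwidth figures above can be derived from the chip count, the data rate and the interface width. This is a minimal sketch of that arithmetic; the variable names are my own, and the factor of 8 transfers per clock is simply the ratio between the two clock figures listed above, not something taken from a datasheet.

```python
chips = 8
chip_capacity_mb = 1024
memory_clock_mhz = 1750
transfers_per_clock = 8   # assumption: ratio of the listed 14000 MHz data rate to the 1750 MHz clock
bus_width_bits = 256

capacity_mb = chips * chip_capacity_mb
data_rate_mtps = memory_clock_mhz * transfers_per_clock  # mega-transfers per second
bytes_per_transfer = bus_width_bits // 8                 # 256 bits = 32 bytes
bandwidth_gbs = data_rate_mtps * bytes_per_transfer / 1000

print(f"Capacity: {capacity_mb} MB")
print(f"Data rate: {data_rate_mtps} MT/s")
print(f"Peak bandwidth: {bandwidth_gbs:.0f} GB/s")
```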

GPU Frequency and Peak performance

Graphics Engine Clock: 1650 MHz *

GPU Computing Power FP16: 14,361,600 Million operations per second with FMA

GPU Computing Power FP32: 7,180,800 Million operations per second with FMA

GPU Computing Power FP64: 224,400 Million operations per second with FMA

GPU Texturing Rate INT8: 224,400 Million texels per second

GPU Texturing Rate FP16: 224,400 Million texels per second

GPU Pixel Rate: 105,600 Million pixels per second

GPU Primitive Rate: 4,950 Million triangles per second *
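
The peak figures in this section are unit counts multiplied by the engine clock. The sketch below shows one way to compute them, under these assumptions: an FMA counts as two operations, FP16 runs at twice the FP32 rate, each texturing unit and ROP handles one texel or pixel per clock, and each Raster Engine outputs one triangle per clock. The function and constant names are hypothetical.

```python
CLOCK_MHZ = 1650            # NVIDIA-rated boost clock
SMS = 34                    # enabled Streaming Multi-processors
FP32_LANES_PER_SM = 64
FP64_CORES_PER_SM = 2
TEXTURE_UNITS_PER_SM = 4
ROPS = 64
RASTER_ENGINES = 3
OPS_PER_FMA = 2             # a fused multiply-add counts as two operations


def peak_rates(clock_mhz=CLOCK_MHZ):
    """Return theoretical peak rates in millions per second for a given clock."""
    fp32 = SMS * FP32_LANES_PER_SM * OPS_PER_FMA * clock_mhz
    return {
        "FP16 ops": 2 * fp32,  # assumption: double-rate FP16
        "FP32 ops": fp32,
        "FP64 ops": SMS * FP64_CORES_PER_SM * OPS_PER_FMA * clock_mhz,
        "Texels": SMS * TEXTURE_UNITS_PER_SM * clock_mhz,
        "Pixels": ROPS * clock_mhz,
        "Triangles": RASTER_ENGINES * clock_mhz,  # assumption: 1 triangle per engine per clock
    }


for name, rate in peak_rates().items():
    print(f"{name}: {rate:,} Million per second")
```

Because every rate is linear in the clock, calling `peak_rates` with a different value (for example a sustained gaming clock rather than the rated boost) scales all of the figures by the same factor.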

GPU Thermal and Power

Standard Cooling Solution: Dual-Fan Axial cooler with Vapour chamber heatsink

Typical Board Power: 175 W

Maximum Board Power: Varies per design (210W standard)

Maximum Allowed Junction Temperature (TJ Max): 89°C

Graphics Card description

GeForce RTX 2060 Super launched in early July 2019 as a refresh of the 'mid-range' 20-series graphics cards, based on the TU10x Turing architecture. This card doesn't entirely replace the vanilla 2060; instead, it occupies a price point just above that card and 100 USD under the original RTX 2070, while offering more or less the same performance as the 2070. A major advantage of the RTX 2060 Super is that it now features 8GB of video memory and, with the 256-bit memory interface running the same 14Gbps GDDR6, has a lot more memory bandwidth to back it up. For all intents and purposes, the RTX 2060 Super is an RTX 2070 with a single TPC disabled and slightly more aggressive clock rates, resulting in roughly the same performance.

Being based on the TU10x Turing architecture, using the TU106 silicon, the RTX 2060 Super fully supports hardware-accelerated Ray Tracing, using Microsoft's DXR or other APIs. In addition, the GPU contains "Tensor Cores" for acceleration of machine-learning and AI workloads. Like all Turing GPUs, the TU106 silicon represents a fairly significant change over the previous-generation 'Pascal' processors. Major changes include a switch from Instruction Level Parallelism and dual-issue warps to a thread-level parallelism design, and major overhauls to the streaming multi-processor with enhanced L1 cache performance and size. Turing-based GPUs also have dedicated pipelines for Integer shader code, so non-dependent Floating-point and Integer instructions can now execute concurrently. In games that use lots of mixed INT and FP instructions, this can result in a fairly significant increase in shading efficiency.

Turing GPUs are built on TSMC's 12nmFF 'N' process (FFN; the "N" designates that this process is optimised especially for Nvidia). Due to the increased transistor requirements of the additional hardware logic (INT pipes, Tensors, Ray Trace, and vastly increased caches), and the lack of any significant density improvement over TSMC's 16nmFF, the Turing chips of the 20 and 16-series are very large. For example, the 'smallest' Ray-Trace capable Turing GPU, TU106, a truly 'mid-range' GPU, is now almost as large as the flagship GP102 processor from the previous-generation GTX 1080 Ti and TITAN Xp video cards (445 vs 471mm²), with almost as many transistors (10.8 vs ~12 billion).

The RTX 2060 Super holds the accolade of being NVIDIA's first "60" ('Mid-range') positioned graphics card with 8GB of video memory, but the price puts it more in the territory of the "70" series of previous generations.

Graphics Card approximate 3D Performance

Sasha's gaming performance rating (2019): Great for 1440p maximum settings 60 FPS (1080p High settings with DXR), or 1080p maximum settings high refresh (no DXR)

GeForce RTX 2060 Super provides performance around the same as the original RTX 2070, putting it slightly ahead of the RX Vega 64 and GTX 1080. Performance is comparable to AMD's latest RX 5700 (non XT) video card. This results in great performance for a 1440p monitor, with maximum detail settings, in the latest titles in 2019. Unique to the 20-series, the RTX 2060 Super can use dedicated hardware blocks to accelerate Real Time Ray Tracing in video games. With this feature enabled, the card is pushed down a performance tier to 1080p, where it provides reasonable FPS.

Notes

Graphics Engine Clock

The NVIDIA-spec rated boost clock is listed. The actual gaming clock will be higher due to GPU Boost. It varies per design with power limit and cooling capacity, but will likely be around 1900 MHz. As a result, it is almost impossible to say what each card will run at in gaming situations.

GPU Primitive Rate

Raw triangle output based on my understanding of the Raster Engines. The PolyMorph engines attached to each TPC may have an effect on the total number of triangles rasterised.

Total Stream Processors (ALU/Shaders)

Only 32-bit precision CUDA cores are listed, and only advertised CUDA cores. You can see the SIMD structure for the full pipeline count in 32-bits.

Misc.

This bit is for my personal opinion on this Graphics card / Graphics processor

Sasha's Awesomeness Rating: Cool and innovative technology. But still too expensive to be truly awesome.