(GPU) NVIDIA GeForce GTX 1660
(Profile updated as of 10th July 2019)
Here is my GPU profile for the GTX 1660. In my opinion it is probably NVIDIA's best-value Turing-based graphics card, and even then it's not exactly an "Oh my God, I want to buy it now" card.
TU116 is interesting to me because NVIDIA has stripped out the Tensor and RT hardware to make a smaller, leaner die for the high-volume market, which is very sensitive to die size and manufacturing cost and probably doesn't care about RT just yet. Even if this chip had RT cores, it wouldn't be fast enough to actually do anything with them; the RTX 2060 barely is.
(Picture 1) The silicon die of TU116-300. It is surrounded by 6x 1024 MB GDDR5 SDRAM chips, making up the 192-bit interface. Note the two unpopulated memory chip positions, which indicate that this PCB supports a 256-bit memory interface that goes unused on TU116-based cards.
(Picture 2) Actual silicon die shot of the TU116 GPU, taken with infrared imaging. The chip pictured is the 400 silicon from the GTX 1660 Ti, but I have annotated the single TPC, with its two SMs, that is laser cut on the 300 (GTX 1660). Image credit for the die shot goes to Fritzchens Fritz, and the information on the GPC structure on the die is from this highly useful tool: https://misdake.github.io/ChipAnnotationViewer
(Picture 3) The architectural block-diagram for TU116-300. Note the disabled TPC with its two Streaming Multi-Processors.
Graphics Card Information
Graphics Card: NVIDIA GeForce GTX 1660
Graphics Card Manufacturer: NVIDIA
Graphics Card Release Date: March 14, 2019
Graphics Card MSRP: $219 USD
Graphics Processor Codename: TU116-300
Graphics Processor Manufacturer: NVIDIA
Graphics Processor Implementation: Cut die
Graphics Interface: PCI-E 16x Gen3
Architecture: Turing (TU11x)
Lithography Process: TSMC 12nmFFN FinFET
Approximate die size: 284mm²
Sasha's GPU die Size Rating: small-medium
Approximate Transistor Count: 6,600 Million
Approximate Transistor Density: 23.2 Million / Square Millimetre
Double-speed FP16 Shading: Yes (dedicated FP16x2 pipelines)
Asynchronous Compute Capability: Full
DirectX Hardware Support: DX12.1 (FL 12_1)
Dedicated DXR Acceleration on chip: No
Variable-rate Shading: Yes (Adaptive Shading)
Adv. Geometry shading: Yes (Mesh Shading)
Adv. Geometry shading (Programmable/DX12 Mesh Shaders): Yes
AI/ML Acceleration: No
Advanced Memory Management: No
Integer and Float Shader Co-execution: Yes
Tile-based Renderer: Yes
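The transistor density in the list above follows directly from the approximate transistor count and die size. A minimal sketch of that arithmetic, using only the figures listed in this profile (the variable names are just for illustration):

```python
# Figures taken from the "Graphics Card Information" list in this profile.
transistors_million = 6_600   # approximate transistor count, in millions
die_area_mm2 = 284            # approximate die size, in square millimetres

# Density in millions of transistors per square millimetre.
density = transistors_million / die_area_mm2

print(f"{density:.1f} million transistors per mm^2")
```

Because both inputs are approximate, the result is only meaningful to about one decimal place.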
GPU Computing Resources
GPU Substructures: 3 Graphics Processing Clusters, 11 Texture Processing Clusters
Graphics Cores: 22 Streaming Multi-processors (24 Full Chip)
Graphics Cores per Substructure: 2 per TPC, 2 x GPC with 8, 1 x GPC with 6
Total Stream Processors (ALU/Shaders): 1408 (float/Int) (1536 Full Chip) *
Stream Processors per Graphics Core: 64 Float32, 64 INT32
Graphics Core SIMD Structure: 4 x 16 Float32, 4 x 16 INT32
Total Special Execution Units: 352 Special Function Units (384 Full Chip), 352 Load/Store Units (384 Full Chip), 1408 FP16x2 CUDA Cores, 44 FP64 CUDA Cores (48 Full Chip)
Special Execution Units per Graphics Core: 16 Special Function Units, 16 Load/Store Units, 64 FP16x2 CUDA Cores, 2 FP64 CUDA Cores
Total Texturing Units: 88 (96 Full Chip)
Texturing Units per Graphics Core: 4
Pixel Pipelines (ROPs): 48 (6 x ROP Partitions with 8 Pixels per clock)
Level 2 shared on-chip cache: 1536 KB
Geometry/Tessellation Processors: 11 (12 Full Chip)
Raster Engines: 3
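Most of the totals in this section are the per-SM unit counts multiplied by the number of enabled SMs. The sketch below reproduces them for the GTX 1660 (22 SMs) and the full chip (24 SMs). The dictionary and function names are made up for this example; the numbers are the ones listed above.

```python
# Per-SM unit counts as listed in this profile.
UNITS_PER_SM = {
    "FP32 CUDA cores": 64,
    "INT32 CUDA cores": 64,
    "FP16x2 CUDA cores": 64,
    "FP64 CUDA cores": 2,
    "Special Function Units": 16,
    "Load/Store Units": 16,
    "Texturing Units": 4,
}

SMS_PER_TPC = 2
SMS_PER_GPC_ENABLED = [8, 8, 6]   # GTX 1660 (TU116-300): one TPC disabled
SMS_FULL_CHIP = 24                # full TU116, as in the GTX 1660 Ti


def unit_totals(sm_count):
    """Multiply every per-SM unit count by the number of SMs."""
    return {name: per_sm * sm_count for name, per_sm in UNITS_PER_SM.items()}


enabled_sms = sum(SMS_PER_GPC_ENABLED)       # SMs left after the cut
enabled_tpcs = enabled_sms // SMS_PER_TPC    # also the Geometry/Tessellation count
rops = 6 * 8                                 # 6 ROP partitions x 8 pixels per clock

cut = unit_totals(enabled_sms)
full = unit_totals(SMS_FULL_CHIP)

for name in UNITS_PER_SM:
    print(f"{name}: {cut[name]} ({full[name]} Full Chip)")
print(f"SMs: {enabled_sms}, TPCs: {enabled_tpcs}, ROPs: {rops}")
```

The ROP count does not depend on the SM count, which is why cutting a TPC leaves it at 48.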
GPU Memory Subsystem
Graphics Memory Type: GDDR5
Graphics Memory Standard Capacity: 6144 MB
Graphics Memory Composition: 6 x 1024 MB GDDR5 SDRAM Chips
Graphics Memory Access Granularity: 32-bit (4 bytes)
Graphics Memory Standard Clock Speed / Data Rate: 2000 MHz / 8000 MHz
Graphics Memory Full Interface Width: 192-bit (24 bytes per clock)
Graphics Memory Peak Memory Bandwidth: 192 GB/s
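The capacity and peak bandwidth figures above can be cross-checked from the chip count, interface width and data rate. This is a minimal sketch that assumes the "8000 MHz" data rate means 8000 million transfers per second on every data pin and that GB/s is decimal (1 GB = 1000 MB); the variable names are illustrative only.

```python
# Figures taken from the "GPU Memory Subsystem" list in this profile.
CHIPS = 6
CHIP_CAPACITY_MB = 1024
BUS_WIDTH_BITS = 192
DATA_RATE_MTPS = 8000   # listed as "8000 MHz"; assumed to be per-pin transfers, in millions/s

capacity_mb = CHIPS * CHIP_CAPACITY_MB        # total graphics memory
bits_per_chip = BUS_WIDTH_BITS // CHIPS       # share of the interface per DRAM chip
bytes_per_transfer = BUS_WIDTH_BITS // 8      # listed as "24 bytes per clock"

# Million transfers/s x bytes per transfer = MB/s; divide by 1000 for GB/s.
bandwidth_gb_s = DATA_RATE_MTPS * bytes_per_transfer / 1000

print(f"Capacity: {capacity_mb} MB")
print(f"Interface per chip: {bits_per_chip}-bit")
print(f"Peak bandwidth: {bandwidth_gb_s:.0f} GB/s")
```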
GPU Frequency and Peak performance
Graphics Engine Clock: 1785 MHz *
GPU Computing Power FP16: 10,053,120 Million operations per second with FMA
GPU Computing Power FP32: 5,026,560 Million operations per second with FMA
GPU Computing Power FP64: 157,080 Million operations per second with FMA
GPU Texturing Rate INT8: 157,080 Million texels per second
GPU Texturing Rate FP16: 157,080 Million texels per second
GPU Pixel Rate: 85,680 Million pixels per second
GPU Primitive Rate: 5,355 Million triangles per second *
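Each peak figure above is a unit count multiplied by the 1785 MHz rated boost clock. The sketch below reproduces the listed numbers under these assumptions, which are inferred from the figures themselves: an FMA counts as two operations, an FP16x2 lane processes two FP16 values per clock, each texturing unit delivers one texel per clock, each ROP one pixel per clock, and each raster engine one triangle per clock. The names are made up for this example.

```python
CLOCK_MHZ = 1785        # NVIDIA-rated boost clock listed in this profile

FP32_CORES = 1408
FP64_CORES = 44
TEXTURING_UNITS = 88
ROPS = 48
RASTER_ENGINES = 3

OPS_PER_FMA = 2         # assumption: one fused multiply-add = 2 operations
FP16_PER_LANE = 2       # assumption: FP16x2 packs two FP16 values per lane

# All results are in millions per second because the clock is in MHz.
peak = {
    "FP16 ops": FP32_CORES * FP16_PER_LANE * OPS_PER_FMA * CLOCK_MHZ,
    "FP32 ops": FP32_CORES * OPS_PER_FMA * CLOCK_MHZ,
    "FP64 ops": FP64_CORES * OPS_PER_FMA * CLOCK_MHZ,
    "texels": TEXTURING_UNITS * CLOCK_MHZ,
    "pixels": ROPS * CLOCK_MHZ,
    "triangles": RASTER_ENGINES * CLOCK_MHZ,
}

for name, value in peak.items():
    print(f"{name}: {value:,} million per second")
```

Since real cards usually boost above the rated clock, these are paper figures at the rated boost rather than what a given card will sustain.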
GPU Thermal and Power
Standard Cooling Solution: Custom designs with various heatsink types from small single-fan to large multi-fan designs
Typical Board Power: 120 W
Maximum Board Power: Varies per design
Maximum Allowed Junction Temperature (TJ Max): 95 °C
Graphics Card description
GeForce GTX 1660 is a low-to-mid-range graphics card released by NVIDIA in early 2019 to provide a low-cost entry to the Turing architecture without the bloated dies caused by dedicated Ray Tracing and Tensor hardware. This card uses a cut-down TU116 processor, the same chip used in the more expensive GTX 1660 Ti but with one TPC disabled, losing 128 CUDA cores, a Tessellator and 8 Texturing units. It also trades the latest GDDR6 memory technology for cheaper, more common GDDR5. As a result the GTX 1660 can hit the sweet spot of around £200, which put it up against AMD's incumbent Radeon RX 580 on price. Price cuts have since made the RX 580 even cheaper, so the GTX 1660 now competes with the RX 590 (due to unofficial price cuts) and offers slightly more performance at a similar price. The GTX 1660's advantage is significantly reduced thermal output and power consumption, but at the cost of having less video memory (6GB vs 8GB on the RX 590).
It is interesting to note that all TU116 boards feature pinouts for 8 DRAM chips, meaning these PCBs were built to house a 256-bit GPU (TU116 is natively 192-bit, as you can see on the die shot). I think it is a cost-saving measure to reuse trace designs from the 256-bit TU106 chip used by the RTX 2070.
Graphics Card approximate 3D Performance
Sasha's gaming performance rating (2020): Great for 1080p High settings 60 FPS
GeForce GTX 1660 provides great performance when paired with a 1920x1080 monitor, running games at or close to maximum detail settings at 60 frames per second. It performs slightly ahead (~5%) of AMD's Radeon RX 590 but with significantly lower power consumption. Performance is around 10-15% ahead of the last-gen GTX 1060 6GB and Radeon RX 580.
Graphics Engine Clock
The NVIDIA-specified rated boost clock is listed. Actual gaming clocks will be higher due to GPU Boost, likely around 1900 MHz, but they vary with each design's power limit and cooling capacity. As a result it is almost impossible to say what each card will run at in gaming situations.
GPU Primitive Rate
Raw triangle output based on my understanding of the Raster Engines. The PolyMorph engines attached to each TPC may have an effect on the total triangles rasterised.
Total Stream Processors (ALU/Shaders)
Only 32-bit precision CUDA cores are listed, and only the advertised ones. See the SIMD structure above for the full 32-bit pipeline count. For example, the GTX 1660 actually has 1408 FP32 CUDA cores and 1408 INT32 CUDA cores, a total of 2,816, but only half can do floats and the other half can do integers. The gaming performance uplift in shading from this design ranges from limited (<10%) to fairly significant (30-40%), depending on the types of instructions in the shader code.
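To illustrate why the uplift depends so heavily on the instruction mix, here is a deliberately idealised toy model. It is an assumption made for this example, not NVIDIA's method or a measurement: it compares one shared pipeline that issues float and integer instructions one after another against separate FP32 and INT32 pipelines that overlap perfectly, and it ignores scheduling, memory and every other bottleneck. The function name and the `int_fraction` parameter are hypothetical.

```python
def coexecution_speedup(int_fraction):
    """Idealised speedup of separate FP32 + INT32 pipes over one shared pipe.

    int_fraction is the share of shader instructions that are integer
    (0.0 to 1.0). Toy model only: perfect overlap, no other bottlenecks.
    """
    if not 0.0 <= int_fraction <= 1.0:
        raise ValueError("int_fraction must be between 0 and 1")
    fp_fraction = 1.0 - int_fraction
    shared_time = fp_fraction + int_fraction        # everything issues serially
    split_time = max(fp_fraction, int_fraction)     # the busier pipe sets the pace
    return shared_time / split_time


for share in (0.08, 0.25):
    uplift = (coexecution_speedup(share) - 1.0) * 100
    print(f"{share:.0%} integer instructions: about {uplift:.0f}% uplift")
```

In this model a shader with only a few percent of integer instructions gains very little, while one with roughly a quarter integer instructions gains about a third, which is at least in line with the range quoted above.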
This bit is for my personal opinion on this Graphics card / Graphics processor
Sasha's Awesomeness Rating: Pretty Good