So, about a week ago, I posted this about my WCG farm, which was up and running with 5x 8-core machines, all for the good of Science! Well, as they often do, some of my plans changed. I can now announce that the Farm has been renamed to The SCIENCE! Farm, and four of the Ryzen 7 2700 processors have been replaced with a pair of Ryzen 9 3900X processors, which are 12-core, 24-thread, Zen 2 on 7nm! In terms of core count alone, the pair is equal to 3x R7 2700s, but with the microarchitectural improvements and much higher efficiency, the performance per watt (the most important metric here) is likely to be around 2X what I had before.
The last 8-core, a trusty Ryzen 7 PRO 2700X, is still going alongside the new 3rd gen parts. It is limited to a 65W Package Power, so it's by no means inefficient (around 1G Instructions retired/completed in WCG per watt, which is my minimum efficiency target. Source: AMD internal performance counters and a program made by a friend). Anyway, the restructuring of the farm is as follows:
4x Ryzen 7 2700s have been retired (sold).
2x Ryzen 9 3900Xs will be installed (1 installed, 1 pending) (70W cTDP)
Ryzen Threadripper 1920X has been retired (storage for now)
Ryzen Threadripper 2950X has been ULP optimised (manually: PPT <= 100W, 3.2 GHz, ~950mV Core, 900mV SoC, 1333 MHz FCLK)
Sash's main PC now has a Ryzen 9 3900XT (the slightly enhanced one), which will crunch when I'm not using it. (80W cTDP)
Ryzen 7 PRO 2700X will continue to work as it was. (65W cTDP)
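For a rough idea of the power budget, here is a quick sketch that just adds up the package power limits from the list above. These are caps on CPU package power, not measured draw at the wall, and the dictionary keys are labels I made up for illustration.

```python
# Package power limits in watts, taken from the list above.
# These are caps (cTDP/PPT), not measured wall power.
limits_w = {
    "Ryzen 9 3900X #1": 70,
    "Ryzen 9 3900X #2": 70,
    "Threadripper 2950X": 100,  # manual PPT cap, so an upper bound
    "Ryzen 9 3900XT (main PC)": 80,
    "Ryzen 7 PRO 2700X": 65,
}

total_w = sum(limits_w.values())
# The main PC only crunches part of the day, so split it out.
always_on_w = total_w - limits_w["Ryzen 9 3900XT (main PC)"]

print(f"All machines crunching: up to {total_w} W of package power")
print(f"24/7 machines only:     up to {always_on_w} W of package power")
```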
I really should update the Sash's Rigs page, as it hasn't been updated in ages and I ended up just putting a notice there. Maybe it will get some love soon when all my machines are up and running and all the kinks have been ironed out.
Anyway, the total computational resources of The SCIENCE! Farm will be, upon completion:
24x Zen+ (2nd Generation) Cores with SMT (48T)
36x Zen2 (3rd Generation) Cores with SMT (72T)
That is a total of 60 cores and 120 threads, of which 48 cores/96 threads will run 24/7. The 12 cores/24 threads in my main PC will probably be more like 18/7, assuming I use my PC for about 6 hours a day (ish).
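The tally above can be checked with a few lines of Python. This is only a sketch: the 18 hours per day for the main PC is my own rough guess from above, and the tuple layout is just something I picked for the example.

```python
THREADS_PER_CORE = 2  # SMT is enabled on every machine

# (name, number of machines, cores each, crunching hours per day)
machines = [
    ("Ryzen 9 3900X", 2, 12, 24),
    ("Threadripper 2950X", 1, 16, 24),
    ("Ryzen 7 PRO 2700X", 1, 8, 24),
    ("Ryzen 9 3900XT (main PC)", 1, 12, 18),  # assumes ~6 h/day of my own use
]

total_cores = sum(count * cores for _, count, cores, _ in machines)
total_threads = total_cores * THREADS_PER_CORE
always_on_cores = sum(count * cores for _, count, cores, hours in machines if hours == 24)

# Thread-hours per day, and the equivalent number of threads running non-stop.
thread_hours = sum(count * cores * THREADS_PER_CORE * hours for _, count, cores, hours in machines)
average_threads = thread_hours / 24

print(f"{total_cores} cores / {total_threads} threads in total")
print(f"{always_on_cores} cores / {always_on_cores * THREADS_PER_CORE} threads running 24/7")
print(f"{thread_hours} thread-hours per day, about {average_threads:.0f} threads on average")
```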
Notes on Zen 2 cores
The obvious advantage here is the 7nm process technology and improved architecture allowing for much higher efficiency - crucial for 24/7 compute on a power bill that I pay for directly. However, there is one more important advantage for the Zen2-based processors that I have just installed: L3 cache size.
Zen2 has 2X the L3 cache per core/thread vs Zen1 and its Zen+ derivative. Per 4-core "CCX" on Zen1, there is an 8MiB L3 cache, allowing each core (if balanced with a uniform access pattern/workload) about 2MiB, or 1MiB per thread with SMT enabled. This is doubled to 16MiB per CCX on Zen2, which works out to 4MiB per core and 2MiB per thread. The advantage here is that many workloads on WCG are able to fit almost entirely in this L3 cache, drastically improving performance and reducing power-expensive off-die communication with the system memory. It also lowers the reliance on RAM speed for bandwidth reasons, letting me run cheaper, lower-power 1.2V DDR4 modules with less of an impact on performance.
Using the performance counter tool that my good friend Clam made, I measured around a 70-75% hit rate on the L3 cache on Zen1/+, using a Ryzen 7 2700. This is actually really good for Zen1. It means that around 70-75% of the time, the core is requesting data that is already in the L3 cache local to that core, and the other 25-30% of the time, it is missing that and accessing the system RAM. On Zen2, measuring on a 3900XT, the hit rate is around 85-90%, which is a marked improvement and demonstrates that huge L3 cache in action.
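The per-core and per-thread L3 figures are just the CCX's L3 divided evenly, which can be sketched like this. The helper name l3_share_mib is made up for the example, and the even split is an idealisation: real sharing depends on what each thread is actually touching.

```python
def l3_share_mib(l3_per_ccx_mib, cores_per_ccx, threads_per_core=2):
    """Return (MiB per core, MiB per thread), assuming an even split of the CCX's L3."""
    per_core = l3_per_ccx_mib / cores_per_ccx
    return per_core, per_core / threads_per_core

zen1_core, zen1_thread = l3_share_mib(8, 4)    # Zen1/Zen+: 8MiB L3 per 4-core CCX
zen2_core, zen2_thread = l3_share_mib(16, 4)   # Zen2: 16MiB L3 per 4-core CCX

print(f"Zen1/Zen+: {zen1_core} MiB per core, {zen1_thread} MiB per thread")
print(f"Zen2:      {zen2_core} MiB per core, {zen2_thread} MiB per thread")
```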
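To put those hit rates in terms of trips out to RAM, here is a small sketch that turns each hit-rate range into a miss-rate range and compares the two. It assumes both chips make the same number of L3 requests for the same work, which is a simplification on my part, and it only uses the rough ranges I measured, nothing more precise.

```python
def miss_range(hit_low_pct, hit_high_pct):
    """Return (lowest, highest) miss percentage for a given hit-rate range."""
    return 100 - hit_high_pct, 100 - hit_low_pct

zen_plus_miss = miss_range(70, 75)  # Ryzen 7 2700
zen2_miss = miss_range(85, 90)      # Ryzen 9 3900XT

# Reduction in L3 misses going from Zen+ to Zen2, pairing the ranges
# in the least and most favourable way.
smallest_cut = 1 - zen2_miss[1] / zen_plus_miss[0]
largest_cut = 1 - zen2_miss[0] / zen_plus_miss[1]

print(f"Zen+ misses: {zen_plus_miss[0]}-{zen_plus_miss[1]}% of L3 requests")
print(f"Zen2 misses: {zen2_miss[0]}-{zen2_miss[1]}% of L3 requests")
print(f"Roughly {smallest_cut:.0%} to {largest_cut:.0%} fewer misses on Zen2")
```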
An interesting point to consider is that the 3900XT is a 12-core bin of a 16-core package: each CCX (there are 4) has a single core disabled, meaning the CPU is configured like 3+3+3+3, and each group of 3 has a 16MiB L3 cache associated with it. Technically, in a fully saturated workload with all threads in use (like WCG at 100%), the 3900XT has even more of an advantage in that it has slightly more L3 cache per core than a 3950X, for example, which has the same 16+16+16+16 L3 config, but with 4 cores per group instead of 3. Well, I thought it was interesting, anyway. Of course, the architecture is built with 4C/16MiB in mind, and scales most effectively there; the bonus of the extra core per CCX in WCG would vastly outweigh the slightly lower L3 per core, but still. I thought it was cool to note that. :D
Anyway, I've digressed enough. Suffice it to say, I am quite happy. The 3900Xs will all be running Ubuntu Server from the command line with no (or minimal) GUI, and limited to about the same package power that the 2700s run at by default: around 70W. With that in mind, each 3900X machine should be able to provide around 2x the results per day at the same power usage. But I will have to wait a couple of months for the averages to even out.
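For the curious, the L3-per-core difference works out like this. It is a sketch assuming both chips have 4 CCXs with 16MiB of L3 each (64MiB total) and that active cores are spread evenly across the CCXs, as described above; the function name is made up for the example.

```python
CCX_COUNT = 4
L3_PER_CCX_MIB = 16  # same on both chips: 16+16+16+16

def l3_per_core_mib(total_cores):
    """MiB of L3 per active core, assuming cores are spread evenly across CCXs."""
    cores_per_ccx = total_cores // CCX_COUNT
    return L3_PER_CCX_MIB / cores_per_ccx

for name, cores in (("3900XT", 12), ("3950X", 16)):
    per_core = l3_per_core_mib(cores)
    print(f"{name}: {cores // CCX_COUNT} cores per CCX, "
          f"{per_core:.2f} MiB L3 per core, {per_core / 2:.2f} MiB per thread")

# How much more L3 each 3900XT core gets compared with a 3950X core.
ratio = l3_per_core_mib(12) / l3_per_core_mib(16)
print(f"3900XT has {ratio:.2f}x the L3 per core")
```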
MEOW!