Torbjörn Granlund, principal author of GMP, told The Register, "AMD is (again) replacing my CPU. They are collecting detailed information on my system and the load I ran when it died."
This is the critical piece of info. AMD is standing behind the product and replacing the failed part for the customer. Moreover, it sounds like AMD is interested in digging into why this is happening and not just sweeping it under the rug.
I wonder if this can be fixed without any reduction in performance for the GMP test case (and similar use case profiles).
My Ryzen 9 7950X3D died a couple months ago. I just tried rebooting one day, screen went black during startup, and that was the end of it. Nothing but two red POST lights on the motherboard after that.
I had been donating some CPU time to a couple BOINC projects (solving math, physics, and biology problems), so it’s possible that might have contributed to its demise. Although, I typically had it throttled to not exceed about 25% of CPU usage, so I would think the CPU should have been able to handle it. But, ¯_(ツ)_/¯.
I’ve only seen one other CPU die in my entire life, and that was years ago when I was playing around with overclocking an old CPU. So, I was rather surprised when my 6-month-old CPU went to an early grave.
The fine folks at AMD were kind enough to RMA the dead CPU and replace it with a 9950X3D, which runs like a champ. But it also rarely gets above 5% CPU usage, which seems like a bit of a waste. Still, I’m not in a hurry to use any of that excess capacity to run another BOINC project any time soon.
Of course the CPU shouldn’t be able to fry; there are built-in functions that are supposed to throttle the CPU if it overheats.
But in the old days we could have the motherboard shut down the computer if the CPU overheated.
IDK why this feature was removed. I always had better than required cooling and the system set to shut down way below the heat tolerance of the CPU.
Anyways, it’s weird this happens with a mere 16-core CPU; it could possibly be the motherboard supplying way too much power.
Some motherboards stealthily OC the CPU as default, and some have been revealed to push voltage too, to make their motherboards appear faster in benchmarks. This happened on Asus motherboards, and Asus is absolutely doing exactly what I described.
"reports of AM5 socket burnouts can be attributed to hardware partners failing to adhere to recommended specifications."
Absolutely, this is IMO the equivalent of Dieselgate for motherboards. But this is not a problem specific to AM5; it’s done on ALL sockets. Apparently AM5 just has lower tolerance.
Anyways AMD is investigating, and I’m sure they will make good on customers if it’s a systemic bug in the CPU.
But with only 2 cases half a year apart, it’s a rare occurrence, and it could prove difficult to track down the exact problem.
From the pictures we can see the heat-sink was not optimally mounted, and to me it seems like too much paste was used.
Not bad enough that it should fry the CPU, but it may have contributed.