> AMD’s software experience is riddled with bugs [...] AMD’s weaker-than-expected software Quality Assurance (QA) culture and its challenging out of the box experience.
This has anecdotally been true since forever. Back in the day, OpenCL implementations passed the conformance tests but performed poorly. AMD could not turn hardware capabilities into performance for compute users. Drivers were buggy. Documentation was poor compared to NVIDIA's docs and forum. Offerings were inconsistent (look up SYCL from Codeplay), and it was unclear who owned the experience of developing for AMD. The notion that it might not have improved, or is only now improving, is puzzling. It can't be for lack of recognizing the problem. Intuitively it does not seem so difficult. I'm curious what the reasons are.
Havoc [3 hidden]5 mins ago
Mirrors the geohotz rants about AMD at the time, though as others point out, this (2024) is ancient news in the AI world, and I'm not quite sure what value it adds to the current discussion.
tripledry [3 hidden]5 mins ago
Has this changed? If I want to go hands-on with development using PyTorch, or whatever is used now, would you recommend an AMD card?
Genuine question, I have not followed this topic closely for years :)
Havoc [3 hidden]5 mins ago
Still rocking a 3090 so can't speak from experience, but the general vibe around simple at-home inference is that it has improved (esp. since both Vulkan and ROCm are now viable paths on newer cards).
>development using pytorch
Would probably still play it safe with NVIDIA for anything more adventurous than token generation, even if it has improved.
andy_ppp [3 hidden]5 mins ago
Please just get everything in PyTorch to work, and work well (and across all graphics cards too). This is the starting point and it doesn't matter how you do it. But the fact you cannot even do some very basic stuff on AMD is going to mean you are left unused by researchers, so getting further up the stack is going to be almost impossible.
roenxi [3 hidden]5 mins ago
Does PyTorch not work on AMD cards? I remain very nervous about returning to the AMD ecosystem. On paper AMD has been a compelling choice for GPGPU work for years, up until it turns out the hardware can't actually do what it claims. But the PyTorch problem seemed to be largely solved years ago. The issues weren't in the application layer; they were crippling firmware bugs that AMD didn't seem interested in getting a handle on. PyTorch ran fine until the kernel panicked or whatever, but that isn't a PyTorch problem.
joelthelion [3 hidden]5 mins ago
The problem is "just". "Just" getting pytorch to work and to work well is a huge undertaking.
andy_ppp [3 hidden]5 mins ago
Just here means at minimum or first and foremost, no excuses. I obviously understand this is a huge undertaking. Nobody said attempting to be competitive with NVIDIA in AI would be a walk in the park.
blitzar [3 hidden]5 mins ago
for a trillion dollars, they should be able to figure it out.
fancyfredbot [3 hidden]5 mins ago
Please amend the title, this is a December 2024 article and the conclusions are misleading in 2026
ZiiS [3 hidden]5 mins ago
Correction: Why wasn't it competitive 2 years ago; basically half the AI summer ago.
threepts [3 hidden]5 mins ago
NVIDIA has such a big moat around its CUDA architecture that I don't think AMD will ever be able to outcompete it in AI compute unless they somehow find 2-3 Nobel Prize-level breakthroughs today.
pstuart [3 hidden]5 mins ago
If AMD's betting the company on their AI compute, they had best follow the advice in the article because the only way to compete with NVIDIA is to meet/exceed not just the performance but also the DevX.
dingdingdang [3 hidden]5 mins ago
These days it's for sure the dev environment that is lacking: hardware is okay (potentially great?!), software abysmal. Running a local LLM in a stable manner implies using Vulkan. Any attempt at ROCm is totally hamstrung by haphazard hardware support, along with an online presence poisoned by people primarily discussing work-arounds rather than work when it comes to AMD as a platform. Argh.
KeplerBoy [3 hidden]5 mins ago
You can't have good performance without good DevX. There's a reason why we get a new Python DSL for NVIDIA GPUs every week.
agunapal [3 hidden]5 mins ago
NVIDIA had the first-mover advantage. NVIDIA spent many years perfecting CUDA to work well with PyTorch. Before ROCm, there was only CUDA. So many developers were building their use cases on top of PyTorch+CUDA and bringing all that feedback to PyTorch that CUDA became battle-ready and stable. AMD can get there, especially now with the demand for compute, but as someone already said here, the biggest focus needs to be on PyTorch.
DiabloD3 [3 hidden]5 mins ago
I love how they just butcher that article.
I remember when it came out a little over a year ago, and it's just as wrong today as it was then.
arka2147483647 [3 hidden]5 mins ago
The important part of Hardware is Software.
After all, if the Software does not work, it's just a Paperweight.
wewewedxfgdf [3 hidden]5 mins ago
AMD just doesn't seem to be that good at software.