This is the sort of thing that to me highlights the inherent inefficiency of proprietary software and processes.
"Oh sorry, you'll need our magic hardware in order to run this software. It simply can't happen any other way."
Turns out that wasn't true, which of course it wasn't.
Imagine if instead everyone could have been working together on a fully open graphics compute stack. Sure, optimize it for the hardware you sell, why not, but then it's up to the "best" product instead of the one with the magic software juice.
Not a shill. Don't like Nvidia. But this drop-in replacement is more like a framework for a future fully compatible drop-in replacement than a fully functional one. It's like Wine from two decades ago to Windows -- you might get a few things to work...
After two years of development and some deliberation, AMD decided that there is no business case for running CUDA applications on AMD GPUs. One of the terms of my contract with AMD was that if AMD did not find it fit for further development, I could release it. Which brings us to today.
A serious question - when will Nvidia stop selling their products and start asking for rent? Like 50 bucks a month is a 4070, your hardware can be a 4090 but that's 100 a month. I give it a year.
While there have been efforts by AMD over the years to make it easier to port codebases targeting NVIDIA's CUDA API to run atop HIP/ROCm, it still requires work on the part of developers.
The tooling has improved, such as with HIPIFY to help auto-generate HIP code, but it isn't a simple, instant, and guaranteed solution -- especially if striving for optimal performance.
In practice, for many real-world workloads, ZLUDA is a solution for end-users to run CUDA-enabled software without any developer intervention.
Here is more information on this "skunkworks" project that is now available as open-source along with some of my own testing and performance benchmarks of this CUDA implementation built for Radeon GPUs.
For reasons unknown to me, AMD decided this year to discontinue funding the effort and not release it as any software product.
Andrzej Janik reached out and provided access to the new ZLUDA implementation for AMD ROCm to allow me to test it out and benchmark it in advance of today's planned public announcement.
The original article contains 617 words, the summary contains 167 words. Saved 73%. I'm a bot and I'm open source!