LocalLLaMA @sh.itjust.works ylai @lemmy.ml 7 mo. ago

LLaMA Now Goes Faster on CPUs

justine.lol/matmul/

I wrote 84 new matmul kernels to improve llamafile CPU performance.

1 comment

Very nice speedups for people running CPU inference on supported hardware, but unfortunately this does not help a CPU+GPU split, according to a comment on one of the PRs. That person says that for prompt evaluation, where these kernels would make a difference, llama.cpp performs all the calculations on the GPU. And during token generation it is IO-bound, so the faster CPU calculation makes a negligible difference.
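
To make the IO-bound point concrete, here is a rough back-of-the-envelope sketch. It assumes a simple model where compute and data movement overlap, so each token takes as long as the slower of the two. The function names and all timings are made up for illustration; none of this is measured from llama.cpp or llamafile.

```python
def token_time(compute_s: float, io_s: float) -> float:
    """Per-token time under the assumed model: the slower resource dominates."""
    return max(compute_s, io_s)


def speedup_from_faster_compute(compute_s: float, io_s: float,
                                kernel_speedup: float) -> float:
    """End-to-end speedup when only the compute part gets faster."""
    before = token_time(compute_s, io_s)
    after = token_time(compute_s / kernel_speedup, io_s)
    return before / after


# Hypothetical timings (seconds per token), chosen only to show the effect.
# IO-bound case: data movement dominates, so 2x faster kernels change nothing.
io_bound = speedup_from_faster_compute(compute_s=0.02, io_s=0.10,
                                       kernel_speedup=2.0)

# Compute-bound case: the same 2x kernel speedup carries through in full.
compute_bound = speedup_from_faster_compute(compute_s=0.10, io_s=0.02,
                                            kernel_speedup=2.0)

print(f"IO-bound speedup:      {io_bound:.2f}x")
print(f"compute-bound speedup: {compute_bound:.2f}x")
```

Under these assumed numbers, the faster kernels only pay off in the compute-bound case, which matches the claim that the gains show up in CPU prompt evaluation but not in IO-bound token generation.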