I'm not talking about the speed, I'm talking about the quality of output. I don't think you understand how these models are turned into "uncensored" models; a lot of the time abliteration messes them up.
Running an uncensored DeepSeek model that doesn't perform significantly worse than the regular DeepSeek models? I know how to download and run models; I just haven't seen an uncensored DeepSeek model that performs as well as the baseline one.
Every Python library that does any real work is optimized to hell. The amount of power ML training takes has almost nothing to do with Python. You could write it all in assembly and I doubt you'd gain more than 0.01% in efficiency, and it would be way harder to develop and test new methods, way more error-prone, and would likely waste more energy because of that.
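To illustrate the point: even in plain CPython, the moment you hand the hot loop to something implemented in C (a builtin here, BLAS/CUDA kernels in the ML case), the interpreter overhead mostly disappears. A quick sketch, just a toy demonstration of where the time actually goes:

```python
import time

data = list(range(1_000_000))

# Pure-Python loop: every iteration executes interpreter bytecode.
t0 = time.perf_counter()
total_loop = 0
for x in data:
    total_loop += x
loop_time = time.perf_counter() - t0

# Builtin sum() runs the same loop in C inside the interpreter.
t0 = time.perf_counter()
total_builtin = sum(data)
builtin_time = time.perf_counter() - t0

print(f"python loop: {loop_time:.4f}s, builtin sum: {builtin_time:.4f}s")
```

Same result either way; the builtin is typically several times faster because the per-element work left in Python is zero. Real ML frameworks take this much further since the kernels run on the GPU, so rewriting the Python glue buys you basically nothing.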
Same, I've got a modded 2080 Ti with 22 GB of VRAM running DeepSeek 32B and it's great... But it's an old card, and with it being modded I don't know what the life expectancy is.
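For anyone wondering how a 32B model fits in 22 GB: the weights dominate, so a back-of-the-envelope estimate is parameters times bits per weight, plus some overhead for the KV cache and activations (the 15% overhead factor below is just my rough assumption, not a measured number):

```python
def approx_vram_gb(params_billions: float, bits_per_weight: int,
                   overhead: float = 1.15) -> float:
    """Rough VRAM estimate: weight bytes plus a fudge factor for
    KV cache / activations (overhead is an assumed ~15%)."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 32B model at 4-bit quantization:
print(round(approx_vram_gb(32, 4), 1))  # ~18.4 GB, so it squeezes into 22 GB
```

At 4-bit quantization it just fits; at 8-bit or fp16 it wouldn't come close, which is why quantized builds are the ones people actually run on consumer cards.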
If AMD were smart they'd release an upper-mid-range card with 40+ GB of VRAM. It doesn't even have to be their high-end card; people wanting to do local/self-hosted AI stuff would swarm those.
The code is open source and people look at it; I've dug through their code a fair bit. It wouldn't stay quiet, it would take major code rewrites, and it would be pretty obvious.
Agreed, and thanks for adding the background. Looks like someone already abliterated DeepSeek trying to make it "uncensored", but it sounds like the process ruined the model.
Try EndeavourOS; it's opinionated Arch with a simple installer.