5d ago

QwQ-32B is a 32 billion parameter language model that achieves performance comparable to DeepSeek-R1, which has 671 billion parameters, using reinforcement learning for scaling

qwenlm.github.io QwQ-32B: Embracing the Power of Reinforcement Learning

Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. For in...

Artificial Intelligence @lemmy.sdf.org

Tea @programming.dev

4d ago

Alibaba releases QwQ-32B, an open-source reasoning model, on Hugging Face and ModelScope, claiming performance similar to DeepSeek-R1 with lower compute needs.

qwenlm.github.io /blog/qwq-32b/

Technology @lemmy.ml

☆ Yσɠƚԋσʂ ☆ @lemmy.ml

5d ago

QwQ-32B is a 32 billion parameter language model that achieves performance comparable to DeepSeek-R1, which has 671 billion parameters, using reinforcement learning for scaling

qwenlm.github.io /blog/qwq-32b/

3 comments

Isn't DeepSeek based on Qwen? At least the distilled models?
- I think so, but this looks like an update of Qwen with some new tricks.

You can grab it here:
https://ollama.com/library/qwq:32b
https://huggingface.co/Qwen/QwQ-32B
I find it absolutely wild how quickly we went from needing a full-blown data centre to run models of this scale to being able to run them on a laptop.