21h ago

Zoom researchers detail a “chain of draft” method to let LLMs accurately solve reasoning problems with as little as 7.6% of the tokens used by current methods.

arxiv.org Chain of Draft: Thinking Faster by Writing Less

Large Language Models (LLMs) have demonstrated remarkable performance in solving complex reasoning tasks through mechanisms like Chain-of-Thought (CoT) prompting, which emphasizes verbose, step-by-step reasoning. However, humans typically employ a more efficient strategy: drafting concise intermedia...

Large Language Models (LLMs) have demonstrated remarkable performance in solving complex reasoning tasks through mechanisms like Chain-of-Thought (CoT) prompting, which emphasizes verbose, step-by-step reasoning. However, humans typically employ a more efficient strategy: drafting concise intermediate thoughts that capture only essential information. In this work, we propose Chain of Draft (CoD), a novel paradigm inspired by human cognitive processes, where LLMs generate minimalistic yet informative intermediate reasoning outputs while solving tasks. By reducing verbosity and focusing on critical insights, CoD matches or surpasses CoT in accuracy while using as little as only 7.6% of the tokens, significantly reducing cost and latency across various reasoning tasks.

LocalLLaMA @sh.itjust.works

morrowind @lemm.ee

6d ago

Chain of Draft: Thinking Faster by Writing Less

arxiv.org /abs/2502.18600

8 0

2 comments

    
Answer the question directly. Do not return any
preamble, explanation, or reasoning.

    
Chain-of-Thought
Think step by step to answer the following question.
Return the answer at the end of the response after a
separator ####.

    
Chain-of-Draft
Think step by step, but only keep a minimum draft for
each thinking step, with 5 words at most. Return the
answer at the end of the response after a separator
 ####.

Thats interesting. Good tip.

https://arxiv.org/pdf/2502.18600

Looking at their repo, they’ve tested this with LLM models that have not been trained to generate chain of thought outputs, by varying the system prompts. It’s therefore more of a proof of concept, but I can imagine that if you train a model to do this natively it could work.
Using the same prompt with QwQ made no difference for me (the chain of thought was still very long and quite verbose), while using it with Qwen2.5 Coder made the output extremely terse and not very useful for open-ended questions.