Most AI responses happen in one pass: the model reads your message and starts generating tokens immediately. Extended Thinking changes this. When enabled, Taipei 3.1 allocates a reasoning budget — a block of tokens it uses to work through the problem before forming its final answer.
Think of it as the model working on a scratchpad. It can explore dead ends, reconsider assumptions, and verify intermediate steps. The final response benefits from all of this work, even though none of it appears in the answer itself.
When to use it
Extended Thinking shines on problems that require multi-step reasoning: complex debugging, mathematical proofs, architectural design decisions, research synthesis, or any question where the "obvious" first answer is probably wrong. If you're asking something where you'd expect a smart human to say "let me think about this for a moment," Extended Thinking is a good fit.
It's not useful for simple factual questions, short summaries, or casual conversation. In those cases, it adds latency without improving quality. The model still works correctly without it — Extended Thinking is a tool you deploy deliberately, not a setting you leave on all the time.
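If you decide per request whether to turn Extended Thinking on, the guidance above can be sketched as a small lookup. This is only an illustration: the function name and category labels are hypothetical, chosen to mirror the examples in this section, and are not part of any product API.

```python
# Hypothetical category labels mirroring the examples in the text above.
REASONING_HEAVY = {
    "debugging",
    "mathematical_proof",
    "architecture_design",
    "research_synthesis",
}
LIGHTWEIGHT = {"factual_question", "short_summary", "casual_conversation"}


def should_enable_extended_thinking(task_category: str) -> bool:
    """Return True only for categories that need multi-step reasoning.

    Lightweight and unknown categories default to False, so the extra
    latency is paid only when it was chosen deliberately.
    """
    return task_category in REASONING_HEAVY


print(should_enable_extended_thinking("debugging"))            # True
print(should_enable_extended_thinking("casual_conversation"))  # False
```

Defaulting unknown categories to off reflects the point above that Extended Thinking is something you deploy deliberately.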
The performance trade-off
Extended Thinking increases time to first token by 2-4x, depending on problem complexity. A task that normally starts streaming after 1.1 seconds might instead take roughly 2 to 4.5 seconds to start. Total response time is also longer.
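As a back-of-the-envelope check, the 2-4x multiplier can be applied directly to a baseline latency. This is a minimal sketch: the function name `ttft_range` is hypothetical, and the 2x and 4x bounds are simply the range quoted above, not measured values.

```python
def ttft_range(
    baseline_seconds: float, low: float = 2.0, high: float = 4.0
) -> tuple[float, float]:
    """Estimate time to first token (in seconds) with Extended Thinking on.

    Multiplies the baseline by the low and high slowdown factors and
    rounds each bound to one decimal place.
    """
    if baseline_seconds < 0:
        raise ValueError("baseline_seconds must be non-negative")
    return (round(baseline_seconds * low, 1), round(baseline_seconds * high, 1))


print(ttft_range(1.1))  # (2.2, 4.4)
```

Actual latency depends on problem complexity, so treat the result as an estimate rather than a guarantee.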
The quality improvement more than justifies this on hard tasks. In our benchmarks, Extended Thinking improves accuracy on complex reasoning problems by 23% on average. For the hardest tasks in our suite — the ones where base Taipei 3.0 scored under 50% — the improvement was 41%.
How to enable it
Open the model selector in any chat and toggle "Extended Thinking" on. You'll see an amber indicator in the input bar while it's active. In Max mode, which uses Taipei 3.1 by default, Extended Thinking is always active.
Control over the reasoning token budget via the API is coming soon. Higher budgets will allow deeper reasoning on the most complex problems, at correspondingly higher latency and cost.