Context-only reasoning in smaller models
Added 2026-04-27
Core concern
- Current LLM performance is strongly tied to scale because many useful facts are implicitly stored in model parameters (compressed memory).
- This boosts benchmark performance and enables reasoning from very little prompt context.
- But parametric memory is hard to trust fully: we cannot always verify that a recalled fact is correct rather than merely plausible.
Research direction
- Keep reasoning quality while reducing dependence on memorized world knowledge.
- Train models toward context-only reasoning: reason from provided evidence and abstain when evidence is insufficient.
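One way to make "reason from evidence, abstain otherwise" concrete is as a reward signal for RL post-training. A minimal sketch, with illustrative assumptions: the abstention string, the reward values, and the convention that `gold=None` marks questions the context cannot answer are all hypothetical choices, not specified in this note.

```python
# Hypothetical reward shaping for context-grounded QA with abstention.
# The ABSTAIN string, reward magnitudes, and gold=None convention are
# illustrative assumptions for this sketch.
from typing import Optional

ABSTAIN = "insufficient evidence"

def reward(answer: str, gold: Optional[str]) -> float:
    """Score one model answer.

    gold is the reference answer derivable from the context,
    or None when the context does not support any answer.
    """
    abstained = answer.strip().lower() == ABSTAIN
    if gold is None:
        # No supporting evidence: abstaining is correct,
        # any substantive answer is an unsupported guess.
        return 1.0 if abstained else -1.0
    if abstained:
        # Evidence existed but the model refused: mild penalty,
        # kept smaller than the guessing penalty so abstention stays safe.
        return -0.2
    # Evidence existed and the model answered: reward exact match.
    return 1.0 if answer.strip() == gold.strip() else -1.0
```

The asymmetry (wrong answers penalized harder than unnecessary abstentions) is one way to bias the policy toward caution; the exact magnitudes would need tuning.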
Key question
Can post-training (SFT/RL) teach smaller models a policy that uses only the provided context, avoids unsupported conclusions, and says “insufficient evidence” when the context does not support an answer?
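On the SFT side, the same policy can be baked into the training data by including examples whose target is an explicit abstention. A minimal sketch; the record fields, the instruction wording, and the example contexts are all hypothetical:

```python
# Hypothetical SFT record format: each example pairs a context-grounded
# question with either a supported answer or an explicit abstention target.
# Field names and instruction text are illustrative assumptions.
sft_examples = [
    {
        "context": "The bridge opened in 1937.",
        "question": "When did the bridge open?",
        "target": "1937",  # derivable from the context
    },
    {
        "context": "The bridge opened in 1937.",
        "question": "Who designed the bridge?",
        "target": "insufficient evidence",  # NOT derivable from the context
    },
]

def to_prompt(ex: dict) -> str:
    """Render one record into a training prompt (completion = ex['target'])."""
    return (
        "Answer using ONLY the context. If the context does not contain "
        "the answer, reply 'insufficient evidence'.\n"
        f"Context: {ex['context']}\nQuestion: {ex['question']}\nAnswer:"
    )
```

Mixing answerable and unanswerable questions over the same context is the key ingredient: it forces the model to discriminate on evidence rather than on surface features of the question.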
Hypothesis
We may be able to distill “reasoning code” from larger models into smaller ones via supervision, while explicitly constraining factual grounding to the provided context. Open question: how separable are reasoning skills from language modeling and parametric memory?
Next lead
We already have signals that post-training can induce “thinking” behavior in small language models (SLMs). The next step is extending that toward faithful, context-grounded reasoning with explicit training objectives.