Context-only reasoning in smaller models
Added 2026-04-27
Core concern
- Current LLM performance is strongly tied to scale because many useful facts are implicitly stored in model parameters (compressed memory).
- This boosts benchmark performance and enables reasoning from very little prompt context.
- But parametric memory is hard to trust fully: we cannot always verify that a recalled fact is correct rather than merely plausible.
Research direction
- Keep reasoning quality while reducing dependence on memorized world knowledge.
- Train models toward context-only reasoning: reason from provided evidence and abstain when evidence is insufficient.
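One way to make "reason from evidence, abstain otherwise" concrete is as a reward signal for RL post-training. A minimal sketch, with illustrative assumptions: the abstention string, the reward values, and the convention that `gold=None` marks questions the context cannot answer are all hypothetical choices, not specified in this note.

```python
# Hypothetical reward shaping for context-grounded QA with abstention.
# The ABSTAIN string, reward magnitudes, and gold=None convention are
# illustrative assumptions for this sketch.
from typing import Optional

ABSTAIN = "insufficient evidence"

def reward(answer: str, gold: Optional[str]) -> float:
    """Score one model answer.

    gold is the reference answer derivable from the context,
    or None when the context does not support any answer.
    """
    abstained = answer.strip().lower() == ABSTAIN
    if gold is None:
        # No supporting evidence: abstaining is correct,
        # any substantive answer is an unsupported guess.
        return 1.0 if abstained else -1.0
    if abstained:
        # Evidence existed but the model refused: mild penalty,
        # kept smaller than the guessing penalty so abstention stays safe.
        return -0.2
    # Evidence existed and the model answered: reward exact match.
    return 1.0 if answer.strip() == gold.strip() else -1.0
```

The asymmetry (wrong answers penalized harder than unnecessary abstentions) is one way to bias the policy toward caution; the exact magnitudes would need tuning.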
Key question
Can post-training (SFT/RL) teach smaller models a policy that uses only the provided context, avoids unsupported conclusions, and says “insufficient evidence” when the context does not support an answer?
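On the SFT side, the same policy can be baked into the training data by including examples whose target is an explicit abstention. A minimal sketch; the record fields, the instruction wording, and the example contexts are all hypothetical:

```python
# Hypothetical SFT record format: each example pairs a context-grounded
# question with either a supported answer or an explicit abstention target.
# Field names and instruction text are illustrative assumptions.
sft_examples = [
    {
        "context": "The bridge opened in 1937.",
        "question": "When did the bridge open?",
        "target": "1937",  # derivable from the context
    },
    {
        "context": "The bridge opened in 1937.",
        "question": "Who designed the bridge?",
        "target": "insufficient evidence",  # NOT derivable from the context
    },
]

def to_prompt(ex: dict) -> str:
    """Render one record into a training prompt (completion = ex['target'])."""
    return (
        "Answer using ONLY the context. If the context does not contain "
        "the answer, reply 'insufficient evidence'.\n"
        f"Context: {ex['context']}\nQuestion: {ex['question']}\nAnswer:"
    )
```

Mixing answerable and unanswerable questions over the same context is the key ingredient: it forces the model to discriminate on evidence rather than on surface features of the question.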
Hypothesis
We may be able to distill “reasoning code” from larger models into smaller ones via supervision, while explicitly constraining factual grounding to the provided context. Open question: how separable are reasoning skills from language modeling and parametric memory?
Next lead
We already have signals that post-training can induce “thinking” behavior in small language models (SLMs). The next step is extending that toward faithful, context-grounded reasoning with explicit training objectives.