Method

Self-hosted models on real planning tasks: a cost wall

Reasoning on self-hosted models board · Elena Volkova · Published 2025-11 · ~5 min
GET   /v1/publications/self-hosted-models-cost-wall
kind   Method
published   2025-11-22
board   /v1/boards/reasoning-on-self-hosted-models
author   Elena Volkova
cite_as   Xooplab (2025). "Self-hosted models on real planning tasks: a cost wall." xooplab.com/publications/self-hosted-models-cost-wall
Machine abstract · key claims
  1. On a 14-task financial-planning benchmark, the best open model with ≤14B parameters reaches 71% of frontier accuracy at ~6% of the cost.
  2. Above an accuracy target of ~78%, open self-hosted models stop being the cost-effective choice on commodity GPUs.
  3. Thin instruction-tuning on the benchmark's task family is dramatically cheaper than heavier prompting.
  4. Inference cost dominates total cost only above ~10k requests/month; below that, development and operations (dev + ops) costs dominate and the cost comparison flips.
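The crossover in claim 4 can be illustrated with a toy two-term cost model. This is a minimal sketch under stated assumptions, not the study's method: it assumes monthly cost is a fixed dev + ops term plus a per-request inference term, and it reads "dominates" as "exceeds half of total cost". The `CostModel` class, its field names, and every number below are hypothetical placeholders, not data from the benchmark.

```python
# A toy monthly cost model for claim 4 (hypothetical; not the study's model).
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Monthly cost = fixed dev + ops term + per-request inference term."""

    fixed_monthly: float  # hypothetical dev + ops cost per month (USD)
    per_request: float    # hypothetical inference cost per request (USD)

    def __post_init__(self) -> None:
        # Both terms must be positive for the crossover to be defined.
        if self.fixed_monthly <= 0 or self.per_request <= 0:
            raise ValueError("fixed_monthly and per_request must be positive")

    def inference_cost(self, requests: int) -> float:
        """Variable cost: grows linearly with monthly request volume."""
        return self.per_request * requests

    def total_cost(self, requests: int) -> float:
        """Fixed dev + ops cost plus variable inference cost."""
        return self.fixed_monthly + self.inference_cost(requests)

    def inference_share(self, requests: int) -> float:
        """Fraction of total monthly cost spent on inference."""
        return self.inference_cost(requests) / self.total_cost(requests)

    def crossover_requests(self) -> float:
        """Volume at which inference cost equals dev + ops (share = 50%)."""
        return self.fixed_monthly / self.per_request


if __name__ == "__main__":
    # Placeholder numbers chosen for round arithmetic, not measured values.
    model = CostModel(fixed_monthly=2500.0, per_request=0.125)
    print(model.crossover_requests())  # 20000.0
    for volume in (5_000, 20_000, 80_000):
        # Prints 20%, 50%, 80% inference share respectively.
        print(volume, f"{model.inference_share(volume):.0%}")
```

Under this toy model the crossover volume is simply `fixed_monthly / per_request`, so anything that lowers the per-request cost pushes the point where inference dominates to a higher volume, and fixed dev + ops costs stay decisive for longer.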

This note is forthcoming. The abstract above lists the working claims; the full prose will land here once the steward of the Reasoning on self-hosted models board signs off on the draft.

Watch the Reasoning on self-hosted models board from the portal: drafts are visible to watchers a week before public release.

From the Reasoning on self-hosted models board. Replications, counter-arguments, and "you reinvented X" corrections all welcome in the thread.