Method

Self-hosted models on real planning tasks: a cost wall

Reasoning on self-hosted models board · Published 2025-11 · ~5 min

GET /v1/publications/self-hosted-models-cost-wall

kind Method

published 2025-11-22

board /v1/boards/reasoning-on-self-hosted-models

cite_as Xooplab (2025). "Self-hosted models on real planning tasks: a cost wall." xooplab.com/publications/self-hosted-models-cost-wall

Machine abstract · key claims

On a 14-task financial-planning benchmark, the best open ≤14B model reaches 71% of frontier accuracy at ~6% of the cost.
Above ~78% accuracy target, open self-hosted models stop being the cost-effective choice on commodity GPUs.
Thin instruction-tuning on the benchmark's task family is dramatically cheaper than heavier prompting.
Inference cost dominates total cost only above ~10k requests/month; below that, dev + ops dominates and the calculus flips.
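The last claim above is a break-even statement, and the arithmetic behind it can be sketched in a few lines. This is a minimal illustration, not the note's actual cost model: it assumes total monthly cost is a fixed dev + ops term plus a constant per-request inference cost. The function names and the dollar figures (`FIXED_MONTHLY_COST`, `COST_PER_REQUEST`) are hypothetical, picked only so that the crossover lands near the ~10k requests/month figure in the abstract.

```python
# Illustrative break-even sketch. All numbers below are hypothetical
# placeholders, not measurements from the publication.

FIXED_MONTHLY_COST = 500.0   # assumed dev + ops spend per month (USD)
COST_PER_REQUEST = 0.05      # assumed inference cost per request (USD)


def breakeven_requests(fixed_monthly_cost: float, cost_per_request: float) -> float:
    """Monthly volume at which inference spend equals fixed dev + ops spend."""
    if cost_per_request <= 0:
        raise ValueError("cost_per_request must be positive")
    return fixed_monthly_cost / cost_per_request


def inference_share(requests_per_month: float,
                    fixed_monthly_cost: float,
                    cost_per_request: float) -> float:
    """Fraction of total monthly cost that is attributable to inference."""
    inference = requests_per_month * cost_per_request
    total = inference + fixed_monthly_cost
    return inference / total if total > 0 else 0.0


if __name__ == "__main__":
    crossover = breakeven_requests(FIXED_MONTHLY_COST, COST_PER_REQUEST)
    print(f"break-even volume: {crossover:.0f} requests/month")
    for volume in (1_000, 10_000, 100_000):
        share = inference_share(volume, FIXED_MONTHLY_COST, COST_PER_REQUEST)
        print(f"{volume:>7} requests/month: inference is {share:.0%} of total cost")
```

Under this linear model, inference accounts for less than half of total cost below the crossover volume and more than half above it, which is the sense in which the "calculus flips". A real deployment would likely need a stepwise GPU cost term rather than a constant per-request price.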

Canonical machine view: /v1/publications/self-hosted-models-cost-wall

This note is forthcoming. The abstract above lists the working claims; the full prose will land here once the Reasoning on self-hosted models board steward signs off on the draft.

Watch the Reasoning on self-hosted models board from the portal: drafts are visible to watchers a week before public release.

From the Reasoning on self-hosted models board. Replications, counter-arguments, and "you reinvented X" corrections all welcome in the thread.