v0.2
LCOI calculator
Levelized cost of inference per million tokens. Pick hardware, model, and region presets to see a plausible $/M-tokens figure split into GPU amortisation, facility overhead, electricity, cooling, and OpEx. Every numeric field is overridable; presets only fill defaults.
Configuration
Market is $35–40k for SXM units; PCIe variants trade at $25–30k. Using $35k, the low end of the SXM range, as a defensible default for the SXM form factor.
3,000 tok/s prefill / 310 tok/s decode on NVIDIA H100 SXM.
EIA Electricity Monthly Update (industrial forecast, 2025)
~15% inter-node overhead.
Share of wall-clock time the GPU is producing tokens.
Prefill takes 26% of GPU time. This affects sessions/yr and cost/session, not the $/M-token price.
Output is 23% of tokens by count. Changing token counts shifts cost/session, not $/M. See Advanced to move the per-token price.
Blended — cost allocated by GPU time (26% prefill / 74% decode).
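The 26% / 74% GPU-time split and the combined throughput can be reproduced from the two throughput figures. The sketch below is illustrative only: the per-session token counts (1,000 in, 300 out) are not stated on this page and are inferred here from the annual totals (395.6B input and 118.7B output tokens over 395.6M sessions), and the function name is hypothetical.

```python
# Illustrative sketch, not the calculator's actual code.
PREFILL_TOK_S = 3_000  # prefill throughput from the preset, tok/s
DECODE_TOK_S = 310     # decode throughput from the preset, tok/s

# Assumption: per-session token counts inferred from the annual figures
# (395.6B / 395.6M sessions and 118.7B / 395.6M sessions).
INPUT_TOK_PER_SESSION = 1_000
OUTPUT_TOK_PER_SESSION = 300


def gpu_time_split(tokens_in, tokens_out, prefill_tok_s, decode_tok_s):
    """Return (prefill_share, decode_share, combined_tok_s) for one session."""
    prefill_s = tokens_in / prefill_tok_s   # seconds spent on prefill
    decode_s = tokens_out / decode_tok_s    # seconds spent on decode
    total_s = prefill_s + decode_s
    combined_tok_s = (tokens_in + tokens_out) / total_s
    return prefill_s / total_s, decode_s / total_s, combined_tok_s


prefill_share, decode_share, combined_tok_s = gpu_time_split(
    INPUT_TOK_PER_SESSION, OUTPUT_TOK_PER_SESSION, PREFILL_TOK_S, DECODE_TOK_S
)
output_token_share = OUTPUT_TOK_PER_SESSION / (
    INPUT_TOK_PER_SESSION + OUTPUT_TOK_PER_SESSION
)

print(f"prefill {prefill_share:.0%}, decode {decode_share:.0%}")
print(f"combined {combined_tok_s:.0f} tok/s, output tokens {output_token_share:.0%}")
```

Under these assumptions the sketch gives about 26% prefill, 74% decode, roughly 999 combined tok/s per GPU, and an output share of about 23% by token count, which matches the figures shown on this page.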
Cost breakdown
| Component | $/yr | Share |
|---|---|---|
| GPU amortisation | $331,098 | 70.8% |
| Facility overhead | $65,190 | 13.9% |
| Electricity | $6,728 | 1.4% |
| Cooling | $2,691 | 0.6% |
| OpEx | $62,000 | 13.3% |
| Total | $467,707 | 100% |
OpEx: fixed $30,000 + marginal $32,000.
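As a quick check on the table, the total and the share column can be recomputed from the component figures. This is a minimal sketch using only the dollar values shown above; the variable names are illustrative, and it says nothing about how the calculator derives each component.

```python
# Annual cost components in USD, copied from the table above.
components = {
    "GPU amortisation": 331_098,
    "Facility overhead": 65_190,
    "Electricity": 6_728,
    "Cooling": 2_691,
    "OpEx": 30_000 + 32_000,  # fixed + marginal
}

total = sum(components.values())
shares = {name: cost / total for name, cost in components.items()}

for name, cost in components.items():
    print(f"{name:<18} ${cost:>9,} {shares[name]:6.1%}")
print(f"{'Total':<18} ${total:>9,}")
```

On these figures cooling comes to roughly 40% of the electricity line, which may help when sanity-checking an overridden cooling input.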
| Metric | Value |
|---|---|
| Annual input tokens | 395.6B |
| Annual output tokens | 118.7B |
| Cost allocation in / out | $119,826 / $347,881 |
| Combined tok/s / GPU | 999 |
| Annual sessions | 395.6M |
| Cost per session | $0.0012 |
| Utilisation peak / avg | 60% / 60% |
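The per-token prices implied by these figures follow from dividing each cost allocation by its token volume. The sketch below assumes plain division with no margin or rounding rules, which is an assumption about the calculator rather than something stated on this page; `usd_per_million` is a hypothetical helper.

```python
# Figures copied from the results above.
TOTAL_COST = 467_707                 # USD per year
COST_IN, COST_OUT = 119_826, 347_881  # USD allocated to input / output
TOKENS_IN, TOKENS_OUT = 395.6e9, 118.7e9
SESSIONS = 395.6e6


def usd_per_million(cost_usd, tokens):
    """Cost in USD per one million tokens (assumed: simple division)."""
    return cost_usd / tokens * 1e6


price_in = usd_per_million(COST_IN, TOKENS_IN)
price_out = usd_per_million(COST_OUT, TOKENS_OUT)
price_blended = usd_per_million(TOTAL_COST, TOKENS_IN + TOKENS_OUT)
cost_per_session = TOTAL_COST / SESSIONS

print(f"input   ${price_in:.3f}/M tokens")
print(f"output  ${price_out:.3f}/M tokens")
print(f"blended ${price_blended:.3f}/M tokens")
print(f"session ${cost_per_session:.4f}")
```

With these inputs the output price works out to nearly ten times the input price, because decode takes about 74% of GPU time while producing about 23% of the tokens.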
Related Resources: Read the main insights of this calculator and see assumptions for sources, dates, and methodology caveats.