The Pulse Inference Token Index is a family of per-model series, not a single number — different models, quantizations, and jurisdictions are tracked separately so that a price comparison is always like-for-like. The card immediately below shows the v1.0 anchor: a single series chosen by methodology (§3.4 / audit §0.1) to be the headline figure used in homepage cards and external citations of "the Inference Token Index" as one number. The anchor for v1.0 is the Llama 3.3 70B FP8 US series — the most liquid, methodology-locked admissible series at launch. The full list of tracked series, including the anchor, appears in the table further down the page.

v1.0 anchor — Pulse Llama 3.3 70B FP8 Blended
Awaiting first published snapshot.

Host-by-host dispersion

Every point is one commodity-host endpoint on the pinned checkpoint for its model. X-axis groups by series; Y-axis is blended USD per million tokens (3:1 input:output). The spread within each series is the dispersion that motivated the index. Speed-tier specialists and hyperscaler-managed endpoints are excluded (methodology §3.1, §3.5).

Host-by-host observations are not yet exposed on this surface. The scatter populates with the next data refresh.

DeepSeek jurisdictional gap

The price difference between the US-hosted commodity basket (minimum) and the China-direct DeepSeek API on the V3.2 pinned checkpoint. Tracked as a time series in its own right; allowed to expand, contract, or cross zero — no fixed multiplier is asserted (methodology §3.8).

Jurisdictional-gap series populates once both US and China-direct sibling series have qualifying snapshots.

Cost in real terms

The same index value, expressed as cost-per-task and tokens-per-dollar against a reference series. Token-workload assumptions are pinned by the Pulse Inference Workload Spec v1.0; every figure carries the workload identifier it was computed against. Derived figures are explanatory translations of the indexed price, not standalone indices.

Loading…

Named-model series

Each row is a separate series. Llama 3.3 70B publishes two — FP8 and BF16 — because mixing quantizations into a single number would hide a real product difference. FP8 is the standard high-throughput quantization for production inference; BF16 preserves more numerical precision and is typically priced 2–3× higher. Methodology §3.4.

Series Model Quant Jurisdiction Tier Median Dispersion Hosts As of
Loading…

Cite this index

Pick a format and copy. Citations include the methodology version that produced the displayed values.

For comparison — proprietary frontier APIs

Not indexed

Proprietary inference APIs are excluded from the Pulse Inference Token Index by design (methodology §3.1) — only one vendor sells each endpoint, so there is no dispersion to measure. Posted rates from the current frontier proprietary families are listed below for reference only. These figures are not part of the index, do not carry a tier, and do not appear in the /api/indices feed or the citation widget above.

Family Model Provider Input $/Mtok Output $/Mtok Blended 3:1
Loading…

Posted-rate snapshot. Source date and provider page URLs are recorded in /data/proprietary-api-reference.json. These rates are not collected on the inference index pipeline and do not pass the §3.9 eligibility rubric — they cannot be cited as Pulse Inference Token Index figures.