
The Layer Nobody Else Built: How PromptKing Maps Confidence Across AI Vendors


Every AI FinOps platform shows a green "Connected" badge when an API key validates. That badge tells you the pipe works. It tells you nothing about what kind of truth flows through it.

The problem with "connected" AI tools

Enterprise AI spend today comes from vendors with radically different data maturity:

  • Anthropic exposes invoice-grade usage via Admin API — billing truth and behavior truth in one stream.
  • GitHub Copilot exposes seat billing and credit consumption — authoritative for seat economics.
  • Google Gemini (Workspace) exposes behavioral intelligence via Reports API — active users, last-used timestamps — but billing is a flat per-seat invoice. Behavioral truth without billing truth.
  • IBM Watsonx exposes IAM health and inference telemetry, so Resource Unit (RU) cost has to be estimated from token data. Estimated economics, not invoice-matched.
  • Manual connectors (CSV import, seat counts entered by hand) — useful fallbacks with explicit manual labels.

When a dashboard sums these sources into one "total AI spend" number without grading each contribution, it creates a category error: the CFO sees a single figure that mixes invoice-grade and estimated data as if they were equivalent.

That is silent mixing. It is the most common failure mode in enterprise AI FinOps today.

A data philosophy, not a feature

PromptKing treats confidence as a first-class dimension — parallel to cost, not derived from it.

For Gemini, we ingest Workspace Reports API behavioral data: active users in 28 days, Ghost Seat candidates, last-used timestamps. We label cost as manual because the invoice is per-seat, not per-usage. The connector is Grade B: behavioral intelligence plus manual cost.

For Watsonx, we exchange IAM tokens, enumerate foundation models, and estimate daily RU cost from token volumes in inference logs. We label cost as estimated. The connector is Grade C: health telemetry plus estimated economics.

Neither connector is "broken." Both are honestly labeled. That honesty is the product.

The four grades — defined operationally

| Grade | Meaning |
|-------|---------|
| A | Authoritative billing + behavior: invoice-grade cost and observed usage |
| B | Behavioral intelligence + manual cost: strong usage data, seat-level billing only |
| C | Estimated economics + health: token-derived cost, API health confirmed |
| D | Manual only / no telemetry: CSV import or unvalidated key |

Grades are written to connector config by daily crons. They are visible on every connector card. They never collapse into a single "connected" boolean.
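
Read as a decision procedure, the grade table might be sketched roughly as follows. This is an illustrative Python sketch, not PromptKing's implementation: the `Connector` class, the `classify` function, the `api_healthy` flag, and the string values such as "observed" are all hypothetical, and the field names are borrowed from the board-ready conditions in this article.

```python
from dataclasses import dataclass


@dataclass
class Connector:
    # All fields and their allowed values are assumptions for illustration.
    name: str
    billing_truth_source: str   # "authoritative", "manual", or "none"
    behavior_truth_source: str  # "observed" or "none"
    cost_confidence: str        # "authoritative", "manual", or "estimated"
    api_healthy: bool = True


def classify(c: Connector) -> str:
    """Map a connector's truth sources to a grade, following the table's wording."""
    has_behavior = c.behavior_truth_source != "none"
    if (c.billing_truth_source == "authoritative"
            and c.cost_confidence == "authoritative"
            and has_behavior):
        return "A"  # authoritative billing + behavior
    if has_behavior and c.cost_confidence == "manual":
        return "B"  # behavioral intelligence + manual cost
    if c.cost_confidence == "estimated" and c.api_healthy:
        return "C"  # estimated economics + confirmed API health
    return "D"      # manual only / no telemetry


gemini = Connector("Gemini", "manual", "observed", "manual")
watsonx = Connector("Watsonx", "none", "none", "estimated")
print(classify(gemini), classify(watsonx))  # B C
```

Under these assumptions a CSV import with hand-entered cost and no behavioral data falls through to D, which matches the last row of the table.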

The rule we enforce

CONFIDENCE_PROPAGATION_RULE:
A rollup inherits the lowest confidence grade of all contributing connectors.
This rule is immutable. Estimated and authoritative data must never be silently mixed.

When OCR aggregates cost from Anthropic (A) and Watsonx (C), the rollup inherits C. The ConfidenceBadge on the OCR headline reflects that inheritance. The data composition bar shows the authoritative / manual / estimated mix as percentages.
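
The inheritance rule and the composition bar can be sketched in a few lines of Python. The function names `rollup_grade` and `source_mix` and the dollar amounts are hypothetical, chosen only to illustrate the rule; the article does not describe how the engine is actually written.

```python
GRADE_ORDER = "ABCD"  # best to worst
LABELS = ("authoritative", "manual", "estimated")


def rollup_grade(grades):
    """Return the lowest (worst) grade among the contributing connectors."""
    if not grades:
        raise ValueError("a rollup needs at least one contributing connector")
    # str.index raises ValueError for anything that is not A, B, C, or D.
    return max(grades, key=GRADE_ORDER.index)


def source_mix(contributions):
    """Percent of total cost per confidence label.

    contributions: iterable of (amount, label) pairs, label being one of LABELS.
    """
    totals = {label: 0.0 for label in LABELS}
    for amount, label in contributions:
        totals[label] += amount
    grand_total = sum(totals.values())
    if grand_total == 0:
        return totals
    return {label: round(100 * value / grand_total, 1)
            for label, value in totals.items()}


# Anthropic (A) plus Watsonx (C), with made-up monthly amounts.
print(rollup_grade(["A", "C"]))  # C
print(source_mix([(7500, "authoritative"), (2500, "estimated")]))
# {'authoritative': 75.0, 'manual': 0.0, 'estimated': 25.0}
```

Keeping the grade and the mix as two separate outputs mirrors the point of the rule: the badge shows the weakest link, while the percentages show how much of the number that weakest link actually touches.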

Freshness is a separate axis. A Grade A connector that hasn't synced in 72 hours, past the 48-hour freshness window, has its effective grade degraded for rollup computation (A→B→C→D), but the source grade badge on the connector card stays A. Grade and freshness are never collapsed.
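
A minimal sketch of how source grade and effective grade could be kept apart follows. The 48-hour window comes from the board-ready conditions and release notes in this article; the degradation schedule does not, so the step size here (one grade per full 48-hour window since the last sync) is purely an assumption, as is the function name `effective_grade`.

```python
from datetime import datetime, timedelta, timezone

GRADES = "ABCD"                         # best to worst
FRESHNESS_WINDOW = timedelta(hours=48)  # stated in the article


def effective_grade(source_grade, last_sync, now):
    """Degrade the grade used in rollups; the source grade itself is untouched.

    ASSUMPTION: one step down per full 48-hour window elapsed since last sync.
    The real schedule is not specified in the article.
    """
    age = now - last_sync
    # timedelta // timedelta yields an int number of whole windows.
    steps = 0 if age <= FRESHNESS_WINDOW else age // FRESHNESS_WINDOW
    index = min(GRADES.index(source_grade) + steps, len(GRADES) - 1)
    return GRADES[index]


now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
source = "A"
effective = effective_grade(source, now - timedelta(hours=72), now)
print(source, effective)  # A B
```

The connector card would keep rendering `source`, while rollups consume `effective`, so neither value overwrites the other.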

What "board-ready" actually means

A metric is board-ready only when all five conditions pass:

  1. billing_truth_source = authoritative
  2. behavior_truth_source is not none
  3. cost_confidence = authoritative (no silent mixing with estimated or manual)
  4. last_cron_sync within 48 hours
  5. effective_grade = A

When all five pass, the connector truth panel shows a green Board Ready pulse badge. When they don't, the panel shows exactly which condition failed — not a vague warning.
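
The five conditions translate naturally into a check that reports which ones failed rather than returning a bare boolean. The sketch below is hypothetical: the field names follow the list above, but the `ConnectorState` class, the function names, and the use of the strings "observed" and "none" as values are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_SYNC_AGE = timedelta(hours=48)


@dataclass
class ConnectorState:
    # Field names follow the article's conditions; value strings are assumed.
    billing_truth_source: str
    behavior_truth_source: str
    cost_confidence: str
    last_cron_sync: datetime
    effective_grade: str


def failed_conditions(state, now):
    """Return the label of every board-ready condition that does not hold."""
    checks = [
        ("billing_truth_source = authoritative",
         state.billing_truth_source == "authoritative"),
        ("behavior_truth_source is not none",
         state.behavior_truth_source != "none"),
        ("cost_confidence = authoritative",
         state.cost_confidence == "authoritative"),
        ("last_cron_sync within 48 hours",
         now - state.last_cron_sync <= MAX_SYNC_AGE),
        ("effective_grade = A",
         state.effective_grade == "A"),
    ]
    return [label for label, passed in checks if not passed]


def is_board_ready(state, now):
    return not failed_conditions(state, now)


now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
stale = ConnectorState("authoritative", "observed", "authoritative",
                       now - timedelta(hours=72), "B")
print(failed_conditions(stale, now))
# ['last_cron_sync within 48 hours', 'effective_grade = A']
```

Returning the failed labels is what lets a panel name the exact condition that blocked certification.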

The six questions

Before your next board presentation, ask your AI FinOps platform:

  1. What grade is each connected vendor?
  2. What percentage of total spend is invoice-grade vs estimated vs manual?
  3. When did each connector last sync — and does staleness degrade downstream metrics?
  4. Does any rollup silently mix authoritative and estimated sources?
  5. Can you filter the CFO report to board-ready data only?
  6. If a connector degrades from A to B overnight, which KPIs change and by how much?

If your tool cannot answer all six, you are presenting numbers you cannot defend.

What's live and what's next

v3.54.0 — Connector Truth Panel: grade badge, billing/behavior/cost chips, health row, ConfidenceBadge on OCR and KPI cards.

v3.55.0 — Confidence Propagation Engine: CONFIDENCE_PROPAGATION_RULE, freshness degradation after 48h, board-ready certification, data_confidence on all usage upserts, simulator authoritative-only toggle.

v3.56.0 — Confidence-Aware OCR UI: source mix breakdown on OCR headline, board-ready filter on CFO report, grade + freshness shown as separate axes on connector cards, this insight published.

The confidence system is complete. Every metric has a grade. Every grade has a freshness state. Both are always visible, never collapsed.


Vendors expose cost. PromptKing exposes confidence.

You don't need perfect data. You need to know how much to trust it.

