AI API

The AI API is live in production: one HTTP API with 6 endpoints, 8 models across 3 providers, and USDC-credits billing with a 40% margin applied at the endpoint layer. It was smoke-tested on 2026-05-20 against all 3 providers; the full code-traced verification trail is at the bottom of this page.

At a glance

8 models

Across Anthropic, OpenAI, Google

6 endpoints

chat, think, execute, models, usage, chat/stream

Pay per token

USDC credits, no monthly plan, no minimum

DEV sandbox

pk_dev_ returns canned responses, never calls the provider

Models

The model registry lives in chipi-back at src/ai/model-router.service.ts and is exposed live at GET /v1/ai/models (with the 40% margin already applied to the per-1M-token prices). Pull from the endpoint rather than hardcoding the table below — pricing can change without a docs deploy.

Anthropic

ID	Display name	Use case
`haiku`	Claude Haiku 4.5	Fast, cheap default
`sonnet`	Claude Sonnet 4.5	Heavy reasoning, agents

OpenAI

ID	Display name	Use case
`gpt-4o-mini`	GPT-4o Mini	Cheapest in catalog
`gpt-4o`	GPT-4o	Multimodal, fast
`gpt-4.1-mini`	GPT-4.1 Mini	Long context, budget
`gpt-4.1`	GPT-4.1	Long context, premium

Google

ID	Display name	Use case
`gemini-flash`	Gemini 2.5 Flash	Fast, low cost
`gemini-pro`	Gemini 2.5 Pro	Strong reasoning

If model is omitted from the request body, Chipi defaults to haiku (DEFAULT_MODEL in the same source file).
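The defaulting rule can be mirrored client-side, for example to pre-validate a model ID before sending a request. This is a minimal sketch: `resolve_model` and the `MODELS` mapping are hypothetical helpers built from the tables above, not part of any Chipi SDK, and how the server treats an unknown ID is not documented here.

```python
# Model IDs and providers as listed in the tables above.
MODELS = {
    "haiku": "anthropic",
    "sonnet": "anthropic",
    "gpt-4o-mini": "openai",
    "gpt-4o": "openai",
    "gpt-4.1-mini": "openai",
    "gpt-4.1": "openai",
    "gemini-flash": "google",
    "gemini-pro": "google",
}

# Mirrors the server-side DEFAULT_MODEL; copied here for illustration only.
DEFAULT_MODEL = "haiku"


def resolve_model(requested=None):
    """Return the model ID a request would run against (hypothetical helper)."""
    if requested is None:
        # Omitting `model` from the request body falls back to the default.
        return DEFAULT_MODEL
    if requested not in MODELS:
        # Assumption: reject locally rather than guess what the server does.
        raise ValueError(f"unknown model ID: {requested!r}")
    return requested
```

Since the registry can change without a docs deploy, a real client would likely populate `MODELS` from GET /v1/ai/models rather than hardcode it.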

Endpoints

All endpoints live under /v1/ai/* on https://api.chipipay.com and are guarded by ApiKeyGuard, which accepts either:

Authorization: Bearer sk_prod_... (server-to-server), or
x-api-key: pk_prod_... (client-side, no secret leaked)

Method + path	Purpose
`POST /v1/ai/chat`	General-purpose chat completion
`POST /v1/ai/chat/stream`	Same as `/chat`, server-sent-event token stream
`POST /v1/ai/think`	DeFi-specialised decision; returns structured JSON `{ action, confidence, ... }`
`POST /v1/ai/execute`	Build unsigned AVNU swap calldata (Starknet). Pure builder, doesn’t submit.
`GET /v1/ai/models`	Lists all 8 models with live margin-applied per-1M-token pricing
`GET /v1/ai/usage`	Your org’s call counts, costs, and breakdowns per model + endpoint

See the Quickstart for a working curl against /chat, Chat & Streaming for the SSE wire format, and DeFi Intelligence for the /think JSON schema.
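As a rough illustration of the two header schemes ApiKeyGuard accepts, the sketch below picks the header from the key prefix. `auth_headers` and `endpoint_url` are hypothetical helper names, and it is an assumption that `sk_dev_` and `pk_dev_` keys use the same headers as their `_prod_` counterparts.

```python
BASE_URL = "https://api.chipipay.com"


def endpoint_url(path):
    """Build a full URL under /v1/ai/ (hypothetical helper)."""
    return f"{BASE_URL}/v1/ai/{path.lstrip('/')}"


def auth_headers(api_key):
    """Choose the auth header from the key prefix (hypothetical helper).

    Secret keys (sk_...) go in Authorization: Bearer, server-to-server.
    Public keys (pk_...) go in x-api-key, safe for client-side use.
    """
    if api_key.startswith("sk_"):
        return {"Authorization": f"Bearer {api_key}"}
    if api_key.startswith("pk_"):
        return {"x-api-key": api_key}
    raise ValueError("expected an API key starting with sk_ or pk_")
```

The resulting dictionary can be passed to any HTTP client as request headers; the placeholder keys used when trying this out should never be real secrets.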

How billing works

Every /v1/ai/chat and /v1/ai/chat/stream response includes a cost.charged field expressed in USD. That’s the amount debited from your org’s Chipi credits balance for the call. The math (from ai.controller.ts):

chargedAmount = (inputTokens × inputCostPer1M / 1_000_000 + outputTokens × outputCostPer1M / 1_000_000) × 1.4

inputCostPer1M / outputCostPer1M come from the model registry (per-provider wholesale).
1.4 is the MARGIN constant — Chipi adds 40% on top.
/execute charges a flat $0.002 per call regardless of size (AVNU API call, no token usage).

The wholesale-versus-charged split is not exposed anywhere; responses carry only the charged amount. The 40% covers infra, paymaster gas for /execute, and platform margin.
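To make the formula concrete, here is a small worked sketch. The per-1M prices in the usage line are made-up illustration values, not real registry prices, and any rounding the controller applies is not modelled.

```python
MARGIN = 1.4  # 40% on top of wholesale, per the formula above
EXECUTE_FLAT_FEE_USD = 0.002  # flat /execute charge per call


def charged_amount(input_tokens, output_tokens,
                   input_cost_per_1m, output_cost_per_1m):
    """Charged USD for one chat call, following the documented formula."""
    wholesale = (input_tokens * input_cost_per_1m / 1_000_000
                 + output_tokens * output_cost_per_1m / 1_000_000)
    return wholesale * MARGIN


# Hypothetical prices: $1.00 per 1M input tokens, $5.00 per 1M output tokens.
# 1,000 input + 200 output tokens -> (0.001 + 0.001) x 1.4
print(f"{charged_amount(1000, 200, 1.00, 5.00):.6f}")  # prints 0.002800
```

Note that the prices returned by GET /v1/ai/models already include the margin, so this calculation applies to wholesale prices only; multiplying the listed prices by 1.4 again would double-count it.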

DEV sandbox

If your API key starts with pk_dev_ or sk_dev_, the AI endpoints short-circuit to canned responses and never call the underlying provider.

Behaviour	DEV (`pk_dev_` / `sk_dev_`)	PROD (`pk_prod_` / `sk_prod_`)
Provider call	None	Real Anthropic / OpenAI / Google
Response shape	Identical to production	Real model output
`cost.charged`	`0`	Real per-token cost × 1.4 margin
`OrgBalance`	Untouched	Debited
`LedgerEntry` row	Synthetic `le-dev-*`	Real DB row
`sandbox: true` on response	Yes	Field absent
Rate limit + balance check	Skipped	Enforced

Use DEV keys to wire up the integration end-to-end against the real response shape, then swap the prefix to pk_prod_ / sk_prod_ to go live. Zero code changes.
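An integration test might want to confirm that a response really came from the environment the key implies. The sketch below checks the `sandbox` flag and `cost.charged` against the key prefix, using only the fields described in the table above; `is_dev_key` and `check_environment` are hypothetical helper names, and the response is assumed to be already parsed into a dictionary.

```python
DEV_PREFIXES = ("pk_dev_", "sk_dev_")


def is_dev_key(api_key):
    """True when the key would hit the canned-response sandbox."""
    return api_key.startswith(DEV_PREFIXES)


def check_environment(api_key, response):
    """Return "dev" or "prod", raising if key and response disagree."""
    sandbox = response.get("sandbox") is True
    charged = response["cost"]["charged"]
    if is_dev_key(api_key):
        # DEV responses carry sandbox: true and are never billed.
        if not sandbox or charged != 0:
            raise RuntimeError("DEV key but response does not look sandboxed")
        return "dev"
    # PROD responses omit the sandbox field entirely.
    if sandbox:
        raise RuntimeError("PROD key but response is flagged as sandbox")
    return "prod"
```

A check like this can catch a DEV key accidentally shipped to production, since a sandboxed reply would otherwise look like a normal one.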

Live smoke (2026-05-20)

Real production calls made from the dashboard.chipipay.com admin, showing that all three providers respond with sane costs. Each row is a single POST /v1/ai/chat with max_tokens: 20 and a one-phrase prompt:

Model	Provider	Latency	Output	Charged (USD)
`haiku`	Anthropic	691 ms	"pong"	$0.000056
`gemini-flash`	Google	673 ms	"pong"	$0.000024
`gemini-pro`	Google	1072 ms	(truncated)	$0.000112
`gpt-4o-mini`	OpenAI	1870 ms	"Pong"	$0.000006

A Legaria-org dashboard pull on the same day showed 11 production calls (8 haiku, 1 each of the other three) totaling $0.0069 charged for 3.4K total tokens. Numbers match the per-call costs above to within Math.round.

Quirks worth knowing

OpenAI minimum output tokens is 16. The controller clamps max_tokens upward to config.minOutputTokens per model so a request with max_tokens: 10 against gpt-4o-mini doesn’t 400. Anthropic and Google accept any positive integer. Source: model-router.service.ts minOutputTokens field, applied via clampMaxTokens(...) in ai.controller.ts.
/think truncates if the model gets verbose. It hardcodes max_tokens: 1024 because the endpoint expects a single structured JSON decision, not free-form text. If you need more, use /chat with your own system prompt.
max_tokens is validated @IsInt() @Min(1) at the DTO layer. Fractional or non-positive values get a clean 400 before the request leaves Nest.
No webhook for AI events. Every response is the immediate request return; no async settlement, no ai.completion webhook. Different from Bills.
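The max_tokens validation and clamping described above can be approximated as follows. This is a sketch of the described behaviour, not the actual `clampMaxTokens` source: the per-provider minimums are taken from the text (16 for OpenAI, any positive integer elsewhere), and the function names are illustrative.

```python
# Minimum output tokens per provider, as stated in the quirks above.
MIN_OUTPUT_TOKENS = {"openai": 16, "anthropic": 1, "google": 1}


def validate_max_tokens(value):
    """Mimic the DTO check (integer, at least 1); the API answers 400 instead."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError("max_tokens must be an integer >= 1")
    return value


def clamp_max_tokens(requested, provider):
    """Raise a too-small max_tokens up to the provider's minimum."""
    validate_max_tokens(requested)
    return max(requested, MIN_OUTPUT_TOKENS[provider])
```

Under these assumptions, a request with max_tokens: 10 is sent to an OpenAI model as 16, while the same value passes through unchanged for Anthropic and Google models.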

Where to go next

Quickstart — first /v1/ai/chat call in 30 seconds, no SDK.
Chat & Streaming — SSE format, chat/stream examples.
DeFi Intelligence — /think JSON schema, risk-param defaults.
Prices & Data — token-price + market-signal endpoints feeding /think.
Models — model-by-model pricing + RPM limits.
