The AI API is live in production. One HTTP endpoint, 8 models across 3 providers, USDC-credits billing with a 40% margin baked in at the endpoint layer. Smoke-tested 2026-05-20 against all 3 providers; full code-traced verification trail at the bottom of this page.
At a glance
8 models
Across Anthropic, OpenAI, Google
6 endpoints
chat, think, execute, models, usage, chat/stream
Pay per token
USDC credits, no monthly plan, no minimum
DEV sandbox
pk_dev_ returns canned responses, never calls the provider
Models
The model registry lives in chipi-back at src/ai/model-router.service.ts and is exposed live at GET /v1/ai/models (with the 40% margin already applied to the per-1M-token prices). Pull from the endpoint rather than hardcoding the table below, because pricing can change without a docs deploy.
- Anthropic
- OpenAI
- Google
| ID | Display name | Use case |
|---|---|---|
| haiku | Claude Haiku 4.5 | Fast, cheap default |
| sonnet | Claude Sonnet 4.5 | Heavy reasoning, agents |
If model is omitted from the request body, Chipi defaults to haiku (DEFAULT_MODEL in the same source file).
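As a minimal sketch of pulling the registry from the live endpoint instead of hardcoding it, the Python below builds and sends a GET /v1/ai/models request. The helper names (`build_models_request`, `fetch_models`) and the placeholder key are illustrative only. The response schema is not documented on this page, so the sketch returns the decoded JSON untouched and assumes nothing about its fields.

```python
import json
import urllib.request

API_BASE = "https://api.chipipay.com"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/ai/models request."""
    return urllib.request.Request(
        f"{API_BASE}/v1/ai/models",
        headers={"x-api-key": api_key},  # public-key header described on this page
        method="GET",
    )


def fetch_models(api_key: str, timeout: float = 10.0):
    """Send the request and return the decoded JSON as-is.

    Nothing is assumed about the response structure here; inspect the
    returned value before relying on any field.
    """
    request = build_models_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_models("pk_prod_...")` at startup, or on a short cache interval, keeps displayed prices in step with the registry.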
Endpoints
All endpoints live under /v1/ai/* on https://api.chipipay.com and are guarded by ApiKeyGuard, which accepts either:
- Authorization: Bearer sk_prod_... (server-to-server), or
- x-api-key: pk_prod_... (client-side, no secret leaked)
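The two accepted credentials can be sketched as a small header-selection helper. `auth_headers` is a hypothetical name, not part of any Chipi SDK. The sketch also assumes that dev keys (`sk_dev_`, `pk_dev_`) travel in the same headers as their prod counterparts, which this page implies but does not state outright.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Return the HTTP header matching the key type.

    Secret keys (sk_...) go in an Authorization: Bearer header and belong
    on a server. Public keys (pk_...) go in x-api-key and can be shipped
    to a client.
    """
    if api_key.startswith("sk_"):
        return {"Authorization": f"Bearer {api_key}"}
    if api_key.startswith("pk_"):
        return {"x-api-key": api_key}
    raise ValueError("expected an API key starting with sk_ or pk_")
```

Raising on an unknown prefix is a design choice of this sketch: it surfaces a misconfigured key locally instead of as a rejected request.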
| Method + path | Purpose |
|---|---|
| POST /v1/ai/chat | General-purpose chat completion |
| POST /v1/ai/chat/stream | Same as /chat, server-sent-event token stream |
| POST /v1/ai/think | DeFi-specialised decision: returns a structured JSON { action, confidence, ... } |
| POST /v1/ai/execute | Build unsigned AVNU swap calldata (Starknet). Pure builder, doesn’t submit. |
| GET /v1/ai/models | Lists all 8 models with live margin-applied per-1M-token pricing |
| GET /v1/ai/usage | Your org’s call counts, costs, and breakdowns per model + endpoint |
See Quickstart for a curl against /chat, Chat & Streaming for the SSE wire format, and DeFi Intelligence for the /think JSON schema.
How billing works
Every /v1/ai/chat and /v1/ai/chat/stream response includes a cost.charged field expressed in USD. That’s the amount debited from your org’s Chipi credits balance for the call.
The math (from ai.controller.ts):
- inputCostPer1M / outputCostPer1M come from the model registry (per-provider wholesale).
- 1.4 is the MARGIN constant: Chipi adds 40% on top.
- /execute charges a flat $0.002 per call regardless of size (AVNU API call, no token usage).
/execute, and platform margin.
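The per-token charge described in this section can be sketched in Python as follows. This is a reconstruction from the terms named above (per-1M wholesale rates, the 1.4 margin, the flat /execute fee), not the code in ai.controller.ts. The function name `charged_usd` and the rates in the example are hypothetical.

```python
MARGIN = 1.4                  # 40% on top of wholesale, as stated on this page
EXECUTE_FLAT_FEE_USD = 0.002  # flat charge per /execute call, as stated on this page


def charged_usd(
    input_tokens: int,
    output_tokens: int,
    input_cost_per_1m: float,
    output_cost_per_1m: float,
) -> float:
    """Estimate the USD charge for one token-billed call.

    The *_cost_per_1m arguments are wholesale USD prices per one million
    tokens, i.e. before the margin is applied.
    """
    wholesale = (
        input_tokens * input_cost_per_1m + output_tokens * output_cost_per_1m
    ) / 1_000_000
    return wholesale * MARGIN


# Hypothetical wholesale rates of $1.00 (input) and $5.00 (output) per 1M tokens:
# (1000 * 1.00 + 500 * 5.00) / 1_000_000 = 0.0035, times 1.4 is about 0.0049 USD.
example = charged_usd(1000, 500, 1.00, 5.00)
```

Prices returned by GET /v1/ai/models already include the margin, so an estimate built from those figures should skip the `MARGIN` multiplication.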
DEV sandbox
If your API key starts with pk_dev_ or sk_dev_, the AI endpoints short-circuit to canned responses and never call the underlying provider.
| Behaviour | DEV (pk_dev_* / sk_dev_*) | PROD (pk_prod_* / sk_prod_*) |
|---|---|---|
| Provider call | None | Real Anthropic / OpenAI / Google |
| Response shape | Identical to production | Real model output |
| cost.charged | 0 | Real per-token cost × 1.4 margin |
| OrgBalance | Untouched | Debited |
| LedgerEntry row | Synthetic le-dev-* | Real DB row |
| sandbox: true on response | Yes | Field absent |
| Rate limit + balance check | Skipped | Enforced |
Switch to a pk_prod_ / sk_prod_ key to go live. Zero code changes.
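Because DEV and PROD responses share a shape, a client may want an explicit guard against treating canned output as real. A minimal sketch follows. Both helper names are hypothetical, and it assumes the `sandbox` flag sits at the top level of the JSON body, which the table above suggests but does not spell out.

```python
DEV_PREFIXES = ("pk_dev_", "sk_dev_")


def is_sandbox_key(api_key: str) -> bool:
    """True when the key prefix routes AI calls to canned responses."""
    return api_key.startswith(DEV_PREFIXES)


def is_sandbox_response(body: dict) -> bool:
    """True when a decoded response carries sandbox: true.

    The field is absent on production responses, so a missing key
    means the call was real.
    """
    return body.get("sandbox") is True
```

Checking the response flag as well as the key prefix covers code paths, such as logging or analytics, that see the response but not the key.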
Live smoke (2026-05-20)
Real production calls from dashboard.chipipay.com admin showing all three providers respond with sane costs. Each row is a single POST /v1/ai/chat with max_tokens: 20 to one phrase:
| Model | Provider | Latency | Output | Charged (USD) |
|---|---|---|---|---|
| haiku | Anthropic | 691 ms | "pong" | $0.000056 |
| gemini-flash | Google | 673 ms | "pong" | $0.000024 |
| gemini-pro | Google | 1072 ms | (truncated) | $0.000112 |
| gpt-4o-mini | OpenAI | 1870 ms | "Pong" | $0.000006 |
Quirks worth knowing
- OpenAI minimum output tokens is 16. The controller clamps max_tokens upward to config.minOutputTokens per model so a request with max_tokens: 10 against gpt-4o-mini doesn’t 400. Anthropic and Google accept any positive integer. Source: model-router.service.ts minOutputTokens field, applied via clampMaxTokens(...) in ai.controller.ts.
- /think truncates if the model gets verbose. It hardcodes max_tokens: 1024 because the endpoint expects a single structured JSON decision, not free-form text. If you need more, use /chat with your own system prompt.
- max_tokens is validated @IsInt() @Min(1) at the DTO layer. Fractional or non-positive values get a clean 400 before the request leaves Nest.
- No webhook for AI events. Every response is the immediate request return; no async settlement, no ai.completion webhook. Different from Bills.
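The max_tokens validation and clamping quirks above can be mirrored client-side to predict the value the provider will actually receive. This Python sketch is an approximation of the described behaviour, not the Nest code. The function names are hypothetical, and the only per-model minimum included is the one this page states (16 for gpt-4o-mini).

```python
# Per-model minimum output tokens. Only the value stated on this page is
# listed; other models are treated as accepting any positive integer.
MIN_OUTPUT_TOKENS = {"gpt-4o-mini": 16}


def validate_max_tokens(value) -> int:
    """Mimic the @IsInt() @Min(1) check: integers >= 1 only."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError("max_tokens must be an integer >= 1")
    return value


def clamp_max_tokens(requested: int, min_output_tokens: int) -> int:
    """Raise the requested value to the model minimum; never lower it."""
    return max(requested, min_output_tokens)


def effective_max_tokens(model: str, requested) -> int:
    """Validate first, then clamp upward to the model's minimum."""
    valid = validate_max_tokens(requested)
    return clamp_max_tokens(valid, MIN_OUTPUT_TOKENS.get(model, 1))
```

For example, `effective_max_tokens("gpt-4o-mini", 10)` gives 16, while the same request against haiku stays at 10. Budget estimates for OpenAI models should therefore use the clamped value.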
Where to go next
- Quickstart: first /v1/ai/chat call in 30 seconds, no SDK.
- Chat & Streaming: SSE format, chat/stream examples.
- DeFi Intelligence: /think JSON schema, risk-param defaults.
- Prices & Data: token-price + market-signal endpoints feeding /think.
- Models: model-by-model pricing + RPM limits.
