The AI API is live in production. One HTTP endpoint, 8 models across 3 providers, USDC-credits billing with a 40% margin baked in at the endpoint layer. Smoke-tested 2026-05-20 against all 3 providers; full code-traced verification trail at the bottom of this page.

At a glance

  • 8 models: across Anthropic, OpenAI, and Google
  • 6 endpoints: chat, think, execute, models, usage, chat/stream
  • Pay per token: USDC credits, no monthly plan, no minimum
  • DEV sandbox: pk_dev_ returns canned responses, never calls the provider

Models

The model registry lives in chipi-back at src/ai/model-router.service.ts and is exposed live at GET /v1/ai/models (with the 40% margin already applied to the per-1M-token prices). Pull from the endpoint rather than hardcoding the table below — pricing can change without a docs deploy.
| ID | Display name | Use case |
|---|---|---|
| haiku | Claude Haiku 4.5 | Fast, cheap default |
| sonnet | Claude Sonnet 4.5 | Heavy reasoning, agents |
If model is omitted from the request body, Chipi defaults to haiku (DEFAULT_MODEL in the same source file).
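As a rough sketch of pulling the registry from the live endpoint instead of hardcoding it, the snippet below builds and sends the GET /v1/ai/models call. The base URL and path come from this page; the names `FetchLike`, `modelsRequest`, and `fetchLiveModels` are hypothetical, and the response body is left as `unknown` because its shape is not documented here.

```typescript
// Base URL and path are taken from this page; all names below are illustrative.
const BASE_URL = "https://api.chipipay.com";

// Minimal structural type so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Describes the GET /v1/ai/models call for a server-side secret key.
function modelsRequest(secretKey: string) {
  return {
    url: `${BASE_URL}/v1/ai/models`,
    init: { method: "GET", headers: { Authorization: `Bearer ${secretKey}` } },
  };
}

// Fetches the live registry. The body shape is an unknown here, so the caller
// should inspect it before relying on any particular field.
function fetchLiveModels(secretKey: string, fetchFn: FetchLike): Promise<unknown> {
  const { url, init } = modelsRequest(secretKey);
  return fetchFn(url, init).then((res) => {
    if (!res.ok) {
      throw new Error(`GET /v1/ai/models failed with HTTP ${res.status}`);
    }
    return res.json();
  });
}
```

In Node 18+ or a browser, the global `fetch` can typically be passed as `fetchFn`. Injecting it keeps the helper easy to stub in tests.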

Endpoints

All endpoints live under /v1/ai/* on https://api.chipipay.com and are guarded by ApiKeyGuard, which accepts either:
  • Authorization: Bearer sk_prod_... (server-to-server), or
  • x-api-key: pk_prod_... (client-side, no secret leaked)
| Method + path | Purpose |
|---|---|
| POST /v1/ai/chat | General-purpose chat completion |
| POST /v1/ai/chat/stream | Same as /chat, server-sent-event token stream |
| POST /v1/ai/think | DeFi-specialised decision: returns a structured JSON { action, confidence, ... } |
| POST /v1/ai/execute | Build unsigned AVNU swap calldata (Starknet). Pure builder, doesn’t submit. |
| GET /v1/ai/models | Lists all 8 models with live margin-applied per-1M-token pricing |
| GET /v1/ai/usage | Your org’s call counts, costs, and breakdowns per model + endpoint |
See the Quickstart for a working curl against /chat, Chat & Streaming for the SSE wire format, and DeFi Intelligence for the /think JSON schema.
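The two key styles ApiKeyGuard accepts map to two different headers. A minimal sketch of choosing the right one from the key prefix follows; `authHeaders` is a hypothetical helper, and it assumes `sk_dev_` and `pk_dev_` keys use the same headers as their `_prod_` counterparts, which this page does not state explicitly.

```typescript
// Picks the header for a given key, following the two options listed above.
// Assumption: *_dev_ keys are sent the same way as *_prod_ keys.
function authHeaders(apiKey: string): Record<string, string> {
  if (apiKey.indexOf("sk_") === 0) {
    // Secret key: server-to-server only, sent as a Bearer token.
    return { Authorization: `Bearer ${apiKey}` };
  }
  if (apiKey.indexOf("pk_") === 0) {
    // Public key: safe to ship client-side, sent as x-api-key.
    return { "x-api-key": apiKey };
  }
  throw new Error("Expected a key starting with sk_ or pk_");
}

// Example: headers for a client-side call.
const clientHeaders = authHeaders("pk_prod_example");
```

Failing fast on an unrecognised prefix avoids sending a malformed credential and receiving an opaque rejection from the guard.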

How billing works

Every /v1/ai/chat and /v1/ai/chat/stream response includes a cost.charged field expressed in USD. That’s the amount debited from your org’s Chipi credits balance for the call. The math (from ai.controller.ts):
chargedAmount = (inputTokens × inputCostPer1M / 1_000_000 + outputTokens × outputCostPer1M / 1_000_000) × 1.4
  • inputCostPer1M / outputCostPer1M come from the model registry (per-provider wholesale).
  • 1.4 is the MARGIN constant — Chipi adds 40% on top.
  • /execute charges a flat $0.002 per call regardless of size (AVNU API call, no token usage).
The wholesale-versus-charged split is not exposed anywhere; responses carry only the charged number. The 40% covers infra, paymaster gas for /execute, and platform margin.
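To make the arithmetic concrete, here is a small sketch of the formula above. It is an illustration rather than the code in ai.controller.ts; the `ModelPricing` type, the function name, and the example prices are hypothetical.

```typescript
const MARGIN = 1.4;                 // 40% on top of wholesale, per this page
const EXECUTE_FLAT_FEE_USD = 0.002; // flat /execute charge, per this page

interface ModelPricing {
  inputCostPer1M: number;  // wholesale USD per 1M input tokens
  outputCostPer1M: number; // wholesale USD per 1M output tokens
}

// Applies the documented formula: wholesale token cost times the margin.
function chargedAmount(inputTokens: number, outputTokens: number, p: ModelPricing): number {
  const wholesale =
    (inputTokens * p.inputCostPer1M) / 1_000_000 +
    (outputTokens * p.outputCostPer1M) / 1_000_000;
  return wholesale * MARGIN;
}

// Hypothetical wholesale prices, chosen only to keep the numbers easy to follow:
// 1000 input tokens at $1/1M plus 200 output tokens at $5/1M is $0.002 wholesale,
// which comes to roughly $0.0028 after the 1.4 multiplier.
const examplePricing: ModelPricing = { inputCostPer1M: 1.0, outputCostPer1M: 5.0 };
const charged = chargedAmount(1000, 200, examplePricing);
```

Note that the prices returned by GET /v1/ai/models already include the margin, so an estimate built from those figures should skip the multiplication by 1.4.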

DEV sandbox

If your API key starts with pk_dev_ or sk_dev_, the AI endpoints short-circuit to canned responses and never call the underlying provider.
| Behaviour | DEV (pk_dev_* / sk_dev_*) | PROD (pk_prod_* / sk_prod_*) |
|---|---|---|
| Provider call | None | Real Anthropic / OpenAI / Google |
| Response shape | Identical to production | Real model output |
| cost.charged | 0 | Real per-token cost × 1.4 margin |
| OrgBalance | Untouched | Debited |
| LedgerEntry row | Synthetic le-dev-* | Real DB row |
| sandbox: true on response | Yes | Field absent |
| Rate limit + balance check | Skipped | Enforced |
Use DEV keys to wire up the integration end-to-end against the real response shape, then swap the prefix to pk_prod_ / sk_prod_ to go live. Zero code changes.
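Because sandbox and production responses share a shape, it can help to detect sandbox mode explicitly, for example to avoid showing canned output to real users. The helpers below are a hypothetical sketch based only on the key prefixes and the `sandbox: true` flag described above.

```typescript
// True for keys that trigger the canned-response path described above.
function isSandboxKey(apiKey: string): boolean {
  return apiKey.indexOf("pk_dev_") === 0 || apiKey.indexOf("sk_dev_") === 0;
}

// True when a parsed response body carries the sandbox flag.
// In production the field is absent, so anything other than `true` counts as live.
function isSandboxResponse(body: unknown): boolean {
  return (
    typeof body === "object" &&
    body !== null &&
    (body as { sandbox?: unknown }).sandbox === true
  );
}
```

Checking the response flag as well as the key guards against a DEV key being deployed to a production environment by mistake.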

Live smoke (2026-05-20)

Real production calls from the dashboard.chipipay.com admin, showing that all three providers respond with sane costs. Each row is a single POST /v1/ai/chat with max_tokens: 20 and a one-phrase prompt:
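| Model | Provider | Latency | Output | Charged (USD) |
|---|---|---|---|---|
| haiku | Anthropic | 691 ms | "pong" | $0.000056 |
| gemini-flash | Google | 673 ms | "pong" | $0.000024 |
| gemini-pro | Google | 1072 ms | (truncated) | $0.000112 |
| gpt-4o-mini | OpenAI | 1870 ms | "Pong" | $0.000006 |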
A Legaria-org dashboard pull on the same day showed 11 production calls (8 haiku, 1 each of the other three) totaling $0.0069 charged for 3.4K total tokens. Numbers match the per-call costs above to within Math.round.

Quirks worth knowing

  1. OpenAI minimum output tokens is 16. The controller clamps max_tokens upward to config.minOutputTokens per model so a request with max_tokens: 10 against gpt-4o-mini doesn’t 400. Anthropic and Google accept any positive integer. Source: model-router.service.ts minOutputTokens field, applied via clampMaxTokens(...) in ai.controller.ts.
  2. /think truncates if the model gets verbose. It hardcodes max_tokens: 1024 because the endpoint expects a single structured JSON decision, not free-form text. If you need more, use /chat with your own system prompt.
  3. max_tokens is validated with @IsInt() @Min(1) at the DTO layer. Fractional or non-positive values get a clean 400 before the request leaves Nest.
  4. No webhook for AI events. Every response is the immediate request return; no async settlement, no ai.completion webhook. Different from Bills.
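The max_tokens behaviour in quirks 1 and 3 can be approximated client-side as follows. This is a sketch under stated assumptions, not the controller's actual `clampMaxTokens(...)`: the signatures are guesses, and the validity check only mirrors what @IsInt() @Min(1) is described as enforcing.

```typescript
// Mirrors the DTO rule described above: an integer of at least 1.
function isValidMaxTokens(value: number): boolean {
  return isFinite(value) && Math.floor(value) === value && value >= 1;
}

// Raises a too-small request up to the model's minimum; larger values pass through.
// `minOutputTokens` would come from the model registry (16 for OpenAI per this page).
function clampMaxTokens(requested: number, minOutputTokens: number): number {
  return Math.max(requested, minOutputTokens);
}

// A request for 10 tokens against a model with a minimum of 16 is raised to 16.
const effective = clampMaxTokens(10, 16);
```

One practical consequence is that a call can be billed for more output tokens than the max_tokens value sent, up to the model's minimum.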

Where to go next