Usage & Quotas


Monitor AI service usage, track costs, and manage rate limits.

Usage Analytics: tokens & requests
Cost Tracking: per-model breakdown
Spending Quotas: monthly & daily limits
Rate Limits: per-model throttling

Overview

The platform tracks all AI usage automatically, providing detailed analytics on tokens, requests, and costs. Set quotas to control spending and prevent unexpected bills.

Get Usage Statistics

Query your AI usage data:

import { platform } from '@/lib/platform'

// Get usage for the last 30 days
const usage = await platform.ai.getUsage({
  days: 30,
})

console.log(usage)
// {
//   totalRequests: 15420,
//   totalTokens: 2345678,
//   totalCost: 45.67,
//   byModel: {
//     'anthropic/claude-3.5-sonnet': { requests: 1200, tokens: 890000, cost: 32.50 },
//     'anthropic/claude-3-haiku': { requests: 14000, tokens: 1400000, cost: 12.00 },
//     'openai/text-embedding-3-small': { requests: 220, tokens: 55678, cost: 1.17 },
//   },
//   byDay: [
//     { date: '2024-01-15', requests: 520, tokens: 78000, cost: 1.52 },
//     { date: '2024-01-16', requests: 480, tokens: 72000, cost: 1.44 },
//     // ...
//   ],
// }

// Get usage for a specific user
const userUsage = await platform.ai.getUsage({
  userId: user.id,
  days: 30,
})
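For quick reporting, the usage response shown above can be post-processed client-side. A minimal sketch — the `Usage` types below simply mirror the example response and are not an official SDK export:

```typescript
// Shapes assumed from the example getUsage() response above.
type ModelUsage = { requests: number; tokens: number; cost: number }
type Usage = {
  totalRequests: number
  totalTokens: number
  totalCost: number
  byModel: Record<string, ModelUsage>
  byDay: { date: string; requests: number; tokens: number; cost: number }[]
}

// Find the model accounting for the largest share of spend.
function costliestModel(usage: Usage): { model: string; share: number } {
  let top = { model: '', cost: -Infinity }
  for (const [model, m] of Object.entries(usage.byModel)) {
    if (m.cost > top.cost) top = { model, cost: m.cost }
  }
  return { model: top.model, share: top.cost / usage.totalCost }
}
```

Feeding the example response above into `costliestModel` would report `claude-3.5-sonnet` as the largest line item, which is the kind of signal that tells you where right-sizing models pays off.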

Per-Request Usage

Each AI response includes usage information:

const response = await platform.ai.chat({
  model: 'anthropic/claude-3.5-sonnet',
  messages: [...],
})

console.log(response.usage)
// {
//   promptTokens: 150,
//   completionTokens: 320,
//   totalTokens: 470,
// }

// Cost is tracked automatically by the platform
// View per-model costs in the dashboard
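If you want a rough client-side estimate before the dashboard updates, you can price the token counts yourself. The per-million-token rates below are placeholders, not the platform's actual prices (the platform tracks real cost for you):

```typescript
// Placeholder prices in $ per 1M tokens — NOT the platform's real rates.
const PRICES: Record<string, { input: number; output: number }> = {
  'anthropic/claude-3.5-sonnet': { input: 3, output: 15 },
}

// Estimate request cost from the usage fields returned with each response.
function estimateCost(
  model: string,
  promptTokens: number,
  completionTokens: number
): number {
  const p = PRICES[model]
  if (!p) throw new Error(`no price entry for ${model}`)
  return (promptTokens * p.input + completionTokens * p.output) / 1_000_000
}
```

With the example usage above (150 prompt + 320 completion tokens), this yields a fraction of a cent; treat it as a sanity check, not a billing source.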

Set Quotas

Control AI spending with quotas:

// Set app-wide monthly quota
await platform.ai.setQuota({
  type: 'monthly',
  limit: 100, // $100 per month
  action: 'block', // 'block' or 'warn'
})

// Set per-user quota
await platform.ai.setUserQuota({
  userId: user.id,
  type: 'daily',
  limit: 5, // $5 per day per user
})

Configure quotas in the dashboard under App Settings → AI Services → Quotas.
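Quotas are enforced server-side, but a small client-side guard can warn users before they hit the hard limit. A sketch, assuming you compare spend so far (e.g. `getUsage().totalCost`) against your configured limit; the 80% warning threshold is an arbitrary choice:

```typescript
type QuotaStatus = 'ok' | 'warn' | 'blocked'

// Classify spend against a quota limit. warnAt is the fraction of the
// limit at which to start warning (assumption: 80%).
function quotaStatus(spent: number, limit: number, warnAt = 0.8): QuotaStatus {
  if (spent >= limit) return 'blocked'
  if (spent >= limit * warnAt) return 'warn'
  return 'ok'
}
```

Surfacing a 'warn' state in the UI gives users a chance to slow down before a `'block'`-type quota starts rejecting requests outright.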

Rate Limits

Default rate limits per model category:

Model category                                | Requests/min | Tokens/min
Large models (Claude 3 Opus, etc.)            | 50           | 40,000
Medium models (Claude 3.5 Sonnet, Gemini Pro) | 100          | 80,000
Fast models (Claude 3 Haiku, Gemini Flash)    | 500          | 150,000
Embeddings                                    | 1,000        | 1,000,000
Image generation                              | 10           | N/A
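To stay under these per-minute limits client-side, a simple token-bucket throttle can gate outgoing requests. This is an illustrative sketch, not part of the platform SDK:

```typescript
// Minimal token bucket: refills ratePerMin permits per minute, capped
// at ratePerMin. Call tryAcquire() before each request.
class RateLimiter {
  private tokens: number
  private last: number

  constructor(private ratePerMin: number) {
    this.tokens = ratePerMin
    this.last = Date.now()
  }

  // Returns true if a request may be sent now; false means back off.
  tryAcquire(now = Date.now()): boolean {
    const elapsedMin = (now - this.last) / 60_000
    this.tokens = Math.min(this.ratePerMin, this.tokens + elapsedMin * this.ratePerMin)
    this.last = now
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }
}
```

For example, `new RateLimiter(50)` would pace calls to a large model; requests denied by `tryAcquire()` can be queued rather than sent and rejected server-side.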

Increase Limits

Contact support to increase rate limits for your app if you need higher throughput.

Handle Rate Limits

Handle rate limit errors gracefully:

try {
  const response = await platform.ai.chat({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [...],
  })
} catch (error) {
  if (error.code === 'RATE_LIMIT_EXCEEDED') {
    // Wait and retry
    const retryAfter = error.retryAfter // Seconds to wait
    await new Promise(resolve => setTimeout(resolve, retryAfter * 1000))
    // Retry the request
  }

  if (error.code === 'QUOTA_EXCEEDED') {
    // User or app quota exceeded
    // Show upgrade prompt or wait until quota resets
  }
}
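The wait-and-retry pattern above can be wrapped in a reusable helper with exponential backoff. A sketch, assuming the error shape shown above (`code`, and `retryAfter` in seconds when present):

```typescript
// Retry fn on RATE_LIMIT_EXCEEDED, honouring retryAfter when the error
// carries one, otherwise backing off exponentially (2s, 4s, ...).
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn()
    } catch (error: any) {
      if (error.code !== 'RATE_LIMIT_EXCEEDED' || attempt >= maxAttempts) throw error
      const waitSec = error.retryAfter ?? 2 ** attempt
      await new Promise(resolve => setTimeout(resolve, waitSec * 1000))
    }
  }
}
```

Usage: `await withRetry(() => platform.ai.chat({ ... }))`. QUOTA_EXCEEDED is deliberately not retried here — quotas reset on a schedule, so retrying in a loop only burns requests.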

Cost Optimization

Tips to reduce AI costs:

Right-size models: use Haiku/Flash for simple tasks
Limit tokens: set maxTokens to cap response length
Cache responses: avoid duplicate API calls
Batch embeddings: send multiple texts per request
Trim prompts: remove unnecessary context
Stream early: fail fast on bad outputs
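The "batch embeddings" tip can be as simple as chunking texts before sending them, so many texts share one request instead of one request each. A sketch — the batch size of 100 is an assumption; check the embeddings endpoint's actual per-request limit:

```typescript
// Split items into fixed-size batches for bulk embedding requests.
function toBatches<T>(items: T[], size = 100): T[][] {
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}
```

Each batch then becomes a single embeddings call, which also keeps you well under the 1,000 req/min embeddings rate limit.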

Billing Tiers

Free: limited monthly AI credits included
Pro: higher credits + pay-as-you-go
Enterprise: custom pricing & volume discounts

View detailed billing in the dashboard under Settings → Billing → AI Usage.