Usage & Quotas


Monitor AI service usage, track costs, and manage rate limits.

Usage Analytics: tokens & requests
Cost Tracking: per-model breakdown
Spending Quotas: monthly & daily limits
Rate Limits: per-model throttling

Overview

The platform tracks all AI usage automatically, providing detailed analytics on tokens, requests, and costs. Set quotas to control spending and prevent unexpected bills.

Get Usage Statistics

Query your AI usage data:

import { platform } from '@/lib/platform'

// Get usage for the last 30 days
const usage = await platform.ai.getUsage({
  days: 30,
})

console.log(usage)
// {
//   totalRequests: 15420,
//   totalTokens: 2345678,
//   totalCost: 45.67,
//   byModel: {
//     'anthropic/claude-3.5-sonnet': { requests: 1200, tokens: 890000, cost: 32.50 },
//     'anthropic/claude-3-haiku': { requests: 14000, tokens: 1400000, cost: 12.00 },
//     'openai/text-embedding-3-small': { requests: 220, tokens: 55678, cost: 1.17 },
//   },
//   byDay: [
//     { date: '2024-01-15', requests: 520, tokens: 78000, cost: 1.52 },
//     { date: '2024-01-16', requests: 480, tokens: 72000, cost: 1.44 },
//     // ...
//   ],
// }

// Get usage for a specific user
const userUsage = await platform.ai.getUsage({
  userId: user.id,
  days: 30,
})
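For quick reporting, the usage response shown above can be post-processed client-side. A minimal sketch — the `Usage` types below simply mirror the example response and are not an official SDK export:

```typescript
// Shapes assumed from the example getUsage() response above.
type ModelUsage = { requests: number; tokens: number; cost: number }
type Usage = {
  totalRequests: number
  totalTokens: number
  totalCost: number
  byModel: Record<string, ModelUsage>
  byDay: { date: string; requests: number; tokens: number; cost: number }[]
}

// Find the model accounting for the largest share of spend.
function costliestModel(usage: Usage): { model: string; share: number } {
  let top = { model: '', cost: -Infinity }
  for (const [model, m] of Object.entries(usage.byModel)) {
    if (m.cost > top.cost) top = { model, cost: m.cost }
  }
  return { model: top.model, share: top.cost / usage.totalCost }
}
```

Feeding the example response above into `costliestModel` would report `claude-3.5-sonnet` as the largest line item, which is the kind of signal that tells you where right-sizing models pays off.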

Per-Request Usage

Each AI response includes usage information:

const response = await platform.ai.chat({
  model: 'anthropic/claude-3.5-sonnet',
  messages: [...],
})

console.log(response.usage)
// {
//   promptTokens: 150,
//   completionTokens: 320,
//   totalTokens: 470,
// }

// Cost is tracked automatically by the platform
// View per-model costs in the dashboard
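If you want a rough client-side estimate before the dashboard updates, you can price the token counts yourself. The per-million-token rates below are placeholders, not the platform's actual prices (the platform tracks real cost for you):

```typescript
// Placeholder prices in $ per 1M tokens — NOT the platform's real rates.
const PRICES: Record<string, { input: number; output: number }> = {
  'anthropic/claude-3.5-sonnet': { input: 3, output: 15 },
}

// Estimate request cost from the usage fields returned with each response.
function estimateCost(
  model: string,
  promptTokens: number,
  completionTokens: number
): number {
  const p = PRICES[model]
  if (!p) throw new Error(`no price entry for ${model}`)
  return (promptTokens * p.input + completionTokens * p.output) / 1_000_000
}
```

With the example usage above (150 prompt + 320 completion tokens), this yields a fraction of a cent; treat it as a sanity check, not a billing source.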

Set Quotas

Control AI spending with quotas:

// Set app-wide monthly quota
await platform.ai.setQuota({
  type: 'monthly',
  limit: 100, // $100 per month
  action: 'block', // 'block' or 'warn'
})

// Set per-user quota
await platform.ai.setUserQuota({
  userId: user.id,
  type: 'daily',
  limit: 5, // $5 per day per user
})

Configure quotas in the dashboard under App Settings → AI Services → Quotas.
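Quotas are enforced server-side, but a small client-side guard can warn users before they hit the hard limit. A sketch, assuming you compare spend so far (e.g. `getUsage().totalCost`) against your configured limit; the 80% warning threshold is an arbitrary choice:

```typescript
type QuotaStatus = 'ok' | 'warn' | 'blocked'

// Classify spend against a quota limit. warnAt is the fraction of the
// limit at which to start warning (assumption: 80%).
function quotaStatus(spent: number, limit: number, warnAt = 0.8): QuotaStatus {
  if (spent >= limit) return 'blocked'
  if (spent >= limit * warnAt) return 'warn'
  return 'ok'
}
```

Surfacing a 'warn' state in the UI gives users a chance to slow down before a `'block'`-type quota starts rejecting requests outright.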

Rate Limits

Default rate limits per model category:

Model category                                | Requests/min | Tokens/min
Large models (Claude 3 Opus, etc.)            | 50           | 40,000
Medium models (Claude 3.5 Sonnet, Gemini Pro) | 100          | 80,000
Fast models (Claude 3 Haiku, Gemini Flash)    | 500          | 150,000
Embeddings                                    | 1,000        | 1,000,000
Image generation                              | 10           | N/A
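To stay under these per-minute limits client-side, a simple token-bucket throttle can gate outgoing requests. This is an illustrative sketch, not part of the platform SDK:

```typescript
// Minimal token bucket: refills ratePerMin permits per minute, capped
// at ratePerMin. Call tryAcquire() before each request.
class RateLimiter {
  private tokens: number
  private last: number

  constructor(private ratePerMin: number) {
    this.tokens = ratePerMin
    this.last = Date.now()
  }

  // Returns true if a request may be sent now; false means back off.
  tryAcquire(now = Date.now()): boolean {
    const elapsedMin = (now - this.last) / 60_000
    this.tokens = Math.min(this.ratePerMin, this.tokens + elapsedMin * this.ratePerMin)
    this.last = now
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }
}
```

For example, `new RateLimiter(50)` would pace calls to a large model; requests denied by `tryAcquire()` can be queued rather than sent and rejected server-side.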

Increase Limits

Contact support to increase rate limits for your app if you need higher throughput.

Handle Rate Limits

Handle rate limit errors gracefully:

try {
  const response = await platform.ai.chat({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [...],
  })
} catch (error) {
  if (error.code === 'RATE_LIMIT_EXCEEDED') {
    // Wait and retry
    const retryAfter = error.retryAfter // Seconds to wait
    await new Promise(resolve => setTimeout(resolve, retryAfter * 1000))
    // Retry the request
  }

  if (error.code === 'QUOTA_EXCEEDED') {
    // User or app quota exceeded
    // Show upgrade prompt or wait until quota resets
  }
}
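The wait-and-retry pattern above can be wrapped in a reusable helper with exponential backoff. A sketch, assuming the error shape shown above (`code`, and `retryAfter` in seconds when present):

```typescript
// Retry fn on RATE_LIMIT_EXCEEDED, honouring retryAfter when the error
// carries one, otherwise backing off exponentially (2s, 4s, ...).
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn()
    } catch (error: any) {
      if (error.code !== 'RATE_LIMIT_EXCEEDED' || attempt >= maxAttempts) throw error
      const waitSec = error.retryAfter ?? 2 ** attempt
      await new Promise(resolve => setTimeout(resolve, waitSec * 1000))
    }
  }
}
```

Usage: `await withRetry(() => platform.ai.chat({ ... }))`. QUOTA_EXCEEDED is deliberately not retried here — quotas reset on a schedule, so retrying in a loop only burns requests.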

Cost Optimization

Tips to reduce AI costs:

Right-size models: use Haiku/Flash for simple tasks
Limit tokens: set maxTokens to cap response length
Cache responses: avoid duplicate API calls
Batch embeddings: send multiple texts per request
Trim prompts: remove unnecessary context
Stream early: fail fast on bad outputs
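The "batch embeddings" tip can be as simple as chunking texts before sending them, so many texts share one request instead of one request each. A sketch — the batch size of 100 is an assumption; check the embeddings endpoint's actual per-request limit:

```typescript
// Split items into fixed-size batches for bulk embedding requests.
function toBatches<T>(items: T[], size = 100): T[][] {
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}
```

Each batch then becomes a single embeddings call, which also keeps you well under the 1,000 req/min embeddings rate limit.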

Billing Tiers

Free: limited monthly AI credits included
Pro: higher credits + pay-as-you-go
Enterprise: custom pricing & volume discounts

View detailed billing in the dashboard under Settings → Billing → AI Usage.