Understanding Monthly Quota and Rate Limits

Sticky Calls uses TWO independent limiting systems that both apply to your API usage. Understanding the difference between them is critical for successful integration.

TL;DR

Monthly Quota (monthly budget) → 402 Payment Required when exhausted
Rate Limits (per-minute throttle) → 429 Too Many Requests when exceeded
Both systems apply - you can be blocked by either one

The Two Systems

1. Monthly Quota System

What it controls: How many API calls you can make per billing cycle

Error Code: 402 Payment Required
Counting: Each successful API call counts toward quota (/v1/calls/start and /v1/calls/end)
Grace Period: None - hard stop at 0 remaining calls
Reset: Monthly at billing cycle renewal
Tracking: Per billing account

2. Rate Limiting System (Per-Minute Throttle)

What it controls: How fast you can make API calls

Error Code: 429 Too Many Requests
Limit: Varies by tier (10-1,000 requests/minute)
Grace Period: Yes - 10% buffer before hard blocking
Reset: Continuous (60-second rolling window)
Tracking: Per organization

Why Two Systems?

These systems serve different purposes:

System	Purpose	Prevents
Monthly Quota	Control monthly usage and costs	Budget overruns, surprise bills
Rate Limits	Prevent burst traffic abuse	DDoS attacks, API overload, infrastructure strain

Example Scenario: A customer with 10,000 API calls remaining but exceeding 100 req/min would be rate limited (429) even though they have quota remaining. Conversely, a customer at 0 remaining calls but making only 1 req/min would see 402 errors even though they're under the rate limit.

Monthly Quota System Deep Dive

How Monthly Quota Works

Every successful API call counts toward your monthly quota:

Monthly quota: 10,000
Call /v1/calls/start  → 9,999 calls remaining
Call /v1/calls/end    → 9,998 calls remaining

Failed requests (4xx, 5xx): Do NOT count toward quota
Both test and production API keys: Count toward your quota
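The counting rule above can be sketched as a small pure function. This is only an illustrative model of the documented behavior, not the service's implementation; `applyCall` is a hypothetical name.

```javascript
// Illustrative model of the counting rule: only 2xx responses consume quota.
// `applyCall` is a hypothetical helper, not part of the Sticky Calls API.
function applyCall(remaining, status) {
  const succeeded = status >= 200 && status < 300;
  return succeeded ? remaining - 1 : remaining;
}

let remaining = 10000;
remaining = applyCall(remaining, 200); // /v1/calls/start succeeds
remaining = applyCall(remaining, 200); // /v1/calls/end succeeds
remaining = applyCall(remaining, 500); // server error: not counted
console.log(remaining); // 9998
```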

Monthly Quota by Tier

Tier	Monthly Quota	Cost
Starter	10,000 calls	$29/month
Growth	50,000 calls	$99/month
Scale	250,000 calls	$299/month

Quota Renewal

Your monthly quota automatically renews at the start of each billing cycle:

Renews on your subscription billing date
Unused quota: Does NOT roll over (use it or lose it)

When Quota Is Exhausted

When your remaining calls reach 0, all billable API calls return:

HTTP 402 Payment Required

{
  "error": "Payment Required",
  "message": "Monthly quota exhausted. Your account has 0 API calls remaining.",
  "quota_remaining": 0,
  "billing_account_id": "ba_1a2b3c4d5e6f..."
}

No grace period - The API immediately blocks at 0 remaining calls.

What to do:

Upgrade to a higher tier
Wait for monthly renewal
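Because a 402 cannot be fixed by retrying, it can help to surface it as a distinct, non-retryable error. A minimal sketch, assuming the field names in the sample body above; `QuotaExhaustedError` and `toQuotaError` are hypothetical names:

```javascript
// Hypothetical error type for a 402 response; fields follow the sample body above.
class QuotaExhaustedError extends Error {
  constructor(body) {
    super(body.message || 'Monthly quota exhausted');
    this.name = 'QuotaExhaustedError';
    this.quotaRemaining = body.quota_remaining;
    this.billingAccountId = body.billing_account_id;
    this.retryable = false; // retrying will not help until upgrade or renewal
  }
}

// Turns a status code and parsed JSON body into an error, or null if not a 402.
function toQuotaError(status, body) {
  return status === 402 ? new QuotaExhaustedError(body) : null;
}

const err = toQuotaError(402, {
  error: 'Payment Required',
  message: 'Monthly quota exhausted. Your account has 0 API calls remaining.',
  quota_remaining: 0,
});
console.log(err.name, err.retryable); // QuotaExhaustedError false
```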

Monitoring Your Quota

Dashboard: View real-time quota balance at stickycalls.com/dashboard

API Response Headers (every successful call):

X-Quota-Remaining: 9998
X-Quota-Limit: 10000
X-Billing-Cycle-Ends: 2026-03-01T00:00:00Z
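These three headers are enough to estimate how much of the cycle's budget is used and how long it has to last. A rough sketch, assuming the values are integers and an ISO 8601 timestamp; `summarizeQuota` is a hypothetical helper:

```javascript
// Sketch: summarize the quota headers from a successful response.
// Assumes integer values and an ISO 8601 timestamp; commas are stripped
// defensively in case the numbers arrive formatted for display.
function summarizeQuota(headers, now = new Date()) {
  const toInt = (value) => Number.parseInt(String(value).replace(/,/g, ''), 10);
  const remaining = toInt(headers.get('X-Quota-Remaining'));
  const limit = toInt(headers.get('X-Quota-Limit'));
  const cycleEnds = new Date(headers.get('X-Billing-Cycle-Ends'));
  const msPerDay = 24 * 60 * 60 * 1000;
  return {
    remaining,
    limit,
    percentUsed: ((limit - remaining) / limit) * 100,
    daysLeft: Math.max(0, (cycleEnds - now) / msPerDay),
  };
}

// A Map stands in for the fetch Headers object here (both expose get()).
// Unlike Headers, Map keys are case-sensitive.
const sampleQuotaHeaders = new Map([
  ['X-Quota-Remaining', '9998'],
  ['X-Quota-Limit', '10000'],
  ['X-Billing-Cycle-Ends', '2026-03-01T00:00:00Z'],
]);
console.log(summarizeQuota(sampleQuotaHeaders, new Date('2026-02-27T00:00:00Z')));
```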

Rate Limiting System Deep Dive

Rate Limits by Tier

Rate limits control how fast you can make requests:

Tier	Base Limit	Grace Period (+10%)	Hard Block
Free	10 req/min	11 req/min	12+ req/min
Starter	100 req/min	110 req/min	111+ req/min
Growth	300 req/min	330 req/min	331+ req/min
Scale	1,000 req/min	1,100 req/min	1,101+ req/min

The 10% Grace Period

The rate limiter has a soft warning phase before hard blocking:

Example (Free tier - 10 req/min base limit):

Requests 1-10:   ✅ Normal operation
Request 11:      ⚠️  Grace period - request succeeds but a warning is logged
Request 12+:     ❌ Hard block - 429 error returned

Purpose: Allow brief traffic bursts without immediate blocking. Gives you time to implement backoff logic.
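The three phases can be expressed as a small classifier. This is a sketch of the documented thresholds, not the actual limiter; `classifyUsage` is a hypothetical name, and `requestsInWindow` is assumed to include the current request.

```javascript
// Sketch of the three phases. The grace ceiling uses integer math
// (base + floor(base / 10)) to avoid floating-point rounding.
function classifyUsage(requestsInWindow, baseLimit) {
  const graceCeiling = baseLimit + Math.floor(baseLimit / 10);
  if (requestsInWindow <= baseLimit) return 'normal';
  if (requestsInWindow <= graceCeiling) return 'grace';
  return 'blocked';
}

console.log(classifyUsage(10, 10));   // normal
console.log(classifyUsage(11, 10));   // grace
console.log(classifyUsage(12, 10));   // blocked
console.log(classifyUsage(110, 100)); // grace
console.log(classifyUsage(111, 100)); // blocked
```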

When Rate Limit Is Exceeded

After exceeding the grace ceiling (11 req/min on the Free tier), you get:

HTTP 429 Too Many Requests

{
  "error": "Too Many Requests",
  "message": "Rate limit exceeded for your tier. Limit: 10 requests/minute.",
  "retryAfter": 42,
  "currentUsage": 12,
  "limit": 10,
  "resetAt": "2026-02-08T12:01:00.000Z"
}

What to do:

Wait retryAfter seconds before retrying
Implement exponential backoff
Monitor X-RateLimit-Remaining header
Upgrade tier if you consistently hit limits
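One way to act on the first item is to derive a wait time from the 429 body. A minimal sketch, assuming the body fields shown in the sample above; `waitMsFrom429` and its fallback default are hypothetical:

```javascript
// Sketch: decide how long to wait after a 429.
// Prefers retryAfter (seconds), falls back to resetAt (ISO timestamp),
// then to a fixed default if neither field is usable.
function waitMsFrom429(body, nowMs = Date.now(), fallbackMs = 1000) {
  if (Number.isFinite(body.retryAfter) && body.retryAfter >= 0) {
    return body.retryAfter * 1000;
  }
  const resetMs = Date.parse(body.resetAt);
  if (!Number.isNaN(resetMs)) {
    return Math.max(0, resetMs - nowMs);
  }
  return fallbackMs;
}

console.log(waitMsFrom429({ retryAfter: 42 })); // 42000
const sampleNow = Date.parse('2026-02-08T12:00:18.000Z');
console.log(waitMsFrom429({ resetAt: '2026-02-08T12:01:00.000Z' }, sampleNow)); // 42000
```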

Rate Limit Headers

Every API response includes these headers:

X-RateLimit-Limit: 10
X-RateLimit-Remaining: 7
X-RateLimit-Reset: 1707393660000

Use these headers to proactively slow down requests before hitting the limit. X-RateLimit-Reset appears to be a Unix timestamp in milliseconds (the sample value has 13 digits).
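As one possible way to slow down proactively, the remaining budget can be spread evenly over the time left before the reset. This sketch assumes X-RateLimit-Reset is a Unix timestamp in milliseconds; `pacingDelayMs` is a hypothetical helper.

```javascript
// Sketch: spread the remaining requests evenly over the time left before reset.
// Assumes X-RateLimit-Reset is a Unix timestamp in milliseconds.
function pacingDelayMs(headers, nowMs = Date.now()) {
  const remaining = Number.parseInt(headers.get('X-RateLimit-Remaining'), 10);
  const resetMs = Number.parseInt(headers.get('X-RateLimit-Reset'), 10);
  if (!Number.isFinite(remaining) || !Number.isFinite(resetMs)) return 0; // headers missing
  const msUntilReset = Math.max(0, resetMs - nowMs);
  if (remaining <= 0) return msUntilReset; // nothing left: wait for the reset
  return Math.ceil(msUntilReset / remaining);
}

// A Map stands in for the fetch Headers object here (both expose get()).
const sampleRateHeaders = new Map([
  ['X-RateLimit-Limit', '10'],
  ['X-RateLimit-Remaining', '7'],
  ['X-RateLimit-Reset', '1707393660000'],
]);
console.log(pacingDelayMs(sampleRateHeaders, 1707393660000 - 35000)); // 5000
```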

Rate Limit Window

Window duration: 60 seconds (rolling window)
Reset: Continuous - every second the oldest requests drop off
Not fixed: Unlike a fixed window, where the whole count resets at once, this is a rolling calculation

Example:

12:00:00 - Make 10 requests
12:00:30 - Make 1 more request → Grace period (11 total in last 60s)
12:01:01 - First 10 requests have dropped out of the window → 1 request in last 60s
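The timeline above can be reproduced with a minimal sliding-window counter. This is an illustration of the rolling-window idea only, not the service's implementation; `RollingWindow` is a hypothetical class and times are milliseconds since 12:00:00.

```javascript
// Minimal sliding-window counter (illustrative only).
class RollingWindow {
  constructor(windowMs = 60000) {
    this.windowMs = windowMs;
    this.timestamps = []; // request times in ms, oldest first
  }

  // Drop requests that are windowMs old or older, then return the count.
  count(nowMs) {
    while (this.timestamps.length > 0 && nowMs - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    return this.timestamps.length;
  }

  // Record a request at nowMs and return the count including it.
  record(nowMs) {
    this.timestamps.push(nowMs);
    return this.count(nowMs);
  }
}

const win = new RollingWindow();
for (let i = 0; i < 10; i++) win.record(0); // 12:00:00 - 10 requests
console.log(win.record(30000));             // 12:00:30 - 11
console.log(win.count(61000));              // 12:01:01 - 1
```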

How the Systems Work Together

Scenario 1: Both Systems OK

Quota: 5,000 calls remaining
Rate: 50 requests in last minute (under 100 limit)
Result: ✅ 200 OK - Request succeeds

Scenario 2: Quota Exhausted, Rate Limit OK

Quota: 0 calls remaining
Rate: 50 requests in last minute (under 100 limit)
Result: ❌ 402 Payment Required - Blocked by quota

Even though you're under the rate limit, 0 remaining calls = hard stop.

Scenario 3: Quota OK, Rate Limit Exceeded

Quota: 5,000 calls remaining
Rate: 111 requests in last minute (over 100 limit + grace)
Result: ❌ 429 Too Many Requests - Blocked by rate limiter

Even though you have 5,000 calls remaining, exceeding rate limit = blocked.

Scenario 4: Both Exhausted

Quota: 0 calls remaining
Rate: 111 requests in last minute
Result: ❌ 402 Payment Required (quota checked first)

Quota is checked before rate limiting in the middleware stack, so you'll see 402 first.

Visual Decision Tree

┌─────────────────────┐
│  API Request Arrives │
└──────────┬──────────┘
           │
           ▼
   ┌───────────────┐
   │ Quota > 0?    │
   └───┬───────┬───┘
       │       │
      NO      YES
       │       │
       │       ▼
       │  ┌─────────────────────┐
       │  │ Rate limit exceeded? │
       │  │ (after grace period) │
       │  └────┬────────┬────────┘
       │       │        │
       │      YES       NO
       │       │        │
       ▼       ▼        ▼
   ┌─────┐ ┌─────┐  ┌────┐
   │ 402 │ │ 429 │  │200 │
   │Error│ │Error│  │ OK │
   └─────┘ └─────┘  └────┘
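The same decision order can be written as a function and checked against the four scenarios. A sketch under the documented rules (quota checked first, then the rate limit after the 10% grace); `decideStatus` is a hypothetical name.

```javascript
// Sketch of the check order: quota first, then rate limit (after grace).
function decideStatus(quotaRemaining, requestsInWindow, baseLimit) {
  if (quotaRemaining <= 0) return 402;
  const graceCeiling = baseLimit + Math.floor(baseLimit / 10);
  if (requestsInWindow > graceCeiling) return 429;
  return 200;
}

console.log(decideStatus(5000, 50, 100));  // 200 (Scenario 1)
console.log(decideStatus(0, 50, 100));     // 402 (Scenario 2)
console.log(decideStatus(5000, 111, 100)); // 429 (Scenario 3)
console.log(decideStatus(0, 111, 100));    // 402 (Scenario 4)
```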

Monitoring Your Limits

Dashboard

View real-time usage at stickycalls.com/dashboard:

Quota Remaining: Current month's balance
Next Renewal: When quota refreshes
Current Tier: Rate limit and monthly quota allocation
Recent Usage: API call history

Response Headers

Monitor these headers in every response:

// requestBody is your call payload (see the API Reference)
const response = await fetch('https://api.stickycalls.com/v1/calls/start', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(requestBody)
});

// Check quota
const quotaRemaining = response.headers.get('X-Quota-Remaining');
const quotaLimit = response.headers.get('X-Quota-Limit');

// Check rate limit
const rateLimitRemaining = response.headers.get('X-RateLimit-Remaining');
const rateLimitReset = response.headers.get('X-RateLimit-Reset');

console.log(`Quota: ${quotaRemaining}/${quotaLimit}`);
console.log(`Rate: ${rateLimitRemaining} requests remaining`);

Proactive Monitoring

Best practice: Alert yourself before hitting limits:

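// Header values are strings, so convert them before comparing
const quotaLeft = parseInt(quotaRemaining, 10);
const rateLimit = parseInt(response.headers.get('X-RateLimit-Limit'), 10);
const rateLeft = parseInt(rateLimitRemaining, 10);

// Small promise-based delay helper
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Alert when quota is low
if (quotaLeft < 100) {
  alertBillingTeam('Low quota - upgrade plan before exhaustion'); // your own alerting hook
}

// Slow down when approaching rate limit
const rateLimitPercent = (rateLimit - rateLeft) / rateLimit * 100;

if (rateLimitPercent > 90) {
  console.warn('Approaching rate limit - implementing backoff');
  await sleep(1000); // Add delay
}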

Best Practices

1. Monitor Both Systems

Don't just track one - both can block you:

function checkLimits(response) {
  const quota = parseInt(response.headers.get('X-Quota-Remaining'), 10);
  const rateRemaining = parseInt(response.headers.get('X-RateLimit-Remaining'), 10);

  // Example thresholds - scale them to your tier (the Free tier limit is only 10 req/min)
  const quotaOK = quota > 100;
  const rateOK = rateRemaining > 10;

  return { quotaOK, rateOK, bothOK: quotaOK && rateOK };
}

2. Handle Both Error Codes

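if (response.status === 402) {
  // Quota exhausted - retrying won't help; notify billing team
  throw new Error('Upgrade Sticky Calls plan - monthly quota exhausted');
}

if (response.status === 429) {
  // Rate limited - wait the number of seconds the API suggests, then retry
  const { retryAfter } = await response.json(); // fetch responses have no .data property
  await sleep(retryAfter * 1000);
  return retry(); // retry() is your own function that re-issues the request
}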

3. Upgrade Proactively

Don't wait until you hit limits:

Quota running low? Upgrade before exhaustion
Consistently hitting rate limits? Upgrade tier for higher throughput
Unpredictable traffic? Choose a tier with headroom

4. Implement Exponential Backoff

For 429 errors, don't retry immediately:

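const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries fn on 429 errors; fn must throw an error with a numeric status property
async function callWithBackoff(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        const delay = Math.pow(2, i) * 1000; // 1s, then 2s (with the default 3 attempts)
        await sleep(delay);
        continue;
      }
      throw error;
    }
  }
}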
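When several workers back off on the same schedule, their retries can arrive together again. A common general-purpose refinement (not something Sticky Calls is stated to require) is to randomize each delay. The sketch below shows "full jitter" with a cap; `backoffDelayMs` and its option names are hypothetical.

```javascript
// Sketch of "full jitter" backoff: pick a random delay between 0 and an
// exponentially growing, capped ceiling. `random` is injectable for testing.
function backoffDelayMs(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// With a fixed "random" value of 0.5 the delays are half of each ceiling.
const half = () => 0.5;
console.log(backoffDelayMs(0, { random: half }));  // 500
console.log(backoffDelayMs(2, { random: half }));  // 2000
console.log(backoffDelayMs(10, { random: half })); // 15000 (ceiling capped at 30000)
```

The returned value could replace the fixed `delay` in a retry loop such as the one above.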

5. Spread Traffic Evenly

Avoid bursts that trigger rate limits:

// BAD: Fire 100 requests at once
await Promise.all(calls.map((call) => makeAPICall(call))); // All at once = rate limit

// GOOD: Add small delays
for (const call of calls) {
  await makeAPICall(call);
  await sleep(1000); // 1 req/second = 60 req/min (under Starter tier limit of 100/min)
}
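Rather than hard-coding the delay, it can be derived from the tier's per-minute limit with some headroom. A small sketch; `intervalMsFor` and the 80% default are illustrative assumptions, not documented recommendations.

```javascript
// Sketch: derive a per-request delay from a target rate below the tier limit.
// `headroomPercent` is a hypothetical safety margin (80 = use 80% of the limit).
function intervalMsFor(limitPerMinute, headroomPercent = 80) {
  const targetPerMinute = Math.max(1, Math.floor((limitPerMinute * headroomPercent) / 100));
  return Math.ceil(60000 / targetPerMinute);
}

console.log(intervalMsFor(100)); // 750  (Starter: about 80 req/min)
console.log(intervalMsFor(10));  // 7500 (Free: about 8 req/min)
```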

Frequently Asked Questions

Do both systems apply simultaneously?

Yes. You must satisfy BOTH conditions:

Have monthly quota remaining (avoid 402)
Stay under rate limit (avoid 429)

Does the grace period apply to monthly quota?

No. The 10% grace period only applies to rate limiting (429). Monthly quota has a hard stop at 0 (402).

Do failed requests count toward quota?

No. Only successful requests (2xx status) count toward quota. If your request fails with 4xx or 5xx, it does not count.

Do test API keys count toward quota?

Yes. Both test and production API keys count toward your monthly quota. The "test" vs "production" label is just for organization in your dashboard.

Do rate limits reset monthly?

No. Rate limits use a 60-second rolling window and reset continuously. Monthly quota resets monthly.

Can I check limits without making an API call?

Yes. Use the /v1/health endpoint (no auth required, no quota usage):

curl https://api.stickycalls.com/v1/health

Returns {"status": "ok"} and includes rate limit headers (but not quota headers, as it's unauthenticated).

What happens during grace period?

Requests still succeed (200 OK), but:

A warning is logged server-side
You're using your 10% buffer
Next request after grace period will be hard blocked

Why separate systems instead of one?

Different purposes:

Monthly Quota: Financial control (prevents surprise bills)
Rate limits: Technical control (prevents system overload)

Combining them would mean either:

No burst protection (bad for infrastructure)
No monthly budget control (bad for customers)

Can I increase just one limit?

Not separately. Both limits are set by your tier, so upgrading increases BOTH:

Higher monthly quota (more total API calls)
Higher per-minute rate limit (faster throughput)

See pricing for tier details.

Related Documentation

Error Handling Guide - Complete error code reference
API Reference - Endpoint documentation
Best Practices - Production integration patterns
Advanced Topics - Edge cases and design decisions

Need Help?

Documentation: docs.stickycalls.com
Dashboard: stickycalls.com/dashboard
Support: nate@bananaintelligence.ai
