Understanding Monthly Quota and Rate Limits
Sticky Calls uses TWO independent limiting systems that both apply to your API usage. Understanding the difference between them is critical for successful integration.
- Monthly Quota (monthly budget) → `402 Payment Required` when exhausted
- Rate Limits (per-minute throttle) → `429 Too Many Requests` when exceeded
- Both systems apply: you can be blocked by either one
The Two Systems
1. Monthly Quota System
What it controls: How many API calls you can make per billing cycle
- Error Code: `402 Payment Required`
- Counting: Each successful API call counts toward quota (`/v1/calls/start` and `/v1/calls/end`)
- Grace Period: None; hard stop at 0 remaining calls
- Reset: Monthly at billing cycle renewal
- Tracking: Per billing account
2. Rate Limiting System (Per-Minute Throttle)
What it controls: How fast you can make API calls
- Error Code: `429 Too Many Requests`
- Limit: Varies by tier (10-1,000 requests/minute)
- Grace Period: Yes; a 10% buffer before hard blocking
- Reset: Continuous, over a rolling 60-second window
- Tracking: Per organization
Why Two Systems?
These systems serve different purposes:
| System | Purpose | Prevents |
|---|---|---|
| Monthly Quota | Control monthly usage and costs | Budget overruns, surprise bills |
| Rate Limits | Prevent burst traffic abuse | DDoS attacks, API overload, infrastructure strain |
Example Scenario: A Starter customer with 10,000 API calls remaining who sends more than 110 requests in a minute (the 100 req/min base limit plus the 10% grace buffer) would be rate limited (429) even though they have quota remaining. Conversely, a customer at 0 remaining calls making only 1 req/min would see 402 errors even though they're well under the rate limit.
Monthly Quota System Deep Dive
How Monthly Quota Works
Every successful API call counts toward your monthly quota:
Monthly quota: 10,000
Call /v1/calls/start → 9,999 calls remaining
Call /v1/calls/end → 9,998 calls remaining
- Failed requests (4xx, 5xx): Do NOT count toward quota
- Both test and production API keys: Count toward your quota
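To make the counting rule concrete, here is a minimal sketch of a client-side counter that follows it: only 2xx responses reduce the local estimate. `QuotaTracker` is a hypothetical name, not part of any Sticky Calls SDK, and the server's own count (reported in `X-Quota-Remaining`) remains authoritative.

```javascript
// Hypothetical client-side mirror of the quota counting rule (not an official SDK).
// The server's count, reported in X-Quota-Remaining, is always authoritative.
class QuotaTracker {
  constructor(monthlyQuota) {
    this.remaining = monthlyQuota;
  }

  // Only successful (2xx) responses count toward quota; 4xx and 5xx responses do not.
  record(status) {
    if (status >= 200 && status < 300 && this.remaining > 0) {
      this.remaining -= 1;
    }
    return this.remaining;
  }
}

// The sequence from the example above, plus two failed requests
const tracker = new QuotaTracker(10000);
tracker.record(200); // /v1/calls/start -> 9,999 remaining
tracker.record(200); // /v1/calls/end -> 9,998 remaining
tracker.record(429); // rate limited: not counted
tracker.record(500); // server error: not counted
console.log(tracker.remaining); // 9998
```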
Monthly Quota by Tier
| Tier | Monthly Quota | Cost |
|---|---|---|
| Starter | 10,000 calls | $29/month |
| Growth | 50,000 calls | $99/month |
| Scale | 250,000 calls | $299/month |
Quota Renewal
Your monthly quota automatically renews at the start of each billing cycle:
- Renews on your subscription billing date
- Unused quota: Does NOT roll over (use it or lose it)
When Quota Is Exhausted
When your remaining calls reach 0, all billable API calls return:
HTTP 402 Payment Required
```json
{
  "error": "Payment Required",
  "message": "Monthly quota exhausted. Your account has 0 API calls remaining.",
  "quota_remaining": 0,
  "billing_account_id": "ba_1a2b3c4d5e6f..."
}
```
No grace period: the API blocks immediately at 0 remaining calls.
What to do:
- Upgrade to a higher tier
- Wait for monthly renewal
Monitoring Your Quota
Dashboard: View real-time quota balance at stickycalls.com/dashboard
API Response Headers (every successful call):
```text
X-Quota-Remaining: 9,998
X-Quota-Limit: 10,000
X-Billing-Cycle-Ends: 2026-03-01T00:00:00Z
```
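These header values arrive as strings. The sketch below, a hypothetical `readQuotaHeaders` helper, converts them into numbers and a `Date`. It assumes the values look like the example above, including possible thousands separators, and accepts any object with a `get(name)` method, such as a fetch `Headers` instance.

```javascript
// Hypothetical helper: parse the quota headers from a response.
// `headers` is anything with get(name), e.g. a fetch Headers object or a Map.
function readQuotaHeaders(headers) {
  // Strip thousands separators ("9,998" -> 9998); missing headers become NaN
  const toInt = (name) => {
    const raw = headers.get(name);
    return raw == null ? NaN : Number(String(raw).replace(/,/g, ''));
  };

  const remaining = toInt('X-Quota-Remaining');
  const limit = toInt('X-Quota-Limit');
  const cycleEnds = headers.get('X-Billing-Cycle-Ends');

  return {
    remaining,
    limit,
    // Percentage of this cycle's quota already used
    usedPercent: ((limit - remaining) * 100) / limit,
    cycleEndsAt: cycleEnds ? new Date(cycleEnds) : null,
  };
}

// Usage with the example values above (a Map also has get(name))
const snapshot = readQuotaHeaders(new Map([
  ['X-Quota-Remaining', '9,998'],
  ['X-Quota-Limit', '10,000'],
  ['X-Billing-Cycle-Ends', '2026-03-01T00:00:00Z'],
]));
console.log(snapshot.remaining, snapshot.limit, snapshot.usedPercent); // 9998 10000 0.02
```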
Rate Limiting System Deep Dive
Rate Limits by Tier
Rate limits control how fast you can make requests:
| Tier | Base Limit | Grace Period (+10%) | Hard Block |
|---|---|---|---|
| Free | 10 req/min | 11 req/min | 12+ req/min |
| Starter | 100 req/min | 110 req/min | 111+ req/min |
| Growth | 300 req/min | 330 req/min | 331+ req/min |
| Scale | 1,000 req/min | 1,100 req/min | 1,101+ req/min |
The 10% Grace Period
The rate limiter has a soft warning phase before hard blocking:
Example (Free tier - 10 req/min base limit):
Requests 1-10: ✅ Normal operation
Request 11: ⚠️ Grace period - request succeeds but a warning is logged
Request 12+: ❌ Hard block - 429 error returned
Purpose: Allows brief traffic bursts without immediate blocking and gives you time to implement backoff logic.
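The thresholds in the table follow one rule: normal up to the base limit, grace up to the base limit plus 10%, blocked beyond that. This sketch encodes that rule so you can check where a given request count falls. `classifyRequestCount` and `BASE_LIMITS` are hypothetical names, and the function models the documented thresholds rather than the server's actual implementation.

```javascript
// Models the documented thresholds, not the server's code.
const BASE_LIMITS = { free: 10, starter: 100, growth: 300, scale: 1000 }; // req/min

// requestCount: requests in the current 60-second window, including this one
function classifyRequestCount(requestCount, baseLimit) {
  // 10% grace buffer, computed with integers to avoid floating-point drift
  const graceCeiling = baseLimit + Math.floor(baseLimit / 10);
  if (requestCount <= baseLimit) return 'normal';
  if (requestCount <= graceCeiling) return 'grace'; // succeeds, warning logged
  return 'blocked'; // 429 Too Many Requests
}

// Free tier: requests 1-10 normal, request 11 grace, request 12+ blocked
for (const n of [10, 11, 12]) {
  console.log(n, classifyRequestCount(n, BASE_LIMITS.free));
}
// 10 normal
// 11 grace
// 12 blocked
```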
When Rate Limit Is Exceeded
Once you exceed the grace ceiling (11 req/min on the Free tier), you get:
HTTP 429 Too Many Requests
```json
{
  "error": "Too Many Requests",
  "message": "Rate limit exceeded for your tier. Limit: 10 requests/minute.",
  "retryAfter": 42,
  "currentUsage": 12,
  "limit": 10,
  "resetAt": "2026-02-08T12:01:00.000Z"
}
```
What to do:
- Wait `retryAfter` seconds before retrying
- Implement exponential backoff
- Monitor the `X-RateLimit-Remaining` header
- Upgrade your tier if you consistently hit limits
Rate Limit Headers
Every API response includes these headers:
```text
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 7
X-RateLimit-Reset: 1707393660000
```
Use these headers to proactively slow down requests before hitting the limit.
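One way to act on these headers is to compute how long to pause before the next request. This is a sketch under two assumptions: `X-RateLimit-Reset` is a Unix timestamp in milliseconds (the example value has 13 digits), and waiting until that time is enough once `X-RateLimit-Remaining` reaches 0. `rateLimitDelayMs` is a hypothetical helper name.

```javascript
// Hypothetical helper: milliseconds to wait before sending the next request.
// Assumes X-RateLimit-Reset is a Unix timestamp in milliseconds.
// `headers` is anything with get(name), e.g. a fetch Headers object or a Map.
function rateLimitDelayMs(headers, now = Date.now()) {
  // Missing headers become NaN rather than 0 (Number(null) would be 0)
  const num = (name) => {
    const raw = headers.get(name);
    return raw == null ? NaN : Number(raw);
  };
  const remaining = num('X-RateLimit-Remaining');
  const resetAt = num('X-RateLimit-Reset');

  // Requests still available, or no header to go on: send now
  if (Number.isNaN(remaining) || remaining > 0) return 0;
  // Window used up: wait until the advertised reset time, never a negative delay
  return Number.isFinite(resetAt) ? Math.max(0, resetAt - now) : 0;
}

// Remaining is 0 and the reset is 42 seconds away
const example = new Map([
  ['X-RateLimit-Remaining', '0'],
  ['X-RateLimit-Reset', '1707393660000'],
]);
console.log(rateLimitDelayMs(example, 1707393618000)); // 42000
```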
Rate Limit Window
- Window duration: 60 seconds (rolling window)
- Reset: Continuous; each second, requests older than 60 seconds drop out of the count
- Not a fixed window: unlike a "requests per hour" limit, where the whole count resets at once, the count is recalculated continuously
Example:
- 12:00:00 - Make 10 requests
- 12:00:30 - Make 1 more request → Grace period on the Free tier (11 requests in the last 60s)
- 12:01:01 - The first 10 requests have left the window → only 1 request is counted
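The rolling window behaves like a sliding log of request timestamps. The sketch below reproduces the example timeline with that model; it is illustrative only, since the server may implement the window differently (for example with per-second buckets). `SlidingWindowCounter` is a hypothetical name.

```javascript
// Illustrative sliding-window log: counts requests in the last 60 seconds.
// Assumes `now` values (ms) are passed in non-decreasing order.
class SlidingWindowCounter {
  constructor(windowMs = 60_000) {
    this.windowMs = windowMs;
    this.timestamps = []; // request times in ms, oldest first
  }

  // Drop requests that are 60 seconds old or older
  prune(now) {
    while (this.timestamps.length > 0 && this.timestamps[0] <= now - this.windowMs) {
      this.timestamps.shift();
    }
  }

  // Record a request at time `now` and return the count in the window
  record(now) {
    this.prune(now);
    this.timestamps.push(now);
    return this.timestamps.length;
  }

  // Current count without recording a request
  count(now) {
    this.prune(now);
    return this.timestamps.length;
  }
}

// Timeline from the example (t = 0 is 12:00:00)
const counter = new SlidingWindowCounter();
for (let i = 0; i < 10; i++) counter.record(0); // 12:00:00 - 10 requests
console.log(counter.record(30_000)); // 12:00:30 -> 11 (grace period on Free)
console.log(counter.count(61_000)); // 12:01:01 -> 1 (only the 12:00:30 request is left)
```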
How the Systems Work Together
Scenario 1: Both Systems OK
Quota: 5,000 calls remaining
Rate: 50 requests in last minute (under 100 limit)
Result: ✅ 200 OK - Request succeeds
Scenario 2: Quota Exhausted, Rate Limit OK
Quota: 0 calls remaining
Rate: 50 requests in last minute (under 100 limit)
Result: ❌ 402 Payment Required - Blocked by quota
Even though you're under the rate limit, 0 remaining calls = hard stop.
Scenario 3: Quota OK, Rate Limit Exceeded
Quota: 5,000 calls remaining
Rate: 111 requests in last minute (over the Starter ceiling of 110: 100 base + 10% grace)
Result: ❌ 429 Too Many Requests - Blocked by rate limiter
Even though you have 5,000 calls remaining, exceeding rate limit = blocked.
Scenario 4: Both Exhausted
Quota: 0 calls remaining
Rate: 111 requests in last minute
Result: ❌ 402 Payment Required (quota checked first)
Quota is checked before rate limiting in the middleware stack, so you'll see 402 first.
Visual Decision Tree
```text
┌─────────────────────┐
│ API Request Arrives │
└──────────┬──────────┘
           │
           ▼
   ┌───────────────┐
   │  Quota > 0?   │
   └───┬───────┬───┘
       │       │
      NO      YES
       │       │
       │       ▼
       │  ┌──────────────────────┐
       │  │ Rate limit exceeded? │
       │  │ (after grace period) │
       │  └────┬────────┬────────┘
       │       │        │
       │      YES       NO
       │       │        │
       ▼       ▼        ▼
    ┌─────┐ ┌─────┐  ┌─────┐
    │ 402 │ │ 429 │  │ 200 │
    │Error│ │Error│  │ OK  │
    └─────┘ └─────┘  └─────┘
```
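The same order of checks can be written as a small function. This sketch mirrors the decision tree and the four scenarios above, assuming the Starter tier's 100 req/min base limit; `evaluateRequest` is a hypothetical name, not the actual middleware.

```javascript
// Mirrors the decision tree: quota is checked first, then the rate limit.
function evaluateRequest({ quotaRemaining, requestsInWindow, baseLimit }) {
  if (quotaRemaining <= 0) return 402; // Payment Required: monthly quota exhausted
  const hardCeiling = baseLimit + Math.floor(baseLimit / 10); // base limit + 10% grace
  if (requestsInWindow > hardCeiling) return 429; // Too Many Requests
  return 200; // OK
}

// The four scenarios above, on the Starter tier (100 req/min)
const scenarios = [
  { quotaRemaining: 5000, requestsInWindow: 50 },  // 1: both OK
  { quotaRemaining: 0, requestsInWindow: 50 },     // 2: quota exhausted
  { quotaRemaining: 5000, requestsInWindow: 111 }, // 3: rate limit exceeded
  { quotaRemaining: 0, requestsInWindow: 111 },    // 4: both exhausted
];
console.log(scenarios.map((s) => evaluateRequest({ ...s, baseLimit: 100 })));
// [ 200, 402, 429, 402 ]
```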
Monitoring Your Limits
Dashboard
View real-time usage at stickycalls.com/dashboard:
- Quota Remaining: Current month's balance
- Next Renewal: When quota refreshes
- Current Tier: Rate limit and monthly quota allocation
- Recent Usage: API call history
Response Headers
Monitor these headers in every response:
```javascript
const response = await fetch('https://api.stickycalls.com/v1/calls/start', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify(requestBody) // requestBody: your call payload
});

// Check quota
const quotaRemaining = response.headers.get('X-Quota-Remaining');
const quotaLimit = response.headers.get('X-Quota-Limit');

// Check rate limit
const rateLimitLimit = response.headers.get('X-RateLimit-Limit');
const rateLimitRemaining = response.headers.get('X-RateLimit-Remaining');
const rateLimitReset = response.headers.get('X-RateLimit-Reset');

console.log(`Quota: ${quotaRemaining}/${quotaLimit}`);
console.log(`Rate: ${rateLimitRemaining}/${rateLimitLimit} requests remaining`);
```
Proactive Monitoring
Best practice: Alert yourself before hitting limits:
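```javascript
// Parse a numeric header from the response above ("9,998" -> 9998; missing -> NaN)
const headerInt = (name) =>
  parseInt(String(response.headers.get(name)).replace(/,/g, ''), 10);

// Alert when quota is low (alertBillingTeam is your own notification hook)
if (headerInt('X-Quota-Remaining') < 100) {
  alertBillingTeam('Low quota - upgrade plan before exhaustion');
}

// Slow down when more than 90% of the per-minute limit is used
const limit = headerInt('X-RateLimit-Limit');
const remaining = headerInt('X-RateLimit-Remaining');
const rateLimitPercent = ((limit - remaining) / limit) * 100;
if (rateLimitPercent > 90) {
  console.warn('Approaching rate limit - implementing backoff');
  await sleep(1000); // Add delay
}
```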
Best Practices
1. Monitor Both Systems
Don't just track one - both can block you:
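```javascript
function checkLimits(response) {
  // Header values are strings; strip thousands separators and parse as base-10
  const toInt = (name) =>
    parseInt(String(response.headers.get(name)).replace(/,/g, ''), 10);
  const quota = toInt('X-Quota-Remaining');
  const rateRemaining = toInt('X-RateLimit-Remaining');
  const quotaOK = quota > 100; // a missing header parses to NaN, which compares false
  const rateOK = rateRemaining > 10;
  return { quotaOK, rateOK, bothOK: quotaOK && rateOK };
}
```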
2. Handle Both Error Codes
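```javascript
if (response.status === 402) {
  // Quota exhausted - retrying won't help until you upgrade or the cycle renews
  throw new Error('Upgrade Sticky Calls plan - monthly quota exhausted');
}

if (response.status === 429) {
  // Rate limited - the JSON error body says how many seconds to wait
  const { retryAfter } = await response.json();
  await sleep(retryAfter * 1000); // sleep(ms): a promise-based delay helper
  return retry(); // retry(): your own function that re-sends the request
}
```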
3. Upgrade Proactively
Don't wait until you hit limits:
- Quota running low? Upgrade before exhaustion
- Consistently hitting rate limits? Upgrade tier for higher throughput
- Unpredictable traffic? Choose a tier with headroom
4. Implement Exponential Backoff
For 429 errors, don't retry immediately:
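```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// fn must throw an error with a `status` property on non-2xx responses
// (fetch itself does not throw on 429). maxRetries is the total number of attempts.
async function callWithBackoff(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (error.status === 429 && i < maxRetries - 1) {
        const delay = Math.pow(2, i) * 1000; // doubles each attempt: 1s, 2s, 4s, ...
        await sleep(delay);
        continue;
      }
      throw error;
    }
  }
}
```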
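The backoff above ignores the `retryAfter` value the API includes in a 429 body. A variation is to prefer that value when it is present and fall back to capped exponential backoff otherwise. `retryDelayMs` is a hypothetical helper, and the 30-second cap is an arbitrary choice for illustration.

```javascript
// Hypothetical helper: how long to wait before retry number `attempt` (0-based).
// Prefers the server's retryAfter (seconds) from a 429 body; otherwise backs off
// exponentially (1s, 2s, 4s, ...) up to maxDelayMs.
function retryDelayMs(attempt, retryAfterSeconds, maxDelayMs = 30_000) {
  if (Number.isFinite(retryAfterSeconds) && retryAfterSeconds >= 0) {
    return retryAfterSeconds * 1000;
  }
  return Math.min(maxDelayMs, 1000 * 2 ** attempt);
}

console.log(retryDelayMs(0, 42));         // 42000 (server said retryAfter: 42)
console.log(retryDelayMs(2, undefined));  // 4000
console.log(retryDelayMs(10, undefined)); // 30000 (capped)
```

Adding a small random jitter to the fallback delay is a common refinement when many clients might retry at the same moment.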
5. Spread Traffic Evenly
Avoid bursts that trigger rate limits:
```javascript
// BAD: fire every request at once - the whole batch lands in one 60-second window
await Promise.all(calls.map((call) => makeAPICall(call)));

// GOOD: send requests one at a time with a small delay
for (const call of calls) {
  await makeAPICall(call);
  await sleep(1000); // 1 req/second = 60 req/min: under Starter (100/min), too fast for Free (10/min)
}
```
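Rather than hard-coding a one-second gap, you can derive the gap from your tier's base limit so a sequential loop stays within it. A sketch, assuming the base limits from the rate-limit table; `intervalForLimit` and `runPaced` are hypothetical names, and `makeAPICall` in the usage comment stands for your own request function.

```javascript
// Smallest gap (ms) between request starts that stays within a per-minute limit
function intervalForLimit(requestsPerMinute) {
  return Math.ceil(60_000 / requestsPerMinute);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run async tasks one at a time, starting each at least `intervalMs` after the previous one started
async function runPaced(tasks, intervalMs) {
  const results = [];
  for (let i = 0; i < tasks.length; i++) {
    const started = Date.now();
    results.push(await tasks[i]());
    // Sleep only for the part of the interval the call itself did not use,
    // and skip the wait after the final task
    const elapsed = Date.now() - started;
    if (i < tasks.length - 1 && elapsed < intervalMs) {
      await sleep(intervalMs - elapsed);
    }
  }
  return results;
}

console.log(intervalForLimit(10));   // 6000 (Free)
console.log(intervalForLimit(100));  // 600 (Starter)
console.log(intervalForLimit(1000)); // 60 (Scale)

// Usage (makeAPICall is your own function):
// await runPaced(calls.map((call) => () => makeAPICall(call)), intervalForLimit(100));
```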
Frequently Asked Questions
Do both systems apply simultaneously?
Yes. You must satisfy BOTH conditions:
- Have monthly quota remaining (avoid 402)
- Stay under rate limit (avoid 429)
Does the grace period apply to monthly quota?
No. The 10% grace period only applies to rate limiting (429). Monthly quota has a hard stop at 0 (402).
Do failed requests count toward quota?
No. Only successful requests (2xx status) count toward quota. If your request fails with 4xx or 5xx, it does not count.
Do test API keys count toward quota?
Yes. Both test and production API keys count toward your monthly quota. The "test" vs "production" label is just for organization in your dashboard.
Do rate limits reset monthly?
No. Rate limits use a 60-second rolling window and reset continuously. Monthly quota resets monthly.
Can I check limits without using quota?
Partially. The /v1/health endpoint (no auth required, no quota usage) reports rate limit headers:
```bash
curl -i https://api.stickycalls.com/v1/health
```
It returns {"status": "ok"} along with rate limit headers, but not quota headers, since the request is unauthenticated. Check your quota balance on the dashboard instead.
What happens during grace period?
Requests still succeed (200 OK), but:
- A warning is logged server-side
- You're using your 10% buffer
- Once the buffer is used up, further requests are hard blocked (429)
Why separate systems instead of one?
Different purposes:
- Monthly Quota: Financial control (prevents surprise bills)
- Rate limits: Technical control (prevents system overload)
Combining them would mean either:
- No burst protection (bad for infrastructure)
- No monthly budget control (bad for customers)
Can I increase just one limit?
No. Both limits are tied to your tier, so upgrading increases BOTH:
- Higher monthly quota (more total API calls)
- Higher per-minute rate limit (faster throughput)
See pricing for tier details.
Related Documentation
- Error Handling Guide - Complete error code reference
- API Reference - Endpoint documentation
- Best Practices - Production integration patterns
- Advanced Topics - Edge cases and design decisions
Need Help?
- Documentation: docs.stickycalls.com
- Dashboard: stickycalls.com/dashboard
- Support: nate@bananaintelligence.ai