Add Call Context to AI Phone Agents
Platform-agnostic guide for giving your AI phone agents memory and context across conversations.
The Problem
AI phone agents today treat every call as the first interaction. Even if a customer called 5 minutes ago, the agent starts from scratch—no memory of previous conversations, preferences, or context.
This creates terrible UX:
- Users repeat themselves: "I already told you my account number!"
- Agents ask the same questions every call
- No continuity between conversations
- Users abandon complex multi-call interactions
- Task completion rates suffer
Real impact:
- 55% of users hang up in frustration after repeating themselves
- Task completion drops by 40% without memory
- User satisfaction averages 2.3/5 for memory-less agents vs. 4.1/5 with memory
The solution: Give your AI agents persistent memory across calls.
What You'll Build
By the end of this guide, your AI phone agent will:
- Remember previous conversations with each caller
- Retrieve context automatically when calls start
- Reference past interactions naturally in dialogue
- Save conversation summaries for future calls
- Handle returning callers with personalization
Expected improvements:
- +45% task completion rate
- +62% user satisfaction
- -38% average call duration (less repetition)
- 3x higher return user rate
Time to implement: 10-30 minutes (depending on platform)
Prerequisites
- Working AI phone agent (any platform)
- Ability to make HTTP requests from your agent
- Sticky Calls API key (get free key)
- Basic understanding of your platform's API/webhooks
Compatible with:
- Voiceflow
- Vapi
- Bland AI
- Retell AI
- Custom LangChain/OpenAI implementations
- Any system that can make HTTP calls
Understanding AI Agent Memory
Types of Memory
1. Session Memory (Built-in)
- Remembers within a single conversation
- Lost when call ends
- Most platforms have this by default
2. Cross-Session Memory (What we're adding)
- Remembers across multiple calls
- Persists after call ends
- Enables true continuity
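The distinction can be sketched in a few lines of Python. The classes and the in-memory store are illustrative only, not part of any platform's API; in production the store would be a database or an external memory service.

```python
class SessionMemory:
    """Lives only for the duration of one call."""
    def __init__(self):
        self.turns = []

    def remember(self, text):
        self.turns.append(text)
    # When the call object is discarded, this history is gone.


class CrossSessionMemory:
    """Persists between calls, keyed by a stable customer reference."""
    def __init__(self, store):
        self.store = store  # any persistent key-value store

    def save(self, customer_ref, context):
        self.store[customer_ref] = context

    def load(self, customer_ref):
        return self.store.get(customer_ref, {})


# A plain dict stands in for a real persistent store here.
store = {}
memory = CrossSessionMemory(store)
memory.save("cust_abc", {"last_topic": "appointment scheduling"})
# A later call can recover the context:
print(memory.load("cust_abc")["last_topic"])  # appointment scheduling
```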
What to Remember
Essential context:
- Previous conversation topics
- User preferences
- Open issues/tasks
- Completed actions
- User frustrations or complaints
What NOT to remember:
- Sensitive data (SSN, passwords)
- Temporary session data
- Debugging information
When to Use External Memory
Use external memory APIs (like Sticky Calls) when:
- You have repeat callers
- Conversations span multiple calls
- Users need to resume previous interactions
- You want personalization based on history
- Your platform's built-in memory is insufficient
Architecture Patterns
Pattern 1: Pre-Call Context Loading
When: Load context before conversation starts
Call starts → Identify caller
→ Retrieve previous context
→ Add to LLM system prompt
→ Begin conversation with context
Best for: Most use cases, simplest implementation
Pattern 2: Mid-Call Context Updates
When: Update context during conversation
During call → User provides new info
→ Update context in real-time
→ Use updated context for rest of call
Best for: Long conversations, complex multi-step processes
Pattern 3: Post-Call Context Saving
When: Save context after call ends
Call ends → Extract key information
→ Format as structured context
→ Save for next call
Best for: All implementations (always save context at end)
Recommended: Use all 3 patterns together for best results.
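The three patterns slot into a single call lifecycle. Below is a hedged sketch of that flow; the helpers (`get_context`, `run_turn`, `save_context`) are hypothetical stand-ins for your platform's equivalents, and the mid-call name extraction is deliberately naive.

```python
def handle_call(call_id, phone_number, get_context, run_turn, save_context):
    # Pattern 1: pre-call context loading
    context = get_context(call_id, phone_number)

    transcript = []
    while True:
        user_text, done = run_turn(context, transcript)
        transcript.append(user_text)
        # Pattern 2: mid-call updates as new facts surface
        if "my name is" in user_text.lower():
            context["preferred_name"] = user_text.split()[-1]
        if done:
            break

    # Pattern 3: post-call saving, always
    save_context(call_id, context, transcript)
    return transcript
```

Note that the save step runs unconditionally at the end, matching the "always save context" recommendation above.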
Implementation: Pre-Call Context Loading
This is the foundation—loading context when a call starts.
Step 1: Identify the Caller
When your agent receives a call, make an API request to identify the caller:
HTTP Request:
POST https://api.stickycalls.com/v1/calls/start
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
{
  "call_id": "unique_call_identifier",
  "identity_hints": {
    "ani": "+14155551234"
  }
}
Response:
{
  "call_id": "call_123",
  "customer_ref": "cust_abc",
  "identity": {
    "confidence": 0.92,
    "level": "very_high",
    "recommendation": "reuse"
  },
  "variables": {
    "last_topic": {
      "value": "User was trying to schedule an appointment for next Tuesday",
      "ttl_seconds": 2592000
    },
    "preferred_name": {
      "value": "Sarah"
    },
    "completed_steps": {
      "value": "verified_email,selected_service"
    }
  },
  "open_intents": [
    {
      "intent": "schedule_appointment",
      "status": "open"
    }
  ]
}
Step 2: Parse the Response
Extract key information:
# Python example
import requests

API_KEY = "YOUR_API_KEY"  # your Sticky Calls API key

def identify_caller(call_id, phone_number):
    response = requests.post(
        'https://api.stickycalls.com/v1/calls/start',
        headers={
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json'
        },
        json={
            'call_id': call_id,
            'identity_hints': {
                'ani': phone_number
            }
        },
        timeout=3
    )
    data = response.json()
    return {
        'is_returning': data['identity']['confidence'] >= 0.7,
        'customer_ref': data['customer_ref'],
        'last_topic': data.get('variables', {}).get('last_topic', {}).get('value', ''),
        'open_intents': data.get('open_intents', []),
        'completed_steps': data.get('variables', {}).get('completed_steps', {}).get('value', '')
    }
Step 3: Add Context to System Prompt
Use the retrieved context in your LLM system prompt:
def build_system_prompt(caller_context):
    if caller_context['is_returning']:
        # Returning caller - include history
        prompt = f"""
You are a helpful AI assistant.

IMPORTANT: This is a returning caller. Here's what you know about them:
Previous conversation: {caller_context['last_topic']}
Completed steps: {caller_context['completed_steps']}
Open tasks: {', '.join(intent['intent'] for intent in caller_context['open_intents'])}

Instructions:
- Reference their previous interaction naturally
- Don't make them repeat information you already know
- Help them complete any open tasks
- Be friendly and show you remember them
"""
    else:
        # New caller - standard prompt
        prompt = """
You are a helpful AI assistant.
This is a new caller. Greet them warmly and ask how you can help.
"""
    return prompt
Implementation: Using Context in Conversations
Natural Context References
Bad (robotic):
"According to my database, you previously called about scheduling."
Good (natural):
"Hi Sarah! I see we were working on scheduling your appointment for Tuesday. Ready to finish that up?"
Prompt Engineering Tips
1. Present context in character, not as a list of facts:
❌ "User's last topic: appointment scheduling"
✅ "You remember helping Sarah schedule an appointment last time you spoke."
2. Give permission to reference history:
"Feel free to reference previous conversations naturally, as if you're continuing where you left off."
3. Handle missing context gracefully:
"If the user mentions something from a previous call that's not in the context, politely ask them to remind you."
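The three tips above can be folded into one prompt builder. This is a minimal sketch; the function name and field shapes are assumptions, not a fixed API.

```python
def build_contextual_prompt(name, last_topic):
    lines = ["You are a helpful AI assistant."]
    if last_topic:
        # Tip 1: phrase context in character, not as raw facts
        lines.append(
            f"You remember helping {name or 'this caller'} with: {last_topic}."
        )
        # Tip 2: give explicit permission to reference history
        lines.append(
            "Feel free to reference previous conversations naturally, "
            "as if you're continuing where you left off."
        )
    # Tip 3: degrade gracefully when context is missing
    lines.append(
        "If the caller mentions something you have no record of, "
        "politely ask them to remind you."
    )
    return "\n".join(lines)
```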
Example Dialogues
With context:
Agent: "Hi! I see last time we were scheduling your dental cleaning for next Tuesday at 2 PM. Did you want to confirm that, or make changes?"
User: "Yes, can we move it to 3 PM?"
Agent: "Absolutely! I've updated your appointment to Tuesday at 3 PM. You're all set."
Without context (bad UX):
Agent: "Hello, how can I help you?"
User: "I need to change my appointment time."
Agent: "I don't see any appointments. Can you provide your name and account number?"
User: "I JUST scheduled this yesterday!" *hangs up*
Implementation: Saving Context After Calls
Always save context at the end of every call.
Extract Key Information
Determine what to save from the conversation:
import json

def extract_context_from_conversation(transcript, user_info):
    """
    Extract key information to save for the next call.
    You can use an LLM to summarize or extract structured data.
    """
    # Option 1: Use an LLM to summarize
    summary_prompt = f"""
Summarize this conversation in 1-2 sentences, focusing on:
- What the user was trying to accomplish
- What was completed
- What's still pending

Conversation:
{transcript}
"""
    summary = call_llm(summary_prompt)

    # Option 2: Extract structured data
    extraction_prompt = f"""
From this conversation, extract:
- User's preferred name
- Completed tasks (list)
- Pending tasks (list)
- Any preferences mentioned

Conversation:
{transcript}

Return as JSON.
"""
    structured_data = call_llm(extraction_prompt)

    return {
        'summary': summary,
        'structured': json.loads(structured_data)
    }
Save to Sticky Calls API
def save_context(call_id, customer_ref, context_data):
    """
    Save context for the next call.
    """
    response = requests.post(
        'https://api.stickycalls.com/v1/calls/end',
        headers={
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json'
        },
        json={
            'call_id': call_id,
            'customer_ref': customer_ref,
            'intent': context_data.get('primary_intent', 'general_inquiry'),
            'intent_status': context_data.get('status', 'resolved'),
            'variables': {
                'last_topic': context_data['summary'],
                'preferred_name': context_data['structured'].get('name', ''),
                'completed_steps': ','.join(context_data['structured'].get('completed', [])),
                'pending_tasks': ','.join(context_data['structured'].get('pending', []))
            }
        },
        timeout=3
    )
    return response.json()
Platform-Specific Examples
Voiceflow
In Voiceflow, use the HTTP Request block:
At start of flow:
HTTP Request block:
- Method: POST
- URL: https://api.stickycalls.com/v1/calls/start
- Headers: Authorization: Bearer {YOUR_API_KEY}
- Body:
{
  "call_id": "{system.timestamp}",
  "identity_hints": {
    "ani": "{system.caller_id}"
  }
}
- Save response to: {context_data}
Then use Set block:
If {context_data.identity.confidence} >= 0.7:
    Set {is_returning} = true
    Set {last_topic} = {context_data.variables.last_topic.value}
In your dialogue:
If {is_returning}:
    "Hi! Last time you were {last_topic}. Ready to continue?"
Else:
    "Hello! How can I help you today?"
Custom Python (LangChain)
from langchain.chat_models import ChatOpenAI
from langchain.schema import SystemMessage, HumanMessage
import requests

class ContextualAgent:
    def __init__(self, api_key):
        self.api_key = api_key
        self.llm = ChatOpenAI(temperature=0.7)

    def handle_call(self, call_id, phone_number):
        # 1. Get context
        context = self.get_context(call_id, phone_number)
        # 2. Build system prompt
        system_prompt = self.build_prompt(context)
        # 3. Have conversation
        transcript = self.run_conversation(system_prompt)
        # 4. Save context
        self.save_context(call_id, context['customer_ref'], transcript)

    def get_context(self, call_id, phone_number):
        response = requests.post(
            'https://api.stickycalls.com/v1/calls/start',
            headers={'Authorization': f'Bearer {self.api_key}'},
            json={
                'call_id': call_id,
                'identity_hints': {'ani': phone_number}
            },
            timeout=3
        )
        return response.json()

    def build_prompt(self, context):
        if context['identity']['confidence'] >= 0.7:
            history = context.get('variables', {}).get('last_topic', {}).get('value', '')
            return f"You are a helpful assistant. The caller previously: {history}. Continue naturally."
        return "You are a helpful assistant."

    def run_conversation(self, system_prompt):
        # Your conversation logic here; must return the full transcript
        messages = [SystemMessage(content=system_prompt)]
        transcript = ""
        # ... conversation loop that appends each turn to transcript
        return transcript

    def save_context(self, call_id, customer_ref, transcript):
        # Extract and save
        summary = self.summarize(transcript)
        requests.post(
            'https://api.stickycalls.com/v1/calls/end',
            headers={'Authorization': f'Bearer {self.api_key}'},
            json={
                'call_id': call_id,
                'customer_ref': customer_ref,
                'intent': 'general',
                'intent_status': 'resolved',
                'variables': {'last_topic': summary}
            },
            timeout=3
        )
OpenAI Function Calling
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const tools = [
  {
    type: "function",
    function: {
      name: "get_caller_context",
      description: "Retrieve context about a returning caller",
      parameters: {
        type: "object",
        properties: {
          phone_number: {
            type: "string",
            description: "Caller's phone number"
          }
        },
        required: ["phone_number"]
      }
    }
  }
];
async function handleCall(phoneNumber) {
  // Let the model decide when to get context
  const messages = [
    {
      role: "system",
      content: "You are a helpful assistant. When you receive a call, use get_caller_context to check if this is a returning caller."
    },
    {
      role: "user",
      content: `Incoming call from ${phoneNumber}`
    }
  ];

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: messages,
    tools: tools
  });

  // If the model calls the function
  const toolCalls = response.choices[0].message.tool_calls;
  if (toolCalls) {
    const toolCall = toolCalls[0];
    if (toolCall.function.name === "get_caller_context") {
      const context = await getContextFromAPI(phoneNumber);

      // Add the assistant's tool call and the tool result to the conversation
      messages.push(response.choices[0].message);
      messages.push({
        role: "tool",
        tool_call_id: toolCall.id,
        content: JSON.stringify(context)
      });
      // Continue conversation with context
      // ...
    }
  }
}
Vapi
In Vapi, configure server URL:
{
  "serverUrl": "https://your-server.com/vapi-webhook",
  "serverUrlSecret": "your_secret"
}
Your webhook:
app.post('/vapi-webhook', async (req, res) => {
  const { type, call } = req.body;

  if (type === 'call-start') {
    // Get context
    const context = await fetch('https://api.stickycalls.com/v1/calls/start', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${STICKY_CALLS_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        call_id: call.id,
        identity_hints: {
          ani: call.customer.number
        }
      })
    }).then(r => r.json());

    // Return an updated system prompt for returning callers
    if (context.identity.confidence >= 0.7) {
      res.json({
        systemPrompt: `Previous conversation: ${context.variables.last_topic.value}. Continue naturally.`
      });
    } else {
      res.json({}); // new caller: keep the default prompt
    }
  } else if (type === 'call-end') {
    // Save context
    await saveContext(call);
    res.json({ success: true });
  }
});
Advanced Patterns
Multi-Turn Memory Management
For long conversations, update context periodically:
def update_context_during_call(call_id, customer_ref, new_info):
    """
    Update context mid-call as you learn new information.
    get_current_context is a placeholder for however your agent
    holds its working context in memory.
    """
    current_context = get_current_context(call_id)
    current_context['variables']['in_progress_task'] = new_info
    # Context available for rest of call
    return current_context
Confidence-Based Behavior
Adjust agent behavior based on match confidence:
if confidence >= 0.9:
    # Very high - personalize heavily
    greeting = f"Hey {name}! Ready to finish what we started?"
elif confidence >= 0.7:
    # High - personalize but verify
    greeting = f"Hi! I think we spoke before about {last_topic}. Is that right?"
elif confidence >= 0.3:
    # Medium - mention possibility
    greeting = "Hi! Have we spoken before? You sound familiar."
else:
    # Low - treat as new
    greeting = "Hello! How can I help you today?"
Handling Conflicting Information
When context conflicts with what user says:
system_prompt = """
If the caller mentions something that conflicts with your context:
1. Trust what they're saying now
2. Politely acknowledge: "Oh, sounds like plans changed since we last talked!"
3. Update your understanding
4. Don't argue about what you "remember"
"""
Testing AI Agent Memory
Test Scenarios
1. First call (no context):
- Call from new number
- Should receive standard greeting
- Should ask for basic information
2. Second call (with context):
- Call from same number 5 minutes later
- Should reference previous conversation
- Should NOT ask for same information again
3. Confidence edge cases:
- Confidence = 0.75 (right at threshold)
- Confidence = 0.4 (low but not zero)
- Very old context (30 days ago)
Validation Checklist
- Context loads within 1 second
- Agent references history naturally
- No repetitive questions
- Fallback works if API fails
- Context saves after every call
- Works across different phone numbers (same user)
Metrics & Impact
Real Results from Customers
Voice AI Startup (Appointment Booking):
- Task completion: 52% → 79% (+52%)
- Average call length: 3.2 min → 1.9 min (-41%)
- User satisfaction: 3.1/5 → 4.3/5 (+39%)
E-commerce Support Bot:
- Return user rate: 8% → 24% (+200%)
- Issue resolution: 61% → 88% (+44%)
- Abandonment rate: 42% → 11% (-74%)
Healthcare Scheduling Agent:
- Appointment completion: 67% → 94% (+40%)
- Callbacks required: 34% → 7% (-79%)
- Patient satisfaction: 3.8/5 → 4.7/5 (+24%)
ROI Calculation
Assumptions:
- 1,000 calls/day
- 40% are repeat callers
- Without memory: 60% task completion, 4 min/call
- With memory: 85% task completion, 2.5 min/call
Results:
- Tasks completed: +25 percentage points (250 more/day)
- Time saved: 400 calls × 1.5 min = 600 min/day = 10 hours/day
- Monthly value: 10 hours × 22 days × $50/hour = $11,000/month
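The time-savings arithmetic above, expressed as a small function so you can plug in your own call volumes. The defaults mirror the stated assumptions.

```python
def memory_roi(calls_per_day=1000, repeat_share=0.40,
               minutes_saved_per_repeat_call=1.5,
               workdays_per_month=22, hourly_cost=50):
    """Monthly dollar value of time saved on repeat calls."""
    repeat_calls = calls_per_day * repeat_share                   # 400
    minutes_saved = repeat_calls * minutes_saved_per_repeat_call  # 600/day
    hours_saved = minutes_saved / 60                              # 10/day
    return hours_saved * workdays_per_month * hourly_cost

print(memory_roi())  # 11000.0
```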
Best Practices
1. Keep Context Fresh
Set appropriate TTLs:
variables = {
    'last_topic': {
        'value': summary,
        'ttl_seconds': 2592000  # 30 days
    },
    'temporary_note': {
        'value': note,
        'ttl_seconds': 86400  # 1 day
    }
}
2. Privacy Considerations
Never store:
- Credit card numbers
- Passwords
- Social security numbers
- Medical diagnosis details
Do store:
- "User prefers email communication"
- "Last discussed: account upgrade"
- "Completed: identity verification"
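One way to enforce the never-store list is a sanitization pass over variables before every save. The blocked patterns below are a minimal sketch; extend them for your own compliance requirements.

```python
import re

# Illustrative patterns only - tune these for your domain.
BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # SSN-like
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),   # card-number-like
    re.compile(r"password", re.IGNORECASE),
]

def sanitize_variables(variables):
    """Drop any variable whose value matches a blocked pattern."""
    clean = {}
    for key, value in variables.items():
        if any(p.search(str(value)) for p in BLOCKED_PATTERNS):
            continue  # never persist this value
        clean[key] = value
    return clean

safe = sanitize_variables({
    "last_topic": "account upgrade",
    "note": "my password is hunter2",
})
print(sorted(safe))  # ['last_topic']
```

Run this on the `variables` dict just before calling the save endpoint, so nothing sensitive ever leaves your server.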
3. Graceful Degradation
Always provide fallback:
try:
    context = get_caller_context()
except Exception as e:
    logging.error(f'Context retrieval failed: {e}')
    context = {'is_returning': False}  # Treat as new caller
4. Optimize Performance
Use async/parallel requests:
import asyncio

async def start_call(call_id, phone):
    # Get context in parallel with other initialization
    context_task = asyncio.create_task(get_context_async(call_id, phone))
    init_task = asyncio.create_task(initialize_agent())
    context, agent = await asyncio.gather(context_task, init_task)
    # Start conversation immediately with both ready
    return handle_conversation(agent, context)
Next Steps
Now that you've added memory to your AI phone agent:
- Platform-specific guides - Voiceflow, Dialogflow CX, Amazon Lex
- Reduce AHT Guide - Measure business impact
- Caller Identity Storage - Architecture deep-dive
- API Reference - Complete API documentation
- Best Practices - Production tips
Questions? Contact support
Ready to get started? Sign up for free →