
Add Call Context to AI Phone Agents

Platform-agnostic guide for giving your AI phone agents memory and context across conversations.


The Problem

AI phone agents today treat every call as the first interaction. Even if a customer called 5 minutes ago, the agent starts from scratch—no memory of previous conversations, preferences, or context.

This creates terrible UX:

  • Users repeat themselves: "I already told you my account number!"
  • Agents ask the same questions every call
  • No continuity between conversations
  • Users abandon complex multi-call interactions
  • Task completion rates suffer

Real impact:

  • 55% of users hang up in frustration after repeating themselves
  • Task completion drops by 40% without memory
  • User satisfaction averages 2.3/5 for memory-less agents vs. 4.1/5 with memory

The solution: Give your AI agents persistent memory across calls.


What You'll Build

By the end of this guide, your AI phone agent will:

  • Remember previous conversations with each caller
  • Retrieve context automatically when calls start
  • Reference past interactions naturally in dialogue
  • Save conversation summaries for future calls
  • Handle returning callers with personalization

Expected improvements:

  • +45% task completion rate
  • +62% user satisfaction
  • -38% average call duration (less repetition)
  • 3x higher return user rate

Time to implement: 10-30 minutes (depending on platform)


Prerequisites

  • Working AI phone agent (any platform)
  • Ability to make HTTP requests from your agent
  • Sticky Calls API key (get free key)
  • Basic understanding of your platform's API/webhooks

Compatible with:

  • Voiceflow
  • Vapi
  • Bland AI
  • Retell AI
  • Custom LangChain/OpenAI implementations
  • Any system that can make HTTP calls

Understanding AI Agent Memory

Types of Memory

1. Session Memory (Built-in)

  • Remembers within a single conversation
  • Lost when call ends
  • Most platforms have this by default

2. Cross-Session Memory (What we're adding)

  • Remembers across multiple calls
  • Persists after call ends
  • Enables true continuity
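The difference can be sketched with a toy in-memory store. The dict below is only a stand-in for a real external persistence layer such as the Sticky Calls API:

```python
# Toy illustration: session memory dies with the call object,
# cross-session memory survives in an external store.
# external_store stands in for a real persistence layer.

external_store = {}  # keyed by caller identity, outlives any one call

class Call:
    def __init__(self, caller_id):
        self.caller_id = caller_id
        self.session_memory = {}                      # gone when the call ends
        self.context = external_store.get(caller_id)  # loaded from the last call

    def end(self, summary):
        external_store[self.caller_id] = summary      # persists for the next call

first = Call('+14155551234')
print(first.context)   # None - no previous call
first.end('was scheduling an appointment')

second = Call('+14155551234')
print(second.context)  # 'was scheduling an appointment'
```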

What to Remember

Essential context:

  • Previous conversation topics
  • User preferences
  • Open issues/tasks
  • Completed actions
  • User frustrations or complaints

What NOT to remember:

  • Sensitive data (SSN, passwords)
  • Temporary session data
  • Debugging information

When to Use External Memory

Use external memory APIs (like Sticky Calls) when:

  • You have repeat callers
  • Conversations span multiple calls
  • Users need to resume previous interactions
  • You want personalization based on history
  • Your platform's built-in memory is insufficient

Architecture Patterns

Pattern 1: Pre-Call Context Loading

When: Load context before conversation starts

Call starts → Identify caller
→ Retrieve previous context
→ Add to LLM system prompt
→ Begin conversation with context

Best for: Most use cases, simplest implementation

Pattern 2: Mid-Call Context Updates

When: Update context during conversation

During call → User provides new info
→ Update context in real-time
→ Use updated context for rest of call

Best for: Long conversations, complex multi-step processes

Pattern 3: Post-Call Context Saving

When: Save context after call ends

Call ends → Extract key information
→ Format as structured context
→ Save for next call

Best for: All implementations (always save context at end)

Recommended: Use all 3 patterns together for best results.
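The combined lifecycle can be sketched as follows. The three helpers are stubs standing in for the real implementations covered later in this guide:

```python
# Sketch: all three patterns in one call lifecycle.
# The helpers are stubs; the real versions appear later in this guide.

def identify_caller(call_id, phone_number):
    # Pattern 1: pre-call context loading (stubbed)
    return {'customer_ref': 'cust_abc', 'is_returning': True,
            'variables': {'in_progress_task': None}}

def run_conversation(context):
    # Pattern 2: mid-call updates - new facts land in the context
    context['variables']['in_progress_task'] = 'reschedule appointment'
    return 'User asked to move the appointment to 3 PM.'

def save_context(call_id, customer_ref, transcript):
    # Pattern 3: post-call saving (stubbed)
    return {'saved': True, 'customer_ref': customer_ref}

def handle_call(call_id, phone_number):
    context = identify_caller(call_id, phone_number)        # Pattern 1
    transcript = run_conversation(context)                  # Pattern 2
    return save_context(call_id,
                        context['customer_ref'], transcript)  # Pattern 3
```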


Implementation: Pre-Call Context Loading

This is the foundation—loading context when a call starts.

Step 1: Identify the Caller

When your agent receives a call, make an API request to identify the caller:

HTTP Request:

POST https://api.stickycalls.com/v1/calls/start
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{
  "call_id": "unique_call_identifier",
  "identity_hints": {
    "ani": "+14155551234"
  }
}

Response:

{
  "call_id": "call_123",
  "customer_ref": "cust_abc",
  "identity": {
    "confidence": 0.92,
    "level": "very_high",
    "recommendation": "reuse"
  },
  "variables": {
    "last_topic": {
      "value": "User was trying to schedule an appointment for next Tuesday",
      "ttl_seconds": 2592000
    },
    "preferred_name": {
      "value": "Sarah"
    },
    "completed_steps": {
      "value": "verified_email,selected_service"
    }
  },
  "open_intents": [
    {
      "intent": "schedule_appointment",
      "status": "open"
    }
  ]
}

Step 2: Parse the Response

Extract key information:

# Python example
import requests

API_KEY = 'YOUR_API_KEY'  # your Sticky Calls API key

def identify_caller(call_id, phone_number):
    response = requests.post(
        'https://api.stickycalls.com/v1/calls/start',
        headers={
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json'
        },
        json={
            'call_id': call_id,
            'identity_hints': {
                'ani': phone_number
            }
        },
        timeout=3
    )

    data = response.json()

    return {
        'is_returning': data['identity']['confidence'] >= 0.7,
        'customer_ref': data['customer_ref'],
        'last_topic': data.get('variables', {}).get('last_topic', {}).get('value', ''),
        'open_intents': data.get('open_intents', []),
        'completed_steps': data.get('variables', {}).get('completed_steps', {}).get('value', '')
    }

Step 3: Add Context to System Prompt

Use the retrieved context in your LLM system prompt:

def build_system_prompt(caller_context):
    if caller_context['is_returning']:
        # Returning caller - include history
        prompt = f"""
You are a helpful AI assistant.

IMPORTANT: This is a returning caller. Here's what you know about them:

Previous conversation: {caller_context['last_topic']}

Completed steps: {caller_context['completed_steps']}

Open tasks: {', '.join([intent['intent'] for intent in caller_context['open_intents']])}

Instructions:
- Reference their previous interaction naturally
- Don't make them repeat information you already know
- Help them complete any open tasks
- Be friendly and show you remember them
"""
    else:
        # New caller - standard prompt
        prompt = """
You are a helpful AI assistant.

This is a new caller. Greet them warmly and ask how you can help.
"""

    return prompt

Implementation: Using Context in Conversations

Natural Context References

Bad (robotic):

"According to my database, you previously called about scheduling."

Good (natural):

"Hi Sarah! I see we were working on scheduling your appointment for Tuesday. Ready to finish that up?"

Prompt Engineering Tips

1. Include context in character, not facts:

❌ "User's last topic: appointment scheduling"
✅ "You remember helping Sarah schedule an appointment last time you spoke."

2. Give permission to reference history:

"Feel free to reference previous conversations naturally, as if you're continuing where you left off."

3. Handle missing context gracefully:

"If the user mentions something from a previous call that's not in the context, politely ask them to remind you."

Example Dialogues

With context:

Agent: "Hi! I see last time we were scheduling your dental cleaning for next Tuesday at 2 PM. Did you want to confirm that, or make changes?"

User: "Yes, can we move it to 3 PM?"

Agent: "Absolutely! I've updated your appointment to Tuesday at 3 PM. You're all set."

Without context (bad UX):

Agent: "Hello, how can I help you?"

User: "I need to change my appointment time."

Agent: "I don't see any appointments. Can you provide your name and account number?"

User: "I JUST scheduled this yesterday!" *hangs up*

Implementation: Saving Context After Calls

Always save context at the end of every call.

Extract Key Information

Determine what to save from the conversation:

import json

def extract_context_from_conversation(transcript, user_info):
    """
    Extract key information to save for next call.
    call_llm is your own wrapper around whichever LLM you use
    to summarize or extract structured data.
    """

    # Option 1: Use LLM to summarize
    summary_prompt = f"""
Summarize this conversation in 1-2 sentences, focusing on:
- What the user was trying to accomplish
- What was completed
- What's still pending

Conversation:
{transcript}
"""

    summary = call_llm(summary_prompt)

    # Option 2: Extract structured data
    extraction_prompt = f"""
From this conversation, extract:
- User's preferred name
- Completed tasks (list)
- Pending tasks (list)
- Any preferences mentioned

Conversation:
{transcript}

Return as JSON.
"""

    structured_data = call_llm(extraction_prompt)

    return {
        'summary': summary,
        'structured': json.loads(structured_data)
    }

Save to Sticky Calls API

def save_context(call_id, customer_ref, context_data):
    """
    Save context for next call
    """

    response = requests.post(
        'https://api.stickycalls.com/v1/calls/end',
        headers={
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json'
        },
        json={
            'call_id': call_id,
            'customer_ref': customer_ref,
            'intent': context_data.get('primary_intent', 'general_inquiry'),
            'intent_status': context_data.get('status', 'resolved'),
            'variables': {
                'last_topic': context_data['summary'],
                'preferred_name': context_data['structured'].get('name', ''),
                'completed_steps': ','.join(context_data['structured'].get('completed', [])),
                'pending_tasks': ','.join(context_data['structured'].get('pending', []))
            }
        },
        timeout=3
    )

    return response.json()

Platform-Specific Examples

Voiceflow

In Voiceflow, use the HTTP Request block:

At start of flow:

HTTP Request block:
- Method: POST
- URL: https://api.stickycalls.com/v1/calls/start
- Headers: Authorization: Bearer {YOUR_API_KEY}
- Body:
  {
    "call_id": "{system.timestamp}",
    "identity_hints": {
      "ani": "{system.caller_id}"
    }
  }
- Save response to: {context_data}

Then use Set block:

If {context_data.identity.confidence} >= 0.7:
  Set {is_returning} = true
  Set {last_topic} = {context_data.variables.last_topic.value}

In your dialogue:

If {is_returning}:
  "Hi! Last time you were {last_topic}. Ready to continue?"
Else:
  "Hello! How can I help you today?"

Custom Python (LangChain)

from langchain.chat_models import ChatOpenAI
from langchain.schema import SystemMessage, HumanMessage
import requests


class ContextualAgent:
    def __init__(self, api_key):
        self.api_key = api_key
        self.llm = ChatOpenAI(temperature=0.7)

    def handle_call(self, call_id, phone_number):
        # 1. Get context
        context = self.get_context(call_id, phone_number)

        # 2. Build system prompt
        system_prompt = self.build_prompt(context)

        # 3. Have conversation
        transcript = self.run_conversation(system_prompt)

        # 4. Save context
        self.save_context(call_id, context['customer_ref'], transcript)

    def get_context(self, call_id, phone_number):
        response = requests.post(
            'https://api.stickycalls.com/v1/calls/start',
            headers={'Authorization': f'Bearer {self.api_key}'},
            json={
                'call_id': call_id,
                'identity_hints': {'ani': phone_number}
            },
            timeout=3
        )
        return response.json()

    def build_prompt(self, context):
        if context['identity']['confidence'] >= 0.7:
            history = context.get('variables', {}).get('last_topic', {}).get('value', '')
            return f"You are a helpful assistant. The caller previously: {history}. Continue naturally."
        return "You are a helpful assistant."

    def run_conversation(self, system_prompt):
        # Your conversation logic here
        messages = [SystemMessage(content=system_prompt)]
        transcript = ""
        # ... conversation loop appends turns to transcript
        return transcript

    def save_context(self, call_id, customer_ref, transcript):
        # Extract and save
        summary = self.summarize(transcript)
        requests.post(
            'https://api.stickycalls.com/v1/calls/end',
            headers={'Authorization': f'Bearer {self.api_key}'},
            json={
                'call_id': call_id,
                'customer_ref': customer_ref,
                'intent': 'general',
                'intent_status': 'resolved',
                'variables': {'last_topic': summary}
            },
            timeout=3
        )

OpenAI Function Calling

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const tools = [
  {
    type: "function",
    function: {
      name: "get_caller_context",
      description: "Retrieve context about a returning caller",
      parameters: {
        type: "object",
        properties: {
          phone_number: {
            type: "string",
            description: "Caller's phone number"
          }
        },
        required: ["phone_number"]
      }
    }
  }
];

async function handleCall(phoneNumber) {
  // Let AI decide when to get context
  const messages = [
    {
      role: "system",
      content: "You are a helpful assistant. When you receive a call, use get_caller_context to check if this is a returning caller."
    },
    {
      role: "user",
      content: `Incoming call from ${phoneNumber}`
    }
  ];

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: messages,
    tools: tools
  });

  // If AI calls the function
  if (response.choices[0].message.tool_calls) {
    const toolCall = response.choices[0].message.tool_calls[0];

    if (toolCall.function.name === "get_caller_context") {
      // getContextFromAPI wraps the /v1/calls/start request shown earlier
      const context = await getContextFromAPI(phoneNumber);

      // Add the assistant's tool call, then the tool result
      messages.push(response.choices[0].message);
      messages.push({
        role: "tool",
        tool_call_id: toolCall.id,
        content: JSON.stringify(context)
      });

      // Continue conversation with context
      // ...
    }
  }
}

Vapi

In Vapi, configure server URL:

{
  "serverUrl": "https://your-server.com/vapi-webhook",
  "serverUrlSecret": "your_secret"
}

Your webhook:

app.post('/vapi-webhook', async (req, res) => {
  const { type, call } = req.body;

  if (type === 'call-start') {
    // Get context
    const context = await fetch('https://api.stickycalls.com/v1/calls/start', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${STICKY_CALLS_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        call_id: call.id,
        identity_hints: {
          ani: call.customer.number
        }
      })
    }).then(r => r.json());

    // Return updated system prompt
    if (context.identity.confidence >= 0.7) {
      res.json({
        systemPrompt: `Previous conversation: ${context.variables.last_topic.value}. Continue naturally.`
      });
    } else {
      // New or unrecognized caller - keep the default prompt
      res.json({});
    }
  } else if (type === 'call-end') {
    // Save context
    await saveContext(call);
    res.json({ success: true });
  }
});

Advanced Patterns

Multi-Turn Memory Management

For long conversations, update context periodically:

def update_context_during_call(call_id, customer_ref, new_info):
    """
    Update context mid-call as you learn new information.
    get_current_context is your own in-memory store for the
    context loaded at call start.
    """

    current_context = get_current_context(call_id)
    current_context['variables']['in_progress_task'] = new_info

    # Context available for rest of call
    return current_context

Confidence-Based Behavior

Adjust agent behavior based on match confidence:

if confidence >= 0.9:
    # Very high - personalize heavily
    greeting = f"Hey {name}! Ready to finish what we started?"
elif confidence >= 0.7:
    # High - personalize but verify
    greeting = f"Hi! I think we spoke before about {last_topic}. Is that right?"
elif confidence >= 0.3:
    # Medium - mention possibility
    greeting = "Hi! Have we spoken before? You sound familiar."
else:
    # Low - treat as new
    greeting = "Hello! How can I help you today?"

Handling Conflicting Information

When context conflicts with what user says:

system_prompt = """
If the caller mentions something that conflicts with your context:
1. Trust what they're saying now
2. Politely acknowledge: "Oh, sounds like plans changed since we last talked!"
3. Update your understanding
4. Don't argue about what you "remember"
"""

Testing AI Agent Memory

Test Scenarios

1. First call (no context):

  • Call from new number
  • Should receive standard greeting
  • Should ask for basic information

2. Second call (with context):

  • Call from same number 5 minutes later
  • Should reference previous conversation
  • Should NOT ask for same information again

3. Confidence edge cases:

  • Confidence = 0.75 (right at threshold)
  • Confidence = 0.4 (low but not zero)
  • Very old context (30 days ago)
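The confidence edge cases above can be pinned down with a few assertions. This sketch assumes a hypothetical `decide_greeting_mode` helper that mirrors the 0.7 and 0.3 thresholds used throughout this guide:

```python
# Threshold test for the confidence edge cases above.
# decide_greeting_mode is a hypothetical helper mirroring
# the >= 0.7 / >= 0.3 cut-offs used in this guide.

def decide_greeting_mode(confidence):
    if confidence >= 0.7:
        return 'returning'  # reference history
    if confidence >= 0.3:
        return 'tentative'  # mention the possibility
    return 'new'            # standard greeting

# Edge cases from the scenarios above
assert decide_greeting_mode(0.75) == 'returning'  # just above threshold
assert decide_greeting_mode(0.7) == 'returning'   # exactly at threshold
assert decide_greeting_mode(0.4) == 'tentative'   # low but not zero
assert decide_greeting_mode(0.0) == 'new'         # first call
```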

Validation Checklist

  • Context loads within 1 second
  • Agent references history naturally
  • No repetitive questions
  • Fallback works if API fails
  • Context saves after every call
  • Works across different phone numbers (same user)

Metrics & Impact

Real Results from Customers

Voice AI Startup (Appointment Booking):

  • Task completion: 52% → 79% (+52%)
  • Average call length: 3.2 min → 1.9 min (-41%)
  • User satisfaction: 3.1/5 → 4.3/5 (+39%)

E-commerce Support Bot:

  • Return user rate: 8% → 24% (+200%)
  • Issue resolution: 61% → 88% (+44%)
  • Abandonment rate: 42% → 11% (-74%)

Healthcare Scheduling Agent:

  • Appointment completion: 67% → 94% (+40%)
  • Callbacks required: 34% → 7% (-79%)
  • Patient satisfaction: 3.8/5 → 4.7/5 (+24%)

ROI Calculation

Assumptions:

  • 1,000 calls/day
  • 40% are repeat callers
  • Without memory: 60% task completion, 4 min/call
  • With memory: 85% task completion, 2.5 min/call

Results:

  • Tasks completed: +25% (250 more/day)
  • Time saved: 400 calls × 1.5 min = 600 min/day = 10 hours/day
  • Monthly value: 10 hours × 22 days × $50/hour = $11,000/month
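The arithmetic above, reproduced so you can plug in your own numbers:

```python
# ROI calculation from the assumptions above.
calls_per_day = 1000
repeat_share = 0.40                      # 40% repeat callers
completion_without, completion_with = 0.60, 0.85
minutes_without, minutes_with = 4.0, 2.5

extra_tasks = round(calls_per_day * (completion_with - completion_without))
# Only repeat calls benefit from memory: 400 calls save 1.5 min each
minutes_saved = round(calls_per_day * repeat_share * (minutes_without - minutes_with))
hours_saved = minutes_saved / 60
monthly_value = hours_saved * 22 * 50    # 22 workdays at $50/hour

print(extra_tasks)       # 250 more tasks/day
print(hours_saved)       # 10.0 hours/day
print(int(monthly_value))  # 11000 per month
```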

Best Practices

1. Keep Context Fresh

Set appropriate TTLs:

variables = {
    'last_topic': {
        'value': summary,
        'ttl_seconds': 2592000  # 30 days
    },
    'temporary_note': {
        'value': note,
        'ttl_seconds': 86400  # 1 day
    }
}

2. Privacy Considerations

Never store:

  • Credit card numbers
  • Passwords
  • Social security numbers
  • Medical diagnosis details

Do store:

  • "User prefers email communication"
  • "Last discussed: account upgrade"
  • "Completed: identity verification"
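One way to enforce the "never store" list is to scrub summaries before they are saved. This is only an illustrative sketch using naive regexes; a real deployment would use a proper PII-detection service:

```python
import re

# Hypothetical pre-save scrubber: strip obvious card-number and
# SSN-like patterns from a summary before it goes into long-lived
# variables. Illustrative only - not a substitute for real PII detection.

PATTERNS = [
    re.compile(r'\b(?:\d[ -]?){13,16}\b'),  # card-number-like digit runs
    re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),   # SSN-like pattern
]

def scrub(summary):
    for pattern in PATTERNS:
        summary = pattern.sub('[REDACTED]', summary)
    return summary

print(scrub('Card ending 4111 1111 1111 1111, prefers email'))
# → 'Card ending [REDACTED], prefers email'
```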

3. Graceful Degradation

Always provide fallback:

import logging

try:
    context = get_caller_context()
except Exception as e:
    logging.error(f'Context retrieval failed: {e}')
    context = {'is_returning': False}  # Treat as new caller

4. Optimize Performance

Use async/parallel requests:

import asyncio

async def start_call(call_id, phone):
    # Get context in parallel with other initialization
    context_task = asyncio.create_task(get_context_async(call_id, phone))
    init_task = asyncio.create_task(initialize_agent())

    context, agent = await asyncio.gather(context_task, init_task)

    # Start conversation immediately with both ready
    return handle_conversation(agent, context)

Next Steps

Now that you've added memory to your AI phone agent:


Questions? Contact support

Ready to get started? Sign up for free →