
Add Call Context to AI Phone Agents

Platform-agnostic guide for giving your AI phone agents memory and context across conversations.


The Problem

AI phone agents today treat every call as the first interaction. Even if a customer called 5 minutes ago, the agent starts from scratch—no memory of previous conversations, preferences, or context.

This creates terrible UX:

  • Users repeat themselves: "I already told you my account number!"
  • Agents ask the same questions every call
  • No continuity between conversations
  • Users abandon complex multi-call interactions
  • Task completion rates suffer

Real impact:

  • 55% of users hang up in frustration after repeating themselves
  • Task completion drops by 40% without memory
  • User satisfaction averages 2.3/5 for memory-less agents vs. 4.1/5 with memory

The solution: Give your AI agents persistent memory across calls.


What You'll Build

By the end of this guide, your AI phone agent will:

  • Remember previous conversations with each caller
  • Retrieve context automatically when calls start
  • Reference past interactions naturally in dialogue
  • Save conversation summaries for future calls
  • Handle returning callers with personalization

Expected improvements:

  • +45% task completion rate
  • +62% user satisfaction
  • -38% average call duration (less repetition)
  • 3x higher return user rate

Time to implement: 10-30 minutes (depending on platform)


Prerequisites

  • Working AI phone agent (any platform)
  • Ability to make HTTP requests from your agent
  • Sticky Calls API key (get free key)
  • Basic understanding of your platform's API/webhooks

Compatible with:

  • Voiceflow
  • Vapi
  • Bland AI
  • Retell AI
  • Custom LangChain/OpenAI implementations
  • Any system that can make HTTP calls

Understanding AI Agent Memory

Types of Memory

1. Session Memory (Built-in)

  • Remembers within a single conversation
  • Lost when call ends
  • Most platforms have this by default

2. Cross-Session Memory (What we're adding)

  • Remembers across multiple calls
  • Persists after call ends
  • Enables true continuity
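The difference can be sketched with a toy in-memory store. The dict below is only a stand-in for a real external persistence layer such as the Sticky Calls API:

```python
# Toy illustration: session memory dies with the call object,
# cross-session memory survives in an external store.
# external_store stands in for a real persistence layer.

external_store = {}  # keyed by caller identity, outlives any one call

class Call:
    def __init__(self, caller_id):
        self.caller_id = caller_id
        self.session_memory = {}                      # gone when the call ends
        self.context = external_store.get(caller_id)  # loaded from the last call

    def end(self, summary):
        external_store[self.caller_id] = summary      # persists for the next call

first = Call('+14155551234')
print(first.context)   # None - no previous call
first.end('was scheduling an appointment')

second = Call('+14155551234')
print(second.context)  # 'was scheduling an appointment'
```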

What to Remember

Essential context:

  • Previous conversation topics
  • User preferences
  • Open issues/tasks
  • Completed actions
  • User frustrations or complaints

What NOT to remember:

  • Sensitive data (SSN, passwords)
  • Temporary session data
  • Debugging information

When to Use External Memory

Use external memory APIs (like Sticky Calls) when:

  • You have repeat callers
  • Conversations span multiple calls
  • Users need to resume previous interactions
  • You want personalization based on history
  • Your platform's built-in memory is insufficient

Architecture Patterns

Pattern 1: Pre-Call Context Loading

When: Load context before conversation starts

Call starts → Identify caller
→ Retrieve previous context
→ Add to LLM system prompt
→ Begin conversation with context

Best for: Most use cases, simplest implementation

Pattern 2: Mid-Call Context Updates

When: Update context during conversation

During call → User provides new info
→ Update context in real-time
→ Use updated context for rest of call

Best for: Long conversations, complex multi-step processes

Pattern 3: Post-Call Context Saving

When: Save context after call ends

Call ends → Extract key information
→ Format as structured context
→ Save for next call

Best for: All implementations (always save context at end)

Recommended: Use all 3 patterns together for best results.
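The combined lifecycle can be sketched as follows. The three helpers are stubs standing in for the real implementations covered later in this guide:

```python
# Sketch: all three patterns in one call lifecycle.
# The helpers are stubs; the real versions appear later in this guide.

def identify_caller(call_id, phone_number):
    # Pattern 1: pre-call context loading (stubbed)
    return {'customer_ref': 'cust_abc', 'is_returning': True,
            'variables': {'in_progress_task': None}}

def run_conversation(context):
    # Pattern 2: mid-call updates - new facts land in the context
    context['variables']['in_progress_task'] = 'reschedule appointment'
    return 'User asked to move the appointment to 3 PM.'

def save_context(call_id, customer_ref, transcript):
    # Pattern 3: post-call saving (stubbed)
    return {'saved': True, 'customer_ref': customer_ref}

def handle_call(call_id, phone_number):
    context = identify_caller(call_id, phone_number)        # Pattern 1
    transcript = run_conversation(context)                  # Pattern 2
    return save_context(call_id,
                        context['customer_ref'], transcript)  # Pattern 3
```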


Implementation: Pre-Call Context Loading

This is the foundation—loading context when a call starts.

Step 1: Identify the Caller

When your agent receives a call, make an API request to identify the caller:

HTTP Request:

POST https://api.stickycalls.com/v1/calls/start
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{
  "call_id": "unique_call_identifier",
  "identity_hints": {
    "ani": "+14155551234"
  }
}

Response:

{
  "call_id": "call_123",
  "customer_ref": "cust_abc",
  "identity": {
    "confidence": 0.92,
    "level": "very_high",
    "recommendation": "reuse"
  },
  "variables": {
    "last_topic": {
      "value": "User was trying to schedule an appointment for next Tuesday",
      "ttl_seconds": 2592000
    },
    "preferred_name": {
      "value": "Sarah"
    },
    "completed_steps": {
      "value": "verified_email,selected_service"
    }
  },
  "open_intents": [
    {
      "intent": "schedule_appointment",
      "status": "open"
    }
  ]
}

Step 2: Parse the Response

Extract key information:

# Python example
import requests

API_KEY = 'YOUR_API_KEY'  # your Sticky Calls API key

def identify_caller(call_id, phone_number):
    response = requests.post(
        'https://api.stickycalls.com/v1/calls/start',
        headers={
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json'
        },
        json={
            'call_id': call_id,
            'identity_hints': {
                'ani': phone_number
            }
        },
        timeout=3
    )

    data = response.json()

    return {
        'is_returning': data['identity']['confidence'] >= 0.7,
        'customer_ref': data['customer_ref'],
        'last_topic': data.get('variables', {}).get('last_topic', {}).get('value', ''),
        'open_intents': data.get('open_intents', []),
        'completed_steps': data.get('variables', {}).get('completed_steps', {}).get('value', '')
    }

Step 3: Add Context to System Prompt

Use the retrieved context in your LLM system prompt:

def build_system_prompt(caller_context):
    if caller_context['is_returning']:
        # Returning caller - include history
        prompt = f"""
You are a helpful AI assistant.

IMPORTANT: This is a returning caller. Here's what you know about them:

Previous conversation: {caller_context['last_topic']}

Completed steps: {caller_context['completed_steps']}

Open tasks: {', '.join([intent['intent'] for intent in caller_context['open_intents']])}

Instructions:
- Reference their previous interaction naturally
- Don't make them repeat information you already know
- Help them complete any open tasks
- Be friendly and show you remember them
"""
    else:
        # New caller - standard prompt
        prompt = """
You are a helpful AI assistant.

This is a new caller. Greet them warmly and ask how you can help.
"""

    return prompt

Implementation: Using Context in Conversations

Natural Context References

Bad (robotic):

"According to my database, you previously called about scheduling."

Good (natural):

"Hi Sarah! I see we were working on scheduling your appointment for Tuesday. Ready to finish that up?"

Prompt Engineering Tips

1. Include context in character, not facts:

❌ "User's last topic: appointment scheduling"
✅ "You remember helping Sarah schedule an appointment last time you spoke."

2. Give permission to reference history:

"Feel free to reference previous conversations naturally, as if you're continuing where you left off."

3. Handle missing context gracefully:

"If the user mentions something from a previous call that's not in the context, politely ask them to remind you."

Example Dialogues

With context:

Agent: "Hi! I see last time we were scheduling your dental cleaning for next Tuesday at 2 PM. Did you want to confirm that, or make changes?"

User: "Yes, can we move it to 3 PM?"

Agent: "Absolutely! I've updated your appointment to Tuesday at 3 PM. You're all set."

Without context (bad UX):

Agent: "Hello, how can I help you?"

User: "I need to change my appointment time."

Agent: "I don't see any appointments. Can you provide your name and account number?"

User: "I JUST scheduled this yesterday!" *hangs up*

Implementation: Saving Context After Calls

Always save context at the end of every call.

Extract Key Information

Determine what to save from the conversation:

import json

def extract_context_from_conversation(transcript, user_info):
    """
    Extract key information to save for next call.
    call_llm is your own wrapper around whichever LLM you use
    to summarize or extract structured data.
    """

    # Option 1: Use LLM to summarize
    summary_prompt = f"""
Summarize this conversation in 1-2 sentences, focusing on:
- What the user was trying to accomplish
- What was completed
- What's still pending

Conversation:
{transcript}
"""

    summary = call_llm(summary_prompt)

    # Option 2: Extract structured data
    extraction_prompt = f"""
From this conversation, extract:
- User's preferred name
- Completed tasks (list)
- Pending tasks (list)
- Any preferences mentioned

Conversation:
{transcript}

Return as JSON.
"""

    structured_data = call_llm(extraction_prompt)

    return {
        'summary': summary,
        'structured': json.loads(structured_data)
    }

Save to Sticky Calls API

def save_context(call_id, customer_ref, context_data):
    """
    Save context for next call
    """

    response = requests.post(
        'https://api.stickycalls.com/v1/calls/end',
        headers={
            'Authorization': f'Bearer {API_KEY}',
            'Content-Type': 'application/json'
        },
        json={
            'call_id': call_id,
            'customer_ref': customer_ref,
            'intent': context_data.get('primary_intent', 'general_inquiry'),
            'intent_status': context_data.get('status', 'resolved'),
            'variables': {
                'last_topic': context_data['summary'],
                'preferred_name': context_data['structured'].get('name', ''),
                'completed_steps': ','.join(context_data['structured'].get('completed', [])),
                'pending_tasks': ','.join(context_data['structured'].get('pending', []))
            }
        },
        timeout=3
    )

    return response.json()

Platform-Specific Examples

Voiceflow

In Voiceflow, use the HTTP Request block:

At start of flow:

HTTP Request block:
- Method: POST
- URL: https://api.stickycalls.com/v1/calls/start
- Headers: Authorization: Bearer {YOUR_API_KEY}
- Body:
  {
    "call_id": "{system.timestamp}",
    "identity_hints": {
      "ani": "{system.caller_id}"
    }
  }
- Save response to: {context_data}

Then use Set block:

If {context_data.identity.confidence} >= 0.7:
  Set {is_returning} = true
  Set {last_topic} = {context_data.variables.last_topic.value}

In your dialogue:

If {is_returning}:
  "Hi! Last time you were {last_topic}. Ready to continue?"
Else:
  "Hello! How can I help you today?"

Custom Python (LangChain)

from langchain.chat_models import ChatOpenAI
from langchain.schema import SystemMessage, HumanMessage
import requests


class ContextualAgent:
    def __init__(self, api_key):
        self.api_key = api_key
        self.llm = ChatOpenAI(temperature=0.7)

    def handle_call(self, call_id, phone_number):
        # 1. Get context
        context = self.get_context(call_id, phone_number)

        # 2. Build system prompt
        system_prompt = self.build_prompt(context)

        # 3. Have conversation
        transcript = self.run_conversation(system_prompt)

        # 4. Save context
        self.save_context(call_id, context['customer_ref'], transcript)

    def get_context(self, call_id, phone_number):
        response = requests.post(
            'https://api.stickycalls.com/v1/calls/start',
            headers={'Authorization': f'Bearer {self.api_key}'},
            json={
                'call_id': call_id,
                'identity_hints': {'ani': phone_number}
            },
            timeout=3
        )
        return response.json()

    def build_prompt(self, context):
        if context['identity']['confidence'] >= 0.7:
            history = context.get('variables', {}).get('last_topic', {}).get('value', '')
            return f"You are a helpful assistant. The caller previously: {history}. Continue naturally."
        return "You are a helpful assistant."

    def run_conversation(self, system_prompt):
        # Your conversation logic here
        messages = [SystemMessage(content=system_prompt)]
        transcript = ""
        # ... conversation loop appends turns to transcript
        return transcript

    def save_context(self, call_id, customer_ref, transcript):
        # Extract and save
        summary = self.summarize(transcript)
        requests.post(
            'https://api.stickycalls.com/v1/calls/end',
            headers={'Authorization': f'Bearer {self.api_key}'},
            json={
                'call_id': call_id,
                'customer_ref': customer_ref,
                'intent': 'general',
                'intent_status': 'resolved',
                'variables': {'last_topic': summary}
            },
            timeout=3
        )

OpenAI Function Calling

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const tools = [
  {
    type: "function",
    function: {
      name: "get_caller_context",
      description: "Retrieve context about a returning caller",
      parameters: {
        type: "object",
        properties: {
          phone_number: {
            type: "string",
            description: "Caller's phone number"
          }
        },
        required: ["phone_number"]
      }
    }
  }
];

async function handleCall(phoneNumber) {
  // Let AI decide when to get context
  const messages = [
    {
      role: "system",
      content: "You are a helpful assistant. When you receive a call, use get_caller_context to check if this is a returning caller."
    },
    {
      role: "user",
      content: `Incoming call from ${phoneNumber}`
    }
  ];

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: messages,
    tools: tools
  });

  // If AI calls the function
  if (response.choices[0].message.tool_calls) {
    const toolCall = response.choices[0].message.tool_calls[0];

    if (toolCall.function.name === "get_caller_context") {
      // getContextFromAPI wraps the /v1/calls/start request shown earlier
      const context = await getContextFromAPI(phoneNumber);

      // Add the assistant's tool call, then the tool result
      messages.push(response.choices[0].message);
      messages.push({
        role: "tool",
        tool_call_id: toolCall.id,
        content: JSON.stringify(context)
      });

      // Continue conversation with context
      // ...
    }
  }
}

Vapi

In Vapi, configure server URL:

{
  "serverUrl": "https://your-server.com/vapi-webhook",
  "serverUrlSecret": "your_secret"
}

Your webhook:

app.post('/vapi-webhook', async (req, res) => {
  const { type, call } = req.body;

  if (type === 'call-start') {
    // Get context
    const context = await fetch('https://api.stickycalls.com/v1/calls/start', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${STICKY_CALLS_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        call_id: call.id,
        identity_hints: {
          ani: call.customer.number
        }
      })
    }).then(r => r.json());

    // Return updated system prompt
    if (context.identity.confidence >= 0.7) {
      res.json({
        systemPrompt: `Previous conversation: ${context.variables.last_topic.value}. Continue naturally.`
      });
    } else {
      // New or unrecognized caller - keep the default prompt
      res.json({});
    }
  } else if (type === 'call-end') {
    // Save context
    await saveContext(call);
    res.json({ success: true });
  }
});

Advanced Patterns

Multi-Turn Memory Management

For long conversations, update context periodically:

def update_context_during_call(call_id, customer_ref, new_info):
    """
    Update context mid-call as you learn new information.
    get_current_context is your own in-memory store for the
    context loaded at call start.
    """

    current_context = get_current_context(call_id)
    current_context['variables']['in_progress_task'] = new_info

    # Context available for rest of call
    return current_context

Confidence-Based Behavior

Adjust agent behavior based on match confidence:

if confidence >= 0.9:
    # Very high - personalize heavily
    greeting = f"Hey {name}! Ready to finish what we started?"
elif confidence >= 0.7:
    # High - personalize but verify
    greeting = f"Hi! I think we spoke before about {last_topic}. Is that right?"
elif confidence >= 0.3:
    # Medium - mention possibility
    greeting = "Hi! Have we spoken before? You sound familiar."
else:
    # Low - treat as new
    greeting = "Hello! How can I help you today?"

Handling Conflicting Information

When context conflicts with what user says:

system_prompt = """
If the caller mentions something that conflicts with your context:
1. Trust what they're saying now
2. Politely acknowledge: "Oh, sounds like plans changed since we last talked!"
3. Update your understanding
4. Don't argue about what you "remember"
"""

Testing AI Agent Memory

Test Scenarios

1. First call (no context):

  • Call from new number
  • Should receive standard greeting
  • Should ask for basic information

2. Second call (with context):

  • Call from same number 5 minutes later
  • Should reference previous conversation
  • Should NOT ask for same information again

3. Confidence edge cases:

  • Confidence = 0.75 (right at threshold)
  • Confidence = 0.4 (low but not zero)
  • Very old context (30 days ago)
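The confidence edge cases above can be pinned down with a few assertions. This sketch assumes a hypothetical `decide_greeting_mode` helper that mirrors the 0.7 and 0.3 thresholds used throughout this guide:

```python
# Threshold test for the confidence edge cases above.
# decide_greeting_mode is a hypothetical helper mirroring
# the >= 0.7 / >= 0.3 cut-offs used in this guide.

def decide_greeting_mode(confidence):
    if confidence >= 0.7:
        return 'returning'  # reference history
    if confidence >= 0.3:
        return 'tentative'  # mention the possibility
    return 'new'            # standard greeting

# Edge cases from the scenarios above
assert decide_greeting_mode(0.75) == 'returning'  # just above threshold
assert decide_greeting_mode(0.7) == 'returning'   # exactly at threshold
assert decide_greeting_mode(0.4) == 'tentative'   # low but not zero
assert decide_greeting_mode(0.0) == 'new'         # first call
```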

Validation Checklist

  • Context loads within 1 second
  • Agent references history naturally
  • No repetitive questions
  • Fallback works if API fails
  • Context saves after every call
  • Works across different phone numbers (same user)

Metrics & Impact

Real Results from Customers

Voice AI Startup (Appointment Booking):

  • Task completion: 52% → 79% (+52%)
  • Average call length: 3.2 min → 1.9 min (-41%)
  • User satisfaction: 3.1/5 → 4.3/5 (+39%)

E-commerce Support Bot:

  • Return user rate: 8% → 24% (+200%)
  • Issue resolution: 61% → 88% (+44%)
  • Abandonment rate: 42% → 11% (-74%)

Healthcare Scheduling Agent:

  • Appointment completion: 67% → 94% (+40%)
  • Callbacks required: 34% → 7% (-79%)
  • Patient satisfaction: 3.8/5 → 4.7/5 (+24%)

ROI Calculation

Assumptions:

  • 1,000 calls/day
  • 40% are repeat callers
  • Without memory: 60% task completion, 4 min/call
  • With memory: 85% task completion, 2.5 min/call

Results:

  • Tasks completed: +25% (250 more/day)
  • Time saved: 400 calls × 1.5 min = 600 min/day = 10 hours/day
  • Monthly value: 10 hours × 22 days × $50/hour = $11,000/month
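The arithmetic above, reproduced so you can plug in your own numbers:

```python
# ROI calculation from the assumptions above.
calls_per_day = 1000
repeat_share = 0.40                      # 40% repeat callers
completion_without, completion_with = 0.60, 0.85
minutes_without, minutes_with = 4.0, 2.5

extra_tasks = round(calls_per_day * (completion_with - completion_without))
# Only repeat calls benefit from memory: 400 calls save 1.5 min each
minutes_saved = round(calls_per_day * repeat_share * (minutes_without - minutes_with))
hours_saved = minutes_saved / 60
monthly_value = hours_saved * 22 * 50    # 22 workdays at $50/hour

print(extra_tasks)       # 250 more tasks/day
print(hours_saved)       # 10.0 hours/day
print(int(monthly_value))  # 11000 per month
```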

Best Practices

1. Keep Context Fresh

Set appropriate TTLs:

variables = {
    'last_topic': {
        'value': summary,
        'ttl_seconds': 2592000  # 30 days
    },
    'temporary_note': {
        'value': note,
        'ttl_seconds': 86400  # 1 day
    }
}

2. Privacy Considerations

Never store:

  • Credit card numbers
  • Passwords
  • Social security numbers
  • Medical diagnosis details

Do store:

  • "User prefers email communication"
  • "Last discussed: account upgrade"
  • "Completed: identity verification"
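One way to enforce the "never store" list is to scrub summaries before they are saved. This is only an illustrative sketch using naive regexes; a real deployment would use a proper PII-detection service:

```python
import re

# Hypothetical pre-save scrubber: strip obvious card-number and
# SSN-like patterns from a summary before it goes into long-lived
# variables. Illustrative only - not a substitute for real PII detection.

PATTERNS = [
    re.compile(r'\b(?:\d[ -]?){13,16}\b'),  # card-number-like digit runs
    re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),   # SSN-like pattern
]

def scrub(summary):
    for pattern in PATTERNS:
        summary = pattern.sub('[REDACTED]', summary)
    return summary

print(scrub('Card ending 4111 1111 1111 1111, prefers email'))
# → 'Card ending [REDACTED], prefers email'
```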

3. Graceful Degradation

Always provide fallback:

import logging

try:
    context = get_caller_context()
except Exception as e:
    logging.error(f'Context retrieval failed: {e}')
    context = {'is_returning': False}  # Treat as new caller

4. Optimize Performance

Use async/parallel requests:

import asyncio

async def start_call(call_id, phone):
    # Get context in parallel with other initialization
    context_task = asyncio.create_task(get_context_async(call_id, phone))
    init_task = asyncio.create_task(initialize_agent())

    context, agent = await asyncio.gather(context_task, init_task)

    # Start conversation immediately with both ready
    return handle_conversation(agent, context)

Next Steps

Now that you've added memory to your AI phone agent:


Questions? Contact support

Ready to get started? Sign up for free →