Migrating from OpenAI Realtime API

rtAV is a drop-in replacement for OpenAI's Realtime API, with identical protocol support plus real-time video avatar output. This guide shows you how to migrate your existing OpenAI Realtime API code to rtAV with minimal changes.

Why Migrate to rtAV?

  • Full Compatibility: Identical protocol to OpenAI Realtime API - your existing code works as-is
  • Video Output: Real-time video avatars alongside audio responses (OpenAI doesn't support video)
  • Cost-Effective: Transparent pricing at $6/hour with no per-token charges
  • Model Flexibility: Use any LLM model, not just OpenAI models
  • Self-Hosted Option: Deploy your own workers for complete control

Quick Migration Checklist

  1. Update API endpoint URL (change api.openai.com to api.rtav.io)
  2. Replace OpenAI API key with rtAV API key
  3. Update model name (e.g., gpt-realtime → gpt-5.2)
  4. Optional: Add video output handling for avatar frames
  5. Optional: Configure video avatar (face, voice, driving behavior)

That's it! With these minimal changes, your existing OpenAI Realtime API code should work with rtAV.

WebSocket Migration

Step 1: Update Connection URL

Change the WebSocket URL from OpenAI to rtAV:

// OpenAI (Before)
const ws = new WebSocket('wss://api.openai.com/v1/realtime?model=gpt-realtime', {
  headers: {
    'Authorization': `Bearer ${OPENAI_API_KEY}`
  }
});

// rtAV (After) - Only URL and API key changed!
const ws = new WebSocket('wss://api.rtav.io/v1/realtime?model=gpt-5.2', {
  headers: {
    'Authorization': `Bearer ${RTAV_API_KEY}`
  }
});

Step 2: Browser Authentication

Browsers cannot set custom headers on WebSocket connections. For browser-based clients, send authentication as the first message:

// Browser JavaScript - rtAV supports auth message
const ws = new WebSocket('wss://api.rtav.io/v1/realtime?model=gpt-5.2');

ws.onopen = () => {
  // Send auth as first message (browser WebSocket can't set Authorization header)
  ws.send(JSON.stringify({
    type: 'auth',
    api_key: 'rtav_ak_your_api_key_here'
  }));
};

Step 3: Session Configuration

Session configuration is nearly identical. rtAV adds optional video-specific options:

// OpenAI (Before)
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    instructions: 'You are a helpful assistant.',
    audio: {'output': {'voice': 'alloy'}},
    model: 'gpt-realtime'
  }
}));

// rtAV (After) - Same structure, with optional video options
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    instructions: 'You are a helpful assistant.',
    audio: {'output': {'voice': 'alloy'}},
    model: 'gpt-5.2',
    // Optional: Add video avatar configuration
    face: 'your-face-id',        // RTAV face ID (optional)
    driving: 'IdleListeningEncouraging'  // Avatar behavior (optional)
  }
}));

Step 4: Handle Video Output (Optional)

rtAV adds video frame events that OpenAI doesn't have. Add this to your message handler:

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  
  // Existing OpenAI events (work the same)
  if (data.type === 'response.output_audio.delta') {
    // Handle audio chunk (same as OpenAI)
    const audioChunk = data.delta;
    // Play audio...
  }
  
  if (data.type === 'response.output_text.delta') {
    // Handle text chunk (same as OpenAI)
    const textChunk = data.delta;
    // Display text...
  }
  
  // NEW: Handle video frames (rtAV only)
  if (data.type === 'response.output_image.delta') {
    const frameData = data.delta; // base64-encoded JPEG
    // Update a single persistent <img> element (videoFrame) in place;
    // appending a new element per frame would grow the DOM without bound
    videoFrame.src = `data:image/jpeg;base64,${frameData}`;
  }
  
  // Video complete
  if (data.type === 'response.output_image.done') {
    console.log(`Received ${data.total_frames} video frames`);
  }
};
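
The "Play audio..." step above is left open. Below is a minimal playback sketch, assuming the default pcm16 output format at 24 kHz (OpenAI's Realtime default; verify against your session's audio configuration):

// Minimal PCM16 playback sketch -- the pcm16 / 24 kHz format is an assumption
const audioCtx = new AudioContext({ sampleRate: 24000 });
let playhead = 0;

function playAudioChunk(base64Delta) {
  // Decode base64 into raw little-endian 16-bit PCM samples
  const bytes = Uint8Array.from(atob(base64Delta), c => c.charCodeAt(0));
  const samples = new Int16Array(bytes.buffer, 0, Math.floor(bytes.length / 2));

  // Convert to float32 in [-1, 1] for the Web Audio API
  const floats = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) floats[i] = samples[i] / 32768;

  const buffer = audioCtx.createBuffer(1, floats.length, 24000);
  buffer.copyToChannel(floats, 0);

  // Schedule chunks back to back so playback is gapless
  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(audioCtx.destination);
  playhead = Math.max(playhead, audioCtx.currentTime);
  source.start(playhead);
  playhead += buffer.duration;
}

Call playAudioChunk(data.delta) from the response.output_audio.delta branch above.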

WebRTC Migration

Step 1: Update API Endpoint

Change the WebRTC calls endpoint from OpenAI to rtAV:

// OpenAI (Before)
const response = await fetch('https://api.openai.com/v1/realtime/calls', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${OPENAI_API_KEY}`
  },
  body: formData
});

// rtAV (After) - Only URL and API key changed!
const response = await fetch('https://api.rtav.io/v1/realtime/calls', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${RTAV_API_KEY}`
  },
  body: formData
});

Step 2: Session Configuration

WebRTC session configuration is identical, with optional video options:

// OpenAI (Before)
const sessionConfig = {
  type: 'realtime',
  model: 'gpt-realtime',
  instructions: 'You are a helpful assistant.',
  voice: 'alloy',
  modalities: ['audio', 'text']
};

formData.append('session', JSON.stringify(sessionConfig));

// rtAV (After) - Same structure, with optional video options
const sessionConfig = {
  type: 'realtime',
  model: 'gpt-5.2',
  instructions: 'You are a helpful assistant.',
  voice: 'default',  // or your rtAV voice ID
  modalities: ['audio', 'text', 'image'], // Add 'image' for video
  face: 'default',    // RTAV face ID (optional)
  driving: 'default' // RTAV driving motion (optional)
};

formData.append('session', JSON.stringify(sessionConfig));
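
For reference, here is a sketch of the surrounding WebRTC setup that produces this request. The 'sdp' form field name, the 'oai-events' data channel label (OpenAI's convention), and the plain-SDP response body are assumptions; check the rtAV API reference for the exact contract:

// Sketch only: form field names and response handling below are
// assumptions, not confirmed rtAV API details
async function connectWebRTC(sessionConfig) {
  const pc = new RTCPeerConnection();

  // Send microphone audio; the model's audio comes back over RTP
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  mic.getTracks().forEach(track => pc.addTrack(track, mic));

  // Text deltas and video frames arrive on the data channel
  const dataChannel = pc.createDataChannel('oai-events'); // assumed label

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  const formData = new FormData();
  formData.append('sdp', offer.sdp); // assumed field name
  formData.append('session', JSON.stringify(sessionConfig));

  const response = await fetch('https://api.rtav.io/v1/realtime/calls', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${RTAV_API_KEY}` },
    body: formData
  });

  // Assumes the server returns the SDP answer as the response body
  await pc.setRemoteDescription({ type: 'answer', sdp: await response.text() });

  return { pc, dataChannel };
}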

Step 3: Handle Video Frames (Optional)

Video frames are sent via the WebRTC data channel:

dataChannel.onmessage = (event) => {
  const data = JSON.parse(event.data);
  
  // Existing OpenAI events (work the same)
  if (data.type === 'response.output_audio.delta') {
    // Audio handled via RTP stream (same as OpenAI)
  }
  
  if (data.type === 'response.output_text.delta') {
    // Handle text chunk (same as OpenAI)
    const textChunk = data.delta;
    // Display text...
  }
  
  // NEW: Handle video frames (rtAV only)
  if (data.type === 'response.output_image.delta') {
    const frameData = data.delta; // base64-encoded JPEG
    // Update a single persistent <img> element (videoFrame) in place;
    // appending a new element per frame would grow the DOM without bound
    videoFrame.src = `data:image/jpeg;base64,${frameData}`;
  }
  
  // Video complete
  if (data.type === 'response.output_image.done') {
    console.log(`Received ${data.total_frames} video frames`);
  }
};
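
Because the model's audio arrives as an RTP media stream rather than as data channel messages, attach the remote track to an audio element using the standard WebRTC pattern:

// Standard WebRTC: play the remote audio track as it arrives over RTP
pc.ontrack = (event) => {
  const audioEl = document.createElement('audio');
  audioEl.autoplay = true;
  audioEl.srcObject = event.streams[0];
  document.body.appendChild(audioEl);
};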

Event Compatibility

rtAV supports all OpenAI Realtime API events, plus additional video events:

Event Type                   | OpenAI | rtAV | Notes
session.update               |   ✓    |  ✓   | rtAV adds video options
session.created              |   ✓    |  ✓   | Identical
session.updated              |   ✓    |  ✓   | Identical
conversation.item.create     |   ✓    |  ✓   | Identical
response.create              |   ✓    |  ✓   | Identical
response.output_audio.delta  |   ✓    |  ✓   | Identical
response.output_text.delta   |   ✓    |  ✓   | Identical
response.done                |   ✓    |  ✓   | Identical
response.output_image.delta  |   ✗    |  ✓   | rtAV only - Video frames
response.output_image.done   |   ✗    |  ✓   | rtAV only - Video complete

Key Differences

Connection URLs

OpenAI:

WebSocket: wss://api.openai.com/v1/realtime?model=gpt-realtime
WebRTC:    https://api.openai.com/v1/realtime/calls

rtAV:

WebSocket: wss://api.rtav.io/v1/realtime?model=gpt-5.2
WebRTC:    https://api.rtav.io/v1/realtime/calls

Model Names

rtAV uses different model identifiers. Common mappings:

  • gpt-realtime → gpt-5.2
  • gpt-4o-realtime-preview → gpt-5.2

Video Output (rtAV Only)

rtAV adds video avatar output that OpenAI doesn't support:

  • response.output_image.delta - Receive video frame chunks (base64 JPEG)
  • response.output_image.done - Video generation complete
  • Session config: face, driving options

Complete Example: WebSocket

// Complete WebSocket migration example
const ws = new WebSocket('wss://api.rtav.io/v1/realtime?model=gpt-5.2');

ws.onopen = () => {
  // Send auth (browser) or use Authorization header (Node.js/Python)
  ws.send(JSON.stringify({
    type: 'auth',
    api_key: 'rtav_ak_your_api_key_here'
  }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  
  if (data.type === 'session.created') {
    // Configure session
    ws.send(JSON.stringify({
      type: 'session.update',
      session: {
        type: 'realtime',
        instructions: 'You are a helpful assistant.',
        audio: {'output': {'voice': 'alloy'}},
        model: 'gpt-5.2',
        // Optional: Add video avatar
        face: 'your-face-id',
        driving: 'IdleListeningEncouraging'
      }
    }));
  }
  
  if (data.type === 'session.updated') {
    // Send a message
    ws.send(JSON.stringify({
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [{ type: 'input_text', text: 'Hello!' }]
      }
    }));
    
    // Trigger response
    ws.send(JSON.stringify({
      type: 'response.create'
    }));
  }
  
  // Handle audio (same as OpenAI)
  if (data.type === 'response.output_audio.delta') {
    const audioChunk = data.delta;
    // Play audio...
  }
  
  // Handle text (same as OpenAI)
  if (data.type === 'response.output_text.delta') {
    const textChunk = data.delta;
    // Display text...
  }
  
  // NEW: Handle video frames (rtAV only)
  if (data.type === 'response.output_image.delta') {
    const frameData = data.delta; // base64-encoded JPEG
    // Update a single persistent <img> element (videoFrame) in place
    videoFrame.src = `data:image/jpeg;base64,${frameData}`;
  }
  
  if (data.type === 'response.done') {
    console.log('Response complete');
  }
};
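
It is also worth handling failures. In the OpenAI Realtime protocol, which rtAV mirrors, protocol-level problems arrive as 'error' events in the same message stream, while transport failures surface on the socket itself:

// Inside ws.onmessage, alongside the branches above:
//   if (data.type === 'error') console.error('Realtime API error:', data.error);

// Transport-level failures
ws.onerror = (event) => console.error('WebSocket error:', event);
ws.onclose = (event) => console.log(`Connection closed: ${event.code} ${event.reason}`);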

Troubleshooting

Common Issues

Browser WebSocket Authentication

Browsers cannot set custom headers on WebSocket connections. Use the auth message method:

ws.send(JSON.stringify({ type: 'auth', api_key: 'your_key' }))

Session Configuration Format

The OpenAI GA API uses the nested audio.output.voice structure; rtAV supports both the nested and flat (legacy beta) formats for compatibility.
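
For example, both of the following session.update payloads should be accepted:

// Nested GA-style format (used in the examples above)
ws.send(JSON.stringify({
  type: 'session.update',
  session: { type: 'realtime', audio: { output: { voice: 'alloy' } } }
}));

// Flat legacy format (older beta-style clients)
ws.send(JSON.stringify({
  type: 'session.update',
  session: { type: 'realtime', voice: 'alloy' }
}));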

Video Not Displaying

Ensure modalities includes 'image' and that valid face and voice IDs are provided.
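
A minimal session config that satisfies both requirements (substitute your own IDs):

const sessionConfig = {
  type: 'realtime',
  model: 'gpt-5.2',
  modalities: ['audio', 'text', 'image'], // 'image' must be present for video
  face: 'your-face-id',                   // a valid rtAV face ID
  voice: 'your-voice-id'                  // a valid rtAV voice ID
};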

Next Steps