Migrating from OpenAI Realtime API
rtAV is a drop-in replacement for OpenAI's Realtime API, with identical protocol support plus real-time video avatar output. This guide shows you how to migrate your existing OpenAI Realtime API code to rtAV with minimal changes.
Why Migrate to rtAV?
- Full Compatibility: Identical protocol to OpenAI Realtime API - your existing code works as-is
- Video Output: Real-time video avatars alongside audio responses (OpenAI doesn't support video)
- Cost-Effective: Transparent pricing at $6/hour with no per-token charges
- Model Flexibility: Use any LLM model, not just OpenAI models
- Self-Hosted Option: Deploy your own workers for complete control
Quick Migration Checklist
- Update API endpoint URL (change `api.openai.com` to `api.rtav.io`)
- Replace OpenAI API key with rtAV API key
- Update model name (e.g., `gpt-realtime` → `gpt-5.2`)
- Optional: Add video output handling for avatar frames
- Optional: Configure video avatar (face, voice, driving behavior)
That's it! Your existing OpenAI Realtime API code should work with rtAV with these minimal changes.
WebSocket Migration
Step 1: Update Connection URL
Change the WebSocket URL from OpenAI to rtAV:
```javascript
// OpenAI (Before) - Node.js, using the 'ws' package (browsers cannot set headers; see Step 2)
const ws = new WebSocket('wss://api.openai.com/v1/realtime?model=gpt-realtime', {
  headers: {
    'Authorization': `Bearer ${OPENAI_API_KEY}`
  }
});

// rtAV (After) - Only URL and API key changed!
const ws = new WebSocket('wss://api.rtav.io/v1/realtime?model=gpt-5.2', {
  headers: {
    'Authorization': `Bearer ${RTAV_API_KEY}`
  }
});
```

Step 2: Browser Authentication
Browsers cannot set custom headers on WebSocket connections. For browser-based clients, send authentication as the first message:
```javascript
// Browser JavaScript - rtAV supports an auth message
const ws = new WebSocket('wss://api.rtav.io/v1/realtime?model=gpt-5.2');

ws.onopen = () => {
  // Send auth as the first message (browser WebSocket can't set the Authorization header)
  ws.send(JSON.stringify({
    type: 'auth',
    api_key: 'rtav_ak_your_api_key_here'
  }));
};
```

Step 3: Session Configuration
Session configuration is nearly identical. rtAV adds optional video-specific options:
```javascript
// OpenAI (Before)
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    instructions: 'You are a helpful assistant.',
    audio: { output: { voice: 'alloy' } },
    model: 'gpt-realtime'
  }
}));

// rtAV (After) - Same structure, with optional video options
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    instructions: 'You are a helpful assistant.',
    audio: { output: { voice: 'alloy' } },
    model: 'gpt-5.2',
    // Optional: Add video avatar configuration
    face: 'your-face-id',               // rtAV face ID (optional)
    driving: 'IdleListeningEncouraging' // Avatar behavior (optional)
  }
}));
```

Step 4: Handle Video Output (Optional)
rtAV adds video frame events that OpenAI doesn't have. Add this to your message handler:
```javascript
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  // Existing OpenAI events (work the same)
  if (data.type === 'response.output_audio.delta') {
    // Handle audio chunk (same as OpenAI)
    const audioChunk = data.delta;
    // Play audio...
  }

  if (data.type === 'response.output_text.delta') {
    // Handle text chunk (same as OpenAI)
    const textChunk = data.delta;
    // Display text...
  }

  // NEW: Handle video frames (rtAV only)
  if (data.type === 'response.output_image.delta') {
    const frameData = data.delta; // base64-encoded JPEG
    // Display the video frame
    const img = document.createElement('img');
    img.src = `data:image/jpeg;base64,${frameData}`;
    videoContainer.appendChild(img);
  }

  // Video complete
  if (data.type === 'response.output_image.done') {
    console.log(`Received ${data.total_frames} video frames`);
  }
};
```

WebRTC Migration
Step 1: Update API Endpoint
Change the WebRTC calls endpoint from OpenAI to rtAV:
```javascript
// OpenAI (Before)
const response = await fetch('https://api.openai.com/v1/realtime/calls', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${OPENAI_API_KEY}`
  },
  body: formData
});

// rtAV (After) - Only URL and API key changed!
const response = await fetch('https://api.rtav.io/v1/realtime/calls', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${RTAV_API_KEY}`
  },
  body: formData
});
```

Step 2: Session Configuration
WebRTC session configuration is identical, with optional video options:
```javascript
// OpenAI (Before)
const sessionConfig = {
  type: 'realtime',
  model: 'gpt-realtime',
  instructions: 'You are a helpful assistant.',
  voice: 'alloy',
  modalities: ['audio', 'text']
};
formData.append('session', JSON.stringify(sessionConfig));

// rtAV (After) - Same structure, with optional video options
const sessionConfig = {
  type: 'realtime',
  model: 'gpt-5.2',
  instructions: 'You are a helpful assistant.',
  voice: 'default',                       // or your rtAV voice ID
  modalities: ['audio', 'text', 'image'], // Add 'image' for video
  face: 'default',                        // rtAV face ID (optional)
  driving: 'default'                      // rtAV driving motion (optional)
};
formData.append('session', JSON.stringify(sessionConfig));
```

Step 3: Handle Video Frames (Optional)
Video frames are sent via the WebRTC data channel:
```javascript
dataChannel.onmessage = (event) => {
  const data = JSON.parse(event.data);

  // Existing OpenAI events (work the same)
  if (data.type === 'response.output_audio.delta') {
    // Audio is handled via the RTP stream (same as OpenAI)
  }

  if (data.type === 'response.output_text.delta') {
    // Handle text chunk (same as OpenAI)
    const textChunk = data.delta;
    // Display text...
  }

  // NEW: Handle video frames (rtAV only)
  if (data.type === 'response.output_image.delta') {
    const frameData = data.delta; // base64-encoded JPEG
    // Display the video frame
    const img = document.createElement('img');
    img.src = `data:image/jpeg;base64,${frameData}`;
    videoContainer.appendChild(img);
  }

  // Video complete
  if (data.type === 'response.output_image.done') {
    console.log(`Received ${data.total_frames} video frames`);
  }
};
```

Event Compatibility
rtAV supports all OpenAI Realtime API events, plus additional video events:
| Event Type | OpenAI | rtAV | Notes |
|---|---|---|---|
| `session.update` | ✅ | ✅ | rtAV adds video options |
| `session.created` | ✅ | ✅ | Identical |
| `session.updated` | ✅ | ✅ | Identical |
| `conversation.item.create` | ✅ | ✅ | Identical |
| `response.create` | ✅ | ✅ | Identical |
| `response.output_audio.delta` | ✅ | ✅ | Identical |
| `response.output_text.delta` | ✅ | ✅ | Identical |
| `response.done` | ✅ | ✅ | Identical |
| `response.output_image.delta` | ❌ | ✅ | rtAV only - video frames |
| `response.output_image.done` | ❌ | ✅ | rtAV only - video complete |
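In client code, the compatibility table can be collapsed into a small routing helper so rtAV-only handling stays isolated from the OpenAI-compatible paths. This is a minimal sketch, not part of either API; the function and set names are hypothetical:

```javascript
// Events that exist only in rtAV's protocol extension.
const RTAV_ONLY_EVENTS = new Set([
  'response.output_image.delta',
  'response.output_image.done',
]);

// Events shared with the OpenAI Realtime API (subset shown).
const SHARED_EVENTS = new Set([
  'session.created',
  'session.updated',
  'response.output_audio.delta',
  'response.output_text.delta',
  'response.done',
]);

// Classify an incoming event by compatibility tier.
function classifyEvent(data) {
  if (RTAV_ONLY_EVENTS.has(data.type)) return 'rtav-only';
  if (SHARED_EVENTS.has(data.type)) return 'openai-compatible';
  return 'unknown';
}
```

Keeping the rtAV-only branch behind a check like this makes it easy to share one handler between an OpenAI deployment and an rtAV deployment.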
Key Differences
Connection URLs
OpenAI:
- WebSocket: `wss://api.openai.com/v1/realtime?model=gpt-realtime`
- WebRTC: `https://api.openai.com/v1/realtime/calls`

rtAV:
- WebSocket: `wss://api.rtav.io/v1/realtime?model=gpt-5.2`
- WebRTC: `https://api.rtav.io/v1/realtime/calls`

Model Names

rtAV uses different model identifiers. Common mappings:
- `gpt-realtime` → `gpt-5.2`
- `gpt-4o-realtime-preview` → `gpt-5.2`
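If your codebase passes model names around, a small lookup can centralize the rename during migration. This is a hypothetical helper, not part of either API; unknown names pass through unchanged:

```javascript
// Map OpenAI Realtime model names to their rtAV equivalents.
const MODEL_MAP = {
  'gpt-realtime': 'gpt-5.2',
  'gpt-4o-realtime-preview': 'gpt-5.2',
};

// Translate a model name, passing through names that need no mapping.
function toRtavModel(openaiModel) {
  return MODEL_MAP[openaiModel] ?? openaiModel;
}
```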
Video Output (rtAV Only)
rtAV adds video avatar output that OpenAI doesn't support:
- `response.output_image.delta` - Receive video frame chunks (base64 JPEG)
- `response.output_image.done` - Video generation complete
- Session config: `face` and `driving` options
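A client consuming these events typically buffers frame deltas until the `done` event arrives. The sketch below (class name hypothetical) collects frames and uses `total_frames` to detect drops:

```javascript
// Accumulate base64 JPEG frame deltas from response.output_image.delta
// events and verify the count reported by response.output_image.done.
class FrameBuffer {
  constructor() {
    this.frames = [];
  }

  // Returns true/false on the done event (count matched or not),
  // and null for any other event.
  handle(event) {
    if (event.type === 'response.output_image.delta') {
      this.frames.push(event.delta); // base64-encoded JPEG
    }
    if (event.type === 'response.output_image.done') {
      return this.frames.length === event.total_frames;
    }
    return null;
  }
}
```

Decoding each buffered frame into an `<img>` or `<canvas>` element then works the same way as in the handler examples above.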
Complete Example: WebSocket
```javascript
// Complete WebSocket migration example
const ws = new WebSocket('wss://api.rtav.io/v1/realtime?model=gpt-5.2');

ws.onopen = () => {
  // Send auth (browser) or use the Authorization header (Node.js/Python)
  ws.send(JSON.stringify({
    type: 'auth',
    api_key: 'rtav_ak_your_api_key_here'
  }));
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === 'session.created') {
    // Configure the session
    ws.send(JSON.stringify({
      type: 'session.update',
      session: {
        type: 'realtime',
        instructions: 'You are a helpful assistant.',
        audio: { output: { voice: 'alloy' } },
        model: 'gpt-5.2',
        // Optional: Add a video avatar
        face: 'your-face-id',
        driving: 'IdleListeningEncouraging'
      }
    }));
  }

  if (data.type === 'session.updated') {
    // Send a message
    ws.send(JSON.stringify({
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [{ type: 'input_text', text: 'Hello!' }]
      }
    }));

    // Trigger a response
    ws.send(JSON.stringify({
      type: 'response.create'
    }));
  }

  // Handle audio (same as OpenAI)
  if (data.type === 'response.output_audio.delta') {
    const audioChunk = data.delta;
    // Play audio...
  }

  // Handle text (same as OpenAI)
  if (data.type === 'response.output_text.delta') {
    const textChunk = data.delta;
    // Display text...
  }

  // NEW: Handle video frames (rtAV only)
  if (data.type === 'response.output_image.delta') {
    const frameData = data.delta;
    const img = document.createElement('img');
    img.src = `data:image/jpeg;base64,${frameData}`;
    videoContainer.appendChild(img);
  }

  if (data.type === 'response.done') {
    console.log('Response complete');
  }
};
```

Troubleshooting
Common Issues
Browser WebSocket Authentication
Browsers cannot set custom headers on WebSocket connections. Use the auth message method:
```javascript
ws.send(JSON.stringify({ type: 'auth', api_key: 'your_key' }));
```

Session Configuration Format

The OpenAI GA API uses the nested `audio.output.voice` structure. rtAV accepts both the nested and flat formats for compatibility.
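If you want to send one canonical shape regardless of where a config originated, a small normalizer can convert the flat form to the nested GA form. This is a hypothetical helper and assumes only the `voice` field differs between the two formats:

```javascript
// Convert a flat `voice` field into the GA-style nested
// audio.output.voice structure; leave nested configs untouched.
function normalizeSession(session) {
  if (session.voice && !session.audio) {
    const { voice, ...rest } = session;
    return { ...rest, audio: { output: { voice } } };
  }
  return session;
}
```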
Video Not Displaying
Ensure `modalities` includes `'image'` and that valid `face` and `voice` IDs are provided.
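A session config that satisfies both requirements might look like this, using the `default` placeholder IDs from the WebRTC examples above:

```javascript
// Minimal video-enabled session config: 'image' must appear in
// modalities, and face/voice must be valid rtAV IDs.
const sessionConfig = {
  type: 'realtime',
  model: 'gpt-5.2',
  modalities: ['audio', 'text', 'image'], // 'image' is required for video
  voice: 'default',                       // valid rtAV voice ID
  face: 'default'                         // valid rtAV face ID
};
```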
Next Steps
- Get your rtAV API key from the dashboard
- Check out the WebSocket guide for detailed examples
- See the WebRTC guide for low-latency streaming
- Explore available faces and voices