# Realtime WebSocket
WebSocket provides a reliable transport for realtime communication with rtAV. It's ideal for server-to-server integrations and web applications that don't require the lowest latency.
## Overview
The WebSocket endpoint implements the OpenAI Realtime API protocol, making it compatible with existing OpenAI Realtime SDKs. All communication happens over a single WebSocket connection, with audio and video data encoded as base64 strings.
- Endpoint: `wss://api.rtav.io/v1/realtime`
- Protocol: OpenAI Realtime API compatible
- Audio Format: Base64-encoded PCM audio
- Video Format: Base64-encoded JPEG frames
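Every message in both directions is a JSON object with a `type` field, so a small dispatcher is a convenient starting point. A minimal sketch, assuming `ws` is an open WebSocket connection (created as shown in the next section); the event names match those used later in this guide:

```javascript
// Minimal event dispatcher: every message on the socket is JSON with a "type"
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  switch (msg.type) {
    case 'response.output_audio.delta':
      // base64 PCM chunk in msg.delta
      break;
    case 'response.output_image.delta':
      // base64 JPEG frame in msg.delta
      break;
    default:
      console.debug('Unhandled event:', msg.type);
  }
};
```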
## Connection Methods
### Method 1: Auto-Create Session (OpenAI-Compatible)
Connect with a `model` parameter to automatically create a session:

```javascript
// Browser JavaScript
const ws = new WebSocket('wss://api.rtav.io/v1/realtime?model=gpt-5.2');

ws.onopen = () => {
  // Send auth message first
  ws.send(JSON.stringify({
    type: 'auth',
    api_key: 'rtav_ak_your_api_key_here'
  }));
};
```

### Method 2: Connect to Existing Session
Create a session via the REST API first, then connect with its `session_id`:

```javascript
// Browser JavaScript
// 1. Create session
const response = await fetch('https://api.rtav.io/v1/sessions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer rtav_ak_your_api_key_here',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-5.2',
    voice: 'default',
    instructions: 'You are a helpful assistant.'
  })
});
const { id: sessionId } = await response.json();

// 2. Connect via WebSocket
const ws = new WebSocket(`wss://api.rtav.io/v1/realtime?session_id=${sessionId}`);

ws.onopen = () => {
  // Send auth message
  ws.send(JSON.stringify({
    type: 'auth',
    api_key: 'rtav_ak_your_api_key_here'
  }));
};
```

## Authentication
The browser WebSocket API doesn't support custom headers, so authentication must be sent as the first message:

```javascript
// Browser JavaScript
ws.onopen = () => {
  // Send auth as first message
  ws.send(JSON.stringify({
    type: 'auth',
    api_key: 'rtav_ak_your_api_key_here'
  }));
};
```

For server-side applications, you can also use the `Authorization: Bearer` header.
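For example, here is a minimal server-side sketch using Node.js with the third-party `ws` package (the package choice is an assumption; any WebSocket client that can set request headers works the same way):

```javascript
// Node.js, using the "ws" package (npm install ws)
import WebSocket from 'ws';

const ws = new WebSocket('wss://api.rtav.io/v1/realtime?model=gpt-5.2', {
  // Header auth at connection time (presumably replaces the first-message auth)
  headers: { Authorization: 'Bearer rtav_ak_your_api_key_here' }
});

ws.on('open', () => console.log('Connected and authenticated'));
```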
## Sending Messages
### Text Input (Closed Loop - Triggers LLM)
Send text messages that will be processed by the LLM:

```javascript
// Browser JavaScript
ws.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'user',
    content: [{ type: 'input_text', text: 'Hello, how are you?' }]
  }
}));
```

### Text Input (Open Loop - Direct to Worker)
Send text directly to the worker without LLM processing:

```javascript
// Browser JavaScript
ws.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'assistant', // Use 'assistant' role for open loop
    content: [{ type: 'input_text', text: 'Hello!' }]
  }
}));
```

### Audio Input
Send audio data as base64-encoded PCM:

```javascript
// Browser JavaScript
// Append audio chunks
ws.send(JSON.stringify({
  type: 'input_audio_buffer.append',
  audio: base64EncodedPCMAudio
}));

// Commit audio buffer to trigger STT processing
ws.send(JSON.stringify({
  type: 'input_audio_buffer.commit'
}));
```
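How you produce `base64EncodedPCMAudio` depends on your audio source. As an illustration, here is a hypothetical helper that converts Float32 samples (e.g. from a Web Audio `AudioWorklet`) into base64-encoded 16-bit little-endian PCM; the 16-bit LE encoding is an assumption, so check your session's configured audio format:

```javascript
// Hypothetical helper: encode Float32 samples as base64 16-bit LE PCM.
// Assumes the API expects 16-bit little-endian mono PCM.
function floatTo16BitPcmBase64(float32Samples) {
  const buffer = new ArrayBuffer(float32Samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  // Base64-encode the raw bytes
  let binary = '';
  const bytes = new Uint8Array(buffer);
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Usage:
// ws.send(JSON.stringify({
//   type: 'input_audio_buffer.append',
//   audio: floatTo16BitPcmBase64(samples)
// }));
```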
## Receiving Messages

### Audio Output
Receive audio as base64-encoded PCM deltas:

```javascript
// Browser JavaScript
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'response.output_audio.delta') {
    const audioBase64 = data.delta;
    // Decode base64 and play audio
    const audioData = atob(audioBase64);
    const audioBuffer = new ArrayBuffer(audioData.length);
    const view = new Uint8Array(audioBuffer);
    for (let i = 0; i < audioData.length; i++) {
      view[i] = audioData.charCodeAt(i);
    }
    // Play using Web Audio API (see the sketch below)
  }
};
```
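To finish the playback step, one approach is to convert the 16-bit PCM bytes to Float32 and schedule each chunk back-to-back with the Web Audio API. The 24 kHz mono 16-bit format below is an assumption; substitute your session's actual parameters:

```javascript
// Playback sketch, assuming 16-bit little-endian mono PCM at 24 kHz
const audioCtx = new AudioContext({ sampleRate: 24000 });
let playbackTime = audioCtx.currentTime;

function playPcmChunk(pcmBytes /* Uint8Array from the decode loop above */) {
  const samples = new Int16Array(pcmBytes.buffer);
  const floats = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    floats[i] = samples[i] / 0x8000; // scale to [-1, 1]
  }
  const buffer = audioCtx.createBuffer(1, floats.length, 24000);
  buffer.copyToChannel(floats, 0);
  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(audioCtx.destination);
  // Schedule chunks back-to-back so consecutive deltas play gaplessly
  playbackTime = Math.max(playbackTime, audioCtx.currentTime);
  source.start(playbackTime);
  playbackTime += buffer.duration;
}
```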
### Video Output

Receive video frames as base64-encoded JPEG deltas:

```javascript
// Browser JavaScript
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'response.output_image.delta') {
    const imageBase64 = data.delta;
    // Decode and display image
    const imageUrl = 'data:image/jpeg;base64,' + imageBase64;
    document.getElementById('avatar').src = imageUrl;
  }
};
```
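Data URLs are simple but allocate a long string for every frame. If that churn becomes a problem at higher frame rates, one alternative sketch is to decode each frame into a Blob and swap object URLs:

```javascript
// Alternative: display frames via object URLs instead of data URLs
function showFrame(imgElement, base64Jpeg) {
  const binary = atob(base64Jpeg);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const url = URL.createObjectURL(new Blob([bytes], { type: 'image/jpeg' }));
  const previous = imgElement.src;
  imgElement.src = url;
  // Release the previous frame's object URL to avoid leaking memory
  if (previous.startsWith('blob:')) URL.revokeObjectURL(previous);
}
```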
### Transcripts

Receive input and output transcripts:

```javascript
// Browser JavaScript
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  // Input audio transcription
  if (data.type === 'conversation.item.input_audio_transcription.completed') {
    const transcript = data.transcript;
    console.log('User said:', transcript);
  }

  // Output audio transcription
  if (data.type === 'response.output_audio_transcript.delta') {
    const transcript = data.delta;
    console.log('Assistant said:', transcript);
  }
};
```
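Note that output transcripts arrive as incremental deltas, so the snippet above logs fragments. To display a full utterance, accumulate the deltas, for example:

```javascript
// Sketch: accumulate assistant transcript deltas into a running string
let assistantTranscript = '';

ws.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'response.output_audio_transcript.delta') {
    assistantTranscript += data.delta;
    // Render assistantTranscript as it grows
  }
});
```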
## Session Configuration

Update session settings dynamically during an active session:

```javascript
// Browser JavaScript
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    model: 'gpt-5.2',
    instructions: 'You are a helpful coding assistant.',
    voice: 'default',
    face: 'face1',
    driving: 'idle' // Single motion
  }
}));
```

## Driving Motion Sequences
You can send a sequence of driving motions as an array. The avatar will transition smoothly between motions, playing intermediate motions once, and looping the final motion:

```javascript
// Browser JavaScript
// Single motion (permanent change)
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    driving: 'Wink' // Transitions to Wink and loops
  }
}));

// Sequence with intermediate motion
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    driving: ['AgreeYesTotaly', 'default']
    // Plays AgreeYesTotaly once, then transitions to default and loops
  }
}));

// Repeating intermediate motion
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    driving: ['Wink', 'Wink', 'Wink', 'default']
    // Plays Wink 3 times, then transitions to default and loops
  }
}));
```

See the Assets Guide for more details on driving motion sequences.
## Complete Example
```javascript
// Browser JavaScript
// Create session
const response = await fetch('https://api.rtav.io/v1/sessions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer rtav_ak_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-5.2',
    voice: 'default',
    instructions: 'You are a helpful assistant.'
  })
});
const { id: sessionId } = await response.json();

// Connect WebSocket
const ws = new WebSocket(`wss://api.rtav.io/v1/realtime?session_id=${sessionId}`);

ws.onopen = () => {
  // Authenticate
  ws.send(JSON.stringify({
    type: 'auth',
    api_key: 'rtav_ak_your_api_key'
  }));

  // Send text message
  setTimeout(() => {
    ws.send(JSON.stringify({
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [{ type: 'input_text', text: 'Hello!' }]
      }
    }));
  }, 100);
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Event:', data.type);

  // Handle audio
  if (data.type === 'response.output_audio.delta') {
    // Decode and play audio
  }

  // Handle video
  if (data.type === 'response.output_image.delta') {
    // Display video frame
    document.getElementById('avatar').src =
      'data:image/jpeg;base64,' + data.delta;
  }

  // Handle transcripts
  if (data.type === 'conversation.item.input_audio_transcription.completed') {
    console.log('User:', data.transcript);
  }
  if (data.type === 'response.output_audio_transcript.delta') {
    console.log('Assistant:', data.delta);
  }
};

ws.onerror = (error) => {
  console.error('WebSocket error:', error);
};

ws.onclose = () => {
  console.log('WebSocket closed');
};
```

## Next Steps
- Try the interactive WebSocket Demo
- Read the API Reference for complete event documentation
- Learn about Models & Prompting
- Explore WebRTC for lower latency