Realtime WebRTC
WebRTC provides the lowest-latency transport for realtime audio and video streaming with rtAV. It is ideal for browser-based applications that need a direct, low-latency media connection.
Overview
WebRTC uses RTP (Real-time Transport Protocol) for audio/video streaming and a WebRTC data channel for control events. This separation keeps media on an efficient binary path while control events remain easy to inspect as JSON.
- Endpoint: POST https://api.rtav.io/v1/realtime/calls
- Audio Codec: Opus (via RTP)
- Video Codec: H.264 or VP8 (via RTP)
- Control Channel: WebRTC Data Channel (JSON events)
- Latency: Lowest of the supported transports (binary RTP avoids base64 encoding overhead)
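Every control event in this guide is a JSON object with a string `type` field. A defensive parse step for incoming data-channel frames can be sketched as follows; `parseControlEvent` is a hypothetical helper, not part of the rtAV SDK:

```javascript
// Parse a data-channel frame into a control event, or return null
// if the frame is malformed. The examples in this guide all dispatch
// on a string `type` field, so anything without one is rejected.
// parseControlEvent is a hypothetical helper, not part of the rtAV SDK.
function parseControlEvent(raw) {
  try {
    const event = JSON.parse(raw);
    if (typeof event.type !== 'string') return null;
    return event;
  } catch {
    return null;
  }
}
```

The `dataChannel.onmessage` handlers shown later can call this before dispatching on `event.type`, so a malformed frame is dropped instead of throwing.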
Getting Started
Step 1: Get Ephemeral Key (Client Secret)
First, obtain an ephemeral key (client secret) that will be used for the WebRTC call:
// Works in both Browser and Node.js (18+)
const tokenResponse = await fetch('https://api.rtav.io/v1/realtime/client_secrets', {
method: 'POST',
headers: {
'Authorization': 'Bearer rtav_ak_your_api_key_here',
'Content-Type': 'application/json'
},
body: JSON.stringify({
session: {
type: 'realtime'
}
})
});
const { client_secret } = await tokenResponse.json();
// client_secret format: rtav_sk_...
Step 2: Create WebRTC Peer Connection
Set up the peer connection and get user media:
// Browser JavaScript
// Create peer connection
const pc = new RTCPeerConnection({
iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});
// Get user media (microphone)
const localStream = await navigator.mediaDevices.getUserMedia({
audio: true,
video: false
});
// Add audio track to peer connection
localStream.getAudioTracks().forEach(track => {
pc.addTrack(track, localStream);
});
// Create data channel for control events
const dataChannel = pc.createDataChannel('realtime', {
ordered: true
});
Step 3: Create SDP Offer and Send to API
Create an offer and send it to the API server:
// Browser JavaScript
// Create offer
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
// Send to API
const formData = new FormData();
formData.append('sdp', offer.sdp);
formData.append('session', JSON.stringify({
type: 'realtime',
model: 'gpt-5.2',
instructions: 'You are a helpful assistant.',
voice: 'default',
face: 'face1',
driving: 'idle'
}));
const callResponse = await fetch('https://api.rtav.io/v1/realtime/calls', {
method: 'POST',
headers: {
'Authorization': `Bearer ${client_secret}`
},
body: formData
});
// Get SDP answer
const answerSdp = await callResponse.text();
await pc.setRemoteDescription({
type: 'answer',
sdp: answerSdp
});
Handling Events
Data Channel Messages
Control events are sent via the data channel:
// Browser JavaScript
dataChannel.onopen = () => {
console.log('Data channel opened');
// Send text message
dataChannel.send(JSON.stringify({
type: 'conversation.item.create',
item: {
type: 'message',
role: 'user',
content: [{ type: 'input_text', text: 'Hello!' }]
}
}));
};
dataChannel.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log('Received event:', data.type);
// Handle different event types
if (data.type === 'response.created') {
console.log('Response started:', data.response.id);
}
if (data.type === 'response.done') {
console.log('Response completed');
}
// Handle transcripts
if (data.type === 'conversation.item.input_audio_transcription.completed') {
console.log('User said:', data.transcript);
}
if (data.type === 'response.output_audio_transcript.delta') {
console.log('Assistant said:', data.delta);
}
};
Audio/Video Tracks
Audio and video are received as RTP tracks:
// Browser JavaScript
pc.ontrack = (event) => {
const [remoteStream] = event.streams;
// Play remote audio
const audioElement = document.getElementById('remoteAudio');
audioElement.srcObject = remoteStream;
// Display remote video (if available)
const videoElement = document.getElementById('remoteVideo');
videoElement.srcObject = remoteStream;
};
// Handle ICE candidates
// Handle ICE candidates
pc.onicecandidate = () => {
  // ICE candidates are handled automatically by the browser;
  // there is no need to send them to the server.
};
// Handle connection state
pc.onconnectionstatechange = () => {
console.log('Connection state:', pc.connectionState);
if (pc.connectionState === 'failed') {
console.error('WebRTC connection failed');
}
if (pc.connectionState === 'closed') {
console.log('WebRTC connection closed');
}
};
Sending Messages
Text Messages
Send text messages via the data channel:
// Browser JavaScript
// Closed loop (triggers LLM)
dataChannel.send(JSON.stringify({
type: 'conversation.item.create',
item: {
type: 'message',
role: 'user',
content: [{ type: 'input_text', text: 'Hello, how are you?' }]
}
}));
// Open loop (direct to worker)
dataChannel.send(JSON.stringify({
type: 'conversation.item.create',
item: {
type: 'message',
role: 'assistant',
content: [{ type: 'input_text', text: 'Hello!' }]
}
}));
Audio Input
Audio is automatically captured from the microphone track and sent via RTP. No manual audio sending needed.
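Because audio rides on the RTP track, muting is done by disabling the local track rather than by pausing anything yourself. A minimal sketch (`setMicEnabled` is a hypothetical helper, not part of the rtAV SDK; a disabled track transmits silence rather than ending the stream):

```javascript
// Enable or disable every audio track on a local stream.
// Disabled tracks keep the connection alive but produce silence.
// setMicEnabled is a hypothetical helper, not part of the rtAV SDK.
function setMicEnabled(stream, enabled) {
  stream.getAudioTracks().forEach(track => {
    track.enabled = enabled;
  });
}

// Usage in the browser:
// setMicEnabled(localStream, false); // mute the microphone
// setMicEnabled(localStream, true);  // unmute
```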
Session Configuration
Update session settings via the data channel during an active session:
// Browser JavaScript
dataChannel.send(JSON.stringify({
type: 'session.update',
session: {
model: 'gpt-5.2',
instructions: 'You are a helpful coding assistant.',
voice: 'default',
face: 'face1',
driving: 'idle' // Single motion
}
}));
Driving Motion Sequences
You can send a sequence of driving motions as an array. The avatar will transition smoothly between motions, playing intermediate motions once, and looping the final motion:
// Browser JavaScript
// Single motion (permanent change)
dataChannel.send(JSON.stringify({
type: 'session.update',
session: {
driving: 'Wink' // Transitions to Wink and loops
}
}));
// Sequence with intermediate motion
dataChannel.send(JSON.stringify({
type: 'session.update',
session: {
driving: ['AgreeYesTotaly', 'default']
// Plays AgreeYesTotaly once, then transitions to default and loops
}
}));
// Repeating intermediate motion
dataChannel.send(JSON.stringify({
type: 'session.update',
session: {
driving: ['Wink', 'Wink', 'Wink', 'default']
// Plays Wink 3 times, then transitions to default and loops
}
}));
See the Assets Guide for more details on driving motion sequences.
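Because a sequence is just an array in the `session.update` payload, the variants above can all be produced by one small builder. `drivingUpdate` is a hypothetical convenience helper, not part of the rtAV SDK:

```javascript
// Build a session.update payload for one driving motion or a sequence.
// A single string loops that motion; an array plays each intermediate
// motion once and loops the final entry.
// drivingUpdate is a hypothetical helper, not part of the rtAV SDK.
function drivingUpdate(motions) {
  return JSON.stringify({
    type: 'session.update',
    session: { driving: motions }
  });
}

// dataChannel.send(drivingUpdate('Wink'));
// dataChannel.send(drivingUpdate(['Wink', 'Wink', 'Wink', 'default']));
```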
Complete Example
// Browser JavaScript
async function connectWebRTC() {
// 1. Get ephemeral key
const tokenResponse = await fetch('https://api.rtav.io/v1/realtime/client_secrets', {
method: 'POST',
headers: {
'Authorization': 'Bearer rtav_ak_your_api_key',
'Content-Type': 'application/json'
},
body: JSON.stringify({
session: {
type: 'realtime'
}
})
});
const { client_secret } = await tokenResponse.json();
// 2. Get user media
const localStream = await navigator.mediaDevices.getUserMedia({
audio: true,
video: false
});
// 3. Create peer connection
const pc = new RTCPeerConnection({
iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});
// Add local tracks
localStream.getAudioTracks().forEach(track => {
pc.addTrack(track, localStream);
});
// 4. Create data channel
const dataChannel = pc.createDataChannel('realtime', { ordered: true });
dataChannel.onopen = () => {
console.log('Data channel opened');
// Send text message
dataChannel.send(JSON.stringify({
type: 'conversation.item.create',
item: {
type: 'message',
role: 'user',
content: [{ type: 'input_text', text: 'Hello!' }]
}
}));
};
dataChannel.onmessage = (event) => {
const data = JSON.parse(event.data);
console.log('Event:', data.type);
if (data.type === 'conversation.item.input_audio_transcription.completed') {
console.log('User:', data.transcript);
}
if (data.type === 'response.output_audio_transcript.delta') {
console.log('Assistant:', data.delta);
}
};
// 5. Handle remote tracks
pc.ontrack = (event) => {
const [remoteStream] = event.streams;
const audioElement = document.getElementById('remoteAudio');
audioElement.srcObject = remoteStream;
};
// 6. Create offer and send to API
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
const formData = new FormData();
formData.append('sdp', offer.sdp);
formData.append('session', JSON.stringify({
type: 'realtime',
model: 'gpt-5.2',
instructions: 'You are a helpful assistant.',
voice: 'default'
}));
const callResponse = await fetch('https://api.rtav.io/v1/realtime/calls', {
method: 'POST',
headers: {
'Authorization': `Bearer ${client_secret}`
},
body: formData
});
const answerSdp = await callResponse.text();
await pc.setRemoteDescription({ type: 'answer', sdp: answerSdp });
return { pc, dataChannel, localStream };
}
// Cleanup
function disconnect(pc, dataChannel, localStream) {
if (dataChannel) dataChannel.close();
if (localStream) localStream.getTracks().forEach(track => track.stop());
if (pc) pc.close();
}
Advantages of WebRTC
- Lower Latency: Direct RTP streaming reduces latency compared to sending base64-encoded media over a WebSocket
- Better Bandwidth: Binary RTP is more efficient than base64-encoded audio/video
- Native Browser Support: Uses browser's built-in WebRTC implementation
- Automatic Codec Handling: Browser handles Opus encoding/decoding automatically
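The bandwidth point can be sanity-checked with a little arithmetic: base64 turns every 3 raw bytes into 4 ASCII characters, so base64-encoded audio carries roughly a third more bytes than the same audio as binary RTP. A quick Node.js check, using an illustrative 160-byte frame (a 20 ms Opus frame at 64 kbps: 64000 bits/s × 0.02 s = 1280 bits = 160 bytes):

```javascript
// Measure base64 inflation for one illustrative 160-byte audio frame.
const rawBytes = 160;
const encoded = Buffer.alloc(rawBytes).toString('base64');
console.log(encoded.length);                         // 216 characters
console.log((encoded.length - rawBytes) / rawBytes); // 0.35 -> 35% overhead
```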
Next Steps
- Try the interactive WebRTC Demo
- Read the Create Call API Reference
- Learn about Models & Prompting
- Compare with WebSocket for your use case