Realtime WebRTC

WebRTC is the lowest-latency transport rtAV offers for realtime audio and video streaming, making it the natural choice for browser-based applications that need a direct peer connection to the service.

Overview

WebRTC uses RTP (Real-time Transport Protocol) for audio/video streaming and a data channel for control events. This separation keeps latency-sensitive media on an efficient binary path while JSON control messages travel over a reliable, ordered channel (see the sketch after the list below).

  • Endpoint: POST https://api.rtav.io/v1/realtime/calls
  • Audio Codec: Opus (via RTP)
  • Video Codec: H.264 or VP8 (via RTP)
  • Control Channel: WebRTC Data Channel (JSON events)
  • Latency: Lowest of rtAV's transports (binary RTP, no base64 overhead)
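
The two paths map onto standard browser WebRTC objects: media rides on RTP tracks attached to the RTCPeerConnection, while control events travel over a single ordered data channel. A minimal sketch of that split (event and channel names match the examples in the rest of this guide):

// Browser JavaScript
const pc = new RTCPeerConnection();

// Media path: remote Opus audio / H.264 or VP8 video arrive as RTP tracks
pc.ontrack = (event) => {
  const [remoteStream] = event.streams;
  // Attach remoteStream to an <audio>/<video> element (see Handling Events)
};

// Control path: one ordered data channel carrying JSON events
const control = pc.createDataChannel('realtime', { ordered: true });
control.onmessage = (event) => {
  const evt = JSON.parse(event.data); // e.g. 'response.done'
  console.log('Control event:', evt.type);
};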

Getting Started

Step 1: Get Ephemeral Key (Client Secret)

First, obtain an ephemeral key (client secret) that will be used for the WebRTC call:

// Works in both Browser and Node.js (18+)
const tokenResponse = await fetch('https://api.rtav.io/v1/realtime/client_secrets', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer rtav_ak_your_api_key_here',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    session: {
      type: 'realtime'
    }
  })
});

const { client_secret } = await tokenResponse.json();
// client_secret format: rtav_sk_...
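
The example above uses the long-lived rtav_ak_ API key directly. In a browser deployment, a common pattern is to mint the short-lived client secret from your own backend so the API key never reaches the client. A sketch, assuming a hypothetical /rtav/client-secret route on your server that performs the request above and returns only the client_secret:

// Browser JavaScript — '/rtav/client-secret' is a hypothetical backend route,
// not part of the rtAV API; it calls /v1/realtime/client_secrets server-side
// with the rtav_ak_ key and returns only the short-lived client_secret.
const res = await fetch('/rtav/client-secret', { method: 'POST' });
const { client_secret } = await res.json();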

Step 2: Create WebRTC Peer Connection

Set up the peer connection and get user media:

// Browser JavaScript
// Create peer connection
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

// Get user media (microphone)
const localStream = await navigator.mediaDevices.getUserMedia({ 
  audio: true, 
  video: false 
});

// Add audio track to peer connection
localStream.getAudioTracks().forEach(track => {
  pc.addTrack(track, localStream);
});

// Create data channel for control events
const dataChannel = pc.createDataChannel('realtime', {
  ordered: true
});
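
Because only an audio track is added locally, the browser's offer may not request a video line at all. If you want the avatar's video back, one option is a receive-only video transceiver; whether this is required depends on how the rtAV server negotiates media, so treat it as an assumption to verify:

// Browser JavaScript — optional, and an assumption about server-side negotiation:
// adds a recvonly video m-line to the offer so the answer can carry the
// avatar's video even though we only send audio.
pc.addTransceiver('video', { direction: 'recvonly' });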

Step 3: Create SDP Offer and Send to API

Create an offer and send it to the API server:

// Browser JavaScript
// Create offer
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);

// Send to API
const formData = new FormData();
formData.append('sdp', offer.sdp);
formData.append('session', JSON.stringify({
  type: 'realtime',
  model: 'gpt-5.2',
  instructions: 'You are a helpful assistant.',
  voice: 'default',
  face: 'face1',
  driving: 'idle'
}));

const callResponse = await fetch('https://api.rtav.io/v1/realtime/calls', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${client_secret}`
  },
  body: formData
});
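
// Defensive status check (a sketch — the error body format is not specified
// here, so only the HTTP status is inspected before treating the body as SDP)
if (!callResponse.ok) {
  throw new Error(`Call setup failed: ${callResponse.status} ${await callResponse.text()}`);
}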

// Get SDP answer
const answerSdp = await callResponse.text();
await pc.setRemoteDescription({ 
  type: 'answer', 
  sdp: answerSdp 
});

Handling Events

Data Channel Messages

Control events are sent via the data channel:

// Browser JavaScript
dataChannel.onopen = () => {
  console.log('Data channel opened');
  
  // Send text message
  dataChannel.send(JSON.stringify({
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: 'user',
      content: [{ type: 'input_text', text: 'Hello!' }]
    }
  }));
};

dataChannel.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Received event:', data.type);
  
  // Handle different event types
  if (data.type === 'response.created') {
    console.log('Response started:', data.response.id);
  }
  
  if (data.type === 'response.done') {
    console.log('Response completed');
  }
  
  // Handle transcripts
  if (data.type === 'conversation.item.input_audio_transcription.completed') {
    console.log('User said:', data.transcript);
  }
  
  if (data.type === 'response.output_audio_transcript.delta') {
    console.log('Assistant said:', data.delta);
  }
};

Audio/Video Tracks

Audio and video are received as RTP tracks:

// Browser JavaScript
pc.ontrack = (event) => {
  const [remoteStream] = event.streams;
  
  // Play remote audio
  const audioElement = document.getElementById('remoteAudio');
  audioElement.srcObject = remoteStream;
  
  // Display remote video (if available)
  const videoElement = document.getElementById('remoteVideo');
  videoElement.srcObject = remoteStream;
};

// Handle ICE candidates
pc.onicecandidate = (event) => {
  if (event.candidate) {
    // ICE candidates are handled automatically by the browser
    // No need to send them to the server
  }
};

// Handle connection state
pc.onconnectionstatechange = () => {
  console.log('Connection state:', pc.connectionState);
  
  if (pc.connectionState === 'failed') {
    console.error('WebRTC connection failed');
  }
  
  if (pc.connectionState === 'closed') {
    console.log('WebRTC connection closed');
  }
};
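
Most browsers block audio that is not started from a user gesture, so assigning srcObject alone can silently produce no sound. A sketch of one way to handle the autoplay policy (the ensurePlayback helper is illustrative, not part of the API; call it after setting srcObject in ontrack):

// Browser JavaScript
function ensurePlayback(mediaElement) {
  mediaElement.play().catch((err) => {
    // Autoplay was blocked; retry once the user interacts with the page
    console.warn('Playback blocked until user interaction:', err);
    document.addEventListener('click', () => mediaElement.play(), { once: true });
  });
}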

Sending Messages

Text Messages

Send text messages via the data channel:

// Browser JavaScript
// Closed loop (triggers LLM)
dataChannel.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'user',
    content: [{ type: 'input_text', text: 'Hello, how are you?' }]
  }
}));

// Open loop (direct to worker)
dataChannel.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'assistant',
    content: [{ type: 'input_text', text: 'Hello!' }]
  }
}));

Audio Input

Audio is captured automatically from the microphone track and streamed over RTP; no manual audio sending is needed. To mute or unmute the microphone, you can toggle the local track, as sketched below.
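
A minimal sketch using the standard MediaStreamTrack API (the setMicrophoneEnabled helper is illustrative, not part of rtAV):

// Browser JavaScript
function setMicrophoneEnabled(localStream, enabled) {
  // A disabled audio track keeps the RTP sender alive but transmits silence
  localStream.getAudioTracks().forEach(track => {
    track.enabled = enabled;
  });
}

// Example: mute while the user steps away, unmute when they return
setMicrophoneEnabled(localStream, false);
setMicrophoneEnabled(localStream, true);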

Session Configuration

Update session settings via the data channel during an active session:

// Browser JavaScript
dataChannel.send(JSON.stringify({
  type: 'session.update',
  session: {
    model: 'gpt-5.2',
    instructions: 'You are a helpful coding assistant.',
    voice: 'default',
    face: 'face1',
    driving: 'idle'  // Single motion
  }
}));

Driving Motion Sequences

You can send a sequence of driving motions as an array. The avatar transitions smoothly between motions, playing each intermediate motion once and looping the final one:

// Browser JavaScript
// Single motion (permanent change)
dataChannel.send(JSON.stringify({
  type: 'session.update',
  session: {
    driving: 'Wink'  // Transitions to Wink and loops
  }
}));

// Sequence with intermediate motion
dataChannel.send(JSON.stringify({
  type: 'session.update',
  session: {
    driving: ['AgreeYesTotaly', 'default']
    // Plays AgreeYesTotaly once, then transitions to default and loops
  }
}));

// Repeating intermediate motion
dataChannel.send(JSON.stringify({
  type: 'session.update',
  session: {
    driving: ['Wink', 'Wink', 'Wink', 'default']
    // Plays Wink 3 times, then transitions to default and loops
  }
}));

See the Assets Guide for more details on driving motion sequences.

Complete Example

// Browser JavaScript
async function connectWebRTC() {
  // 1. Get ephemeral key
  const tokenResponse = await fetch('https://api.rtav.io/v1/realtime/client_secrets', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer rtav_ak_your_api_key',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      session: {
        type: 'realtime'
      }
    })
  });
  const { client_secret } = await tokenResponse.json();

  // 2. Get user media
  const localStream = await navigator.mediaDevices.getUserMedia({ 
    audio: true, 
    video: false 
  });

  // 3. Create peer connection
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
  });

  // Add local tracks
  localStream.getAudioTracks().forEach(track => {
    pc.addTrack(track, localStream);
  });

  // 4. Create data channel
  const dataChannel = pc.createDataChannel('realtime', { ordered: true });

  dataChannel.onopen = () => {
    console.log('Data channel opened');
    
    // Send text message
    dataChannel.send(JSON.stringify({
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [{ type: 'input_text', text: 'Hello!' }]
      }
    }));
  };

  dataChannel.onmessage = (event) => {
    const data = JSON.parse(event.data);
    console.log('Event:', data.type);
    
    if (data.type === 'conversation.item.input_audio_transcription.completed') {
      console.log('User:', data.transcript);
    }
    if (data.type === 'response.output_audio_transcript.delta') {
      console.log('Assistant:', data.delta);
    }
  };

  // 5. Handle remote tracks
  pc.ontrack = (event) => {
    const [remoteStream] = event.streams;
    const audioElement = document.getElementById('remoteAudio');
    audioElement.srcObject = remoteStream;
  };

  // 6. Create offer and send to API
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  const formData = new FormData();
  formData.append('sdp', offer.sdp);
  formData.append('session', JSON.stringify({
    type: 'realtime',
    model: 'gpt-5.2',
    instructions: 'You are a helpful assistant.',
    voice: 'default'
  }));

  const callResponse = await fetch('https://api.rtav.io/v1/realtime/calls', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${client_secret}`
    },
    body: formData
  });

  const answerSdp = await callResponse.text();
  await pc.setRemoteDescription({ type: 'answer', sdp: answerSdp });

  return { pc, dataChannel, localStream };
}

// Cleanup
function disconnect(pc, dataChannel, localStream) {
  if (dataChannel) dataChannel.close();
  if (localStream) localStream.getTracks().forEach(track => track.stop());
  if (pc) pc.close();
}
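
A minimal usage sketch tying the two functions together (assumes a button with id 'connect' exists on the page; starting the call from a user gesture also satisfies browser autoplay policies):

// Browser JavaScript
let session = null;

document.getElementById('connect').addEventListener('click', async () => {
  session = await connectWebRTC();
});

window.addEventListener('beforeunload', () => {
  if (session) {
    disconnect(session.pc, session.dataChannel, session.localStream);
  }
});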

Advantages of WebRTC

  • Lower Latency: Direct binary RTP streaming avoids the overhead of base64-encoded audio over a WebSocket
  • Lower Bandwidth: Binary RTP is more compact than base64-encoded audio/video
  • Native Browser Support: Uses the browser's built-in WebRTC implementation
  • Automatic Codec Handling: The browser handles Opus encoding/decoding automatically

Next Steps