Create Call

Create a WebRTC call for low-latency audio and video streaming. This endpoint follows the Calls interface of the OpenAI Realtime API (GA release).

Endpoint

POST /v1/realtime/calls

Authentication

This endpoint accepts both API keys and ephemeral keys (client secrets). Ephemeral keys are recommended for WebRTC calls.

Authorization: Bearer ek_68af296e8e408191a1120ab6383263c2
// or
Authorization: Bearer rtav_ak_your_api_key_here

To get an ephemeral key, call POST /v1/realtime/client_secrets first.
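
As a minimal sketch, a client can tell the two key types apart before sending a request. The helper names keyKind and authHeaders are hypothetical, and the ek_ and rtav_ak_ prefixes are inferred from the example values above rather than from a documented rule.

```javascript
// Hypothetical helpers; the prefixes are assumptions taken from the examples above.
function keyKind(key) {
  if (key.startsWith('ek_')) return 'ephemeral';
  if (key.startsWith('rtav_ak_')) return 'api';
  return 'unknown';
}

// Builds the Authorization header. With allowApiKey set to false (for example
// in browser code), a long-lived API key is rejected so it is not sent by mistake.
function authHeaders(key, { allowApiKey = true } = {}) {
  if (!allowApiKey && keyKind(key) === 'api') {
    throw new Error('Use an ephemeral key here instead of an API key');
  }
  return { Authorization: `Bearer ${key}` };
}
```

In browser code, authHeaders(key, { allowApiKey: false }) fails fast if a long-lived API key is passed where an ephemeral key was intended.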

Request

Send multipart form data (multipart/form-data) containing the SDP offer and the session configuration:

Form Fields

  • sdp - SDP offer from RTCPeerConnection (string)
  • session - Session configuration (JSON string)

Session Configuration

{
  "type": "realtime",
  "model": "gpt-5.2",                    // LLM model (default: "gpt-5.2")
  "instructions": "You are a helpful assistant.",  // System instructions
  "voice": "default",                  // Voice identifier
  "face": "face1",                     // Face asset ID (optional)
  "driving": "idle"                    // Driving motion asset ID (optional)
}
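
The // comments in the listing above are annotations; the session form field itself has to be strict JSON. A minimal sketch of building it, assuming hypothetical helper names (buildSessionConfig, serializeSession) and assuming that fields the caller leaves unset can be omitted. Only face and driving are documented as optional, so whether instructions and voice may be left out is not confirmed here.

```javascript
// Hypothetical helper, not part of the documented API.
const DEFAULT_MODEL = 'gpt-5.2'; // default stated in the listing above

function buildSessionConfig(options = {}) {
  const { model = DEFAULT_MODEL, instructions, voice, face, driving } = options;
  const session = { type: 'realtime', model };
  // Copy the remaining fields only when the caller set them.
  const rest = { instructions, voice, face, driving };
  for (const [key, value] of Object.entries(rest)) {
    if (value !== undefined) session[key] = value;
  }
  return session;
}

// JSON.stringify produces comment-free JSON suitable for the form field.
function serializeSession(options) {
  return JSON.stringify(buildSessionConfig(options));
}
```

The result of serializeSession(...) is what would be passed as the second argument to formData.append('session', ...).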

Response

Returns the SDP answer as plain text (Content-Type: text/plain):

v=0
o=- 1234567890 1234567890 IN IP4 0.0.0.0
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 H264/90000
...
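
For debugging, it can help to check which media sections and codecs the answer contains before passing it to setRemoteDescription. The following is a small sketch with a hypothetical helper name (summarizeSdp); it reads only the m= and a=rtpmap: lines and is not a full SDP parser.

```javascript
// Hypothetical debugging helper: lists media sections and their codecs.
function summarizeSdp(sdp) {
  const sections = [];
  let current = null;
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith('m=')) {
      // m=<media> <port> <proto> <payload types...>
      const [media, , , ...payloadTypes] = line.slice(2).split(' ');
      current = { media, payloadTypes, codecs: {} };
      sections.push(current);
    } else if (current && line.startsWith('a=rtpmap:')) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const [payloadType, encoding] = line.slice('a=rtpmap:'.length).split(' ');
      if (encoding) current.codecs[payloadType] = encoding.split('/')[0];
    }
  }
  return sections;
}

const example = [
  'v=0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=rtpmap:111 opus/48000/2',
  'm=video 9 UDP/TLS/RTP/SAVPF 96',
  'a=rtpmap:96 H264/90000',
].join('\r\n');

// Logs one audio section (111 -> opus) and one video section (96 -> H264).
console.log(summarizeSdp(example));
```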

Complete Example

// Browser JavaScript
// 1. Get ephemeral key (in production, do this on your server so the API key never reaches the browser)
const tokenResponse = await fetch('https://api.rtav.io/v1/realtime/client_secrets', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer rtav_ak_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    session: {
      type: 'realtime'
    }
  })
});
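if (!tokenResponse.ok) {
  throw new Error(`client_secrets request failed: ${tokenResponse.status}`);
}
const { client_secret } = await tokenResponse.json();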

// 2. Create peer connection and get user media
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

const localStream = await navigator.mediaDevices.getUserMedia({ 
  audio: true, 
  video: false 
});
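localStream.getAudioTracks().forEach(track => {
  pc.addTrack(track, localStream);
});
// Note: this offer has no video section. An answer can only include media
// sections that the offer contains, so receiving video also needs a video
// section in the offer, e.g. pc.addTransceiver('video', { direction: 'recvonly' }).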

// 3. Create data channel
const dataChannel = pc.createDataChannel('realtime', { ordered: true });

// 4. Create offer
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);

// 5. Send to API
const formData = new FormData();
formData.append('sdp', offer.sdp);
formData.append('session', JSON.stringify({
  type: 'realtime',
  model: 'gpt-5.2',
  instructions: 'You are a helpful assistant.',
  voice: 'default',
  face: 'face1',
  driving: 'idle'
}));

const callResponse = await fetch('https://api.rtav.io/v1/realtime/calls', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${client_secret}`
  },
  body: formData
});

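// 6. Handle data channel and tracks
// Register handlers before applying the answer: track events fire while
// setRemoteDescription runs, so a handler attached afterwards can miss them.
dataChannel.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Event:', data.type);
};

pc.ontrack = (event) => {
  const [remoteStream] = event.streams;
  const audioElement = document.getElementById('audio');
  audioElement.srcObject = remoteStream;
};

// 7. Get SDP answer and set remote description
if (!callResponse.ok) {
  throw new Error(`Create call failed: ${callResponse.status}`);
}
const answerSdp = await callResponse.text();
await pc.setRemoteDescription({ type: 'answer', sdp: answerSdp });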

Error Responses

Common error responses:

  • 401 Unauthorized - Invalid or missing authentication
  • 400 Bad Request - Invalid SDP offer or session configuration
  • 503 Service Unavailable - No workers available
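
One way a client might act on these statuses is sketched below. The helper names (describeCallError, createCallWithRetry) are hypothetical, and retrying 503 with exponential backoff is an assumption, not a documented retry policy. It is also not stated whether an ephemeral key can be reused across attempts.

```javascript
// Maps the documented status codes to a reason and a retry hint.
function describeCallError(status) {
  switch (status) {
    case 400: return { retryable: false, reason: 'Invalid SDP offer or session configuration' };
    case 401: return { retryable: false, reason: 'Invalid or missing authentication' };
    case 503: return { retryable: true, reason: 'No workers available' };
    default: return { retryable: false, reason: `Unexpected status ${status}` };
  }
}

// sendCall is a caller-supplied function that issues the POST and resolves to
// a fetch-style Response. Resolves to the SDP answer text on success.
async function createCallWithRetry(sendCall, { attempts = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 1; ; attempt++) {
    const response = await sendCall();
    if (response.ok) return response.text();
    const { retryable, reason } = describeCallError(response.status);
    if (!retryable || attempt >= attempts) {
      throw new Error(`Create call failed (${response.status}): ${reason}`);
    }
    // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
    await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
  }
}
```

Here sendCall would wrap the fetch to /v1/realtime/calls and build a fresh FormData on each invocation.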

Notes

  • The session is automatically created when the call is established
  • Worker allocation happens when the WebRTC connection is established
  • Audio is streamed via RTP (Opus codec)
  • Video is streamed via RTP (H.264 or VP8 codec)
  • Control events are sent via the WebRTC data channel
  • The session is automatically ended when the WebRTC connection closes
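
Because the session ends when the WebRTC connection closes, hanging up comes down to releasing the local resources. A minimal sketch, assuming the pc, dataChannel, and localStream objects from the Complete Example; the helper name endCall is hypothetical.

```javascript
// Hypothetical helper: stops local capture and closes the connection.
// Any of the three arguments may be omitted.
function endCall({ pc, dataChannel, localStream } = {}) {
  // Stop the microphone so the browser releases the device.
  if (localStream) localStream.getTracks().forEach(track => track.stop());
  if (dataChannel && dataChannel.readyState !== 'closed') dataChannel.close();
  // Closing the peer connection is what ends the session on the server side.
  if (pc && pc.connectionState !== 'closed') pc.close();
}
```

A typical place to call endCall({ pc, dataChannel, localStream }) is a hang-up button handler or a pagehide listener.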