Create Call

Creates a WebRTC call for low-latency audio and video streaming. This endpoint implements the Calls API from the GA version of the OpenAI Realtime API.

Endpoint

POST /v1/realtime/calls

Authentication

This endpoint accepts both API keys and ephemeral keys (client secrets). Ephemeral keys are recommended for WebRTC calls.

Authorization: Bearer ek_68af296e8e408191a1120ab6383263c2
// or
Authorization: Bearer rtav_ak_your_api_key_here

To get an ephemeral key, call POST /v1/realtime/client_secrets first.
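
Ephemeral keys are recommended because a long-lived API key should not be exposed in browser code. A minimal sketch of a guard that enforces this is shown below. The helper names `keyKind` and `browserAuthHeaders` are hypothetical, and the key prefixes are inferred from the examples above rather than from a documented format.

```javascript
// Assumption: prefixes are taken from the example headers above
// ("ek_" for ephemeral keys, "rtav_ak_" for API keys).
function keyKind(key) {
  if (key.startsWith('ek_')) return 'ephemeral';
  if (key.startsWith('rtav_ak_')) return 'api';
  return 'unknown';
}

// Hypothetical helper: builds request headers for browser code and
// refuses anything that does not look like an ephemeral key.
function browserAuthHeaders(key) {
  if (keyKind(key) !== 'ephemeral') {
    throw new Error('Use an ephemeral key (client secret) in browser code');
  }
  return { Authorization: `Bearer ${key}` };
}
```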

Request

Send the request as multipart form data containing the SDP offer and the session configuration:

Form Fields

sdp - SDP offer from RTCPeerConnection (string)
session - Session configuration (JSON string)

Session Configuration

{
  "type": "realtime",
  "model": "gpt-5.2",                    // LLM model (default: "gpt-5.2")
  "instructions": "You are a helpful assistant.",  // System instructions
  "voice": "default",                  // Voice identifier
  "face": "face1",                     // Face asset ID (optional)
  "driving": "idle"                    // Driving motion asset ID (optional)
}
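
The // annotations above are explanatory only. Because the session field is sent as a JSON string, the actual payload must not contain comments. A minimal sketch of a helper that applies the documented model default and serializes the result follows. The name `buildSessionField` is hypothetical, and leaving out unset optional fields is an assumption about what the server accepts.

```javascript
// Hypothetical helper: builds the value for the `session` form field.
// Only the "gpt-5.2" default for `model` comes from the documentation;
// omitting unset fields is an assumption.
function buildSessionField(options = {}) {
  const session = { type: 'realtime', model: options.model ?? 'gpt-5.2' };
  for (const key of ['instructions', 'voice', 'face', 'driving']) {
    if (options[key] !== undefined) session[key] = options[key];
  }
  // JSON.stringify produces comment-free JSON suitable for the form field.
  return JSON.stringify(session);
}

// Usage: formData.append('session', buildSessionField({ face: 'face1' }));
```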

Response

Returns the SDP answer as plain text (Content-Type: text/plain):

v=0
o=- 1234567890 1234567890 IN IP4 0.0.0.0
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 H264/90000
...
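
When debugging, it can help to check which media sections and codecs the answer actually contains. The sketch below is a deliberately small parser, not a complete SDP implementation. The function name `summarizeSdp` is hypothetical, and it reads only the m= and a=rtpmap lines.

```javascript
// Lists each media section of an SDP body together with the codecs named
// in its a=rtpmap lines. Accepts LF or CRLF line endings and ignores
// every other attribute.
function summarizeSdp(sdp) {
  const sections = [];
  let current = null;
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith('m=')) {
      // m=<media> <port> <proto> <formats...>
      current = { kind: line.slice(2).split(' ')[0], codecs: [] };
      sections.push(current);
    } else if (current && line.startsWith('a=rtpmap:')) {
      // a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]
      const encoding = line.split(' ')[1];
      if (encoding) current.codecs.push(encoding.split('/')[0]);
    }
  }
  return sections;
}
```

An answer can only contain the media sections that were present in the offer, so if no entry has kind 'video', the offer most likely did not include a video section.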

Complete Example

// Browser JavaScript
// 1. Get ephemeral key (in production, do this step on your server so the
//    API key is never shipped to the browser)
const tokenResponse = await fetch('https://api.rtav.io/v1/realtime/client_secrets', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer rtav_ak_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    session: {
      type: 'realtime'
    }
  })
});
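if (!tokenResponse.ok) {
  throw new Error(`Failed to get ephemeral key: ${tokenResponse.status}`);
}
const { client_secret } = await tokenResponse.json();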

// 2. Create peer connection and get user media
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

const localStream = await navigator.mediaDevices.getUserMedia({ 
  audio: true, 
  video: false 
});
localStream.getAudioTracks().forEach(track => {
  pc.addTrack(track, localStream);
});

// 3. Create data channel
const dataChannel = pc.createDataChannel('realtime', { ordered: true });

// 4. Create offer
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);

// 5. Send to API
const formData = new FormData();
formData.append('sdp', offer.sdp);
formData.append('session', JSON.stringify({
  type: 'realtime',
  model: 'gpt-5.2',
  instructions: 'You are a helpful assistant.',
  voice: 'default',
  face: 'face1',
  driving: 'idle'
}));

const callResponse = await fetch('https://api.rtav.io/v1/realtime/calls', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${client_secret}`
  },
  body: formData
});

// 6. Register handlers before applying the answer: track events fire while
//    setRemoteDescription() runs, so a handler attached afterwards misses them
dataChannel.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log('Event:', data.type);
};

pc.ontrack = (event) => {
  const [remoteStream] = event.streams;
  const audioElement = document.getElementById('audio');
  audioElement.srcObject = remoteStream;
};

// 7. Check the response, then set the SDP answer as the remote description
if (!callResponse.ok) {
  throw new Error(`Create call failed: ${callResponse.status}`);
}
const answerSdp = await callResponse.text();
await pc.setRemoteDescription({ type: 'answer', sdp: answerSdp });
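
The example above sends and plays audio only. To also receive video, the offer needs a video section, since an SDP answer cannot add media sections that the offer did not contain. One possible way to do this is sketched below, on the assumption that the service sends video to a receive-only transceiver. The function `setUpRemoteMedia` and both element parameters are hypothetical.

```javascript
// Sketch: request a receive-only video stream and route incoming tracks
// by kind. Call this before pc.createOffer(). `audioElement` and
// `videoElement` are <audio> and <video> elements supplied by the page.
function setUpRemoteMedia(pc, audioElement, videoElement) {
  // Adds an m=video section to the offer without sending local video.
  pc.addTransceiver('video', { direction: 'recvonly' });

  pc.ontrack = (event) => {
    const target = event.track.kind === 'video' ? videoElement : audioElement;
    // Wrap each track in its own stream so the <video> element does not
    // also play the audio track.
    target.srcObject = new MediaStream([event.track]);
  };
}

// Usage (browser):
// setUpRemoteMedia(pc, document.getElementById('audio'), document.getElementById('video'));
```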

Error Responses

Common error responses:

401 Unauthorized - Invalid or missing authentication
400 Bad Request - Invalid SDP offer or session configuration
503 Service Unavailable - No workers available
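
A client might handle these statuses as sketched below. Retrying on 503 with exponential backoff is an assumption, since no retry policy is documented here. `classifyCallError` and `createCallWithRetry` are hypothetical names, and `sendCall` stands for any function that performs the POST and returns a fetch Response.

```javascript
// Maps the documented status codes to a hint and a retry decision.
function classifyCallError(status) {
  switch (status) {
    case 400:
      return { retry: false, hint: 'Invalid SDP offer or session configuration' };
    case 401:
      return { retry: false, hint: 'Invalid or missing authentication' };
    case 503:
      return { retry: true, hint: 'No workers available' };
    default:
      return { retry: false, hint: `Unexpected status ${status}` };
  }
}

// Calls `sendCall` until it succeeds, retrying only while the server
// reports 503. Resolves with the SDP answer text.
async function createCallWithRetry(sendCall, maxAttempts = 3, baseDelayMs = 500) {
  for (let attempt = 1; ; attempt++) {
    const response = await sendCall();
    if (response.ok) return response.text();
    const { retry, hint } = classifyCallError(response.status);
    if (!retry || attempt >= maxAttempts) {
      throw new Error(`Create call failed (${response.status}): ${hint}`);
    }
    // Wait 500 ms, 1 s, 2 s, ... between attempts (with the default base).
    await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
  }
}
```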

Notes

The session is automatically created when the call is established
Worker allocation happens when the WebRTC connection is established
Audio is streamed via RTP (Opus codec)
Video is streamed via RTP (H.264 or VP8 codec)
Control events are sent via the WebRTC data channel
The session is automatically ended when the WebRTC connection closes
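
Because the session ends when the WebRTC connection closes, a client can end a call by closing its side of the connection. A minimal sketch follows. The function name `endCall` is hypothetical, and stopping local tracks is ordinary browser cleanup rather than something this API requires.

```javascript
// Sketch: end a call by closing the data channel, releasing the
// microphone, and closing the peer connection.
function endCall(pc, localStream, dataChannel) {
  if (dataChannel && dataChannel.readyState !== 'closed') {
    dataChannel.close();
  }
  if (localStream) {
    // Stopping the tracks releases the capture devices.
    localStream.getTracks().forEach((track) => track.stop());
  }
  pc.close();
}

// Usage (browser): endCall(pc, localStream, dataChannel);
```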