Models & Prompting

Learn about the available LLM models and how to prompt them effectively for realtime conversations with your avatar.

Available Models

rtAV supports models from OpenAI and Anthropic. Check availability via the Models API.
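As a sketch of querying availability (the `/v1/models` path and the response shape used here are assumptions, not confirmed by this page), you might list models and filter them by provider like this:

```javascript
// Hypothetical sketch: fetch the model list from the Models API.
// Assumes GET https://api.rtav.io/v1/models returns { data: [{ id, provider }, ...] }.
async function listModels(apiKey) {
  const res = await fetch('https://api.rtav.io/v1/models', {
    headers: { 'Authorization': `Bearer ${apiKey}` }
  });
  return res.json();
}

// Pick out the model IDs for one provider from the (assumed) response shape.
function modelsForProvider(models, provider) {
  return models.data
    .filter(m => m.provider === provider)
    .map(m => m.id);
}
```

You could then call `modelsForProvider(await listModels(apiKey), 'openai')` before creating a session to confirm the model you plan to use is available.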

OpenAI Models

  • gpt-5.2 - Latest GPT-5 model (recommended)
  • gpt-5.1 - GPT-5.1 model
  • gpt-5 - GPT-5 model (default)
  • gpt-5-mini - Smaller, faster GPT-5 model
  • gpt-5-nano - Smallest GPT-5 model
  • gpt-4.1 - GPT-4.1 model
  • gpt-4o - GPT-4o (omni) model

Anthropic Models

  • claude-sonnet-4-5 - Claude Sonnet 4.5 (balanced performance)
  • claude-haiku-4-5 - Claude Haiku 4.5 (fastest)
  • claude-opus-4-5 - Claude Opus 4.5 (most capable)

Open-Loop Mode (No LLM)

For use cases where you want STT (speech-to-text) and TTS+video generation without an LLM in the loop, use open-loop mode. The worker still transcribes user speech, but video is generated directly from text you supply rather than from LLM output.

  • none - Disable LLM, enable STT and TTS+video
  • disable - Disable LLM, enable STT and TTS+video
  • open-loop - Disable LLM, enable STT and TTS+video

Note: In open-loop mode, you must send text directly to the worker using conversation.item.create with role: "assistant" or via Generate events. The LLM will not generate responses automatically.

Setting the Model

Set the model when creating a session:

// Browser JS / Node.js
const response = await fetch('https://api.rtav.io/v1/sessions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer rtav_ak_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-5.2',  // Set your preferred model
    voice: 'default',
    instructions: 'You are a helpful assistant.'
  })
});

You can also update the model dynamically via session.update event.

System Instructions

System instructions define the avatar's personality, role, and behavior. They're sent as part of the session configuration:

{
  "instructions": "You are a friendly customer support agent. Be helpful, patient, and professional. Always greet customers warmly and ask how you can assist them today."
}

Best Practices

  • Be specific: Clearly define the avatar's role and personality
  • Set tone: Specify the desired communication style (formal, casual, friendly, etc.)
  • Define boundaries: Set limits on what the avatar can and cannot do
  • Include context: Provide relevant background information about the use case
  • Keep it concise: Long instructions may reduce response quality

Example Instructions

// Customer Support
"You are a helpful customer support agent. Answer questions clearly and concisely. If you don't know something, offer to connect the customer with a human agent."

// Educational Tutor
"You are a patient and encouraging tutor. Explain concepts step-by-step. Use examples to help students understand. Ask questions to check understanding."

// Sales Assistant
"You are a knowledgeable sales assistant. Help customers find the right products. Be enthusiastic but not pushy. Focus on understanding customer needs."

// Technical Support
"You are a technical support specialist. Help users troubleshoot issues systematically. Ask clarifying questions to understand the problem. Provide step-by-step solutions."

Updating Instructions

Update instructions dynamically during a session:

// Browser JavaScript - WebSocket
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    model: 'gpt-5.2',
    instructions: 'You are now a coding assistant. Help with programming questions.',
    voice: 'default'
  }
}));

// Browser JavaScript - WebRTC Data Channel
dataChannel.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    model: 'gpt-5.2',
    instructions: 'You are now a coding assistant. Help with programming questions.',
    voice: 'default'
  }
}));

Model Selection Guide

For General Conversations

  • gpt-5 - Best balance of quality and speed (the default)
  • gpt-5-mini - Faster responses, slightly lower quality
  • claude-sonnet-4-5 - Good alternative with different strengths

For Complex Tasks

  • gpt-5.2 - Latest and most capable
  • claude-opus-4-5 - Excellent for complex reasoning

For Low Latency

  • gpt-5-nano - Fastest responses
  • gpt-5-mini - Fast with good quality
  • claude-haiku-4-5 - Very fast alternative

For Custom Text Processing (Open-Loop)

  • none / disable / open-loop - Use when you handle text processing yourself and only need STT transcription plus TTS+video generation
  • Ideal for scenarios where you want to apply your own logic, filters, or processing before generating video responses
  • Perfect for integration with external systems that provide the text to speak

Using Open-Loop Mode

Open-loop mode allows you to use STT for speech transcription and TTS+video generation without LLM processing. This is useful when you want to handle text generation yourself or integrate with external systems.

Setting Open-Loop Mode

Set the model to one of the open-loop values when creating a session:

// Browser JS / Node.js
const response = await fetch('https://api.rtav.io/v1/sessions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer rtav_ak_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'none',  // or 'disable' or 'open-loop'
    voice: 'default',
    instructions: 'You are a helpful assistant.'  // Ignored in open-loop mode
  })
});

Sending Text in Open-Loop Mode

In open-loop mode, you need to send text directly to generate video responses. You can do this using conversation.item.create with role: "assistant":

// Browser JavaScript - WebSocket
// User speech is automatically transcribed via STT
// Then send your processed text to generate video

ws.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'assistant',  // Important: use 'assistant' for open-loop
    content: [
      {
        type: 'input_text',
        text: 'Hello! I can hear you clearly.'  // Your processed text
      }
    ]
  }
}));

// Browser JavaScript - WebRTC Data Channel
dataChannel.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'assistant',
    content: [
      {
        type: 'input_text',
        text: 'Hello! I can hear you clearly.'
      }
    ]
  }
}));

Use Cases for Open-Loop Mode

  • Custom logic processing: Apply your own text processing, filtering, or business rules before generating video responses
  • External system integration: Integrate with external APIs or services that provide the text to speak
  • Rule-based responses: Use rule-based systems instead of LLM for deterministic responses
  • Translation workflows: Translate user input and generate video in a different language
  • Content moderation: Filter or modify user input before generating video responses
  • Testing and debugging: Test TTS and video generation without LLM variability

Tip: In open-loop mode, STT transcription still works automatically. User speech is transcribed and you receive conversation.item.input_audio_transcription.completed events with the transcript. You can then process this transcript and send your response text.
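Putting the tip above together, a minimal open-loop pipeline can be sketched as follows. `processTranscript` is a hypothetical placeholder for your own logic (translation, moderation, rule-based responses), and the wiring assumes the WebSocket transport shown earlier:

```javascript
// Placeholder for your own processing: translation, filtering, rules, etc.
function processTranscript(transcript) {
  return `You said: ${transcript}`;  // replace with real logic
}

// Build the conversation.item.create event that makes the avatar speak
// the processed text (role 'assistant', as required in open-loop mode).
function buildAssistantItem(transcript) {
  return {
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: 'assistant',
      content: [
        { type: 'input_text', text: processTranscript(transcript) }
      ]
    }
  };
}

// Wire it to the WebSocket: react to each completed STT transcript.
// ws.onmessage = (msg) => {
//   const event = JSON.parse(msg.data);
//   if (event.type === 'conversation.item.input_audio_transcription.completed') {
//     ws.send(JSON.stringify(buildAssistantItem(event.transcript)));
//   }
// };
```

The same `buildAssistantItem` payload can be sent over the WebRTC data channel with `dataChannel.send(JSON.stringify(...))`. Note that `event.transcript` as the field carrying the transcript text is an assumption; check the event payload your worker actually emits.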

Next Steps