Models & Prompting

Learn about the available LLM models and how to prompt them effectively for realtime conversations with your avatar.

Available Models

rtAV supports models from OpenAI and Anthropic. Check availability via the Models API.
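As a sketch of querying availability (the `/v1/models` path and the response shape used here are assumptions, not confirmed by this page), you might list models and filter them by provider like this:

```javascript
// Hypothetical sketch: fetch the model list from the Models API.
// Assumes GET https://api.rtav.io/v1/models returns { data: [{ id, provider }, ...] }.
async function listModels(apiKey) {
  const res = await fetch('https://api.rtav.io/v1/models', {
    headers: { 'Authorization': `Bearer ${apiKey}` }
  });
  return res.json();
}

// Pick out the model IDs for one provider from the (assumed) response shape.
function modelsForProvider(models, provider) {
  return models.data
    .filter(m => m.provider === provider)
    .map(m => m.id);
}
```

You could then call `modelsForProvider(await listModels(apiKey), 'openai')` before creating a session to confirm the model you plan to use is available.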

OpenAI Models

  • gpt-5.2 - Latest GPT-5 model (recommended)
  • gpt-5.1 - GPT-5.1 model
  • gpt-5 - GPT-5 model (default)
  • gpt-5-mini - Smaller, faster GPT-5 model
  • gpt-5-nano - Smallest GPT-5 model
  • gpt-4.1 - GPT-4.1 model
  • gpt-4o - GPT-4o (omni) model

Anthropic Models

  • claude-sonnet-4-5 - Claude Sonnet 4.5 (balanced performance)
  • claude-haiku-4-5 - Claude Haiku 4.5 (fastest)
  • claude-opus-4-5 - Claude Opus 4.5 (most capable)

Open-Loop Mode (No LLM)

For use cases where you want STT (speech-to-text) and TTS+video generation without an LLM in the loop, use open-loop mode. The worker still transcribes user speech, but video is generated directly from text you supply rather than from LLM output.

  • none - Disable LLM, enable STT and TTS+video
  • disable - Disable LLM, enable STT and TTS+video
  • open-loop - Disable LLM, enable STT and TTS+video

Note: In open-loop mode, you must send text directly to the worker using conversation.item.create with role: "assistant" or via Generate events. The LLM will not generate responses automatically.

Setting the Model

Set the model when creating a session:

// Browser JS / Node.js
const response = await fetch('https://api.rtav.io/v1/sessions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer rtav_ak_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-5.2',  // Set your preferred model
    voice: 'default',
    instructions: 'You are a helpful assistant.'
  })
});

You can also update the model dynamically via session.update event.

System Instructions

System instructions define the avatar's personality, role, and behavior. They're sent as part of the session configuration:

{
  "instructions": "You are a friendly customer support agent. Be helpful, patient, and professional. Always greet customers warmly and ask how you can assist them today."
}

Best Practices

  • Be specific: Clearly define the avatar's role and personality
  • Set tone: Specify the desired communication style (formal, casual, friendly, etc.)
  • Define boundaries: Set limits on what the avatar can and cannot do
  • Include context: Provide relevant background information about the use case
  • Keep it concise: Long instructions may reduce response quality

Example Instructions

// Customer Support
"You are a helpful customer support agent. Answer questions clearly and concisely. If you don't know something, offer to connect the customer with a human agent."

// Educational Tutor
"You are a patient and encouraging tutor. Explain concepts step-by-step. Use examples to help students understand. Ask questions to check understanding."

// Sales Assistant
"You are a knowledgeable sales assistant. Help customers find the right products. Be enthusiastic but not pushy. Focus on understanding customer needs."

// Technical Support
"You are a technical support specialist. Help users troubleshoot issues systematically. Ask clarifying questions to understand the problem. Provide step-by-step solutions."

Updating Instructions

Update instructions dynamically during a session:

// Browser JavaScript - WebSocket
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    model: 'gpt-5.2',
    instructions: 'You are now a coding assistant. Help with programming questions.',
    voice: 'default'
  }
}));

// Browser JavaScript - WebRTC Data Channel
dataChannel.send(JSON.stringify({
  type: 'session.update',
  session: {
    type: 'realtime',
    model: 'gpt-5.2',
    instructions: 'You are now a coding assistant. Help with programming questions.',
    voice: 'default'
  }
}));

Model Selection Guide

For General Conversations

  • gpt-5 - Best balance of quality and speed (the default)
  • gpt-5-mini - Faster responses, slightly lower quality
  • claude-sonnet-4-5 - Good alternative with different strengths

For Complex Tasks

  • gpt-5.2 - Latest and most capable
  • claude-opus-4-5 - Excellent for complex reasoning

For Low Latency

  • gpt-5-nano - Fastest responses
  • gpt-5-mini - Fast with good quality
  • claude-haiku-4-5 - Very fast alternative

For Custom Text Processing (Open-Loop)

  • none / disable / open-loop - Use when you handle text processing yourself and only need STT transcription plus TTS+video generation
  • Ideal for scenarios where you want to apply your own logic, filters, or processing before generating video responses
  • Perfect for integration with external systems that provide the text to speak

Using Open-Loop Mode

Open-loop mode allows you to use STT for speech transcription and TTS+video generation without LLM processing. This is useful when you want to handle text generation yourself or integrate with external systems.

Setting Open-Loop Mode

Set the model to one of the open-loop values when creating a session:

// Browser JS / Node.js
const response = await fetch('https://api.rtav.io/v1/sessions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer rtav_ak_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'none',  // or 'disable' or 'open-loop'
    voice: 'default',
    instructions: 'You are a helpful assistant.'  // Ignored in open-loop mode
  })
});

Sending Text in Open-Loop Mode

In open-loop mode, you need to send text directly to generate video responses. You can do this using conversation.item.create with role: "assistant":

// Browser JavaScript - WebSocket
// User speech is automatically transcribed via STT
// Then send your processed text to generate video

ws.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'assistant',  // Important: use 'assistant' for open-loop
    content: [
      {
        type: 'input_text',
        text: 'Hello! I can hear you clearly.'  // Your processed text
      }
    ]
  }
}));

// Browser JavaScript - WebRTC Data Channel
dataChannel.send(JSON.stringify({
  type: 'conversation.item.create',
  item: {
    type: 'message',
    role: 'assistant',
    content: [
      {
        type: 'input_text',
        text: 'Hello! I can hear you clearly.'
      }
    ]
  }
}));

Use Cases for Open-Loop Mode

  • Custom logic processing: Apply your own text processing, filtering, or business rules before generating video responses
  • External system integration: Integrate with external APIs or services that provide the text to speak
  • Rule-based responses: Use rule-based systems instead of LLM for deterministic responses
  • Translation workflows: Translate user input and generate video in a different language
  • Content moderation: Filter or modify user input before generating video responses
  • Testing and debugging: Test TTS and video generation without LLM variability

Tip: In open-loop mode, STT transcription still works automatically. User speech is transcribed and you receive conversation.item.input_audio_transcription.completed events with the transcript. You can then process this transcript and send your response text.
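Putting the tip above together, a minimal open-loop pipeline can be sketched as follows. `processTranscript` is a hypothetical placeholder for your own logic (translation, moderation, rule-based responses), and the wiring assumes the WebSocket transport shown earlier:

```javascript
// Placeholder for your own processing: translation, filtering, rules, etc.
function processTranscript(transcript) {
  return `You said: ${transcript}`;  // replace with real logic
}

// Build the conversation.item.create event that makes the avatar speak
// the processed text (role 'assistant', as required in open-loop mode).
function buildAssistantItem(transcript) {
  return {
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: 'assistant',
      content: [
        { type: 'input_text', text: processTranscript(transcript) }
      ]
    }
  };
}

// Wire it to the WebSocket: react to each completed STT transcript.
// ws.onmessage = (msg) => {
//   const event = JSON.parse(msg.data);
//   if (event.type === 'conversation.item.input_audio_transcription.completed') {
//     ws.send(JSON.stringify(buildAssistantItem(event.transcript)));
//   }
// };
```

The same `buildAssistantItem` payload can be sent over the WebRTC data channel with `dataChannel.send(JSON.stringify(...))`. Note that `event.transcript` as the field carrying the transcript text is an assumption; check the event payload your worker actually emits.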

Next Steps