Assets Guide

Learn how to use faces, voices, and driving motions to customize your realtime avatar. You can use standard assets or upload your own custom assets.

Overview

rtAV supports three types of assets:

  • Faces - Visual appearance of the avatar
  • Voices - Audio voice for text-to-speech
  • Driving Motions - Animation style and gestures

Standard Assets

rtAV provides a variety of standard assets that you can use immediately. List available assets via the API:

List Standard Faces

// Browser JS / Node.js
const response = await fetch('https://api.rtav.io/v1/faces');
const { faces } = await response.json();
console.log('Available faces:', faces);

List Standard Voices

// Browser JS / Node.js
const response = await fetch('https://api.rtav.io/v1/voices');
const { voices } = await response.json();
console.log('Available voices:', voices);

List Standard Driving Motions

// Browser JS / Node.js
const response = await fetch('https://api.rtav.io/v1/driving');
const { driving } = await response.json();
console.log('Available driving motions:', driving);
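
The three list calls above are independent of each other, so they can be issued concurrently. The following is a minimal sketch, assuming the response keys shown above (faces, voices, driving). The names listStandardAssets, fetchList, and the injectable fetchImpl parameter are hypothetical helpers for illustration, not part of the rtAV API.

```javascript
const RTAV_API_BASE = 'https://api.rtav.io/v1';

// Hypothetical helper: GET one list endpoint and fail loudly on HTTP errors.
async function fetchList(path, fetchImpl) {
  const response = await fetchImpl(`${RTAV_API_BASE}/${path}`);
  if (!response.ok) {
    throw new Error(`GET /${path} failed with status ${response.status}`);
  }
  return response.json();
}

// Fetch faces, voices, and driving motions concurrently.
// fetchImpl defaults to the global fetch (browsers, Node.js 18+).
async function listStandardAssets(fetchImpl = fetch) {
  const [facesBody, voicesBody, drivingBody] = await Promise.all([
    fetchList('faces', fetchImpl),
    fetchList('voices', fetchImpl),
    fetchList('driving', fetchImpl),
  ]);
  return {
    faces: facesBody.faces,
    voices: voicesBody.voices,
    driving: drivingBody.driving,
  };
}
```

Passing fetchImpl explicitly makes the helper easy to exercise with a stub; in normal use, call listStandardAssets() with no arguments.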

Using Assets in Sessions

Specify assets when creating a session:

// Browser JS / Node.js
const response = await fetch('https://api.rtav.io/v1/sessions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer rtav_ak_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-5.2',
    voice: 'alice',        // Standard voice ID
    face: 'face1',         // Standard face ID
    driving: 'idle'        // Standard driving motion ID
  })
});
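
The request above does not inspect the result. As a hedged sketch, the wrapper below builds the same request and surfaces HTTP errors. The function names buildSessionRequest and createSession and the fetchImpl parameter are hypothetical. The sketch also assumes the success response is JSON; its fields are not specified in this guide, so the body is returned unchanged.

```javascript
const SESSIONS_URL = 'https://api.rtav.io/v1/sessions';

// Build the fetch options for a session request (pure, so easy to test).
// JSON.stringify drops properties whose value is undefined, so any asset
// field the caller leaves out is simply not sent.
function buildSessionRequest(apiKey, { model, voice, face, driving }) {
  return {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model, voice, face, driving }),
  };
}

// Hypothetical wrapper: create a session and throw on non-2xx responses.
async function createSession(apiKey, options, fetchImpl = fetch) {
  const response = await fetchImpl(SESSIONS_URL, buildSessionRequest(apiKey, options));
  if (!response.ok) {
    // The error body format is not documented here, so include it as raw text.
    const detail = await response.text();
    throw new Error(`Session creation failed (${response.status}): ${detail}`);
  }
  return response.json(); // assumption: the success body is JSON
}
```

A call would look like createSession(apiKey, { model: 'gpt-5.2', voice: 'alice', face: 'face1', driving: 'idle' }), with the key loaded from configuration rather than hard-coded.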

Updating Assets During Session

Update assets dynamically via session.update. You can change voice, face, or driving motion at any time during an active session:

// Browser JavaScript - WebSocket
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    voice: 'anthony',     // Change voice
    face: 'face2',        // Change face
    driving: 'SmileTeeth' // Change driving motion
  }
}));

// Browser JavaScript - WebRTC (via data channel)
dataChannel.send(JSON.stringify({
  type: 'session.update',
  session: {
    voice: 'anthony',
    face: 'face2',
    driving: 'SmileTeeth'
  }
}));
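
Because the message is identical on both transports, the two snippets above can share one helper. This is a minimal sketch: sendSessionUpdate is a hypothetical name, and the helper deliberately limits itself to the three asset fields covered in this guide.

```javascript
// Hypothetical helper: send a session.update over any open transport that
// exposes send(), such as a WebSocket or an RTCDataChannel.
function sendSessionUpdate(channel, changes) {
  // WebSocket.readyState is numeric (1 means OPEN);
  // RTCDataChannel.readyState is the string 'open'.
  const isOpen = channel.readyState === 1 || channel.readyState === 'open';
  if (!isOpen) {
    throw new Error('Transport is not open; cannot send session.update');
  }

  // Copy only the asset fields that were actually provided.
  const session = {};
  for (const key of ['voice', 'face', 'driving']) {
    if (changes[key] !== undefined) session[key] = changes[key];
  }
  if (Object.keys(session).length === 0) {
    throw new Error('Nothing to update: pass voice, face, or driving');
  }

  channel.send(JSON.stringify({ type: 'session.update', session }));
  return session;
}
```

With this in place, sendSessionUpdate(ws, { voice: 'anthony' }) and sendSessionUpdate(dataChannel, { voice: 'anthony' }) send the same payload.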

Driving Motion Sequences

You can set the driving motion to a single motion or to a sequence of motions. When you send a sequence, the avatar transitions smoothly between motions, plays each intermediate motion once, and loops the final motion.

Single Driving Change

Change to a single driving motion (permanent change):

// Browser JavaScript
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    driving: 'Wink'  // Single string - transitions to Wink and loops
  }
}));

Driving Sequence (Multiple Motions)

Send an array of driving motions to create a sequence. The avatar will:

  • Transition from the current motion to the first motion in the sequence
  • Play all intermediate motions once (in order)
  • Transition between intermediate motions
  • Loop the final motion continuously

Permanent Change with Intermediate

Play an intermediate motion, then transition to a final motion:

// Browser JavaScript
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    driving: ['AgreeYesTotaly', 'default']
    // Plays AgreeYesTotaly once, then transitions to default and loops
  }
}));

Repeating Intermediate Motion

Repeat an intermediate motion multiple times, then transition to a final motion:

// Browser JavaScript
ws.send(JSON.stringify({
  type: 'session.update',
  session: {
    driving: ['Wink', 'Wink', 'Wink', 'default']
    // Plays Wink 3 times, then transitions to default and loops
  }
}));

Note: All transitions are smooth and real-time. The system uses motion template frames (not pre-rendered video) to ensure low latency and seamless transitions.
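
The sequence rules in this section can be modeled on the client to preview what a given driving value will do. This is an illustrative sketch only: planDrivingSequence and repeatThen are hypothetical helpers, and the validation reflects a client-side guess at sensible input, not documented server behavior.

```javascript
// Model how a `driving` value is interpreted: every motion except the last
// plays once, and the last motion loops. A single string loops immediately.
function planDrivingSequence(driving) {
  const motions = Array.isArray(driving) ? driving : [driving];
  if (motions.length === 0 || motions.some((m) => typeof m !== 'string' || m === '')) {
    throw new TypeError('driving must be a motion ID or a non-empty array of motion IDs');
  }
  return {
    playOnce: motions.slice(0, -1),
    loop: motions[motions.length - 1],
  };
}

// Build a sequence that plays `motion` a given number of times,
// then settles on `finalMotion`.
function repeatThen(motion, times, finalMotion) {
  if (!Number.isInteger(times) || times < 1) {
    throw new RangeError('times must be a positive integer');
  }
  return [...Array(times).fill(motion), finalMotion];
}
```

For example, repeatThen('Wink', 3, 'default') produces the same array as the repeating example above, and planDrivingSequence on that array reports three one-shot Wink motions followed by a looping default.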

Custom Assets

Upload your own custom assets via the platform dashboard:

  1. Navigate to Assets in your dashboard
  2. Upload your asset (image for face, audio for voice, video for driving motion)
  3. Wait for processing to complete
  4. Use the asset UUID in your sessions

Using Custom Assets

Pass the UUID of a custom asset in place of a standard asset ID:

// Browser JS / Node.js
const response = await fetch('https://api.rtav.io/v1/sessions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer rtav_ak_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-5.2',
    voice: '550e8400-e29b-41d4-a716-446655440000',  // Custom voice UUID
    face: '660e8400-e29b-41d4-a716-446655440001',    // Custom face UUID
    driving: '770e8400-e29b-41d4-a716-446655440002'   // Custom driving UUID
  })
});
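
Since custom assets are referenced by UUID and standard assets by name, a client can tell the two apart by shape, for example to label them in a UI. This is a sketch under the assumption that standard asset IDs are never UUID-shaped; isCustomAssetId is a hypothetical helper, and the API remains the authority on which IDs are valid.

```javascript
// Matches the canonical 8-4-4-4-12 hexadecimal UUID layout (any version).
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Returns true when the ID looks like a custom asset UUID.
function isCustomAssetId(id) {
  return typeof id === 'string' && UUID_PATTERN.test(id);
}
```

With this check, isCustomAssetId('550e8400-e29b-41d4-a716-446655440000') is true, while isCustomAssetId('alice') is false.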

Asset Requirements

Face Images

  • Format: JPEG or PNG
  • Recommended size: 512x512 pixels or larger
  • Clear, front-facing portrait
  • Good lighting and contrast

Voice Audio

  • Format: WAV, MP3, or other common audio formats
  • Duration: 10-60 seconds recommended
  • Clear speech, minimal background noise
  • Single speaker

Driving Motion Videos

  • Format: MP4 or other common video formats
  • Duration: 5-30 seconds recommended
  • Front-facing subject
  • Clear facial expressions and movements
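
The recommendations above can be checked on the client before uploading. The sketch below is one possible shape for such a check: checkAsset, ASSET_GUIDELINES, and the info fields (mimeType, width, height, durationSeconds) are hypothetical names, and the caller is assumed to have measured those values already. Audio and video formats are not checked because the guide accepts "other common" formats.

```javascript
// Limits taken from the requirement lists above.
const ASSET_GUIDELINES = {
  face: { mimeTypes: ['image/jpeg', 'image/png'], minWidth: 512, minHeight: 512 },
  voice: { minSeconds: 10, maxSeconds: 60 },
  driving: { minSeconds: 5, maxSeconds: 30 },
};

// Returns a list of human-readable warnings; an empty list means the
// asset is within the recommended ranges that this sketch checks.
function checkAsset(kind, info) {
  const rules = ASSET_GUIDELINES[kind];
  if (!rules) throw new Error(`Unknown asset kind: ${kind}`);
  const warnings = [];

  if (rules.mimeTypes && !rules.mimeTypes.includes(info.mimeType)) {
    warnings.push(`format ${info.mimeType} is not JPEG or PNG`);
  }
  if (rules.minWidth !== undefined &&
      (info.width < rules.minWidth || info.height < rules.minHeight)) {
    warnings.push(`image is ${info.width}x${info.height}; 512x512 or larger is recommended`);
  }
  if (rules.minSeconds !== undefined &&
      (info.durationSeconds < rules.minSeconds || info.durationSeconds > rules.maxSeconds)) {
    warnings.push(
      `duration ${info.durationSeconds}s is outside the recommended ` +
      `${rules.minSeconds}-${rules.maxSeconds}s`
    );
  }
  return warnings;
}
```

The durations are recommendations rather than hard limits, so the function returns warnings instead of throwing. Qualitative criteria such as lighting, background noise, and a front-facing subject still need a human check.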

Popular Standard Assets

Voices

  • default - Default voice
  • alice - Female voice
  • anthony - Male voice
  • alloy - Neutral voice
  • echo - Clear, professional voice
  • nova - Warm, friendly voice

Driving Motions

  • idle - Idle animation
  • IdleListeningNatural - Natural listening pose
  • AgreeYesTotaly - Agreeing gesture
  • SmileTeeth - Smiling expression
  • Thinking - Thinking pose
  • Surprised - Surprised expression

Next Steps