Beta Feature: The ElevenLabs integration uses Anam’s audio passthrough mode, which is currently in beta. APIs may change as we continue to improve the integration.
Combine ElevenLabs Conversational AI with Anam avatars to create voice agents with real-time lip-sync. ElevenLabs handles the intelligence (speech recognition, LLM, and voice synthesis), while Anam provides the visual presence.

View Example: full source code for the ElevenLabs conversational agent with Anam avatar.

How It Works

The integration uses Anam’s audio passthrough mode, where Anam renders an avatar that lip-syncs to audio you provide—without using Anam’s own AI or microphone input.
Bring Your Own Voice: ElevenLabs provides voice synthesis. Anam adds the visual layer—combining both services in a single experience.
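
In practice, audio flows one way through the stack: the microphone feeds ElevenLabs, and the synthesized reply is forwarded to Anam for rendering:

Microphone → ElevenLabs (STT → LLM → TTS) → base64 audio chunks → Anam avatar (lip-synced video)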

Quick Start

Prerequisites

  • ElevenLabs account with a configured Conversational AI agent
  • Anam account with API access
  • Node.js or Bun runtime
  • Modern browser with WebRTC support (Chrome, Firefox, Safari, Edge)

Installation

npm install @anam-ai/js-sdk chatdio

Basic Integration

Here’s the core pattern for connecting ElevenLabs to Anam:
import { createClient } from "@anam-ai/js-sdk";

// 1. Create Anam client with audio passthrough session
const anamClient = createClient(sessionToken, {
  disableInputAudio: true, // ElevenLabs handles microphone
});
await anamClient.streamToVideoElement("video-element");

// 2. Create agent audio input stream
const audioInputStream = anamClient.createAgentAudioInputStream({
  encoding: "pcm_s16le",
  sampleRate: 16000,
  channels: 1,
});

// 3. Connect to ElevenLabs and forward audio
const ws = new WebSocket(`wss://api.elevenlabs.io/v1/convai/conversation?agent_id=${agentId}`);

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);

  if (msg.type === "audio" && msg.audio_event?.audio_base_64) {
    // Forward audio chunks to Anam for lip-sync
    audioInputStream.sendAudioChunk(msg.audio_event.audio_base_64);
  }

  if (msg.type === "agent_response") {
    // Signal end of audio sequence
    audioInputStream.endSequence();
  }

  if (msg.type === "interruption") {
    // Handle barge-in
    audioInputStream.endSequence();
  }
};

Full Example

Project Structure

src/
├── client.ts          # Main client orchestration
├── elevenlabs.ts      # ElevenLabs WebSocket handling
└── routes/
    └── api/
        └── config.ts  # Server-side session token endpoint

Server: Create Anam Session

Your server creates an Anam session token with enableAudioPassthrough: true:
config.ts
const response = await fetch("https://api.anam.ai/v1/auth/session-token", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${ANAM_API_KEY}`,
  },
  body: JSON.stringify({
    personaConfig: {
      avatarId: AVATAR_ID,
      enableAudioPassthrough: true, // Enable external audio input
    },
  }),
});

const { sessionToken } = await response.json();
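
The snippet above shows only the token request. Here is a minimal sketch of the full /api/config handler that the client fetches later, assuming a file-based router with a GET export (the export convention and Response.json usage are assumptions; adapt to your framework) and credentials read from the environment variables in the Configuration section:

// Illustrative /api/config handler (GET export convention assumed).
const ANAM_API_KEY = process.env.ANAM_API_KEY!;
const AVATAR_ID = process.env.ANAM_AVATAR_ID!;
const ELEVENLABS_AGENT_ID = process.env.ELEVENLABS_AGENT_ID!;

export async function GET(): Promise<Response> {
  // Create the passthrough session token server-side so the API key
  // never reaches the browser.
  const response = await fetch("https://api.anam.ai/v1/auth/session-token", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${ANAM_API_KEY}`,
    },
    body: JSON.stringify({
      personaConfig: {
        avatarId: AVATAR_ID,
        enableAudioPassthrough: true,
      },
    }),
  });
  const { sessionToken } = await response.json();

  // Shape matches what client.ts destructures from /api/config.
  return Response.json({
    anamSessionToken: sessionToken,
    elevenLabsAgentId: ELEVENLABS_AGENT_ID,
  });
}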

Client: ElevenLabs Module

Handle the WebSocket connection and microphone capture:
elevenlabs.ts
import { MicrophoneCapture, arrayBufferToBase64 } from "chatdio";

const SAMPLE_RATE = 16000;

export interface ElevenLabsCallbacks {
  onReady?: () => void;
  onAudio?: (base64Audio: string) => void;
  onUserTranscript?: (text: string) => void;
  onAgentResponse?: (text: string) => void;
  onInterrupt?: () => void;
  onDisconnect?: () => void;
}

export async function connectElevenLabs(agentId: string, callbacks: ElevenLabsCallbacks) {
  const ws = new WebSocket(`wss://api.elevenlabs.io/v1/convai/conversation?agent_id=${agentId}`);

  // Set up microphone capture
  const mic = new MicrophoneCapture({
    sampleRate: SAMPLE_RATE,
    echoCancellation: true,
    noiseSuppression: true,
  });

  mic.on("data", (data: ArrayBuffer) => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(
        JSON.stringify({
          user_audio_chunk: arrayBufferToBase64(data),
        })
      );
    }
  });

  ws.onopen = async () => {
    await mic.start();
    callbacks.onReady?.();
  };

  ws.onmessage = (event) => {
    const msg = JSON.parse(event.data);

    switch (msg.type) {
      case "audio":
        callbacks.onAudio?.(msg.audio_event.audio_base_64);
        break;
      case "agent_response":
        callbacks.onAgentResponse?.(msg.agent_response_event.agent_response);
        break;
      case "user_transcript":
        callbacks.onUserTranscript?.(msg.user_transcription_event.user_transcript);
        break;
      case "interruption":
        callbacks.onInterrupt?.();
        break;
      case "ping":
        ws.send(JSON.stringify({ type: "pong", event_id: msg.ping_event.event_id }));
        break;
    }
  };

  ws.onclose = () => {
    mic.stop();
    callbacks.onDisconnect?.();
  };
}
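
As written, connectElevenLabs doesn’t hand back the socket or microphone, so the caller has no way to end the conversation. If you need teardown, one option (a sketch, not part of either SDK) is to return a handle at the end of the function:

// Hypothetical addition at the end of connectElevenLabs:
return {
  // onclose above already stops the mic and fires onDisconnect.
  disconnect: () => ws.close(),
};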

Client: Main Integration

Wire everything together:
client.ts
import { createClient } from "@anam-ai/js-sdk";
import { connectElevenLabs } from "./elevenlabs";

async function startConversation() {
  // Get session config from your server
  const { anamSessionToken, elevenLabsAgentId } = await fetch("/api/config").then((r) => r.json());

  // Initialize Anam avatar (disable input audio since ElevenLabs handles mic)
  const anamClient = createClient(anamSessionToken, {
    disableInputAudio: true,
  });
  await anamClient.streamToVideoElement("anam-video");

  // Create agent audio input stream
  const audioInputStream = anamClient.createAgentAudioInputStream({
    encoding: "pcm_s16le",
    sampleRate: 16000,
    channels: 1,
  });

  // Connect to ElevenLabs
  await connectElevenLabs(elevenLabsAgentId, {
    onAudio: (audio) => {
      audioInputStream.sendAudioChunk(audio);
    },
    onAgentResponse: () => {
      audioInputStream.endSequence();
    },
    onInterrupt: () => {
      audioInputStream.endSequence();
    },
  });
}
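
client.ts assumes a video element with id anam-video already exists on the page. If you bootstrap the page from script instead, a minimal sketch (the attributes here are assumptions, except the anam-video id used above):

// Create the video element that streamToVideoElement("anam-video") targets.
const video = document.createElement("video");
video.id = "anam-video";
video.autoplay = true;
video.playsInline = true; // needed for inline playback on iOS Safari
document.body.appendChild(video);

startConversation().catch((err) => console.error("Failed to start conversation:", err));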

Configuration

Environment Variables

1. Get your API credentials

You’ll need credentials from both services:

| Service | Where to get it |
| --- | --- |
| Anam | lab.anam.ai → Settings → API Keys |
| ElevenLabs | elevenlabs.io → Agents |

2. Set environment variables

.env
# Anam credentials
ANAM_API_KEY=your_anam_api_key
ANAM_AVATAR_ID=your_avatar_id

# ElevenLabs credentials
ELEVENLABS_AGENT_ID=your_agent_id

ElevenLabs Agent Setup

When configuring your ElevenLabs agent, set the output audio format to match Anam’s expectations:
| Setting | Value |
| --- | --- |
| Format | PCM 16-bit |
| Sample Rate | 16000 Hz |
| Channels | Mono |
Mismatched audio formats will cause lip-sync issues. Ensure your ElevenLabs agent outputs PCM16 at 16kHz.
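
ElevenLabs already delivers base64-encoded PCM16 when configured as above, so no conversion is needed in this integration. If you ever feed Anam audio from another source that produces floating-point samples (for example a Web Audio node), you’ll need to encode it first; a hypothetical helper, not part of either SDK:

// Encode Float32 samples (range -1..1) as base64 pcm_s16le, the only
// encoding createAgentAudioInputStream accepts.
function floatTo16BitPcmBase64(samples: Float32Array): string {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff; // scale to the 16-bit range
  }
  let binary = "";
  const bytes = new Uint8Array(pcm.buffer);
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary); // browser base64; use Buffer.from(...).toString("base64") in Node
}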


Audio Passthrough API

createAgentAudioInputStream()

Creates a stream for sending audio chunks to the avatar for lip-sync.
const audioInputStream = anamClient.createAgentAudioInputStream({
  encoding: "pcm_s16le",
  sampleRate: 16000,
  channels: 1,
});
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| encoding | string | Yes | Audio encoding format. Only pcm_s16le (16-bit signed little-endian PCM) is supported. |
| sampleRate | number | Yes | Sample rate in Hz. Should match your ElevenLabs agent output (typically 16000). |
| channels | number | Yes | Number of audio channels. Use 1 for mono. |

sendAudioChunk()

Send a base64-encoded audio chunk for lip-sync rendering.
audioInputStream.sendAudioChunk(base64AudioData);
Audio chunks can be sent faster than realtime. Anam buffers them internally and renders lip-sync at the correct pace.
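
For example, if you already have a clip pre-encoded as base64 pcm_s16le chunks (clipChunks below is hypothetical), you can push the whole thing at once and let Anam pace the rendering:

// Anam buffers internally, so sending faster than realtime is fine.
for (const chunk of clipChunks) {
  audioInputStream.sendAudioChunk(chunk);
}
audioInputStream.endSequence(); // mark the end of this utterance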

endSequence()

Signal that the current audio sequence has ended. This helps Anam optimize lip-sync timing and handle transitions.
audioInputStream.endSequence();
Call this when:
  • ElevenLabs sends the agent_response event (agent finished speaking)
  • ElevenLabs sends the interruption event (user barged in)

Handling Interruptions

When a user speaks while the agent is talking (barge-in), ElevenLabs sends an interruption event. Handle it by ending the current audio sequence:
onInterrupt: () => {
  audioInputStream.endSequence();
},
This signals Anam to stop the current lip-sync animation and prepare for new audio.

Performance Considerations

Latency

This integration combines two real-time services, which adds latency compared to using Anam’s turnkey solution:
| Path | Typical latency |
| --- | --- |
| User speech → ElevenLabs STT | 200-400 ms |
| ElevenLabs LLM processing | 300-800 ms |
| ElevenLabs TTS → Anam avatar | 100-200 ms |
| Total end-to-end | 600-1400 ms |
For lower latency requirements, consider using Anam’s turnkey solution which handles STT, LLM, and TTS in an optimized pipeline.
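
To see where your deployment lands in that range, you can timestamp turns on the client, for example from the user’s transcript arriving to the first audio chunk of the reply (a rough probe covering the LLM and TTS legs, built on the ElevenLabsCallbacks interface above):

// Rough first-audio latency probe using the callbacks from elevenlabs.ts.
let turnStartMs = 0;

const probeCallbacks: ElevenLabsCallbacks = {
  onUserTranscript: () => {
    turnStartMs = performance.now(); // user's turn was transcribed
  },
  onAudio: () => {
    if (turnStartMs > 0) {
      console.log(`First-audio latency: ${(performance.now() - turnStartMs).toFixed(0)} ms`);
      turnStartMs = 0; // measure only the first chunk per turn
    }
  },
};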

Browser Compatibility

The integration requires WebRTC support. Tested browsers:
| Browser | Support |
| --- | --- |
| Chrome 80+ | Full support |
| Firefox 75+ | Full support |
| Safari 14+ | Full support |
| Edge 80+ | Full support |
Mobile browsers are supported but may have higher latency on cellular networks.

Billing

When using audio passthrough mode:
  • Anam: Billed for avatar streaming time (session duration)
  • ElevenLabs: Billed separately for STT, LLM, and TTS usage
Check both Anam pricing and ElevenLabs pricing to understand total costs.

Troubleshooting

Avatar not lip-syncing

  • Verify the audio format matches (PCM16, 16kHz, mono)
  • Check that sendAudioChunk() is receiving data
  • Ensure the audio input stream was created successfully
  • Look for errors in the browser console

Lip-sync timing is off

  • Call endSequence() when agent responses complete
  • Ensure you’re handling interruptions correctly
  • Check network latency to both services

No response from the agent (see the logging shim after this list)

  • Verify your ElevenLabs agent is configured correctly
  • Check that the WebSocket connection is established
  • Look for audio events in the message handler
  • Confirm your agent ID is correct

Microphone not working

  • Check browser permissions for microphone access
  • Ensure echoCancellation is enabled to prevent feedback
  • Verify the microphone is sending data at 16kHz

Session creation fails

  • Verify your ANAM_API_KEY is valid
  • Check that enableAudioPassthrough: true is set in the session request
  • Ensure the avatar ID exists in your account
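
When the agent seems silent, a quick way to confirm which events actually arrive is to log every message type before dispatching it. An illustrative shim around the handler in elevenlabs.ts:

// Diagnostic wrapper: log each ElevenLabs message type as it arrives,
// then run the original handler unchanged.
const originalOnMessage = ws.onmessage;
ws.onmessage = (event: MessageEvent) => {
  console.debug("[elevenlabs event]", JSON.parse(event.data).type);
  originalOnMessage?.call(ws, event);
};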

When to Use This Integration

This integration is a good fit when you:
  • Already use ElevenLabs Conversational AI and want to add a visual component
  • Need ElevenLabs-specific voice cloning or voice features
  • Want to keep your existing ElevenLabs agent logic unchanged
Consider Anam’s turnkey solution instead if you:
  • Are starting from scratch and want the simplest setup
  • Need the lowest possible latency
  • Want a single billing relationship
