Beta Feature: The ElevenLabs integration uses Anam’s audio passthrough mode, which is currently in beta. APIs may change as we continue to improve the integration.
Combine ElevenLabs Conversational AI with Anam avatars to create voice agents with real-time lip-sync. ElevenLabs handles the intelligence (speech recognition, LLM, and voice synthesis), while Anam provides the visual presence.

View Example: full source code for the ElevenLabs conversational agent with Anam avatar.

How It Works

The integration uses Anam’s audio passthrough mode, where Anam renders an avatar that lip-syncs to audio you provide—without using Anam’s own AI or microphone input.
Bring Your Own Voice: ElevenLabs provides voice synthesis. Anam adds the visual layer—combining both services in a single experience.
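
In practice, audio flows one way through the stack: the microphone feeds ElevenLabs, and the synthesized reply is forwarded to Anam for rendering:

Microphone → ElevenLabs (STT → LLM → TTS) → base64 audio chunks → Anam avatar (lip-synced video)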

Quick Start

Prerequisites

  • ElevenLabs account with a configured Conversational AI agent
  • Anam account with API access
  • Node.js or Bun runtime
  • Modern browser with WebRTC support (Chrome, Firefox, Safari, Edge)

Installation

npm install @anam-ai/js-sdk chatdio

Basic Integration

Here’s the core pattern for connecting ElevenLabs to Anam:
import { createClient } from "@anam-ai/js-sdk";

// 1. Create Anam client with audio passthrough session
const anamClient = createClient(sessionToken, {
  disableInputAudio: true, // ElevenLabs handles microphone
});
await anamClient.streamToVideoElement("video-element");

// 2. Create agent audio input stream
const audioInputStream = anamClient.createAgentAudioInputStream({
  encoding: "pcm_s16le",
  sampleRate: 16000,
  channels: 1,
});

// 3. Connect to ElevenLabs and forward audio
const ws = new WebSocket(`wss://api.elevenlabs.io/v1/convai/conversation?agent_id=${agentId}`);

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);

  if (msg.type === "audio" && msg.audio_event?.audio_base_64) {
    // Forward audio chunks to Anam for lip-sync
    audioInputStream.sendAudioChunk(msg.audio_event.audio_base_64);
  }

  if (msg.type === "agent_response") {
    // Signal end of audio sequence
    audioInputStream.endSequence();
  }

  if (msg.type === "interruption") {
    // Handle barge-in
    audioInputStream.endSequence();
  }
};

Full Example

Project Structure

src/
├── client.ts          # Main client orchestration
├── elevenlabs.ts      # ElevenLabs WebSocket handling
└── routes/
    └── api/
        └── config.ts  # Server-side session token endpoint

Server: Create Anam Session

Your server creates an Anam session token with enableAudioPassthrough: true:
config.ts
const response = await fetch("https://api.anam.ai/v1/auth/session-token", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${ANAM_API_KEY}`,
  },
  body: JSON.stringify({
    personaConfig: {
      avatarId: AVATAR_ID,
      enableAudioPassthrough: true, // Enable external audio input
    },
  }),
});

const { sessionToken } = await response.json();
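
The snippet above shows only the token request. Here is a minimal sketch of the full /api/config handler that the client fetches later, assuming a file-based router with a GET export (the export convention and Response.json usage are assumptions; adapt to your framework) and credentials read from the environment variables in the Configuration section:

// Illustrative /api/config handler (GET export convention assumed).
const ANAM_API_KEY = process.env.ANAM_API_KEY!;
const AVATAR_ID = process.env.ANAM_AVATAR_ID!;
const ELEVENLABS_AGENT_ID = process.env.ELEVENLABS_AGENT_ID!;

export async function GET(): Promise<Response> {
  // Create the passthrough session token server-side so the API key
  // never reaches the browser.
  const response = await fetch("https://api.anam.ai/v1/auth/session-token", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${ANAM_API_KEY}`,
    },
    body: JSON.stringify({
      personaConfig: {
        avatarId: AVATAR_ID,
        enableAudioPassthrough: true,
      },
    }),
  });
  const { sessionToken } = await response.json();

  // Shape matches what client.ts destructures from /api/config.
  return Response.json({
    anamSessionToken: sessionToken,
    elevenLabsAgentId: ELEVENLABS_AGENT_ID,
  });
}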

Client: ElevenLabs Module

Handle the WebSocket connection and microphone capture:
elevenlabs.ts
import { MicrophoneCapture, arrayBufferToBase64 } from "chatdio";

const SAMPLE_RATE = 16000;

export interface ElevenLabsCallbacks {
  onReady?: () => void;
  onAudio?: (base64Audio: string) => void;
  onUserTranscript?: (text: string) => void;
  onAgentResponse?: (text: string) => void;
  onInterrupt?: () => void;
  onDisconnect?: () => void;
}

export async function connectElevenLabs(agentId: string, callbacks: ElevenLabsCallbacks) {
  const ws = new WebSocket(`wss://api.elevenlabs.io/v1/convai/conversation?agent_id=${agentId}`);

  // Set up microphone capture
  const mic = new MicrophoneCapture({
    sampleRate: SAMPLE_RATE,
    echoCancellation: true,
    noiseSuppression: true,
  });

  mic.on("data", (data: ArrayBuffer) => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(
        JSON.stringify({
          user_audio_chunk: arrayBufferToBase64(data),
        })
      );
    }
  });

  ws.onopen = async () => {
    await mic.start();
    callbacks.onReady?.();
  };

  ws.onmessage = (event) => {
    const msg = JSON.parse(event.data);

    switch (msg.type) {
      case "audio":
        callbacks.onAudio?.(msg.audio_event.audio_base_64);
        break;
      case "agent_response":
        callbacks.onAgentResponse?.(msg.agent_response_event.agent_response);
        break;
      case "user_transcript":
        callbacks.onUserTranscript?.(msg.user_transcription_event.user_transcript);
        break;
      case "interruption":
        callbacks.onInterrupt?.();
        break;
      case "ping":
        ws.send(JSON.stringify({ type: "pong", event_id: msg.ping_event.event_id }));
        break;
    }
  };

  ws.onclose = () => {
    mic.stop();
    callbacks.onDisconnect?.();
  };
}
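
As written, connectElevenLabs doesn’t hand back the socket or microphone, so the caller has no way to end the conversation. If you need teardown, one option (a sketch, not part of either SDK) is to return a handle at the end of the function:

// Hypothetical addition at the end of connectElevenLabs:
return {
  // onclose above already stops the mic and fires onDisconnect.
  disconnect: () => ws.close(),
};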

Client: Main Integration

Wire everything together:
client.ts
import { createClient } from "@anam-ai/js-sdk";
import { connectElevenLabs } from "./elevenlabs";

async function startConversation() {
  // Get session config from your server
  const { anamSessionToken, elevenLabsAgentId } = await fetch("/api/config").then((r) => r.json());

  // Initialize Anam avatar (disable input audio since ElevenLabs handles mic)
  const anamClient = createClient(anamSessionToken, {
    disableInputAudio: true,
  });
  await anamClient.streamToVideoElement("anam-video");

  // Create agent audio input stream
  const audioInputStream = anamClient.createAgentAudioInputStream({
    encoding: "pcm_s16le",
    sampleRate: 16000,
    channels: 1,
  });

  // Connect to ElevenLabs
  await connectElevenLabs(elevenLabsAgentId, {
    onAudio: (audio) => {
      audioInputStream.sendAudioChunk(audio);
    },
    onAgentResponse: () => {
      audioInputStream.endSequence();
    },
    onInterrupt: () => {
      audioInputStream.endSequence();
    },
  });
}
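
client.ts assumes a video element with id anam-video already exists on the page. If you bootstrap the page from script instead, a minimal sketch (the attributes here are assumptions, except the anam-video id used above):

// Create the video element that streamToVideoElement("anam-video") targets.
const video = document.createElement("video");
video.id = "anam-video";
video.autoplay = true;
video.playsInline = true; // needed for inline playback on iOS Safari
document.body.appendChild(video);

startConversation().catch((err) => console.error("Failed to start conversation:", err));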

Configuration

Environment Variables

1. Get your API credentials

You’ll need credentials from both services:

| Service | Where to get it |
| --- | --- |
| Anam | lab.anam.ai → Settings → API Keys |
| ElevenLabs | elevenlabs.io → Agents |

2. Set environment variables

.env
# Anam credentials
ANAM_API_KEY=your_anam_api_key
ANAM_AVATAR_ID=your_avatar_id

# ElevenLabs credentials
ELEVENLABS_AGENT_ID=your_agent_id

ElevenLabs Agent Setup

When configuring your ElevenLabs agent, set the output audio format to match Anam’s expectations:
| Setting | Value |
| --- | --- |
| Format | PCM 16-bit |
| Sample Rate | 16000 Hz |
| Channels | Mono |
Mismatched audio formats will cause lip-sync issues. Ensure your ElevenLabs agent outputs PCM16 at 16kHz.
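
ElevenLabs already delivers base64-encoded PCM16 when configured as above, so no conversion is needed in this integration. If you ever feed Anam audio from another source that produces floating-point samples (for example a Web Audio node), you’ll need to encode it first; a hypothetical helper, not part of either SDK:

// Encode Float32 samples (range -1..1) as base64 pcm_s16le, the only
// encoding createAgentAudioInputStream accepts.
function floatTo16BitPcmBase64(samples: Float32Array): string {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff; // scale to the 16-bit range
  }
  let binary = "";
  const bytes = new Uint8Array(pcm.buffer);
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary); // browser base64; use Buffer.from(...).toString("base64") in Node
}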


Audio Passthrough API

createAgentAudioInputStream()

Creates a stream for sending audio chunks to the avatar for lip-sync.
const audioInputStream = anamClient.createAgentAudioInputStream({
  encoding: "pcm_s16le",
  sampleRate: 16000,
  channels: 1,
});
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| encoding | string | Yes | Audio encoding format. Only pcm_s16le (16-bit signed little-endian PCM) is supported. |
| sampleRate | number | Yes | Sample rate in Hz. Should match your ElevenLabs agent output (typically 16000). |
| channels | number | Yes | Number of audio channels. Use 1 for mono. |

sendAudioChunk()

Send a base64-encoded audio chunk for lip-sync rendering.
audioInputStream.sendAudioChunk(base64AudioData);
Audio chunks can be sent faster than realtime. Anam buffers them internally and renders lip-sync at the correct pace.
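
For example, if you already have a clip pre-encoded as base64 pcm_s16le chunks (clipChunks below is hypothetical), you can push the whole thing at once and let Anam pace the rendering:

// Anam buffers internally, so sending faster than realtime is fine.
for (const chunk of clipChunks) {
  audioInputStream.sendAudioChunk(chunk);
}
audioInputStream.endSequence(); // mark the end of this utterance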

endSequence()

Signal that the current audio sequence has ended. This helps Anam optimize lip-sync timing and handle transitions.
audioInputStream.endSequence();
Call this when:
  • ElevenLabs sends the agent_response event (agent finished speaking)
  • ElevenLabs sends the interruption event (user barged in)

Handling Interruptions

When a user speaks while the agent is talking (barge-in), ElevenLabs sends an interruption event. Handle it by ending the current audio sequence:
onInterrupt: () => {
  audioInputStream.endSequence();
},
This signals Anam to stop the current lip-sync animation and prepare for new audio.

Performance Considerations

Latency

This integration combines two real-time services, which adds latency compared to using Anam’s turnkey solution:
| Path | Typical latency |
| --- | --- |
| User speech → ElevenLabs STT | 200-400 ms |
| ElevenLabs LLM processing | 300-800 ms |
| ElevenLabs TTS → Anam avatar | 100-200 ms |
| Total end-to-end | 600-1400 ms |
For lower latency requirements, consider using Anam’s turnkey solution which handles STT, LLM, and TTS in an optimized pipeline.
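
To see where your deployment lands in that range, you can timestamp turns on the client, for example from the user’s transcript arriving to the first audio chunk of the reply (a rough probe covering the LLM and TTS legs, built on the ElevenLabsCallbacks interface above):

// Rough first-audio latency probe using the callbacks from elevenlabs.ts.
let turnStartMs = 0;

const probeCallbacks: ElevenLabsCallbacks = {
  onUserTranscript: () => {
    turnStartMs = performance.now(); // user's turn was transcribed
  },
  onAudio: () => {
    if (turnStartMs > 0) {
      console.log(`First-audio latency: ${(performance.now() - turnStartMs).toFixed(0)} ms`);
      turnStartMs = 0; // measure only the first chunk per turn
    }
  },
};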

Browser Compatibility

The integration requires WebRTC support. Tested browsers:
| Browser | Support |
| --- | --- |
| Chrome 80+ | Full support |
| Firefox 75+ | Full support |
| Safari 14+ | Full support |
| Edge 80+ | Full support |
Mobile browsers are supported but may have higher latency on cellular networks.

Billing

When using audio passthrough mode:
  • Anam: Billed for avatar streaming time (session duration)
  • ElevenLabs: Billed separately for STT, LLM, and TTS usage
Check both Anam pricing and ElevenLabs pricing to understand total costs.

Troubleshooting

Avatar not lip-syncing

  • Verify the audio format matches (PCM16, 16kHz, mono)
  • Check that sendAudioChunk() is receiving data
  • Ensure the audio input stream was created successfully
  • Look for errors in the browser console

Lip-sync timing is off

  • Call endSequence() when agent responses complete
  • Ensure you’re handling interruptions correctly
  • Check network latency to both services

No response from the agent (see the logging shim after this list)

  • Verify your ElevenLabs agent is configured correctly
  • Check that the WebSocket connection is established
  • Look for audio events in the message handler
  • Confirm your agent ID is correct

Microphone not working

  • Check browser permissions for microphone access
  • Ensure echoCancellation is enabled to prevent feedback
  • Verify the microphone is sending data at 16kHz

Session creation fails

  • Verify your ANAM_API_KEY is valid
  • Check that enableAudioPassthrough: true is set in the session request
  • Ensure the avatar ID exists in your account
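
When the agent seems silent, a quick way to confirm which events actually arrive is to log every message type before dispatching it. An illustrative shim around the handler in elevenlabs.ts:

// Diagnostic wrapper: log each ElevenLabs message type as it arrives,
// then run the original handler unchanged.
const originalOnMessage = ws.onmessage;
ws.onmessage = (event: MessageEvent) => {
  console.debug("[elevenlabs event]", JSON.parse(event.data).type);
  originalOnMessage?.call(ws, event);
};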

When to Use This Integration

This integration is a good fit when you:
  • Already use ElevenLabs Conversational AI and want to add a visual component
  • Need ElevenLabs-specific voice cloning or voice features
  • Want to keep your existing ElevenLabs agent logic unchanged
Consider Anam’s turnkey solution instead if you:
  • Are starting from scratch and want the simplest setup
  • Need the lowest possible latency
  • Want a single billing relationship
