AudioEchoCancellation with PersonalVoice is not working on the Voice Live API

Arne De Proft 0 Reputation points Microsoft Employee
2026-05-07T11:43:41.7+00:00

AudioEchoCancellation works with AzureStandardVoice but not with PersonalVoice when configured through azure.ai.voicelive.models in the Voice Live Python SDK.

Azure Speech in Foundry Tools

2 answers

  1. Sina Salam 30,486 Reputation points Volunteer Moderator
    2026-05-14T14:22:48.2166667+00:00

    Hello Arne De Proft,

    Welcome to Microsoft Q&A, and thank you for posting your question here.

    I understand that AudioEchoCancellation with PersonalVoice is not working on the Voice Live API.

    AudioProcessingOptions is not supported with the Voice Live API because Voice Live uses a separate server-side audio processing pipeline. Echo cancellation must be configured through the Voice Live session using:

    "input_audio_echo_cancellation": {
        "type": "server_echo_cancellation"
    }
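
For reference, the fragment above belongs inside the session object of a session.update event. Here is a minimal Python sketch of building that event as JSON, assuming you talk to the service over a raw WebSocket rather than through the SDK. The helper name build_session_update and the sample settings are hypothetical, chosen only for illustration:

```python
import json


def build_session_update(session_settings: dict) -> str:
    """Return a session.update event with server echo cancellation enabled.

    Hypothetical helper: it only assembles JSON and does not send anything.
    """
    session = dict(session_settings)  # copy so the caller's dict is untouched
    session["input_audio_echo_cancellation"] = {"type": "server_echo_cancellation"}
    return json.dumps({"type": "session.update", "session": session})


# Example: merge echo cancellation into otherwise unrelated session settings
event = build_session_update({"modalities": ["text", "audio"]})
print(event)
```

You would send the resulting string as a text frame once the connection is open, before streaming microphone audio.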
    

    However, the main reason the feature works with AzureStandardVoice but fails with PersonalVoice is that Personal Voice introduces additional synthesis and playback latency. Voice Live server echo cancellation depends on near-real-time playback synchronization and degrades when playback delay exceeds approximately two seconds.

    Therefore, enabling server_echo_cancellation alone is not sufficient. To make echo cancellation work reliably with PersonalVoice, the application must also:

    1. Stream playback immediately without buffering
    2. Avoid queued audio playback pipelines
    3. Use low-latency audio output
    4. Prefer WebRTC transport over buffered WebSocket playback
    5. Minimize playback delay to under ~2 seconds end-to-end
    6. Avoid combining Speech SDK AudioProcessingOptions with Voice Live

    If these conditions cannot be met consistently, headset-based audio remains the only fully reliable configuration for Personal Voice today.
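
One way to check points 1, 2, and 5 in your own client is to track how much audio is waiting in the playback queue. The sketch below is only an illustration: the class name PlaybackQueue is hypothetical, and it assumes 16-bit mono PCM at 24 kHz, so adjust the sample rate to whatever output format your session actually uses.

```python
from collections import deque


class PlaybackQueue:
    """Queue of PCM16 mono chunks that tracks how many seconds are waiting."""

    def __init__(self, sample_rate_hz: int = 24000, max_queued_seconds: float = 2.0):
        # 16-bit mono audio uses 2 bytes per sample (assumed format)
        self.bytes_per_second = sample_rate_hz * 2
        self.max_queued_seconds = max_queued_seconds
        self._chunks = deque()
        self._queued_bytes = 0

    def queued_seconds(self) -> float:
        """Duration of audio received but not yet handed to the output device."""
        return self._queued_bytes / self.bytes_per_second

    def push(self, chunk: bytes) -> bool:
        """Queue a chunk; return False if the backlog now exceeds the budget."""
        self._chunks.append(chunk)
        self._queued_bytes += len(chunk)
        return self.queued_seconds() <= self.max_queued_seconds

    def pop(self) -> bytes:
        """Remove and return the next chunk for playback."""
        chunk = self._chunks.popleft()
        self._queued_bytes -= len(chunk)
        return chunk


queue = PlaybackQueue()
within_budget = queue.push(b"\x00" * 48000)  # one second of silence at 24 kHz
print(queue.queued_seconds(), within_budget)
```

If push starts returning False during a Personal Voice response, the client is buffering more than the roughly two seconds described above, and echo cancellation is likely to degrade.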

    Check the links below as references on Voice Live API How-To:

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread by upvoting this answer and accepting it if it was helpful.


  2. SRILAKSHMI C 19,550 Reputation points Microsoft External Staff Moderator
    2026-05-07T14:39:04.8566667+00:00

    Hello @Arne De Proft

    Thank you for reporting this behavior and for the additional details.

    From your description, it appears that AudioEchoCancellation works correctly with standard Azure voices but not when using PersonalVoice through the Azure Voice Live Python SDK.

    The key distinction here is that Voice Live API uses a different audio processing pipeline compared to the standard Speech SDK.

    With standard Azure voices, you may have been using Speech SDK client-side AudioProcessingOptions. However, when using PersonalVoice in Voice Live API, echo cancellation is expected to be enabled through the Voice Live session configuration using server-side echo cancellation properties.

    Recommended approach

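    Instead of using Speech SDK AudioProcessingOptions, enable echo cancellation in the session configuration that you send with session.update. In the azure-ai-voicelive Python SDK this is the input_audio_echo_cancellation field of RequestSession.

    Example (a sketch based on the SDK's published samples; verify class and parameter names against your installed version):

    import asyncio

    from azure.core.credentials import AzureKeyCredential
    from azure.ai.voicelive.aio import connect
    from azure.ai.voicelive.models import (
        AudioEchoCancellation,
        AzurePersonalVoice,
        InputAudioFormat,
        Modality,
        OutputAudioFormat,
        RequestSession,
    )


    async def main():
        # Open a Voice Live connection (endpoint, key, and model are placeholders)
        async with connect(
            endpoint="https://<your-resource>.cognitiveservices.azure.com/",
            credential=AzureKeyCredential("<your-key>"),
            model="<your-model>",
        ) as conn:
            # Configure Personal Voice (name and base model are placeholders)
            personal_voice = AzurePersonalVoice(
                name="your-voice-name",
                model="<personal-voice-base-model>",
            )
            # Enable server-side echo cancellation on the session
            session = RequestSession(
                modalities=[Modality.TEXT, Modality.AUDIO],
                voice=personal_voice,
                input_audio_format=InputAudioFormat.PCM16,
                output_audio_format=OutputAudioFormat.PCM16,
                input_audio_echo_cancellation=AudioEchoCancellation(),
            )
            await conn.session.update(session=session)
            # Stream microphone audio and handle server events here;
            # the application is responsible for audio capture and playback.


    asyncio.run(main())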
    

    Do not combine Speech SDK AudioProcessingOptions with Voice Live API sessions, as they belong to different processing pipelines.

    Ensure you are using the latest version of the azure-ai-voicelive SDK.

    Server-side echo cancellation assumes playback occurs within approximately 2 seconds of receiving audio. Longer playback delays may reduce cancellation effectiveness.
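
To see whether your client stays inside that window, you can timestamp each audio chunk when it arrives and again when it starts playing. The sketch below is a hypothetical helper, not part of the SDK; the class name PlaybackDelayTracker, the chunk identifiers, and the 2-second default are assumptions taken from the figure above.

```python
import time


class PlaybackDelayTracker:
    """Measure the delay between receiving an audio chunk and playing it."""

    def __init__(self, max_delay_seconds: float = 2.0, clock=time.monotonic):
        self.max_delay_seconds = max_delay_seconds
        self._clock = clock  # injectable so the tracker can be tested
        self._received_at = {}

    def on_received(self, chunk_id: str) -> None:
        """Call when an audio chunk arrives from the service."""
        self._received_at[chunk_id] = self._clock()

    def on_played(self, chunk_id: str):
        """Call when the chunk starts playing; returns (delay_seconds, within_budget)."""
        delay = self._clock() - self._received_at.pop(chunk_id)
        return delay, delay <= self.max_delay_seconds


tracker = PlaybackDelayTracker()
tracker.on_received("chunk-1")
delay, within_budget = tracker.on_played("chunk-1")
print(f"delay={delay:.3f}s within_budget={within_budget}")
```

Logging these values for a standard voice and a Personal Voice side by side should show whether the added latency is what pushes playback past the threshold.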

    It is also helpful to verify whether the issue reproduces:

    • Across multiple Personal Voices
    • Across different sample rates/output formats
    • In multiple regions
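
If it helps to organize those checks, the combinations can be enumerated up front. This is just a sketch: the function name build_test_matrix is hypothetical, and the voice names, formats, and regions are placeholders to replace with your own values.

```python
from itertools import product


def build_test_matrix(voices, output_formats, regions):
    """Return one test case per (voice, output format, region) combination."""
    return [
        {"voice": voice, "output_format": fmt, "region": region}
        for voice, fmt, region in product(voices, output_formats, regions)
    ]


# Placeholder values; substitute your own Personal Voices, formats, and regions
cases = build_test_matrix(
    voices=["<personal-voice-a>", "<personal-voice-b>"],
    output_formats=["<format-1>", "<format-2>"],
    regions=["<region-1>", "<region-2>"],
)
for case in cases:
    print(case)
```

Recording for each case whether echo is cancelled makes it easier to tell a configuration problem from a service-side limitation.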

    Currently, Microsoft documentation does not explicitly state whether all echo cancellation scenarios are fully supported across all PersonalVoice configurations. Since:

    • the feature works with AzureStandardVoice,
    • and the issue appears specifically with PersonalVoice,

    this may also indicate a current limitation or service-side behavior related to Personal Voice processing.

    Please refer to the following documentation:

    How to use the Voice Live API – Session Properties (Server Echo Cancellation): https://learn.microsoft.com/azure/ai-services/speech-service/voice-live-how-to#session-properties

    Voice Live API Reference (RealtimeAzurePersonalVoice): https://learn.microsoft.com/azure/ai-services/speech-service/voice-live-api-reference-2025-10-01#components

    I hope this helps. Do let me know if you have any further queries.

    Thank you!
