Azure Voice Live STT: How to enforce a hard per-session language lock (es-ES only)?

Jurado, Jose Luis
2026-07-01T09:06:04.2966667+00:00

Issue context:

We are using Azure Voice Live realtime STT in production voice sessions and need strict monolingual behavior per session.

Observed behavior:

  • We send session.update with input_audio_transcription.language set to es-ES.
  • We also set turn_detection.type to azure_semantic_vad_multilingual.
  • Despite language=es-ES, STT still detects/transcribes other languages when speakers switch language.

Expected behavior:

  • A hard language lock so STT only recognizes/transcribes Spanish (es-ES) for that session.

Current payload example:

    {
      "type": "session.update",
      "session": {
        "input_audio_transcription": {
          "model": "azure-speech",
          "language": "es-ES"
        },
        "turn_detection": {
          "type": "azure_semantic_vad_multilingual",
          "threshold": 0.5,
          "prefix_padding_ms": 500,
          "silence_duration_ms": 1400,
          "barge_in": true
        }
      }
    }
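
For reproduction, here is a minimal Python sketch of how a payload like the one above could be assembled and serialized before it is sent over the session connection. The helper name build_session_update is hypothetical and the transport is left out. This only rebuilds the request shown above; it is not expected to enforce a language lock on its own.

```python
import json


def build_session_update(language: str = "es-ES") -> dict:
    """Build the session.update event from the question (hypothetical helper)."""
    return {
        "type": "session.update",
        "session": {
            "input_audio_transcription": {
                "model": "azure-speech",
                # The language we want the session locked to.
                "language": language,
            },
            "turn_detection": {
                "type": "azure_semantic_vad_multilingual",
                "threshold": 0.5,
                "prefix_padding_ms": 500,
                "silence_duration_ms": 1400,
                "barge_in": True,
            },
        },
    }


if __name__ == "__main__":
    # Serialize to the JSON text that would be sent to the service.
    print(json.dumps(build_session_update(), indent=2))
```

Keeping the language as a single parameter makes it easy to confirm that every session sends exactly one locale in input_audio_transcription.language.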

Question:

  1. Is a strict, hard language lock supported today for Voice Live STT per session?
  2. If yes, what exact parameter(s) and API version enforce it?
  3. If not, what is the recommended workaround, and is this on the roadmap?

Any official guidance is appreciated.
