gpt-realtime-translate (2026-05-06, GA) deploys successfully but inference always fails with OperationNotSupported

Question

Answer 1

Welcome to Microsoft Q&A. Thank you for reaching out to us.

The most likely explanation is that the service resolves the deployment successfully, but the runtime execution path for gpt-realtime-translate does not complete through any of the Realtime API flows that were tested.

The key observation is that the same environment, authentication method, audio format, and overall workflow work with gpt-realtime-whisper, while gpt-realtime-translate consistently returns OperationNotSupported, InvalidSessionType, or HTTP 500 responses, or produces unsuccessful WebSocket sessions. Because the same pattern reproduces across multiple regions, the behavior does not appear to stem from deployment configuration, authentication, audio formatting, or a single-region problem.

The correct supported way to invoke gpt-realtime-translate: The currently available guidance references the standard Realtime API patterns and points to the Realtime WebSocket and WebRTC workflows used by other Realtime models. However, a validated Azure-specific end-to-end example demonstrating successful translation inference with gpt-realtime-translate is not currently available.
Endpoint and session type to be used: The available guidance does not currently define a dedicated translation-specific session type such as session.type="translation", which aligns with the observed InvalidSessionType response. The recommended approach remains:
- Use the standard Realtime API workflow.
- Use the deployment name as the model identifier.
- Follow the documented Realtime session configuration.
A target language parameter: A dedicated Azure-specific translation session contract or target-language parameter for gpt-realtime-translate is not currently documented. Because translation inference is not completing successfully, the expected translation output configuration cannot be validated at this stage.

Please check whether the following workarounds help:

Whisper + Translation Workflow: Streaming audio > gpt-realtime-whisper > translation-capable service or model > translated text or speech output.
Azure AI Speech Translation: For production real-time speech translation workloads, Azure AI Speech Translation provides a supported speech translation capability and may serve as an alternative.
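
The first workaround can be sketched as a small two-stage pipeline. This is only an illustration of the shape of the flow, not a validated Azure integration: `translate_stream`, `fake_transcripts`, and `fake_translate` are hypothetical names, and the two stand-in functions would need to be replaced by a real gpt-realtime-whisper transcript source and a real translation call.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def translate_stream(
    transcripts: AsyncIterator[str],
    translate: Callable[[str], Awaitable[str]],
) -> AsyncIterator[str]:
    """Yield one translation per non-empty transcript segment, in order."""
    async for segment in transcripts:
        text = segment.strip()
        if text:  # skip silence / empty segments
            yield await translate(text)


# Hypothetical stand-ins so the sketch runs without any network access.
async def fake_transcripts() -> AsyncIterator[str]:
    for segment in ["hello", "  ", "good morning"]:
        yield segment


async def fake_translate(text: str) -> str:
    return f"[es] {text}"


async def collect() -> list[str]:
    return [t async for t in translate_stream(fake_transcripts(), fake_translate)]


if __name__ == "__main__":
    print(asyncio.run(collect()))
```

Keeping the transcript source and the translator as separate parameters means either stage can be swapped (for example, for Azure AI Speech Translation) without touching the other.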

Thank you

Please "Accept" the answer with an "Upvote" if the response was helpful. This will benefit other community members who face the same issue.

Answer 2

Rayyan Fawad 1,075

Hi there! Your gpt-realtime-translate deployment consistently throws OperationNotSupported errors, even though it shares the same resource and code flow as your working whisper model, because the Azure AI Foundry Realtime Audio API handles translation model configurations differently from standard audio-to-text transcription. A basic whisper deployment outputs text by default, whereas the real-time speech translation engine requires an explicit array of target output languages, defined during the initial session handshake, before the backend will initialize the session. Because the API cannot fall back to a default target language on a translation-specific model, calling the endpoint without this parameter forces the server to reject the stream entirely or drop the connection with a 1006 close code.

To get this working, modify your WebSocket connection initialization or your client_secrets POST body to pass a modalities array containing both audio and text, along with a target_languages property (such as ["es"] or your intended output code), in the session configuration so that the translation pipeline can serve your real-time inference requests.

Marco Cheung 0

Hi Fawad, I followed your comment and revised my 'gpt-realtime-translate' session wrapper, but I still get HTTP 400:

Response body: '{"error":{"code":"OperationNotSupported","message":"The realtime operation does not work with the specified model. Please choose different model and try again."}}'
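
When comparing runs across deployments and regions, it can help to pull the error code out of a response body like the one above instead of reading raw strings in the logs. The following is a minimal sketch; `parse_realtime_error` and `classify_handshake_failure` are hypothetical helper names, and the status-code buckets are an assumption about how one might group the failures rather than documented service behavior.

```python
import json
from typing import Optional, Tuple


def parse_realtime_error(body: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (code, message) from a JSON error body, or (None, None)."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        return None, None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None, None
    return error.get("code"), error.get("message")


def classify_handshake_failure(status: int, body: str) -> str:
    """Roughly bucket a rejected WebSocket handshake (assumed grouping)."""
    code, _ = parse_realtime_error(body)
    if status in (401, 403):
        return "auth"
    if 400 <= status < 500 and code:
        return f"rejected:{code}"  # the service understood the request and refused it
    if status >= 500:
        return "server"
    return "unknown"
```

A result in the `rejected:` bucket at handshake time means the request was refused before any `session.update` could be sent, which is worth noting when testing session-level parameters.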

"""
Azure OpenAI gpt-realtime-translate session wrapper.

Connects to the Azure OpenAI Realtime GA endpoint and manages a single
translation session. Audio in/out is raw PCM16 at 24kHz (base64-encoded
over the WebSocket JSON transport).

Azure GA endpoint format (no api-version param):
  wss://{host}/openai/v1/realtime?model={deployment}
  Header: api-key: {AZURE_OPENAI_API_KEY}
"""

import asyncio
import base64
import json
import traceback
from typing import AsyncIterator, Optional
from urllib.parse import urlparse

import websockets
from websockets.exceptions import InvalidStatus

from config import (
    AZURE_OPENAI_API_KEY,
    AZURE_OPENAI_ENDPOINT,
    GPT_REALTIME_TRANSLATE_MODEL,
)
from services.utils import console


def _build_translate_ws_url() -> str:
    """Build the Azure OpenAI Realtime WebSocket URL for translation (GA format).

    GA format: wss://{host}/openai/v1/realtime?model={deployment}
    (no api-version param — that is preview-only)
    """
    raw = (AZURE_OPENAI_ENDPOINT or "").strip().rstrip("/")
    if not raw:
        raise ValueError("AZURE_OPENAI_ENDPOINT is not set")
    if not GPT_REALTIME_TRANSLATE_MODEL:
        raise ValueError("GPT_REALTIME_TRANSLATE_MODEL is not set")

    parsed = urlparse(raw if "://" in raw else f"https://{raw}")
    host = parsed.netloc
    return f"wss://{host}/openai/v1/realtime?model={GPT_REALTIME_TRANSLATE_MODEL}"


class OpenAITranslateSession:
    """
    Manages a single Azure OpenAI gpt-realtime-translate WebSocket session.

    Usage pattern:
        async with OpenAITranslateSession(target_language="zh") as sess:
            await sess.append_audio(pcm16_bytes)
            async for audio_chunk in sess.iter_audio():
                ...  # PCM16 bytes at 24kHz
    """

    def __init__(self, target_language: str, session_label: str = ""):
        """
        Args:
            target_language: BCP-47 language tag the model should output.
                             e.g. 'en', 'zh', 'de', 'fr'
            session_label:   Optional log prefix for debugging.
        """
        self._target_language = target_language
        self._label = session_label or f"translate-{target_language}"
        self._ws: Optional[websockets.WebSocketClientProtocol] = None
        self._audio_out_queue: asyncio.Queue = asyncio.Queue()
        self._recv_task: Optional[asyncio.Task] = None
        self._running = False

    async def __aenter__(self) -> "OpenAITranslateSession":
        await self.start()
        return self

    async def __aexit__(self, *_) -> None:
        await self.stop()

    async def start(self) -> None:
        """Open WebSocket and configure the translation session."""
        if self._running:
            return

        url = _build_translate_ws_url()
        if not AZURE_OPENAI_API_KEY:
            raise ValueError("AZURE_OPENAI_API_KEY is not set")

        # Log the URL (without key) so we can verify it in Cloud Run logs
        console.log(f"[{self._label}] Connecting to: {url}")
        console.log(f"[{self._label}] Model/deployment: {GPT_REALTIME_TRANSLATE_MODEL!r}")
        console.log(f"[{self._label}] Endpoint host: {(AZURE_OPENAI_ENDPOINT or '').strip()!r}")

        try:
            self._ws = await websockets.connect(
                url,
                additional_headers={"api-key": AZURE_OPENAI_API_KEY},
                ping_interval=30,
                ping_timeout=10,
            )
        except InvalidStatus as exc:
            body = ""
            try:
                body = exc.response.body.decode(errors="replace") if exc.response.body else ""
            except Exception:
                pass
            console.log(
                f"[{self._label}] Azure rejected WebSocket (HTTP {exc.response.status_code}). "
                f"Response body: {body!r}"
            )
            raise
        self._running = True
        console.log(f"[{self._label}] Connected to Azure translate endpoint")

        # Wait for session.created before sending session.update
        try:
            raw = await asyncio.wait_for(self._ws.recv(), timeout=10.0)
            event = json.loads(raw)
            if event.get("type") != "session.created":
                console.log(f"[{self._label}] Unexpected first event: {event.get('type')}")
        except asyncio.TimeoutError:
            console.log(f"[{self._label}] Timeout waiting for session.created")

        # Configure the session for translation.
        # gpt-realtime-translate requires explicit target_languages in the
        # session config; without it the backend rejects with OperationNotSupported.
        await self._ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "target_languages": [self._target_language],
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.5,
                    "prefix_padding_ms": 300,
                    "silence_duration_ms": 500,
                },
            },
        }))
        console.log(f"[{self._label}] Session configured (target_lang={self._target_language})")

        # Start background receiver task
        self._recv_task = asyncio.create_task(self._receive_loop())

    async def stop(self) -> None:
        """Close the WebSocket session."""
        self._running = False
        if self._recv_task and not self._recv_task.done():
            self._recv_task.cancel()
            try:
                await self._recv_task
            except asyncio.CancelledError:
                pass
        if self._ws:
            try:
                await self._ws.close()
            except Exception:
                pass
            self._ws = None
        console.log(f"[{self._label}] Session closed")

    async def append_audio(self, pcm16_bytes: bytes) -> None:
        """
        Send a PCM16 audio chunk (24kHz, mono) to the model for translation.
        The bytes are base64-encoded and sent as input_audio_buffer.append.
        """
        if not self._running or not self._ws:
            return
        try:
            await self._ws.send(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": base64.b64encode(pcm16_bytes).decode("utf-8"),
            }))
        except Exception as e:
            console.log(f"[{self._label}] Error appending audio: {e}")

    async def iter_audio(self) -> AsyncIterator[bytes]:
        """
        Async generator that yields translated PCM16 audio chunks (24kHz)
        as they arrive from the model.
        """
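        # Keep draining after the session stops so queued tail audio is not dropped.
        while self._running or not self._audio_out_queue.empty():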
            try:
                chunk = await asyncio.wait_for(self._audio_out_queue.get(), timeout=0.5)
                yield chunk
            except asyncio.TimeoutError:
                continue
            except asyncio.CancelledError:
                break

    async def _receive_loop(self) -> None:
        """Background task: receive events from the WebSocket and enqueue audio."""
        try:
            async for raw in self._ws:
                if not self._running:
                    break
                try:
                    event = json.loads(raw)
                    event_type = event.get("type", "")

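                    # Assumption: the GA event name is "response.output_audio.delta";
                    # "response.audio.delta" is the preview-era name. Accept both.
                    if event_type in ("response.audio.delta", "response.output_audio.delta"):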
                        audio_b64 = event.get("delta", "")
                        if audio_b64:
                            await self._audio_out_queue.put(
                                base64.b64decode(audio_b64)
                            )

                    elif event_type == "error":
                        err = event.get("error", {})
                        console.log(
                            f"[{self._label}] API error: "
                            f"{err.get('code')} – {err.get('message')}"
                        )

                except Exception as e:
                    console.log(f"[{self._label}] Error processing event: {e}")

        except websockets.exceptions.ConnectionClosedOK:
            pass
        except asyncio.CancelledError:
            raise
        except Exception as e:
            if self._running:
                console.log(f"[{self._label}] Receive loop error: {e}")
                traceback.print_exc()
        finally:
            self._running = False
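
The wrapper above expects callers to pass raw PCM16 audio (24kHz, mono) to `append_audio`. A minimal sketch of how a larger buffer might be split into fixed-duration chunks first is shown below; `chunk_pcm16` and the 100 ms default are illustrative assumptions, not values taken from the service documentation.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 24_000  # from the module docstring: PCM16 at 24kHz
BYTES_PER_SAMPLE = 2     # 16-bit mono


def chunk_pcm16(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split mono PCM16 audio into chunks of chunk_ms milliseconds.

    The chunk size is always a whole number of samples, so no 16-bit
    sample is split across two chunks. The last chunk may be shorter.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    size = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Inside an open session, each chunk would then be sent with `await sess.append_audio(chunk)`, which handles the base64 encoding itself.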

gpt-realtime-translate (2026-05-06, GA) deploys successfully but inference always fails with OperationNotSupported

Service / technology

Scenario (what I'm trying to do)

Environment

What works — control: gpt-realtime-whisper (same resource, same flow)

What fails — gpt-realtime-translate (every path I tried)

Key observations

Troubleshooting / docs referenced

Question
