gpt-realtime-translate (2026-05-06, GA) deploys successfully but inference always fails with OperationNotSupported

Kojok 5 Reputation points
2026-06-05T05:31:07.7866667+00:00

Service / technology

Azure AI Foundry (AIServices resource), Azure OpenAI Realtime API, model gpt-realtime-translate (speech translation). Control model: gpt-realtime-whisper.

Scenario (what I'm trying to do)

Use gpt-realtime-translate to translate live audio in real time (the model listed as "speech-translation" in the model catalog). I deployed it and tried to send audio to it through the Realtime API.

Environment

  • Resource kind: Azure AI Foundry (AIServices)
  • Model: gpt-realtime-translate, version 2026-05-06, deployment type GlobalStandard
  • Deployment status in the portal: Provisioning state = Succeeded, Lifecycle = GenerallyAvailable
  • Control model (same resource): gpt-realtime-whisper, version 2026-05-06, GlobalStandard
  • Regions tested: East US 2 (existing resource) and Sweden Central (a brand-new resource I created)
  • Auth: resource API key (api-key header)
  • Test audio: 24 kHz mono PCM16, clean English speech
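
For anyone reproducing this, the audio side can be sketched as below. This is a minimal sketch, not the exact code I ran: it generates a placeholder sine tone instead of real speech, and the helper names (`sine_pcm16`, `audio_events`) and the 100 ms chunk size are assumptions for illustration. The `input_audio_buffer.append` event with a base64 `audio` field follows the OpenAI Realtime spec that the Azure reference redirects to.

```python
import base64
import json
import math
import struct

SAMPLE_RATE = 24_000   # Hz, matches the 24 kHz test audio
BYTES_PER_SAMPLE = 2   # PCM16, mono


def sine_pcm16(seconds: float, freq_hz: float = 440.0) -> bytes:
    """Return little-endian PCM16 mono samples of a sine tone (placeholder for speech)."""
    n = int(seconds * SAMPLE_RATE)
    samples = (
        int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)


def audio_events(pcm: bytes, chunk_ms: int = 100) -> list[str]:
    """Split raw PCM16 into append events, followed by a single commit event."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    events = [
        json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(pcm[i:i + chunk_bytes]).decode("ascii"),
        })
        for i in range(0, len(pcm), chunk_bytes)
    ]
    events.append(json.dumps({"type": "input_audio_buffer.commit"}))
    return events
```

Each string in the returned list is one text frame to send over the already-open WebSocket, in order.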

What works — control: gpt-realtime-whisper (same resource, same flow)

  1. POST https://<resource>.cognitiveservices.azure.com/openai/realtimeapi/transcription_sessions?api-version=2025-04-01-preview with body {"input_audio_transcription":{"model":"gpt-realtime-whisper"}} → 200, returns client_secret
  2. Connect (wss://<resource>.cognitiveservices.azure.com/openai/realtime?api-version=2025-04-01-preview&intent=transcription), send the PCM16 audio, then input_audio_buffer.commit.
  3. Result: conversation.item.input_audio_transcription.completed with the correct transcript. ✅
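
The two requests in the working flow can be sketched as follows. This is an illustrative sketch using only the Python standard library: it builds the POST request and the WebSocket URL from the steps above but does not send anything, and the function names are hypothetical. The resource name and API key are placeholders you supply.

```python
import json
import urllib.request

API_VERSION = "2025-04-01-preview"
HOST_SUFFIX = "cognitiveservices.azure.com"


def transcription_session_request(resource: str, api_key: str, model: str) -> urllib.request.Request:
    """Build (but do not send) the POST that mints a transcription session."""
    url = (
        f"https://{resource}.{HOST_SUFFIX}/openai/realtimeapi/"
        f"transcription_sessions?api-version={API_VERSION}"
    )
    body = json.dumps({"input_audio_transcription": {"model": model}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"api-key": api_key, "Content-Type": "application/json"},
    )


def transcription_ws_url(resource: str) -> str:
    """Return the preview WebSocket URL used for the transcription intent."""
    return (
        f"wss://{resource}.{HOST_SUFFIX}/openai/realtime"
        f"?api-version={API_VERSION}&intent=transcription"
    )
```

Passing the request to `urllib.request.urlopen` would perform step 1. Swapping the model argument from gpt-realtime-whisper to gpt-realtime-translate is the only change between the working and failing runs.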

What fails — gpt-realtime-translate (every path I tried)

1) Realtime (conversational) session, preview

POST .../openai/realtimeapi/sessions?api-version=2025-04-01-preview

body {"model":"gpt-realtime-translate"}

400 {"error":{"code":"OperationNotSupported","message":"The realtime operation does not work with the specified model. Please choose different model and try again."}}

2) Realtime (conversational) session, GA

POST .../openai/v1/realtime/client_secrets

  • {"session":{"type":"realtime","model":"gpt-realtime-translate"}} → 400, same "does not work with the specified model"
  • {"session":{"type":"transcription","model":"gpt-realtime-translate"}} → 500 Internal server error
  • {"session":{"type":"translation",...}} → 400 InvalidSessionType: Session Type is invalid or not found

3) Transcription session (the exact flow that works for whisper)

POST .../openai/realtimeapi/transcription_sessions?api-version=2025-04-01-preview

body {"input_audio_transcription":{"model":"gpt-realtime-translate"}} → 200 (mints client_secret),

but after connecting and sending audio:

conversation.item.input_audio_transcription.failed

{"type":"server_error","code":"OperationNotSupported","message":"Input transcription failed for item '...'."}

(Tried with no language, with language=en, and with a translation prompt; all fail identically.)
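
To tell the two outcomes apart on the socket, a small event classifier along these lines is enough. This is a sketch under assumptions: the function name is hypothetical, and it assumes the completed event carries a `transcript` field and the failed event nests the error object shown above under an `error` key, as in the OpenAI Realtime spec.

```python
import json
from typing import Optional, Tuple

PREFIX = "conversation.item.input_audio_transcription."


def transcription_outcome(raw: str) -> Optional[Tuple[str, str]]:
    """Map a server event to ("completed", transcript) or ("failed", error code).

    Returns None for any other event type.
    """
    event = json.loads(raw)
    kind = event.get("type", "")
    if kind == PREFIX + "completed":
        return ("completed", event.get("transcript", ""))
    if kind == PREFIX + "failed":
        error = event.get("error") or {}  # assumed location of the error object
        return ("failed", error.get("code", "unknown"))
    return None
```

With gpt-realtime-whisper the first branch fires; with gpt-realtime-translate the second one does, with code OperationNotSupported.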

4) Realtime WebSocket connect

  • GA wss://<resource>.cognitiveservices.azure.com/openai/v1/realtime?model=gpt-realtime-translate → closes with 1006
  • Preview wss://.../openai/realtime?api-version=2025-04-01-preview&deployment=gpt-realtime-translate → closes with 1006

5) chat/completions

POST .../openai/deployments/gpt-realtime-translate/chat/completions?api-version=2024-02-01

404 "This is not a chat model and thus not supported in the v1/chat/completions endpoint."

6) Foundry Playground for this deployment →

"Playground is not yet supported for this model. Please check back later for updates."

Key observations

  • The deployment is found at runtime: a non-existent model name returns DeploymentNotFound, whereas gpt-realtime-translate returns OperationNotSupported — so the failure is the operation, not a missing deployment.
  • gpt-realtime-whisper works through the identical transcription-session flow and the same audio.
  • session.type=transcription via GA client_secrets returns HTTP 500, which looks like a server-side defect.
  • I created a brand-new resource in Sweden Central, deployed both models there, and got the same result (whisper completes, translate OperationNotSupported). So it is not specific to one resource or region.
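
The distinction drawn in the first observation can be sketched as a small helper. This is illustrative only: the function name and the returned labels are my own, and it assumes the error body has the {"error":{"code":...}} shape shown in the 400 response above.

```python
import json


def diagnose(status: int, body: str) -> str:
    """Classify an HTTP error response by its error code and status."""
    try:
        code = json.loads(body).get("error", {}).get("code", "")
    except (json.JSONDecodeError, AttributeError):
        code = ""  # body was not JSON, or not in the expected shape
    if code == "DeploymentNotFound":
        return "deployment name not resolved"
    if code == "OperationNotSupported":
        return "deployment resolved, operation rejected for this model"
    if status >= 500:
        return "server-side failure"
    return f"unclassified ({status} {code or 'no code'})"
```

Running it over the responses listed above puts every gpt-realtime-translate call in the second or third bucket, never the first.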

Troubleshooting / docs referenced

  • "GPT Realtime Translate" concept doc — it states the model is used "through the Realtime API, the same as other realtime models," but the realtime (conversational) session endpoints reject it.
  • "Use the GPT Realtime API via WebSockets / WebRTC" how-to docs (GA client_secrets / calls, preview sessions / regional realtimertc).
  • Realtime API reference (which redirects to the OpenAI Realtime spec).

Question

What is the correct, supported way to invoke gpt-realtime-translate for real-time speech translation, and is its inference serving actually enabled?

Specifically:

  1. Which endpoint and session.type should be used, and is there a target/output-language parameter for the translation?
  2. Given the deployment reports GenerallyAvailable / Succeeded, but the realtime endpoint rejects the model, the transcription path returns OperationNotSupported, the GA transcription client_secrets returns HTTP 500, and the Playground says "not yet supported" — is the serving for this model incomplete, or am I calling it incorrectly?

A minimal working example (endpoint + session config + how the translated output is received) would be greatly appreciated.

Microsoft Foundry

2 answers

  1. Karnam Venkata Rajeswari 4,265 Reputation points Microsoft External Staff Moderator
    2026-06-19T18:47:45.06+00:00

    Hello @Kojok

    Welcome to Microsoft Q&A. Thank you for reaching out to us.

    This behavior most likely occurs because the service can resolve the deployment, but the runtime execution path for gpt-realtime-translate does not complete through the Realtime API flows that were tested.

    The key observation is that the same environment, authentication method, audio format, and overall workflow successfully work with gpt-realtime-whisper, while gpt-realtime-translate consistently returns OperationNotSupported, InvalidSessionType, HTTP 500 responses or unsuccessful WebSocket sessions. Since the same pattern is reproducible across multiple regions, the behavior does not appear to be related to a deployment configuration, authentication issue, audio formatting issue or a single-region scenario.

    1. The correct supported way to invoke gpt-realtime-translate: The currently available guidance references the standard Realtime API patterns and points to the Realtime WebSocket and WebRTC workflows used by other Realtime models. However, a validated Azure-specific end-to-end example demonstrating successful translation inference with gpt-realtime-translate is not currently available.
    2. Endpoint and session type to be used: The available guidance does not currently define a dedicated translation-specific session type such as session.type="translation". This aligns with the observed InvalidSessionType response. The recommended approach remains:
      • Use the standard Realtime API workflow.
      • Use the deployment name as the model identifier.
      • Follow the documented Realtime session configuration.
    3. A target language parameter: A dedicated Azure-specific translation session contract or target-language parameter for gpt-realtime-translate is not currently documented. Because translation inference is not completing successfully, the expected translation output configuration cannot be validated at this stage.

    Please check whether the following workarounds help:

    1. Whisper + Translation Workflow: streaming audio → gpt-realtime-whisper → translation-capable service or model → translated text or speech output.
    2. Azure AI Speech Translation: For production real-time speech translation workloads, Azure AI Speech Translation provides a supported speech translation capability and may serve as an alternative.
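
    The first workaround can be sketched roughly as below. This is a minimal sketch under assumptions, not a validated implementation: `translate_stream` is a hypothetical name, the `translate` callable stands in for whichever translation-capable service or model is chosen, and the completed-event shape (a `transcript` field) is assumed from the OpenAI Realtime spec.

```python
import json
from typing import Callable, Iterable, Iterator

COMPLETED = "conversation.item.input_audio_transcription.completed"


def translate_stream(
    raw_events: Iterable[str],
    translate: Callable[[str, str], str],
    target_lang: str,
) -> Iterator[str]:
    """Yield a translation for each completed transcript in a stream of server events.

    `translate(text, target_lang)` is a placeholder for the chosen translation backend.
    """
    for raw in raw_events:
        event = json.loads(raw)
        if event.get("type") == COMPLETED and event.get("transcript"):
            yield translate(event["transcript"], target_lang)
```

    The generator ignores every event other than completed transcriptions, so it can be fed the raw message stream from the gpt-realtime-whisper session directly.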


    Thank you

    Please "Accept" the answer with an "Upvote" if the response was helpful. This will benefit other community members who face the same issue.


  2. Rayyan Fawad 1,075 Reputation points
    2026-06-05T08:54:22.16+00:00

    Hi there! The reason your gpt-realtime-translate deployment consistently throws OperationNotSupported errors, even though it shares the same resource and code flow as your working whisper model, is that the Azure AI Foundry Realtime Audio API handles translation model configurations differently from standard audio-to-text transcription. While a basic whisper deployment outputs text natively, the real-time speech translation engine requires an explicit target output language array defined during the initial session handshake to clear the backend initialization gates.

    Because the API endpoints cannot fall back to a default target language on a translation-specific model, calling the endpoint without this parameter forces the server to reject the stream entirely or drop the connection with a 1006 close code.

    To get this working, modify your WebSocket connection initialization or your client_secrets POST body to explicitly pass a modalities array containing both audio and text, alongside a target_languages property (such as ["es"] or your intended output code) within the session configuration, so that the translation pipeline can map and serve your real-time inference requests.
