Azure OpenAI Realtime API + gpt-realtime-whisper: is realtime transcription currently supported?

Question

Baptiste AUTIN 0

Hello,

I'm trying to implement real-time speech-to-text transcription using Azure OpenAI Realtime API and the gpt-realtime-whisper model, following both the Azure and OpenAI documentation.

However, I am observing what appears to be a contradiction between model availability and API behavior.

Environment

Azure OpenAI resource in France Central

API endpoint:

wss://<resource>.openai.azure.com/openai/v1/realtime

Authentication via API key header

Azure deployments:

gpt-realtime-whisper

gpt-realtime-1.5
Java 17

Test 1: Connect directly with gpt-realtime-whisper

WebSocket URL:

wss://<resource>.openai.azure.com/openai/v1/realtime?model=gpt-realtime-whisper

Azure rejects the handshake with HTTP 400:

{
  "error": {
    "code": "OpperationNotSupported",
    "message": "The realtime operation does not work with the specified model. Please choose different model and try again."
  }
}

Response headers include:

apim-request-id: 4b30c637-e2c5-41f9-8e2c-5d37fa3d22d8
x-ms-region: France Central

This suggests that gpt-realtime-whisper cannot be used as the model for the /realtime connection itself.

Test 2: Connect with gpt-realtime-1.5 and configure transcription

I then created a separate deployment:

realtime deployment      = gpt-realtime-1.5
transcription deployment = gpt-realtime-whisper

Connection:

wss://<resource>.openai.azure.com/openai/v1/realtime?model=gpt-realtime-1.5

The WebSocket handshake succeeds.

However, when sending a transcription-oriented session.update event, Azure returns:

{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_parameter",
    "message": "Passing a transcription session update event to a realtime session is not allowed."
  }
}

Event ID:

event_DpCbAikBjakRWZkdns5ey

Test 3: Using the preview transcription session workflow

I also tested the preview transcription-specific workflow documented by Azure.

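First, I successfully created a transcription session using:

POST https://<resource>/openai/realtimeapi/transcription_sessions?api-version=2025-04-01-preview

The response was successful and returned a valid transcription session object (object = realtime.transcription_session, with a session id).

The response also contained a valid client_secret, but no explicit WebSocket URL.

Following the Azure preview documentation, I then attempted to connect to:

wss://<resource>/openai/realtime?api-version=2025-04-01-preview&intent=transcription&deployment=gpt-realtime-whisper

Azure responded with an HTTP 302 redirect to:

wss://<resource>/v1/realtime?api-version=2025-04-01-preview&intent=transcription&deployment=gpt-realtime-whisper&api-key=...

However, the redirected endpoint immediately returned:

HTTP 404 Resource not found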

According to the Azure documentation, preview endpoints should use /openai/realtime, while GA endpoints use /openai/v1/realtime, and mixing the two formats may result in a 404 error. In this case, Azure itself appears to redirect from the documented preview endpoint to a /v1/realtime endpoint that then returns 404. Is this behavior expected, or could this indicate a platform issue in the current implementation of realtime transcription sessions?

Questions

Can Microsoft confirm the current support status of Realtime Transcription in Azure OpenAI?

Specifically:

Is gpt-realtime-whisper currently supported for Azure's /realtime WebSocket endpoint?

Is Azure OpenAI expected to support OpenAI-style transcription sessions (session.type = "transcription")?

If realtime transcription is supported, what is the correct deployment/model combination and session configuration?

If it is not yet supported in Azure, is the current behavior expected even though gpt-realtime-whisper is available as a deployable model?

According to the OpenAI documentation, gpt-realtime-whisper is the recommended model for realtime transcription.

However, in Azure:

gpt-realtime-whisper is rejected as a /realtime connection model.

gpt-realtime-1.5 accepts the connection but rejects transcription session updates.

Therefore, it is unclear whether realtime transcription is currently available in Azure OpenAI or whether only conversational realtime sessions are supported.

Any clarification would be greatly appreciated.

Thank you.

3 answers

Answer 1

Baptiste AUTIN, thanks for the detailed follow-up and the request ID; that clears up exactly what's happening with the preview transcription_sessions flow. I'll answer your four questions in order, ending with a Java example for the supported GA path.

The key detail is in your own trace: the 302 redirects you to wss://<resource>/v1/realtime?... — note it drops the /openai segment. The GA Realtime WebSocket path is wss://<resource>/openai/v1/realtime, so the redirected /v1/realtime target doesn't resolve → 404. That's a server-side routing gap in the preview transcription-session redirect, not something you can fix from the client.
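To make the path difference concrete, here is a minimal sketch that classifies a Realtime WebSocket URL by its path, using only the three shapes seen in this thread. The class name RealtimePathCheck, the Kind enum, and the example-resource host are hypothetical names chosen for illustration; they are not part of any Azure SDK.

```java
import java.net.URI;

/** Illustrative helper: classifies a Realtime WebSocket URL by its path. */
final class RealtimePathCheck {

    /** GA and PREVIEW are the two documented shapes; anything else is UNKNOWN. */
    enum Kind { GA, PREVIEW, UNKNOWN }

    static Kind classify(String url) {
        // getPath() returns the path without the query string
        String path = URI.create(url).getPath();
        if ("/openai/v1/realtime".equals(path)) {
            return Kind.GA;
        }
        if ("/openai/realtime".equals(path)) {
            return Kind.PREVIEW;
        }
        // Includes /v1/realtime, the redirect target that lost the /openai segment
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        String host = "wss://example-resource.openai.azure.com"; // placeholder host
        System.out.println(classify(host + "/openai/v1/realtime?model=gpt-realtime-1.5"));
        System.out.println(classify(host + "/openai/realtime?api-version=2025-04-01-preview"));
        System.out.println(classify(host + "/v1/realtime?api-version=2025-04-01-preview"));
    }
}
```

A client could log the Location header of any 302 and run it through a check like this before following the redirect, which would have flagged the /v1/realtime target immediately.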

Is /openai/realtimeapi/transcription_sessions unsupported/deprecated for gpt-realtime-whisper? The transcription_sessions endpoint is a preview surface. The standalone transcription-socket path is not a reliable end-to-end route today, and gpt-realtime-whisper is not wired as a standalone transcription connection model (consistent with the 400 you saw in Test 1). For practical purposes, treat this preview path as not currently usable end-to-end for gpt-realtime-whisper.

If it's unsupported, why does it return a session object + client_secret? Because session creation and WebSocket connect are two independent operations. The POST is a control-plane call that just allocates a realtime.transcription_session object and mints an ephemeral secret; it does not validate that a downstream transcription socket exists for that deployment/intent, or that the model is valid on that socket. A 200 on the mint is therefore not a signal that the flow is supported; the availability check only happens at connect time (where it 404s).

If it were supported, what WS URL/auth do I use after session creation? For the supported GA pattern there is no separate transcription socket. You connect to the standard Realtime endpoint with a realtime model and enable transcription via session.update:

wss://<resource>.openai.azure.com/openai/v1/realtime?model=<your-realtime-deployment>

No api-version, no intent=transcription, no deployment=gpt-realtime-whisper query params on the GA endpoint.
Auth: header api-key: <key> (or Authorization: Bearer <entra-token>).
Transcription model goes in input_audio_transcription.model as a deployment name (whisper-1 / gpt-4o-transcribe / gpt-4o-mini-transcribe / gpt-4o-transcribe-diarize family).

Minimal Java example for the recommended, supported flow (two options):

Option A — raw WebSocket (java.net.http, no SDK):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;

public class RealtimeTranscription {
    public static void main(String[] args) throws Exception {
        String resource = System.getenv("AOAI_RESOURCE");
        String apiKey   = System.getenv("AOAI_API_KEY");
        // GA endpoint: no api-version, no intent/deployment query params
        URI uri = URI.create("wss://" + resource +
                ".openai.azure.com/openai/v1/realtime?model=gpt-realtime-1.5");
        // session.update that turns on input transcription for the realtime session
        String sessionUpdate = """
            { "type": "session.update",
              "session": {
                "type": "realtime",
                "input_audio_transcription": { "model": "gpt-4o-mini-transcribe", "language": "fr" },
                "turn_detection": { "type": "server_vad" }
              } }""";
        HttpClient.newHttpClient().newWebSocketBuilder()
            .header("api-key", apiKey)
            .buildAsync(uri, new WebSocket.Listener() {
                @Override
                public void onOpen(WebSocket ws) {
                    ws.sendText(sessionUpdate, true);   // enable transcription
                    ws.request(1);                      // ask for the first incoming message
                }
                @Override
                public CompletionStage<?> onText(WebSocket ws, CharSequence d, boolean last) {
                    System.out.println("<< " + d);      // session.updated, then transcription events
                    ws.request(1);                      // ask for the next message
                    return null;                        // null = this message is fully processed
                }
            }).join();                                  // wait for the handshake to complete
        Thread.sleep(60_000); // keep alive; then stream audio via input_audio_buffer.append
    }
}

After session.updated, append audio (input_audio_buffer.append) and you'll receive the input-transcription events (conversation.item.input_audio_transcription.completed).
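As a rough sketch of the append step, the helper below wraps one chunk of raw audio bytes in an input_audio_buffer.append event with the audio Base64-encoded. The class name AudioAppendEvent is hypothetical, and the chunk size in main assumes 16-bit mono PCM at 24 kHz; check the audio format configured on your session before relying on that figure.

```java
import java.util.Base64;

/** Illustrative helper: builds input_audio_buffer.append events from raw audio bytes. */
final class AudioAppendEvent {

    /** Wraps one chunk of audio in an append event; the audio travels as a Base64 string. */
    static String build(byte[] audioChunk) {
        String b64 = Base64.getEncoder().encodeToString(audioChunk);
        // Standard Base64 uses only A-Z, a-z, 0-9, '+', '/', '=', so no JSON escaping is needed
        return "{\"type\":\"input_audio_buffer.append\",\"audio\":\"" + b64 + "\"}";
    }

    public static void main(String[] args) {
        // Assumption: 100 ms of silence as 16-bit mono PCM at 24 kHz = 24000 * 2 * 0.1 bytes
        byte[] silence = new byte[4800];
        String event = build(silence);
        System.out.println("event length: " + event.length());
    }
}
```

With the raw WebSocket from Option A, each chunk would then be sent with ws.sendText(AudioAppendEvent.build(chunk), true).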

Option B — Azure VoiceLive SDK for Java (higher-level; handles the socket + typed events). It exposes AudioInputTranscriptionOptions with model values like WHISPER_1, set via VoiceLiveSessionOptions.setInputAudioTranscription(...) and started with client.startSession("<realtime-deployment>", null). Use this if you'd rather not manage the raw WebSocket.

Switch to the GA realtime session with session.input_audio_transcription; that's the fully supported path. The preview transcription_sessions route is currently a dead end for gpt-realtime-whisper because of the /openai-dropping redirect (→ 404), and a successful POST there only mints a token, it doesn't confirm the socket exists. If you specifically need the standalone transcription-session flow, I can raise the redirect/404 (with your apim-request-id: b270a055-1574-4ed6-83ec-f27ddc6f757c) to the product team — just confirm you want that tracked.

References

Realtime API via WebSockets — connection & auth: https://learn.microsoft.com/azure/foundry/openai/how-to/realtime-audio-websockets
Realtime API reference — Azure deviation (transcription model = deployment name): https://learn.microsoft.com/azure/foundry/openai/realtime-audio-reference

Kindly let us know if the above helps or you need further assistance on this issue.

Answer 2

Thank you, this clarifies the GA /openai/v1/realtime behavior.

Your explanation is consistent with our earlier observations:

using gpt-realtime-whisper as the /realtime connection model returns “The realtime operation does not work with the specified model”;
using an OpenAI-style dedicated transcription session on a standard realtime session is also rejected.

So we understand that the supported GA pattern is:

connect to /openai/v1/realtime with a supported realtime model such as gpt-realtime-1.5;
enable transcription with session.input_audio_transcription using a deployment name.

However, we also observed a different Azure behavior on June 10, 2026 with the preview transcription_sessions flow:

POST https://<resource>/openai/realtimeapi/transcription_sessions?api-version=2025-04-01-preview succeeds
Azure returns:
- object = realtime.transcription_session
- id = sess_DpE8UrTWVVY9rJQhPwDlb
- a valid client_secret
Opening: wss://<resource>/openai/realtime?api-version=2025-04-01-preview&intent=transcription&deployment=gpt-realtime-whisper returns HTTP 302
Azure redirects to: wss://<resource>/v1/realtime?api-version=2025-04-01-preview&intent=transcription&deployment=gpt-realtime-whisper&api-key=...
Following that redirected URL returns HTTP 404 Resource not found

(apim-request-id: b270a055-1574-4ed6-83ec-f27ddc6f757c)

Could you please clarify the status of this preview transcription_sessions flow on Azure OpenAI?

Is /openai/realtimeapi/transcription_sessions currently unsupported or deprecated for gpt-realtime-whisper?
If it is unsupported, why does it successfully create a realtime.transcription_session and return a client_secret?
If it is supported, what is the correct WebSocket URL and authentication pattern to use after the session creation call?
Could you share a minimal official Azure example in Java for the recommended supported websocket transcription flow?

If the supported path is only the GA realtime session with session.input_audio_transcription, we can switch our implementation to that model.

Answer 3

Realtime transcription is supported in Azure OpenAI, but not via gpt-realtime-whisper as a /realtime connection model, and not via OpenAI-style session.type = "transcription" sessions.

Based on the documented Azure behavior:

gpt-realtime-whisper is not a valid model for the /realtime WebSocket connection
- The Realtime API in Azure currently lists only GPT realtime models as supported connection models:
  - gpt-4o-realtime-preview
  - gpt-4o-mini-realtime-preview
  - gpt-realtime
  - gpt-realtime-mini (multiple versions)
  - gpt-realtime-1.5
- These are the models that can be passed in the model query parameter when connecting to /openai/v1/realtime.
- gpt-realtime-whisper is documented as a streaming transcription model that runs alongside other realtime models, not as the primary realtime session model.
- This matches the error observed: “The realtime operation does not work with the specified model. Please choose different model and try again.”
How realtime transcription is enabled in Azure OpenAI
- Realtime transcription is enabled inside a realtime session by configuring input_audio_transcription on the session, not by using gpt-realtime-whisper as the connection model and not by using session.type = "transcription".
- The documented pattern is:
  - Connect to /openai/v1/realtime with a supported realtime model (for example gpt-realtime or gpt-realtime-mini, and by extension gpt-realtime-1.5 as listed in the supported models).
  - Immediately send a session.update event that sets session.input_audio_transcription.
- For Azure OpenAI, the model field inside input_audio_transcription must be the name of an existing model deployment, not a raw model ID:
```
     {
       "type": "session.update",
       "session": {
         "input_audio_transcription": {
           "model": "<your-transcription-deployment-name>"
         }
       }
     }
```
- The realtime reference explicitly calls out this Azure deviation: the transcription model is referenced by deployment name (for example my-gpt-4o-transcribe-deployment), not by whisper-1 or similar.
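For a Java client, the same payload can be built from the deployment name at runtime. This is a minimal sketch: the class name TranscriptionSessionUpdate is hypothetical, the JSON mirrors only the fields shown in the snippet above, and the escaping covers just backslashes and double quotes (a JSON library would be the safer choice in real code).

```java
/** Illustrative helper: builds the session.update event that enables input transcription. */
final class TranscriptionSessionUpdate {

    /** Escapes backslashes and double quotes only; control characters are not handled. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** transcriptionDeployment is the Azure deployment name, not a raw model ID. */
    static String build(String transcriptionDeployment) {
        if (transcriptionDeployment == null || transcriptionDeployment.isBlank()) {
            throw new IllegalArgumentException("transcription deployment name is required");
        }
        return "{\"type\":\"session.update\",\"session\":{\"input_audio_transcription\":{\"model\":\""
                + escape(transcriptionDeployment) + "\"}}}";
    }

    public static void main(String[] args) {
        // Deployment name taken from the example in the realtime reference
        System.out.println(build("my-gpt-4o-transcribe-deployment"));
    }
}
```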
Which transcription models can be used with realtime sessions
- For Voice Live / realtime-style scenarios, documentation shows two patterns:
  - Using a Foundry/MAI transcription deployment (for example "model": "mai-transcribe-1").
  - When using gpt-realtime or gpt-realtime-mini, using OpenAI transcription models such as whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe, or gpt-4o-transcribe-diarize in input_audio_transcription.
- In Azure, these must again be referenced by deployment name of the corresponding transcription model.
- GPT Realtime Whisper is described as a streaming transcription model available via the Realtime API and deployed from the Foundry model catalog, to be used “alongside other realtime models … to provide source-language transcription in parallel with translation.” This aligns with the input_audio_transcription pattern rather than being the primary /realtime model.
OpenAI-style session.type = "transcription" sessions
- Azure’s Realtime API is documented as following the OpenAI Realtime API spec, but with explicit Azure deviations.
- One documented deviation is that the model in input_audio_transcription must be a deployment name.
- The Azure docs and examples for realtime audio and Voice Live show transcription configured via session.update on a standard realtime session, not via a separate session.type = "transcription" session.
- The error “Passing a transcription session update event to a realtime session is not allowed.” indicates that the OpenAI-style dedicated transcription session flow is not currently exposed in the Azure GA Realtime endpoint.
Expected behavior vs. current limitations
- Supported and expected in Azure today:
  - Connect to /openai/v1/realtime with a supported realtime model (for example gpt-realtime-1.5).
  - Enable transcription by setting session.input_audio_transcription to a valid transcription deployment (for example a deployment of gpt-4o-transcribe, whisper-1, or a Foundry transcription model), referenced by deployment name.
  - Receive transcription events (conversation.item.input_audio_transcription.completed) as part of the same realtime session.
- Not supported / not documented as supported in Azure today:
  - Using gpt-realtime-whisper as the model parameter for the /realtime WebSocket connection.
  - Using OpenAI-style dedicated transcription sessions (session.type = "transcription") on Azure’s GA /openai/v1/realtime endpoint.
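On the supported path, transcripts arrive as events on the same socket, so the client has to pick them out of the message stream. The sketch below pulls the transcript field out of a conversation.item.input_audio_transcription.completed event using regular expressions. The class name TranscriptEvents is hypothetical, and the approach assumes the top-level "type" is the first "type" key in the message; a real client would more likely use a JSON parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative helper: extracts the transcript from a completed-transcription event. */
final class TranscriptEvents {

    private static final String COMPLETED =
            "conversation.item.input_audio_transcription.completed";
    // First "type" key in the message (assumed to be the top-level event type)
    private static final Pattern TYPE =
            Pattern.compile("\"type\"\\s*:\\s*\"([^\"]*)\"");
    // Value of "transcript", allowing backslash-escaped characters inside the string
    private static final Pattern TRANSCRIPT =
            Pattern.compile("\"transcript\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Returns the transcript (JSON escapes left as-is), or empty for any other event. */
    static Optional<String> completedTranscript(String eventJson) {
        Matcher type = TYPE.matcher(eventJson);
        if (!type.find() || !COMPLETED.equals(type.group(1))) {
            return Optional.empty();
        }
        Matcher transcript = TRANSCRIPT.matcher(eventJson);
        return transcript.find() ? Optional.of(transcript.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String event = "{\"type\":\"" + COMPLETED + "\",\"transcript\":\"Bonjour\"}";
        completedTranscript(event).ifPresent(System.out::println);
    }
}
```

Error events, such as the invalid_parameter error from Test 2, fall through to Optional.empty() and can be handled separately.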
Why gpt-realtime-whisper is deployable but rejected as a /realtime model
- GPT Realtime Whisper is available as a Global Standard deployment in Foundry and is intended to be used as a streaming transcription model alongside other realtime models.
- In Azure’s Realtime API, this means it is used via input_audio_transcription within a session whose primary model is one of the supported realtime GPT models, not as the primary model for the WebSocket connection.
- The behavior observed (400 on using it as the connection model, and errors when trying to use OpenAI-style transcription sessions) is therefore consistent with the current Azure documentation and model roles.

In summary:

Realtime transcription is supported in Azure OpenAI, but only as a feature of a standard realtime session using a supported GPT realtime model (gpt-realtime, gpt-realtime-mini, gpt-realtime-1.5, etc.).
gpt-realtime-whisper is not supported as the /realtime connection model; it is used as a transcription model via session.input_audio_transcription, referenced by its deployment name.
OpenAI-style dedicated transcription sessions (session.type = "transcription") are not documented as supported on Azure’s GA /openai/v1/realtime endpoint, and the errors seen when attempting that flow are expected under the current Azure behavior.
