Azure OpenAI Realtime client_secrets returns 500 when input_audio_transcription is included (Sweden Central)

Question

James Morgan 0

We are seeing a consistent server-side failure in Azure OpenAI Realtime when requesting client secrets with input_audio_transcription enabled.

Environment

Region: Sweden Central
Resource: LBBD-OpenAI-Sweden-Dev
Subscription: 52a9fd6d-324a-4cc7-861d-b17e4cf9c219
API path: /openai/v1/realtime/client_secrets
Auth: Managed Identity (DefaultAzureCredential)
Deployment tested: gpt-4o-mini-transcribe-sweden-dev-v2 (fresh deployment name)

Observed behavior

Request WITH input_audio_transcription in session payload -> HTTP 500
Same request WITHOUT input_audio_transcription -> HTTP 200

This is reproducible both directly against the endpoint and through our app route that mints realtime tokens.
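For anyone trying to reproduce this, here is a minimal Python sketch of the two request bodies being compared. The exact payload was not posted, so the `{"session": {...}}` wrapper, the `type` value, and the `build_client_secret_body` helper are assumptions for illustration. Only the `input_audio_transcription` field name and the deployment name come from the question.

```python
import copy
import json

# Assumed minimal session; the poster's real payload was not shared.
BASE_SESSION = {"type": "realtime"}


def build_client_secret_body(transcription_deployment=None):
    """Build a body for POST /openai/v1/realtime/client_secrets (assumed shape)."""
    session = copy.deepcopy(BASE_SESSION)
    if transcription_deployment is not None:
        # This is the block whose presence reportedly turns 200 into 500.
        session["input_audio_transcription"] = {"model": transcription_deployment}
    return {"session": session}


with_block = build_client_secret_body("gpt-4o-mini-transcribe-sweden-dev-v2")
without_block = build_client_secret_body()

# The two bodies should differ only by the transcription block.
differing_keys = sorted(set(with_block["session"]) - set(without_block["session"]))
print(json.dumps(with_block, indent=2))
print(differing_keys)
```

Building both bodies from one base makes it easy to show that nothing else changed between the failing and the passing call.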

What we already checked

payload structure
deployment recreation with new name
same auth and api-version across both requests
retries and fallback path

Question

Is there a known region-specific issue or feature-gating requirement for input_audio_transcription in Realtime session creation on Azure OpenAI? If not, what exact prerequisites are required for this field to work?

James Morgan 0 Reputation points

2026-06-02T11:26:32.3166667+00:00
Adding concrete repro IDs from a fresh test (UTC 2026-06-02 11:24):

Failing call WITH input_audio_transcription -> HTTP 500

apim-request-id / activity_id: c0d6e5c5-6e40-4039-a4e1-cd21f985cae1

Fallback call WITHOUT input_audio_transcription -> HTTP 200

apim-request-id / activity_id: bbae1531-28f3-4243-b5ee-c780a9681157

Both calls used the same endpoint, auth path, and model family; only the transcription block differs.

Answer 1

Hi James Morgan,

Thanks for the exceptionally clean repro — the paired request IDs and the "remove one block → 200" contrast make this easy to reason about. Here's where it lands.

Root cause

Your payload is valid, and the value you're passing for input_audio_transcription.model (gpt-4o-mini-transcribe-sweden-dev-v2) is a deployment name — which is exactly what Azure OpenAI requires. Azure deviates from OpenAI here and expects the deployment name, not a raw model ID like whisper-1. So that part is correct.

Because the identical call returns 200 the moment the input_audio_transcription block is removed, this is a service-side (500) failure in the /client_secrets mint path when that block is present — not a validation, feature-gating, or configuration problem on your side. There's no documented region-specific prerequisite or feature flag for input_audio_transcription beyond using a supported transcription model referenced by deployment name.

Immediate workaround (unblocks you now)

Configure transcription after the connection instead of at token-mint time:

1. Mint the ephemeral token without input_audio_transcription (this returns 200).
2. Open the Realtime WebSocket/WebRTC connection using that token.
3. Send a session.update event that enables transcription:

{
  "type": "session.update",
  "session": {
    "type": "realtime",
    "input_audio_transcription": { "model": "gpt-4o-mini-transcribe-sweden-dev-v2" }
  }
}

The server replies session.updated with the transcription model applied, and input-transcription events start flowing. This post-connect path is confirmed working for this exact scenario, so it's a reliable way to proceed while the mint-time issue is addressed.
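The workaround can be sketched in Python as a pure helper that splits one intended session config into a mint-time body and a post-connect event. This is only a sketch: it assumes the mint body has the shape `{"session": {...}}`, `split_mint_and_update` is a hypothetical name, and actually sending the event over the WebSocket or WebRTC data channel is left to whichever client library is in use.

```python
import copy
import json


def split_mint_and_update(body):
    """Return (mint_body, session_update_event_or_None) without mutating body."""
    mint_body = copy.deepcopy(body)
    session = mint_body.get("session", {})
    # Remove the block that triggers the 500 at mint time.
    transcription = session.pop("input_audio_transcription", None)
    if transcription is None:
        return mint_body, None
    # Re-apply the same block after connecting, as in the event shown above.
    event = {
        "type": "session.update",
        "session": {
            "type": session.get("type", "realtime"),
            "input_audio_transcription": transcription,
        },
    }
    return mint_body, event


desired = {
    "session": {
        "type": "realtime",
        "input_audio_transcription": {"model": "gpt-4o-mini-transcribe-sweden-dev-v2"},
    }
}
mint_body, update_event = split_mint_and_update(desired)
first_message = json.dumps(update_event)  # send this once the connection is open
```

Keeping the desired config in one place means the split can be removed with a one-line change once the mint path accepts the block again.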

Is this a bug?

Effectively yes — it's a backend fault in the /client_secrets mint path, not something you can fix by changing config:

The payload is valid and the deployment-name usage is correct.
Removing one optional block flips 500 → 200.
The same transcription configuration is accepted cleanly on the post-connect session.update path — proving the content is supported; only the mint-time handling faults.

There's precedent for transient service-side 500s on Sweden Central transcribe models that Microsoft mitigated on the backend, so it's also worth confirming whether this is persistent or intermittent right now.

To get this investigated (please share)

Is the 500 100% reproducible right now, or intermittent?
The exact JSON of the failing /client_secrets request (with and without the block), plus 2–3 fresh failing apim-request-ids + UTC timestamps (you've already given c0d6e5c5-6e40-4039-a4e1-cd21f985cae1).
Does it also fail with API-key auth (to rule out anything Managed-Identity-specific)?
Does the same call succeed in another region (e.g., East US 2) with an equivalent transcription deployment?
Confirm you're on the GA endpoint without an api-version query param.
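One way to organise the checks above is a small test matrix, so each combination is run once and its result recorded. The Python sketch below only enumerates the cases; the labels `managed_identity`, `api_key`, `swedencentral`, and `eastus2` are tags chosen for this illustration, not API values, and `repro_matrix` is a hypothetical helper.

```python
from itertools import product

AUTH_MODES = ("managed_identity", "api_key")
REGIONS = ("swedencentral", "eastus2")
WITH_TRANSCRIPTION = (True, False)


def repro_matrix():
    """Enumerate every auth/region/transcription combination to test."""
    return [
        {"auth": auth, "region": region, "with_transcription": flag}
        for auth, region, flag in product(AUTH_MODES, REGIONS, WITH_TRANSCRIPTION)
    ]


cases = repro_matrix()
for case in cases:
    print(case)
```

If only the Sweden Central rows with the transcription block fail, regardless of auth mode, that points at the regional service rather than at Managed Identity.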

If the 500 persists with a valid payload and a supported transcription deployment, please send the details requested above by private message.

Please give the post-connect workaround a try and let me know if it unblocks your flow.

If this helps, consider marking it as accepted so others hitting the same 500 can find it.

Answer 2

There is no region-specific or feature-gating prerequisite for input_audio_transcription documented for Azure OpenAI Realtime beyond using the correct models and schema. A 500 response when adding input_audio_transcription indicates a service-side error rather than a validation or configuration issue on the client.

From the Realtime REST reference:

input_audio_transcription is an optional configuration object on the Realtime session. It “defaults to off and can be set to null to turn off once on.”
It is not tied to a special feature flag in the public documentation.
The supported transcription models for this property are:
- whisper-1
- gpt-4o-mini-transcribe
- gpt-4o-mini-transcribe-2025-12-15
- gpt-4o-transcribe
- gpt-4o-transcribe-diarize
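As a quick client-side check, the list above can be turned into a lookup. This is a sketch with a hypothetical helper name, and it only tells you whether a value is one of the bare model IDs listed here. A custom deployment name, such as the one in the question, will not match, so a `False` result does not by itself mean the value is invalid on Azure.

```python
# Model IDs copied from the list above.
DOCUMENTED_TRANSCRIPTION_MODELS = frozenset({
    "whisper-1",
    "gpt-4o-mini-transcribe",
    "gpt-4o-mini-transcribe-2025-12-15",
    "gpt-4o-transcribe",
    "gpt-4o-transcribe-diarize",
})


def is_documented_model_id(value):
    """True if value is exactly one of the documented transcription model IDs."""
    return value in DOCUMENTED_TRANSCRIPTION_MODELS


print(is_documented_model_id("gpt-4o-mini-transcribe"))
print(is_documented_model_id("gpt-4o-mini-transcribe-sweden-dev-v2"))
```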

The only documented prerequisites for using input_audio_transcription are therefore:

Use the GA Realtime API endpoints and protocol
- Path: POST {endpoint}/openai/v1/realtime/client_secrets
- Optional api-version query: v1 or preview (defaults to v1 if omitted).
Provide a valid Realtime session payload
- session.type must be set appropriately for a Realtime session.
- If input_audio_transcription is included, its model value must be one of the supported transcription models listed above.
- Optional fields like language and prompt follow the documented schema.
Use a supported Realtime model deployment
- For WebRTC/Realtime, use one of the Realtime models listed for the region (for example gpt-4o-mini-realtime-preview, gpt-4o-realtime-preview, gpt-realtime, or gpt-realtime-mini as documented for East US 2 and Sweden Central).
- Ensure the deployment name in the session configuration matches the actual deployment.
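To make the endpoint prerequisite concrete, here is a small Python sketch that builds the request URL with the optional api-version query. The resource hostname is a placeholder and `client_secrets_url` is a hypothetical helper; the path and the two allowed api-version values are the ones listed above.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

CLIENT_SECRETS_PATH = "/openai/v1/realtime/client_secrets"
ALLOWED_API_VERSIONS = (None, "v1", "preview")


def client_secrets_url(endpoint, api_version=None):
    """Build the client_secrets URL, adding api-version only when given."""
    if api_version not in ALLOWED_API_VERSIONS:
        raise ValueError(f"unsupported api-version: {api_version!r}")
    parts = urlsplit(endpoint.rstrip("/"))
    query = urlencode({"api-version": api_version}) if api_version else ""
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path + CLIENT_SECRETS_PATH, query, "")
    )


# Placeholder hostname, not a real resource.
print(client_secrets_url("https://example-resource.openai.azure.com/"))
print(client_secrets_url("https://example-resource.openai.azure.com", "preview"))
```

Using the same helper for the failing and the passing call rules out an accidental api-version difference between them.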

Given that:

The same request succeeds (200) when input_audio_transcription is omitted, and
The documentation does not list any additional region-specific requirements or feature flags for input_audio_transcription,

this pattern is consistent with a backend issue in the Sweden Central Realtime service rather than a missing prerequisite in the request. The next steps that align with the documentation are:

Verify that the input_audio_transcription.model is one of the supported transcription models listed above.
Confirm that the Realtime model deployment being used is supported in Sweden Central and that the deployment name is correct.
If the 500 persists with a valid payload and supported model, treat it as a service-side incident and raise it via Azure support, including:
- Region (Sweden Central), resource name, and subscription ID.
- The exact POST /openai/v1/realtime/client_secrets payload with and without input_audio_transcription.
- Timestamps and apim-request-id values from the failing 500 responses.
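The items above can be collected into one JSON document to attach to the support request. The following Python sketch assumes the request ID is read from an `apim-request-id` response header; `repro_record` and `support_bundle` are hypothetical helpers, and the two bodies are placeholders for the real payloads.

```python
import json
from datetime import datetime, timezone


def repro_record(status, headers, when):
    """Capture status, apim-request-id, and a UTC timestamp for one call."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return {
        "status": status,
        "apim_request_id": lowered.get("apim-request-id"),
        "utc": when.astimezone(timezone.utc).isoformat(timespec="seconds"),
    }


def support_bundle(region, resource, subscription_id, body_with, body_without, records):
    """Serialise everything the support request asks for into one JSON string."""
    return json.dumps(
        {
            "region": region,
            "resource": resource,
            "subscription_id": subscription_id,
            "payload_with_transcription": body_with,
            "payload_without_transcription": body_without,
            "calls": records,
        },
        indent=2,
    )


# IDs and time taken from the question's repro; bodies are placeholders.
failure = repro_record(
    500,
    {"apim-request-id": "c0d6e5c5-6e40-4039-a4e1-cd21f985cae1"},
    datetime(2026, 6, 2, 11, 24, tzinfo=timezone.utc),
)
bundle = support_bundle(
    "Sweden Central",
    "LBBD-OpenAI-Sweden-Dev",
    "52a9fd6d-324a-4cc7-861d-b17e4cf9c219",
    {"session": {"input_audio_transcription": {"model": "gpt-4o-mini-transcribe-sweden-dev-v2"}}},
    {"session": {}},
    [failure],
)
print(bundle)
```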

There is no documented feature-gating or extra configuration required beyond the above; a persistent 500 in this scenario requires service investigation.
