Baptiste AUTIN, Thanks for the detailed follow-up and the request ID — that clears up exactly what's happening with the preview transcription_sessions flow. Let me answer your four questions directly, then give you the Java for the supported GA path.
The key detail is in your own trace: the 302 redirects you to wss://<resource>/v1/realtime?... — note it drops the /openai segment. The GA Realtime WebSocket path is wss://<resource>/openai/v1/realtime, so the redirected /v1/realtime target doesn't resolve → 404. That's a server-side routing gap in the preview transcription-session redirect, not something you can fix from the client.
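If you want to confirm the same symptom in other traces, the check on the redirect target can be sketched roughly as follows. The class name, method name, and host are hypothetical placeholders; only the /openai/v1/realtime path comes from the explanation above.

```java
import java.net.URI;

public class RedirectPathCheck {
    // GA Realtime WebSocket path, as described in the answer above.
    static final String GA_PATH = "/openai/v1/realtime";

    // Returns true when the redirect target still points at the GA path,
    // false when a segment such as /openai has been dropped.
    static boolean keepsOpenAiSegment(String location) {
        return GA_PATH.equals(URI.create(location).getPath());
    }

    public static void main(String[] args) {
        // Placeholder host; substitute the Location value from your own 302.
        String redirected = "wss://example-resource.openai.azure.com/v1/realtime?model=demo";
        String expected = "wss://example-resource.openai.azure.com/openai/v1/realtime?model=demo";
        System.out.println(keepsOpenAiSegment(redirected)); // prints false
        System.out.println(keepsOpenAiSegment(expected));   // prints true
    }
}
```

A false result for the Location header of the 302 matches the 404 described here: the target no longer resolves to the GA route.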
- Is /openai/realtimeapi/transcription_sessions unsupported/deprecated for gpt-realtime-whisper? The transcription_sessions endpoint is a preview surface. The standalone transcription-socket path is not a reliable end-to-end route today, and gpt-realtime-whisper is not wired as a standalone transcription connection model (consistent with the 400 you saw in Test 1). So for practical purposes: treat this preview path as not currently usable end-to-end for gpt-realtime-whisper.
- If it's unsupported, why does it return a session object + client_secret? Because session creation and WebSocket connect are two independent operations. The POST is a control-plane call that just allocates a realtime.transcription_session object and mints an ephemeral secret; it does not validate that a downstream transcription socket exists for that deployment/intent, or that the model is valid on that socket. A 200 on the mint is therefore not a signal that the flow is supported; the availability check only happens at connect time (where it 404s).
- If it were supported, what WS URL/auth do I use after session creation? For the supported GA pattern there is no separate transcription socket. You connect to the standard Realtime endpoint with a realtime model and enable transcription via session.update:
  wss://<resource>.openai.azure.com/openai/v1/realtime?model=<your-realtime-deployment>
  - No api-version, no intent=transcription, no deployment=gpt-realtime-whisper query params on the GA endpoint.
  - Auth: header api-key: <key> (or Authorization: Bearer <entra-token>).
  - Transcription model goes in input_audio_transcription.model as a deployment name (whisper-1 / gpt-4o-transcribe / gpt-4o-mini-transcribe / gpt-4o-transcribe-diarize family).
- Minimal Java example (recommended supported flow). See Options A and B below.
Option A — raw WebSocket (java.net.http, no SDK):
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;

public class RealtimeTranscription {
    public static void main(String[] args) throws Exception {
        String resource = System.getenv("AOAI_RESOURCE");
        String apiKey = System.getenv("AOAI_API_KEY");

        // GA endpoint: no api-version, no intent/deployment query params
        URI uri = URI.create("wss://" + resource +
                ".openai.azure.com/openai/v1/realtime?model=gpt-realtime-1.5");

        String sessionUpdate = """
                { "type": "session.update",
                  "session": {
                    "type": "realtime",
                    "input_audio_transcription": { "model": "gpt-4o-mini-transcribe", "language": "fr" },
                    "turn_detection": { "type": "server_vad" }
                } }""";

        HttpClient.newHttpClient().newWebSocketBuilder()
                .header("api-key", apiKey)
                .buildAsync(uri, new WebSocket.Listener() {
                    @Override
                    public void onOpen(WebSocket ws) {
                        ws.sendText(sessionUpdate, true); // enable transcription
                        ws.request(1);
                    }

                    @Override
                    public CompletionStage<?> onText(WebSocket ws, CharSequence d, boolean last) {
                        System.out.println("<< " + d); // session.updated, then transcription events
                        ws.request(1);
                        return null;
                    }
                }).join();

        Thread.sleep(60_000); // keep alive; then stream audio via input_audio_buffer.append
    }
}
After session.updated, append audio (input_audio_buffer.append) and you'll receive the input-transcription events (conversation.item.input_audio_transcription.completed).
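As a rough illustration of the append step, the sketch below builds input_audio_buffer.append events from raw audio bytes, with the audio carried as base64 in the "audio" field. The class name, helper names, and chunk size are hypothetical, and it assumes your audio is already in the format the session expects (for example PCM16); treat it as a starting point rather than a reference implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

public class AudioAppendEvents {
    // Wraps one chunk of raw audio bytes in an input_audio_buffer.append event.
    // Standard base64 output contains no characters that need JSON escaping.
    static String appendEvent(byte[] chunk) {
        String b64 = Base64.getEncoder().encodeToString(chunk);
        return "{\"type\":\"input_audio_buffer.append\",\"audio\":\"" + b64 + "\"}";
    }

    // Splits the audio into chunks of at most chunkBytes and builds one event per chunk.
    static List<String> appendEvents(byte[] audio, int chunkBytes) {
        List<String> events = new ArrayList<>();
        for (int off = 0; off < audio.length; off += chunkBytes) {
            int end = Math.min(audio.length, off + chunkBytes);
            events.add(appendEvent(Arrays.copyOfRange(audio, off, end)));
        }
        return events;
    }

    public static void main(String[] args) {
        byte[] silence = new byte[10]; // placeholder audio; use real captured bytes
        for (String event : appendEvents(silence, 4)) {
            System.out.println(event);
        }
    }
}
```

Each event string can then be sent on the open socket with ws.sendText(event, true). Note that java.net.http.WebSocket allows only one outstanding send at a time, so wait for the returned CompletableFuture (for example with .join()) before sending the next chunk.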
Option B — Azure VoiceLive SDK for Java (higher-level; handles the socket + typed events). It exposes AudioInputTranscriptionOptions with model values like WHISPER_1, set via VoiceLiveSessionOptions.setInputAudioTranscription(...) and started with client.startSession("<realtime-deployment>", null). Use this if you'd rather not manage the raw WebSocket.
Switch to the GA realtime session with session.input_audio_transcription; that's the fully supported path. The preview transcription_sessions route is currently a dead end for gpt-realtime-whisper because of the /openai-dropping redirect (→ 404), and a successful POST there only mints a token, it doesn't confirm the socket exists. If you specifically need the standalone transcription-session flow, I can raise the redirect/404 (with your apim-request-id: b270a055-1574-4ed6-83ec-f27ddc6f757c) to the product team — just confirm you want that tracked.
References
- Realtime API via WebSockets — connection & auth: https://learn.microsoft.com/azure/foundry/openai/how-to/realtime-audio-websockets
- Realtime API reference — Azure deviation (transcription model = deployment name): https://learn.microsoft.com/azure/foundry/openai/realtime-audio-reference
Kindly let us know if the above helps or you need further assistance on this issue.