Azure OpenAI Realtime WebSocket returns replacement characters (U+FFFD) in Chinese/Japanese transcripts and translations

Question

YangQi 0

Summary

When using Azure OpenAI Realtime over WebSocket for live speech transcription and translation, we intermittently receive replacement characters (U+FFFD, rendered as �) in Chinese and Japanese text.

This affects:

- Chinese speech transcription (input transcript)
- Japanese speech transcription (input transcript)
- Chinese/Japanese translation output (output transcript)

In practice, expected CJK text sometimes becomes corrupted, for example �果 or ï¿½æžœ fragments.

Environment

Service: Azure OpenAI
Mode: Realtime API via WebSocket
Endpoints used:
- Translation: /openai/v1/realtime/translations?model=<deployment>
- Transcription intent: /openai/v1/realtime?intent=transcription
Audio format: PCM 24kHz (streamed chunks)
Client path:
- Browser -> backend WebSocket bridge -> Azure Realtime WebSocket
- (We also tested direct browser-to-Azure WebSocket; same symptom)
Language focus: Chinese (zh), Japanese (ja)

Detailed Observations

1) Replacement characters appear in input transcript deltas

In session.input_transcript.delta (or equivalent transcription events), U+FFFD appears intermittently.

Example (simplified):


{
  "type": "session.input_transcript.delta",
  "delta": "...�果..."
}

2) Replacement characters can also appear in translation output

We see similar issues in session.output_transcript.delta and sometimes in done/final text.

Example (simplified):


{
  "type": "session.output_transcript.delta",
  "delta": "...�..."
}

3) Decrypted network payload already contains corrupted text

After decrypting WebSocket traffic using Wireshark + TLS key log, the payload itself already contains corrupted text, which suggests this is not only a frontend rendering issue.

Example snippet from decrypted payload text:

"delta":"ï¿½æžœ"
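The fragment above is consistent with the UTF-8 bytes of � followed by 果 being shown through a single-byte code page. The following is a minimal Python sketch of that reading. It assumes the capture view rendered the bytes as Windows-1252, which the thread does not state, and the helper names are hypothetical.

```python
# Assumption: the packet view rendered UTF-8 bytes as Windows-1252 (cp1252).
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER
GUO = "\u679c"          # the CJK character 果


def as_cp1252_view(text: str) -> str:
    """Show how UTF-8 encoded text looks when misread as cp1252."""
    return text.encode("utf-8").decode("cp1252")


def undo_cp1252_view(mojibake: str) -> str:
    """Recover the original text from a cp1252 misreading of UTF-8 bytes."""
    return mojibake.encode("cp1252").decode("utf-8")


wire_text = REPLACEMENT + GUO
seen_in_capture = as_cp1252_view(wire_text)

# Print code points only, so the output does not depend on the terminal encoding.
print([f"U+{ord(ch):04X}" for ch in seen_in_capture])
print(undo_cp1252_view(seen_in_capture) == wire_text)
```

If that assumption holds, the payload carried the bytes EF BF BD (an already-encoded U+FFFD) followed by a valid 果, and the ï¿½ form is only a display artifact of the capture tool.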

Reproduction Steps

1) Open a Realtime WebSocket session (translation or transcription).
2) Stream continuous Chinese/Japanese speech (normal speaking pace, multiple phrases).
3) Capture and log events:
   - session.input_transcript.delta
   - session.input_transcript.done
   - session.output_transcript.delta
   - session.output_transcript.done
4) After running for several minutes, observe intermittent � or mojibake-like segments.
5) Decrypt traffic with TLS key log; the issue is still visible in WebSocket payload text.

Expected Behavior

- Chinese/Japanese text should be returned as valid and stable UTF-8 without U+FFFD (�).
- No mojibake-like fragments such as ï¿½ should appear in normal transcript/translation output.

Actual Behavior

- U+FFFD appears intermittently in ongoing sessions and breaks sentence meaning.
- The issue appears in both input transcription and translation output.
- Reproducible in both direct WebSocket and backend-bridged WebSocket access patterns.

What We Already Checked

- Frontend rendering is not the only cause (decrypted payload already contains corrupted text).
- The issue is intermittent, not tied to a single fixed sentence.

Answer 1

Hi @YangQi

Thank you for reaching out to Microsoft Q&A.

From your description, I understand that you're using the Azure OpenAI Realtime API over WebSocket for live speech transcription and translation, and you're intermittently receiving Unicode replacement characters (U+FFFD, displayed as �) or mojibake-like text (for example, ï¿½æžœ) in both Chinese and Japanese transcripts and translations.

Based on your findings:

- The issue occurs in both:
  - session.input_transcript.delta
  - session.output_transcript.delta
- The behavior is reproducible using both:
  - Direct browser-to-Azure WebSocket connections.
  - Browser → backend WebSocket bridge → Azure OpenAI.
- You've verified that the decrypted WebSocket payload already contains the corrupted characters, indicating that the corruption is present before the client renders the text.
- The issue is intermittent and not tied to a specific sentence or phrase.

Thank you for also confirming that you've ruled out client-side rendering by inspecting the decrypted WebSocket payload. That is a very helpful diagnostic step.

Based on the available Azure OpenAI documentation, we do not have any guidance that specifically addresses U+FFFD replacement characters or mojibake appearing within Realtime API transcription or translation events, particularly when the decrypted WebSocket payload itself already contains the corrupted text.

The available documentation primarily covers:

- General Realtime API endpoint and event behavior.
- API version compatibility.
- UTF-8 encoding considerations for Batch (.jsonl) files.
- Other unrelated text encoding scenarios.

It does not describe this specific behavior for Realtime speech transcription or translation over WebSocket.

Although the documentation does not directly address this issue, you may want to verify the following:

Confirm the Realtime API endpoint and API version

Ensure you're using the supported GA (v1) Realtime API endpoint and the appropriate API version for your scenario.

Some Realtime API documentation notes that certain preview API versions did not support all expected streaming delta events. While your issue relates to character corruption rather than missing events, it's still worth confirming that your client is using the recommended endpoint and API version.

Verify UTF-8 handling throughout the pipeline

While the documented UTF-8 guidance applies primarily to Batch (.jsonl) workflows rather than WebSocket streaming, it's still worth verifying that every component in your processing pipeline consistently uses UTF-8 encoding without intermediate character set conversions.

Based on your investigation, you've already confirmed that the decrypted WebSocket payload contains the corrupted characters, making a frontend rendering issue less likely. However, validating consistent UTF-8 handling across the browser, backend bridge, logging mechanism, and any intermediate processing remains a useful diagnostic step.
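One way to run this check is to classify the raw bytes of each text message at every hop before anything decodes them. The sketch below is illustrative only: `classify_payload` and its return labels are hypothetical names, and it assumes you can get at the raw bytes (for example in the backend bridge or from a packet capture, since a browser WebSocket hands text messages to script as already-decoded strings).

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def classify_payload(raw: bytes) -> str:
    """Classify the raw bytes of one text message, captured before decoding.

    Hypothetical helper; the labels are illustrative, not part of any API.
    """
    try:
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8. A lenient decoder further down the
        # pipeline would insert U+FFFD at this hop.
        return "invalid-utf8"
    if REPLACEMENT in text:
        # Valid UTF-8 that already encodes U+FFFD (bytes EF BF BD): the
        # replacement character was produced before this hop.
        return "contains-u+fffd"
    return "clean"


if __name__ == "__main__":
    samples = {
        "valid CJK": "\u5982\u679c".encode("utf-8"),
        "pre-encoded U+FFFD": b"\xef\xbf\xbd\xe6\x9e\x9c",
        "truncated sequence": "\u679c".encode("utf-8")[:2],
    }
    for label, raw in samples.items():
        print(label, "->", classify_payload(raw))
```

A result of "contains-u+fffd" at the first hop after Azure would suggest the character was already in the received message, while "invalid-utf8" would point at how bytes are being split or decoded along the way.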

Please refer to the following:

Getting started with Azure OpenAI batch deployments (troubleshooting, UTF-8-BOM): https://learn.microsoft.com/azure/ai-foundry/openai/how-to/batch?wt.mc_id=knowledgesearch_inproduct_azure-cxp-community-insider#troubleshooting

Transparency note for Azure OpenAI (speech-to-text/translation limitations): https://learn.microsoft.com/azure/foundry/responsible-ai/openai/transparency-note?wt.mc_id=knowledgesearch_inproduct_azure-cxp-community-insider#limitations

I hope this helps. Please let me know if you have any further queries.

If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".

Thank you!

SRILAKSHMI C 19,550 Reputation points Microsoft External Staff Moderator

2026-06-30T05:55:13.35+00:00

Hi @YangQi,

Following up to see if the above answer was helpful. If it answers your query, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further queries, please let us know.

Thank you!

SRILAKSHMI C 19,550 Reputation points Microsoft External Staff Moderator

2026-07-01T06:08:29.96+00:00

Hi @YangQi,

Just checking in to see whether you have had a chance to review my response to your question.

If you are still facing any further issues, please don't hesitate to reach out to us. We are happy to assist you.

Looking forward to your response and appreciate your time on this.

If you feel that your queries have been resolved, please accept the answer by clicking "Upvote" and "Accept Answer" on the post.

Thank you!

Answer 2

Hello YangQi,

Welcome to Microsoft Q&A, and thank you for posting your question here.

I understand that your Azure OpenAI Realtime WebSocket returns replacement characters (U+FFFD) in Chinese/Japanese transcripts and translations.

I believe the cause is incorrect client-side decoding of the Realtime WebSocket payloads. To fix it, read the complete WebSocket message, parse the JSON event, handle text deltas as text, and base64-decode audio deltas as audio bytes. Do not call Encoding.UTF8.GetString() on audio payloads, and do not work around the problem by trimming or suppressing �; that only hides the symptom and does not fix the broken audio/message handling.

References: https://learn.microsoft.com/en-us/dotnet/api/system.net.websockets.clientwebsocket.receiveasync?view=net-10.0, https://learn.microsoft.com/en-us/dotnet/api/system.net.websockets.websocketmessagetype?view=net-10.0, https://learn.microsoft.com/en-us/dotnet/core/compatibility/core-libraries/9.0/binaryreader

The recommended steps are:

1) Use the correct Azure OpenAI Realtime endpoint format for the selected API version.
2) Accumulate WebSocket frames until EndOfMessage before decoding.
3) Decode only WebSocket text messages as UTF-8 JSON.
4) Parse the event type before processing the payload.
5) Treat response.audio.delta as base64 audio data, not text.
6) Configure matching input_audio_format and output_audio_format.
7) Use WebRTC instead of WebSocket for low-latency browser or mobile client audio, while keeping WebSocket for server-to-server or middleware scenarios.

References: https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/realtime-audio-websockets, https://learn.microsoft.com/en-us/azure/foundry/openai/how-to/realtime-audio
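The step about accumulating frames before decoding can be illustrated with a small Python sketch. It is not tied to any WebSocket library: the fragments are plain byte strings, the function names are hypothetical, and the split point is chosen by hand to fall inside a multi-byte character. WebSocket text messages may be fragmented in the middle of a UTF-8 sequence, so only the reassembled message is guaranteed to be valid UTF-8.

```python
import codecs


def decode_per_fragment(fragments):
    """Buggy pattern: decode every fragment on its own, replacing errors."""
    return "".join(f.decode("utf-8", errors="replace") for f in fragments)


def decode_whole_message(fragments):
    """Join all fragments of one message first, then decode once."""
    return b"".join(fragments).decode("utf-8")


def decode_incrementally(fragments):
    """Alternative: an incremental decoder buffers incomplete sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    parts = [decoder.decode(f) for f in fragments]
    parts.append(decoder.decode(b"", final=True))  # raises if bytes are left over
    return "".join(parts)


original = '{"delta":"\u5982\u679c"}'  # the delta is the two characters 如果
message = original.encode("utf-8")

# Split after the first byte of the second CJK character (a 3-byte sequence).
fragments = [message[:14], message[14:]]

print("\ufffd" in decode_per_fragment(fragments))   # per-fragment decoding corrupts
print(decode_whole_message(fragments) == original)
print(decode_incrementally(fragments) == original)
```

Joining first is the simplest option when whole messages are small; the incremental decoder is useful if text has to be produced as fragments arrive.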

After correcting the WebSocket receive loop and separating text handling from audio-byte handling, the replacement-character issue should be resolved, because the client no longer attempts to decode audio or incomplete frames as UTF-8 text.
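A minimal sketch of that separation in Python, assuming each complete text message is a JSON event with a "type" field and a "delta" field. The event names are the ones quoted in this thread; the `handle_event` function and the use of "delta" for audio payloads are assumptions for illustration, not a documented client API.

```python
import base64
import json


def handle_event(message_text: str, transcript: list, audio: bytearray) -> None:
    """Route one complete, already UTF-8-decoded JSON event (hypothetical helper)."""
    event = json.loads(message_text)
    event_type = event.get("type", "")

    if event_type == "response.audio.delta":
        # Audio arrives as base64 text; decode it to bytes and never treat
        # the decoded bytes as UTF-8 text.
        audio.extend(base64.b64decode(event["delta"]))
    elif event_type.endswith("_transcript.delta"):
        # Transcript deltas are ordinary JSON strings; append them unchanged.
        transcript.append(event["delta"])
    # Other event types are ignored in this sketch.


if __name__ == "__main__":
    transcript, audio = [], bytearray()
    pcm = bytes([0x00, 0x01, 0xEF, 0xBF])  # arbitrary bytes, not valid text
    events = [
        json.dumps({"type": "session.output_transcript.delta", "delta": "\u5982\u679c"}),
        json.dumps({"type": "response.audio.delta",
                    "delta": base64.b64encode(pcm).decode("ascii")}),
    ]
    for message in events:
        handle_event(message, transcript, audio)
    print(len(transcript), len(audio))
```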

I hope this is helpful! Do not hesitate to let me know if you have any other questions, steps or clarifications.

Please don't forget to close the thread by upvoting this answer and accepting it if it is helpful.