Azure OpenAI Realtime WebSocket returns replacement characters (U+FFFD) in Chinese/Japanese transcripts and translations

YangQi 0 Reputation points
2026-06-16T02:39:32.2233333+00:00

Summary

When using Azure OpenAI Realtime over WebSocket for live speech transcription and translation, we intermittently receive replacement characters (U+FFFD, rendered as �) in Chinese and Japanese text.

This affects:

  • Chinese speech transcription (input transcript)
  • Japanese speech transcription (input transcript)
  • Chinese/Japanese translation output (output transcript)

In practice, expected CJK text is sometimes corrupted, for example into fragments such as �果.

Environment

  • Service: Azure OpenAI
  • Mode: Realtime API via WebSocket
  • Endpoints used:
    • Translation: /openai/v1/realtime/translations?model=<deployment>
    • Transcription intent: /openai/v1/realtime?intent=transcription
  • Audio format: PCM 24kHz (streamed chunks)
  • Client path:
    • Browser -> backend WebSocket bridge -> Azure Realtime WebSocket
    • (We also tested direct browser-to-Azure WebSocket; same symptom)
  • Language focus: Chinese (zh), Japanese (ja)

Detailed Observations

1) Replacement characters appear in input transcript deltas

In session.input_transcript.delta (or equivalent transcription events), U+FFFD appears intermittently.

Example (simplified):


{
  "type": "session.input_transcript.delta",
  "delta": "...�果..."
}

2) Replacement characters can also appear in translation output

We see similar issues in session.output_transcript.delta and sometimes in done/final text.

Example (simplified):


{
  "type": "session.output_transcript.delta",
  "delta": "...�..."
}

3) Decrypted network payload already contains corrupted text

After decrypting WebSocket traffic using Wireshark + TLS key log, the payload itself already contains corrupted text, which suggests this is not only a frontend rendering issue.

Example snippet from decrypted payload text:

  • "delta":"�果"
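
To make this check independent of how a capture tool renders text, the raw bytes of a captured text frame can be inspected directly. The following is a minimal sketch, assuming the frame payload has been exported as bytes; the function name `classify_payload` and the sample payloads are hypothetical and not part of any Azure SDK. It separates a payload that literally carries U+FFFD (bytes EF BF BD, or the JSON escape \ufffd), where the substitution happened before the bytes were sent, from a payload containing invalid UTF-8, where the substitution was made by whatever decoded it for display.

```python
REPLACEMENT_UTF8 = "\ufffd".encode("utf-8")   # b"\xef\xbf\xbd"
REPLACEMENT_JSON_ESCAPE = b"\\ufffd"          # the six ASCII characters \ufffd


def classify_payload(payload: bytes) -> str:
    """Classify the raw bytes of one WebSocket text frame.

    "invalid-utf8"    : the bytes are not valid UTF-8, so any U+FFFD seen on
                        screen was produced by the tool that decoded them.
    "contains-u+fffd" : valid UTF-8 that already includes U+FFFD, so the
                        substitution happened before the bytes were sent.
    "clean"           : valid UTF-8 with no replacement character.
    """
    try:
        payload.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "invalid-utf8"
    # bytes.lower() only affects ASCII, so it is safe for matching the escape.
    if REPLACEMENT_UTF8 in payload or REPLACEMENT_JSON_ESCAPE in payload.lower():
        return "contains-u+fffd"
    return "clean"


if __name__ == "__main__":
    sample = '{"delta":"\ufffd果"}'.encode("utf-8")  # illustrative payload
    print(classify_payload(sample))
```

A result of "contains-u+fffd" on frames captured between Azure and the client would support the conclusion that the text was already corrupted upstream of the client.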

Reproduction Steps

  1. Open a Realtime WebSocket session (translation or transcription).
  2. Stream continuous Chinese/Japanese speech (normal speaking pace, multiple phrases).
  3. Capture and log events:
    • session.input_transcript.delta
    • session.input_transcript.done
    • session.output_transcript.delta
    • session.output_transcript.done
  4. After running for several minutes, observe intermittent U+FFFD (�) or mojibake-like segments.
  5. Decrypt traffic with TLS key log; the issue is still visible in WebSocket payload text.
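
The logging in step 3 can be turned into a simple per-event-type count, which helps show how often the problem occurs. This is a rough sketch, assuming each received message is available as a JSON string; `count_corrupted_events` is a hypothetical helper. Because the name of the text field on the done events is not shown above, it scans every top-level string field rather than assuming one.

```python
import json
from collections import Counter

# Event types listed in the reproduction steps.
WATCHED_TYPES = {
    "session.input_transcript.delta",
    "session.input_transcript.done",
    "session.output_transcript.delta",
    "session.output_transcript.done",
}


def count_corrupted_events(raw_messages):
    """Count, per event type, the events whose string fields contain U+FFFD."""
    counts = Counter()
    for raw in raw_messages:
        event = json.loads(raw)
        event_type = event.get("type")
        if event_type not in WATCHED_TYPES:
            continue
        # Scan all top-level string values; field names other than "delta"
        # are not assumed here.
        if any("\ufffd" in v for v in event.values() if isinstance(v, str)):
            counts[event_type] += 1
    return counts


if __name__ == "__main__":
    log = [
        '{"type": "session.input_transcript.delta", "delta": "\ufffd果"}',
        '{"type": "session.output_transcript.delta", "delta": "結果"}',
    ]
    print(dict(count_corrupted_events(log)))
```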

Expected Behavior

  • Chinese/Japanese text should be returned as valid and stable UTF-8 without U+FFFD (�).
  • No mojibake-like fragments such as � should appear in normal transcript/translation output.

Actual Behavior

  • U+FFFD appears intermittently in ongoing sessions and breaks sentence meaning.
  • The issue appears in both input transcription and translation output.
  • Reproducible in both direct WebSocket and backend-bridged WebSocket access patterns.

What We Already Checked

  • Frontend rendering is not the only cause (decrypted payload already contains corrupted text).
  • The issue is intermittent, not tied to a single fixed sentence.
Azure OpenAI in Foundry Models

2 answers

  1. SRILAKSHMI C 19,550 Reputation points Microsoft External Staff Moderator
    2026-06-28T07:41:01.9933333+00:00

    Hi @YangQi

    Thank you for reaching out to Microsoft Q&A.

    From your description, I understand that you're using the Azure OpenAI Realtime API over WebSocket for live speech transcription and translation, and you're intermittently receiving Unicode replacement characters (U+FFFD, displayed as �) or mojibake-like text (for example, �果) in both Chinese and Japanese transcripts and translations.

    Based on your findings:

    The issue occurs in both:

    • session.input_transcript.delta
    • session.output_transcript.delta

    The behavior is reproducible using both:

    • Direct browser-to-Azure WebSocket connections.
    • Browser → backend WebSocket bridge → Azure OpenAI.

    You've verified that the decrypted WebSocket payload already contains the corrupted characters, indicating that the corruption is present before the client renders the text.

    The issue is intermittent and not tied to a specific sentence or phrase.

    Thank you for also confirming that you've ruled out client-side rendering by inspecting the decrypted WebSocket payload. That is a very helpful diagnostic step.

    Based on the available Azure OpenAI documentation, we do not have any guidance that specifically addresses U+FFFD replacement characters or mojibake appearing within Realtime API transcription or translation events, particularly when the decrypted WebSocket payload itself already contains the corrupted text.

    The available documentation primarily covers:

    • General Realtime API endpoint and event behavior.
    • API version compatibility.
    • UTF-8 encoding considerations for Batch (.jsonl) files.
    • Other unrelated text encoding scenarios.

    It does not describe this specific behavior for Realtime speech transcription or translation over WebSocket.

    Although the documentation does not directly address this issue, you may want to verify the following:

    Confirm the Realtime API endpoint and API version

    Ensure you're using the supported GA (v1) Realtime API endpoint and the appropriate API version for your scenario.

    Some Realtime API documentation notes that certain preview API versions did not support all expected streaming delta events. While your issue relates to character corruption rather than missing events, it's still worth confirming that your client is using the recommended endpoint and API version.

    Verify UTF-8 handling throughout the pipeline

    While the documented UTF-8 guidance applies primarily to Batch (.jsonl) workflows rather than WebSocket streaming, it's still worth verifying that every component in your processing pipeline consistently uses UTF-8 encoding without intermediate character set conversions.

    Based on your investigation, you've already confirmed that the decrypted WebSocket payload contains the corrupted characters, making a frontend rendering issue less likely. However, validating consistent UTF-8 handling across the browser, backend bridge, logging mechanism, and any intermediate processing remains a useful diagnostic step.
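
One concrete thing to look for in a bridge or logging layer is code that decodes byte chunks independently. A CJK character occupies three bytes in UTF-8, so a chunk boundary that falls inside a character yields U+FFFD when each chunk is decoded on its own. The sketch below uses only the Python standard library and made-up sample data to contrast per-chunk decoding with an incremental decoder that buffers incomplete sequences; it illustrates the failure mode in general and is not a statement about how the Azure service handles text internally.

```python
import codecs


def decode_per_chunk(chunks):
    """Naive approach: decode every chunk on its own.

    A character split across two chunks is replaced with U+FFFD.
    """
    return "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)


def decode_incrementally(chunks):
    """Keep incomplete byte sequences buffered until the next chunk arrives."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    text = "".join(decoder.decode(chunk) for chunk in chunks)
    # final=True flushes the buffer; a truly truncated tail becomes U+FFFD.
    return text + decoder.decode(b"", final=True)


if __name__ == "__main__":
    data = "結果".encode("utf-8")      # 6 bytes, 3 per character
    chunks = [data[:2], data[2:]]      # boundary falls inside the first character
    print(repr(decode_per_chunk(chunks)))
    print(repr(decode_incrementally(chunks)))
```

If any component between Azure and the browser behaves like `decode_per_chunk`, it can introduce replacement characters even when the bytes it received were valid.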

    Please refer to the following:

    Getting started with Azure OpenAI batch deployments (troubleshooting, UTF-8-BOM): https://learn.microsoft.com/azure/ai-foundry/openai/how-to/batch?wt.mc_id=knowledgesearch_inproduct_azure-cxp-community-insider#troubleshooting

    Transparency note for Azure OpenAI (speech-to-text/translation limitations): https://learn.microsoft.com/azure/foundry/responsible-ai/openai/transparency-note?wt.mc_id=knowledgesearch_inproduct_azure-cxp-community-insider#limitations

    I hope this helps. Do let me know if you have any further queries.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful?".

    Thank you!



  2. Sina Salam 30,486 Reputation points Volunteer Moderator
    2026-06-16T14:35:10.08+00:00

    Hello YangQi,

    Welcome to Microsoft Q&A, and thank you for posting your question here.

    I understand that your Azure OpenAI Realtime WebSocket returns replacement characters (U+FFFD) in Chinese/Japanese transcripts and translations.

    From what I can see, the cause is incorrect client-side decoding of Realtime WebSocket payloads. To fix it, read the complete WebSocket message, parse the JSON event, handle text deltas as text, and base64-decode audio deltas as audio bytes. Do not call Encoding.UTF8.GetString() on audio payloads, and do not work around the problem by trimming or suppressing �; that only hides the symptom and does not fix the broken audio/message handling. References:

    • https://learn.microsoft.com/en-us/dotnet/api/system.net.websockets.clientwebsocket.receiveasync?view=net-10.0
    • https://learn.microsoft.com/en-us/dotnet/api/system.net.websockets.websocketmessagetype?view=net-10.0
    • https://learn.microsoft.com/en-us/dotnet/core/compatibility/core-libraries/9.0/binaryreader

    Once the WebSocket receive loop is corrected and text handling is separated from audio-byte handling, the replacement-character issue should be resolved, because the client no longer attempts to decode audio or incomplete frames as UTF-8 text.
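
The handling described above can be sketched as follows. This is an illustration under stated assumptions, not the client's actual code: `FrameAssembler`, `handle_event`, and the audio event name in `AUDIO_DELTA_TYPE` are hypothetical, and many WebSocket libraries already reassemble fragmented messages for you. It only covers client-side decoding; it cannot repair text that already contains U+FFFD when it arrives.

```python
import base64
import json

# Hypothetical event name for audio deltas; check the events your session emits.
AUDIO_DELTA_TYPE = "session.output_audio.delta"


class FrameAssembler:
    """Collect the fragments of one WebSocket message; decode only when complete."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, fragment: bytes, end_of_message: bool):
        """Return the message text once the final fragment arrives, else None."""
        self._buffer.extend(fragment)
        if not end_of_message:
            return None
        # Strict decoding fails loudly instead of silently inserting U+FFFD.
        text = bytes(self._buffer).decode("utf-8")
        self._buffer.clear()
        return text


def handle_event(message_text: str, transcript: list, audio: bytearray) -> None:
    """Route one JSON event: text deltas stay text, audio deltas become bytes."""
    event = json.loads(message_text)
    event_type = event.get("type", "")
    delta = event.get("delta", "")
    if event_type.endswith("transcript.delta"):
        transcript.append(delta)                # already a decoded string
    elif event_type == AUDIO_DELTA_TYPE:
        audio.extend(base64.b64decode(delta))   # never decode audio as UTF-8


if __name__ == "__main__":
    raw = '{"type":"session.output_transcript.delta","delta":"結果"}'.encode("utf-8")
    assembler, transcript, audio = FrameAssembler(), [], bytearray()
    # Split in the middle of a multi-byte character to mimic fragmentation.
    for fragment, last in ((raw[:-6], False), (raw[-6:], True)):
        message = assembler.feed(fragment, last)
        if message is not None:
            handle_event(message, transcript, audio)
    print(transcript)
```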

    I hope this is helpful! Do not hesitate to let me know if you have any other questions, steps or clarifications.


    Please don't forget to close the thread by upvoting and accepting this as the answer if it was helpful.
