Hello @Evonna Nash
Thank you for reaching out to Microsoft Q&A.
I understand that Text-to-Speech (TTS) voices are taking significantly longer to load in the new Azure AI Foundry experience, whereas the previous interface was working as expected.
There are a few areas worth checking, as each of them can present as prolonged loading times.
1. Verify whether the issue is limited to the Foundry UI
Please try the following basic troubleshooting steps:
- Open Azure AI Foundry in an InPrivate/Incognito browser window.
- Test using a different browser (Microsoft Edge or Google Chrome recommended).
- Clear the browser cache and cookies, then sign in again.
- Check whether the behavior is reproducible across multiple devices or networks.
If speech synthesis works normally through APIs or other Speech tools but loads slowly only within Foundry, this would indicate a portal experience issue rather than a backend Speech service issue.
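One way to compare the two is to time the Speech voices list REST endpoint directly, outside the portal. The following is only a minimal sketch: it assumes your Speech resource key and region are stored in environment variables named SPEECH_KEY and SPEECH_REGION (names chosen here for illustration), and the helper function names are hypothetical.

```python
import json
import os
import time
import urllib.request


def voices_list_url(region: str) -> str:
    """Build the regional voices list endpoint URL for a Speech resource."""
    return f"https://{region}.tts.speech.microsoft.com/cognitiveservices/voices/list"


def timed(fn):
    """Run fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def fetch_voices(region: str, key: str, timeout: float = 30.0) -> list:
    """Fetch the voice list, sending the key in the Ocp-Apim-Subscription-Key header."""
    request = urllib.request.Request(
        voices_list_url(region),
        headers={"Ocp-Apim-Subscription-Key": key},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # SPEECH_KEY and SPEECH_REGION are assumed environment variable names.
    key = os.environ.get("SPEECH_KEY")
    region = os.environ.get("SPEECH_REGION")
    if key and region:
        voices, seconds = timed(lambda: fetch_voices(region, key))
        print(f"{len(voices)} voices returned in {seconds:.2f} s")
    else:
        print("Set SPEECH_KEY and SPEECH_REGION to run the timing check.")
```

If this call returns quickly while the Foundry voice gallery remains slow, that points toward the portal experience rather than the Speech backend.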
2. Check whether all voices are affected
Please let us know:
- Whether all voices are loading slowly or only specific voices.
- Whether the delay occurs when browsing the voice gallery, previewing voices, or generating speech output.
- Approximately how long the loading process takes before the voices become available.
3. Review Speech service limits and throttling
If multiple requests are being issued simultaneously, Speech services may throttle or queue requests.
Some relevant limits include:
- Text-to-Speech transactions are subject to service throughput limits.
- Custom Voice deployments have concurrency limits that can cause requests to queue when exceeded.
- High request volumes can result in increased latency that may appear as prolonged loading within the portal.
If your workload has recently increased or multiple users are accessing the same resource, this could contribute to the behavior.
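If you call the Speech REST API from your own code, a common way to cope with throttling is to retry with exponential backoff. The sketch below is illustrative only: it assumes throttled requests surface as HTTP 429 responses raised by urllib, and the function names, retry count, and delay values are hypothetical choices rather than documented recommendations.

```python
import time
import urllib.error


def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list:
    """Return exponentially growing wait times in seconds, limited to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_retry(send, retries: int = 4, sleep=time.sleep):
    """Call send(); on HTTP 429, wait and retry. Other errors propagate unchanged."""
    for delay in backoff_delays(retries):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            sleep(delay)
    # Final attempt after all waits; raises if the request is still throttled.
    return send()
```

Here `send` is any zero-argument callable that performs one request, for example a lambda wrapping your synthesis call. If retries with backoff succeed consistently, the slowness is more likely related to request volume than to the portal.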
4. Long Audio or Streaming Scenarios
If you are working with long-form synthesis or streaming scenarios:
- Long Audio synthesis is processed asynchronously and may take longer to complete.
- Streaming TTS sessions can occasionally experience delays if text is not provided promptly after the connection is established.
Although these scenarios typically affect synthesis rather than voice discovery, they are worth considering depending on how the voices are being used.
Reference documentation:
- Custom Neural Voice overview, eligibility, and setup: https://docs.microsoft.com/azure/cognitive-services/speech-service/custom-neural-voice
- Get started with Custom Neural Voice: https://docs.microsoft.com/azure/cognitive-services/speech-service/how-to-custom-voice
- Prepare training data: https://docs.microsoft.com/azure/cognitive-services/speech-service/how-to-custom-voice-prepare-data
- Create and use your voice model: https://docs.microsoft.com/azure/cognitive-services/speech-service/how-to-custom-voice-create-voice
- Custom voice deprecation note (in the Custom voice docs): https://docs.microsoft.com/azure/cognitive-services/speech-service/how-to-custom-voice#migrate-to-custom-neural-voice
- TTS with Long Audio API (async workflow): https://docs.microsoft.com/azure/cognitive-services/speech-service/long-audio-api
- Azure Speech in Foundry Tools known issues (TTS streaming 503 issue): https://learn.microsoft.com/azure/ai-services/speech-service/known-issues#active-known-issues-text-to-speech-tts
I hope this helps. Please let me know if you have any further questions.
Thank you!