Hey Allen Zhang! That feels weird at first (“zero calls yet I’m getting 429”), but with Azure AI Foundry/Foundry Models throttling there are a few known reasons a 429 can happen even when RPM/TPM in the UI looks like it should be fine.
Based on the provided info, here are the most likely causes and what to try next.
- Check quota at the deployment level (TPM/RPM), not just the subscription level
The docs call out that you can be “approved” at a high level but still hit 429s because quota isn’t effectively allocated to the specific deployment receiving traffic.
What to do
- In Azure AI Foundry, open the deployment (not just the project/subscription quota view)
- Confirm the deployment’s effective quota/TPM allocation for that model + region
- Look specifically for whether the deployment has any token/per-minute allocation configured
- Transient throttling / backend scaling (429 even when you’re “under quota”)
Even if you’re not exceeding configured quotas, Azure can still return 429 during backend scaling/adjustments. In that scenario:
- the error can occur even with very low actual usage
- retrying later (honoring retry-after-ms) is expected to resolve it
- the throttling can affect effective rate limits temporarily
What to do
- Implement retry with backoff for 429s (prefer SDK built-in retry)
- If you can see the HTTP headers in the response, compare the effective limits (for example, x-ratelimit-limit-tokens) against your configured TPM to confirm whether there’s a temporary adjustment
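If you can't rely on an SDK's built-in retry, the two steps above could look roughly like the sketch below. It is only an illustration: ThrottledError, call_with_retry, and effective_limit_is_reduced are hypothetical names (not part of any Azure SDK), the fallback to a plain retry-after header is an assumption, and so is the idea that x-ratelimit-limit-tokens is reported in the same units as your configured TPM.

```python
import random
import time
from typing import Callable, Mapping, Optional


class ThrottledError(Exception):
    """Hypothetical exception carrying the headers of a 429 response."""

    def __init__(self, headers: Mapping[str, str]):
        super().__init__("429 Too Many Requests")
        self.headers = headers


def _lowered(headers: Mapping[str, str]) -> dict:
    # HTTP header names are case-insensitive, so normalize before lookup.
    return {k.lower(): v for k, v in headers.items()}


def retry_delay_seconds(headers: Mapping[str, str], attempt: int,
                        base: float = 1.0, cap: float = 30.0) -> float:
    """Prefer the server's hint; otherwise use capped exponential backoff with jitter."""
    h = _lowered(headers)
    try:
        if "retry-after-ms" in h:
            return max(0.0, float(h["retry-after-ms"]) / 1000.0)
        if "retry-after" in h:  # assumption: value given in seconds
            return max(0.0, float(h["retry-after"]))
    except ValueError:
        pass  # unparseable hint (e.g. an HTTP date): fall back to backoff
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_retry(send: Callable[[], object], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> object:
    """Call send(); on ThrottledError wait and retry, up to max_attempts calls."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            sleep(retry_delay_seconds(err.headers, attempt))


def effective_limit_is_reduced(headers: Mapping[str, str],
                               configured_tpm: int) -> Optional[bool]:
    """True if the reported token limit is below the configured TPM; None if unknown."""
    raw = _lowered(headers).get("x-ratelimit-limit-tokens")
    if raw is None:
        return None
    try:
        return int(raw) < configured_tpm
    except ValueError:
        return None
```

Here send would be whatever function makes your request and raises ThrottledError when the response status is 429. If effective_limit_is_reduced returns True while your usage is low, that points toward a temporary adjustment rather than a quota misconfiguration.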
- max_tokens (and similar request parameters) can consume rate-limit budget
Rate-limit calculations can include the request parameters (like max_tokens), not just the eventual billed tokens. So even “small prompts” can trigger throttling if max_tokens is set high.
What to do
- Reduce max_tokens in the agent/tool request (if you can control it)
- Avoid best_of (if applicable)
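To get a feel for why a high max_tokens matters, here is a rough back-of-the-envelope sketch. The formula (prompt tokens plus max_tokens times best_of) is a simplifying assumption for intuition only, not the service's documented calculation, and both function names are hypothetical.

```python
from typing import Optional


def estimated_request_tokens(prompt_tokens: int, max_tokens: int,
                             best_of: int = 1) -> int:
    """Assumed estimate of what one request counts against the TPM budget."""
    return prompt_tokens + max_tokens * best_of


def requests_per_minute_budget(tpm: int, prompt_tokens: int, max_tokens: int,
                               best_of: int = 1) -> Optional[int]:
    """How many such requests fit in one minute under the assumed estimate."""
    cost = estimated_request_tokens(prompt_tokens, max_tokens, best_of)
    if cost <= 0:
        return None  # nothing meaningful to divide by
    return tpm // cost
```

Under this assumption, a deployment with 10,000 TPM and a 50-token prompt fits only 2 requests per minute at max_tokens=4000, but 40 requests per minute at max_tokens=200, even though the prompts are identical.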
- Wait/refresh and retry the deployment path
There’s guidance that refreshing and retrying can resolve transient issues related to loading/handling deployments.
What to do
- Refresh the Foundry page (or the relevant UI)
- Retry the call after refresh
If you continue getting sustained 429s while you believe you’re below the effective limits, please share the requested details via private message.
Kindly let us know if the above helps or you need further assistance on this issue.
If the answer is helpful, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".