Azure OpenAI unexpected Rate Limit Error

Question

Donovan Bouton 1

I have GPT 5.4 priority with 20k requests per minute deployed for employee chat. It returned the error "Your requests to gpt-5.4 for <deployment name> in <region> have exceeded rate limit.". That's scary because, with 0-2 concurrent users, there's no way usage should come even remotely close to that limit. In Foundry, there are no metrics available for the last 5 hours.

No idea if my website code is stuck in an infinite loop costing thousands per hour, or if Azure has metrics downtime and that's the default response when metrics are unavailable.

Anshika Varshney 14,085 Reputation points Microsoft External Staff Moderator

2026-06-25T19:32:08.4166667+00:00
Hello @Donovan Bouton

A rate limit error can occasionally occur even when the expected user traffic is low. Azure OpenAI evaluates requests across short time windows, so rapid bursts, retries, parallel requests, or application loops can sometimes trigger throttling before the overall RPM quota is reached.
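To illustrate how a burst can trip a per-minute limit without the minute total coming anywhere near it, here is a minimal sketch. It assumes the RPM quota is prorated evenly over a short sliding window (1 second here); the window length, the proration, and the function names are assumptions for illustration, not the service's documented algorithm.

```python
from bisect import bisect_right


def max_requests_in_window(timestamps, window_seconds):
    """Largest number of requests that fall inside any sliding window.

    `timestamps` are request times in seconds (any order).
    """
    ts = sorted(timestamps)
    best = 0
    for i, t in enumerate(ts):
        # Index of the first request strictly newer than (t - window).
        start = bisect_right(ts, t - window_seconds)
        best = max(best, i - start + 1)
    return best


def would_throttle(timestamps, rpm_limit, window_seconds=1.0):
    """True if any short window exceeds its prorated share of the RPM limit.

    Assumption: the per-window allowance is rpm_limit * window / 60.
    """
    allowed = rpm_limit * window_seconds / 60.0
    return max_requests_in_window(timestamps, window_seconds) > allowed


# 400 requests fired within one second by a runaway loop ...
burst = [i * 0.002 for i in range(400)]
# ... versus the same 400 requests spread over a minute.
steady = [i * 0.15 for i in range(400)]

print(would_throttle(burst, rpm_limit=20_000))   # burst exceeds the per-second share
print(would_throttle(steady, rpm_limit=20_000))  # steady traffic does not
```

Under these assumptions, 20,000 RPM prorates to roughly 333 requests per second, so a loop that fires 400 requests in under a second would be throttled even though 400 requests per minute is far below the quota.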

Since you also mentioned that metrics have been unavailable for several hours, it would be helpful to check:

Whether the issue continues after some time or was only temporary.

If the application is retrying failed requests automatically.

Azure Monitor or diagnostic logs for unexpected request spikes.

Azure Service Health for any ongoing monitoring or platform-related issues affecting metrics visibility.

If possible, could you share:

The Azure region where the deployment is hosted.

The approximate UTC timestamp when the error occurred.

Whether you're using any custom retry logic or SDK-level retries.

Whether the metrics have started appearing again.

That information may help determine whether the throttling was caused by actual request activity or a temporary platform-side issue.

I hope this helps. Do let me know if you have any further queries.

Thank you!
Donovan Bouton 1 Reputation point

2026-06-26T21:18:24.5766667+00:00
Hi @Anshika Varshney -

I have rewritten tons of my app's logic out of panic. I am not getting rate limit errors anymore.

However, everywhere I look, only the number of API requests to GPT 5.4 is shown, not the token usage (input or output) or cost, so I am still flying extremely blind. This data, which I check daily and which has been available in the past, is missing across:

1. Foundry's deployed model "Monitor" tab;

2. The Azure portal's resource Metrics tool, where you can add stats to the graph;

3. My Azure subscription's Cost Analysis breakdown of daily charges.

In none of the 3 locations does cost or actual usage appear (locations #1 and #2 do show request counts however). My deployments are in US-East-2.

Thanks!
Anshika Varshney 14,085 Reputation points Microsoft External Staff Moderator

2026-07-01T17:03:15.11+00:00
Hello @Donovan Bouton

Thank you for the update. It is helpful to know that the rate limit errors have stopped after the application changes.

The absence of token usage, input/output token metrics, and cost details is unexpected when request counts are still being reported. Since request counts are visible while usage and cost information is missing across multiple monitoring surfaces, this may indicate a telemetry or reporting issue rather than an active rate-limiting problem.

Could you please share screenshots of the following (with any sensitive information removed)?

Foundry deployment Monitor tab

Azure Portal Metrics blade

Azure Cost Analysis view

Additionally, please confirm whether the behavior is affecting all GPT-5.4 deployments in the resource or only a specific deployment. This information will help determine whether the issue is deployment-specific or related to usage reporting for the resource as a whole.
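While token and cost metrics are missing from the portal, one possible stopgap is to tally the `usage` object returned with each response on the client side. The sketch below assumes each response is available as a dict with `usage.prompt_tokens` and `usage.completion_tokens` (the chat completions field names; other API surfaces may name them differently). `UsageLedger` and `UsageTotals` are hypothetical names, and no prices are applied because none are stated in this thread.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageTotals:
    requests: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0


class UsageLedger:
    """Hypothetical in-memory tally of token usage per deployment."""

    def __init__(self):
        self.by_deployment = defaultdict(UsageTotals)

    def record(self, deployment, response):
        """Add one response's usage to the running totals and return them."""
        # Responses without a usage block still count as a request.
        usage = response.get("usage") or {}
        totals = self.by_deployment[deployment]
        totals.requests += 1
        totals.prompt_tokens += int(usage.get("prompt_tokens", 0))
        totals.completion_tokens += int(usage.get("completion_tokens", 0))
        return totals


ledger = UsageLedger()
ledger.record("employee-chat", {"usage": {"prompt_tokens": 120, "completion_tokens": 45}})
print(ledger.by_deployment["employee-chat"])
```

Logging these totals periodically gives an independent record to compare against the portal once its metrics reappear.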

Thank you.

Answer 1

Hello Donovan Bouton,

Welcome to Microsoft Q&A, and thank you for posting your question here.

I understand that you are seeing an unexpected rate limit error from Azure OpenAI.

Note that Azure OpenAI enforces limits by subscription, region, model, deployment type, assigned TPM, and derived RPM, not by the number of visible human users alone. Also, Priority processing improves latency but uses the same quota as Standard processing, so it does not provide separate or unlimited throughput.

I suggest that you verify the exact deployment quota; review Azure Monitor metrics for request count, token usage, and 429 responses; enable diagnostic logging to Log Analytics; inspect the application for retry loops or hidden concurrent calls; and update the client to honor Retry-After with exponential backoff and circuit breaking. If Azure Monitor shows no matching request or token spike while 429s were returned, raise an Azure support case through the portal or through Priority Community Support (PCS), https://learn.microsoft.com/en-us/azure/azure-portal/supportability/priority-community-support, and include the request IDs, timestamps, deployment name, model version, region, and exported metrics so that Microsoft can confirm the service-side root cause.
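As a rough illustration of the client-side part of that advice, the sketch below honors `Retry-After`, falls back to exponential backoff with jitter, and stops calling the service after repeated throttling. It is not tied to any SDK: `send` is a hypothetical callable that returns `(status, headers, body)` with a plain dict of headers, and the class name, thresholds, and the simple manual-reset breaker are all assumptions.

```python
import random
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are refused locally."""


class ThrottleAwareClient:
    def __init__(self, send, max_retries=5, base_delay=1.0, max_delay=60.0,
                 breaker_threshold=3, sleep=time.sleep):
        self.send = send                    # hypothetical transport callable
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.breaker_threshold = breaker_threshold
        self.sleep = sleep                  # injectable so tests need not wait
        self.exhausted_calls = 0            # consecutive calls that ran out of retries

    def _delay(self, attempt, headers):
        """Prefer the server's Retry-After; otherwise back off exponentially."""
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            try:
                return min(float(retry_after), self.max_delay)
            except ValueError:
                pass  # HTTP-date form is not handled in this sketch
        ceiling = min(self.base_delay * (2 ** attempt), self.max_delay)
        return random.uniform(0, ceiling)   # jitter avoids synchronized retries

    def call(self, payload):
        if self.exhausted_calls >= self.breaker_threshold:
            raise CircuitOpenError("too many throttled calls; call reset() to resume")
        for attempt in range(self.max_retries + 1):
            status, headers, body = self.send(payload)
            if status != 429:
                self.exhausted_calls = 0
                return status, body
            if attempt < self.max_retries:
                self.sleep(self._delay(attempt, headers))
        self.exhausted_calls += 1
        raise RuntimeError("still rate limited after %d attempts" % (self.max_retries + 1))

    def reset(self):
        """Close the breaker again, for example after an operator has checked the app."""
        self.exhausted_calls = 0
```

The bounded retry count matters as much as the backoff: an unbounded retry on 429 is exactly the kind of hidden loop that can multiply a handful of user requests into sustained traffic.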

Use the below official Microsoft resources for more reading and implementation steps:

I hope this is helpful. Please do not hesitate to let me know if you have any other questions or need further steps or clarification.

Please don’t forget to close the thread by upvoting and accepting the answer if any part of it is helpful.