Autoscaling delay causing downtime during Azure Load Testing – best approach?

Question

Chandan Shah (Persistent Systems Inc) 5 Microsoft External Staff

We are currently performing Azure Load Testing on a production-like architecture and are facing issues with reactive autoscaling delays.

Test Scenario:

Concurrent users: 2000–3000
Load pattern: Step-based ramp (increase every 2 minutes)
Application type: Blazor Server (SignalR-based)
Backend: Azure SQL Elastic Pool (vCore model)
Hosting: Azure App Service

Current Scaling Architecture:

Alert (CPU/threshold breach) → Action Group → Logic App → Automation Runbook → Scale up (App Service / SQL Elastic Pool)

Problem Observed:

During load testing, when traffic increases rapidly:

CPU/resources hit thresholds quickly
Autoscaling is triggered
HOWEVER, scaling takes several minutes (5–10 minutes)

During this scaling window:

Application becomes unresponsive
HTTP 503 (Service Unavailable) errors occur
SQL starts throttling / timing out

Key Issue:

Autoscaling is reactive and not fast enough to handle sudden load spikes. By the time scaling is completed, the application has already experienced downtime.

Questions:

Is this scaling delay (5–10 minutes) expected behavior for:
- Azure SQL Elastic Pool (vCore scaling)?
- Azure App Service scaling?
Is our current architecture (Alert → Logic App → Runbook) introducing additional delay compared to native Azure Autoscale?
What is the recommended approach to handle sudden traffic spikes without downtime?

Should we implement prescaling (manual or scheduled)?
Is predictive autoscaling available for this scenario?

Are there Azure-native features or best practices to reduce scaling latency or avoid cold-start impact?
For load testing scenarios with 2000–5000 concurrent users, what is the recommended scaling strategy to ensure zero downtime?

Our goal is to design a production-ready autoscaling approach where infrastructure is ready before traffic saturation occurs.

Any guidance or real-world best practices would be highly appreciated.

Please reach out to me at v-chanshah@microsoft.com

SAI JAGADEESH KUDIPUDI 3,550 Reputation points Microsoft External Staff Moderator

2026-06-24T20:10:33.8433333+00:00

Hi @Chandan Shah (Persistent Systems Inc),
Could you please share the requested details in a private message?

Answer recommended by moderator

Answer 1

Hi Chandan,

Yes, a 5–10 minute delay can be expected in this type of design. Autoscale should not be treated as an instant failover mechanism, especially when the load ramp is aggressive and the backend is Azure SQL Elastic Pool.

Your current flow:

Alert -> Action Group -> Logic App -> Automation Runbook -> Scale

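will usually add more delay than native autoscale, because each hop adds its own latency: the alert rule's evaluation interval, Action Group dispatch, the Logic App run, and runbook job queueing and start-up all come before the scale operation itself begins. For App Service, I would use Azure Monitor autoscale or App Service automatic scaling directly instead of routing the scale action through a Logic App/runbook. For Azure SQL Elastic Pool, scaling vCores is also not something I would rely on during a live traffic spike.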
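To make the effect of the delay concrete, here is a minimal Python sketch that replays a step ramp against a fixed scale delay and lists the minutes in which load exceeds capacity. Every number in it (500 users added every 2 minutes, 1500 users of capacity before scaling, 3500 after, a trigger at 1000 users) is a hypothetical assumption for illustration, not a measurement from your environment, and the names `RampStep` and `overloaded_minutes` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class RampStep:
    start_min: int  # minute at which this load level begins
    users: int      # concurrent users from that minute onward


def overloaded_minutes(steps, capacity_before, capacity_after,
                       trigger_users, scale_delay_min, horizon_min):
    """Return the minutes in which load exceeds the available capacity.

    Scaling is triggered in the first minute load reaches trigger_users;
    the extra capacity only arrives scale_delay_min minutes later.
    `steps` must be sorted by start_min.
    """
    def users_at(minute):
        current = 0
        for step in steps:
            if minute >= step.start_min:
                current = step.users
        return current

    triggered_at = None
    overloaded = []
    for minute in range(horizon_min):
        load = users_at(minute)
        if triggered_at is None and load >= trigger_users:
            triggered_at = minute
        scaled = (triggered_at is not None
                  and minute >= triggered_at + scale_delay_min)
        capacity = capacity_after if scaled else capacity_before
        if load > capacity:
            overloaded.append(minute)
    return overloaded


# Hypothetical ramp: +500 users every 2 minutes, up to 3000.
ramp = [RampStep(0, 500), RampStep(2, 1000), RampStep(4, 1500),
        RampStep(6, 2000), RampStep(8, 2500), RampStep(10, 3000)]

# 8-minute end-to-end scale delay: capacity arrives after saturation.
print(overloaded_minutes(ramp, 1500, 3500, 1000, 8, 14))  # [6, 7, 8, 9]
# 3-minute delay: capacity arrives before the ramp exceeds it.
print(overloaded_minutes(ramp, 1500, 3500, 1000, 3, 14))  # []
```

With these assumed numbers, the same trigger that causes four overloaded minutes at an 8-minute delay causes none at 3 minutes, which is why removing hops from the scale path and lowering the trigger threshold both help, and why pre-scaling removes the problem for known ramps.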

Recommended approach:

1. Pre-scale before the load test or known traffic window. Scale App Service instances and SQL Elastic Pool vCores before the test starts. Reactive scaling is usually too late once CPU saturation or SQL throttling has already started.

2. Use native autoscale for App Service. Configure minimum instance count, maximum instance count, and scale-out rules directly. Use lower thresholds or leading indicators such as requests, CPU, memory, response time, or queue length.

3. Keep SQL capacity ahead of demand. For Azure SQL Elastic Pool, pre-size the pool for expected peak load. Also review query tuning, indexes, connection pooling, retry logic, and max pool settings in the application.

4. For Blazor Server/SignalR, validate connection scaling. Blazor Server keeps active SignalR connections. Consider Azure SignalR Service and make sure ARR affinity/session behavior is understood during scale-out.

5. Use health checks and warm instances. Enable App Service Health Check and Always On. If using App Service automatic scaling, configure always-ready/prewarmed capacity where applicable.

6. Do not target “zero downtime” through autoscale alone. Keep headroom, pre-scale for planned load, and use retry logic, graceful degradation, caching, and back-pressure/rate limiting where appropriate.
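The retry logic mentioned in the list above is usually implemented as capped exponential backoff with jitter, so that clients hitting 503s or SQL throttling do not all retry at the same instant. Below is a minimal sketch of the pattern. It is written in Python purely for illustration (your application is .NET, where a resilience library would normally do this), and `TransientError`, `backoff_delays`, and `call_with_retry` are hypothetical names, as are the default base delay and cap.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. HTTP 503, SQL throttling)."""


def backoff_delays(max_attempts, base_s=0.5, cap_s=30.0, rng=random.random):
    """Yield one jittered delay per retry: rng() * min(cap_s, base_s * 2**attempt)."""
    for attempt in range(max_attempts - 1):
        yield rng() * min(cap_s, base_s * (2 ** attempt))


def call_with_retry(operation, max_attempts=5, sleep=time.sleep, **backoff_kwargs):
    """Run operation(), retrying on TransientError up to max_attempts calls in total."""
    delays = backoff_delays(max_attempts, **backoff_kwargs)
    while True:
        try:
            return operation()
        except TransientError:
            delay = next(delays, None)
            if delay is None:
                raise  # attempts exhausted: surface the last error
            sleep(delay)
```

The `sleep` and `rng` parameters exist only so the behavior can be tested deterministically. Retries smooth over short throttling windows; they do not replace capacity, so they complement pre-scaling rather than substitute for it.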

Predictive autoscale in Azure Monitor currently supports only Virtual Machine Scale Sets, so it is not an option for App Service or Azure SQL Elastic Pool in this scenario. For this workload, scheduled/pre-scaling plus native App Service autoscale is the safer production pattern.
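For the pre-scaling part, the sizing arithmetic can be sketched as follows. This is a rough Python illustration under stated assumptions: `users_per_instance` is a figure you would measure yourself from a baseline load test, the 30% headroom and 15-minute safety margin are arbitrary defaults, and both function names are hypothetical.

```python
import math
from datetime import datetime, timedelta


def instances_needed(peak_users, users_per_instance, headroom=0.3, min_instances=2):
    """Instances required so peak load uses at most (1 - headroom) of capacity."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_per_instance = users_per_instance * (1 - headroom)
    return max(min_instances, math.ceil(peak_users / usable_per_instance))


def prescale_at(traffic_start, scale_duration, safety_margin=timedelta(minutes=15)):
    """Latest time to start scaling so capacity is ready before traffic_start."""
    return traffic_start - scale_duration - safety_margin


# Assumed: one instance comfortably holds ~500 Blazor Server circuits.
print(instances_needed(peak_users=3000, users_per_instance=500))  # 9
# Assumed: scaling takes up to 10 minutes; traffic starts at 09:00.
print(prescale_at(datetime(2026, 7, 1, 9, 0), timedelta(minutes=10)))
```

The first result would feed the minimum instance count of the autoscale profile for the test window, and the second gives the time at which a scheduled profile or pre-scale job should begin. The same reasoning applies to the SQL Elastic Pool, using measured vCore utilization instead of users per instance.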