Intermittent "Can't reach database server" on Azure Cosmos DB for PostgreSQL (port 6432) from Azure Functions

Pavle Zikic 10

We're running an Azure Functions app (Node.js, timer-triggered) that uses Prisma to connect to an Azure Cosmos DB for PostgreSQL cluster. Intermittently, our function fails with a connection error indicating the database server is unreachable. The database appears to go offline briefly and then recovers on its own, but this has been happening more frequently lately.

Environment

Azure Functions (Node.js, timer trigger)
Prisma Client (@prisma/client)
Azure Cosmos DB for PostgreSQL (coordinator connection on port 6432 / connection pooling endpoint)
Region: [add your region]

Error (recurring)

Invalid prisma.queueJob.findMany() invocation

Can't reach database server at c-<cluster>.<id>.postgres.cosmos.azure.com:6432
Please make sure your database server is running at c-<cluster>.<id>.postgres.cosmos.azure.com:6432.

PrismaClientKnownRequestError
    at Mn.handleRequestError (@prisma/client/runtime/library.js)
    at Mn.request (@prisma/client/runtime/library.js)

Wrapped at the Functions host level as:

Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.queue_worker
 ---> RpcException: Result: Failure

What we've observed

The failures are transient — the same function succeeds on subsequent runs.
No deployment or config change correlates with the onset; frequency has simply increased over time.
The endpoint uses the pooled connection port 6432.
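
Because the failures are transient, one possible mitigation is to retry only the "can't reach server" error with capped, jittered backoff. The following is a minimal sketch, not a confirmed fix: `withRetry`, `isUnreachableError`, and the default delays are hypothetical names and values chosen for illustration. It assumes the error exposes Prisma's P1001 code (via `code` or `errorCode`) or the message text shown above.

```typescript
// Hypothetical helpers for illustration; not part of Prisma or the Functions SDK.
type RetryOptions = { attempts: number; baseDelayMs: number; maxDelayMs: number };

// Prisma documents P1001 as "Can't reach database server at host:port".
// The message check is a fallback in case no code is attached to the error.
function isUnreachableError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { code?: unknown; errorCode?: unknown; message?: unknown };
  if (e.code === "P1001" || e.errorCode === "P1001") return true;
  return typeof e.message === "string" && e.message.includes("Can't reach database server");
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `operation`, retrying only unreachable-server errors.
// Delay before retry n is random in [0, min(maxDelayMs, baseDelayMs * 2^(n-1))).
async function withRetry<T>(
  operation: () => Promise<T>,
  opts: RetryOptions = { attempts: 4, baseDelayMs: 200, maxDelayMs: 5000 }
): Promise<T> {
  if (opts.attempts < 1) throw new Error("attempts must be >= 1");
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      // Give up on the last attempt or on any error that is not a connectivity failure.
      if (attempt === opts.attempts || !isUnreachableError(err)) throw err;
      const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * Math.pow(2, attempt - 1));
      await sleep(Math.random() * cap);
    }
  }
  throw new Error("unreachable: loop always returns or throws");
}
```

In the function body this would wrap the failing call, for example `await withRetry(() => prisma.queueJob.findMany())`. Retrying hides short outages from the timer run but does not explain them, so it complements rather than replaces the availability investigation.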

Questions for the community / Microsoft

Are there known causes of brief coordinator-node unavailability on Azure Cosmos DB for PostgreSQL (e.g., maintenance windows, failovers, automatic scaling, node restarts) that would produce short "can't reach server" windows on port 6432?
Is port 6432 (managed PgBouncer/pooler) more susceptible to these drops than the direct 5432 port, and is one recommended over the other for serverless/Functions workloads?
What is the recommended way to diagnose whether these are node restarts/failovers vs. client-side connection pool exhaustion? Which metrics/logs should we check (e.g., in Azure Monitor / cluster metrics)?
What is the best-practice guidance for resilient connections from Azure Functions + Prisma (connection limits, timeouts, retry strategy) against this service?
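
To make the last question concrete, here is a sketch of where connection limits and timeouts would be set on the Prisma side. `connection_limit`, `connect_timeout`, `pool_timeout`, `pgbouncer`, and `sslmode` are arguments Prisma accepts in a PostgreSQL connection URL. The helper name `buildPooledUrl`, the example host, the credentials, and every numeric value are assumptions for illustration, not recommendations confirmed for this service. Whether `pgbouncer=true` is needed depends on the pooler mode and the Prisma version, which is also an assumption here.

```typescript
// Hypothetical helper for illustration; only the query parameter names come from Prisma.
interface PoolSettings {
  connectionLimit: number;   // max connections held by one PrismaClient instance
  connectTimeoutSec: number; // seconds to wait while opening a new connection
  poolTimeoutSec: number;    // seconds a query waits for a free pooled connection
}

// Appends pooling and timeout arguments to a base PostgreSQL URL.
function buildPooledUrl(baseUrl: string, settings: PoolSettings): string {
  const url = new URL(baseUrl);
  url.searchParams.set("sslmode", "require");
  // Assumption: the 6432 endpoint is PgBouncer in transaction mode.
  url.searchParams.set("pgbouncer", "true");
  url.searchParams.set("connection_limit", String(settings.connectionLimit));
  url.searchParams.set("connect_timeout", String(settings.connectTimeoutSec));
  url.searchParams.set("pool_timeout", String(settings.poolTimeoutSec));
  return url.toString();
}

// Example values only: a small pool per function instance and longer timeouts
// than Prisma's defaults, to ride out brief unavailability.
const exampleUrl = buildPooledUrl(
  "postgresql://citus:secret@c-example.abc123.postgres.cosmos.azure.com:6432/citus",
  { connectionLimit: 1, connectTimeoutSec: 15, poolTimeoutSec: 15 }
);
```

The resulting string would be supplied as `DATABASE_URL` (or whatever variable the Prisma datasource reads). Keeping one `PrismaClient` at module scope, rather than creating one per invocation, keeps the number of connections per function instance bounded by `connection_limit`.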

Any pointers appreciated. We also plan to open a support ticket with Azure for the underlying availability investigation.
