Hello Stavros Koureas,
Greetings! Thanks for raising this question in the Q&A forum.
You have correctly identified a real gap in the llm-token-limit policy. It enforces a raw token count quota but has no awareness of model-specific pricing, so it cannot translate consumption into actual cost or apply separate input/output token budgets. There is no native cost-budget policy in APIM today, but there is a practical way to get close using the existing policy set without building a fully external app.
Here is the approach that covers both of your gaps:
- Separate input and output token tracking using the variables already exposed by llm-token-limit. The tokens-consumed-variable-name attribute captures total tokens after the response returns, and you can read the usage.prompt_tokens and usage.completion_tokens fields from the response body in an outbound policy to split them. Add this in your outbound section:
<!-- preserveContent: true keeps the response body available to the client after the policy has parsed it -->
<set-variable name="promptTokens" value="@(context.Response.Body.As<JObject>(preserveContent: true)["usage"]["prompt_tokens"].Value<int>())" />
<set-variable name="completionTokens" value="@(context.Response.Body.As<JObject>(preserveContent: true)["usage"]["completion_tokens"].Value<int>())" />
- Compute an approximate cost inline using a policy expression. Once you have prompt and completion tokens separated, multiply by the per-token rate for your deployed model. For example, for gpt-4o at assumed rates of $2.50 per million input tokens and $10.00 per million output tokens:
<set-variable name="estimatedCost" value="@{
    int prompt = (int)context.Variables["promptTokens"];
    int completion = (int)context.Variables["completionTokens"];
    double cost = (prompt / 1000000.0 * 2.50) + (completion / 1000000.0 * 10.00);
    return cost.ToString("F6");
}" />
You would update the rates in this expression whenever Microsoft changes model pricing. Yes, this requires a policy update when prices change, but it is far simpler than maintaining a full external pricing service.
- Emit the cost and split token counts to Application Insights using llm-emit-token-metric with custom dimensions. The llm-emit-token-metric policy sends custom metrics to Application Insights about LLM token consumption and in preview now includes cached, reasoning, and thinking token categories in addition to prompt and completion tokens. Add it like this:
<llm-emit-token-metric namespace="CostTracking">
    <!-- Use the subscription Id, not the Key: the key is a secret and should not be written to telemetry -->
    <dimension name="SubscriptionId" value="@(context.Subscription?.Id)" />
    <dimension name="PromptTokens" value="@(context.Variables["promptTokens"].ToString())" />
    <dimension name="CompletionTokens" value="@(context.Variables["completionTokens"].ToString())" />
    <dimension name="EstimatedCostUSD" value="@(context.Variables["estimatedCost"].ToString())" />
</llm-emit-token-metric>
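Note that Azure Monitor currently limits custom metrics to 10 dimension keys per metric and 50,000 total active time series per region in a subscription within a 12-hour period, so plan your dimensions carefully. Every distinct dimension value creates a new time series, so per-request numbers such as the token counts and cost above use up that quota quickly; consider rounding or bucketing them.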
- Build a cost budget enforcement gate using a named value or external cache. Store a monthly cost budget per subscription key in APIM's named values or in an Azure Cache for Redis entry. In your inbound policy, read the accumulated cost for the current subscription and return a 403 if the budget is exceeded:
<cache-lookup-value key="@($"cost:{context.Subscription?.Key}:{DateTime.UtcNow:yyyy-MM}")" variable-name="accumulatedCost" />
<choose>
    <when condition="@((double)context.Variables.GetValueOrDefault("accumulatedCost", 0.0) >= 50.0)">
        <return-response>
            <set-status code="403" reason="Monthly cost budget exceeded" />
        </return-response>
    </when>
</choose>
Then in the outbound policy, increment and store the updated cost back to cache after each successful call.
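As a rough sketch of that outbound step, something like the following could work. It is untested and rests on a few assumptions: estimatedCost has already been set earlier in the outbound section by the expression shown above, accumulatedCost is the variable filled by cache-lookup-value in the inbound section, and the 35-day duration (3,024,000 seconds) is an arbitrary choice so that the entry outlives the month it belongs to.

```xml
<!-- Outbound: place after the set-variable that computes estimatedCost -->
<choose>
    <!-- Only charge the budget for successful calls -->
    <when condition="@(context.Response.StatusCode == 200)">
        <!-- Same key format as the inbound cache-lookup-value, so both read and write the same entry -->
        <!-- estimatedCost was stored as a string, so it is parsed back to a double before adding -->
        <cache-store-value
            key="@($"cost:{context.Subscription?.Key}:{DateTime.UtcNow:yyyy-MM}")"
            value="@((double)context.Variables.GetValueOrDefault("accumulatedCost", 0.0) + double.Parse((string)context.Variables["estimatedCost"]))"
            duration="3024000" />
    </when>
</choose>
```

The lookup and the store are not atomic, so concurrent requests on the same subscription can overwrite each other's increments. Treat the enforced budget as approximate and rely on the Application Insights metric for the accurate total.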
- For the Developer Portal quota visibility you mentioned, a cost view is not natively supported today. APIM does emit token metrics via the llm-emit-token-metric policy, and you can add custom dimensions to filter the metric in Azure Monitor. That lets you build an Azure Workbook or Application Insights dashboard showing per-subscription token consumption and estimated cost, then share the link with your API consumers through the Developer Portal's custom content pages.
At Build 2026, Microsoft expanded token metrics to track reasoning, cached, and audio tokens across providers, which helps FinOps teams building cost dashboards and budget alerts capture how current models actually behave.
If this answer helps, kindly accept it so that others with similar questions can find it.
Best Regards,
Jerald Felix.