Edit

Claude models in Microsoft Foundry

Anthropic's Claude models bring advanced conversational AI capabilities to Microsoft Foundry, providing state-of-the-art language understanding and generation for intelligent applications. Claude models excel at complex reasoning, code generation, and multimodal tasks including image analysis. This article describes the available Claude models, how they're hosted and billed, supported APIs, capabilities, quotas, and best practices.

To deploy and call a Claude model, see Deploy and use Claude models in Microsoft Foundry.

Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

How Claude models are hosted and billed

Microsoft Foundry offers Claude models in two versions: Hosted on Azure and Hosted on Anthropic infrastructure. Not every model is available in both versions. A model's lifecycle stage, such as Preview or Generally available, can differ between the two versions. For per-model availability and lifecycle status, see Available Claude models.

Note

You access Claude models in Microsoft Foundry through Foundry Models from partners and community. Models from partners and community that Anthropic sells and operates are Non-Microsoft Products under the Product Terms. Claude models in Foundry require an Azure Marketplace subscription. Ensure that you have the permissions required to subscribe to model offerings before you deploy.

Claude models that are Hosted on Azure run on Azure infrastructure end-to-end and are Generally available (GA).

Claude models that are Hosted on Anthropic infrastructure run on Anthropic's infrastructure (outside of Azure).

To learn how data is processed when you use Claude models in Foundry, see Data, privacy, and security for Claude models in Microsoft Foundry.

To learn how Claude consumption units (CCU) bill Claude models in Microsoft Foundry through Azure Marketplace, see Claude Consumption Units (CCU) billing in Microsoft Foundry.

Available Claude models

The following table compares model availability for both versions of Claude models in Foundry. For details on the features referenced in the table, see the Capabilities section.

Warning

1M context beta on Claude Sonnet 4.5 was retired on April 30, 2026.

Starting May 1, 2026:

  • Requests greater than 200K tokens that include the context-1m-2025-08-07 beta header on Sonnet 4.5 return an error.
  • Requests 200K tokens or fewer remain unaffected, even with the header present.

To migrate, remove the context-1m-2025-08-07 beta header from your requests. For workloads that require 1M context, migrate to Claude Sonnet 4.6 (where 1M context is generally available) or to Claude Opus 4.6 or Claude Opus 4.7 for higher-intelligence workloads.

Model Availability Context window / Max output Key capabilities Best for
claude-mythos-51 Hosted on Anthropic: Gated research preview 1M / 128K
  • Adaptive thinking
  • Image and text input
  • Microsoft Entra ID authentication only
  • Biology and life sciences
  • Cybersecurity (defensive use cases prioritized): vulnerability discovery, attack-surface auditing, red teaming, threat intelligence
  • Autonomous coding
  • Long-running agents
claude-fable-5 Hosted on Anthropic: Preview 1M / 128K
  • Adaptive thinking
  • Reasoning over entire codebases and multi-day project context
  • Longer independent work than any prior Claude model
  • Self verification
  • Sub-agent orchestration
  • Refusal stop_reason on dual-use safeguard policies2
  • Cybersecurity
  • Autonomous coding
  • Long-running agents
  • Coding and agents, with deeper reasoning for enterprise workflows
claude-mythos-preview1 Hosted on Anthropic: Gated research preview 1M / 128K
  • Adaptive thinking
  • Image and text input
  • Microsoft Entra ID authentication only
  • Cybersecurity (defensive use cases prioritized)
  • Autonomous coding
  • Long-running agents
claude-opus-4-83 Hosted on Azure: GA 1M / 128K
  • Adaptive thinking with xhigh effort level
  • Reasoning over entire codebases and multi-day project context
  • High-resolution image input (up to 2576px / 3.75MP)
  • Coding
  • Long-running agents
  • Financial analysis
  • Cybersecurity
  • Computer use
claude-opus-4-83 Hosted on Anthropic: GA 1M / 128K
  • Adaptive thinking with xhigh effort level
  • Reasoning over entire codebases and multi-day project context
  • High-resolution image input (up to 2576px / 3.75MP)
  • Coding
  • Long-running agents
  • Financial analysis
  • Cybersecurity
  • Computer use
claude-opus-4-7 Hosted on Anthropic: GA 1M / 128K
  • Adaptive thinking
  • Reasoning over entire codebases
  • High-resolution image input (up to 2576px / 3.75MP)
  • Coding
  • Enterprise workflows
  • Long-running agents
  • Multimodal reasoning
  • Financial analysis
  • Cybersecurity
claude-opus-4-6 Hosted on Anthropic: GA 1M / 128K
  • Adaptive thinking
  • Image and text input
  • Computer use
  • Advanced tool use (search, programmatic calling, examples)
  • Coding
  • Enterprise agents
claude-opus-4-5 Hosted on Anthropic: GA 200K / 64K
  • Extended thinking
  • Image and text input
  • Computer use
  • Advanced tool use (search, programmatic calling, examples)
  • Coding
  • Agents
  • Computer use
  • Enterprise workflows
claude-opus-4-1 Hosted on Anthropic: GA 200K / 32K
  • Extended thinking
  • Image and text input
  • Coding
  • Long-running tasks
claude-sonnet-5 Hosted on Azure: GA 1M / 128K
  • Adaptive thinking
  • xhigh effort level
  • Reasoning over entire codebases and multi-day project context
  • High-res image input (up to 2576px / 3.75MP) are on by default
  • Mid-conversation4 role:"system"
  • Token budgets4 (task_budget)
  • Coding
  • Long-running agents
  • Financial analysis
  • Cybersecurity
  • Computer use
claude-sonnet-5 Hosted on Anthropic: GA 1M / 128K
  • Adaptive thinking
  • xhigh effort level
  • Reasoning over entire codebases and multi-day project context
  • High-res image input (up to 2576px / 3.75MP) are on by default
  • Mid-conversation4 role:"system"
  • Token budgets4 (task_budget)
  • Coding
  • Long-running agents
  • Financial analysis
  • Cybersecurity
  • Computer use
claude-sonnet-4-6 Hosted on Anthropic: GA 1M / 128K
  • Adaptive thinking
  • Image and text input
  • Computer use
  • Advanced tool use (search, programmatic calling, examples)
  • Coding
  • Agents
  • Enterprise workflows
claude-sonnet-4-5 Hosted on Anthropic: GA 200K / 64K
  • Extended thinking
  • Image and text input
  • Computer use
  • Agents and complex, long-horizon tasks
  • High-volume workloads
claude-haiku-4-5 Hosted on Azure: GA 200K / 64K
  • Extended thinking
  • Image and text input
  • Coding
  • Agents
claude-haiku-4-5 Hosted on Anthropic: GA 200K / 64K
  • Extended thinking
  • Image and text input
  • Coding
  • Agents

1 Claude Mythos 5 and Claude Mythos Preview are only available as gated research preview. Access to the models is granted solely at Anthropic's discretion and prioritized for defensive cybersecurity use cases. See the Claude Mythos Preview system card and Claude Mythos 5 system card for responsible use guidance.

2 Claude Fable 5 applies additional input/output classifiers that may refuse requests whose content triggers dual-use safeguard policies. When a refusal occurs, the request returns a successful (200) response with a refusal indicator stop_reason: "refusal" instead of model-generated content. You're not billed for input tokens that are refused.

3 Follow the Migration guide to migrate Messages API code from Claude Opus 4.7 to Claude Opus 4.8.

4 Mid-conversation and Token budgets are currently in Beta state.

API overview

The following table lists the APIs that you can use to interact with both the Hosted on Azure and Hosted on Anthropic infrastructure versions of Claude models in Foundry.

Use the Anthropic SDKs and the following Claude APIs:

Tip

The Hosted on Anthropic infrastructure version of Claude models in Foundry supports more APIs than the ones listed in this table. You can see them on the Claude API docs: API overview page.

API Description
Messages1 (POST /v1/messages) Core Messages API: Send a structured list of input messages with text or image content, including streaming responses. The model generates the next message in the conversation.
Token counting (POST /v1/messages/count_tokens) Token Count API: Count the number of tokens in a message before sending it to Claude.

1You can call the Messages API from the anthropic Python package, the @anthropic-ai/foundry-sdk JavaScript package, or directly through REST. The deployment endpoint follows the shape https://<resource-name>.services.ai.azure.com/anthropic/v1/messages, and REST and JavaScript clients use the anthropic-version: 2023-06-01 header.

Capabilities

Claude models in Foundry expose core capabilities for processing, analyzing, and generating content, and tools that let Claude interact with external systems, execute code, and perform automated tasks.

The following table summarizes capabilities available for both the Hosted on Azure and Hosted on Anthropic infrastructure versions of Claude models in Foundry, including core capabilities and tools.

Tip

The Hosted on Anthropic infrastructure version of Claude models in Foundry supports more capabilities than the ones listed in this table. You can see them on the Claude API docs: Features overview page.

Feature Description
Streaming responses Server-sent event streaming.
Fine-grained tool streaming Stream tool use parameters without buffering or JSON validation, reducing latency for large parameters. Requires the anthropic-beta header fine-grained-tool-streaming-2025-05-14.
Prompt caching Cache context to reduce cost and latency.
Tool use with client-executed tools Custom tools plus Anthropic-defined bash, text editor, computer use, and memory.
Context editing Automatically manage conversation context with configurable strategies, including clearing tool results and managing thinking blocks. Requires the anthropic-beta header context-management-2025-06-27.
Extended thinking Step-by-step reasoning for complex tasks.
Effort Control how many tokens Claude uses when responding, trading off between response thoroughness and token efficiency.
Citations Ground Claude's responses in sources, including search_result content blocks.
Image support Process and analyze content from images (provided as base64 or URL).
PDF support Process and analyze text and visual content from PDF documents. Provided as base64 or URL.
1M context window Up to 1M tokens for processing large documents, extensive codebases, and long conversations. Support is subject to model eligibility.

Model-specific parameter values

Extended thinking

The Extended thinking feature allows specific values for the thinking parameter type, depending on the model, as described in the following table. The adaptive type allows the model to decide whether to think, based on query complexity and effort level.

Model adaptive enabled disabled
claude-mythos-5 Yes No No
claude-fable-5 Yes No No
claude-mythos-preview Yes Yes No
claude-opus-4-8 Yes No Yes
claude-opus-4-7 Yes No Yes
claude-opus-4-6 Yes Yes Yes
claude-sonnet-5 Yes No Yes
claude-sonnet-4-6 Yes Yes Yes

Effort

The Effort feature allows specific effort levels for each model, as described in the following table. The xhigh level produces the same result as max.

Model low medium high max xhigh
claude-mythos-5 Yes Yes Yes No Yes
claude-fable-5 Yes Yes Yes No Yes
claude-opus-4-8 Yes Yes Yes Yes Yes
claude-opus-4-7 Yes Yes Yes Yes Yes
claude-opus-4-6 Yes Yes Yes Yes No
claude-sonnet-5 Yes Yes Yes Yes Yes
claude-sonnet-4-6 Yes Yes Yes Yes No

Agent support

Quotas, rate limits, and regions

Use this section to understand where you can deploy Claude models, how quota is shared, and what rate limits apply to your deployments.

Deployment types

Claude models in Foundry are available for the following deployment types:

  • Global Standard: All Claude models (Hosted on Azure and Hosted on Anthropic infrastructure) are available in East US2 and Sweden Central.
  • Data Zone Standard (US): claude-sonnet-5, claude-opus-4-8, claude-haiku-4-5.

For more information on the different deployment types, see Deployment types for Microsoft Foundry Models.

Quotas and rate limits

Subscription-level management handles the deployment quota. Resources and regions share the quota instead of allocating it separately for each resource or region.

  • All Global Standard deployments of the same model and version in a subscription draw from one shared quota pool across all regions.
  • All Data Zone Standard deployments of the same model and version in a subscription draw from a shared quota pool within each data zone (for example, US).

For more information about quota management for Foundry Models, see Microsoft Foundry Models quotas and limits.

Claude models in Foundry measure rate limits in Requests Per Minute (RPM) and uncached input Tokens Per Minute (ITPM).

What counts towards ITPM?

  • Input TPM — tokens in the request after the last cache breakpoint (uncached input).
  • Cache write 5m TPM — tokens being written to the 5-minute prompt cache.
  • Cache write 1h TPM — tokens being written to the 1-hour prompt cache.

What doesn't count towards ITPM?

  • Output tokens (including tokens read from cache) don't count towards uiTPM.

For more information about rate limits and cache, see Claude API Docs: Rate limits.

Rate limits by subscription type

Your Azure subscription type determines your rate limits. The Version 2: Hosted on Azure and Version 1: Hosted on Anthropic infrastructure columns indicate whether quota is available for that model and deployment type combination. Yes means quota is available. N/A means the model and version combination don't have quota for that deployment type.

As listed in the following table, to increase your quota beyond the default limits, submit a request through the quota increase request form.

Pay-as-you-go

Model Version 2: Hosted on Azure Version 1: Hosted on Anthropic infrastructure Deployment type RPM ITPM
claude-fable-5 N/A Yes Global Standard 0 0
claude-opus-4-8 Yes Yes Global Standard 40 40,000
claude-opus-4-8 Yes N/A Data Zone Standard (US) 40 40,000
claude-opus-4-7 N/A Yes Global Standard 40 40,000
claude-opus-4-6 N/A Yes Global Standard 40 40,000
claude-opus-4-5 N/A Yes Global Standard 40 40,000
claude-opus-4-1 N/A Yes Global Standard 40 40,000
claude-sonnet-5 Yes Yes Global Standard 40 40,000
claude-sonnet-5 Yes N/A Data Zone Standard (US) 40 40,000
claude-sonnet-4-6 N/A Yes Global Standard 80 80,000
claude-sonnet-4-5 N/A Yes Global Standard 80 80,000
claude-haiku-4-5 Yes Yes Global Standard 80 80,000

Responsible AI considerations

When using Claude models in Foundry, consider these responsible AI practices:

Best practices

Follow these best practices when working with Claude models in Foundry:

Prompt engineering

  • Clear instructions: Provide specific and detailed prompts.
  • Context management: Use the available context window effectively.
  • Role definitions: Use system messages to define the assistant's role and behavior.
  • Structured prompts: Use consistent formatting for better results.

Cost optimization

To optimize your usage and avoid rate limiting:

  • Implement retry logic: Handle 429 responses with exponential backoff.
  • Batch requests: Combine multiple prompts when possible.
  • Monitor token usage: Track your token consumption and request patterns.
  • Use appropriate models: Use the most cost-effective model for your use case. See Available Claude models.