Playbook · Azure AI Foundry8 min read

Azure AI Foundry Pricing Guide 2026: What enterprise AI actually costs

Azure AI Foundry itself is free — you pay for the underlying services it orchestrates. Understanding the actual cost of a production AI agent deployment requires breaking down five distinct cost categories, each with its own pricing model and optimisation levers. This guide gives you real numbers and a framework to build an honest budget.

Written by Kovil AI engineers · Updated May 2026

The 5 cost categories

Azure AI Foundry sits above a set of Azure services you provision and pay for directly. Every production deployment spans some or all of these five categories. None are optional for a serious enterprise deployment — they each serve a distinct function.

01

Azure OpenAI — token-based compute

The largest variable cost for most deployments. GPT-4o is priced at approximately $2.50 per million input tokens and $10.00 per million output tokens (pay-as-you-go, East US region, 2026 pricing). GPT-4o-mini runs at roughly $0.15/$0.60 per million tokens — 94% cheaper on input. For a customer service agent handling 10,000 queries per month at an average of 800 input tokens and 300 output tokens per exchange, GPT-4o costs approximately $200/month in model inference alone. Provisioned throughput units (PTUs) offer 30–50% savings at committed volume.

02

Azure AI Search — retrieval infrastructure

Required for any RAG-based agent. Standard tier (S1) starts at approximately $245/month per search unit and is the minimum for production workloads — Free tier is single-replica, single-partition, and not suitable for enterprise use. A typical enterprise knowledge base (100k–500k documents, 1536-dimensional vectors) requires S1 or S2 depending on index size. Storage for vector indexes is priced additionally at roughly $0.025 per GB per month. Budget $245–$980/month for AI Search depending on document volume and query throughput requirements.

03

Azure Machine Learning compute — fine-tuning and batch

Only relevant if you are fine-tuning models or running batch inference jobs. Fine-tuning GPT-4o requires a quota request and is priced per training token ($0.008/1k tokens) plus inference cost. For deployments using only RAG (the majority), Azure ML compute cost is minimal — limited to any scheduled evaluation or batch jobs you run. Most production agents that don't fine-tune spend less than $50/month on ML compute for evaluation pipelines.

04

Microsoft Copilot Studio — if you use low-code orchestration

Copilot Studio is licensed at $200 per month for 2,000 message sessions, with each additional 1,000 sessions costing approximately $100. A 'session' is up to 60 minutes of interaction with a single user. For Teams-embedded agents with moderate volume (under 2,000 sessions/month), this is the most cost-effective orchestration layer. For high-volume deployments, Semantic Kernel orchestration via Azure Functions is typically cheaper because you pay for compute rather than sessions.

05

Azure infrastructure — the sum of the small parts

Storage (Azure Blob for documents, $0.018/GB/month for hot tier), networking (egress from Azure is $0.087/GB after 5GB/month), Azure API Management (Developer tier free, Standard from $175/month), Azure Monitor and Application Insights ($2.30/GB ingested), Azure Key Vault ($0.03 per 10,000 operations), and Azure Container Apps or Functions for hosting your agent code ($0.000016 per vCPU-second). These individually look small but aggregate to $100–$500/month for a typical medium deployment.

Typical monthly costs by deployment type

These ranges are based on deployments we have built and operated. They assume pay-as-you-go pricing on GPT-4o, Standard tier AI Search, and no fine-tuning. Committed-use agreements and PTUs can reduce the AI compute component by 30–50% at scale.

Deployment typeConfigurationMonthly cost range
Small pilot1 agent, GPT-4o-mini or GPT-4o, low query volume (<5k/mo), AI Search S1, no fine-tuning$500 – $1,500
Medium deployment3–5 agents, GPT-4o, medium volume (5k–50k queries/mo), AI Search S1–S2, Copilot Studio or Semantic Kernel$2,000 – $8,000
Large rollout10+ agents, GPT-4o with PTUs, high volume (50k–500k queries/mo), AI Search S2–S3, API Management Standard$10,000 – $50,000
Enterprise scaleMulti-region, dedicated PTU capacity, multiple AI Search indexes, full observability stack$50,000+

Cost modelling tip

The biggest swing factor is query volume and average context length. A 10x increase in query volume does not produce a 10x cost increase — most of the infrastructure cost (AI Search, API Management, monitoring) is fixed. Token costs scale linearly with volume, but that's only one of five cost categories. Run your cost model at 3x your expected volume to stress-test the business case.

Token cost optimisation

Token costs are the most optimisable expense in your AI deployment. We have seen 40–70% token cost reductions on production deployments through systematic application of these strategies — without any degradation in output quality.

Model selection: right-size for the task

GPT-4o-mini handles the majority of factual Q&A, summarisation, classification, and structured extraction tasks at 94% lower cost than GPT-4o. Reserve GPT-4o for complex reasoning, nuanced judgement, or multi-step planning tasks. A production deployment that uses GPT-4o-mini for 80% of queries and GPT-4o for 20% of complex queries spends roughly 55% less on model inference than an all-GPT-4o deployment.

Prompt caching: pay once, retrieve cheaply

Azure OpenAI supports prompt caching for system prompts and static context. If your system prompt is 2,000 tokens and you handle 10,000 queries per month, prompt caching eliminates 20 million input tokens per month — approximately $50/month saved on GPT-4o at current pricing. Caching is enabled by default for compatible deployments; the key is structuring your prompts so the static prefix (system prompt, RAG context) comes first and the dynamic user input comes last.

Context window management

Conversation history is the silent budget killer. A naive implementation that appends every message to context produces an exponentially growing context window. Implement conversation summarisation at 4–6 turns: replace the full history with a compressed summary, preserving key facts and decisions. Semantic Kernel&apos;s KernelFunction-based memory compression handles this automatically when configured correctly.

RAG chunk sizing and retrieval precision

Returning 10 irrelevant 500-token chunks from Azure AI Search costs more than returning 3 highly relevant 300-token chunks. Use hybrid search (vector + BM25) with reciprocal rank fusion to improve retrieval precision. Set your top-K parameter (k=3 or k=5) based on empirical testing of your dataset, not defaults. Every unnecessary token in retrieved context is a direct cost.

Batching and async processing

For non-real-time workloads (document summarisation, batch classification, scheduled report generation), use the Azure OpenAI Batch API. Batch processing is priced at 50% of the standard token rate. A nightly document processing pipeline that costs $200/month in real-time API calls costs $100/month via the Batch API — no code changes beyond async submission.

Hidden costs to budget for

Every AI deployment budget we have reviewed underestimates these items. They are not surprising in retrospect — but they rarely appear in initial cost models because they are not line items in the Azure pricing calculator.

!

Data ingestion pipeline

One-time and ongoing cost to extract, chunk, enrich, and index your documents into Azure AI Search. For 100k documents, initial ingestion engineering is typically 2–4 weeks of developer time. Ongoing indexing of new documents requires an automated pipeline (Azure Functions or Logic App) — budget $50–$200/month for compute.

!

Index maintenance and re-indexing

Azure AI Search indexes degrade as your document corpus evolves. Plan for quarterly re-indexing runs as document chunking strategies and embedding models improve. Each re-index of a large corpus consumes significant Azure AI Search query and indexing units — budget 10–20% of your monthly AI Search cost for index maintenance operations.

!

Azure AI Content Safety filtering

Content Safety is priced per 1,000 text records analyzed ($1–$2.50/1k depending on tier and features). For a deployment handling 50k queries per month with both input and output filtering, budget $100–$250/month specifically for content safety.

!

Monitoring, logging, and alerting

Application Insights and Azure Monitor data ingestion adds up quickly. A production agent generating detailed traces (recommended for quality evaluation) generates 1–5 GB of telemetry per 10k queries. At $2.30/GB ingested, a high-volume deployment can easily spend $200–$500/month on observability alone.

!

Developer and ML engineer time

The most underestimated cost of all. Prompt engineering, evaluation dataset curation, model evaluation runs, and ongoing quality monitoring require sustained engineering effort. A production AI agent typically requires 0.25–0.5 FTE of ongoing maintenance per agent — prompt updates, retrieval quality improvements, evaluation refreshes, incident response. This cost is real even though it does not appear on your Azure bill.

ROI calculation framework

A sound business case does not start from the technology cost — it starts from the current cost of the process you are automating. Here is the framework we use with every client before a build begins.

The ROI formula

Annual Benefit = (Hours saved × blended FTE rate)

+ (Error reduction × cost per error)

+ (Throughput gain × revenue per unit)

Annual Net Benefit = Annual Benefit

− (Infrastructure cost)

− (Maintenance FTE cost)

− (Amortised implementation cost)

The key to a credible business case is being conservative on automation rate. A process that takes 30 minutes of human time cannot save 30 minutes when automated — humans will spend 5–10 minutes reviewing AI output, handling escalations, and managing exceptions. A realistic automation saving is 60–75% of the original process time, not 100%.

Conservative

3–4x

3-year ROI

Small team, moderate volume, careful phased rollout

Typical

4–6x

3-year ROI

Mid-size deployment, clear process scope, good baseline data

Best case

8–12x

3-year ROI

High-volume repetitive process, strong data quality, quick scale-up

Key takeaways

  • Azure AI Foundry is free — costs come from the underlying services: Azure OpenAI, AI Search, and infrastructure.
  • Small pilots typically cost $500–$1,500/month. Medium enterprise deployments run $2,000–$8,000/month.
  • Token costs are the most optimisable expense — model selection and prompt caching alone can cut inference costs by 50–70%.
  • Budget explicitly for hidden costs: data ingestion, index maintenance, content safety filtering, and ongoing developer time.
  • Build your ROI case from current process cost, not from technology cost. Use conservative automation rates (60–75%, not 100%).
  • Provisioned throughput units (PTUs) become cost-effective above roughly 50,000 queries per month.

Free cost estimate

Want a real cost model for your deployment?

We build accurate cost models based on your actual query volumes, document corpus, and process scope — not generic estimates. No obligation.