Is Azure AI Foundry free?

Azure AI Foundry itself is free — there is no platform fee. You pay for the underlying services it orchestrates: Azure OpenAI (per token), Azure AI Search (per month by tier), Azure Machine Learning compute (per vCPU-hour if used), and Azure infrastructure (storage, networking, API Management). A small pilot using GPT-4o-mini with Azure AI Search S1 typically costs $500–$1,500/month in total.

What is the cheapest production-ready Azure AI Foundry setup?

The minimum production configuration is: Azure OpenAI with GPT-4o-mini (approximately $0.15/$0.60 per million input/output tokens), Azure AI Search Standard S1 ($245/month), Azure Container Apps or Functions for hosting (typically $20–$80/month), and Azure Monitor for basic observability ($30–$50/month). Total: approximately $400–$600/month before token consumption. This covers a single agent with low-to-medium query volume.

How do I estimate my Azure OpenAI token costs?

The formula is: Monthly token cost = (Monthly query volume × average input tokens per query × input price per million) + (Monthly query volume × average output tokens per query × output price per million). For GPT-4o at $2.50/$10.00 per million tokens: 10,000 queries × 800 average input tokens × $2.50/1M = $20/month in input costs. 10,000 queries × 300 average output tokens × $10.00/1M = $30/month in output costs. Total: $50/month for 10,000 queries. Add RAG retrieved context (typically 1,000–3,000 additional input tokens per query) for a more accurate estimate.

What is the ROI timeline for Azure AI Foundry?

For most mid-size deployments, the ROI break-even point occurs in months 4–10. Month 1–2 are the pilot and measurement baseline period. Month 3–6 typically see first ROI positive, with accumulated net benefit exceeding accumulated cost. Months 6–18 show compounding returns as scope expands. 3-year ROI on mid-size deployments typically ranges from 3–8x on conservative estimates, and higher when multiple ROI levers (labour reduction, error elimination, throughput increase) are activated simultaneously.

Playbook · Azure AI Foundry8 min read

Azure AI Foundry Pricing Guide 2026: What enterprise AI actually costs

Azure AI Foundry itself is free — you pay for the underlying services it orchestrates. Understanding the actual cost of a production AI agent deployment requires breaking down five distinct cost categories, each with its own pricing model and optimisation levers. This guide gives you real numbers and a framework to build an honest budget.

Written by Kovil AI engineers · Updated May 2026

In this guide

01The 5 cost categories 02Monthly costs by deployment type 03Token cost optimisation 04Hidden costs to budget for 05ROI calculation framework

The 5 cost categories

Azure AI Foundry sits above a set of Azure services you provision and pay for directly. Every production deployment spans some or all of these five categories. None are optional for a serious enterprise deployment — they each serve a distinct function.

Azure OpenAI — token-based compute

The largest variable cost for most deployments. GPT-4o is priced at approximately $2.50 per million input tokens and $10.00 per million output tokens (pay-as-you-go, East US region, 2026 pricing). GPT-4o-mini runs at roughly $0.15/$0.60 per million tokens — 94% cheaper on input. For a customer service agent handling 10,000 queries per month at an average of 800 input tokens and 300 output tokens per exchange, GPT-4o costs approximately $200/month in model inference alone. Provisioned throughput units (PTUs) offer 30–50% savings at committed volume.

Azure AI Search — retrieval infrastructure

Required for any RAG-based agent. Standard tier (S1) starts at approximately $245/month per search unit and is the minimum for production workloads — Free tier is single-replica, single-partition, and not suitable for enterprise use. A typical enterprise knowledge base (100k–500k documents, 1536-dimensional vectors) requires S1 or S2 depending on index size. Storage for vector indexes is priced additionally at roughly $0.025 per GB per month. Budget $245–$980/month for AI Search depending on document volume and query throughput requirements.

Azure Machine Learning compute — fine-tuning and batch

Only relevant if you are fine-tuning models or running batch inference jobs. Fine-tuning GPT-4o requires a quota request and is priced per training token ($0.008/1k tokens) plus inference cost. For deployments using only RAG (the majority), Azure ML compute cost is minimal — limited to any scheduled evaluation or batch jobs you run. Most production agents that don't fine-tune spend less than $50/month on ML compute for evaluation pipelines.

Microsoft Copilot Studio — if you use low-code orchestration

Copilot Studio is licensed at $200 per month for 2,000 message sessions, with each additional 1,000 sessions costing approximately $100. A 'session' is up to 60 minutes of interaction with a single user. For Teams-embedded agents with moderate volume (under 2,000 sessions/month), this is the most cost-effective orchestration layer. For high-volume deployments, Semantic Kernel orchestration via Azure Functions is typically cheaper because you pay for compute rather than sessions.

Azure infrastructure — the sum of the small parts

Storage (Azure Blob for documents, $0.018/GB/month for hot tier), networking (egress from Azure is $0.087/GB after 5GB/month), Azure API Management (Developer tier free, Standard from $175/month), Azure Monitor and Application Insights ($2.30/GB ingested), Azure Key Vault ($0.03 per 10,000 operations), and Azure Container Apps or Functions for hosting your agent code ($0.000016 per vCPU-second). These individually look small but aggregate to $100–$500/month for a typical medium deployment.

Typical monthly costs by deployment type

These ranges are based on deployments we have built and operated. They assume pay-as-you-go pricing on GPT-4o, Standard tier AI Search, and no fine-tuning. Committed-use agreements and PTUs can reduce the AI compute component by 30–50% at scale.

Deployment type	Configuration	Monthly cost range
Small pilot	1 agent, GPT-4o-mini or GPT-4o, low query volume (<5k/mo), AI Search S1, no fine-tuning	$500 – $1,500
Medium deployment	3–5 agents, GPT-4o, medium volume (5k–50k queries/mo), AI Search S1–S2, Copilot Studio or Semantic Kernel	$2,000 – $8,000
Large rollout	10+ agents, GPT-4o with PTUs, high volume (50k–500k queries/mo), AI Search S2–S3, API Management Standard	$10,000 – $50,000
Enterprise scale	Multi-region, dedicated PTU capacity, multiple AI Search indexes, full observability stack	$50,000+

Cost modelling tip

The biggest swing factor is query volume and average context length. A 10x increase in query volume does not produce a 10x cost increase — most of the infrastructure cost (AI Search, API Management, monitoring) is fixed. Token costs scale linearly with volume, but that's only one of five cost categories. Run your cost model at 3x your expected volume to stress-test the business case.

Token cost optimisation

Token costs are the most optimisable expense in your AI deployment. We have seen 40–70% token cost reductions on production deployments through systematic application of these strategies — without any degradation in output quality.

Model selection: right-size for the task

GPT-4o-mini handles the majority of factual Q&A, summarisation, classification, and structured extraction tasks at 94% lower cost than GPT-4o. Reserve GPT-4o for complex reasoning, nuanced judgement, or multi-step planning tasks. A production deployment that uses GPT-4o-mini for 80% of queries and GPT-4o for 20% of complex queries spends roughly 55% less on model inference than an all-GPT-4o deployment.

Prompt caching: pay once, retrieve cheaply

Azure OpenAI supports prompt caching for system prompts and static context. If your system prompt is 2,000 tokens and you handle 10,000 queries per month, prompt caching eliminates 20 million input tokens per month — approximately $50/month saved on GPT-4o at current pricing. Caching is enabled by default for compatible deployments; the key is structuring your prompts so the static prefix (system prompt, RAG context) comes first and the dynamic user input comes last.

Context window management

Conversation history is the silent budget killer. A naive implementation that appends every message to context produces an exponentially growing context window. Implement conversation summarisation at 4–6 turns: replace the full history with a compressed summary, preserving key facts and decisions. Semantic Kernel's KernelFunction-based memory compression handles this automatically when configured correctly.

RAG chunk sizing and retrieval precision

Returning 10 irrelevant 500-token chunks from Azure AI Search costs more than returning 3 highly relevant 300-token chunks. Use hybrid search (vector + BM25) with reciprocal rank fusion to improve retrieval precision. Set your top-K parameter (k=3 or k=5) based on empirical testing of your dataset, not defaults. Every unnecessary token in retrieved context is a direct cost.

Batching and async processing

For non-real-time workloads (document summarisation, batch classification, scheduled report generation), use the Azure OpenAI Batch API. Batch processing is priced at 50% of the standard token rate. A nightly document processing pipeline that costs $200/month in real-time API calls costs $100/month via the Batch API — no code changes beyond async submission.

Hidden costs to budget for

Every AI deployment budget we have reviewed underestimates these items. They are not surprising in retrospect — but they rarely appear in initial cost models because they are not line items in the Azure pricing calculator.

Data ingestion pipeline

One-time and ongoing cost to extract, chunk, enrich, and index your documents into Azure AI Search. For 100k documents, initial ingestion engineering is typically 2–4 weeks of developer time. Ongoing indexing of new documents requires an automated pipeline (Azure Functions or Logic App) — budget $50–$200/month for compute.

Index maintenance and re-indexing

Azure AI Search indexes degrade as your document corpus evolves. Plan for quarterly re-indexing runs as document chunking strategies and embedding models improve. Each re-index of a large corpus consumes significant Azure AI Search query and indexing units — budget 10–20% of your monthly AI Search cost for index maintenance operations.

Azure AI Content Safety filtering

Content Safety is priced per 1,000 text records analyzed ($1–$2.50/1k depending on tier and features). For a deployment handling 50k queries per month with both input and output filtering, budget $100–$250/month specifically for content safety.

Monitoring, logging, and alerting

Application Insights and Azure Monitor data ingestion adds up quickly. A production agent generating detailed traces (recommended for quality evaluation) generates 1–5 GB of telemetry per 10k queries. At $2.30/GB ingested, a high-volume deployment can easily spend $200–$500/month on observability alone.

Developer and ML engineer time

The most underestimated cost of all. Prompt engineering, evaluation dataset curation, model evaluation runs, and ongoing quality monitoring require sustained engineering effort. A production AI agent typically requires 0.25–0.5 FTE of ongoing maintenance per agent — prompt updates, retrieval quality improvements, evaluation refreshes, incident response. This cost is real even though it does not appear on your Azure bill.

ROI calculation framework

A sound business case does not start from the technology cost — it starts from the current cost of the process you are automating. Here is the framework we use with every client before a build begins.

The ROI formula

Annual Benefit = (Hours saved × blended FTE rate)

+ (Error reduction × cost per error)

+ (Throughput gain × revenue per unit)

Annual Net Benefit = Annual Benefit

− (Infrastructure cost)

− (Maintenance FTE cost)

− (Amortised implementation cost)

The key to a credible business case is being conservative on automation rate. A process that takes 30 minutes of human time cannot save 30 minutes when automated — humans will spend 5–10 minutes reviewing AI output, handling escalations, and managing exceptions. A realistic automation saving is 60–75% of the original process time, not 100%.

Conservative

3–4x

3-year ROI

Small team, moderate volume, careful phased rollout

Typical

4–6x

3-year ROI

Mid-size deployment, clear process scope, good baseline data

Best case

8–12x

3-year ROI

High-volume repetitive process, strong data quality, quick scale-up

Key takeaways

Azure AI Foundry is free — costs come from the underlying services: Azure OpenAI, AI Search, and infrastructure.
Small pilots typically cost $500–$1,500/month. Medium enterprise deployments run $2,000–$8,000/month.
Token costs are the most optimisable expense — model selection and prompt caching alone can cut inference costs by 50–70%.
Budget explicitly for hidden costs: data ingestion, index maintenance, content safety filtering, and ongoing developer time.
Build your ROI case from current process cost, not from technology cost. Use conservative automation rates (60–75%, not 100%).
Provisioned throughput units (PTUs) become cost-effective above roughly 50,000 queries per month.

Playbook

The Azure AI Foundry ROI guide: How to build a business case that actually holds up

Playbook

Azure AI Foundry Security & Compliance: The complete enterprise configuration guide

Implementation Guide

How to architect your first Azure AI Foundry agent: A practitioner's checklist

Service

AI Agent Design & Build — end-to-end agent engineering on Azure

Free cost estimate

Want a real cost model for your deployment?

We build accurate cost models based on your actual query volumes, document corpus, and process scope — not generic estimates. No obligation.

Azure AI Practice

Explore more Azure AI Foundry content

View full practice hub