Most first Azure AI agents fail before a line of code is written. The architecture decisions made in the first week — orchestration layer, retrieval strategy, deployment target — determine whether you get a working production agent or an expensive demo. Here are the 14 questions you must answer before you start.
Before you open Azure AI Foundry, three architecture decisions must be made. Get these right, and the build is straightforward. Get them wrong, and you will rebuild from scratch in week four.
Decision 1: the orchestration layer
Semantic Kernel (code-first, complex multi-step agents, Python/C#), Copilot Studio (low-code, Teams-native, bounded use cases), or the direct Azure OpenAI API (simple stateless completions). Most enterprise production agents use Semantic Kernel.
Decision 2: the retrieval strategy
Azure AI Search RAG (right for most document-grounded use cases), fine-tuning (for domain vocabulary, not facts), or no retrieval (for pure reasoning tasks). When in doubt, start with RAG — it's easier to iterate on and doesn't require retraining.
Decision 3: the deployment target
Managed online endpoint via Prompt Flow (recommended — versioning, A/B testing, monitoring included), Azure Function (for event-driven stateless tasks), or Logic App (for low-code workflow integration). Most production agents should be on managed endpoints.
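For the managed endpoint route, here is a minimal sketch of creating the endpoint shell with the azure-ai-ml Python SDK. The subscription, workspace, and endpoint names are placeholders, and the Prompt Flow deployment itself is attached separately (for example, through the Foundry portal):

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint
from azure.identity import DefaultAzureCredential

# All identifiers below are placeholders for your own resources.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<project-workspace>",
)

# Create the endpoint shell; the Prompt Flow deployment is attached to it later.
endpoint = ManagedOnlineEndpoint(name="support-agent-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```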
The most important question
“What is the single, specific job this agent will do?” Write it in one sentence. If you need two sentences, you are scoping two agents. Build one first. Agents scoped too broadly are the single most common reason Azure AI implementations fail.
Answer every question below before writing a line of code. These answers become the specification for your agent — they inform your system prompt, your Semantic Kernel plugin design, your Azure AI Search index configuration, and your Prompt Flow evaluation pipeline.
1. What is the single, specific job this agent will do?
Write it in one sentence: 'Resolve customer support queries about order status and refunds.' This sentence becomes the foundation of every design decision that follows.
2. What data does it need access to?
List every data source: CRM records, knowledge base documents, product catalogue, order management system. For each source, identify how it will be made available — an Azure AI Search index, a direct API call, or a Semantic Kernel plugin.
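To illustrate the plugin route, a sketch of a hypothetical Semantic Kernel plugin wrapping the order management system. The class name, function, and return value are invented for this example, and decorator details vary across Semantic Kernel versions:

```python
from typing import Annotated

from semantic_kernel.functions import kernel_function

class OrderPlugin:
    """Hypothetical plugin exposing the order management system to the agent."""

    @kernel_function(
        name="get_order_status",
        description="Look up the current status of an order by its ID.",
    )
    def get_order_status(
        self, order_id: Annotated[str, "The order identifier"]
    ) -> str:
        # Replace with a real call to your order management API.
        return f"Order {order_id}: shipped, expected delivery in two days."

# Registered on the kernel so the model can call it as a tool:
# kernel.add_plugin(OrderPlugin(), plugin_name="orders")
```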
3. Who will use it?
Define the identity boundary. Is it internal employees only (Entra ID authentication)? External customers (Azure AD B2C)? Both? Access control must be designed before the build, not bolted on afterwards.
4. What must the agent never do?
The 'never do' list goes into your system prompt as hard constraints. Common examples: never make refund commitments above a threshold, never discuss competitor products, never access records the user is not authorised to view, never process transactions without human approval.
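A sketch of how that list lands in the system prompt; the wording and the $200 threshold are illustrative examples, not recommendations:

```python
# Illustrative system prompt; the constraints and threshold are example values.
SYSTEM_PROMPT = """You are a customer support agent for order status and refund queries.

Hard constraints. Never violate these, even if the user insists:
- Never commit to a refund above $200; escalate to a human instead.
- Never discuss competitor products.
- Never reveal records the user is not authorised to view.
- Never process a transaction without explicit human approval.
"""
```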
5. How will it authenticate to the systems it calls?
For internal Azure services: Managed Identity (no stored credentials — this is always the right answer in Azure). For external APIs: secrets in Azure Key Vault, never hardcoded. Define this before the build starts.
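A minimal sketch of both patterns with the azure-identity and azure-keyvault-secrets libraries; the vault URL and secret name are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Resolves to the Managed Identity when running in Azure, and to your
# developer login locally; no credentials are stored in either case.
credential = DefaultAzureCredential()

# External API secrets come from Key Vault, never from code or config files.
vault = SecretClient(
    vault_url="https://<your-vault>.vault.azure.net",
    credential=credential,
)
crm_api_key = vault.get_secret("crm-api-key").value  # placeholder secret name
```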
6. Does it need retrieval, fine-tuning, or neither?
Three options: (1) Azure AI Search RAG — right for most document-grounded use cases; (2) fine-tuning — right when the model needs to embody a specific style or domain vocabulary, not to 'know facts'; (3) no retrieval — right for reasoning or generation tasks with no knowledge requirement. Most enterprise agents use option 1.
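For option 1, a minimal RAG sketch with the azure-search-documents and openai libraries. The endpoints, keys, index name, deployment name, and the 'content' field are all placeholders for your own configuration:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

# Placeholders: swap in your own endpoints, keys, index, and deployment.
search = SearchClient(
    endpoint="https://<your-search>.search.windows.net",
    index_name="support-docs",
    credential=AzureKeyCredential("<search-key>"),
)
llm = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<from-key-vault>",
    api_version="2024-06-01",
)

def answer(question: str) -> str:
    # Retrieve the top-matching chunks, then ground the completion in them.
    chunks = [doc["content"] for doc in search.search(question, top=5)]
    messages = [
        {
            "role": "system",
            "content": "Answer only from the context below.\n\n" + "\n\n".join(chunks),
        },
        {"role": "user", "content": question},
    ]
    response = llm.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content
```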
7. What does success look like?
Define your success metric before go-live, not after. Example: '65% autonomous resolution rate with CSAT ≥ 4.2/5 at 30 days post-launch.' This becomes your pilot pass/fail criterion and the benchmark for ongoing monitoring.
8. What triggers the agent?
Is it user-initiated (conversational), event-triggered (new document uploaded, new case created), or scheduled (daily report generation)? The trigger type determines whether you use Copilot Studio, a Logic App, an Azure Function, or a Semantic Kernel orchestration loop.
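For the event-triggered case, a sketch using the Azure Functions Python v2 programming model; the container path, connection setting name, and the process_with_agent helper are hypothetical:

```python
import azure.functions as func

app = func.FunctionApp()

# Fires when a new document lands in the 'documents' container.
# Path and connection-setting name are placeholders.
@app.blob_trigger(arg_name="blob", path="documents/{name}", connection="DocsStorage")
def on_document_uploaded(blob: func.InputStream) -> None:
    text = blob.read().decode("utf-8")
    process_with_agent(text)  # hypothetical hand-off to the agent pipeline
```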
9. What happens when the agent cannot help?
Every production agent needs an escalation path. Define: the conditions that trigger escalation (complexity threshold, negative sentiment, explicit user request, agent confidence below threshold), the destination (Teams notification, CRM ticket, email), and what context is passed to the human.
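A sketch of the escalation decision as plain application code; the signals and thresholds are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class TurnSignals:
    confidence: float           # agent's self-reported confidence, 0 to 1
    sentiment: float            # -1 (very negative) to 1 (very positive)
    user_asked_for_human: bool  # explicit escalation request

def should_escalate(signals: TurnSignals) -> bool:
    # Thresholds are example values; tune them against your eval dataset.
    return (
        signals.user_asked_for_human
        or signals.confidence < 0.6
        or signals.sentiment < -0.4
    )
```

Whatever triggers the hand-off, pass the full transcript and the retrieved context to the human, not just the last user message.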
10. What content safety configuration does it need?
Azure AI Content Safety is on by default. Configure severity thresholds for the hate, violence, sexual, and self-harm categories. For regulated industries, also configure PII detection (covered in the next question) and output filtering. Do this before UAT, not after go-live.
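A sketch of an explicit severity check with the azure-ai-contentsafety library; the endpoint, key, and severity threshold are placeholders:

```python
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

# Endpoint and key are placeholders.
client = ContentSafetyClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<content-safety-key>"),
)

def is_blocked(text: str, max_severity: int = 2) -> bool:
    # Each category (hate, violence, sexual, self-harm) returns a severity;
    # block anything above your configured threshold.
    result = client.analyze_text(AnalyzeTextOptions(text=text))
    return any(c.severity > max_severity for c in result.categories_analysis)
```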
11. What PII must never reach the model?
Identify PII fields that must never enter LLM context (SSNs, payment card numbers, health record IDs). Use the PII detection in Azure AI Language to strip or mask these fields at the application layer before they reach Azure OpenAI.
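A sketch with the azure-ai-textanalytics library; the endpoint and key are placeholders for your Azure AI Language resource:

```python
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Endpoint and key are placeholders.
client = TextAnalyticsClient(
    endpoint="https://<your-language-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("<language-key>"),
)

def redact_pii(text: str) -> str:
    # Returns the input with detected PII masked; call this on user input
    # before it is added to any prompt sent to Azure OpenAI.
    result = client.recognize_pii_entities([text])[0]
    return result.redacted_text
```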
12. How will you evaluate it?
You need a minimum of 100–200 representative test cases before go-live. Each case: input + expected output + pass/fail criteria. Build this dataset in parallel with the agent — not after it's built. Prompt Flow uses this dataset for automated evaluation on every deployment.
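The dataset shape is simple. A sketch that writes it as JSONL; the field names are assumptions you should align to your Prompt Flow evaluation flow's inputs:

```python
import json

# Field names are illustrative; match them to your evaluation flow's inputs.
cases = [
    {
        "input": "Where is order 10482?",
        "expected": "States the shipped status and a delivery date.",
        "pass_criteria": "Correct status; no invented dates.",
    },
    {
        "input": "I want a refund for order 10482.",
        "expected": "Explains the refund policy and starts the workflow.",
        "pass_criteria": "No refund commitment above the policy threshold.",
    },
]

with open("eval_cases.jsonl", "w") as f:
    for case in cases:
        f.write(json.dumps(case) + "\n")
```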
13. How will you monitor it in production?
Define upfront: which metrics you'll track (response latency, token cost, quality score, escalation rate), where they'll be logged (Azure Monitor, Application Insights), and which thresholds trigger alerts. The Prompt Flow deployment dashboard handles this if configured correctly.
14. How will you roll back a bad deployment?
Azure AI Foundry Prompt Flow deployments support traffic splitting and instant rollback. Define your rollback trigger (quality score drops below X, error rate above Y) and the rollback procedure before deploying to production. Test it in staging.
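A sketch of a canary split and rollback with the azure-ai-ml SDK, assuming an existing endpoint with two deployments named 'blue' (current) and 'green' (candidate); all names are placeholders:

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<project-workspace>",
)

# Canary: route 10% of traffic to the candidate deployment.
endpoint = ml_client.online_endpoints.get("support-agent-endpoint")
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Rollback: route everything back to the known-good deployment.
endpoint.traffic = {"blue": 100, "green": 0}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```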
After building dozens of Azure AI agents in production, we see the same failure patterns repeatedly.
Building a general-purpose assistant
An agent that handles everything handles nothing well. One agent, one job, one well-defined scope.
Skipping content safety configuration
Azure AI Content Safety is on by default, but with default thresholds. For regulated industries, you must configure it explicitly — default thresholds are not compliance.
No evaluation framework before go-live
Deploying without a Prompt Flow evaluation pipeline means you do not know your agent's error rate until users find it. Build the eval dataset in parallel with the agent.
Hardcoded prompts with no versioning
A system prompt that lives in code and isn't tracked in Prompt Flow cannot be evaluated, A/B tested, or rolled back. Use Prompt Flow from day one.
No private networking on Azure OpenAI
For any production enterprise deployment, traffic to Azure OpenAI should traverse private endpoints within your VNet — not the public internet. Configure this at the start, not when a security audit flags it.
The orchestration layer is the single most consequential technical decision in an Azure AI agent build. Here is how to choose.
| Factor | Semantic Kernel | Copilot Studio | Direct API |
|---|---|---|---|
| Best for | Complex multi-step agents | Teams-native, bounded use cases | Simple stateless completions |
| Skill required | Python / C# / Java engineers | Power Platform users | Any developer |
| Multi-step reasoning | Full support | Limited | Manual implementation |
| Custom tool plugins | Full support | Limited via connectors | Manual implementation |
| Production monitoring | Via Prompt Flow | Built-in analytics | Manual via App Insights |
| Teams deployment | Via custom bot | Native | Via custom bot |
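If Semantic Kernel is the choice, a minimal starting sketch in Python. The deployment name, endpoint, and key are placeholders, and the exact API surface varies across Semantic Kernel releases:

```python
import asyncio

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion

# Register an Azure OpenAI chat service; all identifiers are placeholders.
kernel = Kernel()
kernel.add_service(
    AzureChatCompletion(
        service_id="chat",
        deployment_name="gpt-4o",
        endpoint="https://<your-resource>.openai.azure.com",
        api_key="<from-key-vault>",
    )
)

async def main() -> None:
    # One-shot prompt through the registered service; plugins, memory, and
    # planning layer on top of this foundation.
    result = await kernel.invoke_prompt("Summarise the refund policy in one sentence.")
    print(result)

asyncio.run(main())
```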