Implementation Guide · 9 min read

How to architect your first Azure AI Foundry agent: A practitioner's checklist

Most first Azure AI agents fail before a line of code is written. The architecture decisions made in the first week — orchestration layer, retrieval strategy, authentication approach — determine whether you get a working production agent or an expensive demo. Here are the 14 questions you must answer before you start.

The 3 architecture decisions that determine everything

Before you open Azure AI Foundry, three architecture decisions must be made. Get these right, and the build is straightforward. Get them wrong, and you will rebuild from scratch in week four.

Orchestration Layer

Semantic Kernel (code-first, complex multi-step agents, Python/C#), Copilot Studio (low-code, Teams-native, bounded use cases), or direct Azure OpenAI API (simple stateless completions). Most enterprise production agents use Semantic Kernel.

Retrieval Strategy

Azure AI Search RAG (right for most document-grounded use cases), fine-tuning (for domain vocabulary, not facts), or no retrieval (for pure reasoning tasks). When in doubt, start with RAG — it's easier to iterate and doesn't require retraining.

Deployment Target

Managed online endpoint via Prompt Flow (recommended — versioning, A/B testing, monitoring included), Azure Function (for event-driven stateless tasks), or Logic App (for low-code workflow integration). Most production agents should be on managed endpoints.

The most important question

“What is the single, specific job this agent will do?” Write it in one sentence. If you need two sentences, you are scoping two agents. Build one first. Agents scoped too broadly are the single most common reason Azure AI implementations fail.

The 14-question pre-build checklist

Answer every question below before writing a line of code. These answers become the specification for your agent — they inform your system prompt, your Semantic Kernel plugin design, your Azure AI Search index configuration, and your Prompt Flow evaluation pipeline.

01

What is the single job this agent does?

Write it in one sentence. 'Resolve customer support queries about order status and refunds.' If you need two sentences, you are scoping two agents. This sentence becomes the foundation of every design decision that follows.

02

What data does the agent need to complete that job?

List every data source: CRM records, knowledge base documents, product catalogue, order management system. For each source, identify how it will be made available — Azure AI Search index, direct API call, or Semantic Kernel plugin.
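One lightweight way to make this inventory concrete is a source-to-access-mechanism map that the build team reviews before writing any plugin code. This is a sketch with hypothetical source names, not a real configuration:

```python
# Hypothetical inventory of data sources for one agent's single job.
# Each source is mapped to exactly one access mechanism, decided up front.
DATA_SOURCES = {
    "knowledge_base_docs": "azure_ai_search_index",   # chunked + indexed documents
    "order_management":    "semantic_kernel_plugin",  # live lookups behind a plugin
    "crm_records":         "direct_api_call",         # read-only REST access
}

def access_method(source: str) -> str:
    """Return how a given source is surfaced to the agent."""
    try:
        return DATA_SOURCES[source]
    except KeyError:
        raise ValueError(f"Unmapped data source: {source}. Map it before the build starts.")
```

Any source that cannot be placed in this map is a scoping gap to resolve before build, not during it.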

03

Who is authorised to use this agent?

Define the identity boundary. Is it internal employees only (Entra ID authentication)? External customers (Azure B2C)? Both? Access control must be designed before build, not bolted on after.

04

What should it never do?

The 'never do' list goes into your system prompt as hard constraints. Common examples: never make refund commitments above a threshold, never discuss competitor products, never access records the user is not authorised to view, never process transactions without human approval.
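A simple way to keep the 'never do' list auditable is to hold the constraints as data and assemble the system prompt from them, rather than burying them in a prose prompt. A minimal sketch, with illustrative constraint wording:

```python
# Illustrative hard constraints - the thresholds and wording are examples,
# not recommendations for any specific deployment.
HARD_CONSTRAINTS = [
    "Never commit to a refund above $100; escalate to a human instead.",
    "Never discuss competitor products.",
    "Never reveal records the user is not authorised to view.",
    "Never process a transaction without explicit human approval.",
]

def build_system_prompt(job: str) -> str:
    """Assemble a system prompt from the one-sentence job plus hard constraints."""
    rules = "\n".join(f"- {c}" for c in HARD_CONSTRAINTS)
    return (
        f"You are an agent with one job: {job}\n\n"
        f"Hard constraints (non-negotiable):\n{rules}"
    )
```

Keeping the list as a reviewable structure means compliance can sign off on the constraints without reading the full prompt.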

05

How does it authenticate to backend systems?

For internal Azure services: Managed Identity (no stored credentials — this is always the right answer in Azure). For external APIs: Azure Key Vault for secrets, never hardcoded. Define this before the build starts.
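The wiring for this pattern is short. The sketch below uses the real `azure-identity` and `azure-keyvault-secrets` client libraries; the vault URL and secret name are placeholders, and it only runs inside an Azure environment (or a locally authenticated dev session):

```python
# Wiring sketch - placeholder vault URL and secret name, not runnable offline.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential resolves to Managed Identity when running in Azure,
# and to your developer login when running locally - no stored credentials.
credential = DefaultAzureCredential()

vault = SecretClient(
    vault_url="https://my-vault.vault.azure.net",  # placeholder
    credential=credential,
)

# Secrets for external APIs come from Key Vault at runtime, never from code.
external_api_key = vault.get_secret("external-api-key").value
```

The same `credential` object authenticates to Azure OpenAI, Azure AI Search, and storage, which is why Managed Identity should be decided on before the build starts.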

06

What is the retrieval strategy?

Three options: (1) Azure AI Search RAG — right for most document-grounded use cases; (2) Fine-tuning — right when the model needs to embody a specific style or domain vocabulary, not to 'know facts'; (3) No retrieval — right for reasoning or generation tasks with no knowledge requirement. Most enterprise agents use option 1.
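The decision tree above is small enough to write down explicitly, which is useful when reviewers ask why an agent does (or does not) have a retrieval layer. A sketch of the checklist's own logic:

```python
def retrieval_strategy(needs_document_grounding: bool,
                       needs_style_or_vocabulary: bool) -> str:
    """Encode the three-way retrieval decision from the checklist."""
    if needs_document_grounding:
        return "rag"          # Azure AI Search index grounding the model
    if needs_style_or_vocabulary:
        return "fine_tuning"  # embody style/vocabulary, not 'know facts'
    return "no_retrieval"     # pure reasoning or generation task
```

Note the ordering: document grounding wins when both apply, matching the article's default of starting with RAG.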

07

How do you measure success?

Define your success metric before go-live, not after. Example: '65% autonomous resolution rate with CSAT ≥ 4.2/5 at 30 days post-launch.' This becomes your pilot pass/fail criterion and the benchmark for ongoing monitoring.
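Writing the pass/fail criterion as an executable gate keeps it from being quietly renegotiated after launch. A minimal sketch using the example targets above:

```python
def pilot_passed(autonomous_rate: float, csat: float,
                 rate_target: float = 0.65, csat_target: float = 4.2) -> bool:
    """Pilot gate from the example: 65% autonomous resolution AND CSAT >= 4.2/5."""
    return autonomous_rate >= rate_target and csat >= csat_target
```

Both conditions must hold: a high resolution rate achieved by frustrating users, or happy users who are mostly escalated to humans, should both fail the pilot.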

08

What triggers the agent?

Is it user-initiated (conversational), event-triggered (new document uploaded, new case created), or scheduled (daily report generation)? The trigger type determines whether you use Copilot Studio, a Logic App, an Azure Function, or a Semantic Kernel orchestration loop.
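The trigger-to-runtime mapping can be stated as a table the team agrees on before build. These pairings are illustrative defaults, not hard rules:

```python
# Hypothetical default runtime per trigger type (illustrative, not prescriptive).
TRIGGER_RUNTIME = {
    "user_conversation": "Copilot Studio or a Semantic Kernel chat loop",
    "event":             "Azure Function (e.g. new document uploaded, new case created)",
    "schedule":          "Logic App (e.g. daily report generation)",
}

def runtime_for(trigger: str) -> str:
    """Look up the agreed default runtime for a trigger type."""
    return TRIGGER_RUNTIME[trigger]
```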

09

How does it escalate to a human?

Every production agent needs an escalation path. Define: the conditions that trigger escalation (complexity threshold, negative sentiment, explicit user request, agent confidence below threshold), the destination (Teams notification, CRM ticket, email), and what context is passed to the human.
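The escalation conditions listed above reduce to a small predicate evaluated on every turn. A sketch with assumed threshold values (the sentiment and confidence floors are placeholders to tune per use case):

```python
from dataclasses import dataclass

@dataclass
class Turn:
    sentiment: float            # -1.0 (very negative) .. 1.0 (very positive)
    confidence: float           # agent's self-reported confidence, 0..1
    user_requested_human: bool  # explicit 'let me talk to a person'

def should_escalate(turn: Turn,
                    sentiment_floor: float = -0.4,
                    confidence_floor: float = 0.6) -> bool:
    """Escalate on explicit request, negative sentiment, or low confidence."""
    return (turn.user_requested_human
            or turn.sentiment < sentiment_floor
            or turn.confidence < confidence_floor)
```

The destination (Teams notification, CRM ticket, email) and the context handed over are separate decisions; this predicate only answers when to hand off.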

10

What is the content safety configuration?

Azure AI Content Safety is on by default. Configure severity thresholds for hate, violence, sexual, and self-harm categories. For regulated industries, also configure PII detection and output filtering. Do this before UAT, not after go-live.
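One way to make the threshold decision explicit is a reviewed config object rather than console defaults. The severity values below are illustrative only; the right thresholds depend on your compliance requirements:

```python
# Illustrative per-category severity thresholds (maximum severity allowed
# through). Lower threshold = stricter blocking. Tune per compliance needs.
CONTENT_SAFETY_THRESHOLDS = {
    "hate": 2,
    "violence": 2,
    "sexual": 2,
    "self_harm": 0,  # example: block at any detected severity
}

def blocked(category: str, severity: int) -> bool:
    """Return True if a detected severity exceeds the configured threshold."""
    return severity > CONTENT_SAFETY_THRESHOLDS[category]
```

Having this as code means UAT can assert the configuration, instead of discovering a default threshold after go-live.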

11

How is PII handled?

Identify PII fields that must never enter LLM context (SSNs, payment card numbers, health record IDs). Use Azure AI Language PII detection (or an equivalent recogniser) to strip or mask these fields at the application layer before they reach Azure OpenAI.
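As a very rough illustration of application-layer masking, the sketch below uses two regexes. Real deployments should use a proper recogniser (Azure AI Language PII detection, or Presidio-style recognisers), since regexes alone miss most PII forms:

```python
import re

# Illustrative patterns only - production masking needs real PII recognisers.
PII_PATTERNS = {
    "ssn":  re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN format
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),       # 13-16 digit card numbers
}

def mask_pii(text: str) -> str:
    """Replace detected PII with placeholders before the text reaches the LLM."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label.upper()}>", text)
    return text
```

The key design point is where this runs: in the application layer, before the prompt is assembled, so raw PII never appears in model context or logs.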

12

What is the evaluation dataset?

You need at least 100–200 representative test cases before go-live. Each case: input + expected output + pass/fail criteria. Build this dataset in parallel with the agent — not after it's built. Prompt Flow uses this dataset for automated evaluation on every deployment.
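The shape of one such case can be very simple. This is a minimal sketch (the query and criteria are hypothetical); in practice the full dataset feeds a Prompt Flow evaluation run:

```python
# One evaluation case: input, what the answer must contain, what it must not.
CASES = [
    {
        "input": "Where is my order #1234?",          # hypothetical query
        "expected_contains": "order status",
        "must_not_contain": ["refund approved"],      # guard against overreach
    },
]

def passes(case: dict, agent_output: str) -> bool:
    """Pass/fail check for a single case against the agent's output."""
    out = agent_output.lower()
    if case["expected_contains"] not in out:
        return False
    return not any(banned in out for banned in case["must_not_contain"])
```

Even this crude contains/excludes check catches regressions; richer cases add LLM-graded criteria on top.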

13

What is the monitoring strategy?

Define upfront: which metrics you'll track (response latency, token cost, quality score, escalation rate), where they'll be logged (Azure Monitor, Application Insights), and what thresholds trigger alerts. The Prompt Flow deployment dashboard handles this if configured correctly.
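The alert thresholds are worth writing down as a reviewed artifact before go-live. A sketch with assumed values (the numbers are placeholders to tune against your SLOs):

```python
# Illustrative alert thresholds - placeholder values, tune to your SLOs.
ALERT_THRESHOLDS = {
    "p95_latency_ms": 3000,
    "cost_per_conversation_usd": 0.25,
    "escalation_rate": 0.40,
}

def breached(metrics: dict) -> list:
    """Return the names of metrics whose current value exceeds its threshold."""
    return [name for name, limit in ALERT_THRESHOLDS.items()
            if metrics.get(name, 0) > limit]
```

In production the comparison runs in Azure Monitor alert rules rather than application code; the point is that the thresholds exist and are agreed before launch.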

14

What is the rollback plan?

Azure AI Foundry Prompt Flow deployments support traffic splitting and instant rollback. Define your rollback trigger (quality score drops below X, error rate above Y) and the rollback procedure before deploying to production. Test it in staging.
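The rollback trigger itself is a one-line predicate once X and Y are chosen. A sketch with assumed floor and ceiling values:

```python
def should_roll_back(quality_score: float, error_rate: float,
                     quality_floor: float = 0.80,
                     error_ceiling: float = 0.05) -> bool:
    """Rollback trigger: quality drops below X OR error rate rises above Y."""
    return quality_score < quality_floor or error_rate > error_ceiling
```

The hard part is not this predicate but rehearsing the procedure: a rollback trigger that has never been exercised in staging is a hope, not a plan.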

The 5 most common first-agent mistakes

After building dozens of Azure AI agents in production, these are the failure patterns we see repeatedly.

Building a general-purpose assistant

An agent that handles everything handles nothing well. One agent, one job, one well-defined scope.

Skipping content safety configuration

Azure AI Content Safety is on by default, but only with default thresholds. For regulated industries you must configure it explicitly: default thresholds are not compliance.

No evaluation framework before go-live

Deploying without a Prompt Flow evaluation pipeline means you do not know your agent's error rate until users find it. Build the eval dataset in parallel with the agent.

Hardcoded prompts with no versioning

A system prompt that lives in code and isn't tracked in Prompt Flow cannot be evaluated, A/B tested, or rolled back. Use Prompt Flow from day one.

No private networking on Azure OpenAI

For any production enterprise deployment, traffic to Azure OpenAI should traverse private endpoints within your VNet — not the public internet. Configure this at the start, not when a security audit flags it.

Choosing your orchestration layer

The orchestration layer is the single most consequential technical decision in an Azure AI agent build. Here is how to choose.

| Factor | Semantic Kernel | Copilot Studio | Direct API |
| --- | --- | --- | --- |
| Best for | Complex multi-step agents | Teams-native, bounded use cases | Simple stateless completions |
| Skill required | Python / C# / Java engineers | Power Platform users | Any developer |
| Multi-step reasoning | Full support | Limited | Manual implementation |
| Custom tool plugins | Full support | Limited via connectors | Manual implementation |
| Production monitoring | Via Prompt Flow | Built-in analytics | Manual via App Insights |
| Teams deployment | Via custom bot | Native | Via custom bot |

2-week risk-free pilot

Ready to build your first Azure AI Foundry agent?

We handle the architecture decisions, build the evaluation pipeline, and deploy to production. Fixed price. Zero delivery risk.