Implementation Guide · 9 min read

How to architect your first Azure AI Foundry agent: A practitioner's checklist

Most first Azure AI agents fail before a line of code is written. The architecture decisions made in the first week — orchestration layer, retrieval strategy, authentication approach — determine whether you get a working production agent or an expensive demo. Here are the 14 questions you must answer before you start.

The 3 architecture decisions that determine everything

Before you open Azure AI Foundry, three architecture decisions must be made. Get these right, and the build is straightforward. Get them wrong, and you will rebuild from scratch in week four.

Orchestration Layer

Semantic Kernel (code-first, complex multi-step agents, Python/C#), Copilot Studio (low-code, Teams-native, bounded use cases), or direct Azure OpenAI API (simple stateless completions). Most enterprise production agents use Semantic Kernel.

Retrieval Strategy

Azure AI Search RAG (right for most document-grounded use cases), fine-tuning (for domain vocabulary, not facts), or no retrieval (for pure reasoning tasks). When in doubt, start with RAG — it's easier to iterate and doesn't require retraining.

Deployment Target

Managed online endpoint via Prompt Flow (recommended — versioning, A/B testing, monitoring included), Azure Function (for event-driven stateless tasks), or Logic App (for low-code workflow integration). Most production agents should be on managed endpoints.

The most important question

“What is the single, specific job this agent will do?” Write it in one sentence. If you need two sentences, you are scoping two agents. Build one first. Agents scoped too broadly are the single most common reason Azure AI implementations fail.

The 14-question pre-build checklist

Answer every question below before writing a line of code. These answers become the specification for your agent — they inform your system prompt, your Semantic Kernel plugin design, your Azure AI Search index configuration, and your Prompt Flow evaluation pipeline.

01

What is the single job this agent does?

Write it in one sentence. 'Resolve customer support queries about order status and refunds.' If you need two sentences, you are scoping two agents. This sentence becomes the foundation of every design decision that follows.

02

What data does the agent need to complete that job?

List every data source: CRM records, knowledge base documents, product catalogue, order management system. For each source, identify how it will be made available — Azure AI Search index, direct API call, or Semantic Kernel plugin.
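One lightweight way to make this inventory concrete is a source-to-access-mechanism map that the build team reviews before writing any plugin code. This is a sketch with hypothetical source names, not a real configuration:

```python
# Hypothetical inventory of data sources for one agent's single job.
# Each source is mapped to exactly one access mechanism, decided up front.
DATA_SOURCES = {
    "knowledge_base_docs": "azure_ai_search_index",   # chunked + indexed documents
    "order_management":    "semantic_kernel_plugin",  # live lookups behind a plugin
    "crm_records":         "direct_api_call",         # read-only REST access
}

def access_method(source: str) -> str:
    """Return how a given source is surfaced to the agent."""
    try:
        return DATA_SOURCES[source]
    except KeyError:
        raise ValueError(f"Unmapped data source: {source}. Map it before the build starts.")
```

Any source that cannot be placed in this map is a scoping gap to resolve before build, not during it.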

03

Who is authorised to use this agent?

Define the identity boundary. Is it internal employees only (Entra ID authentication)? External customers (Azure B2C)? Both? Access control must be designed before build, not bolted on after.

04

What should it never do?

The 'never do' list goes into your system prompt as hard constraints. Common examples: never make refund commitments above a threshold, never discuss competitor products, never access records the user is not authorised to view, never process transactions without human approval.
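A simple way to keep the 'never do' list auditable is to hold the constraints as data and assemble the system prompt from them, rather than burying them in a prose prompt. A minimal sketch, with illustrative constraint wording:

```python
# Illustrative hard constraints - the thresholds and wording are examples,
# not recommendations for any specific deployment.
HARD_CONSTRAINTS = [
    "Never commit to a refund above $100; escalate to a human instead.",
    "Never discuss competitor products.",
    "Never reveal records the user is not authorised to view.",
    "Never process a transaction without explicit human approval.",
]

def build_system_prompt(job: str) -> str:
    """Assemble a system prompt from the one-sentence job plus hard constraints."""
    rules = "\n".join(f"- {c}" for c in HARD_CONSTRAINTS)
    return (
        f"You are an agent with one job: {job}\n\n"
        f"Hard constraints (non-negotiable):\n{rules}"
    )
```

Keeping the list as a reviewable structure means compliance can sign off on the constraints without reading the full prompt.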

05

How does it authenticate to backend systems?

For internal Azure services: Managed Identity (no stored credentials — this is always the right answer in Azure). For external APIs: Azure Key Vault for secrets, never hardcoded. Define this before the build starts.
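The wiring for this pattern is short. The sketch below uses the real `azure-identity` and `azure-keyvault-secrets` client libraries; the vault URL and secret name are placeholders, and it only runs inside an Azure environment (or a locally authenticated dev session):

```python
# Wiring sketch - placeholder vault URL and secret name, not runnable offline.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential resolves to Managed Identity when running in Azure,
# and to your developer login when running locally - no stored credentials.
credential = DefaultAzureCredential()

vault = SecretClient(
    vault_url="https://my-vault.vault.azure.net",  # placeholder
    credential=credential,
)

# Secrets for external APIs come from Key Vault at runtime, never from code.
external_api_key = vault.get_secret("external-api-key").value
```

The same `credential` object authenticates to Azure OpenAI, Azure AI Search, and storage, which is why Managed Identity should be decided on before the build starts.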

06

What is the retrieval strategy?

Three options: (1) Azure AI Search RAG — right for most document-grounded use cases; (2) Fine-tuning — right when the model needs to embody a specific style or domain vocabulary, not to 'know facts'; (3) No retrieval — right for reasoning or generation tasks with no knowledge requirement. Most enterprise agents use option 1.
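The decision tree above is small enough to write down explicitly, which is useful when reviewers ask why an agent does (or does not) have a retrieval layer. A sketch of the checklist's own logic:

```python
def retrieval_strategy(needs_document_grounding: bool,
                       needs_style_or_vocabulary: bool) -> str:
    """Encode the three-way retrieval decision from the checklist."""
    if needs_document_grounding:
        return "rag"          # Azure AI Search index grounding the model
    if needs_style_or_vocabulary:
        return "fine_tuning"  # embody style/vocabulary, not 'know facts'
    return "no_retrieval"     # pure reasoning or generation task
```

Note the ordering: document grounding wins when both apply, matching the article's default of starting with RAG.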

07

How do you measure success?

Define your success metric before go-live, not after. Example: '65% autonomous resolution rate with CSAT ≥ 4.2/5 at 30 days post-launch.' This becomes your pilot pass/fail criterion and the benchmark for ongoing monitoring.
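Writing the pass/fail criterion as an executable gate keeps it from being quietly renegotiated after launch. A minimal sketch using the example targets above:

```python
def pilot_passed(autonomous_rate: float, csat: float,
                 rate_target: float = 0.65, csat_target: float = 4.2) -> bool:
    """Pilot gate from the example: 65% autonomous resolution AND CSAT >= 4.2/5."""
    return autonomous_rate >= rate_target and csat >= csat_target
```

Both conditions must hold: a high resolution rate achieved by frustrating users, or happy users who are mostly escalated to humans, should both fail the pilot.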

08

What triggers the agent?

Is it user-initiated (conversational), event-triggered (new document uploaded, new case created), or scheduled (daily report generation)? The trigger type determines whether you use Copilot Studio, a Logic App, an Azure Function, or a Semantic Kernel orchestration loop.
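The trigger-to-runtime mapping can be stated as a table the team agrees on before build. These pairings are illustrative defaults, not hard rules:

```python
# Hypothetical default runtime per trigger type (illustrative, not prescriptive).
TRIGGER_RUNTIME = {
    "user_conversation": "Copilot Studio or a Semantic Kernel chat loop",
    "event":             "Azure Function (e.g. new document uploaded, new case created)",
    "schedule":          "Logic App (e.g. daily report generation)",
}

def runtime_for(trigger: str) -> str:
    """Look up the agreed default runtime for a trigger type."""
    return TRIGGER_RUNTIME[trigger]
```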

09

How does it escalate to a human?

Every production agent needs an escalation path. Define: the conditions that trigger escalation (complexity threshold, negative sentiment, explicit user request, agent confidence below threshold), the destination (Teams notification, CRM ticket, email), and what context is passed to the human.
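The escalation conditions listed above reduce to a small predicate evaluated on every turn. A sketch with assumed threshold values (the sentiment and confidence floors are placeholders to tune per use case):

```python
from dataclasses import dataclass

@dataclass
class Turn:
    sentiment: float            # -1.0 (very negative) .. 1.0 (very positive)
    confidence: float           # agent's self-reported confidence, 0..1
    user_requested_human: bool  # explicit 'let me talk to a person'

def should_escalate(turn: Turn,
                    sentiment_floor: float = -0.4,
                    confidence_floor: float = 0.6) -> bool:
    """Escalate on explicit request, negative sentiment, or low confidence."""
    return (turn.user_requested_human
            or turn.sentiment < sentiment_floor
            or turn.confidence < confidence_floor)
```

The destination (Teams notification, CRM ticket, email) and the context handed over are separate decisions; this predicate only answers when to hand off.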

10

What is the content safety configuration?

Azure AI Content Safety is on by default. Configure severity thresholds for hate, violence, sexual, and self-harm categories. For regulated industries, also configure PII detection and output filtering. Do this before UAT, not after go-live.
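One way to make the threshold decision explicit is a reviewed config object rather than console defaults. The severity values below are illustrative only; the right thresholds depend on your compliance requirements:

```python
# Illustrative per-category severity thresholds (maximum severity allowed
# through). Lower threshold = stricter blocking. Tune per compliance needs.
CONTENT_SAFETY_THRESHOLDS = {
    "hate": 2,
    "violence": 2,
    "sexual": 2,
    "self_harm": 0,  # example: block at any detected severity
}

def blocked(category: str, severity: int) -> bool:
    """Return True if a detected severity exceeds the configured threshold."""
    return severity > CONTENT_SAFETY_THRESHOLDS[category]
```

Having this as code means UAT can assert the configuration, instead of discovering a default threshold after go-live.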

11

How is PII handled?

Identify PII fields that must never enter LLM context (SSNs, payment card numbers, health record IDs). Use Azure AI Language PII detection (or an equivalent recogniser) to strip or mask these fields at the application layer before they reach Azure OpenAI.
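As a very rough illustration of application-layer masking, the sketch below uses two regexes. Real deployments should use a proper recogniser (Azure AI Language PII detection, or Presidio-style recognisers), since regexes alone miss most PII forms:

```python
import re

# Illustrative patterns only - production masking needs real PII recognisers.
PII_PATTERNS = {
    "ssn":  re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN format
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),       # 13-16 digit card numbers
}

def mask_pii(text: str) -> str:
    """Replace detected PII with placeholders before the text reaches the LLM."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label.upper()}>", text)
    return text
```

The key design point is where this runs: in the application layer, before the prompt is assembled, so raw PII never appears in model context or logs.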

12

What is the evaluation dataset?

You need at least 100–200 representative test cases before go-live. Each case: input + expected output + pass/fail criteria. Build this dataset in parallel with the agent — not after it's built. Prompt Flow uses this dataset for automated evaluation on every deployment.
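The shape of one such case can be very simple. This is a minimal sketch (the query and criteria are hypothetical); in practice the full dataset feeds a Prompt Flow evaluation run:

```python
# One evaluation case: input, what the answer must contain, what it must not.
CASES = [
    {
        "input": "Where is my order #1234?",          # hypothetical query
        "expected_contains": "order status",
        "must_not_contain": ["refund approved"],      # guard against overreach
    },
]

def passes(case: dict, agent_output: str) -> bool:
    """Pass/fail check for a single case against the agent's output."""
    out = agent_output.lower()
    if case["expected_contains"] not in out:
        return False
    return not any(banned in out for banned in case["must_not_contain"])
```

Even this crude contains/excludes check catches regressions; richer cases add LLM-graded criteria on top.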

13

What is the monitoring strategy?

Define upfront: which metrics you'll track (response latency, token cost, quality score, escalation rate), where they'll be logged (Azure Monitor, Application Insights), and what thresholds trigger alerts. The Prompt Flow deployment dashboard handles this if configured correctly.
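The alert thresholds are worth writing down as a reviewed artifact before go-live. A sketch with assumed values (the numbers are placeholders to tune against your SLOs):

```python
# Illustrative alert thresholds - placeholder values, tune to your SLOs.
ALERT_THRESHOLDS = {
    "p95_latency_ms": 3000,
    "cost_per_conversation_usd": 0.25,
    "escalation_rate": 0.40,
}

def breached(metrics: dict) -> list:
    """Return the names of metrics whose current value exceeds its threshold."""
    return [name for name, limit in ALERT_THRESHOLDS.items()
            if metrics.get(name, 0) > limit]
```

In production the comparison runs in Azure Monitor alert rules rather than application code; the point is that the thresholds exist and are agreed before launch.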

14

What is the rollback plan?

Azure AI Foundry Prompt Flow deployments support traffic splitting and instant rollback. Define your rollback trigger (quality score drops below X, error rate above Y) and the rollback procedure before deploying to production. Test it in staging.
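The rollback trigger itself is a one-line predicate once X and Y are chosen. A sketch with assumed floor and ceiling values:

```python
def should_roll_back(quality_score: float, error_rate: float,
                     quality_floor: float = 0.80,
                     error_ceiling: float = 0.05) -> bool:
    """Rollback trigger: quality drops below X OR error rate rises above Y."""
    return quality_score < quality_floor or error_rate > error_ceiling
```

The hard part is not this predicate but rehearsing the procedure: a rollback trigger that has never been exercised in staging is a hope, not a plan.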

The 5 most common first-agent mistakes

After building dozens of Azure AI agents in production, these are the failure patterns we see repeatedly.

Building a general-purpose assistant

An agent that handles everything handles nothing well. One agent, one job, one well-defined scope.

Skipping content safety configuration

Azure AI Content Safety is on by default, but only with default thresholds. For regulated industries you must configure it explicitly: default thresholds are not compliance.

No evaluation framework before go-live

Deploying without a Prompt Flow evaluation pipeline means you do not know your agent's error rate until users find it. Build the eval dataset in parallel with the agent.

Hardcoded prompts with no versioning

A system prompt that lives in code and isn't tracked in Prompt Flow cannot be evaluated, A/B tested, or rolled back. Use Prompt Flow from day one.

No private networking on Azure OpenAI

For any production enterprise deployment, traffic to Azure OpenAI should traverse private endpoints within your VNet — not the public internet. Configure this at the start, not when a security audit flags it.

Choosing your orchestration layer

The orchestration layer is the single most consequential technical decision in an Azure AI agent build. Here is how to choose.

| Factor | Semantic Kernel | Copilot Studio | Direct API |
| --- | --- | --- | --- |
| Best for | Complex multi-step agents | Teams-native, bounded use cases | Simple stateless completions |
| Skill required | Python / C# / Java engineers | Power Platform users | Any developer |
| Multi-step reasoning | Full support | Limited | Manual implementation |
| Custom tool plugins | Full support | Limited via connectors | Manual implementation |
| Production monitoring | Via Prompt Flow | Built-in analytics | Manual via App Insights |
| Teams deployment | Via custom bot | Native | Via custom bot |

2-week risk-free pilot

Ready to build your first Azure AI Foundry agent?

We handle the architecture decisions, build the evaluation pipeline, and deploy to production. Fixed price. Zero delivery risk.