The first version of this orchestration failed in a very specific way: the mapper kept producing templates that looked plausible to a human and were wrong for the platform. The failure was not that the model was incapable of reasoning. The failure was that I had asked too much of a single, broad conversation, so the wrong structure got accepted early and then propagated all the way to the end. By the time the validator complained, the chain had already lost the distinction between analysis, mapping, generation, and approval.
That is the point where I stopped treating the system like a prompt stack and started treating it like a state machine. In the workflow analyzer SaaS, the code in runner/azure_foundry/src/orchestrator.py does the important work: it creates the kernel, registers the plugins, assembles the agents, defines who speaks next, and decides when the run is done. The state manager in runner/azure_foundry/src/state_manager.py sits around that orchestration so a run can resume from the last durable message history instead of inventing a fresh conversation every time. That single design choice changed the behavior of the whole pipeline.
The architecture I wanted was simple to describe and annoying to get right: each agent gets one contract, each contract produces one kind of output, and the validator is the gate that decides whether the output can move forward. The analyzer looks at the workflow and identifies automation opportunities. The mapper turns those opportunities into abstract integration steps. The generator turns the mapped steps into a platform-shaped template. The validator checks the structure and either approves the run or sends it back to the generator.
The part that made this work was not just splitting the roles. It was making the next speaker explicit. That is what AgentGroupChat gives me here: selection strategy, termination strategy, and a history buffer that can be rehydrated before the chat starts. Once those pieces are in place, the system stops behaving like a free-form exchange and starts behaving like a controlled pipeline.
What broke first
The first architecture looked tidy on paper and failed under repetition. I had an analysis step, a mapping step, a generation step, and a validation step, but the conversation itself was too loose. The mapper started inventing platform syntax because nothing in the orchestration made that impossible. The generator then built on top of that invented syntax, and the validator only saw the problem after the wrong shape had already been repeated several times.
The deeper problem showed up on retries. If the run timed out or the validator rejected the output, the next attempt often lost the conversation history that explained why the output had failed. That meant the system retried from a weak starting point and repeated the same mistake. The loop did not need more creativity. It needed a strict memory of what had already happened and a strict rule about which agent was allowed to correct which class of mistake.
That is why I moved to a narrow contract chain with validator-led convergence. The validator does not try to repair everything. It does one job: decide whether the generated template is acceptable. If the answer is no, the generator gets another pass. If the answer is yes, the run terminates. That keeps the correction local instead of letting every agent participate in every mistake.
The core loop in AgentGroupChat
The orchestration in runner/azure_foundry/src/orchestrator.py is built around AgentGroupChat with two strategies attached: a selection strategy that decides the next speaker, and a termination strategy that decides when to stop. That is the heart of the pattern.
The selection strategy reads the conversation history and follows a fixed pipeline:
- Analyzer speaks first.
- Mapper speaks second.
- Generator speaks third.
- Validator speaks fourth.
- If the validator returns INVALID, the next speaker is the Generator again.
- If the validator returns WORKFLOW_APPROVED, the chat ends.
That is a very different shape from a single model answering everything inside one giant instruction block. The selection strategy makes the pipeline visible to the runtime. It also gives me a place to encode the retry rule explicitly instead of hiding it inside a paragraph of instructions.
The termination strategy is just as important. The validator is the only agent whose result can end the run, and the run still has a maximum iteration cap of 10 so a bad loop cannot spin forever. That cap matters because agent loops do not usually fail in dramatic ways. They fail by wobbling. A small wobble repeated enough times turns into wasted tokens, delayed jobs, and outputs that never settle.
Here is the shape of that control flow in a standalone Python example that mirrors the same idea. This version does not depend on Azure or Semantic Kernel, but it shows the exact contract I want the orchestrator to enforce:
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional


class AgentName(str, Enum):
    ANALYZER = 'Analyzer'
    MAPPER = 'Mapper'
    GENERATOR = 'Generator'
    VALIDATOR = 'Validator'


@dataclass
class Message:
    role: str
    content: str
    name: Optional[str] = None


def rehydrate_history(history_state: List[Dict[str, Any]]) -> List[Message]:
    # Rebuild typed messages from the persisted session state so a resumed
    # run continues the same conversation instead of starting a new one.
    return [
        Message(
            role=item.get('role', 'user'),
            content=item.get('content', ''),
            name=item.get('name'),
        )
        for item in history_state
    ]


def next_speaker(history: List[Message]) -> Optional[AgentName]:
    # An empty history means the pipeline has not started: Analyzer goes first.
    if not history:
        return AgentName.ANALYZER
    last = history[-1]
    if last.role == AgentName.VALIDATOR.value:
        # Only the Validator's verdict can end the run or trigger a retry,
        # and the retry goes to the Generator, never back to the start.
        if 'WORKFLOW_APPROVED' in last.content:
            return None
        if 'INVALID' in last.content:
            return AgentName.GENERATOR
    # Otherwise advance one step along the fixed pipeline.
    order = [AgentName.ANALYZER, AgentName.MAPPER, AgentName.GENERATOR, AgentName.VALIDATOR]
    agent_roles = {agent.value for agent in order}
    seen = [msg.role for msg in history if msg.role in agent_roles]
    if not seen:
        return AgentName.ANALYZER
    last_agent = AgentName(seen[-1])
    idx = order.index(last_agent)
    return order[min(idx + 1, len(order) - 1)]


if __name__ == '__main__':
    restored = rehydrate_history([
        {'role': 'Analyzer', 'content': 'Detected CRM handoff', 'name': 'Analyzer'},
        {'role': 'Mapper', 'content': 'Trigger + action pair identified', 'name': 'Mapper'},
        {'role': 'Generator', 'content': '{"triggers": [], "actions": []}', 'name': 'Generator'},
        {'role': 'Validator', 'content': 'INVALID: missing triggers and actions', 'name': 'Validator'},
    ])
    print(next_speaker(restored))  # AgentName.GENERATOR: the retry targets the Generator
That small example captures the most important behavior: the validator does not restart the chain, and the chain does not forget where it left off. If the validator rejects the output, the next speaker is the Generator, not the Analyzer, because the analysis step did its job already.
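The termination side can be sketched the same way. This is not the real strategy class, just a framework-free stand-in that reuses Message and AgentName from the example above and assumes the two verdict strings plus the 10-iteration cap described earlier:
MAX_ITERATIONS = 10  # the same hard cap the real run enforces


def should_terminate(history: List[Message], iteration: int) -> bool:
    # The cap always wins: a wobbling loop cannot spin forever.
    if iteration >= MAX_ITERATIONS:
        return True
    if not history:
        return False
    last = history[-1]
    # Only a Validator approval can end the run early.
    return last.role == AgentName.VALIDATOR.value and 'WORKFLOW_APPROVED' in last.content
Keeping this rule in code rather than in prose means the loop's worst case is bounded by construction, not by the model's cooperation.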
How history gets rehydrated before the run
The resumability story matters because the orchestration runs inside Prompt Flow, not inside a single in-memory toy conversation. In the real flow, state_manager.py loads the prior session history before the orchestrator node runs. That history is passed in as history_state, and the orchestrator reconstructs ChatMessageContent objects from each saved message so AgentGroupChat can continue the conversation instead of starting over.
That detail is easy to miss and expensive to ignore. Without rehydration, every timeout becomes a reset. With rehydration, the system can pick up from the last durable state and keep going. If the validator has already rejected a malformed template, that rejection stays in history. If the mapper already established the target platform and the apps involved, that context remains available. If the previous run ended halfway through generation, the next run does not need to relearn the same facts.
The lifecycle is straightforward:
- load session history from state_manager.py
- reconstitute the saved messages into the chat history
- run the orchestrator with the restored history
- persist the resulting execution state after the run
That is what makes the pipeline durable. The system is not pretending the previous attempt never happened. It is continuing the same conversation with the same constraints.
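Here is a minimal sketch of that lifecycle, assuming a JSON-file persistence shape; the file layout and function names are hypothetical stand-ins for what state_manager.py actually does:
import json
from pathlib import Path


def load_history_state(session_file: Path) -> list:
    # Load the prior session history, or start empty on the first run.
    if not session_file.exists():
        return []
    return json.loads(session_file.read_text())


def persist_history_state(session_file: Path, history: list) -> None:
    # Write the resulting execution state back out after the run.
    session_file.write_text(json.dumps(history, indent=2))

# Between those two calls, the restored list feeds rehydration and the
# orchestrator runs against the reconstructed conversation.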
The template library is not decoration
The template library in runner/template_library gives the generator a starting point that already matches the platform family. That matters because the generator should not be inventing a whole import shape from memory. It should be filling a known skeleton with detected apps, triggers, and actions.
When the generator starts from a template base, it can spend its effort on the parts that actually need reasoning: mapping the workflow into the right platform structure, inserting the right application names, and filling the right fields. The template library keeps it from drifting into malformed top-level structure.
That is also why the validator can stay narrow. If the generator is working from a known shape, the validator does not need to be a general-purpose critic. It only needs to check the contract that matters for the chosen platform: do the expected keys exist, is the structure complete, and does the result satisfy the platform rules well enough to be accepted?
The combination is what matters. The template library constrains the starting point. The generator fills it. The validator checks it. The selection strategy decides whether the system should move forward or send the work back to the generator.
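A toy illustration of that division of labor, with fill_skeleton as a hypothetical stand-in for the generator's role: the skeleton fixes the top-level shape, and the generator only writes into slots that already exist:
from typing import Any, Dict, List


def fill_skeleton(base: Dict[str, Any], triggers: List[str], actions: List[str]) -> Dict[str, Any]:
    # The top-level shape comes from the template library; the generator
    # only fills the known slots, so it cannot malform the structure.
    filled = dict(base)
    filled['triggers'] = triggers
    filled['actions'] = actions
    return filled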
The real runtime wiring in the orchestrator
The orchestrator itself is where the system becomes code-first instead of prompt-first. The kernel is initialized with Azure OpenAI configuration from the environment, the validation plugin is registered, and the template library plugin is registered before the conversation begins. Then the agents are assembled and the selection and termination strategies are attached.
That ordering is important. The model cannot speak before the kernel has its service. The generator cannot rely on the template library unless the plugin is present. The validator cannot enforce the schema unless the validation plugin is loaded. The orchestration code makes those dependencies explicit instead of leaving them implicit in a prose prompt.
The runtime wiring also keeps the Azure configuration outside the orchestration logic itself. The environment supplies the deployment name, endpoint, and API key, and the kernel uses those values when it constructs the Azure chat completion service. That keeps the orchestration focused on behavior rather than connection plumbing.
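As a sketch, with hypothetical environment variable names standing in for whatever keys the real deployment reads:
import os

# Hypothetical key names; the real flow defines its own.
azure_settings = {
    'deployment_name': os.environ.get('AZURE_OPENAI_DEPLOYMENT', ''),
    'endpoint': os.environ.get('AZURE_OPENAI_ENDPOINT', ''),
    'api_key': os.environ.get('AZURE_OPENAI_API_KEY', ''),
}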
In practice, the structure looks like this:
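Analyzer --> Mapper --> Generator --> Validator --(WORKFLOW_APPROVED)--> done
                            ^             |
                            +--(INVALID)--+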
The important arrow is the one that returns from Validator to Generator. That is the local correction path. Invalid output does not reset the analysis, erase the mapping, or jump back to the start. It goes back to the one agent responsible for producing the template.
A runnable validator that actually checks structure
The earlier draft included a validator that always returned success, which would have defeated the entire point of the system. The validator must reject malformed output or it is just another polite participant in the conversation. Here is a minimal, runnable version that checks a template shape and returns a real verdict:
import json
from typing import Any, Dict, List

SUPPORTED_PLATFORMS = {'zapier', 'make', 'n8n'}


def select_template(platform: str) -> Dict[str, Any]:
    # Pick the base skeleton for the requested platform family.
    platform_key = platform.lower()
    if platform_key not in SUPPORTED_PLATFORMS:
        raise ValueError(f'Unsupported platform: {platform}')
    base = {
        'name': f'{platform_key}_automation',
        'triggers': [],
        'actions': [],
    }
    # Round-trip through JSON so callers get an independent copy of the skeleton.
    return json.loads(json.dumps(base))


def validate_template(template_json: str, platform: str) -> Dict[str, Any]:
    errors: List[str] = []
    try:
        template = json.loads(template_json)
    except json.JSONDecodeError as exc:
        # Unparseable output fails immediately; there is nothing to inspect.
        return {'status': 'invalid', 'errors': [f'Invalid JSON: {exc}']}
    for key in ('name', 'triggers', 'actions'):
        if key not in template:
            errors.append(f'Missing key: {key}')
    if platform.lower() not in SUPPORTED_PLATFORMS:
        errors.append(f'Unsupported platform: {platform}')
    # Empty structure is rejected, not just missing structure.
    if not isinstance(template.get('triggers'), list) or not template.get('triggers'):
        errors.append('At least one trigger is required')
    if not isinstance(template.get('actions'), list) or not template.get('actions'):
        errors.append('At least one action is required')
    return {
        'status': 'valid' if not errors else 'invalid',
        'errors': errors,
    }


if __name__ == '__main__':
    bad = json.dumps({'name': 'demo', 'triggers': [], 'actions': []})
    good = json.dumps({'name': 'demo', 'triggers': ['new_event'], 'actions': ['create_record']})
    print(validate_template(bad, 'Zapier'))   # rejected: empty triggers and actions
    print(validate_template(good, 'Zapier'))  # approved
That is the behavior I want the real validator path to approximate: reject missing structure, reject empty structure, and return a verdict that the selection strategy can use to decide the next speaker. The important part is not the exact shape of this standalone example. The important part is the discipline it expresses. Validation is not a suggestion. It is a gate.
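One small adapter ties the two sketches together: the structured verdict has to surface as the exact strings the selection strategy matches on. Assuming the validate_template result shape above:
def verdict_message(result: Dict[str, Any]) -> str:
    # Surface the structured verdict as the strings the selection
    # strategy matches on: WORKFLOW_APPROVED or INVALID.
    if result['status'] == 'valid':
        return 'WORKFLOW_APPROVED'
    return 'INVALID: ' + '; '.join(result['errors'])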
Why this pattern is easier to trust
The reason this design works is that every failure has one owner. If analysis is weak, the Analyzer is the problem. If the mapping invents structure, the Mapper is the problem. If the generated template does not fit the platform, the Generator is the problem. If the template violates the contract, the Validator says so. That clarity makes debugging much easier because I am not trying to infer which layer went wrong from a cloud of blended instructions.
It also makes the retry logic sane. When the validator rejects the output, I do not want the entire conversation to restart. I want the generator to take another pass with the same context still in memory. That is what the selection strategy enforces. The orchestrator reads the last message, sees the invalid verdict, and routes execution back to the generator. No one else needs to renegotiate the earlier steps.
State rehydration matters here because the retry is not the same as a fresh run. A fresh run throws away the exact information I need most: what failed, what was already agreed, and which platform shape was already chosen. Rehydration preserves that state so the next attempt can repair the actual fault instead of repeating the whole conversation.
The result is a system that can fail locally without collapsing globally. That sounds like a small thing until you watch a long run survive a rejection, recover from persisted history, and land on a valid template without re-deriving the entire workflow from scratch.
Closing the loop
The strongest part of this orchestration is not that it uses multiple agents. It is that the agents are constrained by code, the next speaker is chosen by strategy, the validator owns the final gate, and the session history survives retries. Once those pieces were in place, the mapper stopped drifting and the generator stopped improvising against the wrong shape.
That gave me exactly what I wanted from the workflow analyzer SaaS: a chain that can analyze, map, generate, validate, and resume without pretending a failed attempt never happened. The next thing I am interested in is pushing more of the template shape into the library itself so the generator begins with an even tighter platform skeleton and the validator has less to reject in the first place.
