💻Video + Code Examples·9 mins

The Orchestrator

Nilay Parikh

This lesson presents the CleanLoop orchestrator as the real control surface. It explains the reader, repair forge, and crucible split, traces one bounded loop run, and shows why dashboard evidence matters before the system gets more autonomous.

Transcript

Instructor:Now the genome is clear. The next question is control: who reads the failure, who asks for the next change, and who decides whether that change should stay? That is the job of the orchestrator. You can think of it as the brain of the loop, but it only works because the judge and the genome were bounded correctly first. Think of this course as one continuous example. Each lesson adds new modules and components, but the earlier lessons are still running underneath. Lesson 01 defined the loop boundary

Instructor:and placed AutoGen at the orchestration seam. Lesson 02 defined the genome as the one place where changes can happen. This lesson builds on those parts. We are not starting again. We are extending the same system. Now look at the diagram. This is the control layer of the system. It reads the failure, asks for a fix, runs the test, and then either keeps the change or rejects it. The first job is diagnosis. The loop reads the current

Instructor:genome and the latest error so the next step is grounded in feedback instead of guesses. Then the proposal step comes in. It takes that context and suggests a small change, not a full rewrite, just a focused repair. Nothing is accepted because it looks clever. The system runs a deterministic evaluation and keeps track of what actually passes. Source control becomes part of the decision tree. Anything that fails is rejected and reverted.

Instructor:So the orchestrator is not a black box. It is a layered structure. It combines deterministic control with one AI proposal step. That is what makes the system reliable and repeatable. Now let us go through the code and follow the orchestration flow. Focus on where the control path is fixed and where the AI suggestion changes. That separation is the critical seam. If that boundary is not clear in the code, it will create problems in real use.
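The control path described in this part of the transcript (diagnose, propose, evaluate, then keep or revert) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CleanLoop source: `run_bounded_loop`, `toy_judge`, and `toy_propose` are hypothetical names, the genome is reduced to a plain string, and the proposal step is a stub standing in for whatever the AutoGen runtime would return.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class LoopResult:
    genome: str
    score: int
    history: List[Tuple[int, str, int]]  # (iteration, decision, candidate score)


def run_bounded_loop(
    genome: str,
    judge: Callable[[str], Tuple[int, str]],
    propose: Callable[[str, str], str],
    max_iterations: int = 1,
) -> LoopResult:
    """Fixed control shell around a single proposal step (hypothetical sketch)."""
    best_score, feedback = judge(genome)  # capture the baseline first
    history: List[Tuple[int, str, int]] = []
    for iteration in range(1, max_iterations + 1):
        # The only non-deterministic seam: ask for one small change.
        candidate = propose(genome, feedback)
        # Deterministic evaluation decides survival, not the proposer.
        score, candidate_feedback = judge(candidate)
        if score > best_score:
            genome, best_score, feedback = candidate, score, candidate_feedback
            history.append((iteration, "keep", score))
        else:
            # Rejected: the previous genome stays in place (the "revert").
            history.append((iteration, "revert", score))
    return LoopResult(genome, best_score, history)


def toy_judge(genome: str) -> Tuple[int, str]:
    """Toy fixed judge: score is the number of required steps present."""
    required = ["strip", "lower"]
    missing = [step for step in required if step not in genome]
    return len(required) - len(missing), f"missing steps: {missing}"


def toy_propose(genome: str, feedback: str) -> str:
    """Toy stand-in for the AI proposal step: always appends one step."""
    return genome + "+lower"


result = run_bounded_loop("strip", toy_judge, toy_propose, max_iterations=2)
print(result.history)  # [(1, 'keep', 2), (2, 'revert', 2)]
```

In this toy run the first proposal raises the score and is kept, while the second adds nothing and is reverted. The proposer never gets to decide its own survival, which is the separation the lesson calls the critical seam.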

Instructor:We are now in the hands-on lab for Lesson 03, the orchestrator lesson. In the first lesson, we covered the full design of this example. In the lesson documents, you can find the markdown file and the diagram we already saw. Now we move to the execution loop where this lesson lives: one loop around the system. Bear in mind that this example is sequential, but the broader idea is not limited to one orchestration style. In real projects, the orchestration could be parallel, sequential, or a mix of both. Any agentic

Instructor:architecture is acceptable as long as the control responsibilities stay clear. Here, the loop restores the genome, runs the starter, repairs the iteration, captures the baseline, and then continues until it either improves or exits. This is the seam between deterministic and non-deterministic behavior. The loop passes grounded feedback into the proposal

Instructor:step, and then the LLM suggests the next mutation. That candidate runs through another round of evaluation. The system checks whether the code improved and whether the result still holds under the fixed judge. Based on that, it decides whether to continue or stop. I have also left code anchors in the lesson notes. I strongly recommend reviewing those anchors because they show exactly how the orchestration path works. The commands are simple. Start with status. Then verify. Then reset. After reset,

Instructor:run evaluate so you can confirm the baseline. In this case, we expect 78 rows, 13 matches, and 48 missing rows against the gold data. Once that baseline is clear, run the loop with a maximum of one iteration and watch what the system does. While that run is happening, open the dashboard.
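To make the baseline numbers concrete, here is a hedged sketch of how a fixed judge might count rows, matches, and missing rows against gold data. The exact definitions used by the CleanLoop evaluator are not shown in this lesson, so the function name `evaluate_against_gold`, the `invoice_id` key, and the matching rule (exact row equality) are all assumptions for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def evaluate_against_gold(
    output_rows: List[Row], gold_rows: List[Row], key: str = "invoice_id"
) -> Dict[str, int]:
    """Hypothetical deterministic judge: same inputs always give the same counts."""
    gold_by_key = {row[key]: row for row in gold_rows}
    output_by_key = {row[key]: row for row in output_rows}
    # A match is an output row identical to the gold row with the same key.
    matches = sum(
        1 for k, row in output_by_key.items() if gold_by_key.get(k) == row
    )
    # A missing row is a gold key that never appears in the output.
    missing = sum(1 for k in gold_by_key if k not in output_by_key)
    return {"rows": len(output_rows), "matches": matches, "missing": missing}


# Toy data only; these are not the lesson's invoice files.
gold = [
    {"invoice_id": "A1", "amount": 100.0},
    {"invoice_id": "A2", "amount": 250.0},
    {"invoice_id": "A3", "amount": 75.5},
]
output = [
    {"invoice_id": "A1", "amount": 100.0},
    {"invoice_id": "A2", "amount": 205.0},  # wrong amount, so not a match
]

print(evaluate_against_gold(output, gold))
# {'rows': 2, 'matches': 1, 'missing': 1}
```

Because the judge is a pure function of the output and the gold data, a later score can be compared directly with the baseline captured before the loop ran.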

Instructor:The dashboard helps you inspect the current artifacts, the score line, and the evidence surface. You can also see the current genome view and the trace decisions that explain

Instructor:what the loop is doing on individual rows. For example, you can inspect one row and see whether it passed through the deterministic path or whether it fell through to the mutation playbook. That distinction matters. It shows where the fixed logic still works and where the loop had

Instructor:to rely on learned mutation behavior. This is also why the dashboard matters so much in Software 3.0 systems. What matters is your control over the boundary between deterministic and non-deterministic components. The dashboard is not decoration. It is how you inspect whether

Instructor:the system is behaving the way you intended. In real implementations, this becomes even more important. You may have multi-pass mutation workflows or distributed agent systems that pass work across several agents before you get a final outcome. When that happens, traces become

Instructor:essential because they explain how each stage behaved and where a decision came from. In this lesson we are only talking about finance invoice data, but the same orchestration pattern can apply to real-time customer queries, logged client calls, or any other process you want to make agentic. The more autonomy you add, the more important the trace surface becomes.
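The row-level trace decisions mentioned above could be represented as small structured records. The sketch below assumes a hypothetical `TraceDecision` shape with a `path` field that is either `"deterministic"` or `"mutation_playbook"`; the real dashboard may store its traces differently.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class TraceDecision:
    row_id: str
    path: str  # "deterministic" or "mutation_playbook" (assumed labels)
    rule: str  # which rule or playbook entry handled the row
    passed: bool


def summarize_paths(traces: List[TraceDecision]) -> Dict[str, int]:
    """Count how many rows went through each path."""
    return dict(Counter(trace.path for trace in traces))


def to_jsonl(traces: List[TraceDecision]) -> str:
    """Serialize traces as one JSON object per line for later review."""
    return "\n".join(json.dumps(asdict(t), sort_keys=True) for t in traces)


# Toy traces; row ids and rule names are invented for the example.
traces = [
    TraceDecision("row-001", "deterministic", "parse_iso_date", True),
    TraceDecision("row-002", "mutation_playbook", "normalize_currency", True),
    TraceDecision("row-003", "mutation_playbook", "normalize_currency", False),
]

print(summarize_paths(traces))
# {'deterministic': 1, 'mutation_playbook': 2}
```

A summary like this shows at a glance how much of the work the fixed logic still handles and how much falls through to mutation behavior, and the per-row records explain where each decision came from.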

Instructor:Observability is also what lets you increase the pressure safely. Right now the loop operates in a

Instructor:constrained space. If you improve observability, you can gradually widen that space and raise the pressure so the loop can handle more complex discrepancies without losing control. That is where backtesting and trace review start to matter. When you make deterministic changes, you need to know whether the result is a real improvement or a regression. The only way to know that is to inspect what went well,

Instructor:what went wrong, and which use cases improved or degraded. We will cover observability directly in the next lesson, but the reason should already be clear. Focus on the dashboard, not just the loop execution. Understand how the

Instructor:decisions show up in the evidence layer. There are four exercises in this lesson, and I strongly recommend doing them. They help you understand score deltas, stored failures, and how to detect when the loop is getting stuck. They also force you to think about how to use observability as a practical control mechanism instead of a vague reporting layer.
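One way to approach the score-delta and stuck-loop exercises is to compute deltas over the score history and flag a stall when the last few iterations bring no gain. This is a sketch under assumptions: `score_deltas`, `is_stuck`, and the `patience` parameter are hypothetical names, and the scores are toy numbers rather than output from the lesson's loop.

```python
from typing import List


def score_deltas(scores: List[int]) -> List[int]:
    """Change in score between consecutive iterations."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def is_stuck(scores: List[int], patience: int = 3) -> bool:
    """True when the last `patience` iterations produced no improvement."""
    deltas = score_deltas(scores)
    if len(deltas) < patience:
        return False  # not enough history to judge yet
    return all(delta <= 0 for delta in deltas[-patience:])


history = [13, 20, 20, 20, 19]  # toy score line, baseline first
print(score_deltas(history))    # [7, 0, 0, -1]
print(is_stuck(history))        # True
```

A check like this turns observability into a control mechanism: the loop can stop, or widen its search, based on recorded evidence rather than on a fixed iteration count alone.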

Instructor:Now you can see how the loop works. The orchestrator reads failures, proposes changes, and lets the system decide what deserves to stay. Next, we add tracking and observability in a more explicit way. That will help you see whether the system is improving, stalling, or sending the wrong signals. Thank you for staying with this lesson. If you found it useful, please like, subscribe, and enable notifications so the next lesson reaches you directly. I will see you in the next one.

Trace the Orchestrator Loop

Open the control path in code

Read the main loop beside the AutoGen runtime so you can see where failure context becomes a bounded mutation request.

bash

cd _examples/self-improving-agent/cleanloop
code loop.py autogen_runtime.py

Run one bounded iteration

Reset the genome, evaluate the baseline, and run one loop round so the control path is visible in the terminal output.

bash

python util.py reset
python util.py evaluate
python util.py loop --max-iterations 1

Inspect dashboard evidence

Open the dashboard and review artifacts, score movement, and trace decisions so you can see why observability is the next lesson.

bash

python util.py dashboard

Complete source code for this lesson.

github.com/nilayparikh/tuts-agentic-ai-examples/tree/main/self-improving-agent/cleanloop

Q & A

Why call the orchestrator the real control surface?

Because it owns the sequence that turns failure into the next verified attempt. Without that control shell, you only have raw suggestions and no reliable survival rule.

Where should AutoGen sit in this lesson?

Inside the bounded proposal step. It can suggest the next mutation, but the fixed evaluation path still decides whether the candidate survives.

Why end an orchestration lesson on dashboards and traces?

Because orchestration without observability does not scale. If you cannot inspect the evidence trail, you cannot safely widen the search space or trust the next stage of autonomy.