Close the CleanLoop course with production safety. This lesson shows how sandboxing, tripwires, reset controls, and graduated autonomy turn a self-improving loop into something you can audit, contain, and actually trust.
Conclusion & Production Safety · 8 mins
Transcript
Instructor:Think of a system like a self-driving car. It's impressive that it can move on its own, but nobody trusts it unless there are brakes, sensors, and override controls. That's exactly what production safety is. A self-improving loop is only valuable and viable if it can operate safely under real constraints. That means containment, anomaly detection, hard stop signals, and controlled autonomy. This lesson is where everything becomes real. You're not just building a loop anymore. We're
Instructor:building something that you can actually deploy. I want you to zoom out for a second: the entire course is one real-world example. Every lesson added a new mechanism, but none of those earlier pieces have disappeared. They're still running. Let's recap what we have built: a bounded genome, a deterministic orchestrator, a feedback signal that you can actually read, challenge pressure to avoid lazy convergence, and search and reranking for better candidates. Now the question is, can
Instructor:they survive production? We are not adding something optional. We are wrapping everything that we have built inside a defensible safety model. So keep the system in your head. Mutation, evaluation, feedback, selection. And now we add safety around every step. And that's the final layer. I think of this as a layered defense system. At the core, you've got a sandbox, then anomaly detection around it, and then permission control. Here's the mindset shift. You assume
Instructor:things will go wrong. Not maybe, not rarely, always. So instead of hoping your candidates behave, you contain them, like running untrusted code in a sandbox. Let me share a very good example that actually happened to us. We had a debugging agent, and that debugging agent had root access because it needed to run LLDB, a C++ debugger. Now somehow the LLM figured out that this agent had administrative privileges. The LLM was not able to find a way to run certain
Instructor:git commands. However, it sensed that it could, and it jailbroke by identifying that the debugging agent could fire commands as root, and it attempted an rm command. We spotted it very quickly because there was a hook guarding those particular commands, which we had anticipated and did not want to run. Still, that gave us plenty of reasons to worry, because we knew that this command could actually cause a
Instructor:problem, and we put in a hook. But there might be a command that we didn't know about, and it could cause real damage. We have seen many examples online where jailbreaking really happened. So sandboxing is a must. It is not optional. Some failures are fine. Repeated failures are not. So you install tripwires: the same failure patterns repeating, metrics degrading consistently, unexpected drift. When that happens, you stop
Instructor:the loop. Not retry, not let it run, just stop. This is where a lot of teams make mistakes. They jump straight to fully autonomous. And that's the classic mistake. Instead, they should move through levels such as manual, assisted, supervised, autonomous. Promotion only happens when evidence proves it is safe. Let's zoom out one last time. Contain execution, monitor behavior, stop damage early, increase autonomy slowly, and that's how you make it real. And before we jump
Instructor:into the lab: the full implementation is in the GitHub repo, linked below in the description. Make sure you have it open before we go through the walkthrough and the controls together. All right, let's close this properly. I'm sure you've got the GitHub repo checked out and everything is ready. What do we do? Inspect the sandbox boundaries. Figure out the failure scenarios and observe. Three questions we should ask: Does it get contained? Does the system detect it? Does it stop or retry? What should you focus on?
Instructor:Where execution is isolated, where the state reset happens, and how the autonomy label is exposed. A production system is not pretending to be fully autonomous. It is showing you control. That's what production systems do. And one last request: if you want to keep building systems like this, hit the notification bell and subscribe to our channel. So we are on the hands-on lab, and the best place to start is 07 production safety. You have a diagram here, and you have three flows which
Instructor:help you understand how the code actually works. Safety starts with containment; there's nothing more we need to add there. Human oversight uses a dashboard; we already spoke about it. Trust should rise and fall with evidence. Always make sure your decisions are evidence-based and not assumption-based. Getting that wrong is one of the classic mistakes most teams make. Reset is a control, not a convenience. The tripwire or the kill switch is very important.
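The tripwire idea can be sketched in a few lines of Python. This is a minimal illustration and not the repo's implementation: the class name `Tripwire`, its parameters, and the thresholds are all assumptions made for the example. It stops on the two signals named in the lesson, the same failure repeating and scores degrading consistently.

```python
from collections import deque


class Tripwire:
    """Hypothetical tripwire: stop on repeated failures or steady decline."""

    def __init__(self, max_repeats=3, decline_window=3):
        self.max_repeats = max_repeats        # assumed threshold
        self.decline_window = decline_window  # assumed threshold
        self.recent_failures = deque(maxlen=max_repeats)
        self.recent_scores = deque(maxlen=decline_window + 1)

    def record(self, score, failure=None):
        """Record one round; return a stop reason, or None to keep going."""
        self.recent_scores.append(score)
        if failure is None:
            # A clean round breaks the streak of identical failures.
            self.recent_failures.clear()
        else:
            self.recent_failures.append(failure)

        # Signal 1: the same failure pattern max_repeats times in a row.
        if (len(self.recent_failures) == self.max_repeats
                and len(set(self.recent_failures)) == 1):
            return f"repeated failure: {failure}"

        # Signal 2: every score in the window is lower than the one before.
        scores = list(self.recent_scores)
        if (len(scores) == self.decline_window + 1
                and all(b < a for a, b in zip(scores, scores[1:]))):
            return "scores degrading consistently"
        return None


if __name__ == "__main__":
    tripwire = Tripwire()
    rounds = [(0.9, None), (0.8, "timeout"), (0.8, "timeout"), (0.8, "timeout")]
    for score, failure in rounds:
        reason = tripwire.record(score, failure)
        if reason:
            print("STOP:", reason)  # stop the loop; do not retry
            break
```

The caller decides what "stop" means. In keeping with the lesson, the sketch returns a reason and leaves no retry path.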
Instructor:You may have seen the advertisements from many large AI companies for a kill-switch engineer. What is that kill-switch engineer? It is exactly this. A kill switch is the most important control in terms of cybersecurity, so never, ever let it lapse. Safe loops need explicit operator modes. Make sure you have clear modes, and that proper containment boundaries are defined between those modes so you can do safe promotions. Now, in this lesson, instead of doing everything in real
Instructor:terms, we are going to simulate a lot, because if we add real sandboxing, the system will become very complex. So we will make sandboxing a simulation, but one that helps us understand what sandboxing should mean. The other aspect that we will check, autonomy and promotions, we will also simulate. I'm more interested in explaining the process and in sharing the patterns
Instructor:and practices, because all of those elements are courses on their own. For example, sandboxing. Sandboxing is a course on its own, around five or six lessons. It covers how we can use Docker, how we can build sandboxing boundaries, least-privilege principles, the cloud, and other access points that we have to control. That's quite a lengthy conversation to have. Autonomy is also quite a lengthy conversation to have. So what I'm going to do in this particular
Instructor:course is simulate them. Simulation will help you understand the concept and what you should expect in real-world cases, and there will be some nice exercises at the end. Those will help you explore more ideas and dig for better or more advanced solutions than what we have implemented. There are a couple of pins I have added in here. I
Instructor:would strongly recommend you visit those. Those pins will help you understand what the sandbox looks like. In terms of the sandbox, we are doing nothing but firing the exit button and checking the sandboxing as a wrapper. Then we have autonomy. The autonomy is nothing but a simple state function that can add trust, and it can let the loop go on and on as an infinite loop. However, the infinite loop can also have constraints, such as
Instructor:how many tokens it can consume or, in real terms, how much money it can spend, for example, five or six dollars, if you have cost calculations as well. So there are a lot of constraints you can apply there, and here you can see how to run them and how to observe and validate them. There are a lot of exercises as well. So what I'm going to do is just clear this up, and we're going to run the sandbox. We already
Instructor:ran the reranking before, so we're just going to run the sandbox. By the way, this is simply doing the simulation based on a previous run, so it's not actually going anywhere. But you can see here, it executed the same command, the same process, using the simulator itself. Now if we go back and fire autonomy for up to five rounds, you can see how it simulated. What it did is take the five previous runs and decide, based on that,
Instructor:what level it should use. Currently we are at the supervised level, meaning that it should not go to the autonomous level. The reason is that the readings are coming in very low, and they're very inconsistent. So that's the idea. When you make a self-improving agent, it can also make this recommendation about what level it is at the moment, what maturity level it is at, and whether it can actually go with the autonomy as well. Then from history, if you
Instructor:see here, the label is supervised. But in history, we got a couple of very good runs where the rolling score is around 93, and that will allow it to move on to further levels. But here is my point. Even though these are simulated, they show you how to implement and design those principles in your project, and how to build systems that are more comfortable but still well constrained. Bear in mind, autonomy or autonomous systems are not the real solution. The
Instructor:real solution is to define the correct constraint boundaries and containment for those systems. Without them, they'll just become random. They will not deliver the objectives, and they will not achieve the core goal that we want from them. So always make sure the system is well constrained. Now I think that's more than enough, in my opinion. These are very deep concepts, and we could keep going and going. Every concept that I discuss has five or six alternative patterns to implement.
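One of the constraint boundaries mentioned in the walkthrough is a cap on tokens or on money spent, such as five dollars. The sketch below shows one way such a cap could bound an otherwise open-ended loop. Everything in it is an assumption for illustration: the names `Budget`, `BudgetExceeded`, and `run_rounds` are hypothetical, and rounds are modeled as plain `(tokens, cost)` pairs rather than real model calls.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a round would push usage past a hard limit."""


@dataclass
class Budget:
    # All limits are hypothetical values supplied by the operator.
    max_tokens: int
    max_cost_usd: float
    tokens_used: int = 0
    cost_usd: float = 0.0

    def charge(self, tokens, cost_usd):
        """Add one round's usage, refusing it if a limit would be exceeded."""
        if self.tokens_used + tokens > self.max_tokens:
            raise BudgetExceeded("token limit reached")
        if self.cost_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("cost limit reached")
        self.tokens_used += tokens
        self.cost_usd += cost_usd


def run_rounds(budget, rounds):
    """Run (tokens, cost) rounds until they finish or the budget stops them."""
    completed = 0
    for tokens, cost in rounds:
        try:
            budget.charge(tokens, cost)
        except BudgetExceeded:
            break  # hard stop, no retry
        completed += 1
    return completed
```

With `Budget(max_tokens=10000, max_cost_usd=5.0)` and rounds of 4,000 tokens each, the third round would cross the token limit, so only two rounds complete.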
Instructor:I have tried my best to design these exercises so that you understand these concepts as clearly as possible. However, they do not give you the full practical experience, such as using a virtual machine from a cloud provider, a separate, more secure machine, or Docker containers to run the agent. The reason is that I don't want to diverge from understanding and design principles into operational and implementation details.
Instructor:However, as I said, there will be a course coming next month, and that's all about design, security, sandboxing, and these constraint mechanisms. I'm sure that will serve the purpose far better than adding it here. So let's wait for that, and make sure you subscribe and turn on notifications. I'm sure that next month's course will fill the remaining gaps in the production lifecycle, but it will be more general, so it can apply to any agentic architecture. So obviously this is sufficiently good
Instructor:enough because this is a self-improving agent. Anyway, let's go back to our presentation. Now you have the full picture: bounded mutation, AutoGen orchestration, observability, challenge pressure, reranking, and production safety. That's a complete mutation cycle. But here is the truth. It only matters if you apply it. Start small. Pick one surface in your system and build a trustworthy loop around it. That's how the real system began. And if it helps you think differently, subscribe,
Instructor:stick around, and let's keep building. I'll see you in another course. Take care.
Inspect Containment, Trust, And Recovery
Generate one fresh safety baseline
Reset the genome, run one bounded round, and keep the saved history so the later safety commands have real evidence to inspect.
bash
cd _examples/self-improving-agent/cleanloop
python util.py reset
python util.py loop --max-iterations 1
Validate containment and evidence
Run the sandbox path and the read-only observer so you can see what gets contained and which artifacts operators use to review the run.
Check the simulated trust ladder, compare it with history-based trust, then reset to the starter baseline without losing the evidence you just generated.
Because the genome is rewritten code. A self-improving loop needs containment around crashes, hangs, and unsafe side effects before it deserves more autonomy.
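To make "containment around crashes and hangs" concrete, here is a minimal sketch that runs candidate code in a separate Python process with a timeout. It is not the lab's sandbox: the function name `run_contained` and the result dictionary are assumptions for illustration. It also does nothing about unsafe side effects such as file or network access, which need OS-level isolation like the containers or virtual machines mentioned in the lesson.

```python
import subprocess
import sys


def run_contained(code, timeout_s=5.0):
    """Run candidate code in a child interpreter and report how it ended.

    Hypothetical helper: contains crashes and hangs only, not side effects.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return {"status": "hang", "returncode": None, "stderr": ""}
    status = "ok" if proc.returncode == 0 else "crash"
    return {"status": status, "returncode": proc.returncode, "stderr": proc.stderr}


if __name__ == "__main__":
    print(run_contained("print('hello')")["status"])
    print(run_contained("raise ValueError('bad genome')")["status"])
    print(run_contained("while True: pass", timeout_s=1.0)["status"])
```

Because the candidate runs in its own process, an exception or an infinite loop in the rewritten genome ends that process, not the loop that launched it.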
Q
Why keep reset separate from deleting output artifacts?
Because recovery should restore the starter genome without erasing the evidence that explains what just happened. Operators still need the logs, traces, and judged history after a bad run.
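The separation can be sketched as two distinct operations. This is an assumption-laden illustration, not how `util.py reset` is implemented: the file names `starter_genome.py` and `genome.py`, the `output` directory, and both function names are hypothetical.

```python
import shutil
from pathlib import Path


def reset_genome(workdir):
    """Restore the starter genome; leave the evidence directory untouched."""
    workdir = Path(workdir)
    starter = workdir / "starter_genome.py"  # hypothetical file name
    genome = workdir / "genome.py"           # hypothetical file name
    shutil.copyfile(starter, genome)
    return genome


def clear_evidence(workdir):
    """Separate, explicit operation: delete logs, traces, and judged history."""
    shutil.rmtree(Path(workdir) / "output", ignore_errors=True)  # hypothetical dir
```

Keeping `clear_evidence` as its own call means an operator can recover from a bad run with `reset_genome` and still read everything that explains what went wrong.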
Q
Why can the loop stay in review mode even after some good history?
Because the trust ladder is meant to be conservative. A few good runs are useful, but recent instability, drift, or weak rolling scores should still block automatic promotion.
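A conservative ladder of this kind can be sketched as a small function over recent scores. The level names come from the lesson (manual, assisted, supervised, autonomous), but everything else here is an assumption for illustration: the function name `next_level`, the window size, and the promotion, demotion, and stability thresholds are hypothetical and not taken from the repo.

```python
LEVELS = ["manual", "assisted", "supervised", "autonomous"]


def next_level(current, scores, window=5, promote_at=90.0,
               demote_below=70.0, max_spread=10.0):
    """Return the autonomy level after looking at the most recent scores.

    All thresholds are hypothetical defaults chosen for this sketch.
    """
    idx = LEVELS.index(current)
    recent = scores[-window:]
    if len(recent) < window:
        return current  # not enough evidence to change anything
    rolling = sum(recent) / len(recent)
    spread = max(recent) - min(recent)
    if rolling < demote_below:
        return LEVELS[max(idx - 1, 0)]  # trust falls with weak evidence
    if rolling >= promote_at and spread <= max_spread:
        return LEVELS[min(idx + 1, len(LEVELS) - 1)]  # one step at a time
    return current  # good but unstable history stays where it is


if __name__ == "__main__":
    print(next_level("supervised", [93, 94, 92, 95, 93]))
    print(next_level("supervised", [99, 75, 99, 99, 99]))
```

In the second call the rolling average is high, but one weak run makes the window too inconsistent, so the level is not promoted. That is the behavior the answer above describes.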