💻Video + Code Examples·8 mins

The Mutation Engine

Nilay Parikh

Frame the mutation engine before the deeper build lessons. This lesson explains why broken pipelines still bottleneck on humans, defines the bounded mutation contract, tours the CleanLoop repo surface, and places AutoGen at the orchestration seam instead of the judge.

Transcript

Instructor: We spent the last decade building pipelines that move data, but in the era of Software 3.0, moving data is not enough. If your pipeline breaks today, a human has to fix it. That is a bottleneck, and it cannot scale. The next iteration of data engineering isn't about writing code better. It's about building an agent architecture that will self-evolve. Hi, I'm Nilay Parikh. This is a seven-part roadmap to building Data Engineer 3.0. Let's define the failure points before we start with the hands-on lab.

Instructor: Operational input is frequently degraded by inconsistencies such as drifting formats and duplicates, and that breaks systems. Poor data quality causes losses of more than five million in one in four organisations, and up to five hundred million for many projects. So what could the solution be? One is certainly Software 3.0. But let's understand what Software 3.0 is. It's quite a vague term today. However, let me explain it with a metaphor in our own context.

Instructor: Think of it like a cavity wall: two rigid layers of brick with a hollow middle space that has some wiggle room. That's where we handle this mess. The AI lives in that middle gap, absorbing the vibration and variation of messy data, expanding or contracting to fix the discrepancies, while the outer bricks remain immovable, providing much-needed reinforcement and structural integrity. This is the move from static code to programmable operators, where we use agents to find likely repairs in seconds.

Instructor: Now let's see the same thing we discussed in a live example. So let's go and visit the code. In this hands-on, we will cover three important things: first, how to configure and run; second, an end-to-end run; and third, overall familiarity with this example. In this particular lesson, we will not dive into each individual component. For that, we have another six lessons coming after this. This first lesson is a stepping stone to understanding the project end to end,

Instructor: and that will help whenever we get into the details. We will have the broader context: why we are doing it and what the benefits of doing it this way are. The very first thing to start with is our README file. The README gives you the quick commands and the documentation map. It's well documented, and it has all the lessons clearly defined, with exercises that you can benefit from. The README will also take you to other important documentation, such as architecture detail and operations and

Instructor: tracing. All the commands are documented under the runbook. Reset and recovery will help you bring the project back to its original state. That means you can go through this project a couple of times, and if you think you have moved a bit too far and need to come back, you can always reset the project. Also, setup and verify will ensure that you have a clean setup.

Instructor: Most people will just need the single command to set up the project, unless you want to install the project in a specific configuration. All you need to do is run pip install -e . and that will install all the dependencies and everything it needs. The second thing is you can just run the status command. The status command will read all the configuration and tell you what it finds. Then you can validate whether everything you want matches the status or not.

Instructor: The .env.example file is a template that you can copy to a .env file, and that is your own configuration. It will be picked up by the status command itself. And then finally verify: verify will ensure that your project is in the correct state to run all these examples. So that's it, that's a perfect state. Once we get there, we can run the loop command. The loop command is the one that runs everything end to end. And I'm running it with a maximum number of iterations. That means it will stop after max iterations.

Instructor: But let's understand what this is. We have input files that contain the discrepancies and data inconsistencies. These input files are read. The gold standard is something we use only for comparison, nothing more. From here we read the inputs and process them. Some of them will be processed deterministically. Some of them will be processed non-deterministically. Everything that can be processed deterministically is put into master.
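The read-and-process step described here can be sketched as a small deterministic pass. This is a minimal sketch under stated assumptions, not the CleanLoop implementation: the column names (`id`, `amount`), the `deterministic_pass` helper, and the validation rules are hypothetical, chosen only to show rows that satisfy fixed rules going to a master list while the rest are held back for the non-deterministic pass.

```python
import csv
import io

# Hypothetical input: two clean rows, one malformed amount, one duplicate id.
RAW = """id,amount
1,10.50
2,abc
3,7.25
1,10.50
"""


def deterministic_pass(text):
    """Split rows into (master, failures) using fixed rules only."""
    master, failures, seen = [], [], set()
    for row in csv.DictReader(io.StringIO(text)):
        try:
            record = {"id": int(row["id"]), "amount": float(row["amount"])}
        except (TypeError, ValueError) as exc:
            # The row does not fit the fixed rules; keep it for later repair.
            failures.append({"row": row, "reason": str(exc)})
            continue
        if record["id"] in seen:
            failures.append({"row": row, "reason": "duplicate id"})
            continue
        seen.add(record["id"])
        master.append(record)
    return master, failures


master, failures = deterministic_pass(RAW)
print(len(master), "rows in master;", len(failures), "rows held as failures")
```

Keeping the reason next to each failed row is a design choice for this sketch: it gives a later repair step something concrete to work from.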

Instructor: What it can't process deterministically, it records as failures and then attempts to turn them into successes. But there is a better way to see this: the dashboard, which runs as a Streamlit app. Once this process is complete, we will walk through it there. Perfect. The dashboard is there. So now just refresh it so we get the latest. Go to the Round Blueprint. You can see all the information you need here. Now, this recalls what we discussed about Software 3.0.

Instructor: It shows you the execution and the judge metrics. That becomes more advanced when we discuss the judge, which will help us understand what the judge metrics mean. But look at this mutable genome diff. This is where we started, with a deterministic component: clean_data_starter.py and clean_data.py. These two files are the genome files. The starter exists only so it can be copied over in case we modify the genome manually. That's the only purpose of the starter. However, once the genome has been copied,

Instructor: then you can see that the model has decided to keep a lot from the deterministic output. It maintained the deterministic output, but beyond the deterministic part, it started adding non-deterministic mutations. All these mutations come from our LLM. To guide how it defines these mutations for the dataset, we provide hints. In the LLM calls, it has skills as well, and it has context. So based on the context, based on a skill, and based on a hint of the schema structure,

Instructor: the LLM can decide how to deal with every individual data inconsistency. In real life, in the project we built in our organisation, we have a ninety-nine percent recall rate, which means that out of a hundred rows that fail deterministically due to inconsistencies, we are able to recover almost ninety-nine. And that's the power of this particular implementation. So let's go back to the CleanLoop example. Here we process everything deterministically, and once it has been deterministically processed,

Instructor: you can go into Data Quality and understand what exists. It will tell you all the data that exists there. There is also a log table that explains exactly what we got, along with the statistics. And in Diagnostics, we get a clear picture. Take the Proposal Events. When a piece of data was picked up, did we revert? Did we generate a candidate? How did we work with the LLMs? Going through this gives you a clear understanding of everything, before and after, to

Instructor: understand where you need to tune, where you need to make changes, and where you need to make improvements. This is a much more detailed way to assess a run than going through system logs manually. Right. So that's quite a detailed view. Here are the Row Decisions. This is also very important. It picks up each row and tells you, for each function, whether the decision was deterministic or based on a mutation playbook.

Instructor: If you find a mutation playbook, that means the LLM has decided for us, instead of just the algorithm that we provided. And if it is deterministic, then it's the algorithm that we provided. So all the data that was clean was processed by algorithms, but the data that was incomplete was processed by LLMs. Now, what kind of problems can we process with mutations? For example, it gets a CSV and a row is broken. It gets a CSV and the rows and columns have been shuffled.

Instructor: It gets a CSV and some data is missing, and the LLM understands where the data came from. And so on. If we can think of a failure, we can implement a fix for it. That's the power of this particular implementation. Let me give you a very powerful example that we achieved in our own organisation. We were sitting on more than fifty data feeds, which were extremely inconsistent. Almost fifty percent of the data could not be processed, and why?

Instructor: Because it comes from various job markets. We collect the job market data, and then we build the macro outlook for the Indian economy. To build that macro outlook, we need to collect more than two gigabytes of data every month from official sources. Manually fixing that data used to cost around two hundred K to three hundred K a year. It was not economically viable, so we decided to get a paid feed from somewhere else.

Instructor: Since we implemented this process six months ago, we have recovered up to ninety-nine percent, and we have removed the dependence on manual processing completely. We were also able to offboard from third-party tools and third-party feeds. Now let me show you how accurate it is. This whole dashboard is generated by these Software 3.0 data jobs, with data that comes from the system feed, from the government feed,

Instructor: from informal sources, and from many other places, which are a nightmare to put into a structured database. However, this project was so successful because the non-deterministic recovery rate was able to go up to ninety-nine percent. As a result, we even achieved accuracy beyond the professional data feeds that we could have obtained for such a large sum of money. That's the power of Data Engineer 3.0, and that's how you can build an AI data engineer.

Instructor: The premise is that we look for the failures and turn those failures into successes. Now let's go back to the presentation and complete the lesson. The course follows a boundary-first, autonomy-last structure, where each part introduces a minimum mechanism that relies on the foundation of the previous one. Lessons one and two are the bounded surface. Lessons three and four are where we control the flow, especially the loop, the structure moving into automation,

Instructor: where we will be using AutoGen as our orchestration framework. Lessons five and six are where we add pressure and search. The complexity is higher there, and it increases as we raise the difficulty and compare multiple decision candidates and choices. And lesson seven is where we wrap up the whole thing and test whether it's resilient enough for a production environment. Let's take in the whole course in one image. This is the contract for the whole course. Keep that shape in mind for every lesson that follows.

Instructor: The model is usually no longer the slow part. The reality is that the slow part is the human: reading the failure, fixing it, deciding the next move, and carrying the loop memory forward by hand. Now focus on the center. That's the editable surface. The judge stays fixed, and the feedback cycle stays cheap to repeat. The ladder is on the right, which matters because autonomy is earned, not assumed. You prove the narrow loop works first, then add observability, and only then widen the pressure and the search depth.

Instructor: You can also do this as a progressive widening, especially in the loop process. You can have, say, ten steps, where it starts narrow and lets only the higher-confidence fixes through. Once those higher-confidence fixes are through and the loop reaches the end of that step, the funnel widens further and applies enough pressure to carry through the right set of data with the resilience that we want. So it allows us to decide what the minimum is that we are ready to accept.
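One way to read the progressive widening described here is as an acceptance threshold that starts high and is lowered step by step until it reaches a floor. The sketch below assumes each candidate fix carries a confidence score between 0 and 1. The names `threshold_schedule` and `widen`, the linear schedule, and the numbers 0.95 and 0.60 are all hypothetical and are not taken from the course repo.

```python
def threshold_schedule(steps, start=0.95, floor=0.60):
    """Linearly lower the acceptance threshold from start down to floor."""
    if steps < 2:
        return [start]
    delta = (start - floor) / (steps - 1)
    return [round(start - i * delta, 4) for i in range(steps)]


def widen(candidates, steps=10):
    """Accept fixes in rounds: highest confidence first, then progressively lower."""
    accepted, pending = [], list(candidates)
    for step, threshold in enumerate(threshold_schedule(steps), start=1):
        passed = [c for c in pending if c["confidence"] >= threshold]
        pending = [c for c in pending if c["confidence"] < threshold]
        # Record the step at which each fix was allowed through.
        accepted.extend({**c, "step": step} for c in passed)
    # Anything still pending is below the floor, the minimum we accept.
    return accepted, pending


candidates = [
    {"fix": "trim whitespace", "confidence": 0.99},
    {"fix": "reorder columns", "confidence": 0.80},
    {"fix": "guess missing currency", "confidence": 0.40},
]
accepted, rejected = widen(candidates)
for item in accepted:
    print(item["step"], item["fix"])
```

The floor plays the role of the minimum we are ready to accept: a fix below it is never let through, however many steps run.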

Instructor: Now, let's understand. This engine is a simple process: the bounded surface, the fixed judge, and the repeatable loop. Now let's go to another important diagram. It's the full mutation process. We inherit the word and the terminology from genetics, from biology. It's a similar process, and the term is very widely used in this context in artificial intelligence. We borrow a lot of biological terminology in artificial intelligence anyway.

Instructor: This diagram shows the actual mutation process. The agent can mutate the genome but cannot rewrite the judge that decides whether the change survives or not. The middle divider matters more than the labels. It's the safety glass, the boundary. On the right, the genome is highly editable. This is where the agent spends most of its search budget, because it's the only editable part. At the bottom, failure of the output contract becomes the repair signal,

Instructor: and the repair signal leaves the arena and heads back to the orchestrator. That's why we say, at every level, release some more pressure and allow the fixes to go through. However, the fixes that are not successful still go back as a repair signal. And we keep mutating the genome for as long as it takes. So if I had to describe this mutation process in one sentence, I would say it is a fixed judge, an isolated mutation surface, and feedback routed back as a repair signal.
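The one-sentence summary above (fixed judge, isolated mutation surface, feedback routed back as a repair signal) can be illustrated with a toy loop. This is a sketch under stated assumptions, not the CleanLoop code: here the genome is a small dict of rule flags rather than a Python file, `propose` is a deterministic stand-in for the LLM proposer, and every name (`judge`, `run_genome`, `propose`, `mutation_loop`) is hypothetical.

```python
from types import MappingProxyType

# Fixed judge data: read-only, so the loop can consult it but never edit it.
EXPECTED = MappingProxyType({"date": "2024-01-31", "amount": 12.5})


def judge(output):
    """Return the fields that violate the output contract (the repair signal)."""
    return [key for key, value in EXPECTED.items() if output.get(key) != value]


def run_genome(genome, raw):
    """Apply the editable cleaning rules (the genome) to one raw record."""
    out = dict(raw)
    if genome.get("swap_date_parts"):
        day, month, year = raw["date"].split("/")
        out["date"] = f"{year}-{month}-{day}"
    if genome.get("strip_currency"):
        out["amount"] = float(str(raw["amount"]).lstrip("$"))
    return out


def propose(genome, repair_signal):
    """Stand-in for the LLM proposer: enable one rule for the first failing field."""
    fixes = {"date": "swap_date_parts", "amount": "strip_currency"}
    candidate = dict(genome)
    for field in repair_signal[:1]:  # one bounded change per iteration
        candidate[fixes[field]] = True
    return candidate


def mutation_loop(raw, max_iterations=3):
    genome, history = {}, []
    for iteration in range(max_iterations):
        signal = judge(run_genome(genome, raw))
        history.append((iteration, list(signal)))
        if not signal:
            break
        candidate = propose(genome, signal)
        # Keep the candidate only if the judge reports fewer failures; else revert.
        if len(judge(run_genome(candidate, raw))) < len(signal):
            genome = candidate
    return genome, history


genome, history = mutation_loop({"date": "31/01/2024", "amount": "$12.5"})
print(genome)
print(history)
```

The proposer only ever returns a new genome. Whether that genome survives is decided by comparing judge results, and `max_iterations` keeps the loop bounded.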

Instructor: I'll link this particular example back from the repair signal. We've reached the end of the lesson. You now have the contract: the loop stays bounded, and the judge is fixed. AutoGen sits above the mutation surface instead of swallowing it. Next, we will zoom into the genome itself, which will be the next part of the slides. Thanks for watching. Make sure you like and subscribe, so whenever the next part arrives, it comes straight to your timeline. Thank you. I'll see you in lesson two.

Get Started

Read the repo map first

Start in the CleanLoop README so you see the command surface, the docs map, and the root learning files before you inspect code.

bash

cd _examples/self-improving-agent/cleanloop
code README.md

Validate setup before mutation

Run the status and verify commands so you can inspect the finance inputs, provider config, and preflight gate before the loop starts.

bash

python util.py status
python util.py verify

Run one bounded loop and inspect evidence

Execute a short loop run, then open the dashboard and inspect the genome surface, mutation decisions, and recall evidence.

bash

python util.py loop --max-iterations 2
python util.py dashboard
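When you inspect the recall evidence, it can help to know what the number means. The transcript describes it as the share of rows that failed the deterministic pass and were later recovered. The sketch below computes that share from a list of row decisions. The field names (`source`, `accepted`) and the sample rows are hypothetical, and the real dashboard schema may differ.

```python
from collections import Counter

# Hypothetical row decisions; the real dashboard schema may differ.
row_decisions = [
    {"row": 1, "source": "deterministic", "accepted": True},
    {"row": 2, "source": "mutation_playbook", "accepted": True},
    {"row": 3, "source": "mutation_playbook", "accepted": False},
    {"row": 4, "source": "deterministic", "accepted": True},
    {"row": 5, "source": "mutation_playbook", "accepted": True},
]


def recovery_rate(decisions):
    """Share of rows that needed a mutation playbook and were recovered."""
    failed = [d for d in decisions if d["source"] == "mutation_playbook"]
    if not failed:
        return None  # nothing failed deterministically, so no rate to report
    return sum(d["accepted"] for d in failed) / len(failed)


by_source = Counter(d["source"] for d in row_decisions)
rate = recovery_rate(row_decisions)
print(dict(by_source), f"recovery rate: {rate:.0%}")
```

In this sample, two of the three rows that needed a mutation playbook were recovered, so the rate is about 67 percent.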

Complete source code for this lesson.

github.com/nilayparikh/tuts-agentic-ai-examples/tree/main/self-improving-agent/cleanloop

Q&A

Why spend the first lesson on framing and repo orientation instead of code mutation?

Because a self-improving loop only makes sense when the learner can name the mutable surface, the fixed judge, and the artifact trail before the first mutation ever runs.

Where does AutoGen belong in this course architecture?

At the proposal and orchestration seam. It can help coordinate retries and candidate generation, but it does not get to redefine correctness or grade its own work.
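The separation in this answer can be sketched in plain Python. No AutoGen API is used or implied here: `propose` marks the place where an orchestrated agent could sit, and `Candidate`, `orchestrate`, `overconfident_proposer`, and `fixed_judge` are hypothetical names for illustration. The point of the sketch is that the proposer's own score is carried along but never decides acceptance.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    patch: str
    self_score: float  # the proposer's own opinion; never used for acceptance


def orchestrate(
    propose: Callable[[str, int], Candidate],
    judge: Callable[[str], bool],
    failure: str,
    max_retries: int = 3,
) -> Optional[Candidate]:
    """Coordinate retries; acceptance is decided by the judge alone."""
    for attempt in range(max_retries):
        candidate = propose(failure, attempt)
        if judge(candidate.patch):
            return candidate
    return None


def overconfident_proposer(failure: str, attempt: int) -> Candidate:
    """Stand-in proposer that always rates its own work as perfect."""
    patches = ["drop the row", "parse date as DD/MM/YYYY"]
    return Candidate(patch=patches[min(attempt, 1)], self_score=1.0)


def fixed_judge(patch: str) -> bool:
    """Stand-in judge with a rule the proposer cannot change."""
    return "parse date" in patch


result = orchestrate(overconfident_proposer, fixed_judge, "bad date in row 7")
print(result)
```

The first proposal rates itself 1.0 and is still rejected, because only `fixed_judge` is consulted. The second proposal passes on the judge's terms.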

Why show the dashboard in Lesson 01?

Because the course does not treat mutation as hidden magic. The dashboard makes the genome surface, the mutation decisions, and the evidence trail visible from the start.