Short Course · 7 lessons · 58 mins

Build an AI Data Engineer: Self-Improving Pipelines with AutoGen Framework

Build one bounded mutation loop over messy finance data, keep the judge fixed, widen search only when it earns its cost, and close with sandboxing, reset, and graduated autonomy.

Data Engineering · AutoGen · Software 3.0 · Self-Improving Pipelines · CleanLoop · Streamlit

Beginner · Moderate · Expert

What You'll Learn

🧠

Frame the real bottleneck

See why human repair loops, not model quality, still block modern data pipelines.

🧬

Understand the bounded mutation contract

Learn the core shape: one editable genome, one fixed judge, and one visible artifact trail.

🛠️

Understand the control loop

See how the orchestrator turns one failure into the next bounded repair attempt without letting the model own correctness.

About This Course

This site publishes the complete seven-lesson Self-Evolving Data Engineer course. The series frames the business problem, defines the mutation contract, locks the exact pipeline genome, and shows how the orchestrator controls one bounded repair loop. It then makes that loop observable, raises pressure with a fixed judge and smarter challengers, adds best-of-N reranking before commit, and closes with production safety.

The scope stays narrow on purpose: one mutable surface, one fixed judge, one repeatable control path, one readable feedback surface, and one safety ladder for containment, reset, and trust. That keeps the public course auditable from Lesson 01 through Lesson 07 instead of widening into a vague platform story.
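The shape described above (one editable genome, one fixed judge, one artifact trail) can be sketched in a few lines of Python. This is a minimal illustration, not the CleanLoop code: `fixed_judge`, `toy_proposer`, `run_bounded_loop`, and the pipe-separated genome string are all hypothetical names chosen for the example, and the proposer stands in for whatever model call the orchestrator would make.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Genome = str  # hypothetical: the single editable surface, here a pipe-separated step list
REQUIRED_STEPS = ("strip", "parse_date", "to_decimal")  # assumed for this sketch


@dataclass
class Attempt:
    """One entry in the visible artifact trail."""
    iteration: int
    genome: Genome
    score: float
    accepted: bool


def fixed_judge(genome: Genome) -> float:
    """Hypothetical fixed judge: fraction of required cleanup steps present.

    The loop never edits this function; only the genome changes.
    """
    steps = genome.split("|")
    return sum(step in steps for step in REQUIRED_STEPS) / len(REQUIRED_STEPS)


def toy_proposer(genome: Genome, score: float) -> Genome:
    """Stand-in for a model call: append the first missing step."""
    steps = genome.split("|")
    for step in REQUIRED_STEPS:
        if step not in steps:
            return genome + "|" + step
    return genome


def run_bounded_loop(
    genome: Genome,
    propose: Callable[[Genome, float], Genome],
    judge: Callable[[Genome], float],
    max_iters: int = 5,
    target: float = 1.0,
) -> Tuple[Genome, float, List[Attempt]]:
    """Mutate one genome for at most max_iters rounds, keeping only improvements."""
    trail: List[Attempt] = []
    best, best_score = genome, judge(genome)
    for iteration in range(1, max_iters + 1):
        if best_score >= target:
            break  # the judge is satisfied, so stop spending budget
        candidate = propose(best, best_score)
        score = judge(candidate)
        accepted = score > best_score  # the judge, not the proposer, decides
        trail.append(Attempt(iteration, candidate, score, accepted))
        if accepted:
            best, best_score = candidate, score
    return best, best_score, trail


if __name__ == "__main__":
    final, final_score, history = run_bounded_loop("strip", toy_proposer, fixed_judge)
    for attempt in history:
        print(attempt)
    print(final, final_score)
```

The design choice worth noticing is that the proposer only ever returns a candidate; acceptance is decided by comparing judge scores, and every attempt lands in the trail whether or not it was kept.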

1. What is live right now?

All seven lessons are live, each with a published YouTube video, the CleanLoop code surface, and synced transcript, Q&A, and step-guide content covering the full public course.

2. What comes next?

The course is complete. The next step is to apply the same bounded pattern to one real surface in your own system: define the genome, keep the judge fixed, add observability, then earn safety and autonomy one control at a time.

Who Should Join?

Prerequisites

What you need before starting

Python basics · Data pipelines · CSV cleanup

You do not need prior AutoGen experience for Lessons 01 through 03. It helps if you already understand basic Python workflows, messy CSV data, and why deterministic rules fail on real-world pipeline inputs.

🏗️

Data platform engineers

You want a safer pattern for introducing AI into brittle data-cleaning and normalization pipelines.

🤖

Agent builders

You want to see how AutoGen fits into a bounded engineering loop without letting the model redefine correctness.

Course Outline

Each lesson builds on the previous one — follow them in order for the best experience.

1. The Mutation Engine

💻 Video + Code Examples · 8 mins

Frame the mutation engine before the deeper build lessons. This lesson explains why broken pipelines still bottleneck on humans, defines the bounded mutation contract, tours the CleanLoop repo surface, and places AutoGen at the orchestration seam instead of the judge.

2. Defining the Pipeline Genome

💻 Video + Code Examples · 9 mins

Define the one mutable pipeline genome for CleanLoop. This lesson reconnects to the Lesson 01 contract, shows why one file and one fixed judge keep mutation auditable, and walks the runtime surface where deterministic cleanup hands off to the mutation playbook.

3. The Orchestrator

💻 Video + Code Examples · 9 mins

Show the CleanLoop orchestrator as the real control surface. This lesson explains the reader, repair forge, and crucible split, traces one bounded loop run, and shows why dashboard evidence matters before the system gets more autonomous.

4. Observability & The Feedback Signal

💻 Video + Code Examples · 8 mins

Make the CleanLoop repair loop observable. This lesson shows where run history, strategy snapshots, row-level traces, and dashboard metrics come from so you can tell whether a mutation actually taught the system anything.

5. The Judge & Self-Challenging Loops

💻 Video + Code Examples · 8 mins

Keep the judge fixed while the data gets harder. This lesson shows how CleanLoop generates adversarial finance CSVs, applies targeted pressure, and forces the loop to improve without redefining correctness.

6. Test-Time Reranking

💻 Video + Code Examples · 8 mins

Add best-of-N candidate search to CleanLoop. This lesson shows why reranking improves output quality only when each candidate stays bounded, isolated, and scored by the same fixed judge.

7. Conclusion & Production Safety

💻 Video + Code Examples · 8 mins

Close the CleanLoop course with production safety. This lesson shows how sandboxing, tripwires, reset controls, and graduated autonomy turn a self-improving loop into something you can audit, contain, and actually trust.
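To make the outline's best-of-N reranking idea concrete, here is a small sketch of scoring several candidate cleaners with one shared judge and committing the winner only if it beats the current baseline. It is an illustration under stated assumptions, not the course's implementation: the sample rows, the `judge` function, the candidate cleaners, and `rerank_best_of_n` are all hypothetical, and real isolation (a sandboxed process per candidate) is not shown.

```python
from decimal import Decimal, InvalidOperation
from typing import Callable, List, Optional, Tuple

Cleaner = Callable[[str], Optional[Decimal]]

# Hypothetical messy finance amounts and the values a correct cleaner should produce.
RAW = [" 1,200.50 ", "$300", "(45.00)", "n/a"]
EXPECTED = [Decimal("1200.50"), Decimal("300"), Decimal("-45.00"), None]


def to_decimal(text: str) -> Optional[Decimal]:
    """Parse text as a Decimal, returning None when it is not a number."""
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


def judge(cleaner: Cleaner) -> float:
    """Fixed judge: fraction of rows cleaned to the expected value."""
    hits = 0
    for raw, want in zip(RAW, EXPECTED):
        try:
            got = cleaner(raw)
        except Exception:
            continue  # a crashing candidate simply earns no credit for this row
        if got == want:
            hits += 1
    return hits / len(RAW)


def baseline(raw: str) -> Optional[Decimal]:
    return to_decimal(raw.strip())


def candidate_a(raw: str) -> Optional[Decimal]:
    """Removes thousands separators and currency symbols."""
    return to_decimal(raw.strip().replace(",", "").replace("$", ""))


def candidate_b(raw: str) -> Optional[Decimal]:
    """Also treats accounting-style parentheses as a negative sign."""
    text = raw.strip().replace(",", "").replace("$", "")
    if text.startswith("(") and text.endswith(")"):
        text = "-" + text[1:-1]
    return to_decimal(text)


def rerank_best_of_n(
    current: Cleaner, candidates: List[Cleaner], score: Callable[[Cleaner], float]
) -> Tuple[Cleaner, float]:
    """Score every candidate with the same judge; commit only a strict improvement."""
    current_score = score(current)
    if not candidates:
        return current, current_score
    scored = [(score(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    if best_score > current_score:
        return best, best_score
    return current, current_score


if __name__ == "__main__":
    winner, winner_score = rerank_best_of_n(baseline, [candidate_a, candidate_b], judge)
    print(winner.__name__, winner_score)
```

In this toy run the second candidate wins with a score of 1.0. Widening N only adds value because every candidate faces the same judge; if the judge changed per candidate, the scores would not be comparable.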

Instructor

Nilay Parikh

Founder · LocalM · ErgoSum

Technologist with 20+ years of engineering experience and an ML/AI practitioner since 2010. Founder of ErgoSum (quantitative & equity research) and LocalM (AI-assisted SDLC). Currently focused on AI Platform Engineering, Agentic AI, and Context Engineering.