💻 Video + Code Examples · 10 mins

Multi-Agent System Deep Dive — Loan Approval

Nilay Parikh

Capstone: build a production-grade multi-agent loan approval system with AI-driven decisioning (80%), human-in-the-loop escalation (20%), a React dashboard for approvals, and OpenTelemetry observability.

Transcript

Instructor: Welcome back to LocalM Tuts. I am Nilay Parikh. This is Lesson 14 of 16, the multi-agent deep dive, our capstone lesson. In the last lesson we completed the framework tour with the Claude Agent SDK implementation. Now we bring everything together. If you are watching this as a standalone video, you will find the course link in the description below, along with links to the example code and the interactive page. This is the largest example in the course: five agents and an orchestrator, with

Instructor: a React dashboard. Clone the repo, open the lesson folder, and follow along. There are five specialized agents in a pipeline. Intake validates the application fields. Risk Scorer calculates a composite risk score using deterministic rules and LLM reasoning. Compliance runs the regulatory checks. Decision makes the call: 80% automated, 20% escalated to a human. The Escalation agent handles the human review queue, and all agents share the model provider abstraction. The orchestrator discovers the five agents via their Agent Cards, routes the

Instructor: application through the pipeline, and handles the 80/20 split. The React UI shows the real-time approval status, and the telemetry dashboard tracks every span. This uses every pattern from the course: Agent Card discovery, Agent Executor, the to_a2a shortcut, bridge packages, LLM-based routing, chain execution, error recovery, and MCP and A2A together. Let us see the full pipeline in action. We will start all five agents on their assigned ports, launch the orchestrator, and watch it discover each agent, route the application through the

Instructor: pipeline, and produce the entire 80/20 decision split in real time. This is the practical session, so pause the video if you need to catch up. This is the Lesson 14 practical implementation, one of the most comprehensive examples of all. You can see here the compliance agent and compliance server, the decision agent and decision server, the escalation agent and escalation server, the intake agent and intake server, the model provider, the orchestrator and orchestrator server, and the Risk Scorer agent and Risk Scorer server. This is the script

Instructor: where you can run all of them in one go, this is the script that submits a test batch, and this is one of the scripts for telemetry, to capture the OTLP data. I have managed to get all the servers up and running, so they are loading now. Everything is logged here whenever we start them, so we can always go back and check what is happening. The agents run on ports 10101 to 10105, the orchestrator is on 10100, and the REST API is on 8080. I also got

Instructor: the dashboard up and running on port 3000, so let me bring that dashboard up here. You can see the dashboard is empty right now, so let us see what happens when we submit the batch. I am submitting the batch to the pipeline now, and let us watch how the agents work. By the way, if you do not have access to Visual Studio Code, go to our interactive page, which explains this very well. As you can see, in real time we are getting this

Instructor: dashboard up and running, so you can see the telemetry and what is happening. There is the decline process. Oh, there we go, we got one escalation: do you want to approve or reject? Well done, a couple of them are being processed. There is also very detailed documentation. I would suggest you give it a long read, and it will help you understand what the system does. It is modeled on a real-life agent architecture, but I would still say a true production, real-life agent architecture would be even more complex

Instructor: than this, but it should give you a rough idea of what we are looking for when we call something production-grade, and it is a very good example to walk through. It has a lot of Q&A, and a lot of the important code is highlighted here. It was still pending, and there we go, we got everything sorted. So as you can see here, if I go to the page: 25% rejection, 4 applications successful, 2 applications failed. You can see here what is happening with the trace

Instructor: waterfall. In the escalation queue, there are 2 applications being escalated. I find this useful, and by the way, all of this data is generated by semantic models. It is not generated by any structured or deterministic architecture. That is the beauty of it: it translates into the architecture and into the UI, and the LLM actually generates everything, not just what appears in the UI. So let me say I am happy with this call. And I just get it approved, it

Instructor: has been set, and it will move forward. Then I go to the next one, open the details, and this time I am going to request more info. If I go to the dashboard, you can see the info has been requested, and then you can review and approve. So this is human-in-the-loop. You can also design the human-in-the-loop so that once the human adds something, a further AI process can run, and the result is put back into the human loop until the reviewer is comfortable. So it is a very interesting pattern when it comes to human-in-the-loop and how

Instructor: to integrate that with this architecture. So let us see how the logs work. If you look at the compliance log, it has logged everything that is needed, including the trace IDs. Based on this, we build responsible AI and responsible agentic AI. Why? Because we can go back and validate what decision the AI made and how it was made. Whenever we use AI, it is essential to have very good traceability, so we can understand: if the AI made a mistake, then

Instructor: why? And if the AI was consistently successful, then why? That way we can preserve the consistency and start removing the areas where the AI is struggling. So yes, it is a very detailed view of what is happening, and you can see that all the servers logged their output. That is all for now, and I hope you like this example. If I were studying A2A, I would give this particular practical session the highest amount of effort, and the highest amount of

Instructor: focus and time, and go through this code in detail. This will help you build many production-grade design patterns. Because I have put a lot of design patterns into it, it is practically impossible for me to go through each and every one, but it is one of the models that you can take forward and build something on top of. So if you have any questions or any requests for me, just drop a comment or reach out to me, and I will try my best to come back to you.

Instructor: This is a production-grade multi-agent system: 6 frameworks, 4 models, 2 protocols, one standard. We proved that A2A makes framework and model choice a local decision, not a system-wide constraint. Thanks for watching this lesson on LocalM Tuts. In the next lesson we will cover the advanced production patterns: protocol extensions, security handling with OAuth and mTLS, OpenTelemetry observability, and enterprise compliance. We are not going to deep-dive into any of these topics, but we will

Instructor: touch base on them and understand what is expected in each of these areas. Whenever we write a new course in the future with advanced use cases in mind, all of this will be discussed in detail. You can find the next video link in the A2A Protocol course playlist in the description, and I will see you there.

Learning Objectives

Architect a multi-agent loan approval pipeline with specialized agents
Implement human-in-the-loop escalation for edge-case applications
Build a React frontend for reviewing and approving escalated loans
Instrument the entire pipeline with OpenTelemetry distributed tracing
Create a real-time React dashboard showing agent telemetry and decision flow
Run the complete system end-to-end with 80% automated decisions / 20% human review

Run the Capstone Loan Approval System

Clone and install Python dependencies

cd _examples/a2a/lessons/14-multi-agent-deep-dive
cd agents
pip install -r requirements.txt

Install React UI dependencies

cd ../ui
npm install

Configure environment variables

# In the agents/ directory: copy .env.example to .env, then edit it
cp .env.example .env
# Set GITHUB_TOKEN, AZURE_OPENAI_ENDPOINT, AZURE_AI_API_KEY,
# AZURE_AI_MODEL_DEPLOYMENT_NAME=Kimi-K2-Thinking,
# OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Start OpenTelemetry collector (optional)

docker run -d --name jaeger \
  -p 16686:16686 -p 4317:4317 -p 4318:4318 \
  jaegertracing/jaeger:latest

Skip if you only want console trace output.

Start all agents

# From the lesson folder (14-multi-agent-deep-dive)
cd agents/src
python start_all.py
# Ports: 10100 (orchestrator), 10101-10105 (pipeline agents)

Start the React frontend

# From the lesson folder (14-multi-agent-deep-dive)
cd ui
npm run dev
# Approval Queue: http://localhost:3000/approvals
# Telemetry Dashboard: http://localhost:3000/dashboard

Submit test applications

# From the lesson folder (14-multi-agent-deep-dive)
cd agents/src
python submit_test_batch.py
# Expected: 3 auto-approved, 2 auto-declined, 3 escalated

Review escalated applications

Open http://localhost:3000/approvals to see the 3 pending applications. Review the agent reasoning, risk factors, and compliance notes. Click Approve, Decline, or Request More Info.
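
If the batch submission fails, it can help to first confirm that every server is listening. The following is a minimal sketch, not part of the repo: it assumes the port layout mentioned in this lesson (10100 for the orchestrator, 10101-10105 for the pipeline agents, 8080 for the REST API, 3000 for the React UI) and that everything binds to localhost. The function names are hypothetical.

```python
import socket


def expected_ports() -> dict[int, str]:
    """Port layout as described in this lesson; the labels are illustrative."""
    ports = {10100: "orchestrator", 8080: "REST API", 3000: "React UI"}
    for port in range(10101, 10106):
        ports[port] = "pipeline agent"
    return ports


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host: str = "localhost") -> dict[int, bool]:
    """Check every expected port and return a port -> is-listening map."""
    return {port: port_is_open(host, port) for port in sorted(expected_ports())}


if __name__ == "__main__":
    labels = expected_ports()
    for port, up in report().items():
        print(f"{port:>5}  {labels[port]:<15} {'up' if up else 'DOWN'}")
```

A port that shows DOWN usually means the corresponding process has not finished starting or exited early; the server logs are the next place to look.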

Complete source code for this lesson.

github.com/nilayparikh/tuts-agentic-ai-examples/tree/main/a2a/lessons/14-multi-agent-deep-dive

Q & A

How does the system decide which loans need human review?

The RiskScorerAgent scores each application on a 0–100 risk scale. Applications scoring ≤ 40 are auto-approved, those scoring ≥ 80 are auto-declined, and scores strictly between 40 and 80 (≈20% of cases) are routed to the human-in-the-loop queue via the EscalationAgent.
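
The routing rule can be sketched as a small function. This is an illustration of the thresholds stated in the answer, not the repo's code: the names `Route` and `route_by_risk` are hypothetical, and scores of exactly 40 and 80 are assumed to fall in the automatic bands.

```python
from enum import Enum


class Route(Enum):
    AUTO_APPROVE = "auto_approve"
    AUTO_DECLINE = "auto_decline"
    ESCALATE = "escalate"


def route_by_risk(
    score: float,
    approve_at_or_below: float = 40,
    decline_at_or_above: float = 80,
) -> Route:
    """Map a 0-100 risk score to a route using the two thresholds."""
    if not 0 <= score <= 100:
        raise ValueError(f"risk score out of range: {score}")
    if score <= approve_at_or_below:
        return Route.AUTO_APPROVE
    if score >= decline_at_or_above:
        return Route.AUTO_DECLINE
    # Everything strictly between the thresholds goes to a human reviewer.
    return Route.ESCALATE


if __name__ == "__main__":
    for s in (12, 40, 55, 80, 97):
        print(s, route_by_risk(s).value)
```

Keeping the rule in one pure function makes the boundary behavior easy to unit-test, which matters when the thresholds are tuned later.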

What telemetry is captured?

OpenTelemetry traces span the full pipeline, from intake through risk scoring, compliance checks, and the final decision. Each agent creates child spans with attributes such as agent_name, decision, and confidence_score. Traces are exported to a Jaeger-compatible OTLP endpoint and visualized in the React dashboard.

How does the React frontend handle approvals?

The React app polls a REST API for pending escalations, displays the full application context plus agent reasoning, and lets reviewers approve/decline/request-more-info. The decision is pushed back through the A2A pipeline to complete the task.
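
The polling loop itself is simple enough to sketch outside React. The example below is written in Python for consistency with the agents and is an assumption-laden illustration: the lesson only states that the REST API runs on port 8080, so the URL path and the `application_id` field are hypothetical, and `poll_pending` is not a function from the repo.

```python
import json
import time
from typing import Callable
from urllib.request import urlopen

# Hypothetical endpoint: only the port (8080) comes from the lesson.
PENDING_URL = "http://localhost:8080/escalations?status=pending"


def fetch_pending(url: str = PENDING_URL) -> list[dict]:
    """Fetch the current list of pending escalations as parsed JSON."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)


def poll_pending(
    fetch: Callable[[], list[dict]],
    on_new: Callable[[dict], None],
    interval_s: float = 2.0,
    max_polls: int = 10,
) -> int:
    """Call fetch() repeatedly and hand each not-yet-seen item to on_new().

    Returns the number of distinct applications seen.
    """
    seen: set[str] = set()
    for i in range(max_polls):
        for item in fetch():
            app_id = item["application_id"]  # assumed field name
            if app_id not in seen:
                seen.add(app_id)
                on_new(item)
        if i < max_polls - 1:
            time.sleep(interval_s)
    return len(seen)


if __name__ == "__main__":
    poll_pending(fetch_pending, lambda item: print("new escalation:", item))
```

Passing `fetch` in as a parameter keeps the loop testable without a running server; a UI would do the same with a timer and a fetch call.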

What happens if the human reviewer requests more information?

The EscalationAgent marks the application as 'INFO_REQUESTED' and the system can notify the applicant. Once additional information is provided, the application re-enters the pipeline from the IntakeAgent stage with the supplementary data.
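
One way to keep these review outcomes consistent is an explicit state machine. The sketch below is hypothetical: only the `INFO_REQUESTED` status is named in this lesson, while the other state names, the `RESUBMITTED` state (standing in for re-entry at the IntakeAgent stage), and the `transition` helper are assumptions made for illustration.

```python
from enum import Enum


class ReviewState(Enum):
    PENDING = "PENDING"
    INFO_REQUESTED = "INFO_REQUESTED"  # named in the lesson
    RESUBMITTED = "RESUBMITTED"        # assumed: application goes back to intake
    APPROVED = "APPROVED"
    DECLINED = "DECLINED"


# Which moves a reviewer (or the applicant) is allowed to make from each state.
ALLOWED_TRANSITIONS = {
    ReviewState.PENDING: {
        ReviewState.APPROVED,
        ReviewState.DECLINED,
        ReviewState.INFO_REQUESTED,
    },
    ReviewState.INFO_REQUESTED: {ReviewState.RESUBMITTED},
    ReviewState.RESUBMITTED: set(),  # a fresh pipeline run takes over from here
    ReviewState.APPROVED: set(),
    ReviewState.DECLINED: set(),
}


def transition(current: ReviewState, target: ReviewState) -> ReviewState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    state = transition(ReviewState.PENDING, ReviewState.INFO_REQUESTED)
    state = transition(state, ReviewState.RESUBMITTED)
    print(state.value)
```

Rejecting invalid moves up front prevents, for example, an already approved application from being declined by a stale browser tab.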

Can the 80/20 thresholds be configured?

Yes. The DecisionAgent reads thresholds from environment variables (AUTO_APPROVE_THRESHOLD and AUTO_DECLINE_THRESHOLD). You can tune these per institution or regulatory requirement.
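
Reading and validating the two variables might look like the sketch below. The variable names come from the answer above; the defaults of 40 and 80 mirror the thresholds quoted in this Q&A and are assumed rather than confirmed as the repo's defaults, and `load_thresholds` is a hypothetical helper.

```python
import os
from typing import Mapping


def load_thresholds(env: Mapping[str, str] = os.environ) -> tuple[float, float]:
    """Read the approve/decline thresholds, falling back to assumed defaults."""
    approve = float(env.get("AUTO_APPROVE_THRESHOLD", "40"))
    decline = float(env.get("AUTO_DECLINE_THRESHOLD", "80"))
    # Fail fast on a configuration that would leave no escalation band
    # or fall outside the 0-100 risk scale.
    if not 0 <= approve < decline <= 100:
        raise ValueError(
            f"invalid thresholds: approve={approve}, decline={decline}"
        )
    return approve, decline


if __name__ == "__main__":
    print(load_thresholds())
```

Validating at startup means a typo in `.env` surfaces immediately instead of silently auto-approving or auto-declining everything.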

How does OpenTelemetry work across A2A agents?

The orchestrator injects W3C trace context headers into A2A task requests. Each agent extracts the context, creates child spans, and propagates it forward. This produces a unified distributed trace across all agents regardless of framework.