Build a standalone QA agent powered by GitHub Phi-4 using an async class pattern with domain knowledge injection.
Building Your First A2A Agent · 17 minsTranscript26 entries
Instructor:Welcome back to LocalM Tuts. I am Nilay Parikh. This is lesson 5 of 16 Building your first A2A agent. Last time we set up a development environment, Python the A2A SDK, GitHub Models, and the course repository. If you are watching this as a standalone video, please find the course playlist link in the description below. Before we dive in, here is what you will find. All the code for the lesson, including the GitHub repository and the interactive tutorial link is available. In the description below. You can find
Instructor:them in the course submodule, clone the repo, open the lesson folder, follow the README, and you should be good to go. We will build a QA Agent class. An asynchronous Python class wrapper that wraps the OpenAI-compatible API. It loads insurance policy document as context, constructs the system prompt, and answers the question using the Phi- 4 model via the GitHub Models API. Key concepts. First OpenAI-compatible API pattern with the same SDK, different base URLs, second system prompt with knowledge injection
Instructor:and the third async-first design that we will wrap as an A2A server in the next session. Let's see this in action. Open your terminal, navigate to the lesson folder and run the QA agent standalone. We will ask it the question about the insurance policy and let's see the response. Follow along. Pause the video if you need to catch up. If you haven't got VS Code ready, you can also use our website to have an interactive Jupyter session. Other than running, you can probably access the example, the
Instructor:outputs, everything along with many other important content. The link for this website of the course is in the description, please feel free to find. There is also a link for that example on GitHub that is the same link available and you can find the example that we are talking about. It's very well documented and you can access it straightforward. Now. I have loaded that example into Visual Studio Code and let us see what we have done here. As described earlier we are creating a very
Instructor:simple configuration, which model it needs to use in a second block and then in step 2 we are configuring the client based on what selection we have made. The quick model test whether it works or not, so let's run up to this point. Ah. Execute all above steps and let's see. Voila. Yes, so we got all of the above steps running now. Just a quick test. Bear in mind the policy text will run this thing to configure. It's been back correctly. 2 + 2 equal to 4.
Instructor:run. Now we are building the QA agent. Let's load the domain knowledge. Now. Bear in mind the policy dot text will be replaced by the content of insurance policy. So this is just a simple insurance policy we have created and it will be replaced here as a system prompt for knowledge injection. So let's quickly run this thing to configure it's been RAM. Now we are building the QA agent. This is this is the most important aspect where we are coming. It's a simple simple Python class you can actually create in
Instructor:any supported language the same way. It will configure model endpoint, API keys, everything the knowledge the system prompts. And most importantly, it is exposing one method called query. So whenever it's been whenever our client was sending a query, it will run this particular piece of code and find an answer. So lets lets see lets run this class. From the test the Q agent so I am just trying to find. Yeah, all good. So very fast question. What is the deductible for the standard plan question printing?
Instructor:So far it's just a simple chat completion with some sort of retrieval augmentation. There we go. The answer is correct. As expected, the standard plan deductible, emergency deductible and prescription drug deductible. We sometime call it excess as well. So this question too, are the cosmetic procedure covered? So we got the answer. Cosmetic procedures unless medically
Instructor:necessary. So it's an expected line and we try our third out of scope question, which is not in the document: what is the capital of France, but you want to make sure that it just does not get the answer from model. But instead it return back saying that I do not have that information in the document. This is a very important thing and I think while this is running many of the agents fall back on the universal knowledge or the pre-trained knowledge of a model and ideally when they are building
Instructor:a model for retrieval, augmentation or specific use cases, we should avoid falling back or we should build resiliency enough that information is retrieved based on information that we have provided or retrieved. The answer appears and if the question is not within that scope, then it should come back with a clear denial. Instead of augmenting it based on a training material. So here, what is the capital of France, I am sorry, that information is not in the policy document. So this is very good so far.
Instructor:Step 7 Building the claim agent multi turn. The A2A protocol also supports the multi turn interaction where the agent can request additional inputs from the client mid-task such as input required state and this is what we are going to demonstrate as a simple stand- alone agent. So let us run it. OK, the class has been initialized, now let's test the multi-turn. So we got input required. If you recall the earlier sessions we have described the terminal States and also the. Umm, the follow-up states
Instructor:where input required does go back to processing state, one of the processing states, and then if the If the information is provided correctly then it can find back itself into termination state. So here we are going to provide that information. And by the way, if you have seen it, we will maintain the session ID and that's how the correlation is actually handled. So session ID was created here and session ID was passed here. That's how the agent will ensure the memory is maintained. Memories are
Instructor:very important aspect in building agents. We are currently focusing on A2A protocol. But building agents itself, it's an art and we will come back with some nice tutorials on what are the areas we need to look, look at, look for. When we are building agents as well and what are the best practices?So when it has completed, it has built the claim receipt and a the whole information has returned with the perfect status completed. So it has reached the status into terminal phase. The multi turn
Instructor:documentation, what we have done, you can hear see it a sequence diagram. It's the same thing what we did. And what we demonstrated now we are building a policy summary agent. So let's run it. We got a policy summary agent and now we are just trying to summarise a policy. So its the same knowledge base that we have provided, but based on that you will just simply use for summarization of the policy. So if you see in this summarization, what we have done is providing a knowledge
Instructor:path which is here. Provided the model information and everything and then we instructed the model that provides the summary in certain structure. This is very important when we are building agents as well because agents has data parts and data parts sometime may not be strongly structured. So how to?How to encourage each joins and models which are which are probabilistic to produce some deterministic structure and more and more we master the the art of prompting and correct weightage we can
Instructor:actually get. Near to deterministic structural outcome. So we have we got it now you can see it has delivered in a very good way covered services and exclusion. It's a wonderful way of summarization. These agents are very useful for voice, especially if they are connected with the voice agents as well. Because when automated marketing calls or when user query calls comes, this provide a wonderful way of augmenting large amount of data into summarised documents and then we can actually deal further with that
Instructor:particular kind of kind of request, especially its coming from voice agents. Multi skilled routing. Ethan read it, its its again a sort of a another another type of agent that we have written and it demonstrate various capabilities around skills. So it it got it got a skill routing now and we gonna we have created 3 skills. Policy question and answer, claim filing and policy summary. So let us basically rangle skills. This skills has the similar similar design or architecture like what we call it
Instructor:cloud skills or any other GitHub copilot skills. So there is nothing much difference in terms of principle what skills are, but it's just a different way of implementing the skill, most likely programmatically way we will be implementing the skills using. Girls using program itself. You can see here Multi user base with the name of multi skill agent and we are running. So when we run this test, what should resolve who is then see that Bennett somebody is asking premium, it's moving to the policy QA
Instructor:skill. I meet to file a claim, its moving to the claim filing skill. Give me a summary of a policy, its going to the policy summary skill. And this skills can actually be also rooted to different agents down the line as well. So it provides a very, very, very insightful and comprehensive asymmetrical horizontal integration with multiple agents. And this is now we are moving you to the much more. Real life use cases. In real life, agents are not just one class. In real life agents are computation
Instructor:of hundreds of different colours, communicating to each other, building building, working on different knowledge bases, providing bias or unbiased output and then. On assembling the final result back to the original request. So in real world, agents are never going to be a linear or symmetrical. So, well, we did the policy outline. And we did the claim filing using a skill you can see here. We did the claim firing using the skew. Then now we are waiting for the final response.
Instructor:I know it might sound overwhelming when you are running this first time. I would I would recommend you to give a good run, give a good reading. One, one time, twice, thrice may be little bit more. Writing agents is a abstract thinking and. For anyone who has. Who has not developed that area of a skill? Its sometime fine little bit confusing and there is nothing to be wrong about because not everyone learn abstract thinking in a in a in a very early stage of career. So if you are at the
Instructor:earlier stage of career and if you find this little bit overwhelming instead of. Instead of giving it up, I would recommend go through couple of times, build the packets of thinking in abstract way and then with agents will make much more sense that how this whole things is working slowly. Because if you see here we haven't got any deterministic code logic. But still the outcome is near to deterministic and that is the idea of agents. We don't want to code the business logic but we
Instructor:want to direct agent to build the business logic as part of their abstract declaration, as part of their abstract definitions and then. Generate the near deterministic or absolutely deterministic outcome and this is what we see here. And this is all happening using WiFi 4 which is very entry level model. So you can see that writing the good agent is actually a how important it is to write agent in the right way?And using agent in a right way even. Even the entry level model can perform really
Instructor:brilliant work. Not everything need to go to the Super smart models. Of course. If we use super smart model we can solve many complex problem. But not all the problems deserve those super smart models either. And there very costly as well so. Now we are going to go for small experiment how much the monthly premium and also thing and it will load the skills and it will show us the result. There you go. Policy, QA takes completed. Everything just looks good. So we have covered practically everything that A to
Instructor:a does offer as an agent. Multi turn artifacts, data part, multiple skills, task, life cycle and text part. These are agent. We have just build an agent. Now in the next subsequent sessions we will explode them, how to embed them as a server and then using the client start consuming them. But I hope you like this End of this lesson you have working Q agent that answer insurance question using the FIFO. It's tested in its time stand alone in a notebook. In the next session we will wrap as a discoverable A
Instructor:to A session. Thanks for watching this lesson on LocalM Tuts. In next lesson we will wrap this agent. In a fully discoverable A2A server with an aged card and HTTP endpoint, you find the next video in the A2A protocol. Course playlist which is available. The link is available in the description NICU there in the next session.
Setup Instructions
0/3
Clone the repository
Clone the course repository to your local machine to follow along with the code examples.
bash
git clone https://github.com/nilayparikh/tuts-agentic-ai-examples/tree/main/a2a/lessons/05-first-a2a-agent
cd $(basename https://github.com/nilayparikh/tuts-agentic-ai-examples/tree/main/a2a/lessons/05-first-a2a-agent)
Create a virtual environment
Create an isolated Python environment for the project dependencies.
bash
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
Install dependencies
Install all required packages from the requirements file.
Why use an async class instead of a simple function?
A2A servers are asynchronous (built on ASGI/Starlette). Starting with an async class means zero refactoring when you wrap it with AgentExecutor in Lesson 6.
Q
Can I use a different model instead of Phi-4?
Yes. Since we use the OpenAI-compatible API, any model on GitHub Models works — just change the model name in the create() call.
Q
Why not use a vector store for the knowledge base?
For a bounded domain like a single policy document, injecting the full text into the system prompt is simpler and equally effective.
Q
What does temperature 0.2 do?
Temperature controls randomness. Lower values (0.0–0.3) produce more deterministic, focused responses — ideal for factual Q&A.