Build a Chat Agent with Vector DBs

Summarize the following in salient bullet points. Do not include introductory material. Include a timeline.
Title: “Build Conversational Agents with Vector DBs – LangChain #9”
Transcript: “In one of the early videos in this series on LangChain we talked about retrieval augmentation, and one of the most commonly asked questions from that is: how can the large language model know when to actually search through the vector database? Obviously, if you're just chatting with the chatbot, it doesn't need to refer to any external knowledge; at that point there's no reason for the model to actually go to that database and retrieve information. So how can we make it an optional thing, where we're not always querying our vector database?

Well, there are kind of two options. The first option is you just stick with that and set a similarity threshold, where if a retrieved context is below that threshold you just don't include it as the added information within your query to the large language model. The second option, which is what we're going to talk about today, is actually using a retrieval tool as part of an AI agent. So if you have been following along with this series, we're essentially going to take what we spoke about last, which is agents, and what we spoke about earlier in the series, which is retrieval augmentation, and put them both together.

So let's jump straight into the code. We have this notebook that you can follow along with; there'll be a link to it somewhere near the top of the video. The first thing we need to do is install any prerequisite libraries: we have openai, pinecone, langchain, tiktoken, and Hugging Face datasets. So we run those; I've already run it, so I'm not going to run it again.

The first thing I'm going to do is load our dataset. This is the dataset we're going to use to create our knowledge base. It's basically a pre-processed dataset, so we won't need to do any of the chunking or anything that we would usually do, and that's on purpose: I want this to be pretty simple so we can focus more on the agent side of things rather than the data prep side of things. So we're using the Stanford Question Answering Dataset (SQuAD). The reason we don't need to do any of the data preprocessing we usually do (if we just run this, just converting it into a pandas DataFrame) is because we have these contexts, and each context is roughly a paragraph or a little bit more of text; that's what we're going to be indexing within our knowledge base. Typically, what you'll find is that if you're, for example, working with PDFs and you want to store those in your knowledge base, you'll need to chunk that long piece of text into smaller chunks. This is basically already chunked, which just makes our life a little bit easier. But one thing we do need to do is de-duplicate it, because we have many of the same contexts over and over again. So we just do that here: drop duplicates on the subset, keep the first of each one, and do it all in place. Now you can see that the contexts are all different.

Okay, cool, so the data prep side of things is done, and what we want to do now is initialize both the embedding model and the vector database. Embedding model first: we're going to be using text-embedding-ada-002 from OpenAI. Again, you can use any embedding model you want; it doesn't need to be OpenAI, and it doesn't need to be text-embedding-ada-002. So I'm going to enter my API key there; the API key comes from platform.openai.com.
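A minimal sketch of the data prep steps just described, assuming the SQuAD dataset on Hugging Face and LangChain's OpenAIEmbeddings wrapper (variable names here are illustrative, not taken from the video):

```python
# !pip install openai pinecone-client langchain tiktoken datasets

from datasets import load_dataset
from langchain.embeddings.openai import OpenAIEmbeddings

# Load the pre-chunked SQuAD dataset and convert it to a pandas DataFrame
data = load_dataset("squad", split="train").to_pandas()

# De-duplicate on the context column, keeping the first occurrence, in place
data.drop_duplicates(subset="context", keep="first", inplace=True)

# Initialize the embedding model (any embedding model would work here)
embed = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    openai_api_key="YOUR_OPENAI_API_KEY",  # from platform.openai.com
)
```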
Then we need our Pinecone API key and Pinecone environment, so I'm going to go into my dashboard and grab those. That is app.pinecone.io; you go to API Keys, copy the key value, and note your environment. I've got us-west1-gcp; yours might vary, so make sure you actually check this for your environment variable. Now I'm going to run this: I enter my API key and then my environment, us-west1-gcp.

So this first initializes the connection to Pinecone (not the index), and then, if we don't have an existing index with this index name, it initializes the index. For the metric we're using dot product; that is specific to text-embedding-ada-002. A lot of models actually use cosine, so if you're not sure what to use, I'd recommend you just use cosine and see how it works. There's also the dimensionality; again, this is something specific to each model, and for text-embedding-ada-002 it is 1536. Okay, cool, I will let that run.

Okay, that's initialized, and then we connect to the index, again passing the same index name. I'm using GRPCIndex; you can also just use Index, but GRPC indexes are more reliable and a little bit faster, so I go with that. Then we describe the index stats so we can see what is in there at the moment, and we should see that the total vector count is zero, because we haven't added anything yet. Okay, that's great, and then we move on to indexing.

This is just where we're going to add all of these embeddings into Pinecone. We do this directly with the Pinecone client and the GRPC index that we have here, rather than through LangChain, because LangChain is just slower at this; I find this is the better way of doing it. We set our batch size to 100, which means we're going to encode 100 records (or contexts) at once and add them to Pinecone in batches of 100 as well. Then we just loop through our dataset: we get the batch, and we get the metadata, which is just going to contain the title and context (if we come up here, title is this and this is a context, so that looks good). Okay, let's just run this. So we're creating our metadata, we get our contexts from the current batch, and then we embed those using text-embedding-ada-002. These chunks of text that we're passing in would usually be called contexts, documents, or also passages; they get referred to as any of those.

Then what we do is get our IDs. The ID, again, is just this here; it's a unique ID for every item, and that's important, because otherwise we're going to overwrite records within Pinecone. Then we just add everything to Pinecone: we take our IDs, embeddings, and metadata (each of these is a list) and zip them all together, so that we get a list of tuples where each tuple contains a single record, i.e. that record's ID, embedding, and metadata. I will fast-forward to let this finish.

Okay, it's finished, and again we can describe the index stats, and we should see now that it has been populated with vectors: we have almost 19,000 vectors in there now, all the records. Cool. Up to now we've been using the Pinecone client to do this; like I said, it's just faster than using the implementation in LangChain at the moment.
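Continuing from the sketch above, this is roughly what the index creation and batched upsert look like with the pre-v3 pinecone-client API that this era of LangChain used; the index name is a hypothetical placeholder:

```python
import pinecone

# Initialize the connection to Pinecone (not the index itself)
pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="us-west1-gcp")

index_name = "langchain-retrieval-agent"  # hypothetical name
if index_name not in pinecone.list_indexes():
    # dot product suits text-embedding-ada-002; cosine is a safe default otherwise
    pinecone.create_index(index_name, metric="dotproduct", dimension=1536)

index = pinecone.GRPCIndex(index_name)
index.describe_index_stats()  # total_vector_count should be 0 at this point

batch_size = 100
for i in range(0, len(data), batch_size):
    batch = data.iloc[i:i + batch_size]
    ids = batch["id"].tolist()             # unique ID per record, to avoid overwrites
    texts = batch["context"].tolist()
    embeds = embed.embed_documents(texts)  # embed 100 contexts at once
    metadata = [
        {"title": t, "text": c} for t, c in zip(batch["title"], batch["context"])
    ]
    # zip into (id, embedding, metadata) tuples and upsert the batch
    index.upsert(vectors=zip(ids, embeds, metadata))
```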
But now we're going to switch back to LangChain, because we want to be able to use the conversational agent and all the other tooling that comes with it. So what we're going to do is reinitialize our index, and we're going to use a normal Index and not GRPC, because that is what is implemented within LangChain. We initialize that, and then we initialize a vector store object, which is basically LangChain's version of the index we created here; it just includes the embedding in there as well. There's also the text field; that's important. It's just the field within your metadata that contains the text for each record, so for us that is text, because we set it here.

So let's run this, and like we did before in the previous retrieval video, we can test that this is working by using the similarity search method with our query: when was the College of Engineering in the University of Notre Dame established? We pass that and say we want to return the top three most relevant documents (or passages, or contexts, whatever you want to call them). We can see what we get: this is a document here, and I think that's probably relevant; this one is definitely relevant; and we have another document there as well. So we get those three results. Looks good, so let's now move on to the agent part of things.

Our conversational agent needs our chat large language model, conversational memory, and the retrieval QA chain, so we import each of those here. Let me explain what those actually are. We have the chat LLM; that is basically ChatGPT. Chat LLMs just receive the input in a different format to normal LLMs, one that is more conducive to a chat-like stream of information. Then we have our conversational memory. This is important: we have our memory key, and we're using chat_history, because that is what the memory is referred to in, I think, the conversational agent component. So whenever you're using a conversational agent, you need to make sure you set the memory key equal to chat_history. Here we're going to remember the previous five interactions, and that's our conversational memory.

After that we set up our retrieval QA chain. For that we need our chat LLM, and we set the chain type to stuff, which basically means that when we retrieve the, I think, three items from the vector store, we're going to place them as-is into the retrieval QA prompt. So we're going to kind of stuff them all into the context, rather than doing any fancy summarization or anything like that. Then there's our retriever, and the retriever is our vector store, but as a retriever, which is just a slightly different class or object. So we run that, and with those we can generate our answer.

We run it, and we're using the same query here. What's the query? Let me come up here: when was the College of Engineering at the University of Notre Dame established? We come down, and the answer is that the College of Engineering was established in 1920 at the University of Notre Dame. So cool, we get the answer, and it is generated by our GPT-3.5-turbo model based on the context that we retrieved from our vector store, so basically based on these three documents here.
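A sketch of the LangChain side, continuing from the snippets above; class and parameter names follow the LangChain versions current when this video was made, so they may differ in newer releases:

```python
from langchain.vectorstores import Pinecone
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.chains.conversation.memory import ConversationBufferWindowMemory

# Switch to the standard (non-GRPC) index, which is what LangChain implements
index = pinecone.Index(index_name)
text_field = "text"  # the metadata field that holds each record's text
vectorstore = Pinecone(index, embed.embed_query, text_field)

query = "when was the College of Engineering in the University of Notre Dame established?"
vectorstore.similarity_search(query, k=3)  # returns the top-3 most relevant documents

# Chat LLM, conversational memory, and the retrieval QA chain
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.0)
conversational_memory = ConversationBufferWindowMemory(
    memory_key="chat_history",  # the name the conversational agent expects
    k=5,                        # remember the previous five interactions
    return_messages=True,
)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # place retrieved contexts into the prompt as-is
    retriever=vectorstore.as_retriever(),
)
qa.run(query)
```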
Now that's good, but that isn't the whole thing yet; that's actually just a retrieval QA chain, not a conversational agent. To create our conversational agent, we need to convert our retrieval QA chain into a tool that the agent can use. That's what we're doing here: we create a tools list, which is what we'll pass to our agent, and we can include multiple tools in there (that's why it's a list), but we're only actually using one tool in this case. We define the name of that tool (we're going to call it the Knowledge Base), we pass in the function that runs when the agent calls this chain (which is just qa.run, like we did here), and then we set a description. This description is important, because it is by using this description that the conversational agent will decide which tool to use, if you have multiple tools, or just whether to use this tool at all. So we say: use this tool when answering general knowledge queries to get more information about the topic. I think that's a pretty clear description of when to use this tool.

From there we initialize our agent. We're using this chat-conversational-react-description agent, and we pass in our tools and our LLM. Setting verbose just means we're going to get a load of printed output, which helps us see what is actually going on. Max iterations defines the number of times the agent can go through a tool-usage loop; we're going to limit it to three, because otherwise what can happen is it keeps going to tools over and over again and gets stuck in an infinite loop, which we don't want. The model is going to decide when to stop generating, and we also need to include our conversational memory, because this is a conversational agent. We run that, and now our agent is ready to use.

So let's pass in that query we used before. Let me run it and we'll see. This action input here is actually the generated question that the LLM is passing to our tool, so it might not be exactly what we put in, or it might actually be the same; it depends. Sometimes the agent will reformat it into a question that it thinks is going to get better results. Our question is when was the College of Engineering at the University of Notre Dame established, and the observation (because it refers to the knowledge base for this) is that the College of Engineering at the University of Notre Dame was established in 1920. Then the agent decides it has enough information to answer the question, so it says final answer, and the final answer it returns is this, which is the same thing. And we can see that here, in the final output.

Now, what if we ask it something that is not general knowledge? So: what is two times seven? Let's see what it will say. You see it doesn't decide to use the knowledge base here; it knows that it doesn't need to, so it just goes straight to the final answer, and it tells us it is 14.
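Roughly how the tool and agent setup described here might look, continuing from the previous snippets; the early_stopping_method value is an assumption based on the remark that the model decides when to stop generating:

```python
from langchain.agents import Tool, initialize_agent

tools = [
    Tool(
        name="Knowledge Base",
        func=qa.run,  # the retrieval QA chain from above becomes a tool
        description=(
            "use this tool when answering general knowledge queries to get "
            "more information about the topic"
        ),
    )
]

agent = initialize_agent(
    agent="chat-conversational-react-description",
    tools=tools,
    llm=llm,
    verbose=True,                      # print the agent's reasoning steps
    max_iterations=3,                  # avoid infinite tool-usage loops
    early_stopping_method="generate",  # assumed: let the model decide when to stop
    memory=conversational_memory,
)

agent("when was the College of Engineering in the University of Notre Dame established?")
```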
Okay, now let's try some more. I'm going to ask it to tell me some facts about the University of Notre Dame, and it knows to use the knowledge base and to pass in "University of Notre Dame facts". You can see here that it's not just passing in what I wrote; it's actually passing in a generated version that it thinks will return better results. What it got was, obviously, some of the contexts that we saw before, and based on the information in those contexts it has come up with all of these facts, which is quite a lot. It has noted them as a bullet-point list, and then for the final answer, based on that bullet-point list, it has given us this paragraph. So yeah, you can see, I haven't been through thi”
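The example queries from the walkthrough, run against the agent sketched above; whether the tool fires is the model's judgment call, so outputs can vary:

```python
# Uses the Knowledge Base tool (a general knowledge query)
agent("when was the College of Engineering in the University of Notre Dame established?")

# Skips the tool entirely and answers directly
agent("what is 2 * 7?")

# The agent may rewrite the input into a search-friendlier action input,
# e.g. "University of Notre Dame facts"
agent("can you tell me some facts about the University of Notre Dame?")
```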

– The video discusses retrieval augmentation and how a large language model can know when to actually search a vector database rather than answer directly.
– Two options are presented: setting a similarity threshold below which retrieved contexts are ignored, or using a retrieval tool as part of an AI agent.
– The video focuses on combining agents and retrieval augmentation.
– The code starts by installing prerequisite libraries and loading a pre-processed dataset (SQuAD) for creating a knowledge base, so no chunking is needed.
– Data deduplication is performed to remove duplicate contexts.
– The embedding model (OpenAI's text-embedding-ada-002) and vector database (Pinecone) are initialized, using the dot-product metric and 1536 dimensions.
– The dataset is indexed by embedding the contexts in batches of 100 and upserting them to the vector database with unique IDs and metadata.
– The conversational agent setup begins by reinitializing the index and creating a vector store object in LangChain.
– The chat LLM (ChatGPT / gpt-3.5-turbo), conversational memory, and retrieval QA chain are imported.
– The retrieval QA chain is set up using the chat LLM, the "stuff" chain type, and the vector store as a retriever.
– An answer is generated using a query and the retrieval QA chain.
– The retrieval QA chain is converted into a tool that can be used by the conversational agent.
– The agent is initialized with the tool, LLM, conversational memory, and other parameters such as verbosity and a maximum of three tool-use iterations.
– The agent can process queries and generate answers based on the knowledge base.
– The agent decides whether to use the knowledge base tool based on the query type.
– Examples of queries and their corresponding answers are provided, demonstrating the agent’s behavior.
– The video concludes by showcasing the agent’s ability to handle different types of queries, including general knowledge and mathematical calculations.
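As a follow-up to the examples above, the agent's rolling memory window can be inspected directly (a sketch, assuming the ConversationBufferWindowMemory set up earlier):

```python
# Inspect the remembered interactions (a window of the last k=5 exchanges)
for message in conversational_memory.chat_memory.messages:
    print(f"{type(message).__name__}: {message.content}")
```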