Educational Tree of Thoughts

Task: Summarize the given YouTube video transcript and classify it as Entertainment, Educational, or Other. If the transcript is Educational, provide a bulleted list of key points and a list of references in the footer. If code is explained in the video, include it in a code block. Identify any mentioned models, tokenizers, embeddings, or other files, and list them in the footer, along with the title of the transcript as a footer note.

Input: A transcript of a YouTube video.

Output:
Part 1: Classification: [Entertainment/Educational/Other]
Part 2: Summary: [Summary of the transcript]
[Bulleted list of key points (if Educational)]
[Code block (if applicable)]
Footer:
Models, Tokenizers, Embeddings, Files mentioned: [List of mentioned items]
References (if Educational): [List of references]
Title: “Tree of Thoughts: Deliberate Problem Solving with Large Language Models – Let Your LLMs Play Games!”
Transcript: “Hello everybody, and welcome back. Today we're going to be talking about Tree of Thoughts: deliberate problem solving with large language models. This is an interesting new take on a classic idea: how do we get these language models to perform better at more complex reasoning tasks? I thought I'd show you the result first, and then we'll show you how we got there.

The game we're playing today is called the Game of 24, and the idea is simple: we are given a certain set of numbers, we must use all of them, and we must get to the target of 24. There will be a total of three steps, because we have to use all of the numbers and we're provided with an initial four. So, given the initial input of 4 5 6 10, we can use this Tree of Thoughts method to find the correct answer, which is: 4 × 5 = 20, 20 − 6 = 14, and 10 + 14 = 24. We've used all of our numbers and we have the correct answer. That's great, but how did we get there? The model shows us some outputs above, and we're going to peek into the code to go over how we actually got there.

The first place to start, of course, is the paper. What is the real novel idea here? We already have Chain of Thought, the idea of prompting language models to split tasks down into their basic little quanta and then go through them one at a time. However, that gives the LLM no room for exploration; it just works in order. With this Tree of Thoughts implementation, we give the model the ability to search over some sample space of generated answers and pick the best ones, and we even allow it to backtrack. So we give it access to a lot more context and information, and then we let it evaluate those options and proceed forward with the ones it thinks are best, based on whatever strategy we're implementing.

In traditional Chain of Thought, we give the model an input and it generates specific little pieces of thought that it goes through step by step. There's also self-consistency with Chain of Thought. That's not a bad idea, but it doesn't give us the full breadth of ability to move laterally and backwards when we're considering what the actual processes or sub-steps are; the thoughts all just cascade into each other and then we pick some. It's not bad, but we want to leverage the LLM's ability to be very good at a specific small task in order to get it to a place where it can perform more complex tasks. So we break these tasks down, and then instead of asking the model to simply review what it has put down, we let the model choose at each of these steps.

They discuss the key ideas of this strategy on page three, and I want to focus on two really core ideas: thought decomposition and the thought generator. Yes, there is also an evaluator; that part is flexible, and they offer a couple of different options you can use, but I really want to focus on these two pieces. Number one, the Tree of Thoughts method depends on us breaking our thoughts into something that is small enough that language models can generate promising and diverse samples, yet big enough that LMs can evaluate its prospect toward problem solving. The idea is that we still want to break our task down into bite-sized pieces that we can perform roughly one after another; they don't have to occur strictly in sequence, but it is important that the tasks are bite-sized.

The thought generator is broken down into two main strategies. One is to sample thoughts; this is better for tasks with a larger space from which to pull our thoughts, so not a simple equation but complex ideas or concepts. The other is to propose thoughts, generated sequentially using some propose prompt (we'll look at examples of exactly what those are); this is better for things that are more simply represented, like an equation or a word: a short and constrained idea.

The evaluator checks to see how we're doing and helps make decisions, and it comes in two major camps: value and vote. With value, we're just turning each state into a scalar, basically a score on some range, one to ten or whatever it is, and then getting the model to say whether that's a good score or a bad score. We want what the model thinks is best overall; it doesn't really matter what the score is, as long as we can infer which option the model considers, quote unquote, best for the given task. With vote, we're essentially saying: hey, given these few candidates, which would you choose? Vote is similar to Self-Refine, in which we ask the model to provide feedback on its own outputs. Value works well when candidates are easy to judge; say, for the Game of 24 with that 4 5 6 10 example, if we get some ludicrous intermediate result right away, like 100 or 350, we can decide pretty quickly that it's not going to work out in the long run and score it low; that doesn't seem like a great way to get started. Vote is more meant for things that are subjective or hard to quantify, where it's difficult to say whether something is more or less likely to be correct; it's more like, what do you think is best of these four options? I think there's a ton of value left in exploring the best way to decide which of the possible thoughts is best, and the paper indicates as much; this is a space that's ripe for innovation as we all evolve with this technology together.

Lastly, we have the search algorithm. We're either using breadth-first search (BFS), which keeps a set of the b most promising states per step, or depth-first search (DFS), which explores the most promising state first until it reaches a final output or decides the branch is impossible. If you're up on your DSA, this is straightforward, but the general idea is that we're either keeping a bunch of states we're happy about, or we're driving to the bottom of the problem first and only coming back up if we hit a wall or find the correct answer. In the examples the paper provides, both the text-generation task and the Game of 24 use BFS, and the crossword puzzle uses DFS.

The task we're focusing on today is the Game of 24. The game itself is fairly simple: you are given four numbers and you must use all four of them to reach the number 24. That's the entire game. For a human this is fairly straightforward; it might take you some time, but it is straightforward, and yet it is very difficult for models. We have the example here, but we won't spend too much time on it, because we're going to go through the code, and I think the outputs do a better job of explaining what's actually going on.

This is the meat of the paper: it is basically saying that Tree of Thoughts is better than your traditional input-output prompt and Chain of Thought prompt, including Chain of Thought with self-consistency where k equals 100. As you can see, it's much better out of the gate, which is kind of crazy, to be honest with you. But what if we set this up to be fair for the other methods, giving them a best-of-whatever: best of 100, or k equal to 10 for Self-Refine? You can see that it's not close. Even when we're talking about per-node effectiveness, where nodes are considered individual thoughts, the Tree of Thoughts method just crushes the IO and Chain of Thought methods, even when they're given the most help.

So we're going to start from how we implement this in the notebook, and then we'll explore what each part of the script is doing. First things first, we have some dependencies to install, and we have to provide an OpenAI key; this is going to make a lot of calls to your OpenAI API endpoint, so make sure you're aware of that. We also want to clone the repository so we can easily run the scripts, and we're going to run the breadth-first search script in order to employ ToT properly. As you can see, this is done through python run.py, and we can pass a number of parameters: the task we're going to play is Game of 24, the task file is located here, the task start index is 900, the task end index is 1000, we're going to use the propose method (as we discussed, asking the model to propose a number of potential solutions), we're evaluating by value, we're greedily selecting which candidates to keep, the number of evaluate samples is three, and the number of select samples is five.

Really quickly, before we get into the code, let's look at how this thing works. We start with some data, essentially just a list of four numbers, in this case 4 5 6 10. Tree of Thoughts is going to generate a number of potential first steps, and you can see we got a ton: 4 + 5 = 9, which leaves us with 6, 9, and 10; 5 + 6 = 11, which leaves us with 4, 10, and 11; and so on. The idea is that we're just generating a bunch of different samples. Then we score those samples, generating values for them. All of these samples seem fine except the last one, which is rated a little lower; it's actually fine, we could use it, but the model has deemed it not to be useful, so we drop it like it's hot. Now we keep only the best five. We're using greedy allocation here, so it just keeps the ones sorted to the top: the 4 + 5 = 9, the 5 + 6, the 6 + 6; we keep the first five we see that all achieved a score of three.

Now we move to step two. Again we get a bunch of different samples, and you can see each has tacked on a second operation. Again we score them and decide which ones to keep; in this case two of them score very highly, and the rest score middling or very low. Among the keepers, we have one that's looking pretty promising: 4 × 5 = 20, leaving 6, 10, and 20, then 20 − 6 = 14, leaving 10 and 14. Boy, that's close. Then we have 4 × 5 = 20 followed by 10 + 20 = 30, with 6 left over. The model scores those very highly, so we keep them, and we also keep three of the leftover candidates scored three, since we're keeping the top five each time. We do the same thing with the third line, and our first candidate is simply correct, so we don't need to do much more; it gets a high value and we choose it as the highest-rated answer. At this point we have completed the problem successfully, and that's fantastic.

So let's look at how this actually happens in the code. Again, we start from a CSV that contains all of these different puzzles. Then we select the Game24 task, which is basically a compilation of things that helps us interface with this particular task: it helps us test our output, and it helps us wrap our prompts in whatever kind of prompt we're using. You can see here our propose-prompt wrap; we also have our value-prompt wrap. Next, we can look at what these prompts actually look like. For the Game of 24 they're pretty straightforward: we give an input and ask for possible next steps. Spoiler alert: this is how we generate a bunch of different proposed actions to take. Then, in this piece of code, all the magic happens. Essentially, because we're using the propose prompt, we get a bunch of proposals, we convert those proposals into new ideas, we use the value evaluate method to figure out how to value them (again, the LLM is making these evaluations), and we use the greedy method to pick the ones we like best, where greedy just means picking the ones with the biggest scores. And that's really it; it's incredible.”

Part 1: Classification: Educational

Part 2: Summary:
The video discusses “Tree of Thoughts,” a method for deliberate problem solving with large language models. The speaker introduces the Game of 24 as a running example and demonstrates how the Tree of Thoughts method finds the correct solution: the model is given the initial set of numbers and repeatedly generates and evaluates potential steps until it reaches the target. The speaker explains the key ideas behind the strategy, including thought decomposition and thought generation, and discusses the evaluator component, which can use either value-based or voting-based evaluation. The search algorithm, either breadth-first search (BFS) or depth-first search (DFS), is another aspect of the method. The speaker emphasizes how much more effective Tree of Thoughts is than traditional input-output and Chain of Thought prompting, even on a per-node basis. The video concludes with a demonstration of implementing the method in a notebook using OpenAI’s API and a walkthrough of the propose, value, and greedy-selection steps in the code.

Key points:
– Introduction to the Tree of Thoughts approach for problem-solving with language models
– Game of 24 used as an example for demonstrating the method
– Tree of Thoughts allows the model to search and evaluate potential steps
– Thought decomposition and thought generation strategies
– Evaluation methods: value and voting
– Search algorithm: breadth-first search (BFS) and depth-first search (DFS)
– Comparison of Tree of Thoughts with traditional prompt-based methods
– Implementation of the method in a notebook using OpenAI’s API (see the code sketch below)
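
Code:
The video drives everything through the repository’s run.py and then walks through the propose, value, and greedy-selection loop. Below is a minimal sketch of that loop as the speaker describes it, not the repository’s actual code: the command-line flag spellings are assumed from the paper’s companion repository, PROPOSE_PROMPT is a hypothetical reconstruction of the propose-prompt wrap shown on screen, and propose_next_steps / value_state are hypothetical stand-ins for the LLM calls made through the OpenAI API.

```python
from typing import Callable

# Command run in the video's notebook (flag spellings assumed; they are only
# partially visible on screen):
#   python run.py --task game24 --task_start_index 900 --task_end_index 1000 \
#       --method_generate propose --method_evaluate value --method_select greedy \
#       --n_evaluate_sample 3 --n_select_sample 5

# Hypothetical reconstruction of the propose-prompt wrap; a propose_next_steps
# implementation would format it with the current state before calling the LLM.
PROPOSE_PROMPT = "Input: {state}\nPossible next steps:"

def tot_bfs(
    initial_state: str,
    propose_next_steps: Callable[[str], list[str]],  # LLM call: sample candidate next thoughts
    value_state: Callable[[str], float],             # LLM call: score a partial solution
    n_steps: int = 3,          # the Game of 24 always takes exactly three operations
    n_select_sample: int = 5,  # keep the b most promising states at each step
) -> str:
    """Breadth-first Tree of Thoughts with greedy selection, as described in the video."""
    frontier = [initial_state]
    for _ in range(n_steps):
        # Propose: expand every kept state into a pool of candidate next thoughts.
        candidates = [c for state in frontier for c in propose_next_steps(state)]
        # Value: score each candidate, then greedily keep the top n_select_sample.
        frontier = sorted(candidates, key=value_state, reverse=True)[:n_select_sample]
    # The highest-rated surviving state is taken as the answer.
    return frontier[0]
```

Greedy selection here just means sorting by the evaluator’s score and truncating. The sketch omits the video’s n_evaluate_sample = 3, which controls how many value judgments are drawn and aggregated per candidate before sorting.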

Footer: Models, Tokenizers, Embeddings, Files mentioned:
– OpenAI API (an API key is required; the script makes many calls to the OpenAI endpoint)
– run.py (the breadth-first search script in the cloned Tree of Thoughts repository)
– Game of 24 task file (a CSV of puzzles; the demo runs indices 900 to 1000)

Title: “Tree of Thoughts: Deliberate Problem Solving with Large Language Models – Let Your LLMs Play Games!”

References:
– Yao et al., “Tree of Thoughts: Deliberate Problem Solving with Large Language Models,” arXiv:2305.10601 (the paper walked through in the video)