AI Q&A with HuggingFace

Summarize the following in all salient bullet points. Do not include introductory material. Include a timeline.
Title: “Getting Started with AI powered Q&A using Hugging Face Transformers | HuggingFace Tutorial”
Transcript: “Hey, welcome back. In this video we're going to start looking at how we can embed AI into our solutions, and I don't mean using the off-the-shelf AI models you get with the various cloud platforms; I mean looking at it from an open-source perspective. Normally when somebody talks about AI solutions it always seems to be about the inner details of how the models work, as if you wanted to become an AI scientist of some sort, and we're not going to look at it from that angle. We're going to focus on how you can start taking some of the best-in-class AI models that have been developed and are available on the internet, how you can get started with them and how you can consume them. In this video we're going to focus on question-and-answering models in particular; we'll look at the use cases and some of the details of how they work, but mainly at how you can get started.

To get started, the first thing we're going to do is look at one of the best-in-class question-and-answering models available for you to use in your solutions today. I'm going to open a browser and go to a website called huggingface.co. Hugging Face is a company that's trying to build an AI community, and it runs a hub of models that have been pre-trained and are available for you to use. A lot of companies like Google and Facebook, once they develop a new model that works well and is best in class, essentially hand it across to Hugging Face; Hugging Face does the pre-training for you, and then the model is available for you to pick up and use in your solutions. So we're going to explore a couple of models, look at question answering, do an example or two, and then look at how you can start to use them.

I've got the Hugging Face website up in front of me. If I click on Models here and make this a little larger, you can see there are lots of different tasks available, and we'll cover some of them in other videos: text summarization, text generation, text classification, and translation from one language to another (English to French, French to Russian, whatever). Today we're going to focus on question answering. I'm sure you've seen question answering before: if you've ever used a chatbot and asked it a few questions, or used something like Alexa ("Hey Alexa, tell me who the quarterback for the New York Giants is"), it comes back with an answer, and underneath that is the concept of question answering. I'll talk about how that works in a second.

If I click on the question-answering filter, it comes back with a list of models that are already in the Hugging Face model hub and that you can use in your own solution. There's a BERT-large uncased whole-word-masking model, there's RoBERTa, there are a few distilled ones, and there's ALBERT. These are all models that have been trained, with very scientific papers behind them, and they're all pretty good, to be honest. We'll probably focus on a couple of them later on, but you can use whichever model seems to work for you.

If I click on this BERT-large cased whole-word-masking fine-tuned model (a bit of a mouthful), you can see the nice thing about Hugging Face: there's a little test harness that you can use. What a question-and-answering model always expects is a context, and by context we mean a set of data to ask questions against, a corpus. In this example the context is "My name is Wolfgang and I live in Berlin", and then I can ask a question. If I type "Where do I live?", the model understands that context and is smart enough to come back and say that I live in Berlin, based on that text.

Now, it can get quite complicated, so let's grab something from the internet. If I go to nfl.com (anyone who knows me knows I'm a big NFL fan) and click on News, there's my team, the New York Giants, so let's click on this article. It's about the offensive line for the New York Giants and how the left tackle is going to protect Daniel Jones. I'll copy all of it, open TextEdit for a second, paste it in as plain text, save it, and then copy the article into the context box. This is now the article I've taken off nfl.com; you can see it talks about Daniel Jones, who's the quarterback for the New York Giants, it talks about the left tackle Andrew Thomas, and if we scroll down a little further it mentions the Giants' coach, Joe Judge. So I could start asking questions of it. For example: "Who is the quarterback for the New York Giants?" I expect that to come back with Daniel Jones. If we hit Compute and give it a second, there you go, it came back with Daniel Jones. That's actually really cool.

We'll go into the details of how that works in a second, but hold that thought: I've given it a paragraph of text I just randomly pulled off the internet, asked it a few questions, and it got them spot-on. As you can see, it had a 97% confidence factor, so it's feeling pretty confident about that answer. I can ask other questions too, such as "Who is the head coach for the New York Giants?" I'd expect that to come back with Joe Judge, and indeed it does, with a 95% confidence level. That is a really good model, and of course you can try some of the other models as well.

I'm hoping you're starting to see the use cases: if you've got some text or data (maybe it's a question-and-answering solution you want to build around a core piece of data such as a manual or an FAQ section), then you can grab that model, put in your data, maybe do a little bit of fine-tuning, though out of the box it's probably good enough, and open it up to ask questions against as a kind of chatbot. That's going to be really useful for any contact-center or knowledge-base scenario where you want to offer Q&A functionality. What we're going to do now is pull back a little, have a look at what's going on underneath the hood, and then go and build our own on Google Colab.

The first thing I want you to understand is the idea of transfer learning. Transfer learning is a relatively new (last few years) AI technique that is used by a lot of people. Let's compare it with how machine learning used to work. Say you wanted to learn some task, maybe a classification: classify something as a color, or a pet, or a type of dog, whatever. You would get a set of data, label it all up, and the machine learning algorithm would churn away against that data set; you trained against that data set and it could come back with answers. Now say I wanted to learn some other classification, maybe about cats, or classifying types of tortoises. In the past you would do the exact same thing: go and get another set of data with all the tortoises or all the cats and train the model again, and training those models takes quite a long time; it's quite intensive.

But think about how humans learn: we don't throw away all of our knowledge. If I need to learn something new, to identify a cat or a dog, I don't throw that information away; I build upon my previous knowledge. Maybe I've learned about cats, and then I use the techniques I used to learn about cats to learn about dogs; I don't get a brand new data set, wipe my brain, and relearn from scratch. You build upon previous knowledge, and that's how transfer learning works as well: you start from a general-purpose model that has been pre-trained. In the case of Q&A, the base model that powers the underlying question-and-answering algorithm is called BERT, which stands for Bidirectional Encoder Representations from Transformers (yeah, a really easy thing to remember, right?). That BERT model isn't just used for Q&A; it's used for lots and lots of other machine learning and natural language processing routines, so it becomes a base model. BERT has been trained on a corpus of data from Wikipedia and, I think, a book corpus as well; those two sources have been used to train BERT so that it understands the English language (or any other language), the structures of sentences, how to tokenize, and all that good stuff, so that it can model language. Once BERT has learned the core language model, you can then run a level of fine-tuning over the top. So there are two stages: the first stage is pre-training against that large corpus of data (the book corpus and Wikipedia), and the second stage is training against a specific data set so it can learn how to do a fine-tuned task, in this case question and answering. We'll delve into that in a second.

If you want to find out more about BERT, go to github.com, go to google-research, and look at bert; I've got the URL up on the screen just now, and it gives you a lot more detail and lets you download the papers. The description of BERT, as I said, is that it is designed to pre-train bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; as a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering (the one we're doing just now) or language inference, without task-specific architecture modifications. That's exactly what I explained a second ago: BERT has been pre-trained against Wikipedia and the book corpus, which gives it a general understanding of how language works, and then you teach the BERT model how to do specific tasks.

Now, think about this for a second: downloading Wikipedia is gigs and gigs worth of data, a huge amount, and then it takes days and days of training on GPUs or TPUs to get that pre-trained model. That's where something like Hugging Face comes in, because that pre-trained model is available; you can just download it, use it as your own base model, and then train BERT to do other language tasks if you want, in this case question and answering.

The second important part is that BERT's pre-training is unsupervised, which is useful because there isn't a lot of annotated data on the internet, but there are loads and loads of articles, such as Wikipedia and books, that you can use; that's how it's been done. The other key thing is that it's a bidirectional system, and that's what made it revolutionary, because prior to BERT these systems were unidirectional. What do I mean by bidirectional? If we look at my screen, you can see the example "the man went to the [MASK] and he bought a [MASK] of milk". A unidirectional system would go left to right: it would look at "the man went to the" and then try to guess whatever was masked in that sentence. A bidirectional system such as BERT looks left to right and then right to left as well, so it can use the whole sentence, the whole sequence, to figure out what the masked tokens are. In this case you can look at the "milk" part at the end and reason that "he bought a" could be a gallon or a liter, and that makes it easier to figure out that the first mask is probably going to be "store" or "shop". That's how it works, and the BERT model, as you can probably guess from that example, in the mas…”
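The context-plus-question flow shown in the Hugging Face test harness (a block of text, a question, and an answer span with a confidence score) can be reproduced in a few lines of Python with the transformers library, which is also what the later Google Colab walkthrough would rely on. A minimal sketch, assuming transformers and a backend such as PyTorch are installed; the checkpoint name is an assumption based on the BERT-large whole-word-masking QA model discussed in the video:

```python
# Minimal sketch of the context + question flow from the Hugging Face test harness.
# Assumes `pip install transformers torch`; the checkpoint name is an assumption
# based on the BERT-large whole-word-masking QA model discussed in the video.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-cased-whole-word-masking-finetuned-squad",
)

context = "My name is Wolfgang and I live in Berlin."
result = qa(question="Where do I live?", context=context)

# The pipeline returns the answer span plus a confidence score,
# e.g. {'score': 0.97, 'start': 34, 'end': 40, 'answer': 'Berlin'}.
print(result["answer"], result["score"])

# Any other text (an article, a manual, an FAQ page) can be dropped in as context:
# result = qa(question="Who is the quarterback for the New York Giants?", context=article_text)
```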

– The video focuses on embedding AI into solutions from an open-source perspective.
– It explores using pre-trained AI models available on the internet, specifically question-answering models.
– The Hugging Face website is introduced as a hub for pre-trained models developed by companies like Google and Facebook.
– Different tasks are mentioned, including text summarization, text generation, text classification, and translation.
– The video focuses on the question and answer task and explores various models available on the Hugging Face model hub.
– The BERT-large cased whole-word-masking fine-tuned model is used as an example.
– A test harness is shown, where a context is provided, and questions can be asked to the model.
– An example using an article about the New York Giants is demonstrated, and the model accurately answers questions about the article.
– The concept of transfer learning is explained, where a general-purpose model (BERT) is pre-trained on a large corpus of data (Wikipedia and BookCorpus) and then fine-tuned for specific tasks like question answering.
– BERT’s bi-directional nature is highlighted, allowing it to understand the context and structure of sentences (see the fill-mask sketch after this list).
– Hugging Face provides access to pre-trained models, saving time and resources required for training.
– The video mentions the availability of more information and papers about BERT on the Google Research GitHub page.
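To make the bidirectionality point concrete: BERT's pre-training objective is masked-word prediction, where the model sees the words on both sides of a [MASK] position and uses the whole sentence to fill it in, which is how the "gallon of milk" clue in the video's example points it towards "store" or "shop". A minimal sketch using the transformers fill-mask pipeline; the bert-base-uncased checkpoint is an assumption chosen for illustration:

```python
# Minimal sketch of BERT's masked-word prediction (the "bought a [MASK] of milk" example).
# Assumes `pip install transformers torch`; bert-base-uncased is an assumed checkpoint.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# BERT conditions on context to the left AND right of the mask, so the phrase
# "a gallon of milk" later in the sentence pushes the prediction towards store/shop.
for prediction in fill("The man went to the [MASK] and bought a gallon of milk."):
    print(prediction["token_str"], round(prediction["score"], 3))
```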

Timeline:
– Introduce the focus on embedding AI into solutions from an open-source perspective.
– Discuss the availability of pre-trained AI models on the internet.
– Introduce the Hugging Face website and its model hub.
– Explore different tasks, highlighting the question and answer task.
– Demonstrate the BERT-large cased whole-word-masking fine-tuned model and the test harness.
– Show an example using an article about the New York Giants.
– Explain the concept of transfer learning and the use of BERT as a general-purpose pre-trained model (a loading sketch follows this timeline).
– Highlight BERT’s bi-directional nature and its training on large text corpora.
– Mention the availability of more information and papers about BERT on the Google Research GitHub page.
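As a closing illustration of the pre-train/fine-tune split described above: loading a question-answering model from a general-purpose BERT checkpoint reuses the pre-trained encoder and attaches a new, untrained answer-span head, and fine-tuning (for example on a Q&A dataset such as SQuAD) is what trains that head. A minimal sketch, assuming the transformers library is installed; bert-base-uncased is an assumed checkpoint:

```python
# Minimal sketch of the transfer-learning split: pre-trained encoder + new task head.
# Assumes `pip install transformers torch`; bert-base-uncased is an assumed checkpoint
# (pre-trained on Wikipedia and BookCorpus, as described in the video).
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

base = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(base)

# The tokenizer reflects the "understands sentence structure / how to tokenize" stage.
print(tokenizer.tokenize("My name is Wolfgang and I live in Berlin."))

# Loading a QA model from the general-purpose checkpoint keeps the pre-trained
# encoder weights and initializes a fresh question-answering output layer;
# transformers warns that this new head should be fine-tuned before use.
model = AutoModelForQuestionAnswering.from_pretrained(base)
```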