    • 0:00:00 Introduction
    • 0:01:15 Deep Fake
    • 0:01:40 Welcome
    • 0:04:14 Generative Artificial Intelligence
    • 0:07:41 ChatGPT
    • 0:12:53 Prompt Engineering
    • 0:14:56 CS50.ai
    • 0:21:18 AI
    • 0:24:16 Decision Trees
    • 0:25:59 Minimax
    • 0:34:27 Machine Learning
    • 0:44:37 Deep Learning
    • 0:50:23 Large Language Models
    • 0:54:25 Hallucinations
    • 0:00:00[MUSIC PLAYING]
    • 0:01:20TOM CRUISE: I'm going to show you some magic.
    • 0:01:24It's the real thing.
    • 0:01:27I mean, it's all the real thing.
    • 0:01:40DAVID MALAN: All right, this is CS50.
    • 0:01:43And this is our Family Weekend here at Harvard College.
    • 0:01:45So we have lots of parents and siblings and other relatives here in the group.
    • 0:01:48And this is meant to be a family friendly lecture
    • 0:01:51on Artificial Intelligence, or AI.
    • 0:01:53My name is David Malan.
    • 0:01:54I am your instructor today.
    • 0:01:55And in CS50 for some time, we have had this tradition
    • 0:02:00of giving every student in the class a rubber duck like this one here,
    • 0:02:04whereby the third or so week of the class we hand these out.
    • 0:02:07And the idea is that if students are struggling with some concept
    • 0:02:10or they have some bug or mistake in their code,
    • 0:02:13they are encouraged to literally talk to the rubber duck.
    • 0:02:16And in that process of verbalizing what confusion and questions
    • 0:02:19they're having, invariably, that proverbial light bulb tends to go off.
    • 0:02:22Now, some years ago, we actually virtualized this rubber duck
    • 0:02:25and implemented a software version of it, whereby students, for instance,
    • 0:02:29could open up a chat window in CS50's programming interface.
    • 0:02:32They could begin to converse with this here virtual duck--
    • 0:02:36"I'm hoping you can help me solve a problem."
    • 0:02:38And then randomly, once, twice, or three times, would this duck
    • 0:02:41quack back in response.
    • 0:02:43We have anecdotal evidence that this was sufficient,
    • 0:02:45educationally, to actually prompt the student
    • 0:02:47to figure out what was going wrong because they had already
    • 0:02:49verbalized their confusion.
    • 0:02:50But much to the surprise of some of these students' predecessors,
    • 0:02:54just over a year ago, the duck did literally,
    • 0:02:57overnight in spring of 2023, start talking back to them in English.
    • 0:03:02And that is all because underneath the hood now
    • 0:03:06is, indeed, some artificial intelligence.
    • 0:03:08So among our goals for today is to give you a taste of artificial intelligence
    • 0:03:12and, in turn, CS50 itself, but to also give you a sense of how this technology
    • 0:03:16itself works, because it's certainly not going anywhere.
    • 0:03:19It's only going to become all the more omnipresent, most likely.
    • 0:03:21So hopefully at the end of today's hour, you
    • 0:03:23will exit here all the more of a computer person.
    • 0:03:27But the talk of the town has been specifically
    • 0:03:30something called generative AI.
    • 0:03:32Like AI as a field of computer science has been with us for decades.
    • 0:03:35But it really has made exponential improvements in recent months
    • 0:03:39and in recent years.
    • 0:03:40But the focus of late has been on, indeed, generative
    • 0:03:43AI, whereby we're using some form of artificial intelligence
    • 0:03:46to generate stuff.
    • 0:03:47And that stuff might be text.
    • 0:03:49That stuff might be images.
    • 0:03:50That stuff might be video content and so much more in the years to come.
    • 0:03:54In fact, the Tom Cruise that you saw greet us just a moment ago was not,
    • 0:03:58in fact, the real Tom Cruise, but a so-called deepfake, which
    • 0:04:01we introduced today's class with, playfully, of course,
    • 0:04:04but there's actually serious implications, suffice it to say,
    • 0:04:07when it comes to information, disinformation in the world.
    • 0:04:10But for today, we'll focus really on the underlying technology and know-how.
    • 0:04:14So we want to make this as participatory, too, as we can.
    • 0:04:17And so over the past couple of years, lots
    • 0:04:19of publications, the New York Times among them,
    • 0:04:21have sort of tested people's ability to discern true from false, reality
    • 0:04:26from artificial intelligence.
    • 0:04:28And the New York Times, for instance, put together a sequence of images
    • 0:04:31that we thought we'd share with you.
    • 0:04:33I'm joined by CS50's preceptor here up on stage,
    • 0:04:35Julia who's going to help guide us through a sequence of multiple choice
    • 0:04:39problems, if you will.
    • 0:04:40The first of which is going to be this one here.
    • 0:04:42Two images-- one left, one right--
    • 0:04:45which was generated by AI, left or right?
    • 0:04:51And if Julia, you want to switch over to our special software,
    • 0:04:54we'll see the votes coming in.
    • 0:04:55Looks like 80% of you are voting at the moment for right.
    • 0:04:58The left is making some progress here.
    • 0:05:01And 4% or so are unsure.
    • 0:05:03I think that's pretty close to the right winning.
    • 0:05:06Let's see what the correct answer is, if Julia, we can switch back to this.
    • 0:05:09The answer was, indeed, the one on the right.
    • 0:05:11And why is that?
    • 0:05:12Someone who voted right, why did you say?
    • 0:05:14Feel free to just shout it out.
    • 0:05:16Yeah.
    • 0:05:17Why right?
    • 0:05:19AUDIENCE: It's more clear.
    • 0:05:20DAVID MALAN: OK, so it's more clear.
    • 0:05:21Seems a little more vivid, maybe a little too good.
    • 0:05:23OK, so a pretty good heuristic.
    • 0:05:25But still, 20% of you got that wrong.
    • 0:05:27So let's try one more.
    • 0:05:29Left or right, which was generated now by AI, left or right?
    • 0:05:35Which was generated by AI?
    • 0:05:38Let's toggle over and see what the responses are looking like.
    • 0:05:40This time it looks like you just switched.
    • 0:05:43Your most confident answer here, so 70%, just under,
    • 0:05:46are voting now for the person on the left, 30% person on the right.
    • 0:05:49About 5% still unsure.
    • 0:05:52If we toggle back to the two photographs.
    • 0:05:55Unfortunately, trick question, both of them were generated by AI.
    • 0:05:58So it's getting harder already, indeed.
    • 0:06:01And it's only going to get daresay impossible before long.
    • 0:06:03Well, let's turn our attention to text, because clearly that's
    • 0:06:06what underlies something like CS50's own rubber duck.
    • 0:06:09Here is a headline from the New York Times some months ago.
    • 0:06:11Did a fourth grader write this, or the new chat bot?
    • 0:06:14And a chat bot is just a piece of software
    • 0:06:16that converses with you textually and, invariably soon, via voice, as well.
    • 0:06:22This test is textual.
    • 0:06:23And I'll read it aloud.
    • 0:06:24Essay 1-- I like to bring a yummy sandwich and a cold juice box for lunch,
    • 0:06:28and sometimes I'll even pack a tasty piece of fruit
    • 0:06:30or a bag of crunchy chips.
    • 0:06:32As we eat, we chat and laugh and catch up on each other's day.
    • 0:06:35Essay 2-- my mother packs me a sandwich, a drink, fruit, and a treat.
    • 0:06:39When I get in the lunchroom, I find an empty table
    • 0:06:41and sit there and eat my lunch.
    • 0:06:43My friends come and sit down with me.
    • 0:06:45So one of those was written by a fourth grader.
    • 0:06:46One of those was written by AI.
    • 0:06:48Which was written by AI, essay 1 or essay 2?
    • 0:06:53Let's see the votes as they come in.
    • 0:06:56So similar percentage.
    • 0:06:58So maybe it's roughly the same group of people for each of these votes,
    • 0:07:01having switched last time and now stayed in their lane.
    • 0:07:03About 76% Essay 1, 23% Essay 2.
    • 0:07:07Fewer people are now unsure, so that's progress.
    • 0:07:102% to 3%, so still, indeed, progress.
    • 0:07:13Let's go back and take a look.
    • 0:07:14The answer in this case was essay 1 was the AI.
    • 0:07:18And here, too, it's not necessarily obvious.
    • 0:07:21But I'm not sure how many fourth graders say
    • 0:07:23they catch up on each other's day at school, for instance.
    • 0:07:27So this, too, though, is only going to become more of a challenge.
    • 0:07:30Thank you to CS50's preceptor, Julia, here as we--
    • 0:07:34and maybe round of applause for having choreographed that so perfectly.
    • 0:07:37Thank you, Julia.
    • 0:07:40So where do we begin?
    • 0:07:42So in CS50 in spring of 2023, we began to embrace artificial intelligence
    • 0:07:47in some form.
    • 0:07:48We were not sure quite how.
    • 0:07:49We weren't quite sure how well it would work.
    • 0:07:51This has all been very much an experiment.
    • 0:07:53But ChatGPT itself, as you might recall, only came out about 23 months ago
    • 0:07:58in November of 2022.
    • 0:08:00And how quickly the world seems to have changed already.
    • 0:08:02But at least in our educational context, this
    • 0:08:04is the premise from which we begin all of CS50's work and development with AI:
    • 0:08:09ChatGPT and Bing and Claude and tools like them,
    • 0:08:12they're just too helpful out of the box.
    • 0:08:14Educationally, as you yourselves might have experienced,
    • 0:08:16they're all too willing to just hand students answers
    • 0:08:18to problems, which is great if they just want to get that answer.
    • 0:08:22But if they want to learn the material, certainly if a teacher
    • 0:08:24wants to assess their understanding of the material, all too willing to just
    • 0:08:28hand out answers, rather than lead students to successful outcomes.
    • 0:08:31And so we actually put, almost two years ago,
    • 0:08:33in CS50's syllabus that it's not reasonable to use
    • 0:08:35ChatGPT or similar tools.
    • 0:08:37We can't prevent it technologically, but we
    • 0:08:39do communicate both ethically and through policy that it's, indeed,
    • 0:08:43not allowed, and thus not reasonable.
    • 0:08:45But we do think it reasonable to use CS50's own AI-based software, including
    • 0:08:49that virtual rubber duck in several different forms, only one
    • 0:08:53of which you've glimpsed so far, which, indeed,
    • 0:08:55is designed to lead students to solutions, but, indeed, not hand
    • 0:08:59them to them outright.
    • 0:09:00So we thought we'd share with you, then, a taste of how this duck is implemented,
    • 0:09:04but then, in turn, how artificial intelligence is making all of this work.
    • 0:09:08And here, for the more engineering-minded folks in the audience,
    • 0:09:11the computer persons, is an architectural diagram
    • 0:09:13of what the CS50 team has been building over the past couple of years,
    • 0:09:17including CS50.ai, which is like the central server that runs all of this.
    • 0:09:22It provides, of course, students with a very user friendly interface,
    • 0:09:25including a rubber duck.
    • 0:09:26But we also have a local vector database,
    • 0:09:28as they're called these days, where we actually, after every lecture,
    • 0:09:31convert, for better or for worse, everything that comes out of my mouth
    • 0:09:34to text, then run it through a database of ours so that it can then be searched
    • 0:09:38not only by students, but, in turn, this underlying AI.
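To make the idea of a vector database a bit more concrete, here is a toy, purely illustrative sketch in Python of the underlying trick: represent each stored caption as a vector, then return the text closest to a query. A bag-of-words vector stands in here for the learned embeddings a real system would fetch from an embeddings API, and the captions themselves are made up.

```python
# Hypothetical sketch of nearest-neighbor search over lecture captions.
# A real vector database would use learned embeddings, not word counts.
from collections import Counter
from math import sqrt


def embed(text):
    # Toy stand-in for a real embedding: a sparse vector of word counts
    return Counter(text.lower().split())


def cosine(u, v):
    # Cosine similarity between two sparse word-count vectors
    dot = sum(count * v[word] for word, count in u.items())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


captions = [
    "minimax is all about minimizing and maximizing a score",
    "a decision tree asks a yes or no question at each node",
    "reinforcement learning rewards good behavior and punishes bad behavior",
]

query = embed("how does minimax work")
print(max(captions, key=lambda caption: cosine(query, embed(caption))))
```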
    • 0:09:41And all of this is built on top of third party tools.
    • 0:09:44So we have not reinvented the wheel.
    • 0:09:46We have not built our own large language model,
    • 0:09:48as these things are called, as we'll soon see.
    • 0:09:50But we're building on top of things called APIs,
    • 0:09:53Application Programming Interfaces, which
    • 0:09:55are third party services that the OpenAI's, Google's, Microsofts,
    • 0:09:58and others provide so you can build your own tools, educational in nature,
    • 0:10:02in our case here.
    • 0:10:04Now, as for what this looks like, for instance, this is just one of the views.
    • 0:10:07Your own student or child can perhaps give you a better sense.
    • 0:10:10But this is the chat interface as students experience it
    • 0:10:13in CS50, similar in spirit to ChatGPT.
    • 0:10:15And indeed, for now, we do have disclaimers
    • 0:10:17that it's not going to be perfect.
    • 0:10:19There are things called hallucinations that we, on occasion, might suffer from,
    • 0:10:23as well.
    • 0:10:23More on that soon.
    • 0:10:24But here is a representative question that a student might ask.
    • 0:10:27Bless your hearts, but unfortunately, this is about as detailed
    • 0:10:30as the questions get sometimes.
    • 0:10:31My code is not working as expected.
    • 0:10:33Any ideas?
    • 0:10:34And so the duck, upon seeing not only that question, but the student's code,
    • 0:10:39actually, and with a wave of a hand I'll stipulate today, the duck
    • 0:10:42debugger or DDB, doesn't just answer the question outright,
    • 0:10:46but responds with something like this.
    • 0:10:48It looks like you're trying to add two integers,
    • 0:10:50but there's an issue with how you're handling the input.
    • 0:10:52What data type does input return, and how might that affect your addition?
    • 0:10:57So ideally, behaving like a good teacher, a good tutor,
    • 0:11:00and interactively having the conversation with the student.
    • 0:11:02Not always perfect, but pretty darn good already out of the box.
    • 0:11:06And surely, as industry progresses, it's only going to get better.
    • 0:11:09And, indeed, the conversations will only get
    • 0:11:10richer and more involved for students.
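As an aside, the duck's hint above maps onto a classic Python pitfall, which a hypothetical reconstruction of such a student's program makes plain:

```python
# The bug: input() returns a string, so + concatenates instead of adding
x = input("x: ")  # e.g., user types 1
y = input("y: ")  # e.g., user types 2
print(x + y)      # prints "12", not 3

# The fix the duck is nudging the student toward: convert to int first
x = int(input("x: "))
y = int(input("y: "))
print(x + y)      # now prints 3
```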
    • 0:11:13Now, besides our students here in Cambridge,
    • 0:11:15besides our students down the road in New Haven at Yale,
    • 0:11:18we actually have a very rich history of OpenCourseWare in CS50,
    • 0:11:21where everything we do curricularly and technologically,
    • 0:11:23is freely available to anyone around the world, teachers and students alike.
    • 0:11:27So to date, since spring of 2023, we have some 201,000 students and teachers
    • 0:11:33already using this here duck.
    • 0:11:34That's averaging 35,000 prompts or questions per day,
    • 0:11:38a total of 9.4 million as of this morning.
    • 0:11:40So not only are our own students here, but really, the world
    • 0:11:43is starting to embrace these tools, whether it's off the shelf,
    • 0:11:46like ChatGPT, or ducks like ours here.
    • 0:11:48So the overarching pedagogical goal, though,
    • 0:11:51and the utility, as our own students probably know by now,
    • 0:11:54is really to provide this--
    • 0:11:55students with 24/7 office hours, one-on-one,
    • 0:11:59or duck-on-one opportunities for help with the course's homework assignments
    • 0:12:03and more, and to approximate, really, what is the Holy Grail, I dare say,
    • 0:12:06educationally, a one-to-one student to teacher ratio,
    • 0:12:10which even at a place like Harvard or Yale,
    • 0:12:12where we have the luxury of lots of teaching fellows or teaching assistants,
    • 0:12:15many of whom are students themselves, we still might have in class ratios of 1
    • 0:12:18to 6, 1 to 10, 1 to 20, which is great.
    • 0:12:22But as you think about this, as we often do,
    • 0:12:24even if you have just six students in a room at office hours,
    • 0:12:27and that office hour is, indeed, an hour,
    • 0:12:29that's, like, 10 minutes per student, which, for the struggling student,
    • 0:12:32has never been historically enough time, necessarily,
    • 0:12:34for that in-person interaction.
    • 0:12:36And so with software now, we hope to continue to leverage
    • 0:12:39the exact same humans we have, but allocate those same resources ideally
    • 0:12:43to the students who need it and benefit from it most,
    • 0:12:45while allowing those students more comfortable
    • 0:12:47to, indeed, interact virtually if they prefer any time of the day with this
    • 0:12:52duck.
    • 0:12:52So as for what, then, is powering this duck and similar technologies underneath,
    • 0:13:00it is sort of a new term of art that you might
    • 0:13:02have heard in the real world, that of prompt engineering.
    • 0:13:05So we've got these AIs out there, ChatGPT among them.
    • 0:13:08And it's become a skill, sort of a LinkedIn thing,
    • 0:13:10daresay, for better or for worse, to know prompt engineering, which
    • 0:13:13essentially means how to ask good questions of an AI-- generally
    • 0:13:17in English, but really in any human language.
    • 0:13:19It's a little bit hackish.
    • 0:13:20This is not really an engineering skill as much
    • 0:13:22as it is just getting acclimated to what kinds of patterns of questions
    • 0:13:28tend to induce the AI to respond to you better.
    • 0:13:30It's like being good at Google searches.
    • 0:13:32This is not something that's probably going
    • 0:13:33to be a necessary skill before long, because the software is just
    • 0:13:35going to get better.
    • 0:13:36But in essence, what prompt engineering means
    • 0:13:38is that someone somewhere has at least written for the AI a system prompt,
    • 0:13:43a set of instructions, usually in English, that tell the AI how to behave,
    • 0:13:48what domain of information to focus on, and so forth.
    • 0:13:51So for instance, in the case of CS50, when we built our duck on top
    • 0:13:54of OpenAI's APIs, we literally tell OpenAI this in CS50's so-called system
    • 0:14:01prompt, quote unquote, "you are a friendly and supportive teaching
    • 0:14:04assistant for CS50," essentially coercing the underlying version
    • 0:14:08of ChatGPT to behave in a CS50-specific way.
    • 0:14:11But our second sentence in our so-called system prompt is this--
    • 0:14:15"you are also a rubber duck."
    • 0:14:16And that is enough to coerce some degree of quacking or other similarly
    • 0:14:20adorable behavior.
    • 0:14:21But we further go on to tell the AI, "answer student questions only about
    • 0:14:25CS50 in the field of computer science.
    • 0:14:26Do not answer questions about unrelated topics.
    • 0:14:29Do not provide full answers to problem sets,
    • 0:14:31as this would violate academic honesty."
    • 0:14:33And then in essence, we say, answer this question.
    • 0:14:36And we copy and paste whatever the student
    • 0:14:38has typed into the window their question, which
    • 0:14:40is generally known as a user prompt, much like you might type into ChatGPT.
    • 0:14:44So that not only does the underlying AI know
    • 0:14:47what the question is, it has this system prompt for additional context
    • 0:14:50so that it behaves unlike the default and more
    • 0:14:53like a pedagogically designed rubber duck, in our case.
    • 0:14:56In fact, let's see how we might implement this in code.
    • 0:14:59Let me go over to a program I've got running on my machine
    • 0:15:02here already, which students know as VS Code, Visual Studio
    • 0:15:06Code, which is the programming environment we and lots
    • 0:15:08of folks in industry use.
    • 0:15:09I'm going to run a command called code chat.py.
    • 0:15:13And I'm going to implement the simplest of chat bots
    • 0:15:15here live in a language called Python that we just learned a few days ago.
    • 0:15:19I'm going to go ahead and say message equals input, maybe
    • 0:15:25something like, what's your question, question mark?
    • 0:15:28And what this line of code is doing for the parents and siblings in the room,
    • 0:15:32this is what's called a variable, similar in spirit to mathematics,
    • 0:15:35like x, y, z.
    • 0:15:35But I'm using an English word instead.
    • 0:15:37Input is a function, like a verb, that will
    • 0:15:39tell the computer what to do for me, in this case, get input from the user.
    • 0:15:43And in quotes here, I have the prompt or the question
    • 0:15:46that I want the computer to ask of the human.
    • 0:15:49Then I'm going to go ahead and do this-- print, quote unquote, "quack, quack,
    • 0:15:54quack."
    • 0:15:54So in essence, this was version one of our rubber duck.
    • 0:15:57And if I run this program now with a command at the bottom of my screen,
    • 0:16:01Python of chat.py, and hit Enter, you'll
    • 0:16:03see that the cursor is now waiting for me to provide my user prompt,
    • 0:16:07if you will.
    • 0:16:08How about, what is AI, question mark?
    • 0:16:11And that was it for sort of version 1.
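For reference, the entirety of that first version, as just dictated, is only two lines:

```python
# chat.py, version 1: an unintelligent duck that quacks at any question
message = input("What's your question? ")
print("quack, quack, quack")
```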
    • 0:16:14But I dare say, if you'll humor me and let me just type somewhat quickly
    • 0:16:17a little more advanced code, that even CS50 students this past week
    • 0:16:20have not yet seen, I can actually turn this
    • 0:16:22into an artificially intelligent duck, as well.
    • 0:16:25So let me clear the bottom of my window and hide that for a moment.
    • 0:16:28And let me start doing a few things that I'm
    • 0:16:30going to wave my hand at to some extent, but I'll explain a few of the lines
    • 0:16:33as we go.
    • 0:16:34So I'm going to first import some functionality relating to the underlying
    • 0:16:37operating system.
    • 0:16:38I'm then going to import from dotenv, a dot-env library,
    • 0:16:43a function called load_dotenv, which is just
    • 0:16:45going to make it easier for me to talk to OpenAI without having
    • 0:16:48to log in and do a bunch of stuff that I did in advance of class already.
    • 0:16:52And I'm going to call that function, load_dotenv, right away.
    • 0:16:56After that, I'm going to import from a library I already
    • 0:16:59installed called OpenAI, which they make freely available to us and anyone else.
    • 0:17:03I'm going to import a feature called OpenAI, capital O, capital AI.
    • 0:17:08I'm then going to create another variable called client, which
    • 0:17:11refers to the software I'm writing.
    • 0:17:13And I'm going to set that equal to whatever that functionality, that
    • 0:17:17feature does for me, OpenAI's feature.
    • 0:17:20And I'm going to specify that the API key that I want to use
    • 0:17:23is equivalent to whatever is in my operating system's
    • 0:17:26environment in a special variable called API_KEY,
    • 0:17:30which, again, I configured before class, just saving myself
    • 0:17:33the trouble of logging in with a username and password here.
    • 0:17:36All right, what do I do next?
    • 0:17:37Let's first define a system prompt now and a third variable.
    • 0:17:41And that system prompt will be reminiscent of what we actually
    • 0:17:43do in CS50.
    • 0:17:44You are a friendly and supportive teaching assistant for CS50 period.
    • 0:17:51You are also a duck, period.
    • 0:17:55And that's it.
    • 0:17:56Now I'm going to go ahead and create a fourth variable called user prompt,
    • 0:18:02set that equal to the input function, which we saw briefly earlier.
    • 0:18:06And I'm just going to say, again, what's your question, question mark?
    • 0:18:10But now I'm going to do something whereby I'm talking to OpenAI's API,
    • 0:18:14passing to OpenAI my system prompt and my user prompt together
    • 0:18:18so that it behaves in the way I want.
    • 0:18:20I'm going to create another variable called chat_completion, set
    • 0:18:23that equal to this function, which is a mouthful,
    • 0:18:26client.chat.completions.create, open parentheses.
    • 0:18:31Then on a new line, just so it doesn't look messy on the screen,
    • 0:18:33I'm going to say that the messages I want to send to OpenAI
    • 0:18:36is this list of messages for students from this past week.
    • 0:18:40Messages is thus a named parameter, which is just a parameter to a function.
    • 0:18:44And the square brackets mean this is a list.
    • 0:18:46The list I'm going to provide has two things, two Python dictionaries,
    • 0:18:50sets of key value pairs.
    • 0:18:52The first is going to say a special keyword here role colon system.
    • 0:18:57And then the content for this role is going to be the system prompt variable
    • 0:19:02that I defined earlier.
    • 0:19:03So this is passing to OpenAI the system prompt.
    • 0:19:06Then I'm going to pass in one more of these things,
    • 0:19:08where the role this time is going to be that of quote unquote "user,"
    • 0:19:11then the content of that is going to be the user prompt,
    • 0:19:15which the human typed in.
    • 0:19:16After that, I'm going to specify one other named parameter, which
    • 0:19:21is to say the model I want to use is something called GPT-4o, which
    • 0:19:24is the latest and greatest version with which some of you
    • 0:19:26might be familiar in the real world.
    • 0:19:28I know it's a mouthful, but we're almost done.
    • 0:19:30Now, I'm going to go ahead and create a final variable, response_text,
    • 0:19:34to literally look at the text that comes back from OpenAI.
    • 0:19:37I'm going to set that equal to the chat_completion
    • 0:19:40variable's choices attribute, the first element therein,
    • 0:19:45starting at 0, the message therein, and the content thereof.
    • 0:19:50And then lastly, finally, I'm going to print out that response text.
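Assembled in one place, the program as dictated might look roughly like this. It is a sketch, assuming the openai and python-dotenv packages are installed and that an API_KEY has been stored in a .env file, as configured before class:

```python
# chat.py, version 2: a minimal AI-powered duck built on OpenAI's API
import os

from dotenv import load_dotenv
from openai import OpenAI

# Load the API key from .env, avoiding a manual login
load_dotenv()
client = OpenAI(api_key=os.environ["API_KEY"])

system_prompt = (
    "You are a friendly and supportive teaching assistant for CS50. "
    "You are also a duck."
)
user_prompt = input("What's your question? ")

# Pass both prompts to OpenAI, using the GPT-4o model
chat_completion = client.chat.completions.create(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
    model="gpt-4o",
)

response_text = chat_completion.choices[0].message.content
print(response_text)
```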
    • 0:19:55Now, I don't normally write this many lines of code all at once in class.
    • 0:19:59So I'm going to cross my fingers big time now, reopen my terminal window,
    • 0:20:02run Python of chat.py.
    • 0:20:05I'll increase the size of my terminal just so we focus on this.
    • 0:20:08Hopefully I made no typographical errors.
    • 0:20:12All right, let's ask again.
    • 0:20:13What is AI, question mark?
    • 0:20:15Enter.
    • 0:20:20OK, some of you knew that.
    • 0:20:22Thanks a lot.
    • 0:20:23OK, so response text.
    • 0:20:26The last line, too.
    • 0:20:27OK, all right.
    • 0:20:29So that's a bug, as we call it in programming.
    • 0:20:32Let's run this again, python of chat.py.
    • 0:20:35What is AI, question mark?
    • 0:20:41There we go.
    • 0:20:42AI, or Artificial Intelligence, refers to the simulation of human intelligence
    • 0:20:45in machines that are programmed to think and learn like humans, dot, dot,
    • 0:20:48dot, some other educational stuff, and quack
    • 0:20:51at the end, which was generated for us.
    • 0:20:54So thank you.
    • 0:20:56[APPLAUSE]
    • 0:20:59So suffice it to say, we've spent a lot more time-- the whole team of CS50
    • 0:21:03has spent a lot more time-- building the fancier version of chat.py
    • 0:21:06that is CS50's own duck.
    • 0:21:07But that is it in essence as to how people
    • 0:21:10like us are building on top of today's AIs
    • 0:21:13to build very specific applications that hopefully leverage them
    • 0:21:16for better instead of for worse.
    • 0:21:18But how did OpenAI make all of that possible?
    • 0:21:21How do these large language models, ChatGPT and, in turn, CS50's duck work?
    • 0:21:26Well, let's consider, ultimately, what has
    • 0:21:28been getting developed over the decades now underneath the hood that
    • 0:21:32defines what AI is.
    • 0:21:34But before we go in that direction, let me
    • 0:21:36propose that we look at not only generative artificial intelligence,
    • 0:21:42which is, again, the use of AI to generate content,
    • 0:21:45as we just did in text, but specifically,
    • 0:21:47artificial intelligence more generally.
    • 0:21:48So here's where we take a spin through the world of AI.
    • 0:21:52So AI's been with us for years, even though we only really started
    • 0:21:55talking about it every day.
    • 0:21:56In the past couple of years, any of us who've
    • 0:21:58been using Gmail or Outlook or the like, the spam filters
    • 0:22:01have been getting pretty darn good.
    • 0:22:02It's pretty rare, at least with most big mail programs nowadays,
    • 0:22:05where you have to manually deal with spam.
    • 0:22:07It's often going into your spam folder.
    • 0:22:09But there isn't some human who works at Google or Microsoft who's
    • 0:22:12looking at all of your email or even all of the email coming in and saying spam,
    • 0:22:16not spam, spam.
    • 0:22:17That would just not be feasible nowadays.
    • 0:22:19So somehow, AI is inferring, using some kind of techniques or algorithms-- step-
    • 0:22:24by-step instructions for solving problems-- what is spam and what is not.
    • 0:22:27It's not perfect.
    • 0:22:28But hopefully it's usually correct.
    • 0:22:30And indeed, that's the behavior we might want.
    • 0:22:32Handwriting recognition on tablets and phones and the like,
    • 0:22:34this has been using AI for years because no one at Microsoft or Google
    • 0:22:37knows what your specific handwriting looks like.
    • 0:22:40It looks similar, though, to other people that have trained the AI.
    • 0:22:44Watch history-- Netflix and other streaming services
    • 0:22:47are getting pretty darn good at recommending,
    • 0:22:49based on things you have watched and maybe upvoted or downvoted,
    • 0:22:52similar shows or movies that you might want to watch, as well.
    • 0:22:56That, too, has been AI.
    • 0:22:56And then, of course, all of these voice assistants, like Siri and Alexa
    • 0:22:59and Google Assistant, they don't know your voice specifically,
    • 0:23:02but it's pretty similar to other humans' voices.
    • 0:23:05And so they learn and figure out how to respond
    • 0:23:07to not only the Google and Microsoft employees, but to your and my voice,
    • 0:23:11as well.
    • 0:23:12And then, of course, we can actually go way back to things
    • 0:23:14like one of the very earliest arcade games, which some of you
    • 0:23:18might have played as a kid, this here being Pong.
    • 0:23:20It's sort of like a tennis game where two people
    • 0:23:22move the paddles up and down and bounce the ball back and forth.
    • 0:23:26So it turns out that games are a nice domain in which to start
    • 0:23:30talking about AI because, one, it's fun.
    • 0:23:33But two, they also tend to have very well defined rules and goals,
    • 0:23:36like maximize your score or minimize the other person's score.
    • 0:23:39In fact, here's another incarnation of really the same idea.
    • 0:23:42This was an arcade game that came out later called Breakout--
    • 0:23:44exists in many different flavors.
    • 0:23:46But in this case here, things are sort of flipped around.
    • 0:23:48There only needs to be one player.
    • 0:23:50And the idea is with this paddle, you bounce this ball against the bricks,
    • 0:23:53and the bricks disappear once you break them.
    • 0:23:56And therefore, the goal is to get rid of all of these bricks.
    • 0:23:58But even just based on this screenshot, odds are, all of us, as humans,
    • 0:24:02have an instinct for which way the paddle should move.
    • 0:24:05If the ball just left the paddle and went this way,
    • 0:24:08which way should the paddle be moved by the human, left or right?
    • 0:24:11I mean, obviously to the left if this thing
    • 0:24:13is going to bounce or reflect off of that first brick.
    • 0:24:16So there's a very well defined heuristic that even
    • 0:24:18we have ingrained in us already.
    • 0:24:20Maybe we can translate that to code and something a computer can ultimately do.
    • 0:24:23Well, the first way we'll try thinking about this
    • 0:24:26actually comes from the class before us, EC10, or decision trees,
    • 0:24:29or from strategic thinking more generally,
    • 0:24:32whereby you can actually draw a picture, like a tree in programming that we've
    • 0:24:35seen in week five of CS50, where you have some root node where you begin
    • 0:24:39and then all of these different children via which you
    • 0:24:41can decide yes or no, do this thing.
    • 0:24:44So a decision tree for something like the game we just saw
    • 0:24:47could be a drawing like this-- is the ball to the left of the paddle.
    • 0:24:51If so, then go ahead and move the paddle left, which
    • 0:24:55is what everyone's instinct here was.
    • 0:24:56But if the ball is not to the left of the paddle,
    • 0:24:58then you should ask a second question.
    • 0:25:00You shouldn't just blindly move to the right,
    • 0:25:02because there's actually three scenarios here, even if non-obvious.
    • 0:25:05Is the ball to the right of the paddle?
    • 0:25:07That, too, has a yes/no answer.
    • 0:25:08If yes, then obviously move the paddle to the right.
    • 0:25:11But if no, then you should probably don't move the paddle
    • 0:25:14and stay where you are.
    • 0:25:16So not very hard, but now you can imagine, especially if you're in CS50
    • 0:25:20or you're a computer person, we could probably
    • 0:25:22translate this to some kind of code in some language
    • 0:25:25because it's very much based on conditionals-- if, else if,
    • 0:25:28else, so to speak.
    • 0:25:29So what does that do for us?
    • 0:25:31Well, we could translate this to pseudocode, English-like code
    • 0:25:35that we might write in CS50.
    • 0:25:36While the game is ongoing, if the ball is to the left of the paddle,
    • 0:25:40then move the paddle left.
    • 0:25:41Else if the ball is to the right of the paddle--
    • 0:25:43that's a typo.
    • 0:25:44Second bug, sorry.
    • 0:25:45Else if the ball is to the right of the paddle, move the paddle right.
    • 0:25:48Else, don't move the paddle.
    • 0:25:51So there's a perfect translation from decision trees as a concept
    • 0:25:54and as a picture to the code that we've been writing over the past several weeks
    • 0:25:57in any number of languages.
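In Python, for instance, that translation might look like the following sketch, in which ball_x, paddle_x, and choose_move are hypothetical stand-ins for a real game's state and controls:

```python
# A sketch of the paddle's decision tree as code
def choose_move(ball_x, paddle_x):
    if ball_x < paddle_x:
        return "left"   # ball is to the left of the paddle
    elif ball_x > paddle_x:
        return "right"  # ball is to the right of the paddle
    else:
        return "stay"   # ball is aligned with the paddle


print(choose_move(ball_x=3, paddle_x=5))  # left
```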
    • 0:25:59But let's try a slightly more sophisticated game that most of us
    • 0:26:02probably grew up playing on like pieces of paper and napkins
    • 0:26:05and the like, that of Tic-Tac-Toe.
    • 0:26:07For those unfamiliar, it's a 3 by 3 grid.
    • 0:26:09You play X's or O's as two different people.
    • 0:26:12And the goal is to get three in a row, three X's, either horizontally,
    • 0:26:15vertically, or diagonally.
    • 0:26:16And whoever achieves that first wins, unless it's a tie.
    • 0:26:19Well, Tic-Tac-Toe lends itself to a pretty juicy discussion of decision
    • 0:26:23making because there's a lot of different ways
    • 0:26:26we could play even the simplest of these games.
    • 0:26:29So, for instance, if we are considering here this board,
    • 0:26:32where X and O have each gone once, let's consider what should happen next.
    • 0:26:37In particular, if we translate this into a decision tree, whoever's
    • 0:26:41turn it is should ask themselves, can I get three in a row on this turn?
    • 0:26:44Because if yes, well, then they should play in the square
    • 0:26:48to get three in a row, period.
    • 0:26:50But if they can't get three in a row on this turn,
    • 0:26:52they shouldn't just choose randomly.
    • 0:26:53They should probably answer another question for themselves like this.
    • 0:26:56Can my opponent get three in a row on their next turn?
    • 0:26:59Because if they can, then I want to block them
    • 0:27:03by playing in the square to block the opponent's three in a row.
    • 0:27:05But here's where things get interesting.
    • 0:27:08If the answer is no, where do I go?
    • 0:27:11Well, if there's only one space left, it's pretty obvious, go there.
    • 0:27:13If there's two left, which is better?
    • 0:27:15If there's three left, which of those is better?
    • 0:27:18And the further you rewind in the game, the less obvious
    • 0:27:20it gets where you, the human, should go.
    • 0:27:22That said, there is an algorithm for solving this problem optimally.
    • 0:27:26In fact, I'll disclose now, if you ever lose Tic-Tac-Toe,
    • 0:27:31you are objectively bad at Tic-Tac-Toe.
    • 0:27:34Because there is a strategy that won't guarantee that you'll win always.
    • 0:27:38But there is a strategy that will guarantee that you will never lose.
    • 0:27:41You can at least force a tie if you're not going to win.
    • 0:27:44So with that set up and sort of bubble burst,
    • 0:27:47perhaps, let's consider how we can go about answering the question mark
    • 0:27:50part of the tree, especially when the game starts super early like this.
    • 0:27:53Where does O go to play optimally?
    • 0:27:56Well, what we could do is this.
    • 0:27:58Recognize first that with this particular game,
    • 0:28:00we have these inputs and outputs that can be represented mathematically.
    • 0:28:03More on that in just a moment.
    • 0:28:04But the goal of Tic-Tac-Toe, then, can be
    • 0:28:06said to be to maximize maybe or minimize the score.
    • 0:28:09Maybe X wants the biggest score possible or O wants the smallest score possible,
    • 0:28:13for instance.
    • 0:28:14Specifically, let's do this.
    • 0:28:16Let's talk about an algorithm in computing called minimax.
    • 0:28:19And as the name implies, this is all about minimizing and maximizing
    • 0:28:22something, which is really what Tic-Tac-Toe, we can see, is all about.
    • 0:28:25For instance, here are two sample boards of Tic-Tac-Toe.
    • 0:28:28On the left is one in which O has won.
    • 0:28:31On the right is one in which X has won.
    • 0:28:32And then in the middle is one that's a tie.
    • 0:28:34It doesn't matter what numbers we use, but we need to agree.
    • 0:28:37So let me propose that if O wins, we're going
    • 0:28:39to call the board a negative 1 value.
    • 0:28:41If X wins, it's a positive 1.
    • 0:28:43And if no one wins, it's a 0.
    • 0:28:45As such, it stands to reason that X's goal in life
    • 0:28:48now is clearly to maximize its score.
    • 0:28:51And O's goal in life is to minimize its score.
    • 0:28:53We could flip the numbers around.
    • 0:28:54It doesn't matter.
    • 0:28:55But we've reduced a fairly fun childhood game now to boring mathematics,
    • 0:28:59if you will, but in a way that we can now reason about it
    • 0:29:02and quantify just how easy or hard this game is,
    • 0:29:06as each of X and O aspires to maximize or minimize their score, respectively.
    • 0:29:11So here, then, in the board, in the middle,
    • 0:29:13here, is one little sanity check.
    • 0:29:16What is the value of this board on the screen per my definition a second ago?
    • 0:29:19AUDIENCE: 1.
    • 0:29:20AUDIENCE: 0.
    • 0:29:21DAVID MALAN: I heard 1.
    • 0:29:22I heard 0.
    • 0:29:23[INTERPOSING VOICES]
    • 0:29:24DAVID MALAN: I heard minus 1.
    • 0:29:25Great.
    • 0:29:25We're seeing the range of answers.
    • 0:29:27The answer here is going to be 1, because I do see that x has won.
    • 0:29:31And I proposed earlier that if X has won, the value of the board is a 1.
    • 0:29:36So again, if X wins, it's a 1.
    • 0:29:38If O wins, it's a negative 1.
    • 0:29:39Those are the correct answers for Tic-Tac-Toe here, too.
    • 0:29:42And a tie means 0.
    • 0:29:44So let's back up one step.
    • 0:29:45Here is a board that's got only two moves left.
    • 0:29:47Suppose now that it's O's turn.
    • 0:29:49Where should O go?
    • 0:29:50Now, you might, as an expert Tic-Tac-Toe player
    • 0:29:53have an immediate instinct for this.
    • 0:29:54But let's break it down into some decisions.
    • 0:29:56So if it's O's turn, O can ask itself, well, what is the value of this board?
    • 0:30:00Because I want it to be negative 1.
    • 0:30:02Or barring that, I want it to be 0.
    • 0:30:04Well, what's it going to be?
    • 0:30:05Well, if O goes in the top hand corner, what's the value of this board?
    • 0:30:09We're not sure yet, because no one has won.
    • 0:30:11Well, if then x goes invariably in the bottom location there-- darn it.
    • 0:30:16Now X has won.
    • 0:30:17So the value of this board way down here is, by definition, 1.
    • 0:30:20There's only one way, logically, to get from this board to this board.
    • 0:30:23So we might as well, by transitivity, say that the value of this board
    • 0:30:26is 1, even though it hasn't been finished yet, because we
    • 0:30:29know where we're going with it.
    • 0:30:30So that, then, invites the question, is this board any better?
    • 0:30:33So if O goes bottom middle, what's the value?
    • 0:30:36Well, the only place X can go is top left.
    • 0:30:38And now the value of that bottom right board is actually better.
    • 0:30:41It's a 0 because no one has won.
    • 0:30:43Logically, the value of this board might as well
    • 0:30:45be considered a 0 as well, because that's the only way
    • 0:30:48to get from one to the other.
    • 0:30:50So now the decision for O is, do I want value 1, or do I want value 0?
    • 0:30:54O's goal, as I defined it, is to minimize its score.
    • 0:30:570 is less than 1.
    • 0:30:59So O had better go to--
    • 0:31:01sorry-- O had better go to, then, the bottom middle location.
    • 0:31:05And the value, therefore, of this board is ultimately going to be 0.
    • 0:31:08So this is to say if you play like O just did, you won't always win,
    • 0:31:13but you will never lose because you can choose
    • 0:31:15between the right paths in the tree.
    • 0:31:17Now, the problem is, even if you're on board with that algorithm,
    • 0:31:20let's go one step back such that there's not two places left, but three.
    • 0:31:24Unfortunately, the size of this decision tree essentially doubles.
    • 0:31:28In fact, it's an exponential relationship
    • 0:31:30because now there's 1, 2, 3 spaces in the board.
    • 0:31:32If we remove one of those and we go to four moves left, the tree doubles again.
    • 0:31:37Remove another, the tree doubles again.
    • 0:31:38And so the initial board of decisions that you
    • 0:31:41might need to consider for Tic-Tac-Toe actually gets really, really darn big,
    • 0:31:45more so than you might imagine.
    • 0:31:46In fact, in the world of tic-tac-toe, if we implement it exactly like this,
    • 0:31:51if the player is X, for each possible move-- that is in a loop in CS50
    • 0:31:56speak-- consider every possible move, calculate the score for the board,
    • 0:32:00and then choose the move with the highest score, just like I did.
    • 0:32:02So this is the pseudocode for what we just walked through verbally.
    • 0:32:05If the player is O, though, for each possible move,
    • 0:32:08calculate the score for the board.
    • 0:32:09And then choose the move with the lowest score.
    • 0:32:11So in other words, if both X and O are thinking as many steps
    • 0:32:15ahead as they can, they can either win or force a tie,
    • 0:32:19and I claim, never lose, according to this algorithm.
    • 0:32:21But here's the rub.
    • 0:32:23How many different ways are there to play Tic-Tac-Toe?
    • 0:32:26You might have bored of it years ago as a kid.
    • 0:32:28But you surely did not play all possible versions of Tic-Tac-Toe
    • 0:32:32with your brother or sister growing up, for instance.
    • 0:32:34Why?
    • 0:32:34There are 255,000 ways to play a 3 by 3 grid of Tic-Tac-Toe back and forth,
    • 0:32:41back and forth, which means that's a really big decision tree,
    • 0:32:44certainly for a kid, to keep in their mind,
    • 0:32:46let alone waste the time sort of figuring all of that out.
    • 0:32:49To be fair, computers, no problem, considering 255,000 different ways
    • 0:32:55to play a game.
    • 0:32:56They have lots of memory.
    • 0:32:57They have very fast processors nowadays.
    • 0:32:59Drop in the bucket for a computer.
    • 0:33:01Big deal for a human.
    • 0:33:02But what about other games that are more sophisticated than Tic-Tac-Toe?
    • 0:33:06Some of you might play chess.
    • 0:33:07And we'll keep chess simple.
    • 0:33:08If you consider only the first four pairs of moves--
    • 0:33:12so one player goes, then the other, then again, then again,
    • 0:33:15then again, so four pairs of moves.
    • 0:33:18How many ways are there for those two humans
    • 0:33:19to play the first four moves of chess?
    • 0:33:22Over 85 billion ways because of the various permutations on a normal chess
    • 0:33:26board.
    • 0:33:27If you're familiar with the game of Go, 266 quintillion ways to play that game.
    • 0:33:32There's no way, with our current computers,
    • 0:33:34that they can possibly think that many steps ahead and then make
    • 0:33:37a decision tree and optimal decision.
    • 0:33:40So even the IBM Watson's of the world, with which you might be familiar,
    • 0:33:43playing Jeopardy years ago and the like, they
    • 0:33:45were really doing their best to approximate correct answers.
    • 0:33:48But they were not taking all day long or all of our lifetimes to crunch
    • 0:33:52all of those numbers, to get through those numbers.
    • 0:33:55So here, really then, is the motivation for actual AI.
    • 0:33:57Everything we've talked about thus far the world's called AI,
    • 0:34:00or they called the computer, the CPU player.
    • 0:34:03But it was really just code written with ifs, else ifs, and else
    • 0:34:06to dictate how the ball moves, how the paddle moves, who goes where in Tic-Tac-Toe
    • 0:34:11and in what order, and the like.
    • 0:34:12It's all very deterministic.
    • 0:34:13But today, it's really about artificial intelligence
    • 0:34:16learning and figuring out on its own how to win a game when it can't possibly
    • 0:34:21have enough memory or enough time to figure out,
    • 0:34:24deterministically, the perfect answer.
    • 0:34:26So thus was born machine learning.
    • 0:34:28We're not writing code to tell computers exactly what to do,
    • 0:34:32but we're really writing code that tries to teach computers
    • 0:34:35how to figure out the solutions to problems,
    • 0:34:38even if we, ourselves, don't know the correct answer.
    • 0:34:41And so machine learning is, indeed, about
    • 0:34:43trying to get computers to learn from available data.
    • 0:34:48And, in fact, what we feed to them as input is training data of some sort.
    • 0:34:52And there's different ways we can train computers.
    • 0:34:54One way that we thought we'd introduce today is called reinforcement learning.
    • 0:34:57And it's actually relatively simple.
    • 0:34:59In fact, could I get one volunteer who's comfortable
    • 0:35:02coming up on stage on the internet?
    • 0:35:03OK, come on down.
    • 0:35:06Come on over here.
    • 0:35:10Maybe round of applause, because this is always awkward.
    • 0:35:13[APPLAUSE]
    • 0:35:16All right, we have no microphone today, so just talk near me.
    • 0:35:19What's your name?
    • 0:35:19AUDIENCE: Max.
    • 0:35:20DAVID MALAN: And do you want to say a little something about yourself?
    • 0:35:22AUDIENCE: Hi, I'm Max.
    • 0:35:23I'm a senior in high school.
    • 0:35:25I'm here for family weekend.
    • 0:35:27DAVID MALAN: Nice.
    • 0:35:27Well, welcome.
    • 0:35:27Come on over here.
    • 0:35:28And we're going to teach Max how to flip a pancake.
    • 0:35:32So we've got an actual pan here, and we've got a fake pancake here.
    • 0:35:35And what I'd like, Max, you to do is to figure out
    • 0:35:39how to flip this pancake up so it goes up and around, but stays in the pan.
    • 0:35:42And I will either reward or punish you, so
    • 0:35:45to speak, by saying good or good or bad each time.
    • 0:35:50That was actually very good.
    • 0:35:51OK, do more of that.
    • 0:35:54That was bad.
    • 0:35:55Do less of that.
    • 0:35:58Getting worse.
    • 0:36:00[LAUGHTER]
    • 0:36:06Didn't really flip.
    • 0:36:07One more try.
    • 0:36:09All right, it's round of applause.
    • 0:36:11Thank you to Max.
    • 0:36:12[APPLAUSE]
    • 0:36:13Here, come on in here.
    • 0:36:14We have a parting gift for you here, too, if you would like.
    • 0:36:17Thank you.
    • 0:36:17So the point here is that, even though Max sort of peaked early there
    • 0:36:21and did really well with the first one, I
    • 0:36:23was sort of rewarding and punishing Max by giving some kind of feedback,
    • 0:36:26like good or bad, or somehow reinforcing the positive behavior,
    • 0:36:30and not reinforcing the bad behavior, if you will.
    • 0:36:32So this is actually representative of what we mean by reinforcement learning.
    • 0:36:35And if you're a parent, you've done this in some form,
    • 0:36:37presumably, with your kids over time to get
    • 0:36:39them to do more good behavior and ideally less bad behavior.
    • 0:36:42But how might we do this, then, with code?
    • 0:36:44Well, here, for instance, is a visualization
    • 0:36:47of a researcher working in a lab with not Max this time, but an actual robot.
    • 0:36:52And we'll see over time that the more we reward and reinforce good behavior,
    • 0:36:57the better even a robot controlled by software can get over time
    • 0:37:01without being programmed to move up or down or left or right,
    • 0:37:05but just try movements and then do more of the good movements
    • 0:37:08and less of the bad movements.
    • 0:37:10So the human is going once just to show it some good movements.
    • 0:37:13But there's no code here in question.
    • 0:37:15There's no one way to flip a pancake correctly.
    • 0:37:17And so the first time does worse than Max.
    • 0:37:21The third time, still not so good.
    • 0:37:28Fifth time, not so good.
    • 0:37:30But it's trying different movements again and again.
    • 0:37:33That's 10 trials.
    • 0:37:37The human is now fixing things again, 11 trials.
    • 0:37:42No, getting a bit more air.
    • 0:37:4415 trials.
    • 0:37:47Still bad.
    • 0:37:48But if we start to fast forward to 20 trials--
    • 0:37:54[LAUGHTER]
    • 0:37:56Now, just another angle of the same.
    • 0:37:58So all of these movements can be broken down into x, y, and z movements.
    • 0:38:02So when we say do more of that, it can do more of the x, more of the y,
    • 0:38:05more of the z.
    • 0:38:06But it's really trying to infer.
    • 0:38:09And now it's sort of picking up what to do more of.
    • 0:38:12And it seems to be repeating the good behavior,
    • 0:38:15such that after 50 such trials, the robot
    • 0:38:18is able now to do this again and again and again.
    • 0:38:21So that, then, is reinforcement learning.
    • 0:38:24So, Max, you were well-taught growing up,
    • 0:38:25it would seem, for that particular exercise.
    • 0:38:27But let's consider now the implications of using reinforcement
    • 0:38:30learning in other contexts and see if this solves all problems for us.
    • 0:38:34Well, here's a very boring distillation of a game
    • 0:38:37that's like a maze, whereby this might be the player in yellow here.
    • 0:38:40The goal is to get to this green exit.
    • 0:38:42And then the rest of the floor may or may not
    • 0:38:44be lava, whereby there could be some red lava pits that the yellow dot does not
    • 0:38:47want to fall into.
    • 0:38:48So the best this player can do is really try randomly up, down, left, right.
    • 0:38:52And when it falls into a lava pit, do less of that.
    • 0:38:55But if it doesn't, do more of that.
    • 0:38:57So for instance, suppose that we know, with the bird's eye view
    • 0:39:01here, where the lava pits are.
    • 0:39:02Suppose that the yellow dot gets unlucky and trips into one of them first.
    • 0:39:06So now we say, don't do that.
    • 0:39:08And so it can use a little bit of memory as represented by this darker red line.
    • 0:39:12Let me not go there again.
    • 0:39:13That was a bad decision to make.
    • 0:39:15So now I have a new life.
    • 0:39:16Just try again.
    • 0:39:17So the yellow dot now tries this.
    • 0:39:18Maybe it tries this.
    • 0:39:20Maybe it then falls into the same lava pit,
    • 0:39:22not realizing, because it does not have the same bird's eye view as us,
    • 0:39:25that it fell into a lava pit again.
    • 0:39:27So let's remember with a bit of memory or RAM, do less of that.
    • 0:39:31Again, again, again, lava pit.
    • 0:39:34That's OK.
    • 0:39:34Let's use a bit more memory to reinforce that bad behavior negatively.
    • 0:39:38Again, again, again.
    • 0:39:39OK, bad, but we're making progress.
    • 0:39:42Again, again, again.
    • 0:39:44And now I think the yellow dot, just by luck,
    • 0:39:46might find its way to the green exit.
    • 0:39:49And so this is a success.
    • 0:39:50We've now won the game.
    • 0:39:52But we haven't necessarily maximized our score.
    • 0:39:54Why?
    • 0:39:55That was a correct solution, but--
    • 0:39:57AUDIENCE: It could get there in much less moves.
    • 0:39:58DAVID MALAN: Yeah.
    • 0:39:59It could get there in fewer moves by going more like a straight line.
    • 0:40:01Even though it still has to go up, down, left, right, it
    • 0:40:03didn't really need to take this circuitous route.
    • 0:40:06But the problem is if that you only reinforce the good behavior
    • 0:40:09and then you stick to your guns, you may never
    • 0:40:11maximize your score by just following the path with which you're
    • 0:40:14most familiar.
    • 0:40:15And so there's this principle actually in computing
    • 0:40:18whereby ideally, this thing would know that,
    • 0:40:21yes, this is a correct solution, as per the green recollections.
    • 0:40:25But what if we start exploring a little bit nonetheless,
    • 0:40:28whereby each time I play this game, even if I
    • 0:40:31know how I can win, let me just, with 5%, 10% probability,
    • 0:40:36try a different path.
    • 0:40:37This is something you can actually practice in the real world.
    • 0:40:39As soon as I learned about this principle in computing,
    • 0:40:42I realized that this explains my own behavior and restaurants, whereby
    • 0:40:45if I go to a restaurant for the first time, I choose something off the menu
    • 0:40:48and I really like it, I will never try anything else off
    • 0:40:51of that restaurant's menu because I liked it well enough.
    • 0:40:54But who knows if there's an even better dish on that menu.
    • 0:40:57Problem is, I tend to, in the real world, exploit knowledge I already have.
    • 0:41:01I really reinforce that first process of learning.
    • 0:41:03But I rarely explore.
    • 0:41:05But maybe we can find better solutions to problems
    • 0:41:07by just exploring a little bit.
    • 0:41:09Maybe we'll fail sometimes, but maybe we'll get lucky, too.
    • 0:41:12And so here in pseudocode is how we might distill this idea.
    • 0:41:15Let's choose some epsilon value, just a variable set to 10%, whatever it is,
    • 0:41:19to sprinkle a little bit of randomness in here.
    • 0:41:21And then if a random number we choose is less
    • 0:41:23than that value, which will not happen often if it's so small,
    • 0:41:26make a random move.
    • 0:41:27Instead of going right and following the path already traveled, go up this time
    • 0:41:31and see what happens.
    • 0:41:32Else, make the move with the highest value, so to speak.
    • 0:41:36So sometimes you will fall into another lava pit.
    • 0:41:39But again, if you do this again and again and again over time,
    • 0:41:41probabilistically, you might, in fact, find a better path.
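Distilled into a few lines of Python, that pseudocode might look like the following sketch, in which the table of move values and the move names are hypothetical stand-ins for whatever the game provides:

```python
import random

EPSILON = 0.1  # explore with 10% probability


def choose_move(values, moves):
    if random.random() < EPSILON:
        return random.choice(moves)             # explore: make a random move
    return max(moves, key=lambda m: values[m])  # exploit: best-known move


# Hypothetical values learned so far for each move
values = {"up": 0.2, "down": -0.5, "left": 0.1, "right": 0.7}
print(choose_move(values, list(values)))  # usually "right", occasionally random
```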
    • 0:41:44And if you let your mind wander for a moment
    • 0:41:46and consider why tools like ChatGPT are wrong sometimes,
    • 0:41:49maybe they're doing a little bit of exploration, just for me.
    • 0:41:53And darn it, they gave me a wrong answer as a result.
    • 0:41:56You can think about it being a little bit like that in the real world.
    • 0:42:00And so now if we try again, sprinkling in a bit of randomness,
    • 0:42:03I might very well find a path that, ah, as you noted,
    • 0:42:06gets me to the green exit all the more quickly.
    • 0:42:09So we can still reinforce good behaviors and punish bad behaviors.
    • 0:42:12But by sprinkling in a little bit of randomness in there,
    • 0:42:16we can instead ensure that, maybe over time,
    • 0:42:19we will find an even better solution.
    • 0:42:21Now, we can see this in other contexts, as well.
    • 0:42:23If we revisit Breakout, let me go back now to a video version
    • 0:42:27thereof, whereby you might think that over time, the best way
    • 0:42:30to play Breakout is, again, to move the paddle left and right, very
    • 0:42:33deterministically, like we proposed earlier with the screenshot.
    • 0:42:36And you will just gradually work your way through the blue.
    • 0:42:39Then you can work your way through the green.
    • 0:42:40Then you can work your way through the yellow, and so forth.
    • 0:42:43But this video shows a computer learning, via reinforcement learning, what
    • 0:42:49to do more of.
    • 0:42:50So somewhere there's a human probably giving it
    • 0:42:53feedback that was good, do more of that, or maybe don't do that.
    • 0:42:56Or it can be baked into the score based on the color of these bricks.
    • 0:42:59And I dare say if we give more points to the top bricks and fewer points
    • 0:43:02to the bottom, that's equivalent to rewarding the best strategy
    • 0:43:05and maybe punishing the worst strategy, because you really want
    • 0:43:08to get to those higher bricks first.
    • 0:43:10But here, to the surprise of the researcher, if you will,
    • 0:43:14the AI, a little creepily, finds out that the best strategy
    • 0:43:17is to let the game play itself.
    • 0:43:19And you can perhaps see where this is going.
    • 0:43:21Now it's sort of in hands-off mode.
    • 0:43:23It's getting the highest score all on its own.
    • 0:43:25And it only would have done that if maybe it tried a few things randomly,
    • 0:43:28like, oh my God, I found a better strategy now than just mindlessly
    • 0:43:32going back and forth.
    • 0:43:34And so this exists in so many different domains,
    • 0:43:36not just restaurants, but games, the world of these large language
    • 0:43:40models, and more.
    • 0:43:41But what we've seen thus far is that these are really examples
    • 0:43:43of reinforcement learning.
    • 0:43:46There's got to be some point system associated with this, maybe
    • 0:43:49a human supervising the whole process.
    • 0:43:51And indeed, in the context of learning, any time
    • 0:43:54you have a human providing feedback, whether it's
    • 0:43:56the fellow in the video of the pancakes giving feedback to the robot or someone
    • 0:44:01kind of working behind the scenes at Google
    • 0:44:03initially trying to teach the servers what is spam and what is not,
    • 0:44:07the catch with supervised learning is that there's only one of that guy
    • 0:44:09and there's only a finite number of humans working at Google doing this.
    • 0:44:13And once the data exceeds what a human can do or wants to do,
    • 0:44:17we probably need to transition from supervised learning to unsupervised,
    • 0:44:22where we still write the code that ideally teaches the machines how
    • 0:44:25to learn, but we don't have to tell it constantly what is good, what is bad,
    • 0:44:29what is correct, what is incorrect.
    • 0:44:30Heck, let's let the software figure that out, too,
    • 0:44:33and take us out, for better or for worse, of the picture altogether.
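As a sketch of what taking the human out of the picture can look like, here is one classic unsupervised technique, k-means clustering, using scikit-learn and made-up points; nobody labels anything, and the algorithm finds the groupings on its own:

```python
from sklearn.cluster import KMeans

# Made-up, unlabeled data points; note that no human says which group is which.
points = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # one natural cluster
          [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]]   # another natural cluster

# Ask for two clusters; the algorithm discovers them without supervision.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(model.labels_)  # e.g., [0, 0, 0, 1, 1, 1] -- groupings found on its own
```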
    • 0:44:37So what we're really transitioning to as a society now, if you will,
    • 0:44:40is something called deep learning, which goes beyond the reinforcement
    • 0:44:43learning, the supervised learning, and the unsupervised learning
    • 0:44:46that we just saw.
    • 0:44:47But deep learning is often grounded in what we
    • 0:44:50might call these here neural networks.
    • 0:44:52And a neural network is really inspired by real world biology, whereby
    • 0:44:57governing our human nervous system are all these little neurons that
    • 0:44:59have some kind of physical connections to each other.
    • 0:45:02And somehow there are electrical signals traveling through us such
    • 0:45:04that if I think a thought, that's how I know to stick out my hand
    • 0:45:07or shake someone's hand or the like.
    • 0:45:09There's some kind of control system going on here.
    • 0:45:12So here's a picture, a rough picture of what a neuron might look like.
    • 0:45:16Here's a pair of them being close enough to somehow communicate.
    • 0:45:20Being computer scientists, we're going to abstract this away and really just
    • 0:45:23think of each such neuron as a circle.
    • 0:45:25And if they have a means of communicating,
    • 0:45:28we're going to draw an edge between them,
    • 0:45:29turning these into really mathematical graphs.
    • 0:45:31So these are nodes in our CS50 speak.
    • 0:45:33And these are edges, in this case.
    • 0:45:35But it's still the same idea.
    • 0:45:36There's something communicating with something.
    • 0:45:38And heck, we can think of this as maybe the input and this,
    • 0:45:42now, as maybe the output.
    • 0:45:43We can really map this biological world to the computing world.
    • 0:45:46Suppose you have a world of two inputs, though.
    • 0:45:48That's fine.
    • 0:45:49Maybe based on this input and this input,
    • 0:45:51this here output can give us some answer to some question.
    • 0:45:54Well, what does this mean?
    • 0:45:55Well, let's make it more real.
    • 0:45:56Let's shrink that down and move it to the left.
    • 0:45:58Let's come up with a Cartesian plane here with x and y-coordinates.
    • 0:46:02And let's assume that in this world there exist dots.
    • 0:46:04And those dots are either blue or they are red.
    • 0:46:07And it would be nice, in this world, to be able to predict,
    • 0:46:10based on the xy coordinates of some dot, if it is going to be blue or red.
    • 0:46:14So you can imagine this representing other types of questions
    • 0:46:17to which we might want answers.
    • 0:46:18So here's our y-axis.
    • 0:46:20Here's our x-axis.
    • 0:46:21If I only have a limited amount of training data, though, two dots--
    • 0:46:24one blue, one red--
    • 0:46:25the best I can really do is guess at what
    • 0:46:28defines red versus blue in this world.
    • 0:46:30So maybe I can think of this neuron as representing
    • 0:46:34x, this neuron as representing y.
    • 0:46:36And the output I want is an answer--
    • 0:46:38red or blue dot based on x comma y.
    • 0:46:41Well, how do I come up with this?
    • 0:46:42Well, best guess, maybe it's just a straight line.
    • 0:46:45So maybe everything to the left of this line is going to be blue in this world.
    • 0:46:48Everything to the right is going to be red.
    • 0:46:50So what I'm really trying to figure out and learn
    • 0:46:53here is, what is the best-fit line that somehow separates the red from the blue?
    • 0:46:58Well, what I'm really trying to do then over time is adjust this.
    • 0:47:01The more training data I get, the more actual dots in the real world,
    • 0:47:05I might need to modify my best guess here.
    • 0:47:08And so blue is now over here.
    • 0:47:10So I think I want to maybe tilt this line
    • 0:47:12and come up with a different slope for it.
    • 0:47:14And if you give me more and more dots, I bet I can do better and better
    • 0:47:17and refine the line.
    • 0:47:18Might not be perfect, but this is pretty darn good.
    • 0:47:21I got most of the blue at left, and most of the red at right.
    • 0:47:24And frankly, if I want to really get it correct,
    • 0:47:26you're going to have to let me do more than just a straight line.
    • 0:47:28I'm going to need to somehow use some quadratics or something
    • 0:47:31to have curly lines in there to capture this perfectly.
    • 0:47:34But if you see where this might be going, if I've got x's and y's, I'm
    • 0:47:37trying to find the best fit line.
    • 0:47:39And what I'm trying to do is think of this as representing a neural network.
    • 0:47:43What I'm really trying to do is something like this.
    • 0:47:46Can I come up with, mathematically, a value for A and a value for B
    • 0:47:50that gives me an answer C that is either red or blue?
    • 0:47:54Yes or no, if you will?
    • 0:47:55Well, how might I do that?
    • 0:47:57Well, again, not to dwell too much on grade school math
    • 0:47:59here or high school math, but ax plus by plus c is a line of some sort.
    • 0:48:04And maybe we can just arbitrarily say that if you give me an x and a y value
    • 0:48:08and you've given me enough training data to figure out what A should be
    • 0:48:11and B should be and C should be, if the answer to that mathematical expression
    • 0:48:14is, say, greater than 0, I'm going to say the dot should be blue.
    • 0:48:17And if it's less than or equal to 0, I'm going to say it should be red instead.
    • 0:48:22It doesn't matter.
    • 0:48:22It's just like Tic-Tac-Toe.
    • 0:48:23We just have to agree how to represent these games or these questions
    • 0:48:26mathematically to which we can then get a Boolean answer, yes or no,
    • 0:48:31red or blue.
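Here is a minimal sketch in Python of that convention, together with a crude training loop in the spirit of the classic perceptron (not necessarily the lecture's exact method) that tilts the line a little whenever it misclassifies one of some made-up training dots:

```python
# One artificial neuron: classify (x, y) by which side of the line
# ax + by + c = 0 it falls on.
a, b, c = 1.0, 1.0, 0.0   # initial guess; training will adjust these
LEARNING_RATE = 0.1

def classify(x, y):
    return "blue" if a * x + b * y + c > 0 else "red"

# Made-up training dots: blue tends to sit at left, red at right.
dots = [(1.0, 2.0, "blue"), (4.0, 1.0, "red"),
        (0.5, 3.0, "blue"), (5.0, 2.0, "red")]

# Perceptron-style updates: whenever the line is wrong about a dot,
# tilt its coefficients a little toward the correct answer.
for _ in range(100):
    for x, y, color in dots:
        if classify(x, y) != color:
            sign = 1 if color == "blue" else -1
            a += LEARNING_RATE * sign * x
            b += LEARNING_RATE * sign * y
            c += LEARNING_RATE * sign

print(classify(0.8, 2.5))  # blue, once the line has been learned
print(classify(4.5, 1.5))  # red
```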
    • 0:48:32And so what these neural networks are really trying to do
    • 0:48:34is based on lots and lots of data, just plug in a whole bunch
    • 0:48:38of parameters, coefficients, if you will, of those x's and y's
    • 0:48:42so that when you pass in inputs like these,
    • 0:48:44you get back a correct answer over here.
    • 0:48:47And what's funny about neural networks, at least
    • 0:48:49right now among computing circles, is that even
    • 0:48:52the best engineers at Google, Microsoft, and OpenAI
    • 0:48:55who are using neural networks to, given your input or your question,
    • 0:49:00produce an answer thereto, a la ChatGPT, even though there's millions,
    • 0:49:05if not billions of numbers involved underneath the hood,
    • 0:49:08no computer scientist, even the head engineer, can point at this
    • 0:49:11and say, well, this node represents such and such.
    • 0:49:14And this edge is this value because of this reason.
    • 0:49:17It's all sort of a black box in the sense of abstraction.
    • 0:49:20And so because we're just throwing lots and lots of data at these things,
    • 0:49:24the computer is figuring out what all of these interconnections
    • 0:49:26should be mathematically.
    • 0:49:28And what we're really just trying to do probabilistically
    • 0:49:30is, with high confidence, spit out the right answer.
    • 0:49:34But even we humans don't really know how these work
    • 0:49:37step by step underneath the hood.
    • 0:49:39And therein lies this idea of machine learning.
    • 0:49:43So with that said, how might we apply this to some real world domains?
    • 0:49:46Well, maybe you're in meteorology.
    • 0:49:48And so given a humidity level and pressure,
    • 0:49:49the goal is to output a rainfall prediction.
    • 0:49:52Maybe you can do that with a neural network
    • 0:49:54by feeding in your computer with a whole bunch of sample humidity, sample
    • 0:49:58pressure values, and historical rainfall amounts and just
    • 0:50:01let it figure out how to represent that kind of pattern.
    • 0:50:05Alternatively, in the world of advertising,
    • 0:50:08if you know how much you spent in a month and what month that
    • 0:50:10is, I bet if you give me enough historical data,
    • 0:50:13I can crunch those numbers somehow and predict
    • 0:50:16what your sales are going to be based on that data-- not 100% correctly,
    • 0:50:19but probably confidently correctly most of the time.
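Here is a rough sketch of the rainfall example using scikit-learn, with entirely made-up humidity, pressure, and rainfall numbers just to show the shape of the workflow:

```python
from sklearn.neural_network import MLPRegressor

# Entirely made-up historical data:
# (humidity %, pressure in millibars) paired with rainfall in mm.
X = [[90, 1005], [85, 1008], [60, 1015], [40, 1020], [95, 1000], [30, 1025]]
y = [12.0, 8.0, 1.0, 0.0, 20.0, 0.0]

# A small neural network; we never tell it the pattern, it finds one itself.
model = MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs",
                     max_iter=5000, random_state=0)
model.fit(X, y)

# Predict rainfall for a humid, low-pressure day.
print(model.predict([[88, 1003]]))  # hopefully a rainy-ish number
```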
    • 0:50:23Well, what we have, then, in ChatGPT and what we have,
    • 0:50:25then, in CS50's duck is what's called a Large Language Model,
    • 0:50:29whereby the inputs to these neural networks
    • 0:50:31have been like all of the content of the internet,
    • 0:50:34so think Google search results and Reddit and Stack Overflow, dictionaries
    • 0:50:38and encyclopedias and any such works that it's just been consuming as input.
    • 0:50:42And what these large language models are trying
    • 0:50:44to do is figure out, based on patterns and frequencies of text
    • 0:50:49in all of those inputs, well, if someone asks me, how are you,
    • 0:50:53question mark, probabilistically, based on all of this data,
    • 0:50:56I bet 99% of the time I, ChatGPT or CS50's duck, am supposed to reply,
    • 0:51:01good, thanks.
    • 0:51:02How are you?
    • 0:51:03So not always the correct answer, but probabilistically.
    • 0:51:06And that's why ChatGPT is sometimes wrong.
    • 0:51:08Because there might be misinformation on the internet.
    • 0:51:10Maybe there's a bit of exploration sprinkled in randomly.
    • 0:51:13But it's not always going to give you the right answer.
    • 0:51:16But probabilistically, it's going to.
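To get a feel for that probabilistic idea, here is a toy next-word model, nothing remotely close to a real large language model, that just counts which word tends to follow which in a made-up scrap of text:

```python
import random
from collections import defaultdict

# A tiny made-up "corpus," standing in for the internet's worth of text
# that a real model consumes.
corpus = ("how are you . good thanks how are you . "
          "good thanks how are you . fine thanks").split()

# Count, for each word, which words have followed it.
following = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word].append(next_word)

# Generate a reply by repeatedly sampling a likely next word.
word, reply = ".", []
for _ in range(3):
    word = random.choice(following[word])  # sampled by observed frequency
    reply.append(word)
print(" ".join(reply))  # usually something like "good thanks how"
```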
    • 0:51:17And this stuff is truly hot off the presses.
    • 0:51:20In 2017, folks at Google proposed what is generally now called
    • 0:51:25attention, which is a feature that underlies AI whereby you can figure out
    • 0:51:29dynamically what the relationship is between words in an English paragraph
    • 0:51:33or an English text or really any human language.
    • 0:51:36And giving weight to words in that way has actually
    • 0:51:38fed a lot of these new large language models.
    • 0:51:40In 2020, OpenAI published its GPT model.
    • 0:51:44And most recently, in 2022, did ChatGPT itself come out.
    • 0:51:47And what underlies what we're talking about here is technically,
    • 0:51:50this big mouthful-- generative pre-trained transformers,
    • 0:51:53whereby the purpose of these AIs is to generate stuff.
    • 0:51:56They've been pre-trained on lots of publicly available data.
    • 0:51:59And the goal is to transform the user's input into ideally correct output.
    • 0:52:04And if you see where I'm going with this,
    • 0:52:05that's the GPT in ChatGPT, which itself was never
    • 0:52:09meant to be like a branded product.
    • 0:52:11It's a little weird that GPT has entered the human vernacular.
    • 0:52:14But what it does is evoke exactly these ideas.
    • 0:52:19So here's a sample paragraph, for instance.
    • 0:52:21Massachusetts is a state in the New England region of the Northeastern
    • 0:52:24United States.
    • 0:52:25It borders on the Atlantic Ocean to the East.
    • 0:52:27The state's capital is, dot, dot, dot, essentially
    • 0:52:29inviting us to answer now this question.
    • 0:52:31Well, historically, prior to 2017-ish, it was actually pretty hard for machines
    • 0:52:36to learn that this mention of Massachusetts is actually related
    • 0:52:40to this mention of state.
    • 0:52:42Why?
    • 0:52:42Because they're pretty far apart.
    • 0:52:44This is in a whole new sentence.
    • 0:52:45And unless it knows already what Massachusetts is--
    • 0:52:48and technically, it's a Commonwealth-- it might not
    • 0:52:50give much attention to these two words, too much weight to the relationship
    • 0:52:53thereof.
    • 0:52:54But if you train these GPTs on enough data
    • 0:52:56and you start to break down the input into sequences of words, for instance,
    • 0:53:01well, you might have an array, or a list of words here, in CS50 speak.
    • 0:53:05You might be able to figure out, based on your training data,
    • 0:53:08that if you number these words from 1 to 27 or whatnot, in this case,
    • 0:53:12you could represent them mathematically somehow.
    • 0:53:14As an aside, the way that these large language models are representing words
    • 0:53:19like Massachusetts literally is with numbers like this.
    • 0:53:22This is 1,536 floating point values in a vector, a.k.a. list or array,
    • 0:53:27that literally represents the word Massachusetts,
    • 0:53:30according to one of these algorithms.
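To make that more tangible, here is a sketch with made-up 4-dimensional vectors standing in for the real 1,536-dimensional ones, showing how two related words end up with similar vectors, as measured here by cosine similarity:

```python
import math

# Made-up 4-dimensional "embeddings," standing in for real 1,536-dimensional ones.
embeddings = {
    "Massachusetts": [0.9, 0.1, 0.3, 0.0],
    "state":         [0.8, 0.2, 0.4, 0.1],
    "banana":        [0.0, 0.9, 0.0, 0.8],
}

def cosine_similarity(u, v):
    """How similarly two word vectors point: 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity(embeddings["Massachusetts"], embeddings["state"]))   # high
print(cosine_similarity(embeddings["Massachusetts"], embeddings["banana"]))  # low
```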
    • 0:53:32Let's take a step back and abstract it away as little rectangles instead
    • 0:53:35and use these little edges to imply that if there's a bolder edge here,
    • 0:53:38that implies that there's really a relationship in the training
    • 0:53:41data between Massachusetts and state.
    • 0:53:43One of those words is giving more attention to the other, as opposed to "is,"
    • 0:53:46which is maybe a thin line because there's not much
    • 0:53:48going on there between Massachusetts and "is," as opposed to those two nouns,
    • 0:53:52in that case.
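Here is a sketch of that attention computation in Python with NumPy, reusing made-up 4-dimensional word vectors; real models learn separate query, key, and value projections, but the spirit is the same: compare every word against every other, and turn the scores into weights.

```python
import numpy as np

# Made-up 4-dimensional vectors for three of the sentence's words.
words = ["Massachusetts", "is", "state"]
vectors = np.array([
    [0.9, 0.1, 0.3, 0.0],   # Massachusetts
    [0.1, 0.0, 0.1, 0.1],   # is
    [0.8, 0.2, 0.4, 0.1],   # state
])

# Scaled dot-product attention scores, with the same vectors serving as
# queries and keys for simplicity.
scores = vectors @ vectors.T / np.sqrt(vectors.shape[1])
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax

# Row 0 shows how much attention "Massachusetts" pays to each word;
# expect a bolder edge to "state" than to "is."
print(dict(zip(words, weights[0].round(2))))
```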
    • 0:53:53All of this input, all of these vectors, are
    • 0:53:55fed into large neural networks that have lots and lots of inputs, far more than 1
    • 0:54:00and 2 and 3, the output of which, ideally, then,
    • 0:54:02is the answer to this question, or a whole answer to your question.
    • 0:54:07And so when you ask the duck a question, you
    • 0:54:09ask ChatGPT the question, essentially the software
    • 0:54:12is navigating this neural network, trying to find the best path through it
    • 0:54:16to give you the most correct answer.
    • 0:54:19And ideally, it's going to get it correct.
    • 0:54:21But it might not necessarily do it every such time.
    • 0:54:25And so, in fact, there are these things-- and we'll end where we began--
    • 0:54:28known as hallucinations where sometimes ChatGPT, and admittedly, even
    • 0:54:32CS50's own duck, might just make something up and tell you
    • 0:54:35such very confidently.
    • 0:54:37Those are, indeed, known as hallucinations.
    • 0:54:39And what I thought we'd end on is a note that
    • 0:54:41actually was published quite some time ago, perhaps in your childhood, as well,
    • 0:54:45from Shel Silverstein here, the Homework Machine, from which we have this here
    • 0:54:49poem.
    • 0:54:49"The Homework Machine, oh the Homework Machine, most perfect contraption
    • 0:54:53that's ever been seen.
    • 0:54:54Just put in your homework then drop in a dime.
    • 0:54:57Snap on the switch, and in ten seconds' time,
    • 0:54:59your homework comes out, quick and clean as can be.
    • 0:55:02Here it is-- 'nine plus four?' and the answer is 'three.'
    • 0:55:07Three, oh, me.
    • 0:55:09I guess it's not as perfect as I thought it would be."
    • 0:55:12So this foretold, decades ago, what we're now here talking about.
    • 0:55:16But this, then, was CS50.
    • 0:55:18This was, then, AI.
    • 0:55:20This is the URL at which you parents and siblings and others are welcome to take
    • 0:55:23the course, so to speak, in any way.
    • 0:55:25Feel free, though, to come on up with any hellos or questions.
    • 0:55:27That, then, is our class.
    • 0:55:28And we will see you, hopefully, next time.
    • 0:55:31[APPLAUSE]