CS50 Video Player
    • 🧁

    • 🍫

    • 🥥

    • 🍿
    • 0:00:00Introduction
    • 0:01:17Algorithms
    • 0:05:34Searching
    • 0:08:17Big O Notation
    • 0:12:51Common Running Times
    • 0:13:18Asymptotic Notation
    • 0:16:50Searching Lockers
    • 0:20:46Linear Search
    • 0:29:45Binary Search
    • 0:37:11Sorting and Searching vs. Just Searching
    • 0:41:23Implementing Linear Search
    • 0:45:34String Comparison
    • 0:54:28Storing Data in Arrays
    • 0:59:36Structs
    • 1:12:26Sorting
    • 1:13:39Visualizing Sorts
    • 1:26:22Selection Sort
    • 1:34:28Bubble Sort
    • 1:43:03Comparing Sorts
    • 1:45:23Recursion
    • 2:00:54Merge Sort
    • 2:15:16Sort Race
    • 0:00:00[MUSIC PLAYING]
    • 0:01:17DAVID J. MALAN: This is CS50, and this is already week three.
    • 0:01:22And even as we've gotten much more into the minutia of programming
    • 0:01:25and some of the C stuff that we've been doing
    • 0:01:27is all the more cryptic looking, recall that at the end of the day,
    • 0:01:30like, everything we've been doing ultimately fits into to this model.
    • 0:01:34So keep that in mind, particularly as things
    • 0:01:36seem like they're getting more complicated and more sophisticated.
    • 0:01:39It's just a process of learning a new language that ultimately
    • 0:01:41lets us express this process.
    • 0:01:44And of course, last week we really went into the weeds of like how
    • 0:01:47inputs and outputs are represented.
    • 0:01:49And this thing here, a photograph thereof, is called what?
    • 0:01:53This is what?
    • 0:01:54AUDIENCE: RAM.
    • 0:01:55DAVID J. MALAN: RAM, I heard--
    • 0:01:56Random Access Memory or just generally known as memory.
    • 0:01:59And recall that we looked at one of these little black chips
    • 0:02:02that contains all of the bytes--
    • 0:02:04all of the bits, ultimately.
    • 0:02:05It's just kind of a grid, sort of an artist grid, that
    • 0:02:08allows us to think about every one of these memory locations
    • 0:02:11as just having a number or an address, so to speak.
    • 0:02:14Like, this might be byte number 0 and then 1 and then 2
    • 0:02:16and then, maybe way down here again, something like 2 billion
    • 0:02:19if you have 2 gigabytes of memory.
    • 0:02:22And so as we did that, we started to explore how we could use this canvas
    • 0:02:26to create kind of our own information, our own inputs and outputs, not
    • 0:02:29just the basics like ints and floats and so forth.
    • 0:02:32But we also talked about strings.
    • 0:02:34And what is a string as you now know it?
    • 0:02:37How would you describe in layperson's terms a string?
    • 0:02:40Yeah, over there.
    • 0:02:41AUDIENCE: I was gonna say--
    • 0:02:41[AUDIO OUT]
    • 0:02:42DAVID J. MALAN: An array of characters.
    • 0:02:43And an array, meanwhile-- let's go there.
    • 0:02:45How might someone else define an array in more familiar now terms?
    • 0:02:50What would be an array?
    • 0:02:53Yeah.
    • 0:02:53AUDIENCE: Kind of like an indexed set of things.
    • 0:02:57DAVID J. MALAN: An indexed set of things-- not bad.
    • 0:02:59And I think a key characteristic to keep in mind with an array is that it
    • 0:03:02does actually pertain to memory.
    • 0:03:03And it's contiguous memory.
    • 0:03:05Byte after byte after byte is what constitutes an array.
    • 0:03:09And we'll see in a couple of weeks time that there's actually
    • 0:03:11more interesting ways to use this same primitive Canvas to stitch together
    • 0:03:15things that are sort of two directional even that have some kind of shape
    • 0:03:19to them.
    • 0:03:19But for now, all we've talked about is arrays and just using these things
    • 0:03:22from left to right, top to bottom, contiguously to represent information.
    • 0:03:27So today, we'll consider still an array.
    • 0:03:29But we won't focus so much on representation
    • 0:03:33of strings or other data types.
    • 0:03:34We'll actually now focus on the other part of that process,
    • 0:03:36of inputs becoming outputs, namely the thing in the middle--
    • 0:03:39algorithms.
    • 0:03:40But we have to keep in mind, even though every time we've looked at an array
    • 0:03:45thus far, certainly on the board like this, you as a human certainly
    • 0:03:48have the luxury of just kind of eyeballing
    • 0:03:50the whole thing with a bird's eye view and seeing where all of those numbers
    • 0:03:53are.
    • 0:03:54If I asked you where a particular number is,
    • 0:03:56like zero, odds are your eyes would go right to where it is,
    • 0:03:59and boom, problem solved in sort of one step.
    • 0:04:02But the catch is, with a computer that has this memory, even though you,
    • 0:04:07the human, can [INAUDIBLE] see everything at once, a computer cannot.
    • 0:04:10It's better to think of your computer's memory, your phone's memory,
    • 0:04:14or more specifically an array of memory like this
    • 0:04:17as really being a set of closed doors, not unlike lockers in a school.
    • 0:04:21And only by opening each of those doors can
    • 0:04:24the computer actually see what's in there,
    • 0:04:26which is to say that the computer, unlike you, doesn't
    • 0:04:28have this bird's eye view of all of the data in all these locations.
    • 0:04:32It has to much more methodically look here,
    • 0:04:35maybe look here, maybe look here, and so forth in order to find something.
    • 0:04:39Now fortunately, we already have some building blocks-- loops, conditions,
    • 0:04:43Boolean expressions, and the like--
    • 0:04:45where you could imagine writing some code
    • 0:04:47that very methodically goes from left to right or right to left or something
    • 0:04:51more sophisticated that actually finds something you're looking for.
    • 0:04:55And just remember that the conventions we've had since last week
    • 0:04:58now is that these arrays are zero indexed, so to speak.
    • 0:05:03To be zero indexed just means that the data type starts counting from zero.
    • 0:05:08So this is location 0, 1, 2, 3, 4, 5, 6.
    • 0:05:11And notice even though there are seven total doors here,
    • 0:05:15the right-most one, of course, is called 6
    • 0:05:17just because we've started counting at 0.
    • 0:05:19So in the general case, if you had n doors or n bytes of memory,
    • 0:05:240 would always be at the left, and n minus 1 would always be at the right.
    • 0:05:29That's sort of a generalization of just thinking about this kind of convention.
    • 0:05:33All right, so let's revisit the problem that we started the whole term off
    • 0:05:37with in week zero, which was this notion of searching.
    • 0:05:40And what does it mean to search for something?
    • 0:05:42Well, to find information-- and this, of course, is omnipresent.
    • 0:05:45Anytime you take out your phone, you're searching for a friend's contact.
    • 0:05:48Any time you pull up a browser, you're googling for this or that.
    • 0:05:51So search is kind of one of the most omnipresent topics and features
    • 0:05:55of any device these days.
    • 0:05:56So let's consider how the Googles, the Apples, the Microsofts of the world
    • 0:05:59are implementing something as seemingly familiar as this.
    • 0:06:03So here might be the problem statement.
    • 0:06:06We want some input to become some output.
    • 0:06:08What's that input going to be?
    • 0:06:09Maybe it's a bunch of closed doors like this out of which we want
    • 0:06:13to get back an answer, true or false.
    • 0:06:16Is something we're looking for there or not?
    • 0:06:18You can imagine taking this one step further and trying to find
    • 0:06:21where is the thing you're looking for.
    • 0:06:23But for now, let's just take one bite out of the problem.
    • 0:06:25Can we tell ourselves, true or false, is some number behind one
    • 0:06:30of these doors or lockers in memory?
    • 0:06:33But before we go there and start talking about ways to do that-- that is,
    • 0:06:37algorithms.
    • 0:06:38Let's consider how we might lay the foundation of, like,
    • 0:06:42comparing whether one algorithm is better than another.
    • 0:06:46We talked about correctness, and it sort of
    • 0:06:47goes without saying that any code you write, any algorithm you implement,
    • 0:06:51had better be correct.
    • 0:06:52Otherwise, what's the point if it doesn't give you the right answers?
    • 0:06:55But we also talked about design.
    • 0:06:57And in your own words, what do we mean when we say a program is better
    • 0:07:01designed at this stage than another?
    • 0:07:04How do you think about this notion of design now?
    • 0:07:07Yeah, in the middle?
    • 0:07:09AUDIENCE: Easier to understand or easier to institute.
    • 0:07:11DAVID J. MALAN: OK, so easier to understand.
    • 0:07:12I like that.
    • 0:07:13Other thoughts?
    • 0:07:14Yeah.
    • 0:07:14AUDIENCE: Efficiency.
    • 0:07:15DAVID J. MALAN: Efficiency, and what do you mean by efficiency precisely?
    • 0:07:18AUDIENCE: [INAUDIBLE]
    • 0:07:22DAVID J. MALAN: Nice.
    • 0:07:22It doesn't use up too much memory, and it isn't redundant.
    • 0:07:25So you can think about design along a few
    • 0:07:27of these axes-- sort of the quality of the code
    • 0:07:29but also the quality of the performance.
    • 0:07:31And as our programs get bigger and more sophisticated and just longer,
    • 0:07:35those kinds of things are really going to matter.
    • 0:07:37And in the real world, if you start writing code
    • 0:07:39not just by yourself but with someone else, getting the design
    • 0:07:42right is just going to make it easier to collaborate and ultimately produce,
    • 0:07:46write code, with just higher probability.
    • 0:07:48So let's consider how we might focus on exactly the second characteristic,
    • 0:07:52the efficiency, of an algorithm.
    • 0:07:54And the way we might talk about the efficiency of algorithms, just how fast
    • 0:07:58or how slow they are, is in terms of their running time.
    • 0:08:01That is to say, when they're running, how much time do they take?
    • 0:08:04And we might measure this in seconds or milliseconds or minutes
    • 0:08:08or just some number of steps in the general case
    • 0:08:10because presumably fewer steps, to your point, is better than more steps.
    • 0:08:14So how might we think about running times?
    • 0:08:16Well, there's one general notation we should define today.
    • 0:08:19So computer scientists tend to describe the running time of an algorithm
    • 0:08:23or a piece of code, for that matter, in terms of what's called big O notation.
    • 0:08:27This is literally a capitalized O, a big O.
    • 0:08:30And this generally means that the running time of some algorithm
    • 0:08:34is on the order of such and such, where such and such, we'll see,
    • 0:08:37is just going to be a very simple mathematical formula.
    • 0:08:40It's kind of a way of waving your hands mathematically
    • 0:08:42to convey the idea of just how fast or how slow some algorithm or code is
    • 0:08:46without getting into the weeds of like, it
    • 0:08:48took this many milliseconds or this many specific number of steps.
    • 0:08:52So you might recall then from week zero, I even introduced this picture
    • 0:08:56but without much context.
    • 0:08:57At the time, we just use this to compare those phone book algorithms.
    • 0:09:00Recall that this red straight line was the first algorithm,
    • 0:09:03one page at a time.
    • 0:09:04The yellow line that's still straight differed how if you recall?
    • 0:09:09That line represented what alternative algorithm?
    • 0:09:14Looking out and back.
    • 0:09:15What is that second algorithm?
    • 0:09:16Yeah, over there.
    • 0:09:17AUDIENCE: Like, two pages at a time.
    • 0:09:18DAVID J. MALAN: Two pages at a time, which was almost correct so long as we
    • 0:09:22potentially double back a page if maybe we go a little too far in the phone
    • 0:09:25book.
    • 0:09:25So it had a potential bug but arguably solvable.
    • 0:09:28This last algorithm, though, was the so-called divide and conquer
    • 0:09:31strategy where I sort of unnecessarily tore the phone book in half
    • 0:09:34and then in half and then in half, which,
    • 0:09:36as dramatic as that was unnecessarily, it actually took significantly bigger
    • 0:09:40bites out of the problem-- like 500 pages
    • 0:09:43the first time, another 250, another 125 versus just 1 or 2 bytes at a time.
    • 0:09:49And so we described its running time as this picture
    • 0:09:52there, though I didn't use that expression at the time, running times.
    • 0:09:55But indeed, time to solve might be measured just
    • 0:09:58abstractly in some unit of measure--
    • 0:10:00seconds, milliseconds, minutes, pages--
    • 0:10:03via this y-axis here.
    • 0:10:05So let's now slap some numbers on this.
    • 0:10:07If we had n pages in that phone book, n just representing
    • 0:10:11a generic number, the first algorithm here
    • 0:10:13we might describe as taking n steps.
    • 0:10:15Second algorithm we might describe as taking n divided by 2 steps,
    • 0:10:18maybe give or take one if we have to double back but generally
    • 0:10:21n divided by 2.
    • 0:10:22And then this thing, if you remember your logarithms,
    • 0:10:24was sort of a fundamentally different formula--
    • 0:10:26log base 2 of n or just log of n for short.
    • 0:10:30So this is of a fundamentally different formula.
    • 0:10:32But what's noteworthy is that these first two algorithms,
    • 0:10:36even though, yes, the second algorithm was hands down faster--
    • 0:10:40I mean, literally twice as fast--
    • 0:10:41when you start to zoom out and if I increase my y-axis and x-axis,
    • 0:10:46these first two start to look awfully similar to one another.
    • 0:10:52And if we keep zooming out and zooming out and zooming
    • 0:10:54out as n gets really large--
    • 0:10:57that is, the x-axis gets really long--
    • 0:10:59these first two algorithms start to become essentially the same.
    • 0:11:03And so this is where computer scientists use big O notation.
    • 0:11:06Instead of saying specifically, this algorithm takes any steps.
    • 0:11:10And this one n divided by 2, a computer scientist would say,
    • 0:11:13eh, each of those algorithms takes on the order of n steps
    • 0:11:17or on the order of n over 2.
    • 0:11:19But you know what?
    • 0:11:19On the order of n over 2 is pretty much the same
    • 0:11:23when n gets really large as being equivalent to big O of n itself.
    • 0:11:29So yes, in practice, it's obviously fewer steps to move twice as fast.
    • 0:11:34But in the big picture, when n becomes a million, a billion,
    • 0:11:37the numbers are already so darn big at that point
    • 0:11:40that these are as, the shapes of these curves imply,
    • 0:11:43pretty much functionally equivalent.
    • 0:11:45But this one still looks better and better
    • 0:11:49as n gets large because it's rising so much less quickly.
    • 0:11:52And so here, a computer scientist would say
    • 0:11:54that that third algorithm was on the order of-- that is, big O of-- log n.
    • 0:11:59And they don't have to bother with the base
    • 0:12:01because it's a smaller mathematical detail that is also just in some sense
    • 0:12:04a constant, multiplicative factor.
    • 0:12:07So in short, what are the takeaways here?
    • 0:12:09This is just a new vocabulary that we'll start
    • 0:12:12to use when we just want to describe the running time of an algorithm.
    • 0:12:16To make this more real, if any of you have implemented
    • 0:12:18a for loop at this point in any of your code and that for loop iterated n times
    • 0:12:24where maybe in was the height of your pyramid or maybe n
    • 0:12:27was something else that you wanted to do n times, you wrote code
    • 0:12:31or you implemented an algorithm that operated in big O of n time,
    • 0:12:36if you will.
    • 0:12:37So this is just a way now to retroactively start
    • 0:12:39describing with somewhat mathematical notation what we've
    • 0:12:43been doing in practice for a while now.
    • 0:12:45So here's a list of commonly seen running times in the real world.
    • 0:12:51This is not a thorough list because you could come up
    • 0:12:55with an infinite number of mathematical formulas, certainly.
    • 0:12:57But the common ones we'll discuss and you will see in your own code
    • 0:13:01probably reduce to this list here.
    • 0:13:04And if you were to study more computer science theory,
    • 0:13:06this list would get longer and longer.
    • 0:13:07But for now, these are sort of the most familiar ones that we'll soon see.
    • 0:13:11All right, two other pieces of vocabulary, if you will,
    • 0:13:14before we start to use this stuff--
    • 0:13:16so this, a big omega, capital omega symbol,
    • 0:13:19is used now to describe a lower bound on the running time of an algorithm.
    • 0:13:25So to be clear, big O is on the order of-- that
    • 0:13:28is, an upper bound-- on how many steps an algorithm might
    • 0:13:31take, on the order of so many steps.
    • 0:13:34If you want to talk, though, from the other perspective, well,
    • 0:13:37how few steps my algorithm take?
    • 0:13:39Maybe in the so-called best case, it'd be nice
    • 0:13:42if we had a notation to just describe what a lower
    • 0:13:45bound is because some algorithms might be super fast
    • 0:13:47in these so-called best cases.
    • 0:13:49So the symbology is almost the same, but we replace the big O
    • 0:13:53with the big omega.
    • 0:13:54So to be clear, big O describes an upper bound and omega
    • 0:13:58describes a lower bound.
    • 0:14:00And we'll see examples of this before long.
    • 0:14:02And then lastly, last one here, big theta, is used by a computer scientist
    • 0:14:08when you have a case where both the upper bound on an algorithm's
    • 0:14:12running time is the same as the lower bound.
    • 0:14:15You can then describe it in one breath as being in theta of such and such
    • 0:14:19instead of saying it's in big O and in omega of something else.
    • 0:14:23All right, so out of context, sort of just seemingly cryptic symbols,
    • 0:14:27but all they refer to is upper bounds, lower bounds, or when
    • 0:14:31they happen to be one in the same.
    • 0:14:32And we'll now introduce over time examples of how we might actually
    • 0:14:36apply these to concrete problems.
    • 0:14:38But first, let me pause to see if there's any questions.
    • 0:14:43Any questions here?
    • 0:14:46Any questions?
    • 0:14:48I see pointing somewhere.
    • 0:14:50Where are you pointing to?
    • 0:14:53Over here-- there we go.
    • 0:14:54OK, sorry-- very bright.
    • 0:14:55AUDIENCE: So, um, smaller--
    • 0:14:59DAVID J. MALAN: Smaller n functions move faster.
    • 0:15:01So yes, if you have something like n, that takes only steps.
    • 0:15:06If you have a formula like n squared, just by nature of the math,
    • 0:15:09that take more steps and therefore be slower.
    • 0:15:11So the larger the mathematical expression,
    • 0:15:13the slower your algorithm is because the more time or more steps that it takes.
    • 0:15:20AUDIENCE: So you want your n function to be small?
    • 0:15:23DAVID J. MALAN: You want your n function, so to speak, to be small, yes.
    • 0:15:26And in fact, the Holy Grail, so to speak,
    • 0:15:28would be this last one here either in big O notation or even theta,
    • 0:15:31when an algorithm is on the order of a single step.
    • 0:15:34That means it literally takes constant time, one step, or maybe 10 steps,
    • 0:15:39100 steps, but a fixed, constant number of steps.
    • 0:15:42That's the best because even as the phone book gets bigger,
    • 0:15:46even as the data set you're searching gets larger and larger,
    • 0:15:50if something only takes a finite number of steps constantly,
    • 0:15:53then it doesn't matter how big the data set actually gets.
    • 0:15:57Questions as well on these notations-- yep, thank you for the pointing.
    • 0:16:01This is actually very helpful.
    • 0:16:03I'm seeing pointing this way?
    • 0:16:05AUDIENCE: [INAUDIBLE]
    • 0:16:08DAVID J. MALAN: What is the input to each of these functions?
    • 0:16:10It is an expression of how many steps an algorithm takes.
    • 0:16:13So in fact, let me go ahead and make this
    • 0:16:16more concrete with an actual example here if we could.
    • 0:16:19So on stage here, we have seven lockers which represent,
    • 0:16:22if you will, an array of memory.
    • 0:16:24And this array of memory is maybe storing
    • 0:16:26seven integers, seven integers that we might actually want to search for.
    • 0:16:30And if we want to search for these values, how might
    • 0:16:33we go about doing this?
    • 0:16:34Well, for this, why don't we make things interesting?
    • 0:16:36Would a volunteer like to come on up?
    • 0:16:37Have to be masked and on the internet if you are comfortable.
    • 0:16:40Both of-- oh, there's someone putting their friend's hand up and back?
    • 0:16:44Yes, OK.
    • 0:16:44Come on down.
    • 0:16:50And in just a moment, our brave volunteer
    • 0:16:53is going to help me find a specific number in the data set
    • 0:16:57that we have here on the screen.
    • 0:16:58So come on down, and I'll get things ready for you in advance here.
    • 0:17:02Come on down nice to meet.
    • 0:17:08And what is your name?
    • 0:17:09AUDIENCE: [? Nomira. ?]
    • 0:17:10DAVID J. MALAN: Minera?
    • 0:17:10AUDIENCE: [? Nomira. ?]
    • 0:17:11DAVID J. MALAN: [? Nomira. ?] Nice to meet.
    • 0:17:13Come on over.
    • 0:17:13So here we have for Nomira seven lockers or an array of memory.
    • 0:17:17And behind each of these doors is a number.
    • 0:17:19And the goal, quite simply, is, given this array of memory
    • 0:17:22as input, to return, true or false, is the number I care about actually there?
    • 0:17:27So suppose I care about the number 0.
    • 0:17:30What would be the simplest, most correct algorithm you could
    • 0:17:33apply in order to find us the number 0?
    • 0:17:38OK, try opening the first one.
    • 0:17:41All right, and maybe just step aside so the audience can see.
    • 0:17:44I think you have not found 0 yet.
    • 0:17:46OK, so keep the door open.
    • 0:17:47Let's move on to your next choice.
    • 0:17:50Second door, sure.
    • 0:17:51AUDIENCE: [INAUDIBLE]
    • 0:17:51DAVID J. MALAN: Oh, go ahead, second door.
    • 0:17:53Let's keep it simple.
    • 0:17:54Let's just move from left to right, sort of searching our way.
    • 0:17:57And what do you see there?
    • 0:17:58Oh, 6, not 0.
    • 0:17:59How about the next door?
    • 0:18:04All right, also not working out so well yet, but that's OK.
    • 0:18:07If you want to go on to the next, we're still looking for 0.
    • 0:18:11All right, I see a 2.
    • 0:18:12All right, it's not so good yet.
    • 0:18:14Let's keep going.
    • 0:18:15Next door.
    • 0:18:162, 7-- no.
    • 0:18:18OK, next door.
    • 0:18:20No, that's a-- all right, very well done.
    • 0:18:25Oh.
    • 0:18:29All right, so I kind of set you up for a fairly slow algorithm,
    • 0:18:32but let me just ask you to describe what is it
    • 0:18:34you did by following the steps I gave you.
    • 0:18:37AUDIENCE: I just went one by one to each character.
    • 0:18:39DAVID J. MALAN: You went one by one to each character
    • 0:18:41if you want to talk into here.
    • 0:18:43So you went one by one by each character.
    • 0:18:45And would you say that algorithm left or right is correct?
    • 0:18:48AUDIENCE: No.
    • 0:18:49DAVID J. MALAN: No?
    • 0:18:50AUDIENCE: Or, yes, in the scenario.
    • 0:18:53DAVID J. MALAN: OK, yes in this scenario.
    • 0:18:54Why are you hesitating?
    • 0:18:55What's going through your mind?
    • 0:18:56AUDIENCE: Because it's not the most efficient way to do it.
    • 0:18:58DAVID J. MALAN: OK, good.
    • 0:18:58So we see a contrast here between correctness and design.
    • 0:19:01I mean, I do think it was correct because even though it was slow,
    • 0:19:04you eventually found zero.
    • 0:19:05But it took some number of steps.
    • 0:19:08So in fact, this would be an algorithm.
    • 0:19:10It has a name, called linear search.
    • 0:19:12And, [? Nomira, ?] as you did, you kind of walked along
    • 0:19:14a line going from left to right.
    • 0:19:16Now let me ask.
    • 0:19:17If you had gone from right to left, would the algorithm
    • 0:19:19have been fundamentally better?
    • 0:19:23AUDIENCE: Yes.
    • 0:19:23DAVID J. MALAN: OK, and why?
    • 0:19:24AUDIENCE: Because the zero is here in the first scenario.
    • 0:19:27But if it was like, the zero is in the middle, it wouldn't have been.
    • 0:19:30DAVID J. MALAN: Yeah, and so here is where the right way to do things
    • 0:19:35becomes a little less obvious.
    • 0:19:36You would absolutely have given yourself a better result
    • 0:19:38if you would just happened to start from the right
    • 0:19:40or if I had pointed you to start over there.
    • 0:19:42But the catch is if I asked her to find another number, like the number 8,
    • 0:19:45well, that would have backfired.
    • 0:19:47And this time, it would have taken longer
    • 0:19:48to find that number because it's way over here instead.
    • 0:19:51And so in the general case, going left to right or, heck, right to left
    • 0:19:55is probably as correct as you can get because if you know nothing
    • 0:19:59about the order of these numbers-- and indeed, they seem to be fairly random.
    • 0:20:03Some of them are smaller, some of them are bigger.
    • 0:20:05There doesn't seem to be rhyme or reason.
    • 0:20:07Linear search is about as good as you can do when you don't know anything
    • 0:20:11a priori about the numbers.
    • 0:20:13So I have a little thank you gift here, a little CS stress ball.
    • 0:20:15Round of applause for our first volunteer.
    • 0:20:18Thank you so much.
    • 0:20:23Let's try to formalize what I just described as linear search
    • 0:20:27because indeed, no matter which end [? Nomira ?] had started on,
    • 0:20:30I could have kind of changed up the problem
    • 0:20:32to make sure that it appears to be running slow.
    • 0:20:35But it is correct.
    • 0:20:36If zero were among those doors, she absolutely would have found it
    • 0:20:39and indeed did.
    • 0:20:40So let's now try to translate what we did into what we might call again
    • 0:20:45pseudo code as from week zero.
    • 0:20:48So with pseudo code, we just need a terse English
    • 0:20:50like, or any language, syntax to describe what we did.
    • 0:20:53So here might be one formulation of what [? Nomira ?] did.
    • 0:20:56For each door, from left to right, if the number is behind the door,
    • 0:21:00return true.
    • 0:21:02Else, at the very end of the program, you would return false by default.
    • 0:21:07And now you got lucky.
    • 0:21:08And by the seventh door, [? Nomira ?] had indeed
    • 0:21:10returned true by saying, well, there is the zero.
    • 0:21:14But let's consider if this pseudo code is now correct,
    • 0:21:17an accurate translation.
    • 0:21:18First of all, normally, when we've seen ifs, we might see an if else.
    • 0:21:22And yet down here, return false is aligned with the for.
    • 0:21:26Why did I not indent the return false, or put another way,
    • 0:21:29why did I not do if number is behind door, return true, else return false?
    • 0:21:36Why would that version of this code have been problematic?
    • 0:21:39Way in back.
    • 0:21:41AUDIENCE: [INAUDIBLE]
    • 0:21:50DAVID J. MALAN: OK, I'm not sure it's because of redundancy.
    • 0:21:52Let me go ahead and just make this explicit.
    • 0:21:55If I had instead done else return false, I
    • 0:21:58don't think it's so much redundancy that I'd be worried about.
    • 0:22:02Let me bounce somewhere else.
    • 0:22:04Yeah, in front?
    • 0:22:04AUDIENCE: Um, maybe [INAUDIBLE] for the entire list
    • 0:22:08after just checking one number.
    • 0:22:09DAVID J. MALAN: Yeah, it would be returning falls for--
    • 0:22:11even though I'd only looked at-- [? Nomira ?]
    • 0:22:13had only looked at one element.
    • 0:22:15And it would have been as though if all of these doors were still closed,
    • 0:22:18she opens this up and says, nope, this is not zero, return false.
    • 0:22:21That would give me an incorrect result because obviously,
    • 0:22:24at that stage in the algorithm, she wouldn't have even looked
    • 0:22:27through any of the other doors.
    • 0:22:28So just the original indentation of this, if you will,
    • 0:22:32without the [? else, ?] is correct because only
    • 0:22:35if I get to the bottom of this algorithm or the pseudo code does
    • 0:22:38it make sense to conclude at that point, once she's
    • 0:22:41gone through all of the doors, that nope, there's in fact--
    • 0:22:45the number I'm looking for is, in fact, not actually there.
    • 0:22:48So how might we consider now the running time of this algorithm?
    • 0:22:52We have a few different types of vocabulary now.
    • 0:22:55And if we consider now how we might think about this,
    • 0:22:59let's start to translate it from sort of higher level pseudo code
    • 0:23:02to something a little lower level.
    • 0:23:04We've been writing code using n and loops and the like.
    • 0:23:07So let's take this higher level pseudo code and now just kind of
    • 0:23:12get a middle ground between English and C.
    • 0:23:14Let me propose that we think about this version of the same algorithm
    • 0:23:18as being a little more pedantic.
    • 0:23:20For i from 0 to n minus 1, if number behind doors bracket i return true.
    • 0:23:28Otherwise, at the end of the program, return false.
    • 0:23:31Now I'm kind of mixing English and C here,
    • 0:23:33but that's reasonable if the reader is familiar with C
    • 0:23:35or some similar language.
    • 0:23:37And notice this pattern here.
    • 0:23:39This is a way of just saying in pseudo code, give myself a variable called i.
    • 0:23:44Start at 0 and then just count up to n minus 1.
    • 0:23:49And recall n minus 1 is not one shy of the end of the array.
    • 0:23:53N minus 1 is the end of the array because again, we
    • 0:23:56started counting at 0.
    • 0:23:57So this is a very common way of expressing this kind of loop
    • 0:24:00from the left all the way to the right of an array.
    • 0:24:03Doors I'm kind of implicitly treating as the name of this array,
    • 0:24:07like it's a variable from last week that I defined as being
    • 0:24:10an array of integers in this case.
    • 0:24:11So doors bracket i means that when i is 0, it's this location.
    • 0:24:16When i is 1, it's this.
    • 0:24:18When i is 7 or, more generally n minus--
    • 0:24:21sorry, 6 or, more generally, n minus 1, that's this location here.
    • 0:24:26So same idea but a translation of it.
    • 0:24:28So now let's consider what the running time of this algorithm is.
    • 0:24:32If we have this menu of possible answers to this question,
    • 0:24:36how efficient or inefficient is this algorithm,
    • 0:24:38let's take a look in the context of this pseudo code.
    • 0:24:41We don't even have to bother going all the way to C.
    • 0:24:44How do we go about analyzing each of these steps?
    • 0:24:47Well, let's consider this.
    • 0:24:49This outermost loop here for i from 0 to n minus 1, that line of code
    • 0:24:55is going to execute how many times?
    • 0:24:57How many times will that loop execute?
    • 0:25:01Let me give folks this moment to think on it.
    • 0:25:04How many times is that going to loop here?
    • 0:25:07Yeah, over there.
    • 0:25:08AUDIENCE: [INAUDIBLE]
    • 0:25:10DAVID J. MALAN: n times, right?
    • 0:25:11Because it's from 0 to n minus 1.
    • 0:25:13And if it's a little weird to think in from 0 to n minus 1,
    • 0:25:16this is essentially the same mathematically as from 1 to n.
    • 0:25:19And that's perhaps a little more obviously more
    • 0:25:21intuitively n total steps.
    • 0:25:23So I might just make a note to myself this loop is going to operate n times.
    • 0:25:28What about these inner steps?
    • 0:25:29Well, how many steps or seconds does it take to ask a question?
    • 0:25:33If the number behind--
    • 0:25:35if the number you're looking for is behind doors bracket i,
    • 0:25:38well, as [? Nomira ?] did, that's kind of like one step.
    • 0:25:41So you open the door and boom.
    • 0:25:42All right, maybe it's two steps, but it's a constant number of steps.
    • 0:25:46So this is some constant number of steps.
    • 0:25:48Let's just call it one for simplicity.
    • 0:25:50How many steps or seconds does it take to return true?
    • 0:25:53I don't know exactly in the computer's memory
    • 0:25:55but that feels like a single step.
    • 0:25:57Just return true.
    • 0:25:58So if this takes one step, this takes one step
    • 0:26:01but only if the condition is true, it looks
    • 0:26:04like you're doing a constant number of things n times.
    • 0:26:09Or maybe you're doing one additional step.
    • 0:26:12So in short, the only thing that really matters here
    • 0:26:15in terms of the efficiency or inefficiency of the algorithm
    • 0:26:18is what are you doing again and again and again because that's obviously
    • 0:26:21the thing that's going to add up.
    • 0:26:22Doing one thing or two things a constant number of times?
    • 0:26:26Not a big deal.
    • 0:26:27But looping, that's going to add up over time because the more doors there are,
    • 0:26:32the bigger n is going to be and the more steps that's going to take,
    • 0:26:35which is all to say if you were to describe roughly
    • 0:26:38how many steps does this algorithm take in big O notation,
    • 0:26:43what might your instincts say?
    • 0:26:46How many steps is this algorithm on the order of given n doors or n integers?
    • 0:26:51Yeah?
    • 0:26:51AUDIENCE: [INAUDIBLE]
    • 0:26:52DAVID J. MALAN: Say again?
    • 0:26:53AUDIENCE: O n.
    • 0:26:54DAVID J. MALAN: Big O of n.
    • 0:26:56And indeed, that's going to be the case here.
    • 0:26:58Why?
    • 0:26:58Because you're essentially, at the end of the day,
    • 0:27:00doing n things as an upper bound on running time.
    • 0:27:03And that's, in fact, what exactly what happened
    • 0:27:05with [? Nomira. ?] She had to look at all n lockers
    • 0:27:08before finally getting to the right answer.
    • 0:27:11But what if she got lucky and the number we
    • 0:27:14were looking for was not at the end of the array
    • 0:27:17but was at the beginning of the array?
    • 0:27:20How might we think about that?
    • 0:27:21Well, have a nomenclature for this too, of course-- omega notation.
    • 0:27:25Remember, omega notation is a lower bound.
    • 0:27:27So given this menu of possible running times for lower bounds on an algorithm,
    • 0:27:34what might the omega notation be for [? Nomira's ?] linear search?
    • 0:27:38AUDIENCE: Omega 1.
    • 0:27:40DAVID J. MALAN: Omega of 1, and why that?
    • 0:27:42AUDIENCE: [INAUDIBLE]
    • 0:27:43DAVID J. MALAN: Right, because if just by chance she gets lucky
    • 0:27:46and the number she's looking for is right there where
    • 0:27:49she begins the algorithm, that's it.
    • 0:27:51It's one step.
    • 0:27:52Maybe it's two steps if you have to unlock the door and open it,
    • 0:27:55but it's a constant number of steps.
    • 0:27:56And the way we describe constant number of steps
    • 0:27:58is just with a single number like 1.
    • 0:28:01So the omega notation for linear search might be omega of 1
    • 0:28:04because in the best case, she might just get the number right from the get go.
    • 0:28:08But in the worst case, we need to talk about the upper bound, which
    • 0:28:11might indeed be big O of n.
    • 0:28:13So again there's this way now of talking symbolically
    • 0:28:16about best cases and worst cases or lower bounds and upper bounds.
    • 0:28:22Theta notation, just as a little trivia now,
    • 0:28:24is it applicable based on the definition I gave earlier?
    • 0:28:27AUDIENCE: [INAUDIBLE]
    • 0:28:28DAVID J. MALAN: OK, no, because you only take out the theta notation
    • 0:28:31when those two bounds, upper and lower, happen
    • 0:28:34to be the same for shorthand notation, if you will.
    • 0:28:36So it suffices here to talk about just big O and omega notation.
    • 0:28:41Well, what if we are a little smarter about this?
    • 0:28:43Let me go ahead and sort of semi-secretly here
    • 0:28:47rearrange these numbers.
    • 0:28:48But first, how about one other volunteer?
    • 0:28:50One other volunteer-- you have to be comfortable with your mask
    • 0:28:53and your being on the internet.
    • 0:28:55How about over here?
    • 0:28:58Yes, you want to come on down?
    • 0:28:59All right, come on down.
    • 0:29:00And don't look at what I'm doing because I'm going to--
    • 0:29:08take your time and don't look up this way
    • 0:29:10because I need a moment to rearrange all of the numbers.
    • 0:29:14And actually, if you could stay right there before coming up,
    • 0:29:17just an awkward few seconds while I finish hiding the numbers
    • 0:29:20behind these doors for you.
    • 0:29:22AUDIENCE: [INAUDIBLE]
    • 0:29:23DAVID J. MALAN: I will be right with you.
    • 0:29:26Actually, if-- do you want to warm up the crowd for a moment
    • 0:29:31and I'll be right back?
    • 0:29:32So you want to introduce yourself?
    • 0:29:33AUDIENCE: Yeah, hi, guys.
    • 0:29:34I'm Rave.
    • 0:29:37Yeah!
    • 0:29:43DAVID J. MALAN: All right, I think I am ready.
    • 0:29:46Thank you for stalling there.
    • 0:29:47AUDIENCE: Of course.
    • 0:29:48DAVID J. MALAN: And I didn't catch your name.
    • 0:29:49What was your name?
    • 0:29:50AUDIENCE: I'm Rave.
    • 0:29:51DAVID J. MALAN: I'm sorry?
    • 0:29:51AUDIENCE: Rave, like a party.
    • 0:29:52DAVID J. MALAN: Rave, OK.
    • 0:29:53Nice to meet.
    • 0:29:53Come on over.
    • 0:29:54So Rave has kindly volunteered now.
    • 0:29:56And I'm going to give you an additional advantage this time.
    • 0:29:58AUDIENCE: OK.
    • 0:29:59DAVID J. MALAN: Unbeknownst to you, I now took numbers behind the doors,
    • 0:30:03but I sorted them for you.
    • 0:30:04So they're not in the same random order like they
    • 0:30:06were for [? Nomira. ?] You now have the advantage
    • 0:30:08to know that the numbers are sorted from small to big.
    • 0:30:10AUDIENCE: OK.
    • 0:30:11DAVID J. MALAN: Given that, and given perhaps what we talked about in week zero
    • 0:30:15with the phone book, where might you propose we begin the story this time?
    • 0:30:19With which locker?
    • 0:30:20AUDIENCE: To find zero?
    • 0:30:21DAVID J. MALAN: Let's find number six this time.
    • 0:30:23Let's make things interesting.
    • 0:30:24AUDIENCE: OK.
    • 0:30:26I'll start in the middle.
    • 0:30:27DAVID J. MALAN: OK, so the middle.
    • 0:30:28There's seven total.
    • 0:30:28So--
    • 0:30:29AUDIENCE: OK.
    • 0:30:29DAVID J. MALAN: --that would be right here.
    • 0:30:30Go ahead.
    • 0:30:30Open that up.
    • 0:30:32And you find, sadly, the number five.
    • 0:30:34So what do you know now?
    • 0:30:36AUDIENCE: I know to go up.
    • 0:30:37DAVID J. MALAN: OK.
    • 0:30:37AUDIENCE: OK.
    • 0:30:38DAVID J. MALAN: All right, and just to keep it uniform,
    • 0:30:40just like I did, I opened to the right half of the phone book.
    • 0:30:43AUDIENCE: Yes.
    • 0:30:43DAVID J. MALAN: Let's keep it similar.
    • 0:30:44Yeah.
    • 0:30:45AUDIENCE: All right.
    • 0:30:46DAVID J. MALAN: All right, and, uh, a little too far
    • 0:30:48even though I know you wanted to go one over.
    • 0:30:50AUDIENCE: All good, all good.
    • 0:30:51DAVID J. MALAN: And now we're going to go which direction?
    • 0:30:52AUDIENCE: Over here in the middle.
    • 0:30:54DAVID J. MALAN: Right, and voila, the number six.
    • 0:30:56All right, so very nicely done.
    • 0:31:00A little stressful for you as well.
    • 0:31:02Thank you again.
    • 0:31:03So here we see by nature of the locker door
    • 0:31:05still being open sort of an artifact of the greater efficiency,
    • 0:31:10it would seem, of this algorithm because now that Rave
    • 0:31:13was given the assumption that these numbers are sorted from small
    • 0:31:16on the left to large on the right, she was able to apply that same divide
    • 0:31:20and conquer algorithm from week zero which we're now going to give a name--
    • 0:31:23binary search.
    • 0:31:25And simply by starting in the middle and realizing,
    • 0:31:28OK, too small, then by going to the right half and realizing, oh,
    • 0:31:32went a little too far, then by going to the left half, which,
    • 0:31:35Rave able to find in just three steps instead of seven
    • 0:31:39the number six in this case that we were actually searching for.
    • 0:31:43So you can see that this would seem to be more efficient.
    • 0:31:47Let's consider for just a moment is it correct.
    • 0:31:50If I had used different numbers but still sorted them from left to right,
    • 0:31:56would it still have worked this algorithm?
    • 0:31:59You're nodding your head.
    • 0:32:00Can I call on you?
    • 0:32:01Like, why would it still have worked, do you think?
    • 0:32:03AUDIENCE: [INAUDIBLE]
    • 0:32:06DAVID J. MALAN: Yeah, so so long as the numbers
    • 0:32:08are always in the same order from left to right
    • 0:32:10or, heck, they could even be in reverse order, so long as it's consistent,
    • 0:32:13the decisions that Rave was making-- if greater than, else, if less than--
    • 0:32:18would guide us to the solution no matter what.
    • 0:32:20And it would seem to take fewer steps.
    • 0:32:23So if we consider now the pseudo code for this algorithm,
    • 0:32:26let's take a look how we might describe binary search.
    • 0:32:28So binary search we might describe with something like this.
    • 0:32:31If the number is behind the middle door, which is where Rave began,
    • 0:32:34then we can just return true.
    • 0:32:36Else if the number is less than the middle door,
    • 0:32:40so if six is less than whatever is behind the middle door,
    • 0:32:42then Rave would have searched the left half.
    • 0:32:45Else if the number is greater than the middle door,
    • 0:32:47Rave would have searched the right half.
    • 0:32:49Else, if there are no doors-- and we'll see in a moment why I put
    • 0:32:53this up top just to keep things clean.
    • 0:32:55If there's no doors, what should Rave have presumably returned immediately
    • 0:32:59if I gave her no lockers to work with?
    • 0:33:02Just returned false.
    • 0:33:03But this is an important case to consider
    • 0:33:06because if in the process of searching by locker by locker,
    • 0:33:09we might have whittled down the problem from seven doors to three doors
    • 0:33:14to one door to zero doors-- and at that point,
    • 0:33:17we might have had no doors left to search.
    • 0:33:19So we have to naturally have a scenario for just considering
    • 0:33:22if there were no doors.
    • 0:33:23So it's not to say that maybe I don't give Rave any doors to begin with.
    • 0:33:26But as she divides and divides and divides,
    • 0:33:28if she runs out of lockers to ask those questions of-- or a few weeks ago,
    • 0:33:32if I ran out of phone book pages to tear in half,
    • 0:33:35I too might have had to return false as in this case.
    • 0:33:39So how can we now describe this a little more like C
    • 0:33:42just to give ourselves a variable to start thinking and talking about?
    • 0:33:45Well, I might talk about doors as being an array.
    • 0:33:48And so if I want to express the middle door, I could just, in pseudo code,
    • 0:33:52say doors bracket middle.
    • 0:33:54I'm assuming that someone has done the math
    • 0:33:56to figure out what the middle door is, but that's easy enough to do.
    • 0:33:59And then doors, if the number we're looking for
    • 0:34:01is less than doors bracket middle, then search door
    • 0:34:05zero through doors middle minus 1.
    • 0:34:09So again, this is a more pedantic way of taking what's a pretty intuitive idea--
    • 0:34:13search the left half, search the right half--
    • 0:34:16but start to now describe it in terms of actual indices or indexes
    • 0:34:22like we did with our array notation.
    • 0:34:24The last scenario, of course, is if the number
    • 0:34:26is greater than the door's bracket middle,
    • 0:34:28then Rave would have wanted to search the middle door plus 1--
    • 0:34:32so 1 over-- through doors n minus 1--
    • 0:34:37through n minus 1.
    • 0:34:38So again, just a way of sort of describing a little more syntactically
    • 0:34:41what it is that's going on.
    • 0:34:42So how might we translate this now into big O notation?
    • 0:34:46Well, in the worst case, how many steps total might Rave's binary search
    • 0:34:53algorithm have taken?
    • 0:34:55Given seven doors or given more generically n doors,
    • 0:34:59how many times could she go left or go right before finding herself with one
    • 0:35:03or no doors left?
    • 0:35:06What's the way to think about that?
    • 0:35:08Yeah, in the middle?
    • 0:35:09AUDIENCE: Log n.
    • 0:35:11DAVID J. MALAN: Log n.
    • 0:35:11So there's log n again.
    • 0:35:13And even if you're not feeling wholly comfortable with your logarithm
    • 0:35:16still, pretty much in programming and in computer science more generally,
    • 0:35:19any time we talk about some algorithm that's dividing and conquering
    • 0:35:22in half, in half, in half, or any other multiple,
    • 0:35:26it's probably involving logarithms in some sense.
    • 0:35:28And log base n essentially refers to the number
    • 0:35:31of times you can divide n by 2 until you bottom out at just a single door
    • 0:35:37or equivalently zero doors left.
    • 0:35:39So log n.
    • 0:35:40So we might say that indeed, binary search is in big O of log n
    • 0:35:44because the door that Rave opened last, this one,
    • 0:35:48happened to be three doors away.
    • 0:35:50And actually, if you do the math here, that roughly
    • 0:35:52works out to be exactly that case.
    • 0:35:54If we add one, that's sort of out of seven doors or roughly eight,
    • 0:35:58we were able to search it in just three total steps.
    • 0:36:01What about omega notation, though?
    • 0:36:03Like, in the best case, Rave might have gotten lucky.
    • 0:36:07She opened the door, and there it is.
    • 0:36:09So how might we describe a lower bound on the running time of binary search.
    • 0:36:14Yeah.
    • 0:36:15AUDIENCE: 1.
    • 0:36:16DAVID J. MALAN: Say again?
    • 0:36:17AUDIENCE: 1.
    • 0:36:17DAVID J. MALAN: Omega of 1.
    • 0:36:19So here too, we see that in some cases binary search and linear search, eh,
    • 0:36:23like, they're pretty equivalent.
    • 0:36:25And so this is why sometimes compelling to consider both the best
    • 0:36:30case in the worst case because honestly, in general,
    • 0:36:33who really cares if you just get lucky once in a while
    • 0:36:35and your algorithm is super fast?
    • 0:36:37What you probably care about is what's the worst case.
    • 0:36:40How long are my users--
    • 0:36:41how long am I going to be sitting there watching some spinning hourglass
    • 0:36:45or beach ball trying to give myself an answer to a pretty big problem?
    • 0:36:50Well, odds are, you're going to generally care about big O notation.
    • 0:36:53So indeed, moving forward, will generally
    • 0:36:55talk about the running time of algorithms often in terms of big O,
    • 0:36:58a little less so in terms of omega.
    • 0:37:00But understanding the range can be important
    • 0:37:03depending on the nature of the data that you're going to actually be given here.
    • 0:37:08All right let me pause and see if there is any questions.
    • 0:37:14Any questions here?
    • 0:37:16Yes, thank you.
    • 0:37:18AUDIENCE: So this method is clearly more efficient,
    • 0:37:21but it requires that the information is all compiled in a certain order.
    • 0:37:26How do you ensure that you can compile information
    • 0:37:29in a particular order at scale?
    • 0:37:31DAVID J. MALAN: Yeah, it's a really good question.
    • 0:37:33And if I can generalize it, how do you guarantee that you can do this
    • 0:37:36at scale, which algorithm is better?
    • 0:37:38I've sort of led us down this road of implying
    • 0:37:41that Rave's second algorithm, binary search,
    • 0:37:43is better because it's so much faster.
    • 0:37:45It's log of n in the worst case instead of big O of n.
    • 0:37:49But Rave was given an advantage when she came up here in that the doors were
    • 0:37:53already sorted.
    • 0:37:54And so that sort of invites the question,
    • 0:37:55well, given a whole bunch of random data,
    • 0:37:57either a small data set or, heck, something Google sized with millions,
    • 0:38:00billions of pieces of data, should you sort it first
    • 0:38:04from smallest to largest and then search?
    • 0:38:07Or should you just dive right in and search it linearly?
    • 0:38:11Like, how might you think about that?
    • 0:38:13If you are Google, for instance, and you've
    • 0:38:15got millions, billions of web pages, should they just go with linear search
    • 0:38:19because it's always going to work even though it might be slow?
    • 0:38:21Or should they invest the time in sorting all of that data--
    • 0:38:24we'll see how in a bit--
    • 0:38:26and then search it more efficiently?
    • 0:38:28Like, how do you decide between those options?
    • 0:38:31AUDIENCE: If you're sorting the data, then wouldn't you
    • 0:38:33have to go through all of the data?
    • 0:38:36DAVID J. MALAN: Yeah, if you had to sort the data first--
    • 0:38:38and we don't yet formally know how to do this.
    • 0:38:40But obviously, as humans, we could probably figure it out.
    • 0:38:43You do have to look at all of the data anyway.
    • 0:38:45And so you're sort of wasting your time if you're sorting it only
    • 0:38:48then to go in search it.
    • 0:38:50But maybe it depends a bit more.
    • 0:38:52Like, that's absolutely right, and if you're just
    • 0:38:54searching for one thing in life, then that's probably a waste of time
    • 0:38:58to sort it and then search it because you're just adding to the process.
    • 0:39:01But what's another scenario in which you might not
    • 0:39:03worry about that whereby it might make sense to sort it and then search?
    • 0:39:09Yeah.
    • 0:39:09AUDIENCE: [INAUDIBLE] you can go and use the other values as a way
    • 0:39:16to find out what's happening.
    • 0:39:17DAVID J. MALAN: Yeah, exactly.
    • 0:39:18So if your problem is a Google-like problem where
    • 0:39:21you have more than just one user who's searching for more than just one
    • 0:39:24website page, probably you should incur the cost up front
    • 0:39:26and sort the whole thing because every subsequent request thereafter
    • 0:39:30is going to be faster, faster, faster because it's
    • 0:39:32going to [INAUDIBLE] algorithm of binary search, binary search, binary search
    • 0:39:36that's going to add up to be way fewer steps
    • 0:39:39than doing linear search multiple times.
    • 0:39:41So again, kind of depends on the use case
    • 0:39:43and kind of depends on how important it is.
    • 0:39:45And this happens even in real world contexts.
    • 0:39:48I think back always to graduate school, when I was writing some code
    • 0:39:50to analyze some large data set.
    • 0:39:52And honestly, it was actually easier at the time for me
    • 0:39:55to write pretty inefficient but hopefully correct
    • 0:39:58code because you know what?
    • 0:39:59I could just go to sleep for eight hours and let it analyze this really big data
    • 0:40:02set.
    • 0:40:03I didn't have to bother writing more complex code to sort it just
    • 0:40:06to run it more efficiently.
    • 0:40:07Why?
    • 0:40:07Because I was the only user, and I only needed to run these queries once.
    • 0:40:11And so this was kind of a reasonable approach,
    • 0:40:13reasonable until I woke up eight hours later and my code was incorrect.
    • 0:40:17And now I had to spend another eight hours rerunning it after fixing it.
    • 0:40:20But even there, you see an example where,
    • 0:40:22what is your most precious resource?
    • 0:40:24Is it time to run the code?
    • 0:40:26Is it time to write the code?
    • 0:40:28Is it the amount of memory the computer is using?
    • 0:40:31These are all resources we'll start to talk about because it really
    • 0:40:33depends on what your goals are.
    • 0:40:36Any questions, then, on upper bounds, lower bounds,
    • 0:40:39or each of these two searches, linear or binary?
    • 0:40:42Yeah.
    • 0:40:42AUDIENCE: So just, when you're calculating running time,
    • 0:40:45does the sorting step count for that time?
    • 0:40:50DAVID J. MALAN: When analyzing running time, does the sorting step count?
    • 0:40:53If you want it to if you actually do it.
    • 0:40:55At the moment, it did not apply.
    • 0:40:56I just gave Rave the luxury of knowing that the data was sorted.
    • 0:41:01But if I really wanted to charge her for the amount of time
    • 0:41:04it took to find that number six, I should have added the time
    • 0:41:07to sort plus the time to search.
    • 0:41:09And in fact, that's a road we'll go down.
    • 0:41:11Why don't we go ahead and pace ourselves as before?
    • 0:41:13Let's take a 10 minute break here.
    • 0:41:14And when we come back, we'll write some actual code.
    • 0:41:17So we've seen a couple of searches-- linear search and binary search, which,
    • 0:41:21to be fair, we saw back in week zero.
    • 0:41:22But let's actually translate at least one of those now to some code
    • 0:41:25using this building block from last week where we can actually
    • 0:41:28define an array if we want, like an array of integers called numbers.
    • 0:41:32So let me switch over to BS Code here.
    • 0:41:34Let me go ahead and start a program called numbers.c.
    • 0:41:37And in numbers.c, let me go ahead here.
    • 0:41:40And how about let's include our familiar header files?
    • 0:41:44So css50.h.
    • 0:41:46I'll include standardio.h that we can get input and print input if we want.
    • 0:41:51And now I'm going to go ahead and give myself int main void.
    • 0:41:54No command line arguments today.
    • 0:41:56So I'll leave that as void.
    • 0:41:57And I'm going to go ahead and give myself
    • 0:41:58an array of how about seven numbers?
    • 0:42:01So I'll call it int number 7.
    • 0:42:04And then I can fill this array with numbers.
    • 0:42:06Like, numbers brackets 0 can be the number 4, and numbers bracket 1
    • 0:42:10could be the number 6, and numbers bracket 2 can be the number 8.
    • 0:42:14And this is the same list that we saw with [? Nomira ?] a bit
    • 0:42:16ago where it was 4, then 6, then 8.
    • 0:42:19But you know what?
    • 0:42:19There's actually another syntax I can show you here.
    • 0:42:22If you know in advance in a C program that you
    • 0:42:25want an array of certain values and you know therefore how many of those values
    • 0:42:30you want, you can actually do this little trick using curly braces.
    • 0:42:33You can say, don't worry about how big this is.
    • 0:42:36It's going to be implicit by way of these curly braces.
    • 0:42:39Here, I can do 4, 6, 8, 2, 7, 5, 0, close curly brace.
    • 0:42:44So it's a somewhat new use of curly braces.
    • 0:42:46But this has the effect of giving me an array called numbers inside
    • 0:42:50of which are a whole bunch of integers.
    • 0:42:52How many?
    • 0:42:53The compiler can infer it from what's ever inside these curly braces.
    • 0:42:56And it seems to be of size 1, 2, 3, 4, 5, 6, 7.
    • 0:43:00And all seven elements will be initialized with 4, 6, 8, 2, 7, 5, 0
    • 0:43:05respectively.
    • 0:43:06So just a minor optimization code wise to tighten up
    • 0:43:08what would have otherwise been like eight separate lines of code.
    • 0:43:12Now let's go ahead and implement linear search, as we called it.
    • 0:43:15And you can do this in a bunch of ways, but I'm going to do it like this.
    • 0:43:18For int i get 0, i is less than 7 i plus plus.
    • 0:43:24Then inside of my loop, I'm going to ask the question, well,
    • 0:43:27if the numbers at location i equals equals, as we asked of
    • 0:43:33[? Nomira, ?] the number 0, then I'm going to go ahead and do something
    • 0:43:36like printf found backslash n.
    • 0:43:41And then I'm going to return 0.
    • 0:43:43Just because of last week's discussion of returning
    • 0:43:46a value for main when all is well, I'm going to return 0 by convention
    • 0:43:50just to signal that indeed, I found what I'm looking for.
    • 0:43:53Otherwise, on what line do I want to go and add a printf, like, not found
    • 0:44:00and return something other than 0?
    • 0:44:02Right, I don't think I want an else here per our pseudo code earlier.
    • 0:44:06So on what line would you prefer I sort of insert a default scenario
    • 0:44:11of not found and I'll return an error?
    • 0:44:14Yeah, over here?
    • 0:44:15[INTERPOSING VOICES]
    • 0:44:19DAVID J. MALAN: Nice.
    • 0:44:20So at the end of the for loop because you
    • 0:44:21want to give the program or our volunteer
    • 0:44:23earlier a chance to go through all of the doors, all of the numbers.
    • 0:44:26But if you go through the whole thing, through the whole loop,
    • 0:44:29at the very end, you probably just want to conclude not found backslash n
    • 0:44:33and then return something like positive 1
    • 0:44:36just to signify that an error happened.
    • 0:44:38And again, this was a minor detail last week.
    • 0:44:40Any time main is successful, the programming convention is to return 0.
    • 0:44:44That means all as well.
    • 0:44:45And if something goes wrong, like you didn't find what you're looking for,
    • 0:44:49you might return something other than 0, like positive 1, maybe positive 2,
    • 0:44:52or even negative numbers if you want.
    • 0:44:55All right, well, let me go ahead and save this.
    • 0:44:57Let me do make numbers.
    • 0:44:58Hopefully no syntax errors.
    • 0:45:01All good so far.
    • 0:45:02dot slash numbers, enter.
    • 0:45:05All right, and it's found, as I would hope it would be.
    • 0:45:07And just as a little check, let's search for something that's definitely
    • 0:45:10not there, like the number negative 1.
    • 0:45:14Let me go ahead and recompile the code with make numbers.
    • 0:45:17Let me rerun the code with dot slash numbers
    • 0:45:19and hopefully-- whew, OK, not found.
    • 0:45:21So proof by example seems to be working correctly.
    • 0:45:24But let's make things a little more interesting now.
    • 0:45:26Right now, I'm using just an array of integers.
    • 0:45:29Let me go ahead and introduce maybe an array of strings instead.
    • 0:45:34And maybe this time, I'll store a bunch of names and not just integers
    • 0:45:37but actual strings of names.
    • 0:45:39So how might I do this?
    • 0:45:40Well, let me go back to my code here.
    • 0:45:42I'm going to switch us over to maybe a file called names.c.
    • 0:45:46And in here, I'll go ahead and include cs50.h.
    • 0:45:50I'll include standardio.h.
    • 0:45:53And I'm going to go ahead and for now include a new friend
    • 0:45:57from last week, string.h, which gives me some string-related functionality.
    • 0:46:00Int main void because I'm not going to bother with any command line
    • 0:46:03arguments for now.
    • 0:46:04And now if I want an array of strings, I could do something like this--
    • 0:46:09string names bracket 7.
    • 0:46:12And then I could start doing like before.
    • 0:46:14Names bracket 0 could be someone like Bill, and names bracket 1
    • 0:46:17could be someone like Charlie and so forth.
    • 0:46:20But there's this new improvement I can make.
    • 0:46:24Let me just let the compiler figure out how many names there are.
    • 0:46:27And using curly braces, I'll do Bill and then Charlie and then Fred and then
    • 0:46:32George and then Ginny and then Percy and then Ron if there's the pattern there.
    • 0:46:39All right, so now I have these seven names as strings.
    • 0:46:42Let's do something similar.
    • 0:46:44So for int, i get 0.
    • 0:46:47i is less than 7 as before, i plus plus as before.
    • 0:46:51And inside of the, loop lets this time check for the string in question,
    • 0:46:54and suppose we're searching for Ron arbitrarily.
    • 0:46:57He is there, so we should eventually find him.
    • 0:47:00Let me go ahead and say if names bracket i equals quote unquote Ron, then inside
    • 0:47:07of my if condition, I'm going to say printf found just like before.
    • 0:47:11And I'm going to return 0 just because all is well.
    • 0:47:13And I'm going to take your advice from the get go this time
    • 0:47:16and, at the end of the loop, print out not found because if I get this far,
    • 0:47:20I have not printed found, and I have not returned already.
    • 0:47:23So I'm just going to go ahead and return 1 after printing not found.
    • 0:47:27All right, let me go ahead and cross my fingers as always.
    • 0:47:30Make names this time.
    • 0:47:33And it doesn't seem to like my code here.
    • 0:47:36This is perhaps a new error that you might not
    • 0:47:38have seen yet in names.c line 11.
    • 0:47:41So that's this line here, my if condition.
    • 0:47:43Result of comparison against a string literal is unspecified.
    • 0:47:47Use an explicit string comparison function instead.
    • 0:47:50I mean, that's kind of a mouthful, and the first time you see it,
    • 0:47:53you're probably not going to know how to make sense of that.
    • 0:47:55But it does kind of draw our attention to something being awry
    • 0:47:59with the equality checking here, with equal equals and Ron.
    • 0:48:03And here's where again we've been telling
    • 0:48:06sort of a white lie for the past couple of weeks.
    • 0:48:08Strings are a thing in C. Strings are a thing in programming.
    • 0:48:12But recall from last week, I did disclaim
    • 0:48:14there's no such thing as a string data type
    • 0:48:16technically because it's not a primitive in the way an int
    • 0:48:20and a float and a bool are that are sort of built into the language.
    • 0:48:24You can't just use equation equals to compare two strings.
    • 0:48:28You actually have to use a special function that's
    • 0:48:31in this header file we talked briefly about last week.
    • 0:48:34In that header file was string length or strlen.
    • 0:48:36But there's other functions instead as well.
    • 0:48:39Let me, in fact, go ahead and open up the manual pages.
    • 0:48:43And if we go to string.h--
    • 0:48:46let me scroll down a bit.
    • 0:48:47In string.h you can perhaps infer what function will probably take
    • 0:48:52the place of equals equals for today.
    • 0:48:56What do we want to use?
    • 0:48:57Yeah.
    • 0:48:57AUDIENCE: Strcmp?
    • 0:48:58DAVID J. MALAN: So strcmp, S-T-R-C-M-P, which apparently compares two strings.
    • 0:49:03And if I click on that, we'll see more information.
    • 0:49:05And indeed, if I click on strcmp, we'll see under the synopsis
    • 0:49:09that, OK, I need to use the CS50 header file and string.h, as I already have.
    • 0:49:14Here is its prototype, which is telling me
    • 0:49:17that strcmp takes two strings, S1 and S2, that
    • 0:49:21are presumably going to be compared.
    • 0:49:22And it returns an integer, which is interesting.
    • 0:49:25So let's read on.
    • 0:49:26The description of this function is that it compares two strings case
    • 0:49:29sensitively.
    • 0:49:30So uppercase or lowercase matters, just FYI.
    • 0:49:34And then let's look it the return value here.
    • 0:49:36The return value of this function returns an int less than 0
    • 0:49:40if S1 comes before S2, 0 if S1 is the same as S2, or an int greater than 0
    • 0:49:48if S1 comes after S2.
    • 0:49:50So the reason that this function returns an integer and not just a
    • 0:49:54bool, true or false, is that it actually will
    • 0:49:57allow us to sort these things eventually because if you can tell me
    • 0:50:00if two strings come in this order or in this order or they're the same,
    • 0:50:04you need three possible return values.
    • 0:50:07And a bool, of course, only gives you two,
    • 0:50:08but an int gives you like 4 billion even though we just need the 3.
    • 0:50:12So 0 or a positive number or a negative number is what this function returns.
    • 0:50:17And the documentation goes on to explain what we mean by ASCIIbetical order.
    • 0:50:21Recall that capital A is 65, capital B is 66,
    • 0:50:25and it's those underlying ASCII or Unicode
    • 0:50:27numbers that a computer uses to figure out whether something comes before it
    • 0:50:31or after it like in the dictionary.
    • 0:50:33But for our purposes now, we only care about equality.
    • 0:50:35So I'm going to go ahead and do this.
    • 0:50:37If I want to compare names bracket i against Ron,
    • 0:50:41I use stir compare or strcmp, names bracket i comma, quote unquote, Ron.
    • 0:50:49So it's a little more involved than actually
    • 0:50:51using equals equals, which does work for integers, longs,
    • 0:50:55and certain other values.
    • 0:50:57But for strings, it turns out we need to use a more powerful function.
    • 0:51:00Why?
    • 0:51:01Well, last week, recall what a string really is.
    • 0:51:03It's an array of characters.
    • 0:51:06And so whereas you can use equals equals for single characters,
    • 0:51:09strcmp, as we'll eventually see, is going
    • 0:51:12to compare multiple characters for us.
    • 0:51:14There's more logic there.
    • 0:51:15There's a loop needed, and that's why it comes with the string library.
    • 0:51:19But it doesn't just work out of the box with equals equals alone.
    • 0:51:22That would literally be comparing two things, not two arrays of things.
    • 0:51:26And we'll come back to this next week as to what's
    • 0:51:28really going on under the hood.
    • 0:51:29So let me go ahead and fix one bug that I just realized I made.
    • 0:51:34I want to check if the return value of str compare is equal to 0
    • 0:51:39because per the documentation, that meant they're the same.
    • 0:51:42All right, let me go ahead and make names this time.
    • 0:51:45Now it compiles.
    • 0:51:46Dot slash names, Enter, found.
    • 0:51:49And just as a sanity check, let's check someone outside the family.
    • 0:51:55Searching now for Hermione after recompiling the code,
    • 0:51:59after rerunning the code.
    • 0:52:00And she's not, in fact, found.
    • 0:52:02So here's just a similar implementation of linear search
    • 0:52:05not for integers this time but instead for strings,
    • 0:52:09the subtlety really being we need a helper function, str compare,
    • 0:52:13to actually do the legwork for us of comparing two arrays of characters.
    • 0:52:17All right, questions on either of these implementations-- yeah, in the middle?
    • 0:52:21AUDIENCE: So, if I do [INAUDIBLE]
    • 0:52:22DAVID J. MALAN: Ah, good question.
    • 0:52:24If I had not fixed what I claimed was a mistake earlier and I did this--
    • 0:52:28and we saw an example of this last week, actually.
    • 0:52:30If a function returns an integer, be it negative or positive or 0,
    • 0:52:37when you get back 0, the expression, the Boolean expression,
    • 0:52:40will be considered false.
    • 0:52:42So 0 equals false always.
    • 0:52:44If a function returns any positive number, or any negative number,
    • 0:52:49that's going to be interpreted as true even
    • 0:52:52if it's positive or negative, whether it's 1, negative 1, 2, negative 2.
    • 0:52:57And so if I did this, this would be saying the opposite.
    • 0:53:01So if I were to say this, if str compare of names bracket i and Hermione, that's
    • 0:53:06implicitly like saying this does not equal 0, or it means sort of is true,
    • 0:53:13but you don't want to check for true because, again, we're
    • 0:53:15comparing integers here.
    • 0:53:17So the reason I did 0 here in this case is
    • 0:53:21that it explicitly checks for the return value that means they're the same.
    • 0:53:24And yeah.
    • 0:53:25Follow up?
    • 0:53:26AUDIENCE: [INAUDIBLE]
    • 0:53:30DAVID J. MALAN: Yes, you might not have seen this yet,
    • 0:53:32but you can express the equivalent because if you
    • 0:53:36want to check if this is false, you can actually
    • 0:53:40use an exclamation point, known as a bang in programming,
    • 0:53:43that inverts the meaning.
    • 0:53:44So false becomes true, true becomes false.
    • 0:53:48So this would be another way of expressing it.
    • 0:53:50This is arguably a worse design, though, because the documentation explicitly
    • 0:53:54says you should be checking for 0 or a positive value
    • 0:53:58or a negative value, and this little trick, while correct,
    • 0:54:01and I think you can make a reasonable case for it, sort of hides that detail.
    • 0:54:05And I would argue instead for the first way,
    • 0:54:07checking for equals equals 0 instead.
    • 0:54:09And if that's a little subtle, not to worry.
    • 0:54:11We'll come back to little syntactic tricks like that before long.
    • 0:54:16Other questions on linear search in these two forms.
    • 0:54:20Is there another hand or hands?
    • 0:54:22Two hands?
    • 0:54:24No?
    • 0:54:24OK, just holler if I missed.
    • 0:54:25So let's now actually take this one step further.
    • 0:54:28Suppose that we want to write a program that maybe implements something
    • 0:54:30a little more like a phone book that has both names and numbers and not
    • 0:54:35just integers but actual phone numbers.
    • 0:54:37Well, we could escalate things like this.
    • 0:54:39We could now have two arrays-- one called names, one called numbers.
    • 0:54:42And I'm going to use strings for the numbers now,
    • 0:54:45the phone numbers, because in most communities,
    • 0:54:48phone numbers might have dashes, pluses, parentheses, so something
    • 0:54:51that really looks more like a string even though we call it a phone number.
    • 0:54:55Probably don't want to use an int lest we throw away those kinds of details.
    • 0:54:58So let me switch back to BS Code here, and let's do one more program, this one
    • 0:55:03in a file called phonebook.c.
    • 0:55:05And now let me go ahead and do the same.
    • 0:55:06Let me include cs50.h.
    • 0:55:08Let me include standardio.h, and let me include string.h.
    • 0:55:14I'm going to again do int main void.
    • 0:55:17And then inside of my program, I'm going to give myself two arrays--
    • 0:55:20the efficient way this time.
    • 0:55:22String names will be just two of us this time.
    • 0:55:25How about Carter and me?
    • 0:55:28And then I'll give myself-- oops, typo already.
    • 0:55:30If I want this to be an array, I don't have to specify the number.
    • 0:55:33The compiler can count for me.
    • 0:55:35But I do need the square brackets.
    • 0:55:37Then for numbers, I'm again going to use a string array specifying with
    • 0:55:43the curly braces that how about Carter can be at 1-617-495-1000.
    • 0:55:49And how about my own number here--
    • 0:55:511-949-468-- oh pattern appearing--
    • 0:55:552750 will be mine.
    • 0:55:57Why mine?
    • 0:55:58Well, I'm just kind of lined things up.
    • 0:56:00So Carter's number is apparently first in this array,
    • 0:56:03and I'm claiming that he'll be first in this array, respectively.
    • 0:56:06I, David, will be the first-- the second in the names array
    • 0:56:09and second in the numbers array.
    • 0:56:12If you want to have a little fun with programming, feel free to text
    • 0:56:15or call me some time at that number.
    • 0:56:17So now let's actually use this data in some way.
    • 0:56:20Let's go ahead and actually search for my own name and number here.
    • 0:56:24So let me do.
    • 0:56:24For int i, get 0.
    • 0:56:27There's two of us this time-- so i less than 2 and then i plus plus as before.
    • 0:56:32And now I'm going to practice what I preached earlier,
    • 0:56:34and I'm going to use str compare to find my name in this case.
    • 0:56:38And I'm going to say if strcmp of names bracket i equals quote unquote David
    • 0:56:45and that equals 0, meaning they're the same,
    • 0:56:48then just as before, I'm going to go ahead and print something out.
    • 0:56:51But this time, I'm going to make the program more useful
    • 0:56:53and not just say found or not found.
    • 0:56:55Now I'm implementing a phone book, like the contacts app on iOS or Android.
    • 0:56:59So I'm going to say something like, quote unquote, found percent
    • 0:57:02s backslash n and then actually plug in numbers bracket i
    • 0:57:08to correspond to the current name bracket i.
    • 0:57:12And then I'll return 0 as before.
    • 0:57:14And then down here if we get all the way through the loop
    • 0:57:16and David's not there for some reason, I'm going to print as before not found
    • 0:57:20and then return 1.
    • 0:57:21So let me go ahead and compile this with make phone dot slash phonebook,
    • 0:57:26and it seems to have found the number.
    • 0:57:29So this code I'm going to claim is correct.
    • 0:57:33It's kind of stupid because I've just made a phone book or a contacts
    • 0:57:36app that only supports two people.
    • 0:57:37They're only going to be me and Carter.
    • 0:57:39This would be like downloading the contacts app on a phone
    • 0:57:41and you can only call two people in the world.
    • 0:57:43There's no ability to add names or edit things.
    • 0:57:45That, of course, could come later using get string or something else.
    • 0:57:48But for now for the sake of discussion, I've
    • 0:57:50just hardcoded two names and two numbers.
    • 0:57:53But for what it does, I claim this is correct.
    • 0:57:56It's going to find me and print out my number.
    • 0:57:59But is it well-designed?
    • 0:58:01Let's start to now consider if we're not just using arrays,
    • 0:58:05but are we using them, well?
    • 0:58:07We started to use them last week, but are we using them well this week?
    • 0:58:10And what might I even mean by using an array well or designing
    • 0:58:14this program well?
    • 0:58:16Any critiques or concerns with why this might not
    • 0:58:21be the best road for us to be going down when
    • 0:58:24I want to implement something like a phone book with pieces of information?
    • 0:58:28It seems all too vulnerable to just mistakes.
    • 0:58:31For instance, if I screw up the actual number of names in the names array
    • 0:58:35such that it's now more or less than is in the numbers array or vise versa,
    • 0:58:40it feels like there's not a tight relationship between those pieces
    • 0:58:43of data, and it's just sort of is trusting on the honor system
    • 0:58:47that any time I use names bracket i that it lines up with numbers bracket i.
    • 0:58:53And that's fine.
    • 0:58:54If you're the one writing the code, you're probably
    • 0:58:56not going to really screw this up.
    • 0:58:57But if you start collaborating with someone else or the program
    • 0:59:00is getting much, much longer, the odds that you or your colleagues
    • 0:59:03remember that you're sort of just trusting that names and numbers line up
    • 0:59:08like this is going to fail eventually.
    • 0:59:10Someone's not going to realize that, and just, the code is going to break.
    • 0:59:13And you're going to start out putting the wrong numbers for names, which
    • 0:59:16is to say it'd be much nicer if we could somehow couple these two
    • 0:59:20pieces of data, names and numbers, a little more tightly together so
    • 0:59:24that you're not just trusting that these two independent variables, names
    • 0:59:28and numbers, have this kind of relationship with themselves.
    • 0:59:32So let's consider how we might solve this.
    • 0:59:35A new feature today that we'll introduce is generally known as a data structure.
    • 0:59:39In C, we have the ability to invent our own data types,
    • 0:59:43if you will-- data types that the authors of C decades
    • 0:59:46ago just didn't envision or just didn't think
    • 0:59:48were necessary because we can implement them ourselves-- similar to Scratch
    • 0:59:51just as you could create custom puzzle pieces,
    • 0:59:54or in C, you can create custom functions.
    • 0:59:56So in C, can you create your own types of data
    • 1:00:00that go beyond the built in ints and floats and even strings?
    • 1:00:04You can make, for instance, a person data type or a candidate data
    • 1:00:10type in the context of elections or a person data type
    • 1:00:13more generically that might have a name and a number.
    • 1:00:15So how might we do this?
    • 1:00:17Well, let me go here and propose that if we want to define a person,
    • 1:00:23wouldn't it be nice if we could have a person data type,
    • 1:00:26and then we could have an array called people?
    • 1:00:29And maybe that array is our only array with two things
    • 1:00:32in it, two persons in it.
    • 1:00:35But somehow, those data types, these persons,
    • 1:00:38would have both a name and a number associated with them.
    • 1:00:41So we don't need two separate arrays.
    • 1:00:42We need one array of persons, a brand new data type.
    • 1:00:47So how might we do this?
    • 1:00:48Well, if we want every person in the world
    • 1:00:51or in this program to have a name and a number,
    • 1:00:53we literally right out first those two data types.
    • 1:00:56Give me a string called name.
    • 1:00:57Give me a string called number semicolon, after each.
    • 1:01:00And then we wrap that, those two lines of code,
    • 1:01:04with this syntax, which at first glance is a little cryptic.
    • 1:01:06It's a lot of words all of a sudden.
    • 1:01:08But typedef is a new keyword today that defines a new data type.
    • 1:01:12This is the C key word that lets you create your own data
    • 1:01:16type for the very first time.
    • 1:01:18Struct is another related key word that tells the compiler that this isn't just
    • 1:01:22a simple data type, like an int or a float renamed or something like that.
    • 1:01:27It actually is a structure.
    • 1:01:28It's got some dimensions to it, like two things in it or three things in it
    • 1:01:32or even 50 things inside of it.
    • 1:01:35The last word down here is the name that you want to give your data type,
    • 1:01:39and it weirdly goes after the curly braces.
    • 1:01:41But this is how you invent a data type called person.
    • 1:01:45And what this code is implying is that henceforth,
    • 1:01:48the compiler clang will know that a person is composed of a name that's
    • 1:01:53a string and a number that's a string.
    • 1:01:56And you don't have to worry about having multiple arrays now.
    • 1:01:59You can just have an array of people moving forward.
    • 1:02:03So how can we go about using this?
    • 1:02:05Well, let me go back to my code from before where
    • 1:02:07I was implementing a phone book.
    • 1:02:09And why don't we enhance the phone book code a little bit
    • 1:02:11by borrowing some of that new syntax?
    • 1:02:14Let me go to the top of my program above main
    • 1:02:16and define a type that's a structure or a data
    • 1:02:19structure that has a name inside of it and that has a number inside of it.
    • 1:02:24And the name of this new structure again is going to be called person.
    • 1:02:28Inside of my code now, let me go ahead and delete this old stuff temporarily.
    • 1:02:33Let me give myself an array called people of size 2.
    • 1:02:37And I'm going to use the non-terse way to do this.
    • 1:02:40I'm not going to use the curly braces.
    • 1:02:42I'm going to more pedantic spell out what I want in this array of size 2
    • 1:02:47at location 0, which is the first person in an array
    • 1:02:50because you always start counting at 0.
    • 1:02:52I'm going to give that person a name of quote unquote Carter.
    • 1:02:56And the dot is admittedly one new piece of syntax today too.
    • 1:02:59The dot means go inside of that structure
    • 1:03:02and access the variable called name and give it this value Carter.
    • 1:03:06Similarly, if I'm going to give Carter a number,
    • 1:03:08I can go into people bracket 0 dot number and give that the same thing
    • 1:03:13as before plus 1-617-495-1000.
    • 1:03:17And then I can do the same for myself here--
    • 1:03:20people bracket-- where should I go?
    • 1:03:24OK, one because again, two elements.
    • 1:03:26But we started counting at zero.
    • 1:03:27Bracket name equals quote unquote David.
    • 1:03:29And then lastly, people bracket 1 dot number equals quote unquote plus
    • 1:03:341-949-468-2750.
    • 1:03:40So now if I scroll down here to my logic,
    • 1:03:43I don't think this part needs to change too much.
    • 1:03:46I'm still, for the sake of discussion, going to iterate 2 times from i
    • 1:03:50is 0 on up to but not through 2.
    • 1:03:53But I think this line of code needs to change.
    • 1:03:56How should I now refer to the i-th person's name as I iterate?
    • 1:04:04What should I compare quote unquote David to this time?
    • 1:04:08Let me see.
    • 1:04:08On the end here?
    • 1:04:10AUDIENCE: People bracket i dot name.
    • 1:04:13DAVID J. MALAN: Yeah, people bracket i dot name.
    • 1:04:14Why?
    • 1:04:15Because people is the name of the array.
    • 1:04:16Bracket i is the i-th person that we're iterating over in the current loop--
    • 1:04:20first zero, then one, maybe higher if it had more people.
    • 1:04:23Then dot is our new syntax for going inside of a data structure
    • 1:04:26and accessing a variable therein which in this case is name.
    • 1:04:29And so I can compare David just as before.
    • 1:04:32So it's a little more verbose, but now arguably this is a better program
    • 1:04:36because now these people are full fledged data types unto themselves.
    • 1:04:42There's no more honor system inside of my loop
    • 1:04:44that this is going to line up because in just a moment,
    • 1:04:47I'm going to fix this one last remnant of the previous version.
    • 1:04:49And if I can call back on you again, what should I
    • 1:04:51change numbers bracket i to this time?
    • 1:04:54AUDIENCE: [INAUDIBLE] dot number.
    • 1:05:00DAVID J. MALAN: Dot number, exactly.
    • 1:05:02So gone is the honor system that just assumes
    • 1:05:04that bracket i in this array lines up with bracket i in this other array.
    • 1:05:08Now why?
    • 1:05:08There's only one array.
    • 1:05:10It's an array called people.
    • 1:05:11The things it stores are persons.
    • 1:05:14A person has a name and a number.
    • 1:05:15And so even though it's kind of marginal admittedly given
    • 1:05:18that this is a short program and given that this kind of made things
    • 1:05:21look more complicated at first glance, we're
    • 1:05:23now laying the foundation for just a better design because you really
    • 1:05:26can't screw up now the association of names
    • 1:05:29with numbers because every person's name and number is, so to speak,
    • 1:05:32encapsulated inside of the same data type.
    • 1:05:36And that's a term of art in CS.
    • 1:05:38Encapsulation means to encapsulate-- that is, contain--
    • 1:05:41related pieces of information.
    • 1:05:43And thus, we have a person that encapsulates two other data types, name
    • 1:05:49and number.
    • 1:05:50And this just sets the foundation for all
    • 1:05:52of the cool stuff we've talked about and you use every day.
    • 1:05:55What is an image?
    • 1:05:55Well, recall that an image is a bunch of pixels or dots on the screen.
    • 1:05:58Every one of those dots has RGB values associated
    • 1:06:02with it-- red, green, and blue.
    • 1:06:04You could imagine now creating a structure in C probably where
    • 1:06:07maybe you have three values, three variables-- one called red,
    • 1:06:11one called green, one called blue.
    • 1:06:13And then you could name the thing not person but pixel.
    • 1:06:15And now you could store in C three different colors-- some amount of red,
    • 1:06:19some green, some blue-- and collectively treat it as the color of a pixel.
    • 1:06:23And you could imagine doing something similar perhaps for video or music.
    • 1:06:27Music, you might have three variables-- one for the musical note,
    • 1:06:30the duration, the loudness of it.
    • 1:06:32And you can imagine coming up with your own data type for music as well.
    • 1:06:36So this is a little low level.
    • 1:06:37We're just using like a familiar contacts application.
    • 1:06:39But we now have the way in code to express most any type of data
    • 1:06:44that we might want to implement or discuss ultimately.
    • 1:06:48So any questions now on struct or defining our own types,
    • 1:06:53the purposes for which are to use arrays but use them more responsibly
    • 1:06:58now in a better design but also to lay the foundation
    • 1:07:01for implementing cooler and cooler stuff per our week zero discussion?
    • 1:07:05Yeah.
    • 1:07:06AUDIENCE: What's the [INAUDIBLE]
    • 1:07:07DAVID J. MALAN: What's the difference between this and an object in an object
    • 1:07:10oriented language?
    • 1:07:11So slight side note, C is not object-oriented.
    • 1:07:14Languages like Java and C++ and others which you might have heard
    • 1:07:17of, programmed yourself, had friends program in, are object oriented
    • 1:07:21languages in those languages they have things called classes or objects which
    • 1:07:25are interrelated.
    • 1:07:26And objects can store not just data, like variables.
    • 1:07:30Objects can also store functions, and you can kind of sort of do this in C.
    • 1:07:34But it's not sort of conventional.
    • 1:07:36In C, you have data structures that store data.
    • 1:07:39In languages like Java and C+, you have objects that store data and functions
    • 1:07:44together.
    • 1:07:45Python is an object-oriented language as well.
    • 1:07:47So we'll see this issue in a few weeks, but let me wave my hands at it for now.
    • 1:07:51Yeah.
    • 1:07:51AUDIENCE: Could you use this [INAUDIBLE]??
    • 1:07:53DAVID J. MALAN: Yes.
    • 1:07:54Could you use this struct to redefine how an int is defined?
    • 1:07:57Short answer, yes.
    • 1:07:58We talked a couple of times now about integer overflow.
    • 1:08:01And most recently, you might have seen me mention the bug in iOS and Mac OS
    • 1:08:05that was literally related to an int overflow.
    • 1:08:08That's the result of ints only storing 4 bytes or 32 bits
    • 1:08:12or even as long as 64 bits or 8 bytes.
    • 1:08:15But it's finite.
    • 1:08:16But if you want to implement some financial software
    • 1:08:19or some scientific or mathematical software that
    • 1:08:22allows you to count way bigger than a typical int or a long,
    • 1:08:25you could imagine John coming up with your own structure.
    • 1:08:29And in fact, in some languages there is a structure
    • 1:08:31called big int, which allows you to express even bigger numbers.
    • 1:08:35How?
    • 1:08:35Well, maybe you store inside of a big ant an array of values.
    • 1:08:40And you somehow allow yourself to store more and more bits
    • 1:08:43based on how high you want to be able to count.
    • 1:08:45So in short, yes.
    • 1:08:46We now have the ability now to do most anything we want in the language
    • 1:08:49even if it's not built in for us.
    • 1:08:52Other questions.
    • 1:08:53AUDIENCE: [INAUDIBLE]
    • 1:08:58DAVID J. MALAN: Could you define a name and a number in the same line?
    • 1:09:01Sort of.
    • 1:09:02It starts to get syntactically a little messy,
    • 1:09:03so I did it a little more pedantic line by line.
    • 1:09:07Good question.
    • 1:09:07Over here.
    • 1:09:08AUDIENCE: [INAUDIBLE] function you use for the function
    • 1:09:12at the bottom of the [INAUDIBLE].
    • 1:09:15Could you do something like that [INAUDIBLE]??
    • 1:09:19DAVID J. MALAN: Prototypes-- you have to do A and C. You
    • 1:09:21have to define anything you're going to use or declare anything you're going
    • 1:09:25to use before you actually use it.
    • 1:09:26So it is deliberate that I put it at the top of my code in this file.
    • 1:09:30Otherwise, the compiler would not know what I mean by person when I first
    • 1:09:34use it here on what's line 14.
    • 1:09:37So it has to come first, or it has to be put into something like a header file
    • 1:09:41so that you include it at the very top of your code.
    • 1:09:44Other questions over here.
    • 1:09:46Yeah.
    • 1:09:47AUDIENCE: [INAUDIBLE]
    • 1:09:53DAVID J. MALAN: Yeah, good question, and we'll
    • 1:09:54come back to this later in the term when we talk about SQL, a database language,
    • 1:09:58and storing things in actual databases.
    • 1:10:00Generally speaking, even though we humans call things phone numbers,
    • 1:10:04or in the US, we have social security numbers, those types of numbers
    • 1:10:07often have other punctuation in it, like dashes, parentheses, pluses,
    • 1:10:12and so forth.
    • 1:10:13You could not store any of that syntax or that punctuation inside of an int.
    • 1:10:17You could only store numbers.
    • 1:10:18So one motivation for using a string is just
    • 1:10:20I can store whatever the human wanted me to store, including parentheses
    • 1:10:24and so forth.
    • 1:10:25Another reason for storing things as strings,
    • 1:10:29even if they look like numbers, is in the context
    • 1:10:31of zip codes in the United States.
    • 1:10:33Again, we'll come back to this.
    • 1:10:34But long story short-- years ago, actually--
    • 1:10:36I was using Microsoft Outlook for my email client.
    • 1:10:39And eventually I switched to Gmail.
    • 1:10:41And this is like 10 plus years ago now.
    • 1:10:42And Outlook at the time lets you export all of your contacts as a CSV file--
    • 1:10:47Comma Separated Values.
    • 1:10:48More on that in the weeks to come too.
    • 1:10:50And that just means I could download a text
    • 1:10:52file with all of my friends and family and their numbers inside of it.
    • 1:10:55Unfortunately, I open that same CSV file with Excel, I think, at the time
    • 1:10:59just to kind of spot check it and see if what's
    • 1:11:01in there was what it was expected.
    • 1:11:03And I must have instinctively hit, like, Command or Control-S to save it.
    • 1:11:06And Excel at least has this habit of sort of reformatting your data.
    • 1:11:09If things look like numbers, it treats them as numbers.
    • 1:11:12And Apple Numbers does this too.
    • 1:11:14Google Spreadsheets does this to nowadays.
    • 1:11:16But long story short, I then imported my mildly saved CSV file into Gmail.
    • 1:11:23And now 10 plus years later, I'm still occasionally finding friends and family
    • 1:11:26members whose zip codes are in Cambridge, Massachusetts 2138,
    • 1:11:32which is missing the 0 because we here in Cambridge are 02138.
    • 1:11:36And that's because I treated or I let Excel
    • 1:11:39treat what looks like a number as an actual number or int,
    • 1:11:42and now leading zeros become a problem because mathematically, they
    • 1:11:45mean nothing, but in the mail system, they do--
    • 1:11:48sending envelopes and such.
    • 1:11:50All right, other final questions here.
    • 1:11:51AUDIENCE: [INAUDIBLE]
    • 1:11:54DAVID J. MALAN: Yeah, so could I have used a 2D or two dimensional
    • 1:11:58array to solve the problem earlier of having just one array?
    • 1:12:02Yes, but one, I would argue it's less readable, especially
    • 1:12:06as I get lots of names and numbers.
    • 1:12:08And two, that too is also kind of relying on the honor system.
    • 1:12:11It would be all too easy to omit some of the square brackets in the two
    • 1:12:14dimensional array.
    • 1:12:15So I would argue it too is not as good as introducing a struct.
    • 1:12:20More on that down the road.
    • 1:12:21Two dimensional arrays just means arrays of arrays, as you might infer.
    • 1:12:26All right, so now that we have this ability
    • 1:12:28to store different types of data like contacts in a phone book,
    • 1:12:32having names and addresses, let's actually
    • 1:12:33take a step back and consider how we might now
    • 1:12:36solve one of the original problems by actually sorting the information we're
    • 1:12:41given in advance and considering, per our discussion earlier, just how
    • 1:12:45costly, how time consuming is that because that might tip
    • 1:12:48the scales in favor of sorting, then searching, or maybe just
    • 1:12:53not sorting and only searching.
    • 1:12:54It'll give us a sense of just how expensive, so to speak,
    • 1:12:58sorting something actually is.
    • 1:13:00Well, what's the formulation of this problem?
    • 1:13:02It's the same thing as week zero.
    • 1:13:03We've got input to sort.
    • 1:13:04We want it to be output as sorted.
    • 1:13:07So for instance, if we're taking unsorted input as input,
    • 1:13:10we want the sorted output as the result. More concretely,
    • 1:13:13if we've got numbers like these--
    • 1:13:1563852741, which are just randomly arranged numbers--
    • 1:13:20we want to get back out 12345678.
    • 1:13:24So we just want those things to be sorted.
    • 1:13:26So again, inside of the black box here is
    • 1:13:28going to be one or more algorithms that actually gets this job done.
    • 1:13:33So how might we go about doing this?
    • 1:13:35Well, just to vary things a bit more, I think we have a chance here
    • 1:13:39for a bit more audience participation.
    • 1:13:41But this time, we need eight people if we may.
    • 1:13:43All of you have to be comfortable appearing on the internet.
    • 1:13:46OK, so this is actually quite convenient that you're all quite close.
    • 1:13:49How about 1, 2, 3, 4, 5, 6, 7--
    • 1:13:52oh, OK, and someone volunteering their friend-- number eight.
    • 1:13:57Come on down.
    • 1:13:58Come on down.
    • 1:13:58And if you could, I'm going to set things up.
    • 1:14:00If you all could join Valerie, my colleague over there,
    • 1:14:03to give you a prop to use here, we'll go ahead in just a moment
    • 1:14:08and try to find some numbers at hand.
    • 1:14:15In just a moment, each of our volunteers is going to be representing an integer.
    • 1:14:20And that integer is initially going to be in unsorted order.
    • 1:14:25And I claim that using an algorithm, step by step instructions,
    • 1:14:29we can probably sort these folks in at least a couple of different ways.
    • 1:14:33So they're in wardrobe right now just getting their very own Harvard T-shirts
    • 1:14:38with a Jersey number on it, which will then represent an element of our array.
    • 1:14:46Give us just a moment to finish getting the attire ready.
    • 1:14:51They're being handed a shirt and a number.
    • 1:14:56And let me ask the audience for just a moment.
    • 1:14:58As we have these numbers up here on the screen, these numbers too are unsorted.
    • 1:15:02They're just in random order.
    • 1:15:04And let me ask the audience.
    • 1:15:05How would you go about sorting these eight numbers on the screen?
    • 1:15:10How would you go about sorting these?
    • 1:15:12Yeah, what are your thoughts?
    • 1:15:14AUDIENCE: [INAUDIBLE] the number at the end, the following number.
    • 1:15:20DAVID J. MALAN: OK.
    • 1:15:20AUDIENCE: The following number is bigger, then I keep it as it is.
    • 1:15:24DAVID J. MALAN: OK.
    • 1:15:25AUDIENCE: If not, then [INAUDIBLE].
    • 1:15:26DAVID J. MALAN: OK, so just to recap, you would start
    • 1:15:29with one of the numbers on the end.
    • 1:15:30You would look to the number to the right or to the left of it,
    • 1:15:33depending on which end you start at.
    • 1:15:34And if it's out of order, you would just start to swap things.
    • 1:15:37And that seems reasonable.
    • 1:15:38There's a whole bunch of mistakes to fix here
    • 1:15:40because things are pretty out of order.
    • 1:15:42But probably, if you start to solve small problems at a time,
    • 1:15:45you can achieve the end result of getting the whole thing sorted.
    • 1:15:47Other instincts, if you were just handed these numbers, how
    • 1:15:50you might go about sorting them?
    • 1:15:54How might you?
    • 1:15:54Yeah, in the back.
    • 1:15:55AUDIENCE: [INAUDIBLE]
    • 1:16:00DAVID J. MALAN: OK, I like that.
    • 1:16:01So to recap there, find the smallest one first and put it at the beginning,
    • 1:16:05if I heard you correctly.
    • 1:16:07And then presumably, you could do that again and again and again.
    • 1:16:10And that would seem to give you a couple of different algorithms.
    • 1:16:13And if you all are attired here--
    • 1:16:15do you want to come on up if you're ready?
    • 1:16:18We had some [? felt ?] volunteers too.
    • 1:16:20Come on over.
    • 1:16:23So if you all would like to line yourselves up
    • 1:16:25facing the audience in exactly this order-- so
    • 1:16:27whoever is number zero should be way over here,
    • 1:16:30and whoever is number five should be way over there.
    • 1:16:33Feel free to distance as much as you'd like and scooch a little with this way
    • 1:16:36if you could.
    • 1:16:38OK, all right.
    • 1:16:39And make a little more room.
    • 1:16:40So seven-- let's see.
    • 1:16:415, 2, 7, 4--
    • 1:16:44AUDIENCE: [INAUDIBLE]
    • 1:16:45DAVID J. MALAN: 4, hopefully 1.
    • 1:16:46Yeah, keep them to the side.
    • 1:16:47OK, 1, 6, and there we go--
    • 1:16:513.
    • 1:16:51Come on over, three.
    • 1:16:52I was looking for you.
    • 1:16:53All right, so here, we have an array of eight numbers--
    • 1:16:57eight integers if you will.
    • 1:16:58And do you want to each say a quick hello to the group?
    • 1:17:00AUDIENCE: Hello, I'm Quinn.
    • 1:17:02Go [INAUDIBLE].
    • 1:17:04AUDIENCE: Hi, everyone.
    • 1:17:05I'm [INAUDIBLE].
    • 1:17:08AUDIENCE: Hey, I'm Mitchell.
    • 1:17:09AUDIENCE: Hi, I'm Brett.
    • 1:17:10And also, go [INAUDIBLE].
    • 1:17:12AUDIENCE: I'm Hannah.
    • 1:17:13Go [INAUDIBLE].
    • 1:17:15AUDIENCE: Hi, I'm Matthew.
    • 1:17:16Go [INAUDIBLE]
    • 1:17:18AUDIENCE: Hi, I'm Miriam.
    • 1:17:19Go Winthrop.
    • 1:17:20AUDIENCE: Hi, I'm Celeste, and go Strauss.
    • 1:17:22DAVID J. MALAN: Wonderful.
    • 1:17:23Well, welcome all to the stage, and let's just visualize,
    • 1:17:26perhaps organically, how you eight would solve this problem.
    • 1:17:29So we currently have the numbers 0 through 7 quite out of order.
    • 1:17:32Could you go ahead and just yourselves from 0 through 7?
    • 1:17:36AUDIENCE: Thank you.
    • 1:17:41DAVID J. MALAN: OK, so what did they just do?
    • 1:17:44OK, yes.
    • 1:17:44First of all, yes, very well done.
    • 1:17:50How would you describe what they just did?
    • 1:17:53Well, let's do this.
    • 1:17:53Could you go back into that order on the screen--
    • 1:17:5652741630?
    • 1:18:00And could you do exactly what you just did again?
    • 1:18:03Sort yourselves.
    • 1:18:08All right, what did--
    • 1:18:09OK, yes.
    • 1:18:09Well done again.
    • 1:18:14All right, so admittedly, there's kind of a lot going on because each of you,
    • 1:18:17except number four, are doing something in parallel all at the same time.
    • 1:18:21And that's not really how a computer typically works.
    • 1:18:23Just like a computer can only look at one memory location, at one locker,
    • 1:18:26at a time, so can a computer only move one number at a time-- sort of opening
    • 1:18:31a locker, checking what's there, moving it as needed.
    • 1:18:33So let's try this more methodically based on the two audience suggestions.
    • 1:18:36If you all could randomize yourself again to 52741630,
    • 1:18:42let's take the second of those approaches first.
    • 1:18:44I'm going to look at these numbers.
    • 1:18:46And even though I as the human can obviously see all the numbers
    • 1:18:48and I just kind of have the intuition for how to fix this,
    • 1:18:50we got to be more methodical because eventually, we've
    • 1:18:53got to translate this to pseudo code and then code.
    • 1:18:55So let me see.
    • 1:18:56I'm going to search for, as you proposed, the smallest number.
    • 1:18:59And I'm going to start from left to right.
    • 1:19:00I could do it right to left, but left to right just tends to be convention.
    • 1:19:04All right, 5 at this moment is the smallest number I've seen.
    • 1:19:07So I'm going to remember that in a variable, if you will.
    • 1:19:09Now I'm going to take one more step--
    • 1:19:112.
    • 1:19:11OK, 2 I'm going to compare to the variable in mind, obviously smaller.
    • 1:19:15I'm going to forget about 5 and only now remember 2 as the now smallest
    • 1:19:19elements.
    • 1:19:197, nope-- I'm going to ignore that because it's not smaller than the 2
    • 1:19:23I have in mind.
    • 1:19:234, 1-- OK, I'm going to update the variable in mind
    • 1:19:27because that's indeed smaller.
    • 1:19:28Now obviously, we the humans know that's getting pretty small.
    • 1:19:31Maybe it's the end.
    • 1:19:32I have to check all values to see if there's something even smaller
    • 1:19:35because 6 is not, 3 is not, but 0 is.
    • 1:19:38And what's your name again?
    • 1:19:39AUDIENCE: Celeste.
    • 1:19:40DAVID J. MALAN: Celeste.
    • 1:19:41Where should Celeste or number 0 go according to this proposed algorithm?
    • 1:19:47All right, I'm seeing a lot of this.
    • 1:19:49So at the beginning of the array, so before doing this for real,
    • 1:19:52let's have you pop out in front.
    • 1:19:54And could you all shift and make room for Celeste?
    • 1:19:58Is this a good idea to have all of them move or equivalently
    • 1:20:02move everything in the array to make room for Celeste
    • 1:20:04and number 0 over there?
    • 1:20:06No, probably not.
    • 1:20:07That felt like a lot of work.
    • 1:20:08And even though it happened pretty quickly, that's like seven steps
    • 1:20:12to happen just to move her in place.
    • 1:20:14So what would be marginally smarter perhaps--
    • 1:20:16a little more efficient, perhaps?
    • 1:20:18What's that?
    • 1:20:19AUDIENCE: Swapping.
    • 1:20:19DAVID J. MALAN: Swapping.
    • 1:20:20What do you mean by swap?
    • 1:20:21AUDIENCE: Replacing swaps.
    • 1:20:23DAVID J. MALAN: OK, replace two values.
    • 1:20:24So if you want to go back to where you were, one step Over, number 5,
    • 1:20:28he's not in the right place.
    • 1:20:29He's got to move eventually.
    • 1:20:30So you know what?
    • 1:20:31If that's where Celeste belongs, why don't we just swap 5 and 0?
    • 1:20:34So if you want to go ahead and exchange places with each other.
    • 1:20:36Notice what's just happened.
    • 1:20:38The problem I'm trying to solve has gotten smaller.
    • 1:20:41Instead of being size 8, now it's size 7.
    • 1:20:44Now granted, I moved 5 to another wrong location.
    • 1:20:47But if these numbers started off randomly,
    • 1:20:48it doesn't really matter where 5 goes until we get him into the right place.
    • 1:20:52So I think we've improved.
    • 1:20:54And now if I go back, my loop is sort of coming back around.
    • 1:20:57I can ignore Celeste and make this a seven step problem and not eight
    • 1:21:01because I know she's in the right place.
    • 1:21:032 seems to be the smallest.
    • 1:21:04I'll remember that.
    • 1:21:05Not 7, not 4--
    • 1:21:071 seems to be the smallest.
    • 1:21:09Now I know as a human this should be my next smallest.
    • 1:21:13But why, intuitively, should I keep going, do you think?
    • 1:21:18I can't sort of optimize as a human and just say, number 1,
    • 1:21:20let's get you into the right place.
    • 1:21:22I still want to check the whole array.
    • 1:21:24Why?
    • 1:21:25Yeah.
    • 1:21:25AUDIENCE: Perhaps there's another 1.
    • 1:21:28DAVID J. MALAN: Maybe there's another 1, and that
    • 1:21:30could be another problem altogether.
    • 1:21:32Other thoughts?
    • 1:21:32Yeah.
    • 1:21:33AUDIENCE: Could be another 0
    • 1:21:34DAVID J. MALAN: There could be another 0 indeed,
    • 1:21:36but I did go through the list once, right?
    • 1:21:38And I kind of know there isn't.
    • 1:21:39Your thoughts?
    • 1:21:40AUDIENCE: You don't know that every value is represented.
    • 1:21:43So maybe there's a [INAUDIBLE] You just don't know what kind of data
    • 1:21:47you're working with.
    • 1:21:48DAVID J. MALAN: Yeah, I don't necessarily know what is there.
    • 1:21:50And honestly, I only stipulated earlier that I'm using one variable in my mind.
    • 1:21:54I could use two and remember the two smallest elements I've seen.
    • 1:21:58I could use three variables, four.
    • 1:22:00But then I'm going to start to use a lot of space in addition to time.
    • 1:22:03So if I've stipulated that I only have one variable to solve this problem,
    • 1:22:06I don't know anything more about these elements
    • 1:22:09because the only thing I'm remembering at this moment
    • 1:22:11is number 1 is the smallest element I've seen.
    • 1:22:13So I'm going to keep going.
    • 1:22:146?
    • 1:22:14Nope.
    • 1:22:153?
    • 1:22:15Nope.
    • 1:22:155?
    • 1:22:16Nope.
    • 1:22:16OK, I know that number 1, and your name was--
    • 1:22:18AUDIENCE: Hannah.
    • 1:22:19DAVID J. MALAN: --Hannah is the next smallest element.
    • 1:22:21I could have everyone move over to make room, but nope.
    • 1:22:242?
    • 1:22:25You know, even though you're so close to where I want you,
    • 1:22:26I'm just going to keep it simple and swap you two.
    • 1:22:29So granted, I've made the problem a little worse.
    • 1:22:31But on average, I could get lucky too and just pop number 2
    • 1:22:35into the right place.
    • 1:22:36Now let me just accelerate this.
    • 1:22:38I can now ignore Hannah and Celeste, making the problem size 6 instead of 8.
    • 1:22:42So it's getting smaller.
    • 1:22:437 is the smallest.
    • 1:22:45Nope, now 4 is--
    • 1:22:462 is the smallest.
    • 1:22:47Still 2, still 2, still 2.
    • 1:22:50So let's go ahead and swap 2 and 7.
    • 1:22:53And now I'll just kind of orchestrate it verbally.
    • 1:22:564, you're about to have to do something.
    • 1:22:57So we now have 4, 7, 6 3, 5.
    • 1:23:01OK, 3-- could you swap with 4?
    • 1:23:04All right, now we have 7, 6, 4, 5.
    • 1:23:07OK, 4, could you swap with 7?
    • 1:23:10Now we have 6, 7, 5.
    • 1:23:135, could you swap with 6?
    • 1:23:15And now we have 7, 6.
    • 1:23:166, would you swap at 7?
    • 1:23:18And now perhaps round of applause.
    • 1:23:19They've sorted themselves.
    • 1:23:20OK, hang on there one minute.
    • 1:23:25So we'll do this one other approach.
    • 1:23:27And my God, that felt so much slower than the first approach,
    • 1:23:30but that's, one, because I was kind of providing a long voiceover.
    • 1:23:33But two, we were doing one thing at a time whereas the first time, you guys
    • 1:23:37had the luxury of moving like eight different CPUs--
    • 1:23:41brains, if you will-- were all operating at the same time.
    • 1:23:43And computers like that exist.
    • 1:23:45If you have a computer with multiple cores, so to speak,
    • 1:23:47that's like having a computer that technically
    • 1:23:49can do multiple things at once.
    • 1:23:51But software typically, at least as we've written it thus far,
    • 1:23:54can only do one thing at a time.
    • 1:23:56So in a bit, we'll add up all of these steps.
    • 1:23:58But for now, let's take one other approach.
    • 1:23:59If you all could reorder yourselves like that--
    • 1:24:0252741630-- let's take the other approach that
    • 1:24:06was recommended by just fixing small problems and see where this gets us.
    • 1:24:10So we're back in the original order.
    • 1:24:125 and 2 are clearly out of order.
    • 1:24:14So you know what?
    • 1:24:15Let's just bite this problem off now.
    • 1:24:175 and 2, could you swap?
    • 1:24:19Now let me take a next step.
    • 1:24:205 and 7, I think you're OK.
    • 1:24:22There's a gap, yes, but that might not be a big deal.
    • 1:24:257 and 4-- problem.
    • 1:24:26Let's have you swap.
    • 1:24:28OK, 7 and 1, let's have you swap.
    • 1:24:317 and 6, let's have you swap.
    • 1:24:347 and 3, you swap.
    • 1:24:357 and 0, you swap.
    • 1:24:37Now let me pause for just a moment.
    • 1:24:39Still not sorted.
    • 1:24:40So I'm clearly not done.
    • 1:24:42But have I improved the problem?
    • 1:24:45Right, I can't see-- like before, I can't optimize like
    • 1:24:48before because 0 is obviously not here.
    • 1:24:50So unless they're still way back there, so it's not like I've gone from 8 steps
    • 1:24:54to 7 to 6 just yet.
    • 1:24:56But have I made any improvements?
    • 1:24:58AUDIENCE: Yes.
    • 1:24:58DAVID J. MALAN: Yes.
    • 1:24:59In what sense is this improved?
    • 1:25:01What's a concrete thing you could point to is better?
    • 1:25:06Yeah.
    • 1:25:06AUDIENCE: Sorted the highest number.
    • 1:25:08DAVID J. MALAN: I've sorted the highest number, which is indeed 7.
    • 1:25:10And conversely, if you prefer, Celeste is one step closer to the beginning.
    • 1:25:15Now worst case, Celeste is going to have to move one step on each iteration.
    • 1:25:19So I might need to do this thing like n total times
    • 1:25:22to move her all the way over.
    • 1:25:24But that might work out OK.
    • 1:25:25Let me see.
    • 1:25:262 and 5, you're good.
    • 1:25:275 and 4, swap you.
    • 1:25:295 and 1, let's swap you.
    • 1:25:315 and 6, you're good.
    • 1:25:326 and 3, let's swap you.
    • 1:25:346 and 0, let's swap you.
    • 1:25:366 and 7, you're good.
    • 1:25:38And I think now--
    • 1:25:39notice that the high values, as you noted,
    • 1:25:41are sort of bubbling up, if you will, to the end of the list.
    • 1:25:442 and 4, you're good.
    • 1:25:454 and 1, let's swap.
    • 1:25:464 and 5, good.
    • 1:25:485 and 3, swap.
    • 1:25:495 and 0, swap.
    • 1:25:515, 6, 7, of course, are good.
    • 1:25:53So now you can sort of see the problem resolving itself.
    • 1:25:56And let's just do this part now faster.
    • 1:25:572 and 1, 2 and 4.
    • 1:26:00OK, 4 and 3, 4 and 0.
    • 1:26:04All right, now 1 and 2, 2, and 3, and 0, and good.
    • 1:26:08So we do have some optimization there.
    • 1:26:10We don't need to keep going because those all are sorted.
    • 1:26:121 and 2, you're good.
    • 1:26:132 and 0, all right, done.
    • 1:26:161 and 0-- and big round of applause in closing.
    • 1:26:20OK, so thank you all.
    • 1:26:24We need the puppets back, but you can keep the shirts.
    • 1:26:27Thank you for volunteering here.
    • 1:26:28Feel free to make your way exits left or right.
    • 1:26:31And let's see if, thanks to our volunteers
    • 1:26:33here, we can't now formalize a little bit what we did on both passes here.
    • 1:26:40I claim that the first algorithm our volunteers kindly acted out
    • 1:26:44is what's called selection sort.
    • 1:26:45And as the name implied, we selected the smallest elements again and again
    • 1:26:50and again, working our way from left to right,
    • 1:26:53putting Celeste into the right place, and then continuing with everyone else.
    • 1:26:58So selection sort, as it's formally called,
    • 1:27:01can be described, for instance, with this pseudo code here--
    • 1:27:044i from 0 to n minus 1.
    • 1:27:06And again, why this?
    • 1:27:08This is just how talk about arrays.
    • 1:27:10The left end is 0, the right end is n minus 1 where in this case,
    • 1:27:14n happened to be eight people.
    • 1:27:16So that's 0 through 7.
    • 1:27:18So for i from 0 to n minus 1, what did I do?
    • 1:27:22I found the smallest number between numbers bracket i and numbers bracket
    • 1:27:27n minus 1.
    • 1:27:28It's a little cryptic at first glance, but this
    • 1:27:31is just a very pseudo code-like way of saying
    • 1:27:34find the smallest element among all eight volunteers
    • 1:27:37because if i starts at 0 and n minus 1 never changes because there's always
    • 1:27:438, 8 people, so 8 minus 1 is 7, this first
    • 1:27:47says find the smallest number between numbers bracket 0
    • 1:27:50and numbers bracket 7, if you will.
    • 1:27:53Then what do I do?
    • 1:27:54Swap the smallest number with numbers bracket i.
    • 1:27:57So that's how we got Celeste from over here all the way over there.
    • 1:28:01We just swapped those two values.
    • 1:28:03What then happens next in this pseudo code?
    • 1:28:05i, of course, goes from 0 to 1.
    • 1:28:07And that's the technical way of saying now
    • 1:28:10find the smallest element among the 7 remaining volunteers,
    • 1:28:13ignoring Celeste this time because she was already in the correct location.
    • 1:28:17So the problem went from size 8 to size 7.
    • 1:28:19And if we repeat, size 6, 5, 4, 3, 2, 1, until boom,
    • 1:28:23it's all done at the very end.
    • 1:28:25So this is just one way of expressing in pseudo code what
    • 1:28:29we did a little more organically and a formalization of what someone
    • 1:28:33volunteered out in the audience.
    • 1:28:35So if we consider, then, the efficiency of this algorithm,
    • 1:28:40maybe abstracting it away now as a bunch of doors
    • 1:28:42where the left most again is always 0, the right most is always n minus 1,
    • 1:28:46or equivalently, the second to last is n minus 2, the third to last
    • 1:28:50is n minus 3 where n might be 8 or anything else,
    • 1:28:54how do we think about or quantify the running time of selection sort?
    • 1:28:59Big O of what?
    • 1:29:02I mean, that was a lot of steps to be adding up.
    • 1:29:05It's probably more than n, right, because I went through the list
    • 1:29:09again and again.
    • 1:29:10It was like n plus n minus 1 plus n minus 2.
    • 1:29:14Any instincts here?
    • 1:29:17We got like the whole team in the orchestra now.
    • 1:29:20Let me propose we think about it this way with just a bit of formula, say.
    • 1:29:25So the first time, I had to look at n different volunteers.
    • 1:29:29n was 8 in this case, but generically, I looked at all eight numbers
    • 1:29:33in order to decide who was the smallest.
    • 1:29:35And sure enough, Celeste was at the very end.
    • 1:29:37She happened to be all the way to the right.
    • 1:29:39But I only knew that once I looked at all 8 or all n volunteers.
    • 1:29:43So that took me n steps first.
    • 1:29:45But once the list was swapped into the right place, then
    • 1:29:49my problem with size n minus 1, and I had n minus 1 other people
    • 1:29:53to look through.
    • 1:29:54So that's n minus 1 steps.
    • 1:29:55Then after that, it's n minus 2 plus n minus 3 plus n minus 4 plus dot dot
    • 1:29:59dot until I had one final step.
    • 1:30:01And it's obvious that I only have one human left to consider.
    • 1:30:04So we might wave our hands at this with a little ellipsis
    • 1:30:07and just say dot dot dot plus 1 for the final step.
    • 1:30:10Now what does this actually equal?
    • 1:30:11Well, this is where you might think back on, like,
    • 1:30:13your high school math or physics textbook that
    • 1:30:15has a little cheat sheet at the end that shows these kinds of recurrences.
    • 1:30:18That happens to work out mathematically to be
    • 1:30:21n times n plus 1 all divided by 2.
    • 1:30:25That's just what that recurrence, that series, actually adds up to.
    • 1:30:28So if you take on faith that that math is correct, let's
    • 1:30:31just now multiply this out mathematically.
    • 1:30:35That's n squared plus n divided by 2 or n squared divided by 2 plus n over 2.
    • 1:30:41And here's where we're starting to get annoyingly into the weeds.
    • 1:30:44Like, honestly, as n gets really large, like a million doors or integers
    • 1:30:50or a billion web pages in Google search engine, honestly, which of these terms
    • 1:30:54is going to matter the most mathematically
    • 1:30:57if n is a really big number?
    • 1:30:59Is n squared divided by 2 the dominant factor,
    • 1:31:01or is n divided by 2 the dominant factor?
    • 1:31:04AUDIENCE: n squared.
    • 1:31:05DAVID J. MALAN: Yeah, n squared.
    • 1:31:07I mean, no matter what n is-- and the bigger it is,
    • 1:31:09the bigger raising it to the power 2 is going to be.
    • 1:31:12So you know what?
    • 1:31:13Let's just wave our hands at this because at the end of the day,
    • 1:31:16as n gets really large, the dominant factor is indeed that first one.
    • 1:31:19And you know what?
    • 1:31:20Even the divided 2, as I claimed earlier with our two phone book examples, where
    • 1:31:24the two straight lines if you keep zooming out essentially
    • 1:31:26looked the same when n is large enough, let's just call this on the order of n
    • 1:31:31squared.
    • 1:31:32So that is to say a computer scientist would describe bubble sort as taking
    • 1:31:37on the order of n squared steps.
    • 1:31:39That's an oversimplification.
    • 1:31:41If we really added it up, it's actually this many steps-- n
    • 1:31:44squared divided by 2 plus n over 2.
    • 1:31:46But again, if we want to just be able to generally compare two algorithms'
    • 1:31:50performance, I think it's going to suffice if we look at that highest
    • 1:31:53order term to get a sense of what the algorithm feels like, if you will,
    • 1:31:59or what it even looks like graphically.
    • 1:32:02All right, so with that said, we might describe bubble sort
    • 1:32:06as being in big O--
    • 1:32:07sorry, selection sort as being in big O of n squared.
    • 1:32:11But what if we consider now the best case scenario-- an opportunity
    • 1:32:17to talk about a lower bound?
    • 1:32:19In the best case, how many steps does selection sort take?
    • 1:32:23Well, here, we need some context.
    • 1:32:24Like, what does it mean to be the best case or the worst case
    • 1:32:27when it comes to sorting?
    • 1:32:29Like, what could you imagine meaning the best possible scenario when you're
    • 1:32:32trying to sort a bunch of numbers?
    • 1:32:35I got the whole crew here again.
    • 1:32:37Yeah.
    • 1:32:37AUDIENCE: They would already be sorted.
    • 1:32:39DAVID J. MALAN: All right, they're already sorted, right?
    • 1:32:40I can't really imagine a better scenario than I have to sort some numbers,
    • 1:32:44but they're already sorted for me.
    • 1:32:46But does this algorithm leverage that fact in practice?
    • 1:32:51Even if all of our humans had lined up from 0 to 7,
    • 1:32:54I'm pretty sure I would have pretty naively started here.
    • 1:32:56And yes, Celeste happens to be here.
    • 1:32:58But I only know she needs to be here once I've looked at all eight people.
    • 1:33:03And then I would have realized, well, that was a waste of time.
    • 1:33:06I can leave Celeste be.
    • 1:33:07But then what would I have done?
    • 1:33:09I would have ignored her position because we've solved one problem.
    • 1:33:12I would have done the same thing now for seven people, then six people.
    • 1:33:16So every time I walk through, I'm not doing much useful work.
    • 1:33:19But I am doing those comparisons because I
    • 1:33:21don't know until I do the work that the people were in the right order.
    • 1:33:25So this would seem to imply that the omega notation, the best case
    • 1:33:30scenario, even, a lower bound on the running time would be what, then?
    • 1:33:33AUDIENCE: [INAUDIBLE]
    • 1:33:35DAVID J. MALAN: A little louder?
    • 1:33:37AUDIENCE: N squared.
    • 1:33:38DAVID J. MALAN: It's still going to be n squared,
    • 1:33:40in fact, because the code I'm giving myself doesn't leverage or benefit
    • 1:33:45from any of that scenario because it just mindlessly continues
    • 1:33:50to do this again and again.
    • 1:33:51So in this case, yes, I would claim that the omega notation for selection sort
    • 1:33:57is also big O of n squared.
    • 1:33:58So those are the kinds of numbers to beat.
    • 1:34:00It seems like the upper bound and lower bound of selection
    • 1:34:03sort are indeed n squared.
    • 1:34:06And so we can also describe selection sort, therefore,
    • 1:34:08as being in theta of n squared.
    • 1:34:09That's the first algorithm we've had the chance to describe that in,
    • 1:34:12which is to say that it's kind of slow.
    • 1:34:14I mean, maybe other algorithms are slower,
    • 1:34:16but this isn't the best starting point.
    • 1:34:18Can we do better?
    • 1:34:19Well, there's a reason that I guided us to doing the second algorithm second.
    • 1:34:22Even though you verbally proposed them in a different order,
    • 1:34:25this second algorithm we did is generally known as bubble sort.
    • 1:34:28And I deliberately used that word a bit ago,
    • 1:34:31saying the big values are bubbling their way up to the right
    • 1:34:34to kind of capture the fact that, indeed, this algorithm works
    • 1:34:37differently.
    • 1:34:38But let's consider if it's better or worse.
    • 1:34:40So here, we have pseudo code for bubble sort.
    • 1:34:43You could write this too in different ways.
    • 1:34:45But let's consider what we did on the stage.
    • 1:34:48We repeated the following n minus 1 times.
    • 1:34:51We initialized at least, even though I didn't verbalize it this way,
    • 1:34:55a variable like i from 0 to n minus 2, n minus 2.
    • 1:35:00And then I asked this question.
    • 1:35:01If numbers bracket i and numbers bracket i plus 1 are out of order,
    • 1:35:08then swap them.
    • 1:35:10So again, I just did it more intuitively by pointing,
    • 1:35:12but this would be a way, with a bit of pseudo code,
    • 1:35:14to describe what's going on.
    • 1:35:16But notice that I'm doing something a little differently here.
    • 1:35:18I'm iterating from if equals 0 to n minus 2.
    • 1:35:22Why?
    • 1:35:23Well, if I'm comparing two things, left hand and right hand,
    • 1:35:26I'd still want to start at 0.
    • 1:35:28But I don't want to go all the way to n minus 1
    • 1:35:31because then, I'd be going past the boundary of my array, which
    • 1:35:35would be bad.
    • 1:35:35I want to make sure that my left hand-- i, if you will--
    • 1:35:38stops at n minus 2 so that when I plus 1 in my pseudo code,
    • 1:35:42I'm looking at the last two elements, not the last element
    • 1:35:45and then pass the boundary.
    • 1:35:46That's actually a common programming mistake
    • 1:35:48that we'll undoubtedly soon make by going
    • 1:35:50beyond the boundaries of your array.
    • 1:35:52So this pseudo code, then, allows me to say compare every one again and again
    • 1:35:59and swap them if they're out of order.
    • 1:36:01Why do I repeat the whole thing n minus 1 times?
    • 1:36:06Like, why does it not suffice just to do this loop here?
    • 1:36:11Think what happened with Celeste.
    • 1:36:14Why do I repeat this whole thing n minus 1 times?
    • 1:36:19Yeah, in the back?
    • 1:36:20AUDIENCE: [INAUDIBLE]
    • 1:36:27DAVID J. MALAN: Indeed, and I think if I can recap accurately,
    • 1:36:30think back to Celeste again.
    • 1:36:31And I'm sorry to keep calling on you as our number 0.
    • 1:36:34Each time through bubble sort, she only moved one step.
    • 1:36:37And so in total, if there's n locations, at the end of the day,
    • 1:36:41she needs to move n minus 1 steps to get 0 all the way to where it needs to be.
    • 1:36:45And so this inner loop, if you will, where we're iterating using i,
    • 1:36:50that just fixes some of the problems.
    • 1:36:52But it doesn't fix all of the problems until we do that same logic again
    • 1:36:56and again and again.
    • 1:36:57And so how might we quantify the running time of this algorithm?
    • 1:37:01Well, one way to see it is to just literally look at the pseudo code.
    • 1:37:04The outer loop repeats n minus 1 times by definition.
    • 1:37:09It literally says that.
    • 1:37:10The inner loop, the for loop, also iterates n minus 1 times.
    • 1:37:15Why?
    • 1:37:16Because it's going from 0 to n minus 2.
    • 1:37:18And if that's hard to think about, that's the same thing is 1 to n minus 1
    • 1:37:22if you just add 1 to both ends of the formula.
    • 1:37:25So that means you're doing n minus 1 things n minus 1 times.
    • 1:37:29So I literally multiply how many times the outer loop
    • 1:37:32is running by how many times the inner loop is running, which gives me
    • 1:37:35sort of FOIL method n minus 1 squared.
    • 1:37:39And I could multiply that whole thing out.
    • 1:37:41Well, let's consider this just a little more methodically here.
    • 1:37:44If I have n minus 1 on the outer, n minus 1 on the inner--
    • 1:37:48let's go ahead and FOIL this.
    • 1:37:49So n squared minus n minus n plus 1, combine
    • 1:37:53like terms-- n squared minus 2n plus 1.
    • 1:37:56And now which of these terms is clearly going to be dominant, so to speak?
    • 1:38:01The--
    • 1:38:01AUDIENCE: N squared.
    • 1:38:02DAVID J. MALAN: --the n squared.
    • 1:38:03So yes, even though minus 2n is a good thing
    • 1:38:06because it's subtracting off some of the time required,
    • 1:38:08plus 1 is not that big a thing, there's such drops in the bucket when
    • 1:38:11n gets really large, like in the millions or billions,
    • 1:38:14certainly, that bubble sort 2 is on the order of n squared.
    • 1:38:18It's not the same exactly as selection sort.
    • 1:38:21But as n gets big, honestly, we're barely
    • 1:38:23going to be able to notice the difference most likely.
    • 1:38:25And so it too might be said to be on the order of n squared.
    • 1:38:29And if we consider now the lower bound on bubble sort's running time,
    • 1:38:34here's where things get potentially interesting.
    • 1:38:38What might you claim is the running time of bubble sort in the best case?
    • 1:38:44And the best case, I claim, is when the numbers are already sorted.
    • 1:38:48Is our pseudo code going to take that into account?
    • 1:38:50AUDIENCE: N
    • 1:38:52DAVID J. MALAN: OK, n.
    • 1:38:53Why do you propose n?
    • 1:38:54AUDIENCE: [INAUDIBLE]
    • 1:38:59DAVID J. MALAN: Yes, and that's the key word.
    • 1:39:00To summarize, in bubble sort, I do have to minimally make one pass because if I
    • 1:39:05don't look at all n elements, that I'm theoretically
    • 1:39:07just guessing if it's sorted or not.
    • 1:39:09Like, I obviously intuitively have to look
    • 1:39:11at every element to decide yay or nay, it's in the right order.
    • 1:39:14And my original pseudo code, though, is pretty naive.
    • 1:39:17It's just going to blindly go back and forth n minus 1 times again and again,
    • 1:39:22and that's going to add up.
    • 1:39:23But what if I add a bit of an optimization
    • 1:39:25that you might have glimpsed on the slide a moment ago
    • 1:39:28where if I compare two people and I don't swap them, compare two people,
    • 1:39:31don't swap them, and I go all the way through the list comparing
    • 1:39:34every pair of adjacent people, and I make no swaps,
    • 1:39:38it would be kind of not just naive but stupid
    • 1:39:40to do that same process again because if the humans have not moved,
    • 1:39:44I'm not going to make any different decisions.
    • 1:39:47I'm going to do nothing again, nothing again.
    • 1:39:49So at that point, it would be stupid, very inefficient, to go back and forth
    • 1:39:53and back and forth.
    • 1:39:54So if I modify our pseudo code with just an additional if condition,
    • 1:39:58I bet we can speed this up.
    • 1:40:00Inside of that same pseudo code, what if I say, hey, if no swaps, quit?
    • 1:40:06Like quit, prematurely before the loops are finished running.
    • 1:40:10One of the loops has gone through per the indentation here.
    • 1:40:13But if I do a loop from left to right and I
    • 1:40:16have made no swaps, which you can think of as just being
    • 1:40:18one other variable that's plus plusing as I go keeping, track of how many
    • 1:40:22swaps--
    • 1:40:22if I've made no swaps from left to right,
    • 1:40:24I'm not going to make any swaps the next time around either.
    • 1:40:27So let's just quit at that point.
    • 1:40:30And that is to say in the best case, if you will,
    • 1:40:32when the list is already sorted, the omega notation for bubble sort
    • 1:40:36might indeed be omega of n if you add that optimization
    • 1:40:41so as to short circuit all of that inefficient
    • 1:40:44looping to do it only as many times as is necessary.
    • 1:40:49Let me pause to see if there's any questions here.
    • 1:40:52Yeah.
    • 1:40:53AUDIENCE: [INAUDIBLE] to optimize the running time for all cases possible?
    • 1:41:02DAVID J. MALAN: Good question.
    • 1:41:03If the running time of selection sort and bubble sort are both in big O
    • 1:41:08of n squared but selection sort is in omega of n squared while bubble sort is
    • 1:41:14in omega of n, which sounds better--
    • 1:41:17I think if I may, should we just always use bubble sort?
    • 1:41:20Yes if we think that we might benefit over time
    • 1:41:24from a lot of good case scenarios or best case scenarios.
    • 1:41:29However, the goal at hand in just a bit is
    • 1:41:31going to be to do even better than both of these.
    • 1:41:33So hold that question further for a moment.
    • 1:41:35Yeah.
    • 1:41:35AUDIENCE: [INAUDIBLE] n minus 1?
    • 1:41:41DAVID J. MALAN: No.
    • 1:41:41So yes, good question.
    • 1:41:43So I say omega of n, but is it technically omega of n minus 1?
    • 1:41:46Maybe, but again, we're throwing away lower order terms.
    • 1:41:50And that's an advantage because we're not comparing things ever so precisely.
    • 1:41:53Just like I plotted with the green and yellow and red chart,
    • 1:41:57I just want to get a sense of the shape of these algorithms
    • 1:41:59so that when n gets really large, which of these choices
    • 1:42:03is going to matter the most?
    • 1:42:05At the end of the day, it's actually perfectly reasonable
    • 1:42:07to use selection sort or bubble sort if you
    • 1:42:09don't have that much data because they're going to be pretty fast.
    • 1:42:12My God, our computers nowadays are 1 gigahertz,
    • 1:42:142 gigahertz, 1 billion things per second, 2 billion things per second.
    • 1:42:18But if we have large data sets, as we will later in the term
    • 1:42:20and as you might in the real world, that the Googles of the world,
    • 1:42:23then you're going to want to be more thoughtful.
    • 1:42:25And that's where we're going today.
    • 1:42:27All right, so let's actually see this visualized a little bit.
    • 1:42:30In a moment, I'm going to change screens here
    • 1:42:32to open up what is a little visualization tool that will give us
    • 1:42:37a sense of how these things actually work and look at a faster rate
    • 1:42:40than our humans are able to do here on stage.
    • 1:42:42So here is another visualization of a bunch of numbers, an array of numbers.
    • 1:42:47Short bars mean small numbers, tall bars mean big numbers.
    • 1:42:50So instead of having the numbers on their torsos here,
    • 1:42:52we just have bars that are small or tall based on the magnitude of the number.
    • 1:42:57Let me go ahead, and I preconfigured this in advance
    • 1:43:00to operate somewhat quickly.
    • 1:43:02Let's go ahead and do selections sort by clicking this button.
    • 1:43:05And you'll see some pink bars flying by.
    • 1:43:08And that's like me walking left and right, left and right,
    • 1:43:11to select the next smallest number.
    • 1:43:14And so what you'll see happening on the left of this array of numbers
    • 1:43:17is Celeste, if you will, and all of the other smaller numbers
    • 1:43:20are appearing on the left while we continue to solve the remaining
    • 1:43:24problems to the right.
    • 1:43:25So again, we no longer have to touch the smaller numbers here.
    • 1:43:29So that's why the problem is getting smaller and smaller and smaller
    • 1:43:32over time.
    • 1:43:32But you can notice now visually, look at how many times
    • 1:43:36we're retracing our steps.
    • 1:43:37This is why things that are n squared tend
    • 1:43:40to be frowned upon if avoidable because I'm touching the same elements again
    • 1:43:44and again.
    • 1:43:45When I was walking through, I kept pointing at the same humans
    • 1:43:47again and again.
    • 1:43:48And that adds up.
    • 1:43:49So let's see if bubble sort looks or feels a little different.
    • 1:43:52Let me re-randomize the thing, and let me now click Bubble Sort at the top.
    • 1:43:56And as you might infer, there's other sorting algorithms out there,
    • 1:43:58not all of which we'll look at.
    • 1:44:00But here's bubble sort.
    • 1:44:01Same pink coloration, but it's doing something different.
    • 1:44:04It's two pink bars going through again and again comparing
    • 1:44:07the adjacent numbers.
    • 1:44:09And you'll see that the largest numbers are indeed bubbling the way up
    • 1:44:13to the right, but the smaller numbers, like our number 0 was,
    • 1:44:18is only slowly making its way over.
    • 1:44:20Here's a comparable.
    • 1:44:21Here's the number one.
    • 1:44:22And it's going to take a while to get all the way to the left.
    • 1:44:24And here too, notice how many times the same bars
    • 1:44:28are becoming pink, how many times the algorithm is retracing and retracing
    • 1:44:32its steps.
    • 1:44:33Why?
    • 1:44:33Because it's only solving one problem at a time on each pass.
    • 1:44:37And each time we do that, we're stepping through practically the whole array.
    • 1:44:40And now granted, I could speed this up even further if I really wanted to,
    • 1:44:43but my God, this is only, what, like 50 or 60 elements, something like that?
    • 1:44:48This is slow.
    • 1:44:49Like, this is what n squared looks like and feels like.
    • 1:44:52And now I'm just trying to come up with words to say
    • 1:44:54until we get to the finish line here.
    • 1:44:56Like, this would be annoying if this is the speed of sorting,
    • 1:44:59and this is why I sort of secretly sorted the numbers for Rave in advance
    • 1:45:03because it would have taken us an annoying number of steps
    • 1:45:05to get that in place for her.
    • 1:45:07So those two algorithms are n squared.
    • 1:45:10Can we do, in fact, better?
    • 1:45:12Well, to save the best algorithm for last, let's take a shorter five minute
    • 1:45:15break here.
    • 1:45:15And when we come back, we'll do even better than n squared.
    • 1:45:20All right.
    • 1:45:22So the challenge at hand is to do better than selection sort
    • 1:45:27and better than bubble sort and ideally not just marginally
    • 1:45:30better but fundamentally better.
    • 1:45:32Just like in week zero, that third and final divide and conquer algorithm
    • 1:45:35was sort of fundamentally faster than the other two.
    • 1:45:38So can we do better than something on the order of n squared?
    • 1:45:41Well, I bet we can if we start to approach
    • 1:45:44the problem a little differently.
    • 1:45:46The sorts we've done thus far, generally known
    • 1:45:48as comparison sorts-- and that kind of captures
    • 1:45:50the reality that we were doing a huge number of comparisons again and again.
    • 1:45:54And you kind of saw that in the vertical bars that were going pink as everything
    • 1:45:57was being compared again and again.
    • 1:45:58But there's this programming technique, and it's actually
    • 1:46:01a mathematical technique known as recursion
    • 1:46:04that we've actually seen before.
    • 1:46:05And this is a building block or a mental model
    • 1:46:09we can bring to bear on the problem to solve the sorting problem
    • 1:46:12sort of fundamentally differently.
    • 1:46:13But first, let's look at it in a more familiar context.
    • 1:46:16A little bit ago, I proposed this pseudo code for the binary search algorithm.
    • 1:46:23And notice that what was interesting about this code,
    • 1:46:26even though I didn't call it out at the time, it's kind of cyclically defined.
    • 1:46:29Like, I claim this is an algorithm for search,
    • 1:46:32and yet it seems a little unfair that I'm using the verb search
    • 1:46:36inside of the algorithm for search.
    • 1:46:38It's like an English sort of defining a word by using the word.
    • 1:46:41Normally, you shouldn't really get away with that.
    • 1:46:43But there's something interesting about this technique
    • 1:46:46here because even though this whole thing is a search algorithm
    • 1:46:51and I'm using my own algorithm to search the left half or the right half,
    • 1:46:56the key feature here that doesn't normally
    • 1:46:58happen in English when you define a word in terms of a word
    • 1:47:01is that when I search the left half or search the right half, yes,
    • 1:47:05I'm doing the same thing.
    • 1:47:06I'm using the same algorithm.
    • 1:47:08But the problem is, by definition, half as large.
    • 1:47:11So this isn't going to be a cyclical argument in the same way.
    • 1:47:14This approach, by using search within search
    • 1:47:17is going to whittle the problem down and down and down until hopefully,
    • 1:47:21one door or no doors remains.
    • 1:47:24And so recursion is a programming technique
    • 1:47:26whereby a function calls itself.
    • 1:47:29And we haven't seen this yet in C, and we haven't seen this really in Scratch.
    • 1:47:34But in C, you can have a function call itself.
    • 1:47:38And the form that takes is like literally
    • 1:47:39using the function's name inside of the function's implementation itself.
    • 1:47:44We've actually seen an opportunity for this once before too.
    • 1:47:48Think back to week zero.
    • 1:47:49Here's that same pseudo code for searching for someone
    • 1:47:51in an actual, physical phone book.
    • 1:47:53And notice these yellow lines here.
    • 1:47:55We described those in week zero as inducing a loop, a cycle.
    • 1:47:59And this is a very procedural approach, if you will, because lines 8 and 11
    • 1:48:04are very mechanically, if you will, telling
    • 1:48:06me to go back to line three to do this kind of looping thing.
    • 1:48:10But really, what that's doing in the binary search algorithm for the phone
    • 1:48:14book is it's just telling me to search the left half or search the right half.
    • 1:48:20I'm doing it more mechanically again by sort of telling myself
    • 1:48:23what line number to go back to.
    • 1:48:25But that's equivalent to just telling myself go search the left half,
    • 1:48:28search the right half, the key thing being the left
    • 1:48:30have and the right half are smaller than the original problem.
    • 1:48:33It would be a bug if I just said search the phone book, search the phone book,
    • 1:48:37because obviously, you never get anywhere.
    • 1:48:39But if you search the half, the half, the half,
    • 1:48:41problem gets smaller and smaller.
    • 1:48:43So let's reformulate week zero's phone book code to be not procedural as here
    • 1:48:50but recursive whereby in this search algorithm,
    • 1:48:55AKA binary search, formerly called divide and conquer, I'm
    • 1:48:58going to literally use also the keyword search here.
    • 1:49:01Notice among the benefits of doing this is it
    • 1:49:03kind of tightens the code up, makes it a little more succinct,
    • 1:49:06even though that's kind of a fringe benefit here.
    • 1:49:08But it's an elegant way too of describing
    • 1:49:12a problem by just having a function use itself
    • 1:49:17to solve a smaller puzzle at hand.
    • 1:49:21So let's now consider a familiar problem, a smaller
    • 1:49:24version than the one you've dabbled with-- this sort of pyramid,
    • 1:49:27this half pyramid from Mario.
    • 1:49:28And let's throw away the parts that aren't that interesting
    • 1:49:31and just consider how we might, up until now, implement this in C code,
    • 1:49:35this left aligned pyramid, if you will.
    • 1:49:37Let me go over here, and let me create a file called-- how about iteration.c?
    • 1:49:44And in this file, I'm going to go ahead and include cs50.h.
    • 1:49:48And I'm going to include stdio.h.
    • 1:49:51And the goal at hand is to implement in C a little program that just prints out
    • 1:49:57this and exactly this pyramid.
    • 1:49:59So no get string or any of that-- we're just going to keep it simple
    • 1:50:02and print exactly this pyramid of height 4 here.
    • 1:50:05So how might I do this?
    • 1:50:06Well, let me go ahead, and in main, let me first ask the user for--
    • 1:50:10well, we'll go ahead and generalize it.
    • 1:50:12Let's go ahead and ask the user for heights.
    • 1:50:14We're using getint as before.
    • 1:50:16And I'll store that in a variable called height.
    • 1:50:18And then let me go ahead and simply call the function
    • 1:50:20draw passing in that height.
    • 1:50:22So for the moment, let me assume that someone somewhere
    • 1:50:25has implemented a draw function.
    • 1:50:26And this, then, is the entirety of my program.
    • 1:50:29All right, unfortunately, C does not come with a draw function.
    • 1:50:32So let me go ahead and invent one.
    • 1:50:34It doesn't need to return a value.
    • 1:50:36It just needs to print something-- so-called side effect.
    • 1:50:38So I'm going to define a function called draw that takes as input an int.
    • 1:50:43I'll call it n for number, but I could call it anything I want.
    • 1:50:46And inside of this.
    • 1:50:47I'm going to go ahead and print out a left aligned pyramid like this from top
    • 1:50:53to bottom.
    • 1:50:53The salient features here are that this is a pyramid, at least in this example,
    • 1:50:57of height four.
    • 1:50:58And now in height four, the first row has one brick.
    • 1:51:02The second row has two.
    • 1:51:03The third has three.
    • 1:51:04The fourth has four.
    • 1:51:05That's a nice pattern that I can probably represent in code.
    • 1:51:08So how might I do this?
    • 1:51:09Well, how about 4 int i gets--
    • 1:51:11let me do it the old school way--
    • 1:51:121.
    • 1:51:13And then i is less than or equal to n.
    • 1:51:18And then i plus plus--
    • 1:51:20so I'm going from 1 to 4 just to keep myself sane here.
    • 1:51:23And then inside of this loop, what do I want to do?
    • 1:51:26Well, let me keep it conventional, in fact.
    • 1:51:28Let me just change this to be the more conventional 0
    • 1:51:31to n even though it might not be as intuitive because now on row 0,
    • 1:51:36I want one brick.
    • 1:51:37On row 1, I want two bricks, dot dot dot.
    • 1:51:39On row 3, I want four.
    • 1:51:41So it's kind of offset now.
    • 1:51:42But I'm being more conventional.
    • 1:51:44So on each row, how many bricks do I want to print?
    • 1:51:48Well, I think I want to do this.
    • 1:51:49For int j, for instance, common to use j after if you have a nested loop,
    • 1:51:55let's start j at 0 and do this so long as is less than i plus 1
    • 1:52:02and then do j plus plus.
    • 1:52:04So why i plus 1?
    • 1:52:06Well, again, when I equals 0, that's the first row, and I want one brick.
    • 1:52:10When i equals 1, that's the second row.
    • 1:52:12I want two bricks.
    • 1:52:13And dot dot dot, when i is 3, I want four bricks.
    • 1:52:16So again, I have to add 1 to i to get the total number of bricks
    • 1:52:19that I want to print to the screen.
    • 1:52:21So inside of this nested for loop, I'm going to do printf of a hash
    • 1:52:25with no new line.
    • 1:52:28I'm going to save the new line for about here instead.
    • 1:52:33All right, the last thing I'm going to do
    • 1:52:34is copy and paste the prototype at the top of the file.
    • 1:52:38So that I can call this.
    • 1:52:39And again, this is of now week one, week two.
    • 1:52:42Wouldn't necessarily come to your mind as quickly as it
    • 1:52:45might to mine after all this practice, but this is something reminiscent
    • 1:52:48of what you yourself did already for Mario-- printing out
    • 1:52:51a pyramid that hopefully in a moment is going to look like this.
    • 1:52:54So let me go back to my code.
    • 1:52:56Let me run make iteration, and let me do dot slash iteration.
    • 1:53:00I'll type in 4, and voila.
    • 1:53:02Seems to be correct, and let's assume it's going to work for other inputs
    • 1:53:05as well.
    • 1:53:06Oh, thank you.
    • 1:53:11So this is indeed an example of iteration-- doing something
    • 1:53:15again and again.
    • 1:53:16And it's very procedural.
    • 1:53:18Like, I literally have a function called draw that does this thing.
    • 1:53:21But I can think about implementing draw in a somewhat different way that's
    • 1:53:25kind of clever.
    • 1:53:26And it's not strictly necessary for this problem
    • 1:53:28because this problem honestly is not that complicated
    • 1:53:31to solve once you have practice under your belt.
    • 1:53:33Certainly the first time around, probably significantly challenging.
    • 1:53:36But now that you kind of associate, OK, row one
    • 1:53:39with one brick, row two with two bricks, it kind of comes together
    • 1:53:42with these for loops.
    • 1:53:43But how else could we think about this problem?
    • 1:53:45Well, this physical structure, these bricks, in some sense
    • 1:53:48is a recursive structure, a structure that's defined in terms of itself.
    • 1:53:54Now what do I mean by that?
    • 1:53:56Well, if I were to ask you the question, what does a pyramid of height 4
    • 1:54:01look like, you would point, of course, to this picture.
    • 1:54:04But you could also kind of cleverly say to me, well,
    • 1:54:11it's actually a pyramid of height 3 plus 1 additional row.
    • 1:54:16And here's that cyclical argument, right?
    • 1:54:18Kind of obnoxious to do typically in English or in a spoken language
    • 1:54:21because you're defining one thing in terms of itself.
    • 1:54:23What's a pyramid of height 4?
    • 1:54:24Well, it's a pyramid of height 3 plus 1 more row.
    • 1:54:28But we can kind of leverage this logic in code.
    • 1:54:30Well, what's a pyramid of height 3?
    • 1:54:32Well, it's a pyramid of height 2 plus 1 more row.
    • 1:54:34Fine, what's a pyramid of height 2?
    • 1:54:36Well, it's a pyramid of height 1 plus 1 more row.
    • 1:54:39And then hopefully, this process ends, and it does because notice,
    • 1:54:42the pyramid is getting smaller and smaller.
    • 1:54:44So you're not going to have this sort of silly back and forth with me
    • 1:54:48infinitely many times because when we finally get to the base case,
    • 1:54:52the end of the pyramid, fine.
    • 1:54:54What is a pyramid of height 1?
    • 1:54:55Well, it's a pyramid of no height plus one more row.
    • 1:54:58And at that point, things just get negative--
    • 1:55:01no pun intended.
    • 1:55:02Things just would otherwise go negative.
    • 1:55:04And so you can just kind of stop.
    • 1:55:05The base case is when there is no more pyramid.
    • 1:55:07So there's a way to draw a line in the sand and say, stop, no more arguments.
    • 1:55:12But this idea of defining a physical structure in terms of itself
    • 1:55:16or code in terms of itself actually lets us do some interesting new algorithms.
    • 1:55:22Let me go back to my code here.
    • 1:55:24Let me go ahead and create one final file here called recursion.c
    • 1:55:29that leverages this idea of this built-in self-referential nature.
    • 1:55:36Let me include cs50.h.
    • 1:55:38Let me go ahead and include standardio.h, int main void.
    • 1:55:41And then inside of main, I'm going to do the exact same thing-- int
    • 1:55:46height equals get int, asking the user for height.
    • 1:55:50And then I'm going to go ahead and call draw passing in height.
    • 1:55:53So that's going to stay the same.
    • 1:55:55I even am going to make my prototype the same-- void draw int n semicolon.
    • 1:56:01And now I'm going to implement void down here
    • 1:56:03with that same prototype, of course.
    • 1:56:05But the code now is going to be a little different.
    • 1:56:08What am I going to do here?
    • 1:56:10Well, first of all, if you ask me to draw a pyramid of height n,
    • 1:56:15I'm going to be kind of a wise ass here and say, well, just
    • 1:56:18draw a pyramid of n minus 1--
    • 1:56:21done.
    • 1:56:21All right, but there's still a little more work to be done.
    • 1:56:24What happens after I print or draw a pyramid of height n minus 1
    • 1:56:28according to our structural definition a moment ago?
    • 1:56:33What remains after drawing a pyramid of height n minus 1 or 3, specifically?
    • 1:56:38AUDIENCE: [INAUDIBLE]
    • 1:56:40We need one more row of hashes.
    • 1:56:42OK, so I can do that, right?
    • 1:56:43I'm OK with the single loops.
    • 1:56:45There's no nesting necessary here.
    • 1:56:46I'm just going to do this-- for int i get 0, i is less than n,
    • 1:56:51which is the height that's passed in, i plus plus.
    • 1:56:53And then inside of this loop, I'm very simply
    • 1:56:55going to print out a single hash.
    • 1:56:56And then down here, I'm going to print out a new line at the very end.
    • 1:57:00So that's good, right?
    • 1:57:01I might not be as comfortable with nested loops.
    • 1:57:03This is nice and simple.
    • 1:57:04What does this loop do here on line 17 through 20?
    • 1:57:07It literally prints n hashes by counting from i equals 0 on up to
    • 1:57:13but not through n.
    • 1:57:15So that's sort of week one style syntax.
    • 1:57:17But this is kind of trippy now because I've somehow
    • 1:57:20boiled down the implementation of draw into printing a row after just
    • 1:57:25drawing the thing above it.
    • 1:57:27But this is problematic as is because in this case,
    • 1:57:31my drawer function, notice, is always going to call the draw function forever
    • 1:57:37in some sense.
    • 1:57:38But ideally, when do I want this cyclical process to stop?
    • 1:57:44When do I want to not call draw anymore?
    • 1:57:48Yeah, when n is 1, right?
    • 1:57:51When I get to the top of the pyramid, when n is 1,
    • 1:57:53or heck, when the pyramids all gone and n equals 0.
    • 1:57:55I can pick any line in the sand, so long as it's
    • 1:57:58sort of at the end of the process.
    • 1:57:59Then I don't want to call draw anymore.
    • 1:58:01So maybe what I should do is this.
    • 1:58:03If n equals equals 0, there's really nothing to draw.
    • 1:58:10So I'm just going to go ahead and return like this.
    • 1:58:14Otherwise, I'm going to go ahead and draw
    • 1:58:16n minus 1 rows and then one more row.
    • 1:58:20And I could express this differently.
    • 1:58:21I could do something like this, which would be equivalent.
    • 1:58:24I could say something like if n is greater than or equal to 0,
    • 1:58:29then go ahead and draw the row.
    • 1:58:31But I like it this way first.
    • 1:58:32For now, I'm going to go with the original way
    • 1:58:34just to ask a simple question and then just bail out of the function
    • 1:58:37if n equals 0.
    • 1:58:38And heck, just to be super safe, just in case
    • 1:58:41the user types in a negative number, let me also
    • 1:58:44just check if n is a negative number, also, just return immediately.
    • 1:58:46Don't do anything.
    • 1:58:48I'm not returning a value because again, the function is void.
    • 1:58:51It doesn't need or have a return value.
    • 1:58:53So just saying return suffices.
    • 1:58:55But if n equals 1 or 2 or 3 or anything higher,
    • 1:59:00it is reasonable to draw a pyramid of slightly shorter height like, instead
    • 1:59:06of 4, 3, and then go ahead and print one more row.
    • 1:59:10So this is an example now of code that calls itself within itself.
    • 1:59:16Draw is calling draw.
    • 1:59:18But this so-called base case ensures, this conditional ensures,
    • 1:59:22that we're not going to do this forever.
    • 1:59:24Otherwise, we literally would do this infinitely many times,
    • 1:59:27and something bad is probably going to happen.
    • 1:59:30All right, let me go ahead and compile this code-- make recursion.
    • 1:59:34OK, no syntax errors-- dot slash recursion, Enter, height of 4,
    • 1:59:38and voila.
    • 1:59:40If only because some of you have run into this issue accidentally already,
    • 1:59:44let me get rid of the base case here, and let me recompile the code.
    • 1:59:48Make recursion.
    • 1:59:49Oh, and actually, now it's actually catching it.
    • 1:59:52So the compiler is smart enough here to realize
    • 1:59:55that all paths through this function will call itself.
    • 1:59:58AKA, It's going to loop forever.
    • 2:00:01So let me do the first thing.
    • 2:00:03Suppose I only check for n equaling 0.
    • 2:00:05Let me go ahead and recompile this code with make recursion.
    • 2:00:09And now let me just be kind of uncooperative.
    • 2:00:12When I run this program, still works for 4, still works for 0.
    • 2:00:16What if I do like negative 100?
    • 2:00:19Have any of you experienced a segmentation fault or core dump?
    • 2:00:22OK, so no shame in this.
    • 2:00:24Like, this means I have somehow touched memory that I shouldn't have.
    • 2:00:28And in short, I actually called this function thousands of times
    • 2:00:32accidentally, it would seem now, until the program just
    • 2:00:35bailed on me because I eventually touched memory in the computer
    • 2:00:38that I shouldn't have.
    • 2:00:39That'll make even more sense next week.
    • 2:00:41But for now, it's simply a bug.
    • 2:00:42And I can avoid that bug in this context,
    • 2:00:44probably not your own pset context, by just making sure we don't even
    • 2:00:48allow for negative numbers at all.
    • 2:00:51So with this building block in place, what
    • 2:00:53can we now do in terms of those same numbers to sort?
    • 2:00:57Well, it turns out there's a sorting algorithm called merge sort.
    • 2:01:00And there's bunches of others too.
    • 2:01:01But merge sort is a nice one to discuss because it fundamentally, we hope,
    • 2:01:06is going to do better than selection sort and bubble
    • 2:01:09sort that is better than n squared.
    • 2:01:11But the catch is it's a little harder to think about.
    • 2:01:14In fact, I'll act it out myself with just these numbers on the shelf here
    • 2:01:17rather than humans because recursion in general takes a little bit of effort
    • 2:01:21to wrap your mind around, typically a bit of practice.
    • 2:01:23But I'll see if we can't walk through it methodically
    • 2:01:25enough such that this comes to light.
    • 2:01:28So here's the pseudo code I propose for this algorithm called merge sort.
    • 2:01:32In the spirit of recursion, this sorting algorithm
    • 2:01:35literally calls itself by using the verb sort in its pseudo code.
    • 2:01:41So how does merge sort work?
    • 2:01:42It sort of obnoxiously says, well, if you want to sort all of these things,
    • 2:01:46go sort the left half, then go sort the right half,
    • 2:01:48and then merge the two together.
    • 2:01:50Now obnoxious in what sense?
    • 2:01:51Well, if I just asked you to sort something and you just tell me,
    • 2:01:54well, go sort that thing and then go sort
    • 2:01:56that thing, what was the point of asking you in the first place?
    • 2:01:58But the key is that each of these lines is
    • 2:02:01sorting a smaller piece of the problem.
    • 2:02:03So eventually, we'll be able to pare this down
    • 2:02:06into something that doesn't go on forever because in fact, in merge sort,
    • 2:02:10there's a base case too.
    • 2:02:12There's a scenario where we just check, wait a minute,
    • 2:02:14if there's only one number to sort, that's it.
    • 2:02:17Quit then because you're all done.
    • 2:02:19So there has to be this base case in any use of recursion
    • 2:02:22to make sure that you don't mindlessly call yourself forever.
    • 2:02:27You've got to stop at some point.
    • 2:02:29So let's focus on the third of these steps.
    • 2:02:32What does it mean to merge two lists, two halves of a list,
    • 2:02:37just because this is apparently going to be
    • 2:02:39a key ingredient-- so here, for instance,
    • 2:02:41are two halves of a list of size 8.
    • 2:02:43We have the numbers 2-- and I'll call it out if you're at a bad angle--
    • 2:02:472457 and 0136.
    • 2:02:52Notice that the left half at the moment, 2457, is already sorted,
    • 2:02:56and the right half, 0136, is also sorted as well.
    • 2:03:01So that's a good thing because it means that theoretically, I've
    • 2:03:03sorted the left half already.
    • 2:03:05I've sorted the right half already before we began.
    • 2:03:07I just need to merge these two halves.
    • 2:03:09What does it mean to sort two halves?
    • 2:03:11Well, for the sake of discussion, I'm just
    • 2:03:13going to turn over most of the numbers except for the first numbers in each
    • 2:03:19of these halves.
    • 2:03:20There's two halves here, left and right.
    • 2:03:23At the moment, I'm only going to consider
    • 2:03:25the leftmost element of each half-- that is, the one on the left here
    • 2:03:29and the one on the left here.
    • 2:03:30How do I merge these two lists together?
    • 2:03:33Well, if I look at 2 and I look at 0, which one should presumably come first?
    • 2:03:38The smaller one.
    • 2:03:39So I'm going to grab the 0, and I'm going
    • 2:03:41to put it into its own place on this new shelf here.
    • 2:03:44And now I'm going to consider, as part of my iteration,
    • 2:03:50the beginning of this list and the new beginning of this list.
    • 2:03:53So I'm now comparing 2 and 1.
    • 2:03:55Which one's smaller?
    • 2:03:56I'm going to go ahead and grab the 1.
    • 2:03:58Now I'm going to compare the beginning of the left list
    • 2:04:01and the new beginning of the right list, 2 and 3.
    • 2:04:03Of course, it's 2.
    • 2:04:05Now I'm going to compare the beginning of the left list
    • 2:04:07and the beginning of the right list, 4 and 3.
    • 2:04:09It's of course 3.
    • 2:04:11Now I'm going to compare the 4 against the beginning and end,
    • 2:04:14it turns out, of the second list--
    • 2:04:154, of course.
    • 2:04:16Now I'm going to compare the beginning of the left list
    • 2:04:19and the beginning of the right list--
    • 2:04:205, of course.
    • 2:04:22I'm realizing this is not going to end well because I left
    • 2:04:25too much distance between the numbers.
    • 2:04:26But that has nothing to do with the algorithm.
    • 2:04:287 is the beginning of the left list.
    • 2:04:296 is the beginning of the right list.
    • 2:04:31It's, of course, 6.
    • 2:04:32And at the risk of knocking all of these over,
    • 2:04:36if I now make room for this element, we have hopefully
    • 2:04:43sorted the whole thing by having merged together the two halves of the list.
    • 2:04:50So in short-- thank you.
    • 2:04:56I'm a little worried that's just getting sarcastic now,
    • 2:04:58but we now have merged two half lists.
    • 2:05:03We haven't done the guts of the algorithm yet-- sort the left half
    • 2:05:07and sort the right half.
    • 2:05:08But I claim that that is how mechanically you
    • 2:05:11merge two sorted halves.
    • 2:05:12You keep looking at the beginning of each list,
    • 2:05:15and you just kind of weave them together based
    • 2:05:17on which one belongs first based on its size.
    • 2:05:21So if you agree that that was a reasonable way
    • 2:05:23to merge two lists together, let's go ahead and focus lastly
    • 2:05:28on what it means to actually sort the left half
    • 2:05:31and sort the right half of a whole bunch of numbers.
    • 2:05:33And for this, I'm going to go ahead and order them
    • 2:05:35in this seemingly random order.
    • 2:05:37And I just have a little cheat sheet above so that I don't mess up.
    • 2:05:40And I'm going to start at the very top this time.
    • 2:05:42And hopefully, these will not fall down at any point.
    • 2:05:45But I'm just deliberately putting them in this random order, 5274.
    • 2:05:51And then we have 1630--
    • 2:05:531630.
    • 2:05:57Hopefully this won't fall over.
    • 2:05:59Here is now an array of size 8 with eight integers.
    • 2:06:03And I want to sort this.
    • 2:06:05I could use selection sort and just go back and forth and back and forth.
    • 2:06:08I could use bubble sort and just compare pairs, pairs, pairs.
    • 2:06:11But those are going to be on the order of big O of n squared.
    • 2:06:13My hope is to do fundamentally better here.
    • 2:06:16So let's see if we can do better.
    • 2:06:17All right, so let me look now at my code.
    • 2:06:19I'll keep it on the screen.
    • 2:06:21How do I implement merge sort?
    • 2:06:23Well, if there's only one number, I quit.
    • 2:06:25There's obviously not.
    • 2:06:26There's eight numbers, so that's not applicable.
    • 2:06:28I'm going to go ahead and sort the left half of numbers.
    • 2:06:30All right, here's the left half--
    • 2:06:325274.
    • 2:06:34Do I sort an array of size 4?
    • 2:06:37Well, here's where the recursion kicks in.
    • 2:06:40How do you sort a list of size 4?
    • 2:06:42Well, there's the pseudo code on the board.
    • 2:06:44I sort the left half of the list of size 4.
    • 2:06:47So here we go.
    • 2:06:49I have a list of size 4.
    • 2:06:50How do I sort it?
    • 2:06:51I sort the left half.
    • 2:06:52All right, now I have a list of size 2.
    • 2:06:54How do I sort this?
    • 2:06:56Well, sort the left half.
    • 2:06:58So here we go.
    • 2:06:59Here's a list of size 1.
    • 2:07:01How do I sort this?
    • 2:07:03I think it's done, right?
    • 2:07:05That's quit, right?
    • 2:07:06If only one number, I'm done.
    • 2:07:08The 5 is sorted.
    • 2:07:09All right, what was the next step?
    • 2:07:10You have to now rewind in time.
    • 2:07:12I just sorted the left half of the left half of the left half.
    • 2:07:16What do I now sort?
    • 2:07:17The right half, which is 2.
    • 2:07:20This is one element.
    • 2:07:22So I'm done.
    • 2:07:22So now at this point in the story, I have sorted, sort of idiotically--
    • 2:07:27the 5 assorted, and the 2 is sorted.
    • 2:07:29But what's the third and final step of this phase of the algorithm?
    • 2:07:34Merge the two together.
    • 2:07:35So here's the left, here's the right list.
    • 2:07:37How do I merge these together?
    • 2:07:38I compare the lists, and I put the two there.
    • 2:07:41I only have the [? 5 ?] left, and I do that.
    • 2:07:43So now we see some visible progress.
    • 2:07:45But again, let's rewind.
    • 2:07:47How did we get here?
    • 2:07:48We started to sort the left half of the left half of the left half, then
    • 2:07:52the right half.
    • 2:07:53And now where are we?
    • 2:07:54We've just sorted the left half of the left half.
    • 2:07:58So what comes after sorting the left half of anything?
    • 2:08:01Right half.
    • 2:08:01All right, here's the sort of same nonsensical thing.
    • 2:08:03Here's a list of size 2.
    • 2:08:05Let's sort the left half.
    • 2:08:06Done.
    • 2:08:07Let's sort the right half.
    • 2:08:08Done.
    • 2:08:09What's the third step?
    • 2:08:10Merge them together.
    • 2:08:11So that's the 4, and that's the 7.
    • 2:08:14What have I now done?
    • 2:08:16In total, I've now sorted the left half of the original thing.
    • 2:08:21So what happens next?
    • 2:08:23Wait a minute, wait a minute.
    • 2:08:24I have not done that.
    • 2:08:25What have I done?
    • 2:08:26I have sorted the left half of the left half,
    • 2:08:29and I've sorted the right half of the left half.
    • 2:08:32What do I now need to do lastly?
    • 2:08:34Merge those two lists together.
    • 2:08:36So again, I put my finger on the beginning
    • 2:08:38of this list, the beginning of this list.
    • 2:08:40And if you want, I'll do the same thing when I merged last time
    • 2:08:42to be clear what I'm comparing.
    • 2:08:432 and 4-- the 2 obviously comes first.
    • 2:08:46What comes next?
    • 2:08:48Well, the 4 comes next.
    • 2:08:50What comes next?
    • 2:08:51The 5 comes next and then lastly, of course, the 7.
    • 2:08:55Notice that the 2457 are now sorted.
    • 2:09:00So the original left half is sorted.
    • 2:09:02And I'll do the rest a little faster because, my God,
    • 2:09:05this feels like it takes forever.
    • 2:09:06But I bet we're on to something here.
    • 2:09:08What step remains next?
    • 2:09:10I've just sorted the left half of the original.
    • 2:09:12Sort the right half of the original.
    • 2:09:14How do I sort this?
    • 2:09:15I sort the left half of the right half.
    • 2:09:17How do I sort this?
    • 2:09:19I sort the left half of the left half.
    • 2:09:21Done.
    • 2:09:22I sort the right half of the left half.
    • 2:09:24Done.
    • 2:09:25Now I merge the two together.
    • 2:09:27The 1 comes first, the 6 comes next.
    • 2:09:30Now I sort the right half of the right half.
    • 2:09:34What do I do?
    • 2:09:35Sort the left half.
    • 2:09:36Done.
    • 2:09:37Sort the right half.
    • 2:09:38Done.
    • 2:09:38What do I do?
    • 2:09:39Merge them together.
    • 2:09:41So that's the third step of that phase.
    • 2:09:43Now where are we in the stor-- oh my God, where are we in the story?
    • 2:09:47We have sorted the left half of the right half
    • 2:09:52and the right half of the right half.
    • 2:09:54What comes next?
    • 2:09:55Merge.
    • 2:09:56So I'm going to compare, and I'm going to move those
    • 2:09:58down just to make clear what I'm comparing,
    • 2:10:00the beginning of both sublists.
    • 2:10:02What comes first?
    • 2:10:03Of course, the 0.
    • 2:10:04What comes next?
    • 2:10:07What comes next?
    • 2:10:08The 1.
    • 2:10:10What comes next?
    • 2:10:11The 3.
    • 2:10:13And then lastly comes the 6.
    • 2:10:14All right, where are we in the story?
    • 2:10:16We've now sorted the left half of the original
    • 2:10:19and the right half of the original.
    • 2:10:20What step remains?
    • 2:10:22Merge.
    • 2:10:23All right, so I'm going to make the same point.
    • 2:10:25And this is actually literally what we did earlier
    • 2:10:27because I deliberately demoed those original numbers in this order, 2
    • 2:10:31and a 0.
    • 2:10:32This comes out first.
    • 2:10:34What comes next?
    • 2:10:352 and 1.
    • 2:10:36The 1 comes out next.
    • 2:10:37What comes next?
    • 2:10:39The 2 comes next.
    • 2:10:40What comes next?
    • 2:10:42The 3 comes next.
    • 2:10:43What comes next?
    • 2:10:44The 4.
    • 2:10:47What comes after that?
    • 2:10:48The 5.
    • 2:10:50What comes after that?
    • 2:10:51The 6.
    • 2:10:52And lastly-- this is when we run out of memory--
    • 2:10:56the 7 over there is actually in place.
    • 2:11:01OK.
    • 2:11:03OK, so admittedly, a little harder to explain,
    • 2:11:05and honestly, it gets a little trippy because it's
    • 2:11:07so easy to forget about where you are in the story
    • 2:11:10because we're constantly diving into the algorithm
    • 2:11:13and then backing back out of it.
    • 2:11:15But in code, we could express this pretty correctly
    • 2:11:17and, it turns out, pretty efficiently because what
    • 2:11:21I was doing, even though it's longer when I do it verbally,
    • 2:11:24I was touching these elements a minimal amount of times, right?
    • 2:11:28I wasn't going back and forth, back and forth in front of the shelf
    • 2:11:31again and again.
    • 2:11:32I was deliberately only ever merging the smallest elements in each list.
    • 2:11:37So every time we merge, even though I was doing it quickly,
    • 2:11:40my fingers were only touching each of the elements once.
    • 2:11:44And how many times did we divide, divide, divide in half the list?
    • 2:11:50Well, we started with all of the elements here,
    • 2:11:52and there were eight of them.
    • 2:11:53And then we moved them 1, 2, 3 positions.
    • 2:11:56So the height of this visualization, if you will, is actually log n, right?
    • 2:12:03If I started with 8, turns out if you do the arithmetic,
    • 2:12:06this is log n height because 2 to the 3 is 8.
    • 2:12:10But for now, just trust that this is a log n height.
    • 2:12:12And how wide is the shelf?
    • 2:12:14Well, it's of width n because there's n elements any time
    • 2:12:18they were on the shelf.
    • 2:12:19So technically, I was kind of cheating this algorithm because this
    • 2:12:23is the first time I've needed shelves.
    • 2:12:25With the human examples, we just had the humans, and that's it, and only eight
    • 2:12:29of them.
    • 2:12:29Here, I was sort of using more and more memory.
    • 2:12:32In fact, I was using like four times as much memory
    • 2:12:34even though that was just for visualization's sake.
    • 2:12:36Merge sort actually requires that you have some spare space, an empty array
    • 2:12:41to move the elements into when you're merging them together.
    • 2:12:44But if I really wanted and if I didn't have this shelf or this shelf,
    • 2:12:46honestly, I could have just gone back and forth between the two shelves.
    • 2:12:49That would have been sufficient.
    • 2:12:51So merge sort uses more memory for this merging process,
    • 2:12:56but the advantage of using more memory is
    • 2:12:59that the total running time, if you can perhaps infer from that math, is what?
    • 2:13:04The big O notation for merge sort, it turns out,
    • 2:13:07is actually going to be n times log n.
    • 2:13:10And even if you're a little rusty still on your logarithms,
    • 2:13:13we saw in week zero and again today that log n is smaller than n.
    • 2:13:18That's a good thing.
    • 2:13:19Binary search was log n.
    • 2:13:20That's faster than linear search, which was n.
    • 2:13:23So n times log n is, of course, smaller than n times n or n squared.
    • 2:13:28So it's sort of lower on this little cheat sheet that I've been drawing,
    • 2:13:31which is to suggest that it's running time is indeed better or faster.
    • 2:13:35And in fact, if we consider the best case running time,
    • 2:13:38turns out it's not quite as good as bubble sort with omega of n,
    • 2:13:42where you can just sort of abort if you realize, wait a minute,
    • 2:13:45I've done no work.
    • 2:13:46Merge sort, you actually have to do that work to get to the finish line anyway.
    • 2:13:50So it's actually in omega and ultimately theta of n log n as well.
    • 2:13:56So again, a trade off there because if you
    • 2:13:58happen to have a data set that is very often sorted,
    • 2:14:00honestly, you might want to stick with bubble sort.
    • 2:14:02But in the general case, where the data is unsorted,
    • 2:14:05n log n as sounding better than n squared.
    • 2:14:08Well, what does it actually look or feel like?
    • 2:14:10Give me a moment to just change over to our visualization here.
    • 2:14:14And we'll see with this example what merge sort looks like
    • 2:14:18depicted with now these vertical bars.
    • 2:14:20So same algorithm, but instead of my numbers on shelves,
    • 2:14:22here is a random array of numbers being sorted.
    • 2:14:27And you can see it being done half at a time.
    • 2:14:30And you see sort of remnants of the previous bars.
    • 2:14:33Actually, that was unfair.
    • 2:14:35Let me zoom out here.
    • 2:14:37Let me zoom out so you can actually see the height here.
    • 2:14:42Let me go ahead and randomize this again and run merge sort.
    • 2:14:44There we go.
    • 2:14:45Now you can see the second array and where the values are going temporarily.
    • 2:14:50And even though this one looks way more cryptic visualization-wise,
    • 2:14:54it does seem to be moving faster.
    • 2:14:56And it seems to be merging halves together, and boom, it's done.
    • 2:14:59So let's actually see, in conclusion, what these algorithms compare to
    • 2:15:03and consider that moving forward as we write more and more code,
    • 2:15:07the goal is, again, not just to be correct but to be well-designed.
    • 2:15:10And one measure of design is going to indeed be efficiency.
    • 2:15:13So here we have, in final, a visualization of three algorithms--
    • 2:15:18selection sort, bubble sort, and merge sort--
    • 2:15:21from top to bottom.
    • 2:15:22And let's see what these algorithms might look or sound like here.
    • 2:15:25Oh, if we can dim the lights for dramatic effect--
    • 2:15:32selection's on top, bubble on bottom, merge in the middle.
    • 2:15:35[MUSIC PLAYING]
    • 2:16:33[MUSIC PLAYING]
  • CS50.ai
Shortcuts
Before using a shortcut, click at least once on the video itself (to give it "focus") after closing this window.
Play/Pause spacebar or k
Rewind 10 seconds left arrow or j
Fast forward 10 seconds right arrow or l
Previous frame (while paused) ,
Next frame (while paused) .
Decrease playback rate <
Increase playback rate >
Toggle captions on/off c
Toggle mute m
Toggle full screen f or double-click video