CS50 Video Player
    • 🧁

    • 🍮

    • 🍌

    • 🍿
    • 0:00:00Introduction
    • 0:01:17Data Structures
    • 0:02:25Arrays
    • 0:09:41Arrays in C
    • 0:23:50Realloc
    • 0:26:34Arrow Notation
    • 0:28:58Linked Lists
    • 0:43:19Building a Linked List
    • 0:51:29Linked Lists in C
    • 1:10:09Linked List Demonstration
    • 1:17:45Linked List Time Complexities
    • 1:23:20Binary Search Trees
    • 1:30:52tree.c
    • 1:37:17Searching a Binary Search Tree
    • 1:43:10Binary Search Tree Time Complexities
    • 1:44:00Hash Tables
    • 1:50:33Hash Functions
    • 1:51:40Hashing Demonstration
    • 1:53:25Buckets and Collisions
    • 1:57:20Tries
    • 2:04:04Stacks and Queues
    • 2:08:23Jack Learns the Facts
    • 2:12:14This was CS50
    • 0:00:00[MUSIC PLAYING]
    • 0:01:18SPEAKER 1: All right.
    • 0:01:20This is CS 50.
    • 0:01:21And this is already week 5, which means this is actually
    • 0:01:24our last week in C together.
    • 0:01:27In fact, in just a few days' time, what has looked like this
    • 0:01:31and much more cryptic than this perhaps, is
    • 0:01:33going to be distilled into something much simpler next week.
    • 0:01:35When we transition to a language called Python.
    • 0:01:38And with Python, we'll still have our conditionals, and loops, and functions,
    • 0:01:42and so forth.
    • 0:01:43But a lot of the low-level plumbing that you might have been wrestling with,
    • 0:01:46struggling with, frustrated by, over the past couple of weeks,
    • 0:01:49especially, now that we've introduced pointers.
    • 0:01:51And it feels like you probably have to do everything yourself.
    • 0:01:54In Python, and in a lot of higher level languages
    • 0:01:57so to speak-- more modern, more recent languages,
    • 0:01:59you'll be able to do so much more with just single lines of code.
    • 0:02:02And indeed, we're going to start leveraging libraries, all the more code
    • 0:02:05that other people wrote.
    • 0:02:06Frameworks, which is collections of libraries that other people wrote.
    • 0:02:10And on top of all that, will you be able to make even better, grander, more
    • 0:02:13impressive projects, that actually solve problems of particular interest to you.
    • 0:02:17Particularly, by way of your own final project.
    • 0:02:20So last week though, in week 4, recall that we focused on memory.
    • 0:02:23And we've been treating this memory inside of your computer
    • 0:02:26is like a canvas, right.
    • 0:02:27At the end of the day, it's just zeros and ones, or bytes, really.
    • 0:02:30And it's really up to you what you do with those bytes.
    • 0:02:33And how you interconnect them, how you represent information on them.
    • 0:02:37And arrays, were like one of the simplest ways.
    • 0:02:39We started playing around with that memory.
    • 0:02:41Just contiguous chunks of memory.
    • 0:02:43Back-to-back, to back.
    • 0:02:44But let's consider, for a moment, some of the problems that
    • 0:02:47pretty quickly arise with arrays.
    • 0:02:48And then, today focus on what more generally are called data structures.
    • 0:02:52Using your computer's memory as a much more versatile canvas,
    • 0:02:57to create even two-dimensional structures.
    • 0:02:59To represent information, and, ultimately,
    • 0:03:01to solve more interesting problems.
    • 0:03:03So here's an array of size 3.
    • 0:03:04Maybe, the size of 3 integers.
    • 0:03:06And suppose that this is inside of a program.
    • 0:03:08And at this point in the story, you've got 3 numbers in it already.
    • 0:03:111, 2 and 3.
    • 0:03:13And suppose, whatever the context, you need to now add a fourth number
    • 0:03:17to this array.
    • 0:03:17Like, the number 4.
    • 0:03:18Well, instinctively, where should the number 4 go?
    • 0:03:21If this is your computer's memory and we currently
    • 0:03:24have this array 1, 2, 3, from what.
    • 0:03:25Left to right.
    • 0:03:27Where should the number 4 just, perhaps, naively go.
    • 0:03:30Yeah, what do you think?
    • 0:03:31AUDIENCE: Replace number 1.
    • 0:03:32SPEAKER 1: Sorry?
    • 0:03:32AUDIENCE: Replace number 1.
    • 0:03:33SPEAKER 1: Oh, OK.
    • 0:03:34So you could replace number 1.
    • 0:03:36I don't really like that, though, because I'd
    • 0:03:37like to keep number 1 around.
    • 0:03:39But that's an option.
    • 0:03:40But I'm losing, of course, information.
    • 0:03:42So what else could I do if I want to add the number 4.
    • 0:03:44Over there?
    • 0:03:45AUDIENCE: On the right side of 3.
    • 0:03:46SPEAKER 1: Yeah.
    • 0:03:47So, I mean, it feels like if there's some ordering
    • 0:03:49to these, which seems kind of a reasonable inference,
    • 0:03:51that it probably belongs somewhere over here.
    • 0:03:53But recall last week, as we started poking around a computer's memory,
    • 0:03:57there's other stuff potentially going on.
    • 0:03:59And if fill that in, ideally, we'd want to just plop the number 4 here.
    • 0:04:02If we're maintaining this kind of order.
    • 0:04:04But recall in the context of your computer's memory,
    • 0:04:06there might be other stuff there.
    • 0:04:08Some of these garbage values that might be usable,
    • 0:04:10but we don't really know or care what they are.
    • 0:04:12As represented by Oscar here.
    • 0:04:14But there might actually be useful data in use.
    • 0:04:17Like, if your program has not just a few integers in this array,
    • 0:04:20but also a string that says like, "Hello, world."
    • 0:04:23It could be that your computer has plopped the H-E-L-L-O W-O-R-L-D right
    • 0:04:29after this array.
    • 0:04:30Why?
    • 0:04:30Well, maybe, you created the array in one line of code
    • 0:04:32and filled it with 1, 2, 3.
    • 0:04:34Maybe the next line of code used GET-STRING.
    • 0:04:37Or maybe just hard coded a string in your code for "Hello, world."
    • 0:04:40And so you painted yourself into a corner, so to speak.
    • 0:04:42Now I think you might claim, well, let's just overwrite the H.
    • 0:04:45But that's problematic for the same reasons.
    • 0:04:47We don't want to do that.
    • 0:04:49So where else could the 4 go?
    • 0:04:52Or how do we solve this problem if we want to add a number,
    • 0:04:55and there's clearly memory available.
    • 0:04:57Because those garbage values are junk that we don't care about anymore.
    • 0:05:00So we could certainly reuse those.
    • 0:05:02Where could the 4, and perhaps this whole array, go?
    • 0:05:06OK.
    • 0:05:06So I'm hearing we could move it somewhere.
    • 0:05:08Maybe, replace some of those garbage values.
    • 0:05:10And honestly, we have a lot of options.
    • 0:05:12We could use any of these garbage values up here.
    • 0:05:14We could use any of these down here, or even further down.
    • 0:05:17The point is there is plenty of memory available as
    • 0:05:20indicated by these Oscars, where we could put 4, maybe even, 5,
    • 0:05:246 or more integers.
    • 0:05:25The catch is that we chose poorly early on.
    • 0:05:28Or we just got unlucky.
    • 0:05:30And 1, 2, 3 ended up back-to-back with some other data that we care about.
    • 0:05:33All right, so that's fine.
    • 0:05:34Let's go ahead and assume that we'll abstract away everything else.
    • 0:05:37And we'll plop the new array in this location here.
    • 0:05:40So I'm going to go ahead and copy the 1 over.
    • 0:05:42The 2 over.
    • 0:05:43The 3 over.
    • 0:05:44And then, ultimately, once I'm ready to fill the 4,
    • 0:05:47I can throw away, essentially, the old array at this point.
    • 0:05:49Because I have it now entirely in duplicate.
    • 0:05:51And I can populate it with the number 4.
    • 0:05:53All right.
    • 0:05:54So problem solved.
    • 0:05:55That is a correct potential solution to this problem.
    • 0:05:58But, what's the trade off?
    • 0:05:59And this is something we're going to start thinking about all the more.
    • 0:06:02What's the downside of having solved this problem in this way?
    • 0:06:04Yeah.
    • 0:06:06I'm adding a lot of running time.
    • 0:06:07It took me a lot of effort to copy those additional numbers.
    • 0:06:10Now, granted, it's a small array.
    • 0:06:123 numbers, who cares.
    • 0:06:13It's going to be over in the blink of an eye.
    • 0:06:14But if we start talking about interesting data sets,
    • 0:06:17web application data sets, mobile app data sets.
    • 0:06:20Where you have not just a few, but maybe a few hundred, few thousand,
    • 0:06:23a few million pieces of data.
    • 0:06:25This is probably a suboptimal solution to just, oh,
    • 0:06:28move all your data from one place to another.
    • 0:06:30Because who's to say that we're not going
    • 0:06:32to paint ourselves into a new corner.
    • 0:06:34And it would feel like you're wasting all of this time moving stuff around.
    • 0:06:37And, ultimately, just costing yourself a huge amount of time.
    • 0:06:41In fact, if we put this now into the context of our Big O notation
    • 0:06:44from a few weeks back, what might the running time now of Search
    • 0:06:49be for an array?
    • 0:06:50Let's start simple.
    • 0:06:51A throwback a couple of weeks ago.
    • 0:06:53If you're using an array, to recap, what was the running time
    • 0:06:56of a Search algorithm in Big O notation?
    • 0:06:59So, maybe, in the worst case.
    • 0:07:01If you've got n numbers, 3 in this case or 4, but n more generally.
    • 0:07:05Big O of what for Search?
    • 0:07:08Yeah.
    • 0:07:08What do you think?
    • 0:07:09AUDIENCE: Big O of n.
    • 0:07:10SPEAKER 1: Big O of n.
    • 0:07:11And what's your intuition for that?
    • 0:07:12AUDIENCE: [INAUDIBLE].
    • 0:07:18SPEAKER 1: OK.
    • 0:07:19Yeah.
    • 0:07:19So if we go through each element, for instance, from left to right,
    • 0:07:22then Search is going to take this a Big O running time.
    • 0:07:25If, though, we're talking about these numbers, specifically.
    • 0:07:28And now I'll explicitly stipulate that, yeah, they're sorted.
    • 0:07:31Does that buy us anything?
    • 0:07:32What would the Big O notation be for Searching an array in this case,
    • 0:07:36be it of size 3, or 4, or n, more generally.
    • 0:07:39AUDIENCE: Big O of n.
    • 0:07:40SPEAKER 1: Big O of, not n, but rather?
    • 0:07:42AUDIENCE: Log n.
    • 0:07:42SPEAKER 1: Log n, right.
    • 0:07:43Because we could use per week zero binary search on an array like this,
    • 0:07:47we'd have to deal with some rounding.
    • 0:07:49Because there's not a perfect number of elements at the moment.
    • 0:07:51But you could use binary search.
    • 0:07:52Go to the middle roughly.
    • 0:07:54And then go left or right, left or right,
    • 0:07:55until you find the element you care about.
    • 0:07:57So Search remains in Big O of log n when using arrays.
    • 0:08:01But what about insertion, now?
    • 0:08:03If we start to think about other operations.
    • 0:08:05Like, adding a number to this array, or adding a friend to your contacts
    • 0:08:09app, or Google finding another page on the internet.
    • 0:08:12So insertion happens all the time.
    • 0:08:14What's the running time of Insert?
    • 0:08:17When it comes to inserting into an existing array of size n.
    • 0:08:20How many steps might that take?
    • 0:08:23Big O of n.
    • 0:08:24It would be, indeed, n.
    • 0:08:25Why?
    • 0:08:25Because in the worst case, where you're out of space,
    • 0:08:28you have to allocate, it would seem, a new array.
    • 0:08:31Maybe, taking over some of the previous garbage values.
    • 0:08:33But the catch is, even though you're only
    • 0:08:35inserting one new number, like the number 4,
    • 0:08:37you have to copy over all the darn existing numbers into the new one.
    • 0:08:41So if your original array of size n, the copying of that
    • 0:08:44is going to take Big O of n plus 1.
    • 0:08:45But we can throw away the plus 1 because of the math we did in the past.
    • 0:08:48So Insert now becomes Big O of n.
    • 0:08:51And that might not be ideal.
    • 0:08:53Because if you're in the habit of inserting things frequently,
    • 0:08:56that could start to add up, and add up, and add up.
    • 0:08:58And this is why computer programs, and websites, and mobile apps
    • 0:09:01could be slow.
    • 0:09:02If you're not being mindful of these trade offs.
    • 0:09:06So what about, just for good measure, Omega notation.
    • 0:09:10And maybe, the best case.
    • 0:09:11Well just to recap here, we could get lucky
    • 0:09:13and Search could just take one step.
    • 0:09:16Because you might just get lucky, and boom the number
    • 0:09:18you're looking for is right there in the middle, if using binary search.
    • 0:09:20Or even linear search, for that matter.
    • 0:09:22And insert 2.
    • 0:09:23If there's enough room, and we didn't have to move all of those numbers--
    • 0:09:271, 2, and 3, to a new location.
    • 0:09:29You could get lucky.
    • 0:09:30And we could have, as someone suggested, just
    • 0:09:32put the number 4 right there at the end.
    • 0:09:34And if we don't get lucky, it might take n steps.
    • 0:09:36If we do get lucky, it might just take the one, or constant number, of steps.
    • 0:09:39In fact, let me go ahead and do this.
    • 0:09:41How about we do something like this?
    • 0:09:43Let me switch over to some code here.
    • 0:09:45Let me start to make a program called List.C.
    • 0:09:48And in List.C, let's start with the old way.
    • 0:09:50So we follow the breadcrumbs we've laid for ourselves as follows.
    • 0:09:54So in this List.C, I'm going to include standardio.h.
    • 0:09:57Int main(void) as usual.
    • 0:09:59Then inside of my code here, I'm going to go ahead and give myself
    • 0:10:02the first version of memory.
    • 0:10:04So int list 3 is now implemented at the moment, in an array.
    • 0:10:09So we're rewinding for now to week 2 style code.
    • 0:10:11And then, let me just initialize this thing.
    • 0:10:13At the first location will be 1.
    • 0:10:15At the next location will be 2.
    • 0:10:17And at the last location will be 3.
    • 0:10:19So the array is zero indexed always.
    • 0:10:22I, for just the sake of discussion though,
    • 0:10:23am putting in the numbers 1, 2, 3, like a normal person might.
    • 0:10:27All right.
    • 0:10:27So now let's just print these out.
    • 0:10:294 int i gets 0.
    • 0:10:30I less than 3, i++.
    • 0:10:32Let's go ahead now and print out using printf.
    • 0:10:35%i/n list [i].
    • 0:10:38So very simple program, inspired by what we did in week 2.
    • 0:10:42Just to create and then print out the contents of an array.
    • 0:10:46So let's Make List.
    • 0:10:48So far, so good. ./list And voila, we see 1, 2, 3.
    • 0:10:52Now let's start to practice some of what we're preaching with this new syntax.
    • 0:10:57So let me go in now and get rid of the array version.
    • 0:11:02And let me zoom out a little bit to give ourselves some more space.
    • 0:11:04And now let's begin to create a list of size 3.
    • 0:11:08So if I'm going to do this now, dynamically,
    • 0:11:11so that I'm allocating these things again and again,
    • 0:11:15let me go ahead and do this.
    • 0:11:17Let me give myself a list that's of type int* equal the return value of malloc
    • 0:11:24of 3 times the size of an int, so what this is going to do for me is give me
    • 0:11:31enough memory for that very first picture we drew on the board.
    • 0:11:34Which was the array containing 1, 2, and 3.
    • 0:11:37But laying the foundation to be able to resize it,
    • 0:11:39which was ultimately the goal.
    • 0:11:41So my syntax is a little different here.
    • 0:11:43I'm going to use malloc and get memory from the so-called "heap", as we
    • 0:11:47called it last week.
    • 0:11:48Instead of using the stack by just doing the previous version where I said,
    • 0:11:51int list 3.
    • 0:11:54That is to say this line of code from the first version is in some sense
    • 0:11:59identical to this line of code in the second version.
    • 0:12:02But the first line of code puts the memory
    • 0:12:04on the stack, automatically, for me.
    • 0:12:06The second line of code, that I've left here now,
    • 0:12:09is creating an array of size 3, but it's putting it on the heap.
    • 0:12:13And that's important because it was only on the heap and via this new function
    • 0:12:16last week, malloc.
    • 0:12:17That you can actually ask for more memory, and even give it back.
    • 0:12:20When you just use the first notation int list 3,
    • 0:12:24you have permanently given yourself an array of size 3.
    • 0:12:28You cannot add to that in code.
    • 0:12:31So let me go ahead and do this.
    • 0:12:33If list==null, something went wrong.
    • 0:12:36The computers out of memory.
    • 0:12:37So let's just return 1 and quit out of this program.
    • 0:12:39There's nothing to see here.
    • 0:12:40So just a good error check there.
    • 0:12:42Now let me go ahead and initialize this list.
    • 0:12:44So list [0] will be 1 again.
    • 0:12:46List [1] will be 2.
    • 0:12:48And list [2] will be 3.
    • 0:12:50So that's the same kind of syntax as before.
    • 0:12:52And notice this equivalence.
    • 0:12:55Recall that there's this relationship between chunks of memory and arrays.
    • 0:13:00And arrays are really just doing pointer arithmetic for you,
    • 0:13:03where the square bracket notation is.
    • 0:13:05So if I've asked myself here, in line 5, for enough memory for 3 integers,
    • 0:13:10it is perfectly OK to treat it now like an array using square bracket notation.
    • 0:13:15Because the computer will do the arithmetic for me
    • 0:13:17and find the first location, the second, and the third.
    • 0:13:20If you really want to be cool and hacker-like, well,
    • 0:13:24you could say list=1, list+1=2, list+2=3.
    • 0:13:33That's the same thing using very explicit,
    • 0:13:36pointer arithmetic, which we looked at briefly last week.
    • 0:13:38But this is atrocious to look at for most people.
    • 0:13:41It's just not very user friendly.
    • 0:13:42It's longer to type, so most people, even when
    • 0:13:45allocating memory dynamically as I did a second ago,
    • 0:13:48would just use the more familiar notation of an array.
    • 0:13:52All right.
    • 0:13:53So let's go on.
    • 0:13:54Now suppose time passes and I realize, oh shoot,
    • 0:13:58I really wanted this array to be of size 4 instead of size 3.
    • 0:14:03Now, obviously, I could just rewind and like fix the program.
    • 0:14:06But suppose that this is a much larger program.
    • 0:14:08And I've realized, at this point, that I need
    • 0:14:10to be able to dynamically add more things to this array for whatever
    • 0:14:14reason.
    • 0:14:14Well let me go ahead and do this.
    • 0:14:16Let me just say, all right, list should actually
    • 0:14:18be the result of asking for 4 chunks of memory from malloc.
    • 0:14:24And then, I could do something like this, list [3]=4.
    • 0:14:31Now this is buggy, potentially, in a couple of ways.
    • 0:14:34But let me ask first, what's really wrong, first, with this code?
    • 0:14:41The goal at hand is to start with the array of size 3 with the 1, 2, 3.
    • 0:14:45And I want to add a number 4 to it.
    • 0:14:47So at the moment, in line 17, I've asked the computer for a chunk of 4 integers.
    • 0:14:53Just like the picture.
    • 0:14:54And then I'm adding the number 4 to it.
    • 0:14:57But I have skipped a few steps and broken this somehow.
    • 0:15:00Yeah.
    • 0:15:01AUDIENCE: You don't know exactly [INAUDIBLE]..
    • 0:15:04SPEAKER 1: Yeah.
    • 0:15:04I don't necessarily know where this is going to end up in memory.
    • 0:15:07It's probably not going to be immediately
    • 0:15:08adjacent to the previous chunk.
    • 0:15:09And so, yes, even though I'm putting the number for there,
    • 0:15:12I haven't copied the 1, the 2, or the 3 over to this chunk of memory.
    • 0:15:16So well let me fix--
    • 0:15:18well, that's actually, indeed, really the essence of the problem.
    • 0:15:22I am orphaning the original chunk of memory.
    • 0:15:26If you think of the picture that I drew earlier, the line of code
    • 0:15:29up here on line 5 that allocates space for the initial 3 integers.
    • 0:15:35This code is fine.
    • 0:15:36This code is fine.
    • 0:15:38But as soon as I do this, I'm clobbering the value of list.
    • 0:15:41And saying no, don't point at this chunk of memory.
    • 0:15:43Point at this chunk of memory, at which point I've forgotten if you will,
    • 0:15:47where the original chunk of memory is.
    • 0:15:50So the right way to do something like this, would be a little more involved.
    • 0:15:54Let me go ahead and give myself a temporary variable.
    • 0:15:57And I'll literally call it TMP.
    • 0:15:58T-M-P, like I did last week.
    • 0:16:00So that I can now ask the computer for a completely different chunk of memory
    • 0:16:04of size 4.
    • 0:16:05I'm going to again say if TMP equals null,
    • 0:16:08I'm going to say bad things happened here.
    • 0:16:10So let me just return 1.
    • 0:16:11And you know what, just to be tidy, let me
    • 0:16:13free the original list before I quit.
    • 0:16:16Because remember from last week, any time
    • 0:16:18you use malloc you eventually have to use free.
    • 0:16:20But this chunk of code here is just a safety check.
    • 0:16:24If there's no more memory, there's nothing to see here.
    • 0:16:26I'm just going to clean up my state and quit.
    • 0:16:29But now, if I have asked for this chunk of memory,
    • 0:16:32now I can do this 4 int i gets 0.
    • 0:16:38I is less than 3, i++.
    • 0:16:40What if I do something like this?
    • 0:16:42TMP [i] equals list [i].
    • 0:16:46That would seem to have the effect of copying all of the memory from one
    • 0:16:50to the other.
    • 0:16:51And then, I think I need to do one last thing TMP [3]
    • 0:16:55gets the number 4, for instance.
    • 0:16:57Again, I'm hard coding the numbers for the sake of discussion.
    • 0:17:01After I've done this, what could I now do?
    • 0:17:06I could now set list equals to TMP.
    • 0:17:10And now, I have updated my linked list properly.
    • 0:17:14So let me go ahead and do this.
    • 0:17:154 int i gets 0.
    • 0:17:17I is less than 4, i++.
    • 0:17:19Let me go ahead and print each of these elements out with %i using list [i].
    • 0:17:24And then, I'm going to return 0 just to signify that all is successful.
    • 0:17:27Now so to recap, we initialize the original array
    • 0:17:31of size 3 and plug-in the values 1, 2, 3.
    • 0:17:35Time passes.
    • 0:17:35And then, I realize, wait a minute, I need more space.
    • 0:17:38And so I asked the computer for a second chunk of memory.
    • 0:17:40This one of size 4.
    • 0:17:41Just as a safety check, I make sure that TMP doesn't equal null.
    • 0:17:44Because if it does I'm out of memory.
    • 0:17:46So I should just quit altogether.
    • 0:17:47But once I'm sure that it's not null, I'm
    • 0:17:50going to copy all the values from the old list into the new list.
    • 0:17:55And then, I'm going to add my new number at the end of that list.
    • 0:17:58And then, now that I'm done playing around with this temporary variable,
    • 0:18:02I'm going to remember in my list variable what
    • 0:18:05the addresses of this new chunk of memory.
    • 0:18:07And then, I'm going to print all of those values out.
    • 0:18:10So at least, aesthetically, when I make this new version of my list,
    • 0:18:14except for my missing semicolon.
    • 0:18:16Let me try this again.
    • 0:18:17When I make lists, Oh OK.
    • 0:18:19What did I do this time?
    • 0:18:20Implicitly declaring a library function malloc.
    • 0:18:23What's my mistake any time you see that kind of error?
    • 0:18:27AUDIENCE: Library.
    • 0:18:28SPEAKER 1: Yeah.
    • 0:18:28A library.
    • 0:18:29So up here, I forgot to do include stdlib.h, which is where malloc lives.
    • 0:18:34Let me go ahead and, again, do make list.
    • 0:18:36There we go.
    • 0:18:37So I fixed that dot/list.
    • 0:18:38And I should see 1, 2, 3, 4.
    • 0:18:41But they're still a bug here.
    • 0:18:45Does anyone see the the-- bug or question?
    • 0:18:48AUDIENCE: You forgot to free them.
    • 0:18:50SPEAKER 1: I'm sorry, say again.
    • 0:18:50AUDIENCE: You forgot to free them.
    • 0:18:52SPEAKER 1: I forgot to free the original list.
    • 0:18:54And we could see this, even if not just with our own eyes or intuition.
    • 0:18:58If I do something like Valgrind of dot/list,
    • 0:19:00remember our tool from this past week.
    • 0:19:02Let me increase the size of my terminal window, temporarily.
    • 0:19:05The output is crazy cryptic at first.
    • 0:19:07But, notice that I have definitely lost some number of bytes here.
    • 0:19:12And indeed, it's even pointing at the line number
    • 0:19:15in which some of those bytes were lost.
    • 0:19:16So let me go ahead and back to my code.
    • 0:19:18And indeed, I think what I need to do is, before I clobber the value of list
    • 0:19:23pointing it at this new chunk of memory instead of the old,
    • 0:19:27I think I now need to first, proactively,
    • 0:19:29say free the old list of memory.
    • 0:19:32And then, change its value.
    • 0:19:34So if I now do Make List and do dot /list, the output is still the same.
    • 0:19:39And, if I cross my fingers and run Valgrind again
    • 0:19:42after increasing my window size, hopefully here.
    • 0:19:46Oh, still a bug.
    • 0:19:48So better.
    • 0:19:49It seems like less memory is lost.
    • 0:19:52What have I now forgotten to do?
    • 0:19:54AUDIENCE: You forgot to free the end.
    • 0:19:56SPEAKER 1: I forgot to free it at the very end, too.
    • 0:19:58Because I still have a chunk of memory that I got from malloc.
    • 0:20:01So let me go to the very bottom of the program now.
    • 0:20:04And after I'm done senselessly just printing this thing out,
    • 0:20:09let me free the new list.
    • 0:20:12And now let me do Make List, dot/list.
    • 0:20:15It's still works, visually.
    • 0:20:17Now let's do Valgrind of dot/list, Enter.
    • 0:20:22And now, hopefully, all heap blocks were freed.
    • 0:20:25No leaks are possible.
    • 0:20:27So this is perhaps the best output you can see from a tool like Valgrind.
    • 0:20:30I used the heap, but I freed all the memory as well.
    • 0:20:32So there were 2 fixes needed there.
    • 0:20:34All right.
    • 0:20:35Any questions then on this array-based approach, the first of which
    • 0:20:38is statically allocating an array, so to speak.
    • 0:20:41By just hard coding the number 3.
    • 0:20:43The second version now is dynamically allocating the array,
    • 0:20:47using not the stack but the heap.
    • 0:20:49But, it too, suffers from the slowness we described earlier,
    • 0:20:52of having to copy all those values from one to the other.
    • 0:20:55OK.
    • 0:20:55A hand was over here.
    • 0:20:57AUDIENCE: Why do you not have to free the TMP?
    • 0:20:59SPEAKER 1: Good question.
    • 0:21:00Why did I not have to free the TMP?
    • 0:21:02I essentially did eventually.
    • 0:21:05Because TMP was pointing at the chunk of 4 integers.
    • 0:21:10But on line 33 here, I assigned list to be
    • 0:21:15identical to what TMP was pointing at.
    • 0:21:18And so, when I finally freed the list, that was the same thing as freeing TMP.
    • 0:21:23In fact, if I wanted to, I could say free TMP here and it would be the same.
    • 0:21:26But conceptually, it's wrong.
    • 0:21:28Because at this point in the story, I should be freeing the actual list, not
    • 0:21:32that temporary variable.
    • 0:21:33But they were the same at that point in the story.
    • 0:21:35Yeah.
    • 0:21:35AUDIENCE: Is [? the line ?] part of it?
    • 0:21:37SPEAKER 1: Good question.
    • 0:21:38And long story short, everything we're doing thus far
    • 0:21:41is still in the world of arrays.
    • 0:21:42The only distinction we're making is that
    • 0:21:44in version 1, when I said int list [3], that was an array of fixed size.
    • 0:21:51So-called statically allocated on the stack, as per last week.
    • 0:21:55This version now is still dealing with arrays, but I'm flexing my muscles
    • 0:21:58and using dynamic memory allocation.
    • 0:22:00So that I can still use an array per the first pictures
    • 0:22:03we started talking about.
    • 0:22:04But I can at least grow the array if I want.
    • 0:22:07So we haven't even now solved this, even better in a sense, with linked lists.
    • 0:22:10That's going to come next.
    • 0:22:12Yeah.
    • 0:22:12AUDIENCE: How are you able to free list and then still make list?
    • 0:22:16SPEAKER 1: How am I able to free list?
    • 0:22:19I freed the original address of list.
    • 0:22:24I, then, changed what list is storing.
    • 0:22:27I'm moving its arrow to a new chunk of memory.
    • 0:22:30And that is perfectly reasonable for me to now manipulate
    • 0:22:33because now list is pointing at the same value of TMP.
    • 0:22:37And TMP is what was given the return value of malloc, the second time.
    • 0:22:42So that chunk of memory is valid.
    • 0:22:44So these are just squares on the board, right.
    • 0:22:48There's just pointers inside of them.
    • 0:22:49So what I'm technically saying is, and I'm not
    • 0:22:51pointing I'm not freeing list per se, I am
    • 0:22:54freeing the chunk of memory that begins at the address currently in list.
    • 0:22:58Therefore, if a few lines later, I change what the address is in list.
    • 0:23:04Totally reasonable to then touch that memory, and eventually free it later.
    • 0:23:08Because you're not freeing the variable per se,
    • 0:23:10you're freeing the address in the variable.
    • 0:23:12Good distinction.
    • 0:23:13All right.
    • 0:23:14So let me back up here and now make one final edit.
    • 0:23:19So let's finish this with one final improvement here.
    • 0:23:24Because it turns out, there's a somewhat better way
    • 0:23:27to actually resize an array as we've been doing here.
    • 0:23:30And there's another function in stdlib that's called realloc, for re-allocate.
    • 0:23:35And I'm just going to go in and make a little bit of a change
    • 0:23:37here so that I can do the following.
    • 0:23:40Let me go ahead and first comment this now,
    • 0:23:42just so we can keep track of what's been going on this whole time.
    • 0:23:45So dynamically allocate an array of size 3.
    • 0:23:51Assign 3 numbers to that array.
    • 0:23:56Time passes.
    • 0:23:58Allocate new array of size 4.
    • 0:24:03Copy numbers from old array into new array.
    • 0:24:09And add fourth number to new array.
    • 0:24:14Free old array.
    • 0:24:18Remember, if you will, new array using my same list variable.
    • 0:24:24And now, print new array.
    • 0:24:28Free new array.
    • 0:24:31Hopefully, that helps.
    • 0:24:32And we'll post this code online after 2, which tells a more explicit story.
    • 0:24:35So it turns out that we can reduce some of the labor involved with this.
    • 0:24:39Not so much with the printing here, but with this copying.
    • 0:24:41Turns out c does have a function called realloc,
    • 0:24:44that can actually handle the resizing of an array for you, as follows.
    • 0:24:49I'm going to scroll up to where I previously
    • 0:24:51allocated a new array of size 4.
    • 0:24:54And I'm instead going to say this, resize old array to be of size 4.
    • 0:25:02Now, previously this wasn't necessarily possible.
    • 0:25:04Because recall that we had painted ourselves
    • 0:25:06into a corner with the example on the screen
    • 0:25:08where "Hello, world" happened to be right after the original array.
    • 0:25:10But let me do this.
    • 0:25:12Let me use realloc, for re-allocate.
    • 0:25:15And pass in not just the size of memory we want this time,
    • 0:25:18but also the address that we want to resize.
    • 0:25:22Which, again, is this array called list.
    • 0:25:25All right.
    • 0:25:26The code thereafter is pretty much the same.
    • 0:25:29But what I don't need to do is this.
    • 0:25:33So realloc is a pretty handy function that will do the following.
    • 0:25:36If at the very beginning of class, when we had 1, 2, 3 on the board.
    • 0:25:39And someone's instinct was to just plop the 4 right at the end of the list.
    • 0:25:43If there's available memory, realloc will just do that.
    • 0:25:45And boom, it will just grow the array for you in the computer's memory.
    • 0:25:50If, though, it realizes, sorry, there's already a string like "Hello, world"
    • 0:25:54or something else there, realloc will handle
    • 0:25:57the trouble of moving that whole array from 1 chunk of memory,
    • 0:26:00originally, to a new chunk of memory.
    • 0:26:03And then realloc will return to you, the address of that new chunk of memory.
    • 0:26:09And it will handle the process of freeing the old chunk for you.
    • 0:26:13So you do not need to do this yourself.
    • 0:26:15So in fact, let me go ahead and get rid of this as well.
    • 0:26:19So realloc just condenses, a lot of what we just did, into a single function.
    • 0:26:24Whereby, realloc handles it for you.
    • 0:26:28All right.
    • 0:26:28So that's the final improvement on this array-based approach.
    • 0:26:31So what now, knowing what your memory is,
    • 0:26:34what can we now do with it that solves that kind of problem?
    • 0:26:37Because the world is going to get really slow.
    • 0:26:39And our apps, and our phones, and our computers are getting really slow,
    • 0:26:42if we're just constantly wasting time moving things around in memory.
    • 0:26:46What could we perhaps do instead?
    • 0:26:48Well there's one new piece of syntax today
    • 0:26:50that builds on these 3 pieces of syntax from the past.
    • 0:26:53Recall, that we've looked at struct, which
    • 0:26:55is a keyword in C, that just lets you invent your own structure.
    • 0:26:58Your own variable, if you will, in conjunction with typedef.
    • 0:27:02Which lets you say a person has a name and a number, or something like that.
    • 0:27:06Or a candidate has a name and some number of votes.
    • 0:27:08You can encapsulate multiple pieces of data inside of just one using struct.
    • 0:27:13What did we use the Dot Notation for now, a couple of times?
    • 0:27:17What does the Dot operator do in C?
    • 0:27:20AUDIENCE: Access the structure.
    • 0:27:21SPEAKER 1: Perfect.
    • 0:27:22To access the field inside of a structure.
    • 0:27:24So if you've got a person with a name and a number,
    • 0:27:26you could say something like person.name or person.number,
    • 0:27:29if person is the name of one such variable.
    • 0:27:31Star, of course, we've seen now in a few ways.
    • 0:27:33Like way back in week 1, we saw it as like, multiplication.
    • 0:27:37Last week, we began to see it in the context of pointers,
    • 0:27:40whereby, you use it to declare a pointer.
    • 0:27:42Like, int* p, or something like that.
    • 0:27:45But we also saw it in one other context, which
    • 0:27:48was like the opposite, which was the dereference operator.
    • 0:27:51Which says if this is an address, that is
    • 0:27:53if this is a variable like a pointer, and you put a star in front of it
    • 0:27:56then with no int or no char, no data type in front of it.
    • 0:27:59That means go to that address.
    • 0:28:01And it dereferences the pointer and goes to that location.
    • 0:28:05So it turns out that using these 3 building blocks,
    • 0:28:07you can actually start to now use your computer's memory almost any way
    • 0:28:10you want.
    • 0:28:11And even next week, when we transition to Python,
    • 0:28:13and you start to get a lot of features for free.
    • 0:28:16Like a single line of code will just do so much
    • 0:28:18more in Python than it does in C. It boils down to those basic primitives.
    • 0:28:23And just so you've seen it already.
    • 0:28:25It turns out that it's so common in C to use this operator
    • 0:28:29to go inside of a structure and this operator to go to an address,
    • 0:28:33that there's shorthand notation for it, a.k.a.
    • 0:28:36syntactic sugar.
    • 0:28:37That literally looks like an arrow.
    • 0:28:39So recall last week, I was in the habit of pointing, even
    • 0:28:41with the big foam finger.
    • 0:28:42This arrow notation, a hyphen and an angled bracket,
    • 0:28:47denotes going to an address and looking at a field inside of it.
    • 0:28:53But we'll see this in practice in just a bit.
    • 0:28:56So what might be the solution, now, to this problem
    • 0:28:59we saw a moment ago whereby, we had painted ourselves into a corner.
    • 0:29:02And our memory, a few moments ago, looked like this.
    • 0:29:05We could just copy the whole existing array to a new location, add the 4,
    • 0:29:10and go about our business.
    • 0:29:12What would another, perhaps better solution longer term
    • 0:29:15be, that doesn't require constantly moving stuff around?
    • 0:29:21Maybe hang in there for your instincts if you
    • 0:29:23know the buzz phrase we're looking for from past experience, hang in there.
    • 0:29:27But if we want to avoid moving the 1, 2, and the 3,
    • 0:29:29but we still want to be able to add endless amounts of data.
    • 0:29:32What could we do?
    • 0:29:33Yeah.
    • 0:29:34So maybe create some kind of list using pointers that
    • 0:29:37just point at a new location, right.
    • 0:29:39In an ideal world, even though this piece of memory
    • 0:29:42is being used by this h in the string "Hello, world",
    • 0:29:45maybe we could somehow use a pointer from last week.
    • 0:29:47Like an arrow, that says after the 3, oh I don't know, go down over here
    • 0:29:52to this location in memory.
    • 0:29:54And you just stitch together these integers in memory
    • 0:29:58so that each one leads to the next.
    • 0:30:00It's not necessarily the case that it's literally back-to-back.
    • 0:30:03That would have the downside, it would seem,
    • 0:30:05of costing us a little bit of space.
    • 0:30:07Like a pointer, which recall, takes up some amount of space.
    • 0:30:10Typically 8 bytes or 64 bits.
    • 0:30:12But I don't have to copy potentially a huge amount of data just
    • 0:30:16to add one more number.
    • 0:30:17And so these things do have a name.
    • 0:30:19And indeed, these things are what generally
    • 0:30:21would be called a linked list.
    • 0:30:24A linked list captures exactly that intuition
    • 0:30:27of linking together things in memory.
    • 0:30:29So let's take a look at an example.
    • 0:30:30Here's a computer's memory in the abstract.
    • 0:30:32Suppose that I'm trying to create an array.
    • 0:30:35Let's generalize it as a list, now, of numbers.
    • 0:30:38An array has a very specific meaning.
    • 0:30:39It's memory that's contiguous, back, to back, to back.
    • 0:30:42At the end of the day, I as the programmer, just care about the data--
    • 0:30:461, 2, 3, 4, and so forth.
    • 0:30:48I don't really care how it's stored.
    • 0:30:52I don't care how it's stored when I'm writing the code,
    • 0:30:54I just wanted to work at the end of the day.
    • 0:30:56So suppose that I first insert my number 1.
    • 0:30:58And, who knows, it ends up, up there at location, 0X123,
    • 0:31:02for the sake of discussion.
    • 0:31:03All right.
    • 0:31:03Maybe there's something already here.
    • 0:31:06And heck, maybe there's something already here,
    • 0:31:08but there's plenty of other options for where this thing can go.
    • 0:31:11And suppose that, for the sake of discussion,
    • 0:31:12the first available spot for the next number
    • 0:31:14happens to be over here at location 0X456, for the sake of discussion.
    • 0:31:20So that's where I'm going to plop the number 2.
    • 0:31:22And where might the number 3 end up?
    • 0:31:24Oh I don't know, maybe down over there at 0X789.
    • 0:31:26The point being, I don't know what is, or really care about,
    • 0:31:31everything else that's in the computer's memory.
    • 0:31:33I just care that there are at least 3 locations available where
    • 0:31:37I can put my 1, my 2, and my 3.
    • 0:31:40But the catch is, now that we're not using an array,
    • 0:31:44we can't just naively assume that you just add 1 to an index and boom,
    • 0:31:48you're at the next number.
    • 0:31:49Add 2 to an index, and boom you're at the next, next number.
    • 0:31:52Now you have to leave these little breadcrumbs, or use the arrow notation,
    • 0:31:57to lead from one to the other.
    • 0:31:59And sometimes, it might be close, a few bytes away.
    • 0:32:01Maybe, it's a whole gigabyte away in an even bigger computer's memory.
    • 0:32:05So how might I do this?
    • 0:32:07Like where do these pointers go, as you proposed?
    • 0:32:12All right.
    • 0:32:13All I have access to here are bytes.
    • 0:32:15I've already stored the 1, the 2, and the 3.
    • 0:32:17So what more should I do?
    • 0:32:19OK, yeah.
    • 0:32:20So let me, you put the pointers right next to these numbers.
    • 0:32:23So let me at least plan ahead, so that when I ask the computer like malloc,
    • 0:32:27recall from last week, for some memory, I don't just ask it now
    • 0:32:30for space for just the number.
    • 0:32:32Let me start getting into the habit of asking
    • 0:32:34malloc for enough space for the number and a pointer to another such number.
    • 0:32:39So it's a little more aggressive of me to ask for more memory.
    • 0:32:42But I'm planning ahead.
    • 0:32:43And here is an example of a trade off.
    • 0:32:45Almost any time in CS, when you start using more space, you can save time.
    • 0:32:48Or if you try to conserve space, you might have to lose time.
    • 0:32:53It's being that trade off there.
    • 0:32:54So how might I solve this?
    • 0:32:56Well let me abstract this away.
    • 0:32:58And either next to or below, I'm just drawing it vertically, just
    • 0:33:01for the sake of discussion.
    • 0:33:02So the arrows are a bit prettier.
    • 0:33:04I've asked malloc for now twice as much space,
    • 0:33:07it would seem, than I previously needed.
    • 0:33:09But I'm going to use this second chunk of memory to refer to the next number.
    • 0:33:13And I'm going to use this chunk of memory to refer to the next,
    • 0:33:16essentially, stitching this thing together.
    • 0:33:17So what should go in this first box?
    • 0:33:20Well, I claim the number, 0X456.
    • 0:33:23And it's written in hex because it represents a memory address.
    • 0:33:26But this is the equivalent of drawing an arrow from one to the other.
    • 0:33:30As a little check here, what should go in this second box
    • 0:33:34if the goal is to stitch these together in order 1, 2, 3?
    • 0:33:37Feel free to just shout this out.
    • 0:33:40AUDIENCE: 0X789.
    • 0:33:41SPEAKER 1: OK, that worked well.
    • 0:33:42So 0X789, indeed.
    • 0:33:43And you can't do that with the hands because I can't count that fast.
    • 0:33:46So 0X789 should go here because that's like a little breadcrumb to the next.
    • 0:33:51And then, we don't really have terribly many possibilities here.
    • 0:33:54This has to have a value, right.
    • 0:33:56Because at the end of the day, it's got to use its 64 bits in some way.
    • 0:34:01So what value should go here, if this is the end of this list?
    • 0:34:05AUDIENCE: 0.
    • 0:34:06SPEAKER 1: So it could be 0X123.
    • 0:34:08The implication being that it would be a cyclical list.
    • 0:34:12Which is OK, but potentially problematic.
    • 0:34:14If any of you have accidentally lost control over your code space
    • 0:34:18because you had an infinite loop, this would seem a very easy way
    • 0:34:21to give yourself the accidental probability of an infinite loop.
    • 0:34:26What might be simpler than that and ward that off?
    • 0:34:28AUDIENCE: Null.
    • 0:34:29SPEAKER 1: Say again?
    • 0:34:30AUDIENCE: Null.
    • 0:34:31SPEAKER 1: So just the null character.
    • 0:34:32Not N-U-L, confusingly, which is at the end of strings.
    • 0:34:35But N-U-L-L, as we introduced it last week.
    • 0:34:38Which is the same as 0x0.
    • 0:34:40So this is just a special value that programmers decades ago
    • 0:34:43decided that if you store the address 0, that's not a valid address.
    • 0:34:47There's never going to be anything useful at 0x0.
    • 0:34:50Therefore, it's a sentinel value, just a special value,
    • 0:34:53that indicates that's it.
    • 0:34:54There's nowhere further to go.
    • 0:34:56It's OK to come back to your suggestion of making a cyclical list.
    • 0:35:00But we'd better be smart enough to, maybe,
    • 0:35:02remember where did the list start so that you can detect cycles.
    • 0:35:06If you start looping around in this structure, otherwise.
    • 0:35:08All right.
    • 0:35:09But these addresses, who really cares at the end of the day
    • 0:35:11if we abstract this away.
    • 0:35:12It really just now looks like this.
    • 0:35:14And indeed, this is how most anyone would draw this on a whiteboard
    • 0:35:17if having a discussion at work.
    • 0:35:19Talking about what data structure we should
    • 0:35:20use to solve some problem in the real world.
    • 0:35:22We don't care generally about the addresses.
    • 0:35:25We care that in code we can access them.
    • 0:35:27But in terms of the concept alone this would be, perhaps,
    • 0:35:30the right way to think about this.
    • 0:35:32All right, let me pause here and see if there's
    • 0:35:34any questions on this idea of creating a linked list in memory by just storing,
    • 0:35:38not just the numbers like 1, 2, 3, but twice as much data.
    • 0:35:42So that you have little breadcrumbs in the form of pointers
    • 0:35:45that can lead you from one to the next.
    • 0:35:48Any questions on these linked lists?
    • 0:35:54Any questions?
    • 0:35:54No?
    • 0:35:55All right.
    • 0:35:55Oh, yeah.
    • 0:35:56Over here.
    • 0:35:57AUDIENCE: So does this takes time more memory than an array?
    • 0:36:02SPEAKER 1: This does take more memory than an array
    • 0:36:04because I now need space for these pointers.
    • 0:36:06And to be clear, I technically didn't really draw this to scale.
    • 0:36:10Thus far, in the class, we've generally thought about integers
    • 0:36:13like, 1, 2 and 3, as being 4 bytes, or 32 bits.
    • 0:36:16I made the claim last week that on modern computer's pointers
    • 0:36:19tend to be 8 bytes or 64 bits.
    • 0:36:22So, technically, this box should actually be a little bigger.
    • 0:36:25It was just going to look a little stupid in the picture.
    • 0:36:26So I abstracted it away.
    • 0:36:28But, indeed, you're using more space as a result.
    • 0:36:31AUDIENCE: [INAUDIBLE].
    • 0:36:32SPEAKER 1: Oh, how does-- sorry.
    • 0:36:34How does the computer identify useful data from used data?
    • 0:36:37So, for instance, garbage values or non-garbage values.
    • 0:36:40For now, think of that as the job of malloc.
    • 0:36:43So when you ask malloc for memory, as we started to last week,
    • 0:36:46malloc keeps track of the addresses of the memory
    • 0:36:49it has handed to as valid values.
    • 0:36:52The other type of memory you use, not just from the heap.
    • 0:36:55Because recall we briefly discussed that malloc uses space
    • 0:36:58from the heap, which was drawn at the top of the picture, pointing down.
    • 0:37:01There's also stack memory, which is where all of your local variables go.
    • 0:37:05And where all of the memory used by individual functions go.
    • 0:37:07And that was drawn in the picture is working its way up.
    • 0:37:10That's just an artist's rendition of direction.
    • 0:37:12The compiler, essentially, will also help
    • 0:37:16keep track of which values are valid or not inside of the stack.
    • 0:37:19Or really the underlying code that you've written
    • 0:37:21will keep track of that for you.
    • 0:37:23So it's managed for you at that point.
    • 0:37:26All right.
    • 0:37:26Good question.
    • 0:37:27Sorry it took me a bit to catch on.
    • 0:37:29So let's now translate this to actual code.
    • 0:37:31How could we implement this idea of, let's call these things nodes.
    • 0:37:34And that's a term of our NCS.
    • 0:37:36Whenever you have some data structure that encapsulates information, node,
    • 0:37:40N-O-D-E, is the generic term for that.
    • 0:37:42So each of these might be said to be a node.
    • 0:37:44Well, how can we do this?
    • 0:37:45Well a couple of weeks ago, we saw how we could represent something
    • 0:37:48like a student or a candidate.
    • 0:37:50And a student, or rather a person, we said has a name and a number.
    • 0:37:54And we used a few pieces of syntax here.
    • 0:37:56One, we use the struct keyword, which gives us a data structure.
    • 0:37:59We use typedef, which defines the name person to be our new data
    • 0:38:04type representing that whole structure.
    • 0:38:06So we probably have the right ingredients here
    • 0:38:08to build up this thing called a node.
    • 0:38:11And just to be clear, what should go inside of one of these nodes,
    • 0:38:14do we think?
    • 0:38:15It's not going to be a name or a number, obviously.
    • 0:38:17But what should a node have in terms of those fields, perhaps?
    • 0:38:22Yeah?
    • 0:38:22AUDIENCE: [? Data. ?]
    • 0:38:23SPEAKER 1: So a number like a number and a pointer in some form.
    • 0:38:26So let's translate this to actual code.
    • 0:38:28So let's rename person to node to capture this notion here.
    • 0:38:33And the number is easy.
    • 0:38:34If it's just going to be an int, that's fine.
    • 0:38:36We can just say int number, or int n, or whatever
    • 0:38:38you want to call that particular field.
    • 0:38:41The next one is a little non-obvious.
    • 0:38:43And this is where things get a little weird at first,
    • 0:38:45but, in retrospect, it should all fit together.
    • 0:38:47Let me propose that, ideally, we would say something like node* next.
    • 0:38:53And I could call the word next anything I want.
    • 0:38:55Next just means what comes after me is the notion I'm using it at.
    • 0:39:00So a lot of CS people would just use next to represent
    • 0:39:02the name of this pointer.
    • 0:39:03But there's a catch here.
    • 0:39:05C and C compilers are pretty naive, recall.
    • 0:39:08They only look at code top to bottom, left to right.
    • 0:39:11And any time they encounter a word they have never
    • 0:39:13seen before, bad things happen.
    • 0:39:15Like, you can't compile your code.
    • 0:39:16You get some cryptic error message or the like.
    • 0:39:18And that seems to be about to happen here.
    • 0:39:21Because if the compiler is reading this code from top to bottom,
    • 0:39:24it's going to say, oh, inside of this struct
    • 0:39:27should be a variable called next.
    • 0:39:29Which is of type node*.
    • 0:39:31What the heck is a node?
    • 0:39:32Because it literally does not find out until 2 lines
    • 0:39:35later, after that semicolon.
    • 0:39:37So the way to avoid this, which we haven't quite seen before,
    • 0:39:40is that you can temporarily name this whole thing up here, struct node.
    • 0:39:45And then, down here inside of the data structure, you say struct node*.
    • 0:39:50And then, you leave the rest alone.
    • 0:39:52This is a workaround this is possible because now you're
    • 0:39:56teaching the compiler, from the first line, that here comes
    • 0:39:59a data structure called struct node.
    • 0:40:01Down here, you're shortening the name of this whole thing to just node.
    • 0:40:05Why?
    • 0:40:05It's just a little more convenient than having to write struct everywhere.
    • 0:40:09But you do have to write struct node* inside of the data structure.
    • 0:40:12But that's OK because it's already come into existence
    • 0:40:15now, as of that first line of code.
    • 0:40:17So that's the only fundamental difference
    • 0:40:19between what we did last week with a person or a candidate.
    • 0:40:22We just now have to use this struct workaround, syntactically.
    • 0:40:27All right.
    • 0:40:28Yeah, question.
    • 0:40:29AUDIENCE: So [INAUDIBLE] have like right next to the [INAUDIBLE] point
    • 0:40:33to another [INAUDIBLE].
    • 0:40:33SPEAKER 1: Why is the next variable a struct node* pointer and not an int
    • 0:40:39star pointer, for instance?
    • 0:40:41So think about the picture we are trying to draw.
    • 0:40:43Technically, yes, each of these arrows I deliberately drew
    • 0:40:47is pointing at the number.
    • 0:40:49But that's not alone.
    • 0:40:50They need to point at the whole data structure in memory.
    • 0:40:53Because the computer, ultimately, and the compiler,
    • 0:40:55in turn, needs to know that this chunk of memory is not just an int.
    • 0:40:59It is a whole node.
    • 0:41:01Inside of a node is a number and also another pointer.
    • 0:41:04So when you draw these arrows, it would be
    • 0:41:06incorrect to point at just the number.
    • 0:41:09Because that throws away information that
    • 0:41:11would leave the compiler wondering, OK, I'm at a number.
    • 0:41:14Where the heck is the pointer?
    • 0:41:15You have to tell it that it's pointing at a whole node
    • 0:41:17so it knows a few bytes away is that corresponding pointer.
    • 0:41:20Good question.
    • 0:41:21Yeah.
    • 0:41:23AUDIENCE: How do you [INAUDIBLE].
    • 0:41:24SPEAKER 1: Really good question.
    • 0:41:25It would seem that just as copying the array earlier
    • 0:41:29required twice as much memory, because we copied from old to new.
    • 0:41:32So, technically, twice as much plus 1 for the new number.
    • 0:41:35Here, too, it looks like we're using twice as much memory, also.
    • 0:41:38And to my comment earlier, it's even more than twice as much memory
    • 0:41:41because these pointers are 8 bytes, and not just 4 bytes like a typical integer
    • 0:41:45is.
    • 0:41:45The differences are these.
    • 0:41:47In the context of the array, you were using that memory temporarily.
    • 0:41:50So, yes, you needed twice as much memory.
    • 0:41:52But then you were quickly freeing the original array.
    • 0:41:55So you weren't consuming long-term, more memory than you might need.
    • 0:41:58The difference here, too, is that, as we'll see in a moment,
    • 0:42:02it turns out it's going to be relatively quick for me, potentially,
    • 0:42:05to insert new numbers in here.
    • 0:42:07Because I'm not going to have to do a huge amount of copying.
    • 0:42:10And even though I might still have to follow all of these arrows, which
    • 0:42:13is going to take some amount of time, I'm
    • 0:42:16not going to have to be asking for more memory, freeing more memory.
    • 0:42:19And certain operations in the computer, anything involving asking for or giving
    • 0:42:23back memory, tends to be slower.
    • 0:42:25So we get to avoid that situation as well.
    • 0:42:26There's going to be some downsides, though.
    • 0:42:28This is not all upside.
    • 0:42:29But we'll see in a bit just what some of those trade offs actually are.
    • 0:42:33All right.
    • 0:42:34So from here, if we go back to the structure in code as we left it,
    • 0:42:38let's start to now build up a linked list with some actual code.
    • 0:42:41How do you go about, in C, representing a linked list in code?
    • 0:42:46Well, at the moment, it would actually be as simple as this.
    • 0:42:48You declare a variable, called list, for instance.
    • 0:42:51That itself stores the address of a node.
    • 0:42:54That's what node* means.
    • 0:42:56The address of a node.
    • 0:42:57So if you want to store a linked list in memory,
    • 0:42:59you just create a variable called list, or whatever else.
    • 0:43:02And you just say that this variable is going
    • 0:43:04to be pointing at the first node in a list, wherever it happens to end up.
    • 0:43:08Because malloc is ultimately going to be the tool that we use just to go
    • 0:43:12get at any one particular node in memory.
    • 0:43:16All right.
    • 0:43:16So let's actually do this in pictorial form.
    • 0:43:18When you write a line of code, like I just did here--
    • 0:43:21and I do not initialize it to anything with the assignment operator,
    • 0:43:25an equal sign.
    • 0:43:26It does exist in memory as a box, as I'll draw it here, called list.
    • 0:43:30But I've deliberately drawn Oscar inside of it.
    • 0:43:33Why?
    • 0:43:33To connote what exactly?
    • 0:43:35AUDIENCE: Garbage value.
    • 0:43:36SPEAKER 1: It's a garbage value.
    • 0:43:37I have been allocated the variable in memory, called list.
    • 0:43:42Which is going to give me 64 bits or 8 bytes somewhere drawn here
    • 0:43:46with this box.
    • 0:43:47But if I myself have not used the assignment operator,
    • 0:43:50it's not going to get magically initialized to any particular address
    • 0:43:53for me.
    • 0:43:54It's not going to even give me a node.
    • 0:43:56This is literally just going to be an address of a future node that exists.
    • 0:44:01So what would be a solution here?
    • 0:44:02Suppose that I'm beginning to create my linked list,
    • 0:44:05but I don't have any nodes yet.
    • 0:44:07What would be a sensible thing to initialize the list to, perhaps?
    • 0:44:11AUDIENCE: Null.
    • 0:44:12SPEAKER 1: Yeah, again.
    • 0:44:13AUDIENCE: To null.
    • 0:44:13SPEAKER 1: So just null, right.
    • 0:44:15When in doubt with pointers, generally it's
    • 0:44:16a good thing to initialize things to null,
    • 0:44:18so at least it's not a garbage value.
    • 0:44:20It's a known value.
    • 0:44:21Invalid, yes.
    • 0:44:22But it's a special value you can then check
    • 0:44:24for with a conditional, or the like.
    • 0:44:26So this might be a better way to create a linked list,
    • 0:44:30even before you've inserted any numbers into the thing itself.
    • 0:44:34All right.
    • 0:44:34So after that, how can we go about adding something to this linked list?
    • 0:44:37So now the story looks like this.
    • 0:44:39Oscar is gone because inside of this box is all zero bits.
    • 0:44:42Just because it's nice and clean, and this represents an empty linked list.
    • 0:44:46Well, if I want to add the number 1 to this linked list, what could I do?
    • 0:44:50Well, perhaps I could start with code like this.
    • 0:44:52Borrowing inspiration from last week.
    • 0:44:54Let's ask malloc for enough space for the size of a node.
    • 0:44:58And this gets to your question earlier, like, what is it I'm manipulating here?
    • 0:45:03I don't just need space for an int and I don't just need space for a pointer.
    • 0:45:06I need space for both.
    • 0:45:07And I gave that thing a name, node.
    • 0:45:10So size of node figures out and does the arithmetic for me.
    • 0:45:12And gives me back the right number of bytes.
    • 0:45:15This, then, stores the address of that chunk of memory
    • 0:45:18in what I'll temporarily called n.
    • 0:45:20Just to represent a generic new node.
    • 0:45:23And it's of type node*.
    • 0:45:24Because just like last week when I asked malloc for enough space for an int
    • 0:45:28and I stored it in an int* pointer.
    • 0:45:30This week, if I'm asking for memory for a node,
    • 0:45:32I'm storing it in a node* pointer.
    • 0:45:35So technically, nothing new there except for this new term
    • 0:45:38of art in data structure called node.
    • 0:45:41All right.
    • 0:45:41So what does that do for me?
    • 0:45:42It essentially draws a picture like this in memory.
    • 0:45:45I still have my list variable from my previous line of code initialize
    • 0:45:49to null.
    • 0:45:50And that's why I've drawn it blank.
    • 0:45:51I also now have a temporary variable called
    • 0:45:54n, which I initialize to the return value of malloc.
    • 0:45:57Which gave me one of these nodes in memory.
    • 0:45:59But I've drawn it having garbage values, too,
    • 0:46:02because I don't know what int is there.
    • 0:46:03I don't know what pointer is there.
    • 0:46:05It's garbage values because malloc does not magically initialize memory for me.
    • 0:46:09There is another function for that.
    • 0:46:11But malloc alone just says, sure, use this chunk of memory.
    • 0:46:14Deal with whatever is there.
    • 0:46:15So how can I go about initializing this to known values?
    • 0:46:18Well, suppose I want to insert the number 1 and then, leave it at that.
    • 0:46:23A list of size 1, I could do something like this.
    • 0:46:27And this is where you have to think back to some of these basics.
    • 0:46:29My conditional here is asking the question if n does not equal null.
    • 0:46:34So that is, if malloc gave me valid memory,
    • 0:46:37and I don't have to quit altogether because my computer's out of memory.
    • 0:46:40If n does not equal null, but is equal to valid address,
    • 0:46:44I'm going to go ahead and do this.
    • 0:46:46And this is cryptic looking syntax now.
    • 0:46:48But does someone want to take a stab at translating this inside line of code
    • 0:46:52to English, in some sense?
    • 0:46:56How might you explain what that inner line of code is doing? *n.
    • 0:47:00number equals 1.
    • 0:47:03Let me go further back.
    • 0:47:05Nope?
    • 0:47:06OK, over here.
    • 0:47:07Yeah.
    • 0:47:07AUDIENCE: [INAUDIBLE].
    • 0:47:09SPEAKER 1: Perfect.
    • 0:47:09The place that n is pointing to, set it equal to 1.
    • 0:47:12Or using the vernacular of going there, go to the address in n
    • 0:47:16and set it's number field to 1.
    • 0:47:18However you want to think about it, that's fine.
    • 0:47:20But the * again is the dereference operator here.
    • 0:47:22And we're doing the parentheses, which we
    • 0:47:24haven't needed to do before because we haven't dealt with pointers and data
    • 0:47:28structures together until today.
    • 0:47:30This just means go there first.
    • 0:47:32And then once you're there, go access number.
    • 0:47:34You don't want to do one thing before the other.
    • 0:47:36So this is just enforcing order of operations.
    • 0:47:38The parentheses just like in grade school math.
    • 0:47:41All right.
    • 0:47:41So this line of code is cryptic.
    • 0:47:43It's ugly.
    • 0:47:43It's not something most people easily remember.
    • 0:47:45Thankfully, there's that syntactic sugar that simplifies this line of code
    • 0:47:49to just this.
    • 0:47:50And this, even though it's new to you today,
    • 0:47:52should eventually feel a little more familiar.
    • 0:47:54Because this now is shorthand notation for saying, start at n.
    • 0:47:58Go there as by following the arrow.
    • 0:48:00And when you get there, change the number field.
    • 0:48:02In this case, to 1.
    • 0:48:04So most people would not write code like this.
    • 0:48:07It's just ugly.
    • 0:48:08It's a couple extra keystrokes.
    • 0:48:09This just looks more like the artist's renditions we've been talking about.
    • 0:48:13And how most CS people would think about pointers as really just being arrows
    • 0:48:17in some form.
    • 0:48:18All right.
    • 0:48:19So what have we just done?
    • 0:48:20The picture now, after setting number to 1, looks a little something like this.
    • 0:48:24So there's still one step missing.
    • 0:48:26And that's, of course, to initialize, it would seem,
    • 0:48:28the pointer in this new node to something known like null.
    • 0:48:33So I bet we could do this like this.
    • 0:48:34With a different line of code, I'm just going
    • 0:48:36to say if n does not equal null, then set n's next field to null.
    • 0:48:42Or more pedantically, go to n, follow the arrow,
    • 0:48:46and then update the next field that you find there to equal null.
    • 0:48:50And again, this is just doing some nice bookkeeping.
    • 0:48:52Technically speaking, we might not need to set
    • 0:48:55this to null if we're going to keep adding more and more numbers to it.
    • 0:48:58But I'm doing it step-by-step so that I have a very clean picture.
    • 0:49:02And there's no bugs in my code at this point.
    • 0:49:05But I'm still not done.
    • 0:49:07There's one last thing I'm going to have to do here.
    • 0:49:09If the goal, ultimately, was to insert the number 1 into my linked list,
    • 0:49:14what's the last step I should, perhaps, do here?
    • 0:49:18Just been English is fine.
    • 0:49:20Yeah.
    • 0:49:20AUDIENCE: Set the pointer value to null.
    • 0:49:23SPEAKER 1: Yes.
    • 0:49:24I now need to update the actual variable, that represents my linked
    • 0:49:27list, to point at this brand new node.
    • 0:49:31That is now perfectly initialized as having an integer and a null pointer.
    • 0:49:35Yeah, technically, this is already pointing there.
    • 0:49:37But I describe this deliberately earlier as being temporary.
    • 0:49:40I just needed this to get it back from malloc and clean things up, initially.
    • 0:49:44This is the long term variable I care about.
    • 0:49:47So I'm going to want to do something simple like this.
    • 0:49:49List equals n.
    • 0:49:51And this seems a little weird that list equals n.
    • 0:49:53But again, think about what's inside this box.
    • 0:49:55At the moment this is null because there is no linked
    • 0:49:57list at the beginning of our story.
    • 0:49:59N is the address of the beginning, and it turns out, end of our linked list.
    • 0:50:03So it stands to reason that if you set list equal to n,
    • 0:50:07that has the effect of copying this address up here.
    • 0:50:10Or really just copying the arrow into that same location
    • 0:50:13so that now the picture looks like this.
    • 0:50:14And heck, if this was a temporary variable, it will eventually go away.
    • 0:50:18And now, this is the picture.
    • 0:50:19So an annoying number of steps, certainly,
    • 0:50:22to walk through verbally like this.
    • 0:50:24But it's just malloc to give yourself a node,
    • 0:50:26initialize the 2 fields inside of it, update the linked list, and boom,
    • 0:50:31you're on your way.
    • 0:50:32I didn't have to copy anything.
    • 0:50:34I just had to insert something in this case.
    • 0:50:38Let me pause here to see if there's any questions on those steps.
    • 0:50:40And we'll see before long it all in context with some larger code.
    • 0:50:44AUDIENCE: So if the statements [INAUDIBLE]..
    • 0:50:48SPEAKER 1: Yes.
    • 0:50:49I drew them separately just for the sake of the voiceover
    • 0:50:53of doing each thing very methodically.
    • 0:50:55In real code, as we'll transition to now,
    • 0:50:57I could have and should have just done it
    • 0:50:59all inside of one conditional after checking if n is not equal to null.
    • 0:51:03I could set number to a value like 1.
    • 0:51:05And I could set the pointer itself to something like null.
    • 0:51:08All right.
    • 0:51:09Well let's translate, then, this into some similar code
    • 0:51:12that allows us to build up a linked list now using code similar in spirit
    • 0:51:17to before.
    • 0:51:18But now, using this new primitive.
    • 0:51:19So I'm going to go back into VS Code here.
    • 0:51:22I'm going to go ahead now and delete the entirety of this old version that
    • 0:51:25was entirely array-based.
    • 0:51:27And now, inside of my main function, I'm going to go ahead and first do this.
    • 0:51:32I'm going to first give myself a list of size 0.
    • 0:51:36And I'm going to call that node* list.
    • 0:51:38And I'm going to initialize that to null, as we proposed earlier.
    • 0:51:41But I'm also now going to have to take the additional step of defining
    • 0:51:44what this node is.
    • 0:51:45So recall that I might do something like typedef, struct node.
    • 0:51:49Inside of this struct node, I'm going to have a number, which
    • 0:51:52I'll call number of type int.
    • 0:51:54And I'm going to have a structure called node
    • 0:51:56with a * that says the next pointer is called next.
    • 0:51:59And I'm going to call this whole thing, more succinctly, node,
    • 0:52:03instead of struct node.
    • 0:52:04Now as an aside, for those of you wondering what the difference really
    • 0:52:07is between struct and node.
    • 0:52:09Technically, I could do something like this.
    • 0:52:12Not use typedef and not use the word node alone.
    • 0:52:15This syntax here would actually create for me a new data
    • 0:52:19type called, verbosely, struct node.
    • 0:52:22And I could use this throughout my code saying struct node.
    • 0:52:25Struct node.
    • 0:52:26That just gets a little tedious.
    • 0:52:27And it would be nicer just to refer to this thing more simplistically
    • 0:52:30as a node.
    • 0:52:31So what typedef has been doing for us is it,
    • 0:52:34again, lets us invent our own word that's even more succinct.
    • 0:52:37And this just has the effect now of calling this whole thing
    • 0:52:41node without the need, subsequently, to keep saying struct all over the place.
    • 0:52:44Just FYI.
    • 0:52:46All right.
    • 0:52:46So now that this thing exists in main, let's go ahead and do this.
    • 0:52:50Let's add a number to list.
    • 0:52:52And to do this, I'm going to give myself a temporary variable.
    • 0:52:55I'll call it n for consistency.
    • 0:52:57I'm going to use malloc to give myself the size of a node,
    • 0:53:00just like in our slides.
    • 0:53:02And then, I'm going to do a little safety check.
    • 0:53:03If n equals equals null, I'm going to do the opposite of the slides.
    • 0:53:06I'm just going to quit out of this program
    • 0:53:08because there's nothing useful to be done at this point.
    • 0:53:10But most likely my computer is not going to run out of memory.
    • 0:53:13So I'm going to assume we can keep going with some of the logic here.
    • 0:53:16If n does not equal null, and that is it's a valid memory address,
    • 0:53:21I'm going to say n []--
    • 0:53:23I'm going to build this up backwards.
    • 0:53:24Well let's do.
    • 0:53:26That's OK, let's go ahead and do this.
    • 0:53:28N [number] equals 1.
    • 0:53:30And then n [arrow next] equals null.
    • 0:53:35And now, update list to point to new node, list equals n.
    • 0:53:42So at this point in the story, we've essentially
    • 0:53:44constructed what was that first picture, which looks like this.
    • 0:53:49This is the corresponding code via which we built up this node in memory.
    • 0:53:53Suppose now, we want to add the number 2 to the list.
    • 0:53:56So let's do this again.
    • 0:53:58Add a number to list.
    • 0:54:02How might I do this?
    • 0:54:03Well, I don't need to redeclare n because I can use
    • 0:54:06the same temporary variables before.
    • 0:54:08So this time, I'm just going to say n equals malloc and the size of a node.
    • 0:54:13I'm, again, going to have my safety check.
    • 0:54:15So if n equals equals null, then let's just quit out of this altogether.
    • 0:54:19But, I have to be a little more careful now.
    • 0:54:23Technically speaking, what do I still need
    • 0:54:26to do before I quit out of my program to be really proper?
    • 0:54:30Free the memory that did succeed a little higher up.
    • 0:54:33So I think it suffices to free what is now called list, way at the top.
    • 0:54:39All right.
    • 0:54:39Now, if all was well, though, let's go ahead and say n [number] equals 2.
    • 0:54:46And now, n [arrow next] equals null.
    • 0:54:51And now, let's go ahead and add it to the list.
    • 0:54:54If I go ahead and do list arrow next equals n,
    • 0:55:02I think what we've just done is build up the equivalent, now,
    • 0:55:06of this in the computer's memory.
    • 0:55:09By going to the list field's next field, which
    • 0:55:12is synonymous with the 1 nodes, bottom-most box.
    • 0:55:16And store the address of what was n, which a moment ago looked like this.
    • 0:55:19And I'm just throwing away, in the picture, the temporary variable.
    • 0:55:22All right.
    • 0:55:22One last thing to do.
    • 0:55:24Let me go down here and say, add a number to list, n equals malloc.
    • 0:55:30Let's do it one more time.
    • 0:55:31Size of node.
    • 0:55:32And clearly, in a real program, we might want to start using a loop.
    • 0:55:35And do this dynamically or a function because it's a lot of repetition now.
    • 0:55:39But just to go through the syntax here, this is fine.
    • 0:55:42If n equals equals null, out of memory for some reason.
    • 0:55:45Let's return 1, but we should free the list itself
    • 0:55:51and even the second node, list [next].
    • 0:55:55But I've deliberately done this poorly.
    • 0:55:58All right.
    • 0:55:59This is a little more subtle now.
    • 0:56:01And let me get rid of the highlighting just so it's a little more visible.
    • 0:56:04If n happens to equal equal null, and something really just
    • 0:56:08went wrong they're out of memory, why am I freeing 2 addresses now?
    • 0:56:15And again, it's not that I'm freeing those variables per se.
    • 0:56:17I'm freeing the addresses at in those variables.
    • 0:56:21But there's also a bug with my code here.
    • 0:56:23And it's subtle.
    • 0:56:26Let me ask more pointedly.
    • 0:56:27This line here, 43, what is that freeing specifically?
    • 0:56:31Can I go to you?
    • 0:56:32AUDIENCE: You're freeing list 2 times.
    • 0:56:34SPEAKER 1: I'm freeing, not so.
    • 0:56:36That's OK.
    • 0:56:37I'm not freeing list 2 times.
    • 0:56:38Technically, I'm freeing list once and list next once.
    • 0:56:41But let me just ask the more explicit question.
    • 0:56:43What am I freeing with line 43 at the moment?
    • 0:56:46Which node?
    • 0:56:49I think node number 1.
    • 0:56:50Why?
    • 0:56:51Because if 1 is at the beginning of the list,
    • 0:56:53list contains the address of that number 1 node.
    • 0:56:56And so this frees that node.
    • 0:56:58This line of code, you might think now intuitively, OK,
    • 0:57:01it's probably freeing the node number 2.
    • 0:57:03But this is bad.
    • 0:57:04And this is subtle.
    • 0:57:05Valgrind might help you catch this.
    • 0:57:07But by eyeing it, it's not necessarily obvious.
    • 0:57:09You should never touch memory that you have already freed.
    • 0:57:13And so, the fact that I did in this order, very bad.
    • 0:57:16Because I'm telling the operating system, I don't know.
    • 0:57:19I don't need the list address anymore.
    • 0:57:22Do with it what you want.
    • 0:57:23And then, literally one line later, you're saying, wait a minute.
    • 0:57:25Let me actually go to that address for a moment
    • 0:57:27and look at the next field of that first node.
    • 0:57:30It's too late.
    • 0:57:31You've already given up control over the node.
    • 0:57:33So it's an easy fix in this case, logically.
    • 0:57:36But we should be freeing the second node first
    • 0:57:39and then the first one so that we're doing it
    • 0:57:43in, essentially, reverse order.
    • 0:57:45And again, Valgrind would help you catch that.
    • 0:57:46But that's the kind of thing one needs to be careful about when
    • 0:57:49touching memory at all.
    • 0:57:50You cannot touch memory after you freed it.
    • 0:57:53But here is my last step.
    • 0:57:54Let me go ahead and update the number field of n to be 3.
    • 0:58:00The next node of n to be null.
    • 0:58:03And then, just like in the slide earlier,
    • 0:58:05I think I can do list next, next equals n.
    • 0:58:11And that has the effect now of building up in the computer's memory,
    • 0:58:14essentially, this data structure.
    • 0:58:16Very manually.
    • 0:58:17Very pedantically.
    • 0:58:18Like, in a better world, we'd have a loop and some functions
    • 0:58:20that are automating this process.
    • 0:58:22But, for now, we're doing it just to play around with the syntax.
    • 0:58:26So at this point, unfortunately, suppose I want to print the numbers.
    • 0:58:31It's no longer as easy as int i equals 0, i less than 3, i++.
    • 0:58:36Because you cannot just do something like this.
    • 0:58:43Because pointer arithmetic no longer comes into play
    • 0:58:48when it's you, who are stitching together the data structure in memory.
    • 0:58:52In all of our past examples with arrays, you've
    • 0:58:55been trusting that all of the bytes in the array are back, to back, to back.
    • 0:58:58So it's perfectly reasonable for the compiler and the computer
    • 0:59:01to just figure out, oh, well if you want [0], that's at the beginning.
    • 0:59:04[1], it's one location over.
    • 0:59:06[2], it's one location over.
    • 0:59:08This is way less obvious now.
    • 0:59:11Because even though you might want to go to the first element in the linked
    • 0:59:14list, or the second, or the third, you can't just jump to those arithmetically
    • 0:59:19by doing a bit of math.
    • 0:59:20Instead, you have to follow all of those arrows.
    • 0:59:24So with linked lists, you can't use this square bracket notation anymore
    • 0:59:27because one node might be here, over here, over here, over here.
    • 0:59:30You can't just use some simple offset.
    • 0:59:33So I think our code is going to have to be a little fancier.
    • 0:59:36And this might look scary at first, but it's just an application
    • 0:59:39of some of the basic definitions here.
    • 0:59:42Let me do a for-loop that actually uses a node* variable initialized
    • 0:59:49to the list itself.
    • 0:59:51I'm going to keep doing this, so long as TMP does not equal null.
    • 0:59:55And on each iteration of this loop, I'm going
    • 0:59:58to update TMP to be whatever TMP arrow next is.
    • 1:00:03And I'll remind you in a moment and explain in more detail.
    • 1:00:05But when I print something here with printf, I can still use %i.
    • 1:00:09Because it's still a number at the end of the day.
    • 1:00:12But what I want to print out is the number in this temporary variable.
    • 1:00:16So maybe the ugliest for-loop we've ever seen.
    • 1:00:19Because it's mixing, not just the idea of a for-loop, which
    • 1:00:21itself was a bit cryptic weeks ago.
    • 1:00:23But now, I'm using pointers instead of integers.
    • 1:00:26But I'm not violating the definition of a for-loop.
    • 1:00:28Recall that a for-loop has 3 main things in parentheses.
    • 1:00:30What do you want to initialize first?
    • 1:00:32What condition do you want to keep checking again and again?
    • 1:00:35And what update do you want to make on every iteration of the loop?
    • 1:00:39So with that basic definition in mind, this
    • 1:00:41is giving me a temporary variable called TMP
    • 1:00:44that is initialized to the beginning of the loop.
    • 1:00:46So it's like pointing my finger at the number 1 node.
    • 1:00:50Then, I'm asking the question, does TMP not equal null?
    • 1:00:53Well, hopefully, not because I'm pointing at a valid node
    • 1:00:56that is the number 1 node.
    • 1:00:57So, of course, it doesn't equal null yet.
    • 1:00:59Null won't be until we get to the end of the list.
    • 1:01:02So what do I do?
    • 1:01:03I started this TMP variable.
    • 1:01:05I follow the arrow and go to the number field they're in.
    • 1:01:10What do I then do?
    • 1:01:11The for-loop says, change TMP to be whatever
    • 1:01:15is at TMP, by following the arrow and grabbing the next field.
    • 1:01:19That, then, has the result of being checked against this conditional.
    • 1:01:22No, of course, it doesn't equal null because the second node
    • 1:01:24is the number 2 node.
    • 1:01:26Null is still at the very end.
    • 1:01:27So I print out the number 2.
    • 1:01:29Next step, I update TMP one more time to be whatever is next.
    • 1:01:33That, then, does not yet equal null.
    • 1:01:36So I go ahead and print out the number 3 node.
    • 1:01:38Then one last time, I update TMP to be whatever TMP is in the next field.
    • 1:01:44But after 1, 2, 3, that last next field is null.
    • 1:01:47And so, I break out of this for-loop altogether.
    • 1:01:51So if I do this in pictorial form, all we're
    • 1:01:54doing, if I now use my finger to represent the TMP variable.
    • 1:01:58I initialize TMP to be whatever list is, so it points here.
    • 1:02:02That's obviously not null so I print out whatever
    • 1:02:04is that TMP, follow the arrow in number, and I print that out.
    • 1:02:09Then I update TMP to point here.
    • 1:02:11Then I update TMP to point here.
    • 1:02:13Then I update TMP to point here.
    • 1:02:14Wait, that's null.
    • 1:02:15The for-loop ends.
    • 1:02:17So, again, admittedly much more cryptic than our familiar int i equals 0,
    • 1:02:21and so forth.
    • 1:02:22But it's just a different utilization of the for-loop syntax.
    • 1:02:28Yes.
    • 1:02:29AUDIENCE: How does it happen that you're always printing out the numbers.
    • 1:02:33Because it seems to me that addresses-
    • 1:02:35SPEAKER 1: Good question.
    • 1:02:36How is it that I'm actually printing numbers and not printing out
    • 1:02:39addresses instead.
    • 1:02:40The compiler is helping me here.
    • 1:02:42Because I taught it, in the very beginning of my program,
    • 1:02:44what a node is.
    • 1:02:45Which looks like this here.
    • 1:02:47The compiler knows that a node has a number of fields and a next field
    • 1:02:51down here, in the for-loop.
    • 1:02:53Because I'm iterating using a node* pointer, and not an int* pointer,
    • 1:02:59the compiler knows that any time I'm pointing at something,
    • 1:03:02I'm pointing at the whole node.
    • 1:03:03Doesn't matter where specifically in the rectangle I'm pointing per se.
    • 1:03:07It's, ultimately, pointing at the whole node itself.
    • 1:03:09And the fact that I, then, use TMP arrow number means, OK,
    • 1:03:13adjust your finger slightly.
    • 1:03:14So you're literally pointing at the number field and not the next field.
    • 1:03:18So that's sufficient information for the computer to distinguish the 2.
    • 1:03:22Good question.
    • 1:03:23Other questions then on this approach here.
    • 1:03:26Yeah, in the back.
    • 1:03:28AUDIENCE: How would you--
    • 1:03:29SPEAKER 1: How would I use a for-loop to add elements to a linked list?
    • 1:03:33You will do something like this, if I may, in problem set 5.
    • 1:03:38We will give you some of the scaffolding for doing this.
    • 1:03:41But in this coming weeks materials will we guide you to that.
    • 1:03:44But let me not spoil it just yet.
    • 1:03:47Fair question, though.
    • 1:03:48Yeah.
    • 1:03:48AUDIENCE: So I had a question about line 49.
    • 1:03:51SPEAKER 1: OK.
    • 1:03:51AUDIENCE: Is line 49 possible in line 43?
    • 1:03:53SPEAKER 1: Good question.
    • 1:03:54Is line 49 acceptable, even if we freed it earlier.
    • 1:03:57We didn't free it in line 43, in this case, right.
    • 1:04:00You can only reach line 49, if n does not equal null.
    • 1:04:04And you do not return on line 45.
    • 1:04:06So that's safe.
    • 1:04:07I was only doing those freeing, if I knew on line 45 that I'm out of here
    • 1:04:12anyway, at that point.
    • 1:04:13Good question.
    • 1:04:14And, yeah.
    • 1:04:15AUDIENCE: I had a quick question.
    • 1:04:16Is TMP [INAUDIBLE].
    • 1:04:19SPEAKER 1: Correct You're asking about TMP, because it's in a for-loop,
    • 1:04:22does that mean you don't have to free it?
    • 1:04:24You never have to free pointers, per se.
    • 1:04:26You should only free addresses that were returned to you by malloc.
    • 1:04:31So I haven't finished the program, to be fair.
    • 1:04:33But you're not freeing variables.
    • 1:04:35You're not freeing like, fields.
    • 1:04:37You are freeing specific addresses, whatever they may be.
    • 1:04:40So the last thing, and I was stalling on showing this
    • 1:04:43because it too is a little cryptic.
    • 1:04:45Here is how you can free, now, a whole linked list.
    • 1:04:48In the world of arrays, recall, it was so easy.
    • 1:04:51You just say free list.
    • 1:04:52You return 0 and you're done.
    • 1:04:53Not with a linked list.
    • 1:04:55Because, again, the computer doesn't know
    • 1:04:57what you have stitched together using all of these pointers
    • 1:04:59all over the computer's memory.
    • 1:05:01You need to follow those arrows.
    • 1:05:03So one way to do this would be as follows.
    • 1:05:05While the list itself is not null, so while there's a list to be freed.
    • 1:05:10What do I want to do?
    • 1:05:12I'm going to give myself a temporary variable called TMP again.
    • 1:05:14And it's a different TMP because it's in a different scope.
    • 1:05:17It's inside of the while loop instead the for-loop, a few lines earlier.
    • 1:05:21I am going to initialize TMP to be the address of the next node.
    • 1:05:26Just so I can get one step ahead of things.
    • 1:05:29Why am I doing this?
    • 1:05:30Because now, I can boldly free the list itself,
    • 1:05:34which does not mean the whole list.
    • 1:05:35Again, I'm freeing the address in list, which
    • 1:05:38is the address of the number 1 node.
    • 1:05:41That's what list is.
    • 1:05:42It's just the address of the number 1 node.
    • 1:05:44So if I first use TMP to point out the number
    • 1:05:472 slightly in the middle of the picture, then it is safe for me on line 61,
    • 1:05:53at the moment, to free list.
    • 1:05:55That is the address of the first node.
    • 1:05:57Now I'm going to say, all right, once I freed the first node in the list,
    • 1:06:02I can update the list itself to be literally TMP.
    • 1:06:07And now, the loop repeats.
    • 1:06:09So what's happening here?
    • 1:06:10If you think about this picture, TMP is initially pointing at not the list,
    • 1:06:16but list arrow next.
    • 1:06:17So TMP, represented by my right hand here, is pointing at the number 2.
    • 1:06:20Totally safe and reasonable to free now the list itself a.k.a.
    • 1:06:25the address of the number 1 node.
    • 1:06:27That has the effect of just throwing away the number 1 node,
    • 1:06:29telling the computer you can reuse that memory for you.
    • 1:06:32The last line of code I wrote updated list to point at the number
    • 1:06:362, at which point my loop proceeded to do the exact same thing again.
    • 1:06:40And only once my finger is literally pointing at nowhere,
    • 1:06:43the null symbol, will the loop, by nature of a while
    • 1:06:46loop as I'll toggle back to, break out.
    • 1:06:48And there's nothing more to be freed.
    • 1:06:51So again, what you'll see, ultimately, in problem set 5,
    • 1:06:54more on that later, is an opportunity to play around with just this syntax.
    • 1:06:58But also these ideas.
    • 1:06:59But again, even though the syntax is admittedly pretty cryptic,
    • 1:07:02we're still using basics like these for-loops or while loops.
    • 1:07:06We're just starting to now follow explicit addresses rather
    • 1:07:09than letting the computer do all of the arithmetic for us,
    • 1:07:13as we previously benefited from.
    • 1:07:15At the very end of this thing, I'm going to return 0 as though all is well.
    • 1:07:18And I think, then, we're good to go.
    • 1:07:22All right.
    • 1:07:22Questions on this linked list code now?
    • 1:07:25And again, we'll walk through this again in the coming weeks spec.
    • 1:07:28Yeah.
    • 1:07:29AUDIENCE: Can you explain the while loop [INAUDIBLE] starts in other ways?
    • 1:07:33SPEAKER 1: Sure.
    • 1:07:34Can we explain this while loop here for freeing the list.
    • 1:07:37So notice that, first, I'm just asking the obvious question.
    • 1:07:40Is the list null?
    • 1:07:41Because if it is, there's no work to be done.
    • 1:07:45However, while the list is not null, according to line 58,
    • 1:07:49what do we want to do?
    • 1:07:50I want to create a temporary variable that points at the same thing
    • 1:07:54that list arrow next is pointing at.
    • 1:07:57So what does that mean?
    • 1:07:58Here is list.
    • 1:08:00List arrow next is whatever this thing is here.
    • 1:08:03So if my right hand represents the temporary variable,
    • 1:08:06I'm literally pointing at the same thing as the list is itself.
    • 1:08:10The next line of code, recall, was free the list.
    • 1:08:13And unlike, in our world of arrays, like half an hour
    • 1:08:16ago where that just meant free the whole darn list,
    • 1:08:19you now have taken over control over the computer's memory with a linked list,
    • 1:08:23in ways that you didn't with the array.
    • 1:08:25The computer knew how to free the whole array because you
    • 1:08:28malloc the whole thing at once.
    • 1:08:30You are now mallocing the linked list one node at a time.
    • 1:08:34And the operating system does not keep track of for you
    • 1:08:37where all these nodes are.
    • 1:08:38So when you free list, you are literally freeing
    • 1:08:42the value of the list variable, which is just this first node here.
    • 1:08:46Then my last line of code, which I'll flip back to in a second, updates
    • 1:08:49list to now ignore the free memory and point at 2.
    • 1:08:54And the story then repeats.
    • 1:08:57So, again, it's just a very pedantic way of using
    • 1:09:00this new syntax of star notation, and the arrow notation, and the like,
    • 1:09:04to do the equivalent of walking down all of these arrows.
    • 1:09:08Following all of these breadcrumbs.
    • 1:09:10But it does take admittedly some getting used to.
    • 1:09:13Syntax, you only have to do one week.
    • 1:09:16But, again, next week in Python will we begin
    • 1:09:18to abstract a lot of this complexity away.
    • 1:09:20But none of this complexity is going away.
    • 1:09:22It's just that someone else, the authors of Python for instance,
    • 1:09:24will have automated this stuff for us.
    • 1:09:26The goal this week is to understand what it
    • 1:09:28is we're going to get for free, so to speak, next week.
    • 1:09:31All right.
    • 1:09:32Questions on these length lists.
    • 1:09:36All right.
    • 1:09:37Just, yeah, in the back.
    • 1:09:38AUDIENCE: So are the while loops strictly necessary
    • 1:09:41for the freeing [INAUDIBLE].
    • 1:09:42SPEAKER 1: Fair question.
    • 1:09:43Let me summarize as, could we have freed this with a for-loop?
    • 1:09:46Absolutely.
    • 1:09:47It just is a matter of style.
    • 1:09:48It's a little more elegant to do it in a while loop, according to me.
    • 1:09:51But other people will reasonably disagree.
    • 1:09:53Anything you can do with a while loop you can do with a for-loop,
    • 1:09:56and vise versa.
    • 1:09:57Do while loops, recall, are a little different.
    • 1:09:59But they will always do at least one thing.
    • 1:10:02But for-loops and while loops behave the same in this case.
    • 1:10:04AUDIENCE: Thank you.
    • 1:10:05SPEAKER 1: Sure.
    • 1:10:06Other questions?
    • 1:10:08All right, well let's just vary things a little bit here.
    • 1:10:10Just to see what some of the pitfalls might now be
    • 1:10:12without getting into the weeds of code.
    • 1:10:14Indeed, we'll try to save some of that for problem set 5's exploration.
    • 1:10:18But instead, let's imagine that we want to create a list here of our own.
    • 1:10:22I can offer, in exchange for a few volunteers, some foam fingers
    • 1:10:25to bring to the next game, perhaps.
    • 1:10:27Could we get maybe just one volunteer first?
    • 1:10:29Come on up.
    • 1:10:30You will be our linked list from the get go.
    • 1:10:33What's your name?
    • 1:10:33AUDIENCE: Pedro.
    • 1:10:34SPEAKER 1: Pedro, come on up.
    • 1:10:36All right, thank you to Pedro.
    • 1:10:38[AUDIENCE CLAPPING]
    • 1:10:41And if you want to just stand roughly over here.
    • 1:10:43But you are a null pointer so just point sort of at the ground,
    • 1:10:45as though you're pointing at 0.
    • 1:10:46All right.
    • 1:10:47So Pedro is our linked list of size 0, which pictorially
    • 1:10:50might look a little something like this for consistency with our past pictures.
    • 1:10:53Now suppose that we want to go ahead and malloc, oh, how about the number 2.
    • 1:10:58Can we get a volunteer to be on camera here?
    • 1:11:00OK.
    • 1:11:00You jumped out of your seat.
    • 1:11:01Do you want to come up?
    • 1:11:04OK, you really want the foam finger, I say.
    • 1:11:06All right.
    • 1:11:06Round of applause, sure.
    • 1:11:07[AUDIENCE CLAPPING]
    • 1:11:12OK.
    • 1:11:13And what's your name?
    • 1:11:14AUDIENCE: Caleb.
    • 1:11:14SPEAKER 1: Say again?
    • 1:11:15AUDIENCE: Caleb.
    • 1:11:15SPEAKER 1: Halen?
    • 1:11:16AUDIENCE: Caleb.
    • 1:11:16SPEAKER 1: Caleb.
    • 1:11:17Caleb, sorry.
    • 1:11:18All right.
    • 1:11:19So here is your number 2 for your number field.
    • 1:11:21And here is your pointer.
    • 1:11:23And come on, let's say that there was room for Caleb like, right there.
    • 1:11:26That's perfect.
    • 1:11:26So Caleb got malloced, if you will, over here.
    • 1:11:29So now if we want to insert Caleb and the number 2 into this linked list,
    • 1:11:33well what do we need to do?
    • 1:11:34I already initialized you to 2.
    • 1:11:36And pointing as you are to the ground means
    • 1:11:38you're initialized to null for your next field.
    • 1:11:40Pedro, what you should you-- perfect.
    • 1:11:42What should Pedro do.
    • 1:11:43That's fine, too.
    • 1:11:44So Pedro is now pointing at the list.
    • 1:11:46So now our list looks a little something like this.
    • 1:11:48So far, so good.
    • 1:11:49All is well.
    • 1:11:50So the first couple of these will be pretty straightforward.
    • 1:11:52Let's insert one more, if anyone really wants another foam finger.
    • 1:11:56Here, how about right in the middle.
    • 1:11:57Come on down.
    • 1:11:58And just in anticipation, how about let's malloc someone else.
    • 1:12:01OK, your friends are pointing at you.
    • 1:12:03Do you want to come down too, preemptively?
    • 1:12:05This is a pool of memory, if you will.
    • 1:12:07What's your name?
    • 1:12:08AUDIENCE: Hannah.
    • 1:12:09SPEAKER 1: Hannah.
    • 1:12:09All right, Hanna.
    • 1:12:10You are number 4.
    • 1:12:11[AUDIENCE CLAPPING]
    • 1:12:13And hang there for just a moment.
    • 1:12:14All right.
    • 1:12:15So we've just malloced Hannah.
    • 1:12:16And Hannah, how about Hannah, suppose you ended up over there
    • 1:12:20in just some random location.
    • 1:12:21All right.
    • 1:12:22So what should we now do, if the goal is to keep these things sorted?
    • 1:12:25How about?
    • 1:12:26So Pedro, do you have to update yourself?
    • 1:12:28AUDIENCE: No.
    • 1:12:29SPEAKER 1: No.
    • 1:12:29All right.
    • 1:12:29Caleb, what do you have to do?
    • 1:12:31OK.
    • 1:12:31And Hannah what should you be doing?
    • 1:12:34I would, it's just for you for now, so point at the ground representing null.
    • 1:12:37OK.
    • 1:12:38So, again demonstrating the fact that, unlike in past weeks where
    • 1:12:41we had our nice, clean array back, to back, to back,
    • 1:12:43contiguously, these guys are deliberately all over the stage.
    • 1:12:46So let's malloc another.
    • 1:12:47How about number 5.
    • 1:12:49What's your name?
    • 1:12:49AUDIENCE: Jonathan.
    • 1:12:50SPEAKER 1: Jonathan.
    • 1:12:50All right, Jonathan.
    • 1:12:51You are our number 5.
    • 1:12:53And pick your favorite place in memory.
    • 1:12:55[AUDIENCE CLAPPING]
    • 1:12:56OK.
    • 1:12:58All right.
    • 1:12:59So Jonathan's now over there.
    • 1:13:01And Hannah is over there.
    • 1:13:02So 5, we want to point Hannah at number 5.
    • 1:13:04So you, of course, are going to point there.
    • 1:13:06And where should you be pointing?
    • 1:13:07Down to represent null, as well.
    • 1:13:09OK.
    • 1:13:10So pretty straightforward.
    • 1:13:11But now things get a little interesting.
    • 1:13:13And here, we'll use a chance to, without the weeds of code,
    • 1:13:16point out how order of operations is really going to matter.
    • 1:13:19Suppose that I next want to allocate say, the number 1.
    • 1:13:23And I want to insert the number 1 into this list.
    • 1:13:25Yes.
    • 1:13:26This is what the code would look like.
    • 1:13:27But if we act this out-- could we get one more volunteer?
    • 1:13:31How about on the end there in the sweater.
    • 1:13:32Yeah.
    • 1:13:33Come on down.
    • 1:13:34We have, what's your name?
    • 1:13:35AUDIENCE: Lauren.
    • 1:13:36SPEAKER 1: Lauren.
    • 1:13:37OK.
    • 1:13:37Lauren, come on down.
    • 1:13:38[AUDIENCE CLAPPING]
    • 1:13:43And how about, Lauren, why don't you go right
    • 1:13:45in here in front, if you don't mind.
    • 1:13:47Here is your number.
    • 1:13:48Here is your pointer.
    • 1:13:49So I've initialized Lauren to the number 1.
    • 1:13:51And your pointer will be null, pointing at the ground.
    • 1:13:54Where do you belong if we're maintaining sorted order?
    • 1:13:57Looks like right at the beginning.
    • 1:13:58What should happen here?
    • 1:14:00OK.
    • 1:14:01So Pedro has presumed to point now at Lauren.
    • 1:14:06But how do you know where to point?
    • 1:14:10AUDIENCE: He's number 2.
    • 1:14:11SPEAKER 1: Pedro's undoing what he did a moment ago.
    • 1:14:13So this was deliberate.
    • 1:14:14And that was perfect that Pedro presumed to point immediately at Lauren.
    • 1:14:17Why?
    • 1:14:18You literally just orphaned all of these folks, all of these chunks of memory.
    • 1:14:21Why?
    • 1:14:22Because if Pedro was our only variable pointing at that chunk of memory,
    • 1:14:26this is the danger of using pointers, and dynamic memory allocation,
    • 1:14:29and building your own data structures.
    • 1:14:31The moment you point temporarily, if you could,
    • 1:14:33to Lauren, I have no idea where he's pointing to.
    • 1:14:36I have no idea how to get back to Caleb, or Hannah, or anyone else on stage.
    • 1:14:41So that was bad.
    • 1:14:42So you did undo it.
    • 1:14:43So that's good.
    • 1:14:44I think we need Lauren to make a decision first.
    • 1:14:46Who should you point at?
    • 1:14:47AUDIENCE: Caleb.
    • 1:14:47SPEAKER 1: So pointing at Caleb.
    • 1:14:48Why?
    • 1:14:49Because you're pointing at literally who Pedro is pointing at.
    • 1:14:51Pedro, now what are you safe to do?
    • 1:14:53Good.
    • 1:14:53So order of operations there matters.
    • 1:14:55And if we had just done this line of code in red here, list equals n.
    • 1:14:59That was like Pedro's first instinct, bad things happen.
    • 1:15:02And we orphaned the rest of the list.
    • 1:15:04But if we think through it logically and do this, as Lauren did for us, instead,
    • 1:15:08we've now updated the list to look a little something more like this.
    • 1:15:11Let's do one last one.
    • 1:15:12We got one more foam finger here for the number 3.
    • 1:15:15How about on the end?
    • 1:15:16Yeah.
    • 1:15:16You want to come down.
    • 1:15:18All right.
    • 1:15:18One final volunteer.
    • 1:15:19[AUDIENCE CLAPPING]
    • 1:15:26All right.
    • 1:15:26And what's your name?
    • 1:15:27AUDIENCE: Miriam.
    • 1:15:28SPEAKER 1: I'm sorry?
    • 1:15:28AUDIENCE: Miriam.
    • 1:15:28SPEAKER 1: Miriam.
    • 1:15:29All right.
    • 1:15:29So here is your number 3.
    • 1:15:30Here is your pointer.
    • 1:15:31If you want to go maybe in the middle of the stage in a random memory location.
    • 1:15:35So here, too, the goal is to maintain sorted order.
    • 1:15:39So let's ask the audience, who or what number should point at whom first here?
    • 1:15:44So we don't screw up and orphan some of the memory.
    • 1:15:46And if we do orphan memory, this is what's called, again per last week,
    • 1:15:50a memory leak.
    • 1:15:51Your Mac, your PC, your phone can start to slow down
    • 1:15:53if you keep asking for memory but never give it back or lose track of it.
    • 1:15:56So we want to get this right.
    • 1:15:58Who should point at whom?
    • 1:16:00Or what number?
    • 1:16:01Say again.
    • 1:16:02AUDIENCE: 3 to 4.
    • 1:16:03SPEAKER 1: 3 should point at 4.
    • 1:16:04So 3, do you want to point at 4.
    • 1:16:08And not, so, OK, good.
    • 1:16:09And how did you know, Miriam, whom to point at?
    • 1:16:14AUDIENCE: Copying Caleb.
    • 1:16:15SPEAKER 1: Perfect.
    • 1:16:16OK, so copying Caleb.
    • 1:16:18Why?
    • 1:16:18Because if you look at where this list is currently constructed,
    • 1:16:22and you can cheat on the board here, 2 is pointing to 4.
    • 1:16:25If you point at whoever Caleb, number 2, is pointing out,
    • 1:16:28that, indeed, leads you to Hannah for number 4.
    • 1:16:31So now what's the next step to stitch this together?
    • 1:16:35Our voice in the crowd.
    • 1:16:37AUDIENCE: 2 to 3.
    • 1:16:38SPEAKER 1: 2 to 3.
    • 1:16:39So, 2 to 3.
    • 1:16:40So Caleb, I think it's now safe for you to decouple.
    • 1:16:42Because someone is already pointing at Hannah.
    • 1:16:44We haven't orphaned anyone.
    • 1:16:45So now, if we follow the breadcrumbs, we've
    • 1:16:47got Pedro leading to 1, to 2, to 3, to 4, to 5.
    • 1:16:52We need the numbers back, but you can keep the foam fingers.
    • 1:16:55Thank you to our volunteers here.
    • 1:16:57AUDIENCE: Thank you.
    • 1:16:58Thank you.
    • 1:16:58[AUDIENCE CLAPPING]
    • 1:17:00SPEAKER 1: You can just put the numbers here.
    • 1:17:03AUDIENCE: Thank you.
    • 1:17:04SPEAKER 1: Thank you to all.
    • 1:17:05So this is only to say that when you start looking at the code this week
    • 1:17:09and in the problem set, it's going to be very easy to lose
    • 1:17:11sight of the forest for the trees.
    • 1:17:13Because the code does get really dense.
    • 1:17:15But the idea is, again, really do bubble up to these higher level descriptions.
    • 1:17:20And if you think about data structures at this level.
    • 1:17:23If you go off in program after a class like CS50
    • 1:17:25and your whiteboarding something with a friend or a colleague,
    • 1:17:28most people think at and talk at this level.
    • 1:17:31And they just assume that, yeah, if we went back and looked
    • 1:17:33at our textbooks or class notes, we could figure out how to implement this.
    • 1:17:36But the important stuff is the conversation.
    • 1:17:38And the idea is up here.
    • 1:17:40Even though, via this week, will we get some practice with the actual code.
    • 1:17:45So when it comes to analyzing an algorithm like this,
    • 1:17:49let's consider the following.
    • 1:17:51What might be now the running time of operations like searching and inserting
    • 1:17:58into a linked list?
    • 1:18:00We talked about arrays earlier.
    • 1:18:01And we had some binary search possibilities still, as soon
    • 1:18:04as it's an array.
    • 1:18:05But as soon as we have a linked list, these arrows, like our volunteers,
    • 1:18:08could be anywhere on stage.
    • 1:18:10And so you can't just assume that you can
    • 1:18:11jump arithmetically to the middle element, to the middle element,
    • 1:18:14to the middle one.
    • 1:18:15You pretty much have to follow all of these breadcrumbs again and again.
    • 1:18:19So how might that inform what we see?
    • 1:18:21Well, consider this too.
    • 1:18:23Even though I keep drawing all these pictures with all of the numbers
    • 1:18:26exposed.
    • 1:18:26And all of us humans in the room can easily
    • 1:18:28spot where the 1 is, where the 2 is, where the 3 is, the computer, again,
    • 1:18:32just like with our lockers and arrays, can only see one location at a time.
    • 1:18:36And the key thing with a linked list is that the only address
    • 1:18:40we've fundamentally been remembering is what Pedro represented a moment ago.
    • 1:18:44He was the link to all of the other nodes.
    • 1:18:47And, in turn, each person led to the next.
    • 1:18:49But without Pedro, we would have lost some of, or all of, the linked list.
    • 1:18:54So when you start with a linked list, if you
    • 1:18:56want to find an element as via search, you have to do it linearly.
    • 1:19:00Following all of the arrows.
    • 1:19:02Following all of the pointers on the stage
    • 1:19:04in order to get to the node in question.
    • 1:19:06And only once you hit null can you conclude, yep, it was there.
    • 1:19:09Or no, it was not.
    • 1:19:11So given that if a computer, essentially,
    • 1:19:14can only see the number 1, or the number 2, or the number 3, or the number 4,
    • 1:19:18or the number 5, one at a time, how might we
    • 1:19:22think about the running time of search?
    • 1:19:25And it is indeed Big O of n.
    • 1:19:27But why is that?
    • 1:19:28Well, in the worst case, the number you might be looking for
    • 1:19:30is all the way at the end.
    • 1:19:32And so, obviously, you're going to have to search all of the n elements.
    • 1:19:35And I drew these things with boxes on top of them.
    • 1:19:37Because, again, even though you and I can immediately see,
    • 1:19:40where the 5 is for instance, the computer
    • 1:19:42can only figure that out by starting at the beginning and going there.
    • 1:19:46So there, too, is another trade off.
    • 1:19:48It would seem that, overnight, we have lost the ability
    • 1:19:52to do a very powerful algorithm from week 0 known as binary search, right.
    • 1:19:57It's gone.
    • 1:19:57Because there's no way in this picture to jump mathematically
    • 1:20:01to the middle node, unless you remember where it is.
    • 1:20:04And then, remember where every other node is.
    • 1:20:06And at that point, you're back to an array.
    • 1:20:08Linked list, by design, only remember the next node in the list.
    • 1:20:12All right.
    • 1:20:12How about something like insert?
    • 1:20:15In the worst case, perhaps, how many steps
    • 1:20:18might it take to insert something into a linked list?
    • 1:20:21Someone else.
    • 1:20:22Someone else.
    • 1:20:23Yeah.
    • 1:20:24AUDIENCE: N squared.
    • 1:20:25SPEAKER 1: Say again?
    • 1:20:25AUDIENCE: N squared.
    • 1:20:26SPEAKER 1: N squared.
    • 1:20:26Fortunately, it's not that bad.
    • 1:20:28It's not as bad as n squared.
    • 1:20:29That typically means doing n things, n times.
    • 1:20:31And I think we can stay under that, but not a bad thought.
    • 1:20:36Yeah.
    • 1:20:36AUDIENCE: Is it n?
    • 1:20:37SPEAKER 1: Why would it be n?
    • 1:20:39AUDIENCE: Because the [INAUDIBLE].
    • 1:20:42SPEAKER 1: OK.
    • 1:20:43So to summarize, you're proposing n.
    • 1:20:45Because to find where the thing goes, you
    • 1:20:47have to traverse, potentially, the whole list.
    • 1:20:49Because if I'm inserting the number 6 or the number 99,
    • 1:20:52that numerically belongs at the very end,
    • 1:20:54I can only find its location by looking for all of them.
    • 1:20:57At this point, though, in the term.
    • 1:20:59And really, at this point in the story, you
    • 1:21:01should start to question these very simplistic questions, to be honest.
    • 1:21:04Because the answer is almost always going to depend, right.
    • 1:21:08If I've just got a link to list that looks like this,
    • 1:21:10the first question back to someone asking this question
    • 1:21:14would be, well does the list need to be sorted, right?
    • 1:21:17I've drawn it as sorted and it might imply as much.
    • 1:21:19So that's a reasonable assumption to have made.
    • 1:21:21But if I don't care about maintaining sorted order,
    • 1:21:24I could actually insert into a linked list in constant time.
    • 1:21:28Why?
    • 1:21:28I could just keep inserting into the beginning, into the beginning,
    • 1:21:31into the beginning.
    • 1:21:32And even though the list is getting longer,
    • 1:21:34the number of steps required to insert something between the first element
    • 1:21:38is not growing at all.
    • 1:21:40You just keep inserting.
    • 1:21:42If you want to keep it sorted though, yes, it's
    • 1:21:44going to be, indeed, Big O of n.
    • 1:21:46But again, these kinds of, now, assumptions
    • 1:21:47are going to start to matter.
    • 1:21:49So let's for the sake of discussion say it's Big O of n,
    • 1:21:51if we do want to maintain sorted order.
    • 1:21:53But what about in the case of not caring.
    • 1:21:56It might indeed be a Big O of 1.
    • 1:21:58And now these are the kinds of decisions that will start to leave to you.
    • 1:22:01What about in the best case here?
    • 1:22:03If we're thinking about Big Omega notation,
    • 1:22:05then, frankly, we could just get lucky in the best case.
    • 1:22:07And the element we're looking for happens to be at the beginning.
    • 1:22:10Or heck, we just blindly insert to the beginning irrespective of the order
    • 1:22:14that we want to keep things in.
    • 1:22:16All right.
    • 1:22:17So besides then, how can we improve further on this design?
    • 1:22:22We don't need to stop at linked list.
    • 1:22:23Because, honestly, it's not been a clear win.
    • 1:22:26Like, linked list allow us to use more of our memory
    • 1:22:28because we don't need massive growing chunks of contiguous memory.
    • 1:22:32So that's a win.
    • 1:22:33But they still require Big O of n time to find the end of it,
    • 1:22:37if we care about order.
    • 1:22:38We're using at least twice as much memory for the darn pointer.
    • 1:22:41So that seems like a sidestep.
    • 1:22:44It's not really a step forward.
    • 1:22:46So can we do better?
    • 1:22:47Here's where we can now accelerate the story by just stipulating that, hey,
    • 1:22:52even if you haven't used this technique yet,
    • 1:22:53we would seem to have an ability to stitch together pieces of memory just
    • 1:22:58using pointers .
    • 1:22:59And anything you could imagine drawing with arrows,
    • 1:23:01you can implement, it would seem, in code.
    • 1:23:04So what if we leverage a second dimension.
    • 1:23:06Instead of just stringing together things laterally,
    • 1:23:09left to right, essentially, even though they
    • 1:23:10were bouncing around on the screen.
    • 1:23:12What if we start to leverage a second dimension here, so to speak.
    • 1:23:15And build more interesting structures in the computer's memory.
    • 1:23:19Well it turns out that in a computer's memory,
    • 1:23:22we could create a tree, similar to a family tree.
    • 1:23:25If you've ever seen or draw on a family tree with grandparents, and parents,
    • 1:23:28and siblings, and so forth.
    • 1:23:32So inverted branch of a tree that grows, typically
    • 1:23:36when it's drawn, downward instead of upward like a typical tree.
    • 1:23:39But that's something we could translate into code as well.
    • 1:23:41Specifically, let's do something called a binary search tree.
    • 1:23:45Which is a type of tree.
    • 1:23:47And what I mean by this is the following.
    • 1:23:49Notice this.
    • 1:23:50This is an example of an array from like week 2,
    • 1:23:53when we first talked about those.
    • 1:23:54And we had the lockers on stage.
    • 1:23:56And recall that what was nice about an array, if 1, it's sorted.
    • 1:24:02And 2, all of its numbers are indeed contiguous,
    • 1:24:05which is by definition an array.
    • 1:24:07We can just do some simple math.
    • 1:24:09For instance, if there are 7 elements in this array, and we do 7 divided by 2,
    • 1:24:13that's what?
    • 1:24:143 and 1/2, round down through truncation, that's 3.
    • 1:24:170, 1, 2, 3.
    • 1:24:18That gives me the middle element, arithmetically, in this thing.
    • 1:24:21And even though I have to be careful about rounding, using
    • 1:24:24simple arithmetic, I can very quickly, with a single line of code or math,
    • 1:24:28find for you the middle of the left half, of the left half,
    • 1:24:30of the right half, or whatever.
    • 1:24:32That's the power of arrays.
    • 1:24:33And that's what gave us binary search.
    • 1:24:35And how did binary search work?
    • 1:24:36Well, we looked at the middle.
    • 1:24:38And then, we went left or right.
    • 1:24:39And then, we went left or right again, implied by this color scheme here.
    • 1:24:45Wouldn't it be nice if we somehow preserved the new upsides
    • 1:24:50today of dynamic memory allocation, giving ourselves
    • 1:24:53the ability to just add another element, add another element,
    • 1:24:55add another element.
    • 1:24:56But retain the power of binary search.
    • 1:24:59Because log of n was much better than n, certainly for large data sets, right.
    • 1:25:04Even the phone book demonstrated as much weeks ago.
    • 1:25:06So what if I draw this same picture in 2 dimensions.
    • 1:25:11And I preserve the color scheme, just so it's obvious what came where.
    • 1:25:14What are these things look like now?
    • 1:25:18Maybe, like, things we might now call nodes, right.
    • 1:25:21A node is just a generic term for like, storing some data.
    • 1:25:25What if the data these nodes are storing are numbers.
    • 1:25:28So still integers.
    • 1:25:29But what if we connected these cleverly, like an old family tree.
    • 1:25:33Whereby, every node has not one pointer now, but as many as 2.
    • 1:25:39Maybe 0, like in the leaves at the bottom are in green.
    • 1:25:42But other nodes on the interior might have as many as 2.
    • 1:25:45Like having 2 children, so to speak.
    • 1:25:47And indeed, the vernacular here is exactly that.
    • 1:25:49This would be called the root of the tree.
    • 1:25:51Or this would be a parent, with respect to these children.
    • 1:25:54The green ones would be grandchildren, respect to these.
    • 1:25:56The green ones would be siblings with respect to each other.
    • 1:26:01And over there, too.
    • 1:26:02So all the same jargon you might use in the real world,
    • 1:26:04applies in the world of data structures and CS trees.
    • 1:26:07But this is interesting because I think we could build this now, this data
    • 1:26:12structure in the computer's memory.
    • 1:26:15How?
    • 1:26:15Well, suppose that we defined a node to be no longer just
    • 1:26:20this, a number in a next field.
    • 1:26:22What if we give ourselves a bit more room here?
    • 1:26:24And give ourselves a pointer called left and another one called right.
    • 1:26:29Both of which is a pointer to a struct node.
    • 1:26:32So same idea as before, but now we just make sure we think of these things
    • 1:26:36as pointing this way and this way, not just this way.
    • 1:26:39Not just a single direction, but 2.
    • 1:26:41So you could imagine, in code, building something up like this with a node.
    • 1:26:45That creates, in essence, this diagram here.
    • 1:26:48But why is this compelling?
    • 1:26:50Suppose I want to find the number 3.
    • 1:26:52I want to search for the number 3 in this tree.
    • 1:26:54It would seem, just like Pedro was the beginning of our linked list,
    • 1:26:58in the world of trees, the root, so to speak,
    • 1:27:01is the beginning of your data structure.
    • 1:27:03You can retain and remember this entire tree just by pointing at the root node,
    • 1:27:08ultimately.
    • 1:27:09One variable can hang on to this whole tree.
    • 1:27:12So how can I find the number 3?
    • 1:27:14Well, if I look at the root node and the number I'm looking for is less than.
    • 1:27:18Notice, I can go this way.
    • 1:27:20Or if it's greater than, I can go this way.
    • 1:27:22So I preserve that property of the phone book,
    • 1:27:24or just assorted array in general.
    • 1:27:27What's true over here?
    • 1:27:28If I'm looking for 3, I can go to the right of the 2
    • 1:27:31because that number is going to be greater.
    • 1:27:33If I go left, it's going to be smaller instead.
    • 1:27:35And here's an example of actually recursion.
    • 1:27:38Recursion in a physical sense much like the Mario's pyramid.
    • 1:27:42Which was recursively to find.
    • 1:27:44Notice this.
    • 1:27:45I claim this whole thing is a tree.
    • 1:27:47Specifically, a binary search tree, which means every node
    • 1:27:50has 2, or maybe 1, or maybe 0 children.
    • 1:27:53But no more than 2.
    • 1:27:55Hence the bi in binary.
    • 1:27:56And it's the case that every left child is smaller than the root.
    • 1:28:02And every right child is larger than the root.
    • 1:28:05That definition certainly works for 2, 4, and 6.
    • 1:28:08But it also works recursively for every sub tree, or branch of this tree.
    • 1:28:12Notice, if you think of this as the root,
    • 1:28:14it is indeed bigger than this left child.
    • 1:28:16And it's smaller than this right child.
    • 1:28:19And if you look even at the leaves, so to speak.
    • 1:28:21The grandchildren here.
    • 1:28:23This root node is bigger than its left child, if it existed.
    • 1:28:26So it's a meaningless statement.
    • 1:28:28And it's less than its right child.
    • 1:28:30Or it's not greater than, certainly, so that's meaningless too.
    • 1:28:33So we haven't violated the definition even for these leaves, as well.
    • 1:28:36And so, now, how many steps does it take to find in the worst case
    • 1:28:40any number in a binary search tree, it would seem?
    • 1:28:44So it seems 2, literally.
    • 1:28:46And the height of this thing is actually 3.
    • 1:28:48And so long story short, especially, if you're a little less comfy
    • 1:28:51with your logarithms from yesteryear.
    • 1:28:53Log base 2 is the number of times you can divide something in half, and half,
    • 1:28:57and half, until you get down to 1.
    • 1:28:58This is like a logarithm in the reverse direction.
    • 1:29:01Here's a whole lot of elements.
    • 1:29:03And we're having, we're having until we get down to 1.
    • 1:29:05So the height of this tree, that is to say, is log base 2 of n.
    • 1:29:09Which means that even in the worst case, the number you're looking for maybe
    • 1:29:12it's all the way at the bottom in the leaves.
    • 1:29:14Doesn't matter.
    • 1:29:15It's going to take log base 2 of n steps, or log of n steps,
    • 1:29:20to find, maximally, any one of those numbers.
    • 1:29:23So, again, binary search is back.
    • 1:29:28But we've paid a price, right.
    • 1:29:30This isn't a linked list anymore.
    • 1:29:32It's a tree.
    • 1:29:33But we've gained back binary search, which is pretty compelling, right.
    • 1:29:36That's where the whole class began, on making that distinction.
    • 1:29:38But what price have we paid to retain binary search in this new world.
    • 1:29:44Yeah.
    • 1:29:47It's no longer sorted left to right, but this
    • 1:29:49is a claim sorted, according to the binary search tree definition.
    • 1:29:52Where, again, left child is smaller than root.
    • 1:29:56And right child is greater than root.
    • 1:29:58So it is sorted, but it's sorted in a 2-dimensional sense, if you will.
    • 1:30:01Not just 1.
    • 1:30:02But another price paid?
    • 1:30:05AUDIENCE: [INAUDIBLE] nodes now.
    • 1:30:06SPEAKER 1: Exactly.
    • 1:30:07Every node now needs not one number, but 2, 3 pieces of data.
    • 1:30:11A number and now 2 pointers.
    • 1:30:13So, again, there's that trade off again.
    • 1:30:15Where, well, if you want to save time, you've
    • 1:30:17got to give something if you start giving space.
    • 1:30:20And you start using more space, you can speed up time.
    • 1:30:22Like, you've got it.
    • 1:30:23There's always a price paid.
    • 1:30:24And it's very often in space, or time, or complexity, or developer time,
    • 1:30:30the number of bugs you have to solve.
    • 1:30:32I mean, all of these are finite resources
    • 1:30:34that you have to juggle them on.
    • 1:30:35So if we consider now the code with which we can implement this,
    • 1:30:38here might be the node.
    • 1:30:40And how might we actually use something like this?
    • 1:30:43Well, let's take a look at, maybe, one final program.
    • 1:30:45And see here, before we transition to higher level concepts, ultimately.
    • 1:30:49Let me go ahead here and let me just open a program I wrote here in advance.
    • 1:30:54So let me, in a moment, copy over file called tree.c.
    • 1:30:58Which we'll have on the course's websites.
    • 1:31:01And I'll walk you through some of the logic
    • 1:31:02here that I've written for tree.c.
    • 1:31:07All right.
    • 1:31:08So what do we have here first?
    • 1:31:09So here is an implementation of a binary search tree for numbers.
    • 1:31:14And as before, I've played around and I've inserted the numbers manually.
    • 1:31:18So what's going on first?
    • 1:31:20Here is my definition of a node for a binary search tree, copied and pasted
    • 1:31:24from what I proposed on the board a moment ago.
    • 1:31:27Here are 2 prototypes for 2 functions, that I'll
    • 1:31:29show you in a moment, that allow me to free
    • 1:31:31an entire tree, one node at a time.
    • 1:31:35And then, also allow me to print the tree in order.
    • 1:31:37So even though they're not sorted left to right,
    • 1:31:40I bet if I'm clever about what child I print first,
    • 1:31:43I can reconstruct the idea of printing this tree properly.
    • 1:31:46So how might I implement a binary search tree?
    • 1:31:49Here's my main function.
    • 1:31:50Here is how I might represent a tree of size 0.
    • 1:31:53It's just a null pointer called tree.
    • 1:31:55Here's how I might add a number to that list.
    • 1:31:58So here, for instance, is me malllocing space for a node.
    • 1:32:02Storing it in a temporary variable called n.
    • 1:32:04Here is me just doing a safety check.
    • 1:32:06Make sure n does not equal null.
    • 1:32:07And then, here is me initializing this node to contain the number 2, first.
    • 1:32:12Then, initializing the left child of that node to be null.
    • 1:32:14And the right child of that null node to be null.
    • 1:32:17And then, initializing the tree itself to be equal to that particular node.
    • 1:32:22So at this point in the story, there's just one rectangle on the screen
    • 1:32:25containing the number 2 with no children.
    • 1:32:28All right.
    • 1:32:29Let's just add manually to this a little further.
    • 1:32:31Let's add another number to the list, by mallocing another node.
    • 1:32:34I don't need to declare n as a node* because it already exists at this
    • 1:32:38point.
    • 1:32:38Here's a little safety check.
    • 1:32:40I'm going to not bother with my, let me do this, free memory here.
    • 1:32:45Just to be safe.
    • 1:32:47Do I want to do this?
    • 1:32:49We want a free memory too, which I've not done here,
    • 1:32:51but I'll save that for another time.
    • 1:32:53Here, I'm going to initialize the number to 1.
    • 1:32:55I'm going to initialize the children of this node to null and null.
    • 1:33:00And now, I'm going to do this.
    • 1:33:01Initialize the tree's left child to be n.
    • 1:33:06So what that's essentially doing here is if this
    • 1:33:09is my root node, the single rectangle I described a moment ago that currently
    • 1:33:12has no children, neither left nor right.
    • 1:33:14Here's my new node with the number 1.
    • 1:33:16I want it to become the new left child.
    • 1:33:18So that line of code on the screen there, tree left equals n,
    • 1:33:22is like stitching these 2 together with a pointer from 2 to the 1.
    • 1:33:26All right.
    • 1:33:27The next lines of code, you can probably guess,
    • 1:33:30are me adding another number to the list.
    • 1:33:32Just the number 3.
    • 1:33:33So this is a simpler tree with 2, 1, and, 3 respectively.
    • 1:33:39And this code, let me wave my hands, is almost the same.
    • 1:33:41Except for the fact that I'm updating the tree's right child
    • 1:33:45to be this new and third node.
    • 1:33:46Let's now run the code before looking at those 2 functions.
    • 1:33:50Let me do make tree, ./tree.
    • 1:33:54And while I'll 1, 2, 3.
    • 1:33:55So it sounds like the data structure is sorted, to your concern earlier.
    • 1:33:58But how did I actually print this?
    • 1:34:00And then, eventually, free the whole thing?
    • 1:34:02Well let's look at the definition of first print tree.
    • 1:34:05And this is where things get interesting.
    • 1:34:08Print tree returns nothing so it's a void function.
    • 1:34:12But it takes a pointer to a root element as its sole argument, node* root.
    • 1:34:18Here's my safety check.
    • 1:34:19If root equals equals null, there's obviously
    • 1:34:21nothing to print, just return.
    • 1:34:23That goes without saying.
    • 1:34:24But here's where things get a little magical.
    • 1:34:27Otherwise, print your left child.
    • 1:34:30Then print your own number.
    • 1:34:33Then, print your right child.
    • 1:34:36What is this an example of, even though it's not mentioned by name here?
    • 1:34:41What programming technique here?
    • 1:34:43AUDIENCE: Recursion.
    • 1:34:44SPEAKER 1: Yeah.
    • 1:34:44So this is actually perhaps the most compelling use of recursion, yet.
    • 1:34:48It wasn't really that compelling with the Mario thing
    • 1:34:50because we had such an easy implementation with a for-loop loop
    • 1:34:52weeks ago.
    • 1:34:53But here is a perfect application of recursion, where your data structure
    • 1:34:58itself is recursive, right.
    • 1:34:59If you take any snip of any branch, it all
    • 1:35:02still looks like a tree, just a smaller one.
    • 1:35:04That lends itself to recursion.
    • 1:35:06So here is this leap of faith where I say, print my left tree, or my left sub
    • 1:35:11tree, if you will, via my child at the left.
    • 1:35:13Then, I'll print my own root node here in the middle.
    • 1:35:17Then, go ahead and print my right sub tree.
    • 1:35:19And because we have this base case that makes sure that if the root is null,
    • 1:35:24there's nothing to do, you're not going to recurse infinitely.
    • 1:35:26You're not going to call yourself again, and again, and again,
    • 1:35:29infinitely, many times.
    • 1:35:31So it works out and prints the 1, the 2, and the 3.
    • 1:35:35And notice what we could do, too.
    • 1:35:36If you wanted to print the tree in reverse order, you could do that.
    • 1:35:40Print your right tree first, the greater element.
    • 1:35:43Then, yourself.
    • 1:35:43Then, your smaller sub tree.
    • 1:35:45And if I do make tree here and ./tree, well now,
    • 1:35:47I've reversed the order of the list.
    • 1:35:50And that's pretty cool.
    • 1:35:51You can do it with a for-loop in an array.
    • 1:35:52But you can also do it, even with this 2-dimensional structure.
    • 1:35:56Let's lastly look at this free tree function.
    • 1:36:00And this one's almost the same.
    • 1:36:02Order doesn't matter in quite the same way, but it does still matter.
    • 1:36:05Here's what I did with free tree.
    • 1:36:07Well, if the root of the tree is null, there's obviously nothing to do.
    • 1:36:09Just return.
    • 1:36:10Otherwise, go ahead and free your left child and all of its descendants.
    • 1:36:15Then free your right child and all of its descendants.
    • 1:36:18And then, free yourself.
    • 1:36:19And again, free literally just frees the address in that variable.
    • 1:36:25It doesn't free the whole darn thing.
    • 1:36:27It just frees literally what's at that address.
    • 1:36:29Why was it important that I did line 72 last, though?
    • 1:36:33Why did I free the left child and the right child
    • 1:36:36before I freed myself, so to speak?
    • 1:36:39AUDIENCE: [INAUDIBLE].
    • 1:36:40SPEAKER 1: Exactly.
    • 1:36:41If you free yourself first, if I had done incorrectly this line higher up,
    • 1:36:46you're not allowed to touch the left child tree or the right child tree.
    • 1:36:50Because the memory address is no longer valid at that point.
    • 1:36:53You would get some memory error, perhaps.
    • 1:36:55The program would crash.
    • 1:36:56Valgrind definitely wouldn't like it.
    • 1:36:57Bad things would otherwise happen.
    • 1:37:00But here, then, is an example of recursion.
    • 1:37:01And again, just a recursive use of an actual data structure.
    • 1:37:06And what's even cooler here is, relatively speaking,
    • 1:37:09suppose we wanted to search something like this.
    • 1:37:11Binary search actually gets pretty straightforward to implement 2.
    • 1:37:15For instance.
    • 1:37:16here might be the prototype for a search function for a binary search tree.
    • 1:37:20You give me the root of a tree, and you give me a number I'm looking for,
    • 1:37:25and I can pretty easily now return true if it's in there or false if it's not.
    • 1:37:29How?
    • 1:37:30Well, let's first ask a question.
    • 1:37:32If tree equals equals null, then you just return false.
    • 1:37:35Because if there's no tree, there's no number, so it's obviously not there.
    • 1:37:38Return false.
    • 1:37:39Else if, the number you're looking for is less than the tree's own number,
    • 1:37:46which direction should we go?
    • 1:37:48AUDIENCE: Left.
    • 1:37:49SPEAKER 1: OK, left.
    • 1:37:50How do we express that?
    • 1:37:51Well, let's just return the answer to this question.
    • 1:37:54Search the left sub tree, by way of my left child,
    • 1:37:58looking for the same number.
    • 1:37:59And you just assume through the beauty of recursion
    • 1:38:02that you're kicking the can and let yourself figure it out
    • 1:38:05with a smaller problem.
    • 1:38:06Just that snipped left tree instead.
    • 1:38:09Else if, the number you're looking for is greater than the tree's own number,
    • 1:38:13go to the right, as you might infer.
    • 1:38:15So I can just return the answer to this question.
    • 1:38:18Search my right sub tree for that same number.
    • 1:38:21And there's a fourth and final condition.
    • 1:38:23What's the fourth scenario we have to consider, explicitly?
    • 1:38:26Yeah.
    • 1:38:26AUDIENCE: The number.
    • 1:38:27SPEAKER 1: If the number, itself, is right there.
    • 1:38:29So else if, the number I'm looking for equals the tree's own number,
    • 1:38:33then and only then, should you return true.
    • 1:38:36And if you're thinking quickly here, there's
    • 1:38:38an optimization possible, better design opportunity.
    • 1:38:42Think back to even our scratch days.
    • 1:38:43What could we do a little better here?
    • 1:38:45You're pointing at it.
    • 1:38:46AUDIENCE: Else.
    • 1:38:47SPEAKER 1: Exactly.
    • 1:38:48An else suffices.
    • 1:38:49Because if there's logically only 4 things that could happen,
    • 1:38:51you're wasting your time by asking a fourth gratuitous question.
    • 1:38:54And else here suffices.
    • 1:38:55So here to, more so than the Mario example a few weeks ago,
    • 1:38:59there's just this elegance arguably to recursion.
    • 1:39:02And that's it.
    • 1:39:02This is not pseudocode.
    • 1:39:03This is the code for binary search on a binary search tree.
    • 1:39:07And so, recursion tends to work in lockstep
    • 1:39:10with these kinds of data structures that have this structure to them
    • 1:39:14as we're seeing here.
    • 1:39:16All right.
    • 1:39:16Any questions, then, on binary search as implemented here with a tree?
    • 1:39:22Yeah.
    • 1:39:23AUDIENCE: About like third years.
    • 1:39:25[INAUDIBLE]
    • 1:39:29SPEAKER 1: Good question.
    • 1:39:30So when returning a Boolean value, true and false are values that are defined
    • 1:39:36in a library called Standard Bool, S-T-D-B-O-O-L dot H.
    • 1:39:40With a header file that you can use.
    • 1:39:42It is the case that true is, it's not well defined what they are.
    • 1:39:49But they would map indeed, yes.
    • 1:39:50To 0 and 1, essentially.
    • 1:39:51But you should not compare them explicitly to 0 and 1.
    • 1:39:54When you're using true and false, you should compare them to each other.
    • 1:39:57AUDIENCE: I meant if it's in a code return.
    • 1:40:01SPEAKER 1: Oh, sorry.
    • 1:40:02So if I am in my own code from earlier, an avoid function,
    • 1:40:05it is totally fine to return.
    • 1:40:08You just can't return something explicitly.
    • 1:40:10So return just means that's it.
    • 1:40:12Quit out of this function.
    • 1:40:14You're not actually handing back a value.
    • 1:40:16So it's a way of short circuiting the execution.
    • 1:40:19If you don't like that, and some people do frown
    • 1:40:22upon having code return from functions prematurely, you could invert the logic
    • 1:40:26and do something like this.
    • 1:40:28If the root does not equal null, do all of these things.
    • 1:40:31And then, indent all three of these lines underneath.
    • 1:40:34That's perfectly fine too.
    • 1:40:35I happen to write it the other way just so
    • 1:40:37that there was explicitly a base case that I could point to on the screen.
    • 1:40:40Whereas, now, it's implicitly there for us only.
    • 1:40:43But a good observation too.
    • 1:40:45All right.
    • 1:40:46So let's ask the question as before about running time of this.
    • 1:40:49It would look like binary search is back.
    • 1:40:51And we can now do things in logarithmic time, but we should be careful.
    • 1:40:57Is this a binary search tree?
    • 1:40:59Just to be clear.
    • 1:41:01And again, a binary search tree is a tree
    • 1:41:04where the root is greater than its left child and smaller than its right child.
    • 1:41:11That's the essence.
    • 1:41:11So you're nodding your head.
    • 1:41:13You agree?
    • 1:41:15I agree.
    • 1:41:16So this is a binary search tree.
    • 1:41:18Is this a binary search tree?
    • 1:41:20[INTERPOSING VOICES]
    • 1:41:21OK.
    • 1:41:21I'm hearing yeses.
    • 1:41:22Or I'm hearing just my delay changing the vote it would seem.
    • 1:41:25So this is one of those trick questions.
    • 1:41:28This is a binary search tree because I've not
    • 1:41:30violated the definition of what I gave you, right.
    • 1:41:33Is there any example of a left child that is greater than its parent?
    • 1:41:39Or is there any example of a right child that's smaller than its parent?
    • 1:41:42That's just the opposite way of describing the same thing.
    • 1:41:44No, this is a binary search tree.
    • 1:41:47Unfortunately, it also looks like, albeit at a different axis, what?
    • 1:41:50AUDIENCE: A linked list.
    • 1:41:51SPEAKER 1: A linked list.
    • 1:41:51But you could imagine this happening, right.
    • 1:41:53Suppose that I hadn't been as thoughtful as I was earlier
    • 1:41:56by inserting 2, And then 1, and then 3.
    • 1:41:59Which nicely balanced everything out.
    • 1:42:02Suppose that instead, because of what the user is typing in
    • 1:42:04or whatever you contrive in your own code, suppose you insert a 1,
    • 1:42:07and then a 2, and then a 3.
    • 1:42:10Like, you've created a problem for yourself.
    • 1:42:12Because if we follow the same logic as before, going left or going right,
    • 1:42:16this is how you might implement a binary search tree accidentally
    • 1:42:21if you just blindly keep following that definition.
    • 1:42:24I mean, this would be better designed as what?
    • 1:42:27If we rotated the whole thing around.
    • 1:42:29And that's totally fine.
    • 1:42:30And those kinds of trees actually have names.
    • 1:42:33There's trees called AVL trees in computer science.
    • 1:42:35There are red-black black trees in computer science.
    • 1:42:37There are other types of trees that, additionally,
    • 1:42:39add some logic that tell you when you got to pivot the thing,
    • 1:42:42and rotate it, and snip off the root, and fix things in this way.
    • 1:42:46But a binary search tree, in and of itself,
    • 1:42:48does not guarantee that it will be balanced, so to speak.
    • 1:42:51And so, if you consider the worst case scenario
    • 1:42:54of even using a binary search tree.
    • 1:42:55If you're not smart about the code you're writing
    • 1:42:57and you just blindly follow this definition,
    • 1:43:00you might accidentally create a crazy, long and stringy binary search
    • 1:43:04tree that essentially looks like a linked list.
    • 1:43:07Because you're not even using any of the left children.
    • 1:43:09So unfortunately, the literal answer to the question
    • 1:43:12here is what's the running time of search?
    • 1:43:15Well, hopefully, log n.
    • 1:43:17But not if you don't maintain the balance of the tree.
    • 1:43:19Both, in certain search, could actually devolve into instead of big O of log n,
    • 1:43:25literally, big O of n.
    • 1:43:26If you don't somehow take into account, and we're not
    • 1:43:29going to do the code for that here.
    • 1:43:30It's a higher level thing you might explore down the road.
    • 1:43:34It can devolve into something that you might not have intended.
    • 1:43:37And so, now that we're talking about 2 dimensions,
    • 1:43:40it's really the onus is on the programmer
    • 1:43:41to consider what kinds of perverse situations might happen.
    • 1:43:44Where the thing devolves into a structure
    • 1:43:46that you don't actually want it to devolve into.
    • 1:43:50All right.
    • 1:43:50We've got just a few structures to go.
    • 1:43:52Let's go ahead and take one more 5 minute break here.
    • 1:43:53When we come back, we'll talk at this level
    • 1:43:55about some final applications of this.
    • 1:43:57See you in 5.
    • 1:43:58All right.
    • 1:44:00So we are back.
    • 1:44:01And as promised, we'll operate now at this higher level.
    • 1:44:05Where if we take for granted that, even though you haven't had an opportunity
    • 1:44:08to play with these techniques yet, you have the ability now in code
    • 1:44:11to stitch things together.
    • 1:44:12Both in a one dimension and even 2 dimensions,
    • 1:44:15to build things like lists and trees.
    • 1:44:17So if we have these building blocks.
    • 1:44:19Things like now arrays, and lists, and trees,
    • 1:44:22what if we start to amalgamate them such that we build things out
    • 1:44:26of multiple data structures?
    • 1:44:28Can we start to get some of the best of both worlds by way of, for instance,
    • 1:44:32something called a hash table.
    • 1:44:33So a hash table is a Swiss army knife of data structures
    • 1:44:37in that it's so commonly used.
    • 1:44:39Because it allows you to associate keys with value, so to speak.
    • 1:44:44So, for instance, it allows you to associate a username with a password.
    • 1:44:49Or a name with a number.
    • 1:44:51Or anything where you have to take something as input,
    • 1:44:53and get as output a corresponding piece of information.
    • 1:44:56A hash table is often a data structure of choice.
    • 1:44:59And here's what it looks like.
    • 1:45:00It's actually looks like an array, at first glance.
    • 1:45:02But for discussion's sake, I've drawn this array vertically,
    • 1:45:05which is totally fine.
    • 1:45:06It's still just an array.
    • 1:45:08But it allows you, a hash table, to jump to any of these locations randomly.
    • 1:45:13That is instantly.
    • 1:45:14So, for instance, there's actually 26 locations in this array.
    • 1:45:18Because I want to, for instance, store initially
    • 1:45:21names of people, for instance.
    • 1:45:23And wouldn't it be nice if the person's name starts with A,
    • 1:45:26I have a go to place for it.
    • 1:45:27Maybe the first box.
    • 1:45:28And if it starts with Z, I put them at the bottom.
    • 1:45:30So that I can jump instantly, arithmetically,
    • 1:45:33using a little bit of Ascii or Unicode fanciness,
    • 1:45:35exactly to the location that they want to they need to go.
    • 1:45:38So, for instance, here's our array 0 index.
    • 1:45:400 through 25.
    • 1:45:42If I think of this, though, as A through Z,
    • 1:45:44I'm going to think of these 26 locations,
    • 1:45:46now in the context of a hash table, is what we'll generally call buckets.
    • 1:45:49So buckets into which you can put values.
    • 1:45:52So, for instance, suppose that we want to insert a value, one name
    • 1:45:56into this data structure.
    • 1:45:57And that name is say, Albus.
    • 1:45:59So Albus starting with A. Albus might go at the very beginning of this list.
    • 1:46:03All right.
    • 1:46:04And then, we want to insert another name.
    • 1:46:06This one happens to be Zacharias.
    • 1:46:07Starting with Z, so it goes all the way at the end of this data
    • 1:46:10structure in location 25 a.k.a.
    • 1:46:12Z.
    • 1:46:13And then, maybe a third name like Hermione, and that goes at location H
    • 1:46:17according to that position in the alphabet.
    • 1:46:19So this is great because in constant time,
    • 1:46:22I can insert and conversely search for any of these names,
    • 1:46:26based on the first letter of their name.
    • 1:46:27A, or Z, or H, in this case.
    • 1:46:30Let's fast forward and assume we put a whole bunch of other names--
    • 1:46:32might look familiar, into this hash table.
    • 1:46:34It's great because every name has its own location.
    • 1:46:39But if you're thinking of names you don't yet see it on the screen,
    • 1:46:43we eventually encounter a problem with this, right.
    • 1:46:45When could something go wrong using a hash table like this
    • 1:46:49if we wanted to insert even more names?
    • 1:46:52What's going to eventually happen?
    • 1:46:54Yeah.
    • 1:46:54There's already someone with the first letter, right.
    • 1:46:56Like I haven't even mentioned Harry, for instance, or Hagrid.
    • 1:46:59And yet, Hermione's already using that spot.
    • 1:47:01So that invites the question, well, what happens?
    • 1:47:04Maybe, if we want to insert Harry next, do we maybe cheat and put him
    • 1:47:07at location I?
    • 1:47:08But then if there's a location I, where do we put them?
    • 1:47:11And it just feels like the situation could very quickly devolve.
    • 1:47:13But I've deliberately drawn this data structure,
    • 1:47:16that I claim as a hash table, in 2 directions.
    • 1:47:19An array vertically, here.
    • 1:47:22But what might this be hinting I'm using horizontally,
    • 1:47:25even though I'm drawing the rectangles a little differently from before?
    • 1:47:28AUDIENCE: An array.
    • 1:47:29SPEAKER 1: Yeah.
    • 1:47:29Maybe another array, to be fair.
    • 1:47:31But, honestly, arrays are such a pain with the allocating, and reallocating,
    • 1:47:34and so forth.
    • 1:47:34These look like the beginnings of a linked list, if you will.
    • 1:47:38Where the name is where the number used to be, even though I'm drawing it
    • 1:47:42horizontally now just for discussion's sake.
    • 1:47:44And this seems to be a pointer that isn't pointing anywhere yet.
    • 1:47:47But it looks like the array is 26 pointers, some of which are null,
    • 1:47:53that is empty.
    • 1:47:53Some of which are pointing at the first node in a linked list.
    • 1:47:56So that's really what a hash table might be in your mind.
    • 1:47:59An amalgam of an array, whose elements are linked lists.
    • 1:48:03And in theory, this gives you the best of both worlds, right.
    • 1:48:06You get random access with high probability, right.
    • 1:48:09You get to jump immediately to the location you want to put someone.
    • 1:48:12But, if you run into this perverse situation where there's someone
    • 1:48:15already there, OK, fine.
    • 1:48:16It starts to devolve into a linked list, but it's at least 26
    • 1:48:20smaller length lists.
    • 1:48:21Not one massive linked list, which would be Big O of n.
    • 1:48:24And quite slow to solve.
    • 1:48:26So if Harry gets inserted in Hagrid.
    • 1:48:28Yeah, you have to chain them together, so to speak, in this way.
    • 1:48:32But, at least you've not painted yourself into a corner.
    • 1:48:35And in fact, if we fast forward and put a whole bunch of familiar names in,
    • 1:48:38the data structure starts to look like this.
    • 1:48:41So the chains not terribly long.
    • 1:48:43And some of them are actually of size 0 because there's just
    • 1:48:46some unpopular letters of the alphabet among these names.
    • 1:48:49But it seems better than just putting everyone
    • 1:48:51in one big array, or one big linked list.
    • 1:48:53We're trying to balance these trade offs a little bit in the middle here.
    • 1:48:58Well, how might we represent something like this?
    • 1:49:00Here's how we could describe this thing.
    • 1:49:02A node in the context of a linked list could be this.
    • 1:49:05I have an array called word of type char.
    • 1:49:08And it's big enough to fit the longest word in the alphabet plus 1.
    • 1:49:13And the plus 1 why, probably?
    • 1:49:14AUDIENCE: The null.
    • 1:49:15SPEAKER 1: The null character.
    • 1:49:16So I'm assuming that longest word is like a constant defined elsewhere
    • 1:49:19in the story.
    • 1:49:20And it's something big like 40, 100, whatever.
    • 1:49:22Whatever the longest word in the Harry Potter universe
    • 1:49:25is or the English dictionary is.
    • 1:49:28Longest word plus 1 should be sufficient to store any name in the story here.
    • 1:49:34And then, what else does it each of these nodes have?
    • 1:49:36Well it has a pointer to another node.
    • 1:49:40So here's how we might implement the notion of a node
    • 1:49:42in the context of storing not integers, but names.
    • 1:49:46Instead, like this.
    • 1:49:48But how do we decide what the hash table itself is?
    • 1:49:51Well, if we now have a definition of a node, we could have a variable in main,
    • 1:49:55or even globally, called hash table.
    • 1:49:57That itself is an array of node* pointers.
    • 1:50:02That is an array of pointers to nodes.
    • 1:50:05The beginnings of linked lists.
    • 1:50:07Number of buckets is to me.
    • 1:50:08I proposed, verbally, that it be 26.
    • 1:50:11But honestly, if you get a lot of collisions, so to speak.
    • 1:50:13A lot of H names trying to go to the same place.
    • 1:50:15Well, maybe, we need to be smarter and not just look
    • 1:50:17at the first letter of their name.
    • 1:50:19But, maybe, the first and the second.
    • 1:50:20So it's H-A and H-E. But wait, no, then Harry and Hagrid still collide.
    • 1:50:24But we start to at least make the problem a little less
    • 1:50:27impactful by tinkering with something like the number of buckets
    • 1:50:31in a hash table like this.
    • 1:50:32But how do we decide where someone goes in a hash table in this way?
    • 1:50:37Well, it's an old school problem of input and output.
    • 1:50:39The input to the problem is going to be something like the name.
    • 1:50:43And the algorithm in the middle, as of today,
    • 1:50:45is going to be something called a hash function.
    • 1:50:47A hash function is generally something that
    • 1:50:49takes as input, a string, a number, whatever, and produces
    • 1:50:53as output a location in our context.
    • 1:50:55Like a number 0 through 25.
    • 1:50:57Or 0 through 16,000.
    • 1:50:59Or whatever the number of buckets you want is,
    • 1:51:02it's going to just tell you where to put that input at a specific location.
    • 1:51:06So, for instance, Albus, according to the story thus far, gave me back to 0
    • 1:51:10as output.
    • 1:51:10Zacharias gave me 25.
    • 1:51:12So the hash function, in the middle of that black box,
    • 1:51:15is pretty simplistic in this story.
    • 1:51:17It's just looking at the Ascii value, it seems, of the first letter
    • 1:51:21in their name.
    • 1:51:22And then, subtracting off what capital A is 65.
    • 1:51:25So like doing some math to get back in number between 0 and 25.
    • 1:51:29So that's how we got to this point in the story.
    • 1:51:32And how might we, then, resolve the problem further and use
    • 1:51:37this notion of hashing more generally?
    • 1:51:39Well just for demonstration sake here, here's
    • 1:51:40actually some buckets, literally.
    • 1:51:43And we've labeled, in advance, these buckets with the suits
    • 1:51:46from a deck of cards.
    • 1:51:47So we've got some spades.
    • 1:51:49And we've got diamonds here.
    • 1:51:54And we've got, what else here?
    • 1:51:58Clubs and hearts.
    • 1:52:01So we have a deck of cards here, for instance, right.
    • 1:52:04And this is something you, yourself, might do instinctively
    • 1:52:07if you're getting ready to start playing a game of cards.
    • 1:52:09You're just cleaning up or you want things in order.
    • 1:52:11Like, here is literally a jumbo deck of cards.
    • 1:52:13What would be the easiest way for me to sort these things?
    • 1:52:16Well we've got a whole bunch of sorting algorithms from the past.
    • 1:52:19So I could go through like, here's the 3 of diamonds.
    • 1:52:21And I could, here let me throw this up on the screen.
    • 1:52:23Just so, if you're far in back.
    • 1:52:25So here's diamonds.
    • 1:52:27I could put this here.
    • 1:52:283, 4.
    • 1:52:30I could do this in order here.
    • 1:52:32But a lot of us, honestly, if given a deck of cards.
    • 1:52:34And you just want to clean it up and sort it in order,
    • 1:52:37you might do things like this.
    • 1:52:38Well here's my input, 3 of diamonds, let's put it in this bucket.
    • 1:52:424 of diamonds, this bucket.
    • 1:52:435 of diamonds, this bucket.
    • 1:52:45And if you keep going through the cards, here's seven of hearts, hearts bucket.
    • 1:52:498's bucket.
    • 1:52:51Queen of spades over here.
    • 1:52:53And it's still going to take you 52 steps.
    • 1:52:55But at the end of it, you have hashed all of the cards
    • 1:52:58into 4 distinct buckets.
    • 1:52:59And now you have problems of size 13, which
    • 1:53:02is a little more tenable than doing one massive 52 card problem.
    • 1:53:06You can now do 4, 13 size problems.
    • 1:53:08And so hashing is something that even you and I might do instinctively.
    • 1:53:11Taking as input some card, some name, and producing as output some location.
    • 1:53:16A temporary pile in which you want to stage things, so to speak.
    • 1:53:21But these collisions are inevitable.
    • 1:53:24And honestly, if we kept going through the Harry Potter universe,
    • 1:53:27some of these chains would get longer, and longer and longer.
    • 1:53:29Which means that instead of getting someone's name quickly,
    • 1:53:33by searching for them or inserting them, might
    • 1:53:36start taking a decent amount of time.
    • 1:53:37So what could we do instead to resolve situations like this?
    • 1:53:40If the problem, fundamentally, is that the first letter is just too darn
    • 1:53:44popular, H, we need to take in more input.
    • 1:53:47Not just the first letter but maybe the first 2 letters.
    • 1:53:49So if we do that, we can go from A through Z
    • 1:53:52to something more extreme like maybe H-A, H-B, H-C, H-D, H-F, and so forth.
    • 1:53:59So that now Harry and Hermione end up at different locations.
    • 1:54:02But, darn it, Hagrid still collides with Harry.
    • 1:54:05So it's better than before.
    • 1:54:07The chains aren't quite as long.
    • 1:54:09But the problem isn't fundamentally gone.
    • 1:54:11And in this case here, anyone know how many buckets we just
    • 1:54:14increased to, if we now look at not just a through Z but AA through ZZ, roughly?
    • 1:54:22AUDIENCE: 26 squared.
    • 1:54:24SPEAKER 1: Yeah.
    • 1:54:24OK, good.
    • 1:54:25So the easy answer to 26 squared are 676.
    • 1:54:28So that's a lot more buckets.
    • 1:54:30And this is why I only showed a few of them on the screen.
    • 1:54:33So that's a lot more.
    • 1:54:33And it spreads things out in particular.
    • 1:54:37What if we take this one step further?
    • 1:54:38Instead of H-A, we do like H-A-A, H-A-B, H-A-C, H-Z-Z, and so forth.
    • 1:54:44Well now, we have an even better situation.
    • 1:54:46Because Hermoine has her one spot.
    • 1:54:48Harry has his one spot.
    • 1:54:49Hagrid has his one spot.
    • 1:54:51But there's a trade off here.
    • 1:54:53The upside is now, arithmetically, we can find their locations
    • 1:54:57in constant time.
    • 1:54:58Maybe, technically 3 steps.
    • 1:55:00But 3 is constant, no matter how many other names are in here, it would seem.
    • 1:55:03But what's the downside here?
    • 1:55:07Sorry, say again.
    • 1:55:07AUDIENCE: Memory.
    • 1:55:08SPEAKER 1: Memory.
    • 1:55:09So significantly more.
    • 1:55:10We're now up to 17,576 buckets, which itself isn't that big a deal, right.
    • 1:55:15Computers have a lot of memory these days.
    • 1:55:17But as you can infer, I can't really think
    • 1:55:21of someone whose name started with H-E-Q, for instance, in the Harry
    • 1:55:26Potter universe.
    • 1:55:26And if we keep going, definitely don't know of anyone
    • 1:55:29whose name started with Z-Z-Z or A-A-A. There's
    • 1:55:32a lot of not useful combinations that have to be there mathematically,
    • 1:55:37so that you can do a bit of math and jump to randomly, so to speak,
    • 1:55:41the precise location.
    • 1:55:42But they're just going to be empty.
    • 1:55:43So it's a very sparsely populated array, so to speak.
    • 1:55:47So what does that really mean for performance, ultimately?
    • 1:55:50Well let's consider, again, in the context of our Big O notation.
    • 1:55:53It turns out that a hash table, technically speaking,
    • 1:55:56is still just going to give us Big O of n in the worst case.
    • 1:56:00Why?
    • 1:56:01If you have some crazy perverse case where everyone in the universe
    • 1:56:04has a name that starts with A, or starts with H, or starts with Z,
    • 1:56:07you just get really unlucky.
    • 1:56:09And your chain is massively long.
    • 1:56:11Well then, at that point, it's just a linked list.
    • 1:56:13It's not a hash table.
    • 1:56:14It's like the perverse situation with the tree, where
    • 1:56:16if you insert it without any mind for keeping it balance, it just evolves.
    • 1:56:22But there's a difference here between a theoretical performance
    • 1:56:26and an actual performance.
    • 1:56:28If you look back at the the hash table here,
    • 1:56:31this is absolutely, in practice, going to be faster than a single linked list.
    • 1:56:37Mathematically, asymptotically, big O notation, sure.
    • 1:56:40It's all the same.
    • 1:56:41Big O of n.
    • 1:56:42But if what we're really caring about is real humans using our software,
    • 1:56:46there's something to be said for crafting a data structure.
    • 1:56:48That technically, if this data were uniformly distributed,
    • 1:56:51is 26 times faster than a linked list alone.
    • 1:56:55And so, there's this tension too between systems, types of CS,
    • 1:57:00and theoretical CS.
    • 1:57:01Where yeah, theoretically, these are all the same.
    • 1:57:03But in practice, for making real-world software,
    • 1:57:06improving this speed by a factor of 26 in this case, let alone 576 or more,
    • 1:57:12might actually make a big difference.
    • 1:57:14But there's going to be a trade off.
    • 1:57:15And that's typically some other resource like giving up more space.
    • 1:57:19All right.
    • 1:57:20How about another data structure we could build.
    • 1:57:23Let me fast forward to something here called a trie.
    • 1:57:26So a trie, a weird name in pronunciation.
    • 1:57:28Short for retrieval, pronounced trie typically.
    • 1:57:31A trie is a tree that actually gives us constant time lookup,
    • 1:57:37even for massive data sets.
    • 1:57:41What do I mean by this?
    • 1:57:42In the world of a trie, you create a tree out of arrays.
    • 1:57:47So we're really getting into the Frankenstein territory
    • 1:57:49of just building things up with spare parts of data structures
    • 1:57:52that we have here.
    • 1:57:53But the root of a trie is, itself, an array.
    • 1:57:56For instance, of size 26.
    • 1:57:58Where each element in that trie points to another node,
    • 1:58:04which is to say another array.
    • 1:58:06And each of those locations in the array represents a letter
    • 1:58:09of the alphabet like A through Z.
    • 1:58:10So for instance, if you wanted to store the names of the Harry Potter universe,
    • 1:58:14not in a hash table, not in a linked list, not in a tree, but in a trie.
    • 1:58:19What you would do is hash on every letter in the person's name one
    • 1:58:23at a time.
    • 1:58:24So a trie is like a multi-tier hash table, in a sense.
    • 1:58:28Where you first look at the first letter,
    • 1:58:29then the second letter, then the third, and you do the following.
    • 1:58:32For instance, each of these locations represents a letter A
    • 1:58:35through Z. Suppose I wanted to insert someone's name into this
    • 1:58:39that starts with the letter H, like Hagrid for instance.
    • 1:58:43Well, I go to the location H. I see it's null,
    • 1:58:46which means I need to malloc myself another node or another array.
    • 1:58:49And that's depicted here.
    • 1:58:50Then, suppose I want to store the second letter in Hagrid's name,
    • 1:58:54an A. So I go to that location in the second node.
    • 1:58:57And I see, OK, it's currently null.
    • 1:58:58There's nothing below it.
    • 1:58:59So I allocate another node using malloc or the like.
    • 1:59:02And now I have H-A-G. And I continue this with R-I-D.
    • 1:59:06And then, when I get to the bottom of this person's name,
    • 1:59:10I just have to indicate here in color, but probably
    • 1:59:12with a Boolean value or something.
    • 1:59:14Like a true value that says, a name stops here.
    • 1:59:18So that it's clear that the person's name is not H-A, or H-A-G, or H-A-G-R,
    • 1:59:23or H-A-G-R-I. It's H-A-G-R-I-D. And the D is green,
    • 1:59:28just to indicate there's like some other Boolean value that just says, yes.
    • 1:59:31This is the node in which the name stops.
    • 1:59:35And if I continue this logic, here's how I might insert someone like Harry.
    • 1:59:40And here's how I might insert someone like Hermione.
    • 1:59:43And what's interesting about the design here is that some of these names
    • 1:59:48share a common prefix.
    • 1:59:49Which starts to get compelling because you're reusing space.
    • 1:59:52You're using the same nodes for names like H-A-G and H-A-R
    • 1:59:57because they share H and an A in common.
    • 2:00:00And they all share an H in common.
    • 2:00:02So you have this data structure now that, itself, is a tree.
    • 2:00:06Each node in the tree is, itself, an array.
    • 2:00:10And we, therefore, might implement this thing using code like this.
    • 2:00:13Every node is containing, I'll do it in reverse order, an array.
    • 2:00:19I'll call it children because that's what it really represents.
    • 2:00:21Up to 26 children for each of these nodes.
    • 2:00:24Size of the alphabet.
    • 2:00:25So I might have used just a constant for number 26,
    • 2:00:28to give myself 26 letters of the alphabet.
    • 2:00:30And each of those arrays stores that many node stars.
    • 2:00:34That many pointers to another node.
    • 2:00:36And here's an example of the Bool.
    • 2:00:38This is what I represented in green on the slide a moment ago.
    • 2:00:40I also need another piece of data.
    • 2:00:42Just a 0 or 1, a true or false, that says yes.
    • 2:00:45A name stops in this node or it's just a path to the rest of the person's name.
    • 2:00:50But the upside of this is that the height of this tree
    • 2:00:55is only as tall as the person's longest name.
    • 2:00:58H-A-G-R-I-D or H-E-R-M-O-I-N-E. And notice that no matter how many other
    • 2:01:04people are in this data structure, there's 3 at the moment,
    • 2:01:08if there were 3 million, it would still take me how many steps to search
    • 2:01:13for Hermoine?
    • 2:01:14H-E-R-M-I-O-N-E. So, 8 steps total.
    • 2:01:19No matter if there's 2 other people, 2 million, 10 million other people.
    • 2:01:24Because the path to her name is always on the same path.
    • 2:01:28And if you assume that there's a maximum limit on the length of names
    • 2:01:33in the human world.
    • 2:01:34Maybe it's 40, 100, whatever.
    • 2:01:36Whatever the longest name in the world is.
    • 2:01:38That's constant.
    • 2:01:39Maybe it's 40, 100, but that's constant.
    • 2:01:41Which is to say that with a trie, technically speaking,
    • 2:01:44it is the case that your lookup time, Big O of n, a big O notation,
    • 2:01:49would be big O of 1.
    • 2:01:51It's constant time, because unlike every other data structure
    • 2:01:54we've looked at, with a trie, the amount of time it takes you to find one person
    • 2:01:59or insert one person is completely independent of how
    • 2:02:02many other pieces of data are already in the data structure.
    • 2:02:07And this holds true even if one name is a prefix of another.
    • 2:02:09I don't think there was a Daniel or Danielle in the Harry Potter universe
    • 2:02:13that I could think of.
    • 2:02:14But, D-A-N-I-E-L could be one name.
    • 2:02:18And, therefore, we have a true there in green.
    • 2:02:20And if there's a longer name like Danielle.
    • 2:02:22Then, you keep going until you get to the E.
    • 2:02:24So you can still have with a trie, one name that's
    • 2:02:27a substring of another name.
    • 2:02:29So it's not as though we've created a problem there.
    • 2:02:32That, too, is still possible.
    • 2:02:34But at the end of the day, it only takes a finite number of steps
    • 2:02:36to find any of these people.
    • 2:02:38And again, that's what's particularly compelling.
    • 2:02:41That you effectively have constant time lookup.
    • 2:02:43So that's amazing, right.
    • 2:02:44We've gone through this whole story for weeks now of like, linear time.
    • 2:02:48And then, it went up to n squared.
    • 2:02:49And then, log n.
    • 2:02:50And now constant time, what's the price paid for a data structure like this?
    • 2:02:55This so-called trie?
    • 2:02:58What's the downside here?
    • 2:02:59There's got to be a catch.
    • 2:03:01And in fact, tries are not actually used that often,
    • 2:03:03amazing as they might sound on some CS level here.
    • 2:03:07AUDIENCE: Memory.
    • 2:03:08SPEAKER 1: Memory.
    • 2:03:09In what sense?
    • 2:03:10AUDIENCE: Much like a [INAUDIBLE].
    • 2:03:12SPEAKER 1: Exactly.
    • 2:03:13If you're storing all of these darn arrays
    • 2:03:15it's, again, a sparsely populated data structure.
    • 2:03:18And you can see it here.
    • 2:03:19Granted there's only 3 names, but most of those boxes, most of those pointers,
    • 2:03:23are going to remain null.
    • 2:03:25So this is an incredibly wide data structure, if you will.
    • 2:03:28It uses a huge amount of memory to store the names.
    • 2:03:31But again, you've got to pick a lane.
    • 2:03:32Either you're going to minimize space or you're going to minimize time.
    • 2:03:35It's not really possible to get truly the best of both worlds.
    • 2:03:39You have to decide where the inflection point is
    • 2:03:41for the device you're writing software for, how much memory it has,
    • 2:03:44how expensive it is.
    • 2:03:45And again, taking all of these things into account.
    • 2:03:48So lastly, let's do one further abstraction.
    • 2:03:51So even higher level to discuss something that are generally
    • 2:03:54known as abstract data structures.
    • 2:03:56It turns out we could spend like all day,
    • 2:03:58all week, talking about different things we
    • 2:04:00could build with these data structures.
    • 2:04:01But for the most part, now that we have arrays.
    • 2:04:03Now that we have linked lists or their cousin's trees, which
    • 2:04:06are 2-dimensional.
    • 2:04:07And beyond that, there's even graphs, where
    • 2:04:09the arrows can go in multiple directions, not just down, so to speak.
    • 2:04:12Now that we have this ability to stitch things together,
    • 2:04:14we can solve all different types of problems.
    • 2:04:16So, for instance, a very common type of data structure
    • 2:04:20to use in a program, or even our human world, are things called queues.
    • 2:04:24A queue being a data structure like a line outside of a store.
    • 2:04:28Where it has what's called a FIFO property.
    • 2:04:30First In, First Out.
    • 2:04:32Which is great for fairness, at least in the human world.
    • 2:04:34And if you've ever waited outside of Tasty Burger, or Salsa Fresca,
    • 2:04:38or some other restaurant nearby, presumably,
    • 2:04:40if you're queuing up at the counter, you want
    • 2:04:43them store to maintain a FIFO system.
    • 2:04:46First in and first out.
    • 2:04:47So that whoever's first in line gets their food first and gets out first.
    • 2:04:51So a queue is actually a computer science term, too.
    • 2:04:54And even if you're still in the habit of printing things on paper,
    • 2:04:57there are things you might have heard called printer
    • 2:04:59queues, which also do things in order.
    • 2:05:02The first person to send their essay to the printer
    • 2:05:04should, ideally, be printed before the last person
    • 2:05:06to send their essay to the printer.
    • 2:05:08Again, in the interest of fairness.
    • 2:05:10But how can you implement a queue?
    • 2:05:12Well, you typically have to implement 2 fundamental operations,
    • 2:05:15enqueue and dequeue.
    • 2:05:16So adding something to it and removing something from it.
    • 2:05:19And the interesting thing here is that how do you implement a queue?
    • 2:05:23Well in the human world, you would just have literally physical space
    • 2:05:26for humans to line up from left to right, or right to left.
    • 2:05:29Same in a computer.
    • 2:05:30Like a printer queue, if you send a whole bunch of jobs to be printed,
    • 2:05:33a whole bunch of essays or documents, well, you
    • 2:05:35need a chunk of memory like an array.
    • 2:05:37All right.
    • 2:05:37Well, if you use an array, what's a problem
    • 2:05:40that could happen in the world of printing, for instance?
    • 2:05:43If you use an array to store all of the documents that need to be printed.
    • 2:05:47AUDIENCE: It can be filled.
    • 2:05:48SPEAKER 1: It could be filled, right.
    • 2:05:49So if the programmer decided, HP or whoever makes the printer decides,
    • 2:05:53oh, you can send like a megabyte worth of documents to this printer at once.
    • 2:05:56At some point you might get an error message,
    • 2:05:58which says, sorry out of memory.
    • 2:06:00Wait a few minutes.
    • 2:06:00Which is maybe a reasonable solution, but a little annoy.
    • 2:06:03Or HP could write code that maybe dynamically resizes the array
    • 2:06:07or so forth.
    • 2:06:07But at that point, maybe they should just use a linked list.
    • 2:06:10And they could.
    • 2:06:11So there, too, you could implement the notion of a queue
    • 2:06:14using a linked list instead.
    • 2:06:16You're going to spend more memory, but you're not
    • 2:06:18going to run out of space in your array.
    • 2:06:20Which might be more compelling.
    • 2:06:22This happens even in the physical world.
    • 2:06:24You go to the store and you start having to line up outside and down the road.
    • 2:06:27And like, for a really busy store, they run out of space so they make do.
    • 2:06:31But in that case, it tends to be more of an array just because
    • 2:06:34of the physical notion of humans lining up.
    • 2:06:36But there's other data structures, too.
    • 2:06:38If you've ever gone to the dining hall and picked up like a Harvard or Yale
    • 2:06:41tray, you're typically picking up the last tray that was just cleaned,
    • 2:06:46not the first tray that was cleaned.
    • 2:06:48Why?
    • 2:06:49Because these cafeteria trays stack up on top of each other.
    • 2:06:53And indeed a stack is another type of abstract data structure.
    • 2:06:56In the physical world, it's literally something physical
    • 2:06:58like a stack of trays.
    • 2:07:01Which have what we would call a LIFO property.
    • 2:07:03Last In, First Out.
    • 2:07:05So as these things come out of the washer,
    • 2:07:07they're putting the most recent ones on the top.
    • 2:07:09And then you, the human, are probably taking the most recently cleaned one.
    • 2:07:13Which means in the extreme, no one on campus
    • 2:07:15might ever use that very first tray.
    • 2:07:19Which is probably fine in the world of trays,
    • 2:07:21but would really be bad in the world of Tasty Burger lining up for food if LIFO
    • 2:07:24were the property being implemented.
    • 2:07:26But here, too, it could be an array.
    • 2:07:28It could be a linked list.
    • 2:07:29And you see this, honestly, every day.
    • 2:07:31If you're using Gmail and your Gmail inbox.
    • 2:07:33That is actually a stack, at least by default,
    • 2:07:36where your newest message last in are the first ones
    • 2:07:39at the top of the screen.
    • 2:07:40That's a LIFO data structure.
    • 2:07:42And it means that you see your most recent emails.
    • 2:07:44But if you have a busy day, you're getting a lot of emails,
    • 2:07:47it might not be a good thing.
    • 2:07:48Because now you're ignoring the people who wrote you
    • 2:07:50way earlier in the day or the week.
    • 2:07:53So LIFO and FIFO are just properties that you
    • 2:07:55can achieve with these very specific types of data structures.
    • 2:07:58And the parliaments in the world of stacks
    • 2:08:00is to push something onto a stack or pop something out.
    • 2:08:03These are here, for instance, as an example of why
    • 2:08:06might you always wear the same color.
    • 2:08:07Well, if you're storing all of your clothes in a stack,
    • 2:08:09you might not ever get to the different colored
    • 2:08:11clothes at the bottom of the list.
    • 2:08:12And in fact, to paint this picture, we have a couple of minute video here.
    • 2:08:17Just to paint this here, made by a faculty member elsewhere.
    • 2:08:20Let's go ahead and dim the lights for just a minute or 2 here.
    • 2:08:23So that we can take a look at Jack learning some facts.
    • 2:08:27[VIDEO PLAYING]
    • 2:08:28SPEAKER 2: Once upon a time, there was a guy named Jack.
    • 2:08:31When it came to making friends Jack did not have the knack.
    • 2:08:34So Jack went to talk to the most popular guy he knew.
    • 2:08:37He went up to Lou and asked, what do I do?
    • 2:08:40Lou saw that his friend was really distressed.
    • 2:08:42Well, Lou began, just look how you're dressed.
    • 2:08:45Don't you have any clothes with a different look?
    • 2:08:48Yes, said Jack.
    • 2:08:49I sure do.
    • 2:08:50Come to my house and I'll showed them to you.
    • 2:08:52So they went off the Jack's.
    • 2:08:54And Jack showed Lou the box, where he kept all his shirts, and his pants,
    • 2:08:57at his socks.
    • 2:08:58Lou said, I see you have all your clothes in a pile.
    • 2:09:01Why don't you wear some others once in a while?
    • 2:09:04Jack said, well, when I remove clothes and socks,
    • 2:09:07I wash them and put them away in the box.
    • 2:09:10Then comes the next morning and up I hop.
    • 2:09:12I go to the box and get my clothes off the top.
    • 2:09:15Lou quickly realized the problem with Jack.
    • 2:09:18He kept clothes, CDs, and books in a stack.
    • 2:09:21When he'd reached for something to read or to wear,
    • 2:09:23he chose a top book or underwear.
    • 2:09:26Then when he was done he would put it right back.
    • 2:09:28Back it would go on top of the stack.
    • 2:09:31I know the solution, said a triumphant Lou.
    • 2:09:33You need to learn to start using a queue.
    • 2:09:36Lou took Jack's clothes and hung them in a closet.
    • 2:09:39And when he had emptied the box, he just tossed it.
    • 2:09:42Then he said, now Jack, at the end of the day, put your clothes on the left
    • 2:09:45when you put them away.
    • 2:09:47Then tomorrow morning when you see the sunshine, get
    • 2:09:50your clothes from the right, from the end of the line.
    • 2:09:52Don't you see, said Lou, it will be so nice.
    • 2:09:55You'll wear everything once before you wear something twice.
    • 2:09:59And with everything in queues in his closet and shelf,
    • 2:10:02Jack started to feel quite sure of himself.
    • 2:10:04All thanks to Lou and his wonderful queue.
    • 2:10:09SPEAKER 1: So just to help you realize that these things are everywhere.
    • 2:10:12[AUDIENCE CLAPPING]
    • 2:10:14Even in our human world.
    • 2:10:16If you've ever lined up at this place.
    • 2:10:18Anyone recognize this?
    • 2:10:19OK, so sweetgreen, little salad place in the square.
    • 2:10:22This is if you order online or in advance,
    • 2:10:24your food ends up according to the first letter in your name.
    • 2:10:27Which actually sounds awfully reminiscent of something
    • 2:10:29like a hash table.
    • 2:10:30And in fact, no matter whether you implement a hash table like we
    • 2:10:33did, with an array and linked list.
    • 2:10:35Or with 3 shelves like this.
    • 2:10:37This is actually an abstract data type called a dictionary.
    • 2:10:40And a dictionary, just like in our human world, has keys and values.
    • 2:10:43Words and their definitions.
    • 2:10:45This just has letters of the alphabet and salads as their value.
    • 2:10:49But here, too, there's a real world constraint.
    • 2:10:52In what kind of scenario does this system at sweetgreen
    • 2:10:55devolve into a problem, for instance?
    • 2:10:58Because they, too, are using only finite space, finite storage.
    • 2:11:02What could go wrong?
    • 2:11:03Yeah.
    • 2:11:03AUDIENCE: Run out of space.
    • 2:11:04SPEAKER 1: Yeah.
    • 2:11:04If they run out of space on the shelf and there's
    • 2:11:05a lot of people whose names start with D, or E, or whatever.
    • 2:11:08And so, they just pile up.
    • 2:11:09And then, maybe, they kind of overflow into the E's or the F's.
    • 2:11:11And they probably don't really care because any human
    • 2:11:13is going to come by, and just eyeball it, and figure it out anyway.
    • 2:11:16But in the world of a computer, you're the one coding
    • 2:11:18and have to be ever so precise.
    • 2:11:20We thought we would lastly do one final thing here.
    • 2:11:24In advance, we prepared a linked list of sorts in the audience.
    • 2:11:28Since this has become a bit of a thing.
    • 2:11:29I am starting to represent the beginning of this linked list.
    • 2:11:32And so far as I have a pointer here with seat location G9.
    • 2:11:37Whoever is in G9, would you mind standing up?
    • 2:11:40And what letter is on your sheet there?
    • 2:11:43AUDIENCE: F15.
    • 2:11:44SPEAKER 1: OK, so you have S15 and your letter--
    • 2:11:46AUDIENCE: F15.
    • 2:11:47SPEAKER 1: Say again?
    • 2:11:48AUDIENCE: F.
    • 2:11:48SPEAKER 1: F15.
    • 2:11:49So I see you're holding a C in your node.
    • 2:11:51You are pointing to, if you could physically, F15.
    • 2:11:55F15, what do you hold?
    • 2:11:56AUDIENCE: S.
    • 2:11:57SPEAKER 1: You have an S. And who should you be pointing at?
    • 2:12:00AUDIENCE: F5.
    • 2:12:01SPEAKER 1: F5.
    • 2:12:01Could you stand up, F5.
    • 2:12:03You're holding a 5, I see.
    • 2:12:04What address?
    • 2:12:06AUDIENCE: F12.
    • 2:12:07SPEAKER 1: F12.
    • 2:12:08Big finale.
    • 2:12:08F12, if you'd like to stand up holding a 0 and null, which means that was CS50.
    • 2:12:13[AUDIENCE CLAPPING]
    • 2:12:16All right.
    • 2:12:17We'll see you next time.
    • 2:12:19[MUSIC PLAYING]
  • CS50.ai
Shortcuts
Before using a shortcut, click at least once on the video itself (to give it "focus") after closing this window.
Play/Pause spacebar or k
Rewind 10 seconds left arrow or j
Fast forward 10 seconds right arrow or l
Previous frame (while paused) ,
Next frame (while paused) .
Decrease playback rate <
Increase playback rate >
Toggle captions on/off c
Toggle mute m
Toggle full screen f or double-click video