CS50 Video Player
    • 🧁

    • 🍪

    • 🍈

    • 🍿
    • 0:00:00Introduction
    • 0:00:50Week 3 Recap
    • 0:05:55memory.c
    • 0:19:48ddb50
    • 0:21:40Week 3 Recap, continued
    • 0:31:07struct.h
    • 0:32:37struct0.c
    • 0:35:03Linked Lists
    • 0:45:22typedef
    • 0:50:23Linked Lists on Stage
    • 1:02:24list0.c
    • 1:08:44list1.c
    • 1:25:26list2.c
    • 1:40:21Hash Tables
    • 1:50:05Trees
    • 1:54:50Tries
    • 0:00:00[MUSIC PLAYING]
    • 0:00:50DAVID J. MALAN: All right, this is CS50 and this is a lecture four.
    • 0:00:54So we're here in beautiful Lowell Lecture Hall
    • 0:00:56and Sanders is in use today.
    • 0:00:57And we're joined by some friends that will soon
    • 0:00:59be clear and present in just a moment.
    • 0:01:02But before then, recall that last time we took a look at CS50 IDE.
    • 0:01:06This was a new web-based programming environment similar in spirit
    • 0:01:08to CS50 Sandbox and CS50 Lab, but added a few features.
    • 0:01:12For instance, what features did it add to you--
    • 0:01:16to your capabilities?
    • 0:01:17Yeah?
    • 0:01:18AUDIENCE: Debugger.
    • 0:01:19DAVID J. MALAN: What's that?
    • 0:01:19AUDIENCE: The debugger.
    • 0:01:20DAVID J. MALAN: The debugger.
    • 0:01:20So debug50, which opens that side panel that
    • 0:01:23allows you to step through your code, step by step, and see variables.
    • 0:01:26Yeah?
    • 0:01:26AUDIENCE: Check50.
    • 0:01:27DAVID J. MALAN: Sorry, say again?
    • 0:01:28AUDIENCE: Check50.
    • 0:01:28DAVID J. MALAN: Check50 as well, which is a CS50 specific tool that
    • 0:01:31allows you to check the correctness of your code
    • 0:01:33much like the teaching fellows would when providing feedback on it.
    • 0:01:36Running a series of tests that pretty much are
    • 0:01:38the same tests that a lot of the homework's
    • 0:01:40will encourage you yourself to run manually,
    • 0:01:42but it just automates the process.
    • 0:01:43And anything else?
    • 0:01:44AUDIENCE: [INAUDIBLE]
    • 0:01:51DAVID J. MALAN: So that is true too.
    • 0:01:52There's a little hidden Easter egg that we don't use this semester,
    • 0:01:55but yes indeed.
    • 0:01:56If you look for a small puzzle piece, you
    • 0:01:58can actually convert your C code back to Scratch like puzzle pieces
    • 0:02:02and back and forth, and back to forth, thanks to Kareem and some of the team.
    • 0:02:06So that is there, but by now, it's probably better
    • 0:02:08to get comfortable with text as well.
    • 0:02:09So there's a couple of our other tools that we've
    • 0:02:11used over time of course besides check50 and debug50.
    • 0:02:15We've of course used printf and when is printf useful?
    • 0:02:20Like when might you want to use it beyond needing to just print something
    • 0:02:24because the problem set tells you to.
    • 0:02:25Yeah?
    • 0:02:25AUDIENCE: To find where your bug is.
    • 0:02:27DAVID J. MALAN: Yeah, so to find where your bug is.
    • 0:02:29If you just, kind of, want to print out variables, value or some kind of text
    • 0:02:32so you know what's going on and you don't necessarily
    • 0:02:35want to deploy debug50, you can do that.
    • 0:02:37When else?
    • 0:02:37AUDIENCE: If you have a long formula for something [INAUDIBLE]
    • 0:02:41and you want to see [INAUDIBLE].
    • 0:02:43DAVID J. MALAN: Good.
    • 0:02:44Yeah.
    • 0:02:44AUDIENCE: How running-- like going through debug50 50 times.
    • 0:02:48DAVID J. MALAN: Indeed.
    • 0:02:49Well, in real life-- so you might want to use printf
    • 0:02:51when you have maybe a nested loop, and you want to put a printf inside loop
    • 0:02:55so as to see when it kicks in.
    • 0:02:57Of course, you could use debug50, but you
    • 0:02:59might end up running debug50 or clicking next, next, next, next, next, next,
    • 0:03:02next so many times that gets a little tedious.
    • 0:03:04But do keep in mind, you can just put a breakpoint deeper into your code
    • 0:03:08as well and perhaps remove an earlier breakpoint as well.
    • 0:03:10And honestly, all the time, whether it's in C or other languages,
    • 0:03:13do I find myself occasionally using printf just to type out printf in here
    • 0:03:18just so that I can literally see if my code got to a certain point in here
    • 0:03:22to see if something's printed.
    • 0:03:23But the debugger you're going to find now
    • 0:03:25and hence forth so much more powerful, so much more versatile.
    • 0:03:28So if you haven't already gotten to the habit of using debug50 by all
    • 0:03:31means start and use those breakpoints to actually walk through your code
    • 0:03:34where you care to see what's going on.
    • 0:03:36So style50, of course, checks the style of your code much like the teaching
    • 0:03:39fellows might, and it shows you in red or green
    • 0:03:41what spaces you might want to delete, what spaces you might
    • 0:03:44want to add just to pretty things up.
    • 0:03:45So it's more readable for you and others.
    • 0:03:47And then what about help50?
    • 0:03:49When should you instinctively reach for help50?
    • 0:03:52AUDIENCE: When you don't understand an error message.
    • 0:03:55DAVID J. MALAN: Exactly.
    • 0:03:56Yeah, when you don't understand an error message.
    • 0:03:58So you're compiling something.
    • 0:03:59You're running a command.
    • 0:03:59It doesn't really quite work and you're seeing a cryptic error message.
    • 0:04:02Eventually, you'll get the muscle memory and the sort of exposure
    • 0:04:05to just know, oh, I remember what that means.
    • 0:04:07But until then, run help50 at the beginning of that same command,
    • 0:04:10and it's going to try to detect what your error is
    • 0:04:13and provide TF-like feedback on how to actually work around that.
    • 0:04:17You'll see two on the course's website is a wonderful handout made
    • 0:04:22by Emily Hong, one of our own teaching fellows,
    • 0:04:24that introduces all of these tools, and a few more,
    • 0:04:26and gets you into the habit of thinking about things.
    • 0:04:29It's kind of a flow chart.
    • 0:04:30If I have this problem, then do this or else
    • 0:04:32if I have this problem do this other thing.
    • 0:04:34So to check that out as well.
    • 0:04:36But today, let's introduce really the last, certainly for C,
    • 0:04:39of our command line tools that's going to help
    • 0:04:41you chase down problems in your code.
    • 0:04:44Last week, recall that we had talked about memory a lot.
    • 0:04:47We talked about malloc, allocating memory,
    • 0:04:49and we talked about freeing memory and the like.
    • 0:04:51But it turns out, you can do a lot of damage
    • 0:04:53when you start playing with memory.
    • 0:04:55In fact, probably by now, almost everyone-- segmentation fault?
    • 0:04:58[LAUGHTER]
    • 0:04:59Yeah, so that's just one of the errors that you might run into,
    • 0:05:02and frankly, you might have errors in your code now
    • 0:05:06and hence forth that have bugs but you don't even realize it
    • 0:05:08because you're just getting lucky.
    • 0:05:10And the program is just not crashing or it's not freezing,
    • 0:05:13but this can still happen.
    • 0:05:14And so Valgrind is a command line program that is probably
    • 0:05:17looks the scariest of the tools we've used,
    • 0:05:19but you can also use it with help50, that
    • 0:05:21just tries to find what are called memory leaks in your program.
    • 0:05:24Recall that last week we introduced malloc,
    • 0:05:26and malloc lets you allocate memory.
    • 0:05:28But if you don't free that memory, by literally calling the free function,
    • 0:05:32you're going to constantly ask your operating system, MacOS, Linux,
    • 0:05:35Windows, whatever, can I have more memory?
    • 0:05:37Can I have more memory?
    • 0:05:38Can I have more memory?
    • 0:05:39And if you never, literally, hand it back by calling free your computer
    • 0:05:42may very well slow down or freeze or crash.
    • 0:05:45And frankly, if you've ever had that happen on your Mac or PC, very likely
    • 0:05:49that's what some human accidentally did.
    • 0:05:50He or she just allocated more and more memory
    • 0:05:53but never really got around to freeing that memory.
    • 0:05:56So Valgrind can help you find those mistakes before you or your users do.
    • 0:05:59So let's do a quick example, let me go CS50 IDE, and let me go ahead
    • 0:06:04and make one new program here.
    • 0:06:07We'll call it memory.c because we'll see later today how
    • 0:06:10I might chase down those memory leaks.
    • 0:06:12But for now, let's start with something even simpler, which all of you
    • 0:06:15may be done by now, which is to accidentally touch memory
    • 0:06:18that you shouldn't, changing it, reading it and let's see what this might mean.
    • 0:06:21So let me do the familiar at the top here.
    • 0:06:25Include standard IO.
    • 0:06:28Well, let's not even do that yet.
    • 0:06:30Let's just do this first.
    • 0:06:31Let's do int, main(void), just to start a simple program
    • 0:06:34and in here let me go ahead and just call a function called f.
    • 0:06:38I don't really care what its name is for today.
    • 0:06:40I just want to call a function f, and then that's it.
    • 0:06:43Now this function f, let me go ahead and define it as follows, void f(void).
    • 0:06:47It's not going to do much of anything at all.
    • 0:06:50But let's suppose, just for the sake of discussion, that f's purpose in life
    • 0:06:53is just to allocate memory for whatever useful purpose,
    • 0:06:55but for now it's just for demonstration's sake.
    • 0:06:58So what's the function with which you can allocate memory?
    • 0:07:01AUDIENCE: Malloc.
    • 0:07:02DAVID J. MALAN: Malloc.
    • 0:07:03So suppose I want malloc space for, I don't know,
    • 0:07:06something simple like just one integer.
    • 0:07:08We're just doing this for demonstration purposes,
    • 0:07:10or actually let's do more, 10 integers, 10 integers.
    • 0:07:13I could, of course, do-- well, give me 10, but how many bytes do what I want?
    • 0:07:17How many bytes do I need for 10 integers?
    • 0:07:19AUDIENCE: sizeof(int).
    • 0:07:20DAVID J. MALAN: Yeah, so I can do literally sizeof(int)
    • 0:07:23and most likely the size of an int is going to be?
    • 0:07:28AUDIENCE: Four.
    • 0:07:28DAVID J. MALAN: Four, probably.
    • 0:07:30On many systems today, it's just 4 bytes or 32 bits,
    • 0:07:32but you don't want to hard code that lest someone else's computer not use
    • 0:07:36those same values.
    • 0:07:37So the size of an int.
    • 0:07:38So 10 times the size of an int.
    • 0:07:39Malloc returns what type of data?
    • 0:07:42What does that hand me back?
    • 0:07:45AUDIENCE: [INAUDIBLE]
    • 0:07:46DAVID J. MALAN: Yeah, returns an address or a pointer.
    • 0:07:49Specifically, the address, 100, 900, whatever, of the chunk of memory
    • 0:07:53it just allocated for you.
    • 0:07:55So if I want to keep that around, I need to declare a pointer.
    • 0:07:58Let's just call it x for today that stores that address.
    • 0:08:01Could call it x, y, z, whatever, but it's not an int that it's returning.
    • 0:08:04It's the address of an int.
    • 0:08:05And remember, that's what the star operator now means.
    • 0:08:07The address of some data type.
    • 0:08:10It's just a number.
    • 0:08:11All right, so now if I were to--
    • 0:08:14first, let's clean this up.
    • 0:08:15Turns out that you use malloc, I need to use stdlib.h.
    • 0:08:20We saw that last week, albeit briefly, and then of course
    • 0:08:22if I'm going to call f, what do I have to do to fix this code?
    • 0:08:26AUDIENCE: You need to declare.
    • 0:08:27DAVID J. MALAN: Yeah, I need to declare it up here,
    • 0:08:29or I could just move f's implementation up top.
    • 0:08:32So I think this works, even though this program at the moment
    • 0:08:34is completely stupid.
    • 0:08:35It doesn't do anything useful, but it will allocate memory.
    • 0:08:38And I'll do something with it as follows.
    • 0:08:41If I want to change the first value in this chunk of memory,
    • 0:08:45well how might I do that?
    • 0:08:46Well, I've asked the computer for 10 integers or rather space
    • 0:08:51for 10 integers.
    • 0:08:52What's interesting about malloc is that when
    • 0:08:54it returns a chunk of memory for you it's contiguous, back-to-back.
    • 0:08:58And when you hear contiguous or back-to-back,
    • 0:09:00what kind of data structure does that recall to mind?
    • 0:09:03AUDIENCE: An array.
    • 0:09:04DAVID J. MALAN: An array.
    • 0:09:05So it turns out we can treat this just random chunk of memory
    • 0:09:09like it's an array.
    • 0:09:10So if we want to go to the first location in that array of memory,
    • 0:09:14I can just do this and put in the number say 50.
    • 0:09:18Or if I want to go to the next location, I can do this.
    • 0:09:21Or if I want to do the next location, I can do this.
    • 0:09:24Or if I want to go to the last location, I might do this,
    • 0:09:27but is that good or bad?
    • 0:09:32AUDIENCE: Bad.
    • 0:09:32DAVID J. MALAN: Why bad?
    • 0:09:33AUDIENCE: It's-- it's out of bounds
    • 0:09:35DAVID J. MALAN: Yeah, so it's out of bounds.
    • 0:09:36Right?
    • 0:09:37This is sort of week one style mistakes when it came to loops.
    • 0:09:39Recall, with for loops or while loops, you might go a little too far,
    • 0:09:42and that's fine.
    • 0:09:43But now we actually will see we have a tool that
    • 0:09:45can help us notice these things.
    • 0:09:47So hopefully, just visually, it's apparent that what I have going on here
    • 0:09:50is just-- on line 12, I have a variable x
    • 0:09:54that storing the address of that chunk of memory.
    • 0:09:56And then on line 13, I'm just trying to access location 10
    • 0:10:00and set the value 50 there.
    • 0:10:02But as you note, there is no location 10.
    • 0:10:04There's location 0, 1, 2, 3, all the way through 9, of course.
    • 0:10:08So how might we detect this with a program?
    • 0:10:10Well, let me go ahead and increase my terminal window just a bit
    • 0:10:13here, save my file, and let me go ahead and compile make memory.
    • 0:10:17OK, all is well.
    • 0:10:18It compiled without any error messages, and now
    • 0:10:20let me go ahead and run memory, enter.
    • 0:10:24All right, so that worked pretty well.
    • 0:10:25Let's actually be a little more explicit here just for good measure.
    • 0:10:29Let me go ahead and print something out.
    • 0:10:30So printf, %i for an integer, and let's make it just more explicit.
    • 0:10:36You inputted %i and then comma x bracket 10.
    • 0:10:42And what do I have to include you use printf?
    • 0:10:46AUDIENCE: stdio.h.
    • 0:10:47DAVID J. MALAN: Yeah, so stdio.
    • 0:10:48So let's just quickly add that, stdio.h, save.
    • 0:10:51All right, let me recompile this, make memory, enter.
    • 0:10:55And now let me go ahead and do ./memory.
    • 0:10:59Huh?
    • 0:11:00Feels like it's a correct program.
    • 0:11:02And yet, for a couple of weeks now we've been claiming that mm-hmm,
    • 0:11:05don't do that.
    • 0:11:06Don't go beyond the boundaries of your array.
    • 0:11:09So how do we reconcile this?
    • 0:11:10Feels like buggy code or at least we've told you it's buggy code,
    • 0:11:13and yet it's working.
    • 0:11:16Yeah?
    • 0:11:16AUDIENCE: [INAUDIBLE]
    • 0:11:19DAVID J. MALAN: That's a good way of putting it.
    • 0:11:21AUDIENCE: It's still very similar.
    • 0:11:23We want that.
    • 0:11:23DAVID J. MALAN: OK.
    • 0:11:24AUDIENCE: So we can theoretically--
    • 0:11:27it just created a program.
    • 0:11:29DAVID J. MALAN: Yeah, and I think if I heard you correctly,
    • 0:11:32you said C doesn't scream if you go too far?
    • 0:11:35AUDIENCE: Yeah.
    • 0:11:35DAVID J. MALAN: Yeah, OK.
    • 0:11:36So that's a good way of putting it.
    • 0:11:37Like, you can get lucky in C. And you can
    • 0:11:39do something that is objectively, pedagogically, like technically wrong,
    • 0:11:43but the computer's not going to crash.
    • 0:11:45It's not going to freeze because you just get lucky.
    • 0:11:46Because often, for performance reasons, when
    • 0:11:48you allocate space for 10 integers, you're
    • 0:11:51actually going to get a chunk of memory back
    • 0:11:53that's a little bigger than you need.
    • 0:11:54It's just not safe to assume that it's bigger than you need,
    • 0:11:58but you might just get lucky.
    • 0:11:59And you might end up having more memory that you can technically get away
    • 0:12:02with touching or accessing or changing, and the computer's not going to notice.
    • 0:12:05But that's not safe because on someone else's Mac or PC,
    • 0:12:08their computer might just be operating a little bit differently than yours,
    • 0:12:11and bam, that bug is going to bite them and not you.
    • 0:12:13And those are the hardest, most annoying bugs to chase down as some of you
    • 0:12:16might have experienced.
    • 0:12:17Right?
    • 0:12:18It works on your computer but not a friends or vise versa.
    • 0:12:21These are the kinds of explanations for that.
    • 0:12:23So Valgrind can help us track down even these most subtle errors.
    • 0:12:26The program seems to be working.
    • 0:12:28Check50 or tools like it might even assume
    • 0:12:30that it's working because it is printing the right thing,
    • 0:12:32but let's take a look at what this program Valgrind thinks.
    • 0:12:36Let me increase the size of the terminal window here,
    • 0:12:38and go ahead and type in Valgrind ./memory.
    • 0:12:41So same program name ./memory but I'm prefixing it with the name Valgrind.
    • 0:12:47All right?
    • 0:12:47Unfortunately, Valgrind is really quite ugly,
    • 0:12:50and it prints out a whole bunch of stuff here.
    • 0:12:52So let's take a look.
    • 0:12:53At the very top, you'll see all these numbers on the left,
    • 0:12:56and that's just an unfortunate aesthetic.
    • 0:12:58But we do see some useful information.
    • 0:13:00Invalid read of size 4 and then it has these cryptic
    • 0:13:03looking letters and numbers.
    • 0:13:05What are those?
    • 0:13:07They're just addresses and hexadecimal.
    • 0:13:09It doesn't really matter what they are, but Valgrind
    • 0:13:11can tell us where the memory is that's acting up suspiciously.
    • 0:13:15You can then see next to that, that Valgrind is pointing
    • 0:13:18to function f on memory. c 15th line.
    • 0:13:21So that's perhaps helpful, and then main on line 8
    • 0:13:24because that's the function that was called.
    • 0:13:26So Valgrind is actually kind of nice in that it's showing us all the functions
    • 0:13:29that you called from bottom up, much like the stack from last week.
    • 0:13:33And so something's going wrong line 15, and if we go back to that,
    • 0:13:37let's see line 15 was--
    • 0:13:39well, sure enough.
    • 0:13:41I'm actually trying to access that memory location
    • 0:13:43and frankly I did it on line 14 as well.
    • 0:13:46So hopefully fixing one or both of those will address this issue.
    • 0:13:49And notice here, this frankly just gets overwhelming pretty quickly.
    • 0:13:54And then, oh, 40 bytes in one block are definitely lost in lost record.
    • 0:13:58I mean, this is the problem with Valgrind, honestly.
    • 0:14:00It was written some years ago, not particularly user friendly,
    • 0:14:03but that's fine we have a tool to address this.
    • 0:14:05Let me go ahead and rerun Valgrind with help50,
    • 0:14:09enter, and see if we can't just assist with this.
    • 0:14:13All right, so still the same amount of black and white input but down here now
    • 0:14:16help50 is noticing, oh, I can help you with an invalid write of size 4.
    • 0:14:21So it's still at the same location, but this time--
    • 0:14:23or rather same file, memory.c but line 14.
    • 0:14:26And we propose, looks like you're trying to modify 4 bytes of memory that
    • 0:14:30isn't yours, question mark.
    • 0:14:32Did you try to store something beyond the bounds of an array?
    • 0:14:34Take a closer look at line 14 of memory.c.
    • 0:14:37So hopefully, even though Valgrind's output is crazy esoteric,
    • 0:14:40at least that yellow output will point you toward, ah, line 14.
    • 0:14:43I'm indeed touching 4 bytes, an integer, that shouldn't be.
    • 0:14:48And so let's go ahead and fix this.
    • 0:14:49If I go into my program, and I don't do this.
    • 0:14:53Let's change it to location 9, and location 9 here and save.
    • 0:14:57Then let me go ahead and rerun Valgrind without help50.
    • 0:15:02All right, progress except--
    • 0:15:05oops.
    • 0:15:05Nope, no progress.
    • 0:15:07I skipped the step.
    • 0:15:08Yeah, I didn't recompile it.
    • 0:15:10A little puzzled why I saw the same thing.
    • 0:15:12So now let's rerun Valgrind and here it seems to be better.
    • 0:15:18So I don't see that same error message up
    • 0:15:20at the very top like we did before, but notice here, 40 bytes in one blocks.
    • 0:15:25OK, that was bad grammar in the program, but are definitely
    • 0:15:29lost in loss record 1 of 1.
    • 0:15:30So I still don't quite understand that.
    • 0:15:32No big deal.
    • 0:15:33Let's go ahead and run help50 and see what the second of two errors
    • 0:15:36apparently is here.
    • 0:15:38So here it's highlighting those lines.
    • 0:15:4040 bytes and one blocks are definitely lost, and looks like your program
    • 0:15:43leaked 40 bytes of memory.
    • 0:15:45Did you forget the free memory that you allocated with malloc?
    • 0:15:48Take a closer look at line 13 of memory.c.
    • 0:15:51So in this case line 13 indeed has a call to malloc.
    • 0:15:54So what's the fix for this problem?
    • 0:15:57AUDIENCE: Free.
    • 0:15:58DAVID J. MALAN: Per help50 or your own intuition?
    • 0:16:00What do I have to add to this program?
    • 0:16:02AUDIENCE: Free.
    • 0:16:03AUDIENCE: Free.
    • 0:16:03Yeah, free, and where does that go?
    • 0:16:07Right here.
    • 0:16:08So we can free the memory.
    • 0:16:11Why would this be bad?
    • 0:16:13AUDIENCE: [INAUDIBLE]
    • 0:16:17DAVID J. MALAN: Exactly.
    • 0:16:18We're freeing the memory, which is like saying to the operating system,
    • 0:16:20I don't need this anymore.
    • 0:16:21And yet, two lines later we're using it again and again.
    • 0:16:24So bad.
    • 0:16:24We didn't do that mistake last week, but you should only
    • 0:16:27be freeing memory when, literally, you're
    • 0:16:29ready to free it up and give it back, which should probably
    • 0:16:31be at the end of the program.
    • 0:16:33So let me go ahead and re-save this, Open, up my terminal window,
    • 0:16:36recompile it this time, and now, let me run Valgrind one last time
    • 0:16:40without help50.
    • 0:16:42And still a little verbose, but zero errors, from zero contexts.
    • 0:16:47That sounds pretty good.
    • 0:16:48And moreover, it also explicitly says, all heap blocks were freed.
    • 0:16:52And recall that the heap, is that chunk of memory
    • 0:16:54that we drew visually up here, which is where malloc takes memory from.
    • 0:16:58So, done.
    • 0:16:59So this is kind of the mentality with which
    • 0:17:02to have when approaching the correctness of your code.
    • 0:17:05Like, it's one thing to run sample inputs, or run the program like I did.
    • 0:17:08All looked well.
    • 0:17:09It's one thing to run tools like check50, which we humans wrote.
    • 0:17:12But we too are fallible, certainly, and we might not think of anything.
    • 0:17:15And thankfully, smart humans have made tools, that at first glance,
    • 0:17:18might be a little hard to use.
    • 0:17:19Like debug 50, as is Valgrind now.
    • 0:17:21But they ultimately help you get your code 100% correct
    • 0:17:24without you having to struggle visually over just staring at the screen.
    • 0:17:28And we see this a lot in office hours, honestly.
    • 0:17:30A lot of students, to their credit, sort of reasoning through, staring
    • 0:17:33at the screen, just trying to understand what's going wrong,
    • 0:17:35but they're not taking any additional input other than the characters
    • 0:17:38on the screen.
    • 0:17:39You have so many tools that can feed you more and more hints along the way.
    • 0:17:43So do acquire those instincts.
    • 0:17:46Any questions on this?
    • 0:17:48Yeah?
    • 0:17:48AUDIENCE: Sir, if you had a main function that took arguments.
    • 0:17:53Would you run Valgrind with those arguments as well?
    • 0:17:57DAVID J. MALAN: Yes, indeed.
    • 0:17:58So Valgrind works just like debug 50, just like help50.
    • 0:18:02If you have command line arguments, just run them as usual,
    • 0:18:05but prefix your command with Valgrind, or maybe even help50 Valgrind,
    • 0:18:08to help one with the other.
    • 0:18:10Good question.
    • 0:18:10Other thoughts?
    • 0:18:11Yeah?
    • 0:18:12AUDIENCE: Where does the data go [INAUDIBLE]??
    • 0:18:18DAVID J. MALAN: Good question.
    • 0:18:19So at the end of the day, think about what's
    • 0:18:21inside the computer, which is just something like this.
    • 0:18:24So physically, it's obviously still there.
    • 0:18:26It's just being treated by the operating system--
    • 0:18:29Mac, OS, Windows, Linux, whatever, as like a pool of memory.
    • 0:18:32We keep drawing it as a grid that looks a little something like this.
    • 0:18:36So the operating systems job is to just keep track of which of those squares
    • 0:18:40is in use, thanks to malloc.
    • 0:18:42And which has been freed.
    • 0:18:43And so you can think of it as having little check
    • 0:18:44marks next to them saying, this is in use, this is in use,
    • 0:18:47these others are not in use.
    • 0:18:48So they just go back on the so-called free list into that pool of memory.
    • 0:18:53Good question.
    • 0:18:53If you take a higher level course on operating systems in fact,
    • 0:18:56or CS61 or 161 at Harvard, you'll actually build these kinds of things
    • 0:19:00yourself.
    • 0:19:01And implement tools like, malloc, yourself.
    • 0:19:03Yeah?
    • 0:19:03AUDIENCE: So why did we have to allocate memory in this case, and what happens
    • 0:19:07[INAUDIBLE]?
    • 0:19:07DAVID J. MALAN: Good question.
    • 0:19:08Why did we have to allocate memory in this case?
    • 0:19:10We did not.
    • 0:19:11This was purely, as mentioned, for demonstration purposes.
    • 0:19:14If we had some program in which we wanted
    • 0:19:16to allocate some amount of memory, then this is how we might do it.
    • 0:19:20However, a cleaner way to do all of this,
    • 0:19:24would have been to say, hey, computer, give me 10 integers like this,
    • 0:19:29and not have to worry about memory management.
    • 0:19:31And that's where we began in week one, just using arrays on the stack,
    • 0:19:35so to speak.
    • 0:19:36Not using malloc at all.
    • 0:19:37So the point is only, that once you start using malloc, and free,
    • 0:19:40and memory more generally, you take on more responsibilities
    • 0:19:43than we did in week one.
    • 0:19:46Good question.
    • 0:19:47And the others?
    • 0:19:49All right.
    • 0:19:49So, turns out, there's one more tool, in all seriousness.
    • 0:19:53This is the thing.
    • 0:19:55[? DDB50. ?] So debug 50 is an allusion to a very popular tool called, GDB 50,
    • 0:20:01[? Gnu ?] debugger.
    • 0:20:02It's an older tool that you won't use at the command line,
    • 0:20:05but it's what makes debug 50 work.
    • 0:20:07Turns out, there's a thing.
    • 0:20:08And there's an actual Wikipedia article that you
    • 0:20:10might have clicked on in my email last night, called rubber duck debugging.
    • 0:20:14And frankly, you don't have to go as all out, as excessive, as we did here,
    • 0:20:18but the purpose of this technique, of rubber duck debugging,
    • 0:20:20is to keep, literally, like a rubber duck on your shelf, or on your desk.
    • 0:20:24And when you have a bug and you don't have the luxury of a teaching fellow,
    • 0:20:27or a roommate who took CS50, or a more technical friend who can help walk you
    • 0:20:31through your code, literally, start walking through your code
    • 0:20:34verbally, talking to the duck saying, well, online 2, I'm declaring main,
    • 0:20:39and on line 3, I'm allocating space for an array.
    • 0:20:42And then, on line 4, I'm calling-- ah!
    • 0:20:44That's what I'm doing wrong.
    • 0:20:46So if any of you have ever had that kind of moment, whether in office hours,
    • 0:20:49or alone, where you're either talking in your head,
    • 0:20:51or you're talking through your code to someone else.
    • 0:20:53And here, she doesn't even have to respond.
    • 0:20:55You just hear yourself saying the wrong thing, or having that aha moment.
    • 0:21:01You can approximate that by just keeping one of these little guys on your desk,
    • 0:21:05and have that conversation.
    • 0:21:06And it's actually not as crazy sounding as it actually is.
    • 0:21:09It's that process of just talking through your code logically,
    • 0:21:12step by step, in a way that you can't necessarily do in your own mind.
    • 0:21:15At least I can't.
    • 0:21:16When you hear yourself say something wrong,
    • 0:21:18or that didn't quite follow logically, bam, you
    • 0:21:20can actually have that aha moment.
    • 0:21:22So on the way out today, by all means, take any one of these ducks.
    • 0:21:25That took quite a long, time for [? Colten ?] to lay out today.
    • 0:21:28And we'll have more at office hours in the weeks to come, if you would like.
    • 0:21:31So some of you might recall such a duck from [? Currier ?] House
    • 0:21:35last year too, which was a cousin of his as well.
    • 0:21:38All right.
    • 0:21:39So that is rubber duck debugging.
    • 0:21:41Now, last week, recall that we began to take off training wheels.
    • 0:21:44We'd use for a few weeks, the CS50 library.
    • 0:21:46And that's kind of in the past now.
    • 0:21:47That was just a technique, a tool, via which
    • 0:21:50we could get user input a little more pleasantly, than if we actually
    • 0:21:53started dealing with memory early on.
    • 0:21:55And we revealed last week that a "string", quote, unquote,
    • 0:21:58is just what, underneath the hood in C?
    • 0:22:02Say again.
    • 0:22:04An array of characters.
    • 0:22:05And even more specifically, it's a synonym S-T-R-I-N-G for what actual
    • 0:22:10data type?
    • 0:22:12char star, as we've called it.
    • 0:22:14So a char star is just the computer scientists
    • 0:22:16way of describing a pointer to a character,
    • 0:22:19or rather the address of a character, which
    • 0:22:21is functionally equivalent to saying an array of memory, or sequence of memory.
    • 0:22:26But it's kind of the more precise, more technical way of describing it.
    • 0:22:29And so now that we know that we have char stars underneath the hood, well,
    • 0:22:33where is all of that coming from?
    • 0:22:34Well, indeed, it maps directly to that memory.
    • 0:22:36We keep pointing out that something like this is inside of your computer.
    • 0:22:40And we can think of the memory as just being chunks of memory,
    • 0:22:43all of whose bytes are numbered.
    • 0:22:450 on up to 2 gigabytes, or 2 billion, whatever the value might be.
    • 0:22:49But of course last week, we pointed out that you think about this memory
    • 0:22:52not as being hardware per se, but as just being this pool of memory that's
    • 0:22:56divided into different regions.
    • 0:22:58The very top of your computer's memory, so to speak,
    • 0:23:00is what we call the text segment.
    • 0:23:02And what goes in the text segment of your computer's memory
    • 0:23:05when you're running a program?
    • 0:23:08Text is like, poor choice of words, frankly, but what is it?
    • 0:23:12Say again.
    • 0:23:13AUDIENCE: File Headers?
    • 0:23:14DAVID J. MALAN: Not the file headers, in this case.
    • 0:23:16This is in the context of running a program, not necessarily saving a file.
    • 0:23:19Yeah?
    • 0:23:20AUDIENCE: String literals.
    • 0:23:21DAVID J. MALAN: Not string literals here,
    • 0:23:23but they're nearby, actually, in memory.
    • 0:23:25AUDIENCE: Functions.
    • 0:23:26DAVID J. MALAN: Functions, closer.
    • 0:23:27Yeah.
    • 0:23:28The text segment of your computer's memory
    • 0:23:31is where, when you double click a program to run it,
    • 0:23:33or in Linux, when you do dot flash something, to run it.
    • 0:23:37That's where the zeros and ones of your actual program, the machine code,
    • 0:23:41that we talked about in week zero, is just loaded into RAM.
    • 0:23:44So recall from last week, that, you know, anything physical in this world--
    • 0:23:48hard drives, solid state drives, is slow.
    • 0:23:51So those devices are slow, but RAM, the stuff we keep pulling up on the screen,
    • 0:23:55is relatively fast.
    • 0:23:56If only because it has no moving parts.
    • 0:23:57It's purely electronic.
    • 0:23:58So when you double click a program on your Mac or PC,
    • 0:24:01or do dot slash something in Linux, that is
    • 0:24:03loading from a slow device, your hard drive,
    • 0:24:05where the data is stored long term, into RAM or memory,
    • 0:24:09where it can run much more quickly and pleasurably in terms of performance.
    • 0:24:14And so, what does this actually mean for us?
    • 0:24:16Well, it's got to go somewhere.
    • 0:24:18We just decided, humans, years ago that it's
    • 0:24:20going to go at the top, so to speak, of this chunk of memory.
    • 0:24:22Below that though, are the more dynamic regions of memory--
    • 0:24:25the stack and the heap.
    • 0:24:27And we said this a moment ago, and last week as well, what goes on the heap?
    • 0:24:31Or who uses the heap?
    • 0:24:33AUDIENCE: Dynamic memory.
    • 0:24:34DAVID J. MALAN: Dynamic memory.
    • 0:24:36Any time you call malloc, you're asking the operating system
    • 0:24:38for memory from the so-called heap.
    • 0:24:40Anytime you call free, you're sort of conceptually putting it back.
    • 0:24:43Like, it's not actually going anywhere.
    • 0:24:45You're just marking it as available for other functions and variables to use.
    • 0:24:49The stack, meanwhile, is used for what?
    • 0:24:53AUDIENCE: Local variables.
    • 0:24:54DAVID J. MALAN: Local variables and any of your functions.
    • 0:24:56So main, typically takes a sliver of memory at the bottom.
    • 0:24:59If main calls another function, it gets a sliver of memory above that.
    • 0:25:03If that function calls one, it gets a sliver of memory above that.
    • 0:25:06So they each have their own different regions of memory.
    • 0:25:08But of course, these arrows, both pointing at each other,
    • 0:25:11doesn't seem like such a good design.
    • 0:25:13But the reality, is bad things can happen.
    • 0:25:16You can allocate so much memory that, bam, the stack overflows the heap.
    • 0:25:20Or the heap overflows the stack.
    • 0:25:22Thus was born websites like Stack Overflow, and the like.
    • 0:25:25But that's just a reality.
    • 0:25:26If you have a finite amount of memory, at some point,
    • 0:25:28something's going to break.
    • 0:25:30Or the computer's going to have to say, mm-mm, no more memory.
    • 0:25:32You're going to have to quit some programs, or close some files,
    • 0:25:35or whatnot.
    • 0:25:36So that was only to say that that's how the memory is laid out.
    • 0:25:38And we started to explore this by way of a few programs.
    • 0:25:42This one here-- it's a little dark here.
    • 0:25:44This one here, was a swap function.
    • 0:25:46Now it's even darker.
    • 0:25:48It was a swap function that actually did swap two values, A and B.
    • 0:25:54But it didn't actually work in the way we intended.
    • 0:25:57What was broken about this swap function last week?
    • 0:26:02Like, I'm pretty sure it worked.
    • 0:26:04And when our brave volunteer came up and swapped the orange juice and the milk,
    • 0:26:08that worked.
    • 0:26:08So like, the logic was correct, but the program itself did not work.
    • 0:26:14Why?
    • 0:26:14AUDIENCE: It changed the values of the copy variables.
    • 0:26:17DAVID J. MALAN: Exactly.
    • 0:26:17It changed values in the copies of the variable.
    • 0:26:20So recall, that when main was the function
    • 0:26:22we called, and it had two values, x and y, that chunk of memory was here.
    • 0:26:26That chunk of memory was here.
    • 0:26:28And it had like the numbers 1 and 2.
    • 0:26:29But when it called the swap function, that got its own chunk of memory.
    • 0:26:33So main was at the bottom, swap was above that.
    • 0:26:35It had its own chunks of memory called, a and b, which
    • 0:26:38initially, got the values 1 and 2.
    • 0:26:401 and 2 were indeed successfully swapped,
    • 0:26:42but that had no effect on x and y.
    • 0:26:44So we fixed that.
    • 0:26:45With the newer version of this program, of course,
    • 0:26:47it looked a lot more cryptic at first glance, but in English,
    • 0:26:50could someone just describe what it is that happens
    • 0:26:53in this example that was more correct?
    • 0:26:56Like, what does this program do line by line?
    • 0:26:58Yeah?
    • 0:26:59AUDIENCE: Instead of passing copies of the variables,
    • 0:27:01you pass pointers to their addresses.
    • 0:27:03DAVID J. MALAN: Exactly.
    • 0:27:04Instead of passing the values of the variables, thereby copying them,
    • 0:27:06it passes the addresses of those variables.
    • 0:27:09So that's like saying, I don't technically care where it is in memory,
    • 0:27:13but I do need to know that it is somewhere in memory.
    • 0:27:15So instead of passing an x in the number 1,
    • 0:27:18let's suppose that x is at location 100--
    • 0:27:20my go to example.
    • 0:27:21It's actually the number 100 that's going to go there.
    • 0:27:24And if y is at the location like, 104, well, it's
    • 0:27:27104 that's going to go there, which are not the values we want to swap,
    • 0:27:31but those are sort of like little maps, or breadcrumbs if you will,
    • 0:27:34that lead us to the right location.
    • 0:27:36So that when we execute this code, what we're ultimately
    • 0:27:39swapping in those three lines, is this and this, and all along the way,
    • 0:27:43recall, we're using a temporary variable there
    • 0:27:45that can be just thrown away after.
    • 0:27:48So that's what pointers allowed us to do.
    • 0:27:50And that's what allowed us to actually change values on the so-called stack,
    • 0:27:54even by calling on other function.
    • 0:27:58All right.
    • 0:27:59Any questions then, on where we left off last time with the stack and with swap?
    • 0:28:05No?
    • 0:28:07All right.
    • 0:28:07So recall we introduced Binky as well, who lost his head at one point,
    • 0:28:11but why?
    • 0:28:13What went horribly, horribly awry with this scene from last week's film
    • 0:28:16from Stanford?
    • 0:28:20Binky was doing everything correctly, right?
    • 0:28:22Like, moving values.
    • 0:28:2342 was successful.
    • 0:28:24And then, yeah?
    • 0:28:25AUDIENCE: He tried to dereference something that
    • 0:28:27wasn't pointing to any actual address.
    • 0:28:31DAVID J. MALAN: Exactly.
    • 0:28:32He tried to dereference a pointer, an address, that wasn't actually pointing
    • 0:28:36to a valid address.
    • 0:28:37Recall that this was the line in code in question that was unlucky and bad.
    • 0:28:41Star y, means, go to the address in y, and do something to it.
    • 0:28:45Set it equal to the number 13.
    • 0:28:47But the problem was, that in the code we looked at last week,
    • 0:28:50all we did at the start was say, hey, computer give me a pointer to an int,
    • 0:28:54and call it x.
    • 0:28:55Do the same, and call it y.
    • 0:28:58Allocate space and point x at it.
    • 0:29:02But we never did the same for y.
    • 0:29:04So whereas x contained, last week, the address of an actual chunk of memory,
    • 0:29:08thanks to malloc, what did y contain at that point in the story?
    • 0:29:12The yellow line there.
    • 0:29:16What did y contain?
    • 0:29:17What value?
    • 0:29:21AUDIENCE: Null.
    • 0:29:22DAVID J. MALAN: Null.
    • 0:29:23Maybe.
    • 0:29:24Maybe.
    • 0:29:25But it's not obvious because there's no mention of null in the program.
    • 0:29:28We might get lucky.
    • 0:29:29Null is just 0.
    • 0:29:30And sometimes we've seen that 0 are the default values in a program.
    • 0:29:33So maybe.
    • 0:29:34But I say, maybe, and I'm hedging why.
    • 0:29:37AUDIENCE: [INAUDIBLE].
    • 0:29:39DAVID J. MALAN: Yeah.
    • 0:29:40And it doesn't allocate-- well, allocate, is not quite the right word.
    • 0:29:42That suggests you are allocating actual memory.
    • 0:29:44It's a garbage value.
    • 0:29:45There's something there.
    • 0:29:46Right?
    • 0:29:47My Mac has been running for a few hours.
    • 0:29:48And your Macs, and PCs, and phones, are probably running all day long.
    • 0:29:51Or certainly when the lid is up.
    • 0:29:52And so, your memory is getting used, and unused, and used.
    • 0:29:55Like, lots of stuff is going on.
    • 0:29:57So your computer is not filled with all zeros or all ones.
    • 0:30:00If you look at it at some random point in the day,
    • 0:30:02it's filled with like bunches and bunches of zeros and ones
    • 0:30:05from previous programs that you quit long ago.
    • 0:30:07Windows you have in the background and the like.
    • 0:30:09So, the short of it is, when you're running
    • 0:30:11a program for the first time, that's been running now for some time,
    • 0:30:15it's going to get messy.
    • 0:30:16That big rectangle of memory is going to have some ones over here
    • 0:30:18some zeros over here and vise versa.
    • 0:30:21So they're garbage values, because those bytes have some values in them.
    • 0:30:26You just don't necessarily know what they are.
    • 0:30:28So the point is, you should never ever dereference a pointer
    • 0:30:31that you have not set yourself.
    • 0:30:33Maybe you will crash.
    • 0:30:35Maybe it won't crash.
    • 0:30:36Valgrind can help you find these things but sometimes.
    • 0:30:38But it's just not a safe operation.
    • 0:30:41And lastly, the last thing we introduced last week,
    • 0:30:43which will be the stepping stone for what problems we'll solve this week,
    • 0:30:46was struct.
    • 0:30:47So struck is kind of cool, in that you can design your own custom data
    • 0:30:52structures.
    • 0:30:53C is pretty limited out of the box, so to speak.
    • 0:30:55You only have chars and boules, and floats, and ints, and doubles,
    • 0:30:59and longs, and str--
    • 0:31:00well, we don't even have strings, per se.
    • 0:31:02So it doesn't really come with many features, like a lot of languages do.
    • 0:31:05Like Python, which we'll see in a few weeks.
    • 0:31:07So with struct in C, you have the ability
    • 0:31:09to solve some problems of your own.
    • 0:31:11For instance, with the struct, we can actually
    • 0:31:15start to implement our own features.
    • 0:31:19Or our own data types.
    • 0:31:20For instance, let me go up here.
    • 0:31:22And let me go ahead and create a file called say,
    • 0:31:25student, or rather destruct dot h.
    • 0:31:28So recall that dot h is a header file.
    • 0:31:30Thus far, you have used header files that other people made.
    • 0:31:33Like, CS50 dot h, and standard IO dot h, and standard [? lid ?] dot h,
    • 0:31:36but you can make your own.
    • 0:31:38Header files are just files that typically contain code that you
    • 0:31:41want to share across multiple programs.
    • 0:31:43And we'll see more of this in time.
    • 0:31:45So let me go ahead and just save this file.
    • 0:31:46And suppose that I want to represent a student in memory.
    • 0:31:50A student of course, is probably going to have what?
    • 0:31:54For instance, how about a string for their name,
    • 0:31:59a string for their dorm-- but string is kind of two weeks ago.
    • 0:32:02Lets call this char star.
    • 0:32:04And lets call name, char star.
    • 0:32:07And so you might want to associate like, multiple pieces of data with students.
    • 0:32:11Right?
    • 0:32:11And you don't want to have multiple variables, per se.
    • 0:32:13It would be nice to kind of encapsulate these together.
    • 0:32:14And recall at the very end of last week, we
    • 0:32:16saw this feature where you can define your own type,
    • 0:32:20with typedef, that is a structure itself.
    • 0:32:23And you can give it a name.
    • 0:32:25So in short, simply by executing this these lines of code,
    • 0:32:29you have just created your own custom data type.
    • 0:32:31It's now called student.
    • 0:32:32And every student in the world shall have, per this code, a name
    • 0:32:36and a dorm associated with them.
    • 0:32:38Now, why is this useful?
    • 0:32:39Well the program, we looked at the very end of last time looked
    • 0:32:42a little something like this.
    • 0:32:43Instruct zero dot c, we had the following,
    • 0:32:48I first allocated some amount of space for student.
    • 0:32:52I asked the user what's the enrollment in the class or whatnot?
    • 0:32:54That gives us an int.
    • 0:32:56And then, we allocated an array of type student, called students, plural.
    • 0:33:01This was an alternative, recall, to doing something
    • 0:33:04like this, string names enrollment, and string dorms enrollment.
    • 0:33:10Which would work.
    • 0:33:11You could have two separate arrays, and you'd just
    • 0:33:13have to remember that name zero and dorm zero is the same human.
    • 0:33:17But why do that if you can keep things together.
    • 0:33:19So with structs, we were able to do this.
    • 0:33:21Give me this many student structures, and call the whole array, students.
    • 0:33:27And the only new syntax we introduce to satisfy this goal, was what operator?
    • 0:33:34AUDIENCE: The dot.
    • 0:33:35DAVID J. MALAN: The dot.
    • 0:33:36Yeah.
    • 0:33:36So in the past, recall from like week two, we introduced arrays.
    • 0:33:40And arrays allow you to do square bracket notation.
    • 0:33:42So that is no different from a couple of weeks back.
    • 0:33:45But if your array is not storing just integers, or chars, or floats,
    • 0:33:49or whatever, it's actually storing a structure, like a student,
    • 0:33:53you can get at that student's name by literally just saying dot name.
    • 0:33:57And you can get at their dorm by doing dot dorm.
    • 0:33:59And then everything else is the same.
    • 0:34:01This is what's called, encapsulation.
    • 0:34:03And it's kind of like a fundamental principle of programming
    • 0:34:05where, if you have some real world entity, like a student,
    • 0:34:08and you want to represent students with code, yeah,
    • 0:34:11you can have a bunch of arrays that all have called names, dorms, emails, phone
    • 0:34:16numbers, but that just gets messy.
    • 0:34:18You can instead encapsulate all of that related Information about a student
    • 0:34:22into one data structure so that now you have, per week zero, an abstraction.
    • 0:34:27Like, a student is an abstraction.
    • 0:34:30And if we break that abstraction, what is a student actually?
    • 0:34:34Not in the real world, but in our code world here?
    • 0:34:37Student is an abstraction.
    • 0:34:39It's a useful word, all of us can kind of agree means something,
    • 0:34:41but technically, what does it apparently mean?
    • 0:34:45A student is actually a name in a dorm, which really kind of is
    • 0:34:48diminutive to everyone in this room, but we've distilled it in code
    • 0:34:52to just those two values.
    • 0:34:53So there we have encapsulation.
    • 0:34:55You're kind of encapsulating together multiple values.
    • 0:34:57And you're abstracting away just have a more useful term,
    • 0:35:00because no one is going to want to talk in terms of lines of code
    • 0:35:02to describe anything.
    • 0:35:04So, same topic as in the past.
    • 0:35:05So, now we have the ability to come up with our own custom data structures
    • 0:35:10it seems.
    • 0:35:10That we can store anything inside of them that we want.
    • 0:35:13So let's now see how poorly we've been designing
    • 0:35:16some things for the past few weeks.
    • 0:35:19So it turns out that much of the code, hopefully
    • 0:35:22we've been writing in recent weeks has been correct,
    • 0:35:25but we've been not necessarily designing solutions in the best way.
    • 0:35:28Recall that when we have this chunk of memory,
    • 0:35:30we've typically treated it as at most, an array.
    • 0:35:34So just a contiguous chunk of memory.
    • 0:35:35And thanks to this very simple mental model, do we get strings,
    • 0:35:39do we get arrays of students now.
    • 0:35:42But arrays aren't necessarily the best data structure in the world.
    • 0:35:45Like, what is a downside of an array if you've encountered ones thus far.
    • 0:35:52In C, what's a downside of an array?
    • 0:35:54Yeah?
    • 0:35:55AUDIENCE: [INAUDIBLE].
    • 0:35:58DAVID J. MALAN: Can or cannot?
    • 0:35:59AUDIENCE: Cannot.
    • 0:36:00DAVID J. MALAN: You cannot.
    • 0:36:00That is true.
    • 0:36:01So in C, you cannot mix data types inside of an array.
    • 0:36:05They must all be ints, they must all be chars, they must all be students.
    • 0:36:09It's a bit of a white lie because technically, you
    • 0:36:11can have something called a void star, and you can actually map-- but yes.
    • 0:36:15That is true though, strictly speaking-- cannot mix data types.
    • 0:36:18Though frankly, even though other languages let you do that,
    • 0:36:20it's not necessarily the best design decision.
    • 0:36:22But sure, a limitation.
    • 0:36:23Other thoughts.
    • 0:36:24Yeah?
    • 0:36:24AUDIENCE: The size cannot change.
    • 0:36:26DAVID J. MALAN: The size cannot change.
    • 0:36:27Let's focus on that one.
    • 0:36:28Because that's sort of even more constraining it would seem.
    • 0:36:32So if you want an array for, say, two values, what do you do?
    • 0:36:37Well, you can do something like int, x, bracket, 2, semi-colon.
    • 0:36:41And what does that actually give you inside of your computer's memory?
    • 0:36:44It gives you some chunk that we'll draw a rectangle.
    • 0:36:47This is location 0.
    • 0:36:48This is location 1.
    • 0:36:49Suppose that, oh, a few minutes later, you change your mind.
    • 0:36:52Oh, darn, I just took a--
    • 0:36:54I want to type in a third value, or I want
    • 0:36:56to add another student to the array.
    • 0:36:58Where do you put that?
    • 0:37:00Well, you don't.
    • 0:37:01If you want to add a third value to an array of size 2,
    • 0:37:04what's your only option in C?
    • 0:37:06AUDIENCE: You make a new array.
    • 0:37:08DAVID J. MALAN: You make a new array.
    • 0:37:09So literally.
    • 0:37:09And if this array had the number like 42,
    • 0:37:13and this had the number 13, the only way to add a third number is to allocate
    • 0:37:17a second array, copy the values into the same locations, 42, 13, and then,
    • 0:37:23we'll add another value, 50.
    • 0:37:25And then, so that you're not using up twice as much space
    • 0:37:28almost permanently, now you can sort of free somehow,
    • 0:37:31or stop using that chunk of memory.
    • 0:37:33So that's fine.
    • 0:37:34It's correct what we just did.
    • 0:37:35But what's the running time of that process?
    • 0:37:40Recall a couple of weeks ago, we started talking about efficiency and design.
    • 0:37:43What's the running time of resizing an array.
    • 0:37:47AUDIENCE: Too long.
    • 0:37:48DAVID J. MALAN: Say Again.
    • 0:37:49AUDIENCE: I said, too long.
    • 0:37:50DAVID J. MALAN: Too long.
    • 0:37:51Fair.
    • 0:37:53But let's be more precise.
    • 0:37:54Big o of-- big o of what?
    • 0:38:01AUDIENCE: N.
    • 0:38:02DAVID J. MALAN: N. What's n?
    • 0:38:03AUDIENCE: [INAUDIBLE].
    • 0:38:05DAVID J. MALAN: OK.
    • 0:38:05True.
    • 0:38:05But what does n represent?
    • 0:38:06AUDIENCE: [INAUDIBLE].
    • 0:38:08DAVID J. MALAN: Yeah.
    • 0:38:09So you don't actually have to not know.
    • 0:38:10It's just a general answer.
    • 0:38:11In this case, however long the array is, call it n.
    • 0:38:14It is that many steps to resize it into that plus 1.
    • 0:38:18Technically it's big o, over n, plus 1.
    • 0:38:20But recall in our discussion, "The big o notation," we just
    • 0:38:22ignore the smaller terms-- the plus 1s, the divided by 2s, the plus n.
    • 0:38:26We focus only on the most powerful term in the expression, which
    • 0:38:30is just n here.
    • 0:38:31So yes, if you have an array of size 2, and you resize it
    • 0:38:35into an array of size 3, or really, n plus 1, that's
    • 0:38:38going to take me roughly n steps.
    • 0:38:40Technically n plus 1 steps.
    • 0:38:41But n steps.
    • 0:38:42Ergo big o of n.
    • 0:38:44So it's a linear process.
    • 0:38:45So possible but not necessarily the fastest
    • 0:38:48thing because he literally had to move all those damn values around.
    • 0:38:51So what would be better than this?
    • 0:38:56And if you've programed before, you might have the right instincts already.
    • 0:38:59How do we solve this problem?
    • 0:39:04Yeah?
    • 0:39:05AUDIENCE: Would you allocate more memory at the end of the array?
    • 0:39:07DAVID J. MALAN: Reallocate more memory at the end of the array.
    • 0:39:10So it turns out c does have a function called, realloc.
    • 0:39:15Perfectly, if not obviously, named that reallocates memory.
    • 0:39:19And if you pass it, the address of a chunk of memory you've allocated,
    • 0:39:23and the operating system notices, oh, yeah you got lucky.
    • 0:39:26I've got more memory at the end of this array,
    • 0:39:28it will then allocate that additional RAM for you, and let you use it.
    • 0:39:32Or worst case, if there's nothing available at the end
    • 0:39:34of the array in memory, because it's being
    • 0:39:36used by something else in your program.
    • 0:39:38That's fine.
    • 0:39:39Realloc will take on the responsibility of creating another array somewhere
    • 0:39:44in memory, copying all of that data for you into it,
    • 0:39:48and returning the address of that new chunk of memory.
    • 0:39:51Unfortunately, that's still linear.
    • 0:39:53Yeah?
    • 0:39:53AUDIENCE: Is this all being done in the heap?
    • 0:39:55Or--
    • 0:39:55DAVID J. MALAN: This is all being done in the heap.
    • 0:39:57Malloc, and realloc, and free, all operate on the heap.
    • 0:40:00Yes.
    • 0:40:01So that is a solution, but it doesn't really speak to the efficiency.
    • 0:40:04Yeah?
    • 0:40:05AUDIENCE: Could you use linked list?
    • 0:40:06DAVID J. MALAN: Yeah.
    • 0:40:07What is a linked list?
    • 0:40:09Go ahead.
    • 0:40:10AUDIENCE: It's when you have an element that points to different elements.
    • 0:40:13DAVID J. MALAN: OK.
    • 0:40:14Points to other elements.
    • 0:40:15Yeah.
    • 0:40:15So let me speak to what's the fundamental issue here.
    • 0:40:18The fundamental problem is much like painting yourself into a corner,
    • 0:40:23so to speak, as the cliche goes.
    • 0:40:25With an array, you're deciding in advance how big the data structure is
    • 0:40:29and committing to it.
    • 0:40:30Well, what if you just do the opposite.
    • 0:40:32Don't do that.
    • 0:40:33If you want initially, room for just one value, say one integer,
    • 0:40:39only ask the computer for that.
    • 0:40:41Give me space for one integer and I'll put my number 42 in here.
    • 0:40:44And then, if and only if, you want a second integer,
    • 0:40:48do you ask the computer for a second integer.
    • 0:40:50And so the computer, as by a malloc, or whatnot, will give you another one
    • 0:40:54like, the number 13.
    • 0:40:55And if you want a third, just ask the same question of the operating system.
    • 0:40:58Each time just getting back one chunk of memory.
    • 0:41:02But there's a fundamental gotcha here.
    • 0:41:05There's always a trade off.
    • 0:41:06So yes, this is possible.
    • 0:41:08You can call malloc three times.
    • 0:41:10Each time asking for a chunk of memory of size 1, instead of size 3,
    • 0:41:13for instance.
    • 0:41:15But what's the price you pay?
    • 0:41:16Or what problem do we still need to solve?
    • 0:41:18Yeah?
    • 0:41:19AUDIENCE: They're not stored next to each other.
    • 0:41:20DAVID J. MALAN: Yeah.
    • 0:41:20They're not being stored next to each other.
    • 0:41:22So even though I can think of this as being the first element, the second,
    • 0:41:26and the third, you do not have, in this story, random access to elements.
    • 0:41:31And random access, ergo, random access memory, or RAM,
    • 0:41:35just means that arithmetically, like, mathematically, you
    • 0:41:38can jump to location 0, location 1, location 2, randomly, or in constant
    • 0:41:43time.
    • 0:41:43Just instantly.
    • 0:41:44Because if they're all back to back to back, all you have to do is like,
    • 0:41:47add 1, or add 4, or whatever to the address, and you're there.
    • 0:41:51But the problem is, if you're calling malloc again and again
    • 0:41:55and again, there's no guarantee that these things are even
    • 0:41:58going to be proximal to one another.
    • 0:42:00These second chunks of memory might end up--
    • 0:42:03if this is a big chunk of memory we've been talking about,
    • 0:42:06where the heaps up here, and the stacks down here--
    • 0:42:0942 might end up over here.
    • 0:42:11The next chunk of memory, 50, might end up over here.
    • 0:42:14The third chunk might end up over here.
    • 0:42:16So you can't just jump from location 0, to 1, to 2,
    • 0:42:19because you have to somehow remember where location 0, and 1, and 2, are.
    • 0:42:25So how do we solve this?
    • 0:42:27Even if you haven't programed before, like, what would a solution be here?
    • 0:42:33AUDIENCE: Somehow store [INAUDIBLE].
    • 0:42:35DAVID J. MALAN: OK.
    • 0:42:36Somehow storing the addresses of--
    • 0:42:38AUDIENCE: Of the [INAUDIBLE]
    • 0:42:40DAVID J. MALAN: All right.
    • 0:42:40So let's just suppose, for the sake of discussion, that this chunk of memory
    • 0:42:44ended up at location 100.
    • 0:42:45This one ended up at like 150.
    • 0:42:48This one ended up at like 475.
    • 0:42:51Whatever those values are.
    • 0:42:53It would seem that somehow or other I need to remember three values--
    • 0:42:56100, 150, and 475.
    • 0:43:00So where can I store that?
    • 0:43:01Well, it turns out, I can be a little clever but a little greedy.
    • 0:43:05I could say to malloc, you know what, every time I call you, don't just
    • 0:43:08give me space for an integer, give me space for an integer
    • 0:43:11plus the address of another integer.
    • 0:43:15So if you've ever kind of seen like popcorn strung together on a string,
    • 0:43:19or any kind of chain link fence where one link is linking to another.
    • 0:43:24We could create the equivalent of-- oops not that.
    • 0:43:29We could create the equivalent of this kind of picture,
    • 0:43:33where each of these squares, or nodes, we'll start calling them, kind of links
    • 0:43:38graphically to the other.
    • 0:43:39Well, we've seen these links, or these pointers,
    • 0:43:41literally arrows that are pointing implemented in code.
    • 0:43:44An arrow or a pointer is just an address.
    • 0:43:46So you know what?
    • 0:43:47We should just ask malloc not for enough space for just the number 42,
    • 0:43:53we should instead, ask for a little more memory in each of these squares,
    • 0:43:57making them pictorially rectangles now.
    • 0:44:00So that now, yes, we do have these arrows conceptually
    • 0:44:04pointing from one location to another.
    • 0:44:06But what values do I actually want to put in these new additional boxes?
    • 0:44:10AUDIENCE: The addresses of the next.
    • 0:44:12DAVID J. MALAN: The addresses of the next.
    • 0:44:13So they're like little breadcrumbs.
    • 0:44:15So in this box here, associated with the first value,
    • 0:44:18should be the address of my second value, 475.
    • 0:44:22Associated with my second value here, per the arrow--
    • 0:44:26and let me draw the arrow from the right place.
    • 0:44:28--from the arrow, should be the address 150, because that's the last.
    • 0:44:33And then, from this extra box, what should I put there?
    • 0:44:37Yeah?
    • 0:44:37AUDIENCE: Slash 0 or something?
    • 0:44:38DAVID J. MALAN: Yeah.
    • 0:44:39So probably, the equivalent of slash 0, which in the world of pointer's recall,
    • 0:44:43is null.
    • 0:44:44So just a special value that means that's it, this is the end of the line.
    • 0:44:47That still leaves us with room to add a fourth value and point to it,
    • 0:44:51but it for now, signifies very clearly to us there's nothing actually there.
    • 0:44:56So what did we just do?
    • 0:44:58We created a list of values 50, oh sorry 42, 50, 13,
    • 0:45:03but we linked to them together.
    • 0:45:04First, pictorially, with just arrows.
    • 0:45:06Like any human might with a piece of chalk.
    • 0:45:08But technically in code, we could do this
    • 0:45:10by just storing addresses in each of these places.
    • 0:45:14So just to be clear then, what might this actually translate to in code?
    • 0:45:19Well, what if I proposed this.
    • 0:45:22In code, we might do something like this.
    • 0:45:28If we want to store an integer.
    • 0:45:29We're of course, going to need to store like int n, we'll call it.
    • 0:45:32n will represent 42, or 50, or 13.
    • 0:45:35But if we want to create a data structure,
    • 0:45:37we might want to start giving this data structure a name.
    • 0:45:39I called it, a moment ago, node, which is a CS term for a node in a linked
    • 0:45:44list, so to speak.
    • 0:45:45And it looks like this.
    • 0:45:46So typedef means, give me my own type.
    • 0:45:48Struct means, make it a structure, like a student was.
    • 0:45:51And then, node, which is going to be the name of this thing.
    • 0:45:53And I'll explain in a moment why I have the word node twice this time.
    • 0:45:57But I left room on the board for just one more line.
    • 0:46:01In addition to an int, called n, or whatever,
    • 0:46:06I need to somehow represent in code, the additional memory
    • 0:46:09that I want malloc to give me for the address.
    • 0:46:11So first of all, these are addresses of what data types?
    • 0:46:14Each of those three new boxes.
    • 0:46:16AUDIENCE: [INAUDIBLE].
    • 0:46:17DAVID J. MALAN: They are the addresses of integers in that point in the story.
    • 0:46:21But technically, what is this box really pointing to?
    • 0:46:26Is it pointing specifically to the ints?
    • 0:46:29AUDIENCE: [INAUDIBLE].
    • 0:46:30DAVID J. MALAN: It's pointing to that whole chunk of memory, if you will.
    • 0:46:33So if you start thinking about each of these rectangles as being a node,
    • 0:46:37and each of the arrows as pointing to another node,
    • 0:46:39we need to somehow express, I need to somehow store a pointer to a node.
    • 0:46:45In other words, each of these arrows needs to point to another node.
    • 0:46:48And in code, we could say this.
    • 0:46:51Right?
    • 0:46:52Like, let's give it a name.
    • 0:46:53Instead of n, which is the number, let's call it next.
    • 0:46:55So next, shall be the name of this field that points to the next node in memory.
    • 0:46:59And node star, what does that mean in English, if you will?
    • 0:47:04AUDIENCE: [INAUDIBLE].
    • 0:47:05DAVID J. MALAN: Say again?
    • 0:47:06AUDIENCE: Pointing to an address.
    • 0:47:07DAVID J. MALAN: Pointing to an address.
    • 0:47:08Right?
    • 0:47:08It looks different.
    • 0:47:09Node is a new word today and that's fine.
    • 0:47:11But node star, just means a pointer to a node.
    • 0:47:14The address of a node.
    • 0:47:15And it turns out that this is a custom structure
    • 0:47:18so we actually have to say this.
    • 0:47:20But it's the same principle even though things are kind of escalating quickly
    • 0:47:23here, we just need to values, an int, and then, a pointer to another thing.
    • 0:47:29That other thing is going to be another node.
    • 0:47:31And we're just using a node, frankly, to encapsulate two values--
    • 0:47:35an int and a pointer.
    • 0:47:36And the way you express in C, albeit somewhat cryptically,
    • 0:47:39a pointer, or one of those arrows, is you say give me a variable called next,
    • 0:47:43have it point to a structure called node.
    • 0:47:47Or rather, have it be the address of a structure of type node.
    • 0:47:51Yeah?
    • 0:47:52AUDIENCE: How can you [? reveal ?] the timing of struct node [INAUDIBLE]??
    • 0:48:00DAVID J. MALAN: Good question.
    • 0:48:01So this feels like a circular kind of definition because I'm defining a node,
    • 0:48:06and yet, inside of a node is a node.
    • 0:48:08That is OK because of the star.
    • 0:48:11It is necessary in C--
    • 0:48:14remember that C always is kind of read top to bottom.
    • 0:48:18So accordingly, this very first line of code here, typedef struct note,
    • 0:48:22at that point in the story, when clang has read that line,
    • 0:48:25it knows that a phrase, struct node, exists.
    • 0:48:28AUDIENCE: That's why you say nodes [INAUDIBLE]..
    • 0:48:30DAVID J. MALAN: Exactly.
    • 0:48:32Exactly.
    • 0:48:32We didn't need to do this with students because there were
    • 0:48:34no pointers involved to other students.
    • 0:48:36But yes, in this case.
    • 0:48:37So in short, this tells clang, hey, clang, give me a structure called node.
    • 0:48:42And then, in here, we say, hey, clang, each of those nodes
    • 0:48:45shall have two things, an integer called n,
    • 0:48:47and a pointer to another one of these data structures of type node,
    • 0:48:52and call the whole thing, node.
    • 0:48:55It's a bit of a mouthful.
    • 0:48:56But all this is, is the following.
    • 0:48:58Let me go ahead and erase all of this.
    • 0:49:00All this data type is--
    • 0:49:03if we get rid of the picture we draw on the fly there.
    • 0:49:07--is this says, hey, clang, give me a data structure
    • 0:49:10that pictorially looks like this.
    • 0:49:12It's divided into two parts.
    • 0:49:14The first part is called n, the second type is called, next.
    • 0:49:18This data type is of type int.
    • 0:49:20This is a pointer to another such node.
    • 0:49:24And that's it.
    • 0:49:24Even though the code looks complex, the idea is exactly that.
    • 0:49:28Yeah?
    • 0:49:29AUDIENCE: [INAUDIBLE]?
    • 0:49:31Why do you have to say struct node again?
    • 0:49:34DAVID J. MALAN: Good question.
    • 0:49:37The reason is, as just came up a moment ago, clang
    • 0:49:42and C, in general, are kind of dumb.
    • 0:49:43They just read code top to bottom.
    • 0:49:45And the problem is, you have to declare the name of this structure
    • 0:49:49as being a struct node before you actually use it.
    • 0:49:52It's similar in spirit to our discussion of prototypes-- y functions need
    • 0:49:55to be mentioned way up top.
    • 0:49:57This just says to clang, give me a type called struct node.
    • 0:50:00You don't know what it's going to look like yet.
    • 0:50:02But I'll finish my thought later.
    • 0:50:05And then in here, we're just telling clang, inside of that node
    • 0:50:08should be an integer, as well as, a pointer to the very type of thing
    • 0:50:12I'm in the middle of defining.
    • 0:50:14But if I had left off the word node up there, and just said struct,
    • 0:50:17you couldn't do that because it hasn't seen the word N-O-D-E yet.
    • 0:50:21That's all.
    • 0:50:22Other questions?
    • 0:50:24All right.
    • 0:50:25So if I now have a data structure called node,
    • 0:50:29I can use it to kind of stitch together these linked lists.
    • 0:50:32And maybe just the very things a little bit,
    • 0:50:34and to start giving away some ducks, would folks
    • 0:50:37be comfortable with volunteering to solve a problem here?
    • 0:50:40Yeah?
    • 0:50:41OK.
    • 0:50:41Come on up.
    • 0:50:421, 2--
    • 0:50:44AUDIENCE: [INAUDIBLE].
    • 0:50:45DAVID J. MALAN: Sure.
    • 0:50:46Or you can take a duck and run.
    • 0:50:48OK.
    • 0:50:481, 2, how about 3?
    • 0:50:49Come on over here, 3.
    • 0:50:51So if you want to be our first pointer, you can be number 5.
    • 0:50:54Come on over here.
    • 0:50:55You want to be number 9.
    • 0:50:57And one more.
    • 0:50:58One more volunteer.
    • 0:50:59Come on over here.
    • 0:51:00Yeah.
    • 0:51:01All right.
    • 0:51:02So-- I'll meet you over here.
    • 0:51:08OK, 17.
    • 0:51:10All right.
    • 0:51:10So if you'd like to--
    • 0:51:11just so we pick this up for those following along at home.
    • 0:51:14If you would like to just say hello to the audience.
    • 0:51:16ANDREA: Hi, I'm Andrea.
    • 0:51:17[? COMEY: ?] Hi, [? I'm Comey. ?]
    • 0:51:19[? KYONG: ?] Hi, [? I'm Kyong. ?]
    • 0:51:21SPEAKER 2: Hi, I'm [INAUDIBLE].
    • 0:51:22DAVID J. MALAN: Wonderful.
    • 0:51:24OK.
    • 0:51:24If you wouldn't mind all just taking a big step back over the ducks,
    • 0:51:26just so that we're a little farther back.
    • 0:51:28Let's go ahead and do this.
    • 0:51:29If you're our first pointer, if you could come over here for instance,
    • 0:51:33and just stand outside the ducks.
    • 0:51:34And if you guys could come a little over here in front is still fine.
    • 0:51:37So here we have the makings of a linked list.
    • 0:51:40And what's your name again?
    • 0:51:41[? COMEY: ?] [? Comey. ?]
    • 0:51:42DAVID J. MALAN: [? Comey ?] is our first pointer if you will.
    • 0:51:45Via [? Comey's ?] variable are we just going
    • 0:51:47to keep track of the first element of the linked list.
    • 0:51:49So if you could, with your left hand, represent first.
    • 0:51:52Just point over at-- what was your name again?
    • 0:51:54ANDREA: Andrea.
    • 0:51:55DAVID J. MALAN: So Andrea is the number 9.
    • 0:51:56If you could use your left hand to point at number 5.
    • 0:51:59And if you could use your left hand, yep, to point at number 17.
    • 0:52:02And your left hand to just point at null, which we'll just call the ground.
    • 0:52:05So you don't want to just point it randomly
    • 0:52:07because that would be like following a bogus pointer, so here means null.
    • 0:52:10All right.
    • 0:52:11So this is a linked list.
    • 0:52:12All you need to store are linked list of three values
    • 0:52:15is three nodes, inside of which are three integers,
    • 0:52:19and their left hands represents that next pointer, so to speak.
    • 0:52:22[? Comey's ?] a little different, in that she's not holding a value.
    • 0:52:25She's not holding an integer.
    • 0:52:27Rather, holding just the name of the variable, first.
    • 0:52:31So you're the only one that's different here fundamentally.
    • 0:52:34So suppose I want to insert the number 20?
    • 0:52:36Could someone volunteer to be number 20?
    • 0:52:38OK.
    • 0:52:38Come on up.
    • 0:52:40All right.
    • 0:52:41And what's your name?
    • 0:52:43ERIC: Eric.
    • 0:52:43DAVID J. MALAN: Eric.
    • 0:52:44Eric, you're the number 20.
    • 0:52:45And Eric, actually, let's see.
    • 0:52:50Actually can we do this?
    • 0:52:52Let me give-- let me make this a little more different.
    • 0:52:57OK.
    • 0:52:57That never happened.
    • 0:52:59OK.
    • 0:52:59Eric, give me that please.
    • 0:53:02I want to insert Eric as number 5.
    • 0:53:04So Eric, I'm keeping this list sorted.
    • 0:53:06So where, obviously, you're going to go?
    • 0:53:08ERIC: Go right there.
    • 0:53:09DAVID J. MALAN: All right.
    • 0:53:09But before you do that, let's just consider what this looks like in code.
    • 0:53:12In code, presumably, we have malloced Eric from the audience.
    • 0:53:17I've given him a value, n of number 5.
    • 0:53:20And his left hand is like, it's garbage value right now, because it's not
    • 0:53:23pointing to anything specific.
    • 0:53:25So he's got two values-- an integer, and a left hand representing
    • 0:53:28the next pointer.
    • 0:53:30If the goal is to put Eric in sorted order.
    • 0:53:34What should our steps be?
    • 0:53:36Like, whose hand should point where, and in what order?
    • 0:53:38Yeah.
    • 0:53:39Give us one step.
    • 0:53:39AUDIENCE: You should point to number 9.
    • 0:53:41DAVID J. MALAN: OK so you should point at number 9,
    • 0:53:43which is equivalent to saying, point at whatever first.
    • 0:53:46Where [? Comey ?] is pointing at.
    • 0:53:48So go ahead and do that.
    • 0:53:49All right next?
    • 0:53:50What's the next step?
    • 0:53:50Someone else?
    • 0:53:54Someone else.
    • 0:53:54Almost there.
    • 0:53:55Yeah?
    • 0:53:55AUDIENCE: First should point to 5.
    • 0:53:57DAVID J. MALAN: OK.
    • 0:53:58So first, or [? Comey, ?] could you point to 5.
    • 0:54:00And that's fine.
    • 0:54:00You don't even have to move.
    • 0:54:01Right?
    • 0:54:01This is the beauty of a linked list.
    • 0:54:03It doesn't matter where you are in memory,
    • 0:54:05it's the whole beauty of these pointers, where you can literally
    • 0:54:08point at that other location.
    • 0:54:09It's not an array where they need to be standing back to back to back.
    • 0:54:12They can be pointing anywhere.
    • 0:54:13All right.
    • 0:54:14Let's go ahead and insert one more.
    • 0:54:15Who wants to be say, 55?
    • 0:54:17Big value.
    • 0:54:17Yeah.
    • 0:54:18Come on down.
    • 0:54:20All right.
    • 0:54:21What's your name?
    • 0:54:21[? KYONG: ?] [? Kyong. ?]
    • 0:54:22DAVID J. MALAN: [? Kyong. ?] OK.
    • 0:54:23So come on over.
    • 0:54:24So we've just malloced [? Kyong ?] from the audience.
    • 0:54:26I've given him his end value of 55.
    • 0:54:28His left hand is just some garbage value right now.
    • 0:54:31How do we insert [? Kyong ?] in the right order?
    • 0:54:34Where is the obviously supposed to go?
    • 0:54:36In sorted order, he obviously belongs at the end.
    • 0:54:39But here's the catch with the linked list.
    • 0:54:42Just like when we've discussed searching and sorting in the past,
    • 0:54:45the computer is pretty blind to all but just one value.
    • 0:54:48And the linked list, at the moment--
    • 0:54:50like, I don't know that these three, these four, exist.
    • 0:54:52All I know really, is that [? Comey ?] exists.
    • 0:54:55Because via this first pointer, is the only access
    • 0:54:58to the rest of the elements.
    • 0:55:00And so what's cool about a linked list, but perhaps not obvious,
    • 0:55:03is that you only--
    • 0:55:04the most important value is the first.
    • 0:55:06Because from the first value, you can get to everyone else.
    • 0:55:09It's not useful-- excuse me for me to remember, Andrea?
    • 0:55:12--Andrea alone, because if I do, I've just
    • 0:55:14lost track of [? Comey ?] and more importantly, because of his number,
    • 0:55:18Eric.
    • 0:55:18So all I have to do really, is remember [? Comey. ?]
    • 0:55:21So if the goal now is to insert number 55, what steps should come first?
    • 0:55:27No pun intended.
    • 0:55:28AUDIENCE: [INAUDIBLE].
    • 0:55:29DAVID J. MALAN: Say again.
    • 0:55:30AUDIENCE: Finding the first space.
    • 0:55:31DAVID J. MALAN: OK.
    • 0:55:32Finding the first space.
    • 0:55:33So I'm going to start at [? Comey, ?] and I'm going to follow this pointer.
    • 0:55:36Number 5, does 55 belong here?
    • 0:55:39No.
    • 0:55:39So I'm going to follow this pointer and get to Andrea.
    • 0:55:42Does 55 belong here?
    • 0:55:43No.
    • 0:55:43Gonna follow her pointer, and 22, does it belong here?
    • 0:55:46No.
    • 0:55:47I follow this pointer, 26?
    • 0:55:48No.
    • 0:55:49But you have a free hand, it turns out.
    • 0:55:51So what step should come next?
    • 0:55:52AUDIENCE: [INAUDIBLE].
    • 0:55:54DAVID J. MALAN: We could have you point at 55, and now done.
    • 0:55:58So relatively simple, but what was the running time of this?
    • 0:56:02AUDIENCE: [INAUDIBLE].
    • 0:56:04DAVID J. MALAN: It's big o of n.
    • 0:56:05It's linear.
    • 0:56:06Because I had to start at the beginning, even though we
    • 0:56:08humans have the luxury of just eyeballing it.
    • 0:56:10Saying, oh, obviously, he belongs way at the end.
    • 0:56:11Mm-mm.
    • 0:56:12Not in code.
    • 0:56:13Like, we have to start at the beginning to reverse the whole darn list,
    • 0:56:16until we get linearly to the very end.
    • 0:56:17And now we're done.
    • 0:56:18Let's try one last one.
    • 0:56:20How about 20?
    • 0:56:21Yeah.
    • 0:56:22Great.
    • 0:56:22Come on down.
    • 0:56:23What's your name?
    • 0:56:24JAMES: James.
    • 0:56:24DAVID J. MALAN: James.
    • 0:56:25All right, James.
    • 0:56:26All right.
    • 0:56:26So we just malloced James, given him the number 20.
    • 0:56:28He obviously belongs roughly in the middle.
    • 0:56:30What's the first step?
    • 0:56:32AUDIENCE: [INAUDIBLE].
    • 0:56:33DAVID J. MALAN: Sorry?
    • 0:56:34AUDIENCE: [INAUDIBLE].
    • 0:56:35DAVID J. MALAN: All right.
    • 0:56:35So we start with [? Comey, ?] again.
    • 0:56:36All right.
    • 0:56:37First, OK.
    • 0:56:375, do you belong here?
    • 0:56:39No.
    • 0:56:40Let me follow the link.
    • 0:56:41OK 9, do you belong here?
    • 0:56:42No.
    • 0:56:43Do you belong at 22-- ooh.
    • 0:56:44But what did I just do wrong?
    • 0:56:48I went too far.
    • 0:56:49At least in this story.
    • 0:56:50Like, I literally-- Andrea is behind me now.
    • 0:56:52OK.
    • 0:56:52So can I follow the pointer backwards?
    • 0:56:55You can't.
    • 0:56:55Like in every picture we've drawn, and every example
    • 0:56:58we've done with an address, we only have the address of the next pointer.
    • 0:57:00We don't have what's called, a doubly linked list, at least in this story,
    • 0:57:03where I can just turn around.
    • 0:57:04So that was a bug.
    • 0:57:05So I need to start over instead.
    • 0:57:06First, OK 5, OK 19--
    • 0:57:09what I really need in code, ultimately, is
    • 0:57:11to kind of peek ahead and not actually move-- not that far.
    • 0:57:14Just to 22.
    • 0:57:15Peek ahead at 22 and realize, oh, that's going to be too far.
    • 0:57:19This is not yet far enough.
    • 0:57:20So let's go ahead and bring James over.
    • 0:57:22Well, actually, you can stay there physically.
    • 0:57:24But what step has to happen first?
    • 0:57:26I know now he belongs in here.
    • 0:57:29You want to point at him?
    • 0:57:31OK.
    • 0:57:31Point at him.
    • 0:57:32ANDREA: Oh.
    • 0:57:32I'm sorry, he points first.
    • 0:57:34DAVID J. MALAN: Well let's do that, just because it is incorrect.
    • 0:57:35That's fine.
    • 0:57:36OK.
    • 0:57:36Andrea proposed that we point here, but she just broke the whole linked list.
    • 0:57:40Why?
    • 0:57:40ANDREA: Because there's nothing to point at.
    • 0:57:42DAVID J. MALAN: Right.
    • 0:57:43No one is remembering-- what's was your name again?
    • 0:57:45[? KYONG: ?] [? Kyong. ?]
    • 0:57:45DAVID J. MALAN: No one's remembered where [? Kyong ?] was.
    • 0:57:47So you can't do that.
    • 0:57:47Your left hand has to stay there.
    • 0:57:49So what steps should happen first instead?
    • 0:57:50AUDIENCE: [INAUDIBLE].
    • 0:57:52DAVID J. MALAN: James should point at whatever
    • 0:57:54Andrea is pointing at, perhaps?
    • 0:57:56So a little redundantly at the moment, just like before.
    • 0:57:58OK.
    • 0:57:59Now what happens next?
    • 0:58:00That's step one.
    • 0:58:00ANDREA: Now I can point.
    • 0:58:01DAVID J. MALAN: Now you can point at him.
    • 0:58:03OK.
    • 0:58:04You could do that.
    • 0:58:05All right.
    • 0:58:05And so now, this looks like a complete mess,
    • 0:58:08but if we know that [? Comey ?] is first,
    • 0:58:10we can follow the breadcrumbs to Eric, and then to Andrea, and then to James,
    • 0:58:16and then the rest of our list step by step by step.
    • 0:58:20So it's a huge amount of like logic now.
    • 0:58:23But what problem have we solved?
    • 0:58:24And I think we identified it over here earlier.
    • 0:58:26What was the problem first and foremost with the arrays?
    • 0:58:29AUDIENCE: [INAUDIBLE].
    • 0:58:30DAVID J. MALAN: You have to decide on their size in advance.
    • 0:58:33And once you do that, if you want to add an additional element,
    • 0:58:36you have to resize the whole darn thing.
    • 0:58:38Which is expensive because you have to move everyone around.
    • 0:58:41Now frankly, I'm being a little greedy here.
    • 0:58:43And every time we've inserted these new elements,
    • 0:58:45I've been keeping them in sorted order.
    • 0:58:46So it would seem that if you insert things in sorted order, big o event,
    • 0:58:50every time.
    • 0:58:51Because in the worst case, the new element
    • 0:58:52might end up all the way at the end.
    • 0:58:54But what if we relax that constraint?
    • 0:58:55What if I'm not so uptight and need everything nice and orderly and sorted?
    • 0:58:59What if I just want to keep growing the list in any random order?
    • 0:59:02And I allocate the number 34.
    • 0:59:05And I'll play the number 34.
    • 0:59:06Malloc 34.
    • 0:59:08Where is the quickest place for me to go?
    • 0:59:11Yeah?
    • 0:59:12AUDIENCE: Point to 5, and then have [INAUDIBLE]..
    • 0:59:14DAVID J. MALAN: OK.
    • 0:59:14I'll point to 5, and then, [? Comey, ?] if you could point to me.
    • 0:59:16Done.
    • 0:59:17One-- well, two steps.
    • 0:59:18All right.
    • 0:59:19Suppose now, I malloc 17 with someone else, who'll we'll
    • 0:59:22pretend is right here.
    • 0:59:23Where's the best place for 17 to go?
    • 0:59:25AUDIENCE: [INAUDIBLE].
    • 0:59:26DAVID J. MALAN: Right after [? Comey ?] too.
    • 0:59:28So now, [? Comey ?] can point at 17, 17 can point at me, I can point at Eric,
    • 0:59:33and so forth.
    • 0:59:34And that's two steps again.
    • 0:59:35Two steps-- if it's the same number of steps every time,
    • 0:59:38we call that, constant time.
    • 0:59:39And we write it as big o of 1.
    • 0:59:41And so here too, it's just a trade off.
    • 0:59:43If you want really fast insertions, don't worry about sorting.
    • 0:59:46Just put them at the beginning and deal with it later.
    • 0:59:48If you want a dynamic resizeability, don't use an array, use a linked list,
    • 0:59:52and just keep allocating more and more as you go without wasting
    • 0:59:55a huge amount of space too.
    • 0:59:56Which notice, that's another big problem with an array.
    • 0:59:59If you over allocate space, and only use part of it, you're just wasting space.
    • 1:00:02So there's no one solution here.
    • 1:00:04But we do now have the capabilities, thanks to the structs
    • 1:00:06and pointers to stitch together, if you will, these new problems.
    • 1:00:11Yes, please.
    • 1:00:12SPEAKER 2: Why can't the node [INAUDIBLE]??
    • 1:00:17DAVID J. MALAN: And who am I in this story?
    • 1:00:19SPEAKER 2: [INAUDIBLE].
    • 1:00:20DAVID J. MALAN: Oh, OK.
    • 1:00:21Absolutely.
    • 1:00:22So another very reasonable idea would be, well,
    • 1:00:24why don't we just put the new ones at the end?
    • 1:00:26That's fine if I keep track of who is at the end.
    • 1:00:30The problem, is at the moment in the story,
    • 1:00:32and we'll ultimately see this in code, I'm only remembering [? Comey. ?] And
    • 1:00:35from [? Comey ?] am I getting everywhere else.
    • 1:00:37I could have another pointer, a second pointer,
    • 1:00:40and literally call it, last, that's equivalent to you.
    • 1:00:42Or that's always pointing at you.
    • 1:00:44I just need then two pointers, one literally called first,
    • 1:00:46one literally called last.
    • 1:00:47That's fine.
    • 1:00:48That's a nice optimization if I want to throw all the elements at the end.
    • 1:00:52And frankly, I could get really fancy--
    • 1:00:54and to solve the problem that Andrea cited earlier--
    • 1:00:57if I store not just an int and a pointer, but instead,
    • 1:01:00an int and two pointers, I can even have each
    • 1:01:03of these guys pointing with their left and right hands
    • 1:01:05in a doubly linked list, so as to solve the problem Andrea identified, which
    • 1:01:10was if I go too far no big deal.
    • 1:01:12Take one step back.
    • 1:01:13I don't have to think as hard about that logic.
    • 1:01:15So there too, a trade off.
    • 1:01:17Let's go ahead and take a five minute break.
    • 1:01:18I'll turn on some music.
    • 1:01:19Grab a duck now, if you'd like.
    • 1:01:20And we'll return with some fancier data structures still.
    • 1:01:22Thanks.
    • 1:01:23All right.
    • 1:01:24We're back.
    • 1:01:24So let's now translate some of these ideas to code.
    • 1:01:27So that we can actually solve this problem a little more
    • 1:01:29concretely than just having humans pointing at each other.
    • 1:01:32So for instance, let's try to distill everything
    • 1:01:34we've been talking about into just a goal in code
    • 1:01:37of storing a list of numbers.
    • 1:01:39I would propose that we can take like three passes at this problem.
    • 1:01:42The first would be, let's just decide in advance how many numbers we
    • 1:01:45want to store so we don't have to deal with all
    • 1:01:47this complexity with the pointing and the pointers and all this,
    • 1:01:50and just hard code that value somehow, and just stop
    • 1:01:53when the user is inputted that many numbers and no more.
    • 1:01:56Two, we can improve upon that and at least let the user dynamically resize
    • 1:02:01their array.
    • 1:02:02So that if they decide to input more numbers than we intend,
    • 1:02:05it's going to grow, and deal with that.
    • 1:02:06Of course, arrays are not necessarily ideal
    • 1:02:08because they have to do all that damn copying from old to new.
    • 1:02:11That's linear time.
    • 1:02:12It would seem smartest to get subversion 3, which
    • 1:02:14is actually going to use a linked list.
    • 1:02:16So we're just more modestly allocating space for another number,
    • 1:02:20and another number, and another number, or really a node.
    • 1:02:23One number at a time.
    • 1:02:25So let me go ahead and start as follows.
    • 1:02:27I'm going to go ahead and include some familiar lines in list 0.c,
    • 1:02:33of the CS50 library, just to make it easy to get some user input for this.
    • 1:02:36And standard iO dot h, for printdef.
    • 1:02:38And let me go ahead and declare my main function as usual.
    • 1:02:42And then, in here let's do a couple of things.
    • 1:02:44First, let's ask the user for the capacity of the array
    • 1:02:48that we're going to use.
    • 1:02:49Or rather, let's do this first.
    • 1:02:50Let me first rewind and say, you know what?
    • 1:02:53Int, numbers, 50.
    • 1:02:55Well, that's going to be annoying to type in 50 numbers.
    • 1:02:58We're going to give the user two numbers at first, that here, she can type in.
    • 1:03:01Next, let's go ahead and prompt the user for those numbers.
    • 1:03:05So let me go ahead and say--
    • 1:03:09let's do this.
    • 1:03:09Let's at least clean this up a little bit so that we can reuse this value.
    • 1:03:13So we don't have a magic number.
    • 1:03:14This just came up in discussion actually.
    • 1:03:16So while-- do I want to do that?
    • 1:03:20Nope.
    • 1:03:21Let me fix this.
    • 1:03:22This will be my capacity of size 2.
    • 1:03:24And that's going to give me that size.
    • 1:03:26And then, I'm going to keep track of how many integers
    • 1:03:28I've prompted the user for so far.
    • 1:03:30So initially, the size of this structure is going to be 0.
    • 1:03:33But it's capacity, so to speak, is 2.
    • 1:03:35So size means how many things are in it.
    • 1:03:36Capacity means how many things can be in it.
    • 1:03:39And while the size of the structure is less than its capacity,
    • 1:03:43let's go ahead and get some inputs from the user.
    • 1:03:45Let's go ahead and ask them for a number, using our old friend, get int.
    • 1:03:49And just say, give me a number.
    • 1:03:51And then, let me go ahead and insert the number
    • 1:03:54that they type in into this array at location size, like this.
    • 1:04:00And then, do size plus, plus.
    • 1:04:02I think.
    • 1:04:03You know, I wrote it pretty quickly.
    • 1:04:05But let's consider what I just did.
    • 1:04:07I initialized size to 0, because there's nothing in it initially.
    • 1:04:10Then I say, while size is less than the capacity of the whole thing--
    • 1:04:13and capacity is 2 by default--
    • 1:04:15go ahead and do the following.
    • 1:04:17Give me an int from the user.
    • 1:04:19OK.
    • 1:04:19So int number gets int.
    • 1:04:21Then, put at location, size, in my numbers, array,
    • 1:04:25whatever the human typed in, number.
    • 1:04:28And then, increment size with plus, plus.
    • 1:04:30All right.
    • 1:04:31So on the first iteration size is 0.
    • 1:04:33So numbers, bracket, 0, gets the first number.
    • 1:04:35Numbers, bracket, 1, gets the second number.
    • 1:04:37Then, size equals capacity.
    • 1:04:39So it stops, logically.
    • 1:04:40Any questions on the logic of this code?
    • 1:04:44All right.
    • 1:04:45So once we have those numbers, let's just do something simple.
    • 1:04:48Like for int, I gets 0.
    • 1:04:50I is less than the actual size I, plus, plus.
    • 1:04:53Let's just go ahead and print out the number
    • 1:05:00you inputted, percent I, backslash n, and type out numbers, bracket, I. All
    • 1:05:07right.
    • 1:05:07So if I made no typos in list 0 dot C, then, I'm going to go ahead
    • 1:05:12and do dot, slash, o, dot, C. I'm going to be prompted for a couple of numbers.
    • 1:05:16Let's go ahead and do 1, 2.
    • 1:05:18You inputted 1, you inputted 2.
    • 1:05:19All right.
    • 1:05:20So not bad.
    • 1:05:21But this is bad design, arguably, why?
    • 1:05:27Just find one fault. It's correct.
    • 1:05:29But bad design.
    • 1:05:32AUDIENCE: Repetitive.
    • 1:05:33DAVID J. MALAN: Repetitive, because I'm using a couple of loops, sure.
    • 1:05:36And it's fundamentally-- it's very limited in functionality.
    • 1:05:39Why?
    • 1:05:40Like how useful is this program?
    • 1:05:42AUDIENCE: It's hard coded at 2.
    • 1:05:43DAVID J. MALAN: Yeah.
    • 1:05:43It's hard coded at 2.
    • 1:05:44So let's at least improve upon this a little bit,
    • 1:05:47and get rid of this hard coding.
    • 1:05:48Why don't I at least ask the user for something like this?
    • 1:05:52Well, instead of just declaring the capacity, let me go ahead and say,
    • 1:05:56you know what?
    • 1:05:57Let's just replace the 2.
    • 1:05:58Get int, and just say capacity, for instance.
    • 1:06:01All right.
    • 1:06:02And now if I do this, I'm going to be prompted--
    • 1:06:05so make list 0.
    • 1:06:07Dot slash list 0.
    • 1:06:10The capacity will be 2.
    • 1:06:121, 2, that's nice.
    • 1:06:14But if I run it again, and give it a capacity of 3--
    • 1:06:171, 2, 3, I get more capacity.
    • 1:06:21So that's nice.
    • 1:06:21It's an improvement for sure.
    • 1:06:23There is a bug here.
    • 1:06:25Before I test it further, can anyone identify a bug or somehow crash this?
    • 1:06:33AUDIENCE: [INAUDIBLE].
    • 1:06:34DAVID J. MALAN: Oh, go ahead.
    • 1:06:34AUDIENCE: If you don't input an integer.
    • 1:06:36DAVID J. MALAN: If I don't put an integer.
    • 1:06:38Or-- is that same comment up here?
    • 1:06:39AUDIENCE: I was going to say, what happens if you go back
    • 1:06:42and put in [INAUDIBLE] those other [INAUDIBLE] will be in the memory.
    • 1:06:47DAVID J. MALAN: Oh.
    • 1:06:48No.
    • 1:06:48Because I'm rerunning it in each time.
    • 1:06:50I don't need to worry about previous runs of the program.
    • 1:06:53Yeah?
    • 1:06:53AUDIENCE: In the for loop, it just goes 1,
    • 1:06:562, 3, it doesn't actually care what you put it.
    • 1:06:59DAVID J. MALAN: [INAUDIBLE] 1, 2, 3-- well, I am iterating up to size,
    • 1:07:03which could be capacity.
    • 1:07:04Because now they do end up being equivalent.
    • 1:07:06Because I'm filling the whole thing.
    • 1:07:08But let's try this.
    • 1:07:08If you don't type in a value.
    • 1:07:10So let me go ahead and rerun this.
    • 1:07:12My capacity shall be duck.
    • 1:07:15All right.
    • 1:07:16So we did handle that.
    • 1:07:17Because getInt does that for me.
    • 1:07:19But I bet I can still break this.
    • 1:07:21Ooh, yeah, let's always try something negative.
    • 1:07:24Oh, OK.
    • 1:07:25So bad.
    • 1:07:25Like cryptic looking message, but clearly,
    • 1:07:27has to do with a negative value.
    • 1:07:28So I should probably be a little smarter about this.
    • 1:07:31And recall from like, Week 1, we did do this.
    • 1:07:33With Mario, you might have done this.
    • 1:07:35So I could do something like, do, while capacity is less than 1.
    • 1:07:41I could go ahead and say, capacity getInt capacity.
    • 1:07:46So just a little bit of error checking to close the bug that you identified.
    • 1:07:50All right.
    • 1:07:50So let's go ahead and recompile this.
    • 1:07:52Make lists 0-- oops we're going to start hearing that a lot today.
    • 1:07:57Aren't we [INAUDIBLE]?
    • 1:07:57Make list 0, dot, slash, list 0.
    • 1:08:00Capacity will be 3.
    • 1:08:011, 2, 3.
    • 1:08:02Now capacity will be negative 1.
    • 1:08:04Doesn't allow it.
    • 1:08:05Capacity 0, doesn't allow it.
    • 1:08:07Capacity 1, yes.
    • 1:08:09So non-exhaustively, I've tested it.
    • 1:08:11It feels like it's in better shape.
    • 1:08:12OK.
    • 1:08:12But this program, while correct, and while more featureful,
    • 1:08:16still has this fundamental limit.
    • 1:08:18Wouldn't it be nice to allow the user to just keep typing numbers,
    • 1:08:21as many as they want, and then quit once they're done inputting numbers.
    • 1:08:26Right?
    • 1:08:26If you're making a program to compute someone's GPA,
    • 1:08:29different students might have taken different courses,
    • 1:08:31you don't want to have them to type in all 32 courses.
    • 1:08:33If they're younger and haven't taken all those courses.
    • 1:08:35Like there's a lot of scenarios where you don't know in advance how
    • 1:08:38many numbers the user wants to provide.
    • 1:08:40But you want to support a few numbers, lots of numbers, or beyond.
    • 1:08:43So let's do this in a second version.
    • 1:08:46In list 1 dot C, let me go ahead and improve upon that example as follows.
    • 1:08:52First, let me give my familiar friends up here CS50 dot for iO,
    • 1:08:57standard iO dot h, and then, in here, int main void.
    • 1:09:02And then, let's start writing this.
    • 1:09:05So now, I don't know in advance, necessarily, how many numbers the user
    • 1:09:10is going to type in.
    • 1:09:11Like the goal is, I want them to be able to type
    • 1:09:13in a number, another number, another number, and then
    • 1:09:16hit the equivalent of like, q, for quit, when they're done inputting numbers.
    • 1:09:19Like I don't want them to have to think about in advance, how many numbers it
    • 1:09:22is they're inputting.
    • 1:09:24But how do I do that?
    • 1:09:26Like I can't just come up with an array called numbers, and say, 50.
    • 1:09:30Because if the user wants to type in 51 numbers,
    • 1:09:32I'm going to have to resize that.
    • 1:09:34But how do you resize an array?
    • 1:09:39How do you resize an array?
    • 1:09:41AUDIENCE: [INAUDIBLE].
    • 1:09:43DAVID J. MALAN: What's that?
    • 1:09:43AUDIENCE: You can't.
    • 1:09:44DAVID J. MALAN: You can't.
    • 1:09:44Right.
    • 1:09:45We've never seen an instance where you've re-sized an array.
    • 1:09:47We talked about it on the blackboard here.
    • 1:09:49Well, just like, allocate a bigger one and copy everything in.
    • 1:09:52And we did identify realloc.
    • 1:09:54But you can't actually use realloc on an array.
    • 1:09:57Realloc actually accepts an address of a chunk of memory
    • 1:10:01that you want to grow, or shrink.
    • 1:10:04So it turns out, if we now start to harness
    • 1:10:06the sort of fundamental definition of what an array is, a chunk of memory,
    • 1:10:10we can actually build arrays ourselves.
    • 1:10:13If an array is just a chunk of memory, or more specifically,
    • 1:10:16it's like the address of the first byte of a chunk of memory,
    • 1:10:21it would seem that I could declare my array, not with square brackets
    • 1:10:25as we've been doing for weeks, but I can say,
    • 1:10:27you know what numbers really is, it's really just a pointer.
    • 1:10:31And I'm initially going to initialize it to null.
    • 1:10:33Because there is no array.
    • 1:10:35But now I have the ability to point that pointer
    • 1:10:37at any chunk of memory, small or big.
    • 1:10:41Now why is this useful?
    • 1:10:42Well, initially let me claim that my capacity is 0,
    • 1:10:44because nothing's going on yet.
    • 1:10:45I haven't called malloc or anything.
    • 1:10:47And initially, my size is 0 because there's nothing in the array.
    • 1:10:52And it doesn't even have a size.
    • 1:10:53But let me just do this forever.
    • 1:10:55Much like in scratch, we had the forever block you can use, while true, and C,
    • 1:10:59to just say keep doing this until the user breaks out of this.
    • 1:11:02And let me go ahead and ask the user, give me a number, getInt.
    • 1:11:06And just ask them for a number.
    • 1:11:09And then, we just need a place to put that.
    • 1:11:11So where do I put this number?
    • 1:11:14Well, do I have, at the moment, any place to put the number?
    • 1:11:18No.
    • 1:11:18And technically speaking, how do you express that?
    • 1:11:20Like in pseudo code, I want to say, if no place for number.
    • 1:11:25But technically, I could do this.
    • 1:11:26Well, if the size of the array at the moment, equals its capacity,
    • 1:11:33that feels like a lower level way of expressing the same thing.
    • 1:11:36If whatever the capacity is, if the size is the same, there is no more room.
    • 1:11:41And that simple statement also covers the scenario where the capacity is 0,
    • 1:11:47the size is therefore, 0.
    • 1:11:48So its the same question.
    • 1:11:50Either we have no space at all, or we have some space
    • 1:11:52but we've used it all-- size equals, equals, capacity.
    • 1:11:56So if the size equals capacity, or put more casually,
    • 1:11:59if I don't have enough space.
    • 1:12:01What do I want to do intuitively?
    • 1:12:03AUDIENCE: [INAUDIBLE].
    • 1:12:04DAVID J. MALAN: Allocate more memory.
    • 1:12:06And it turns out, you proposed, or someone proposed earlier,
    • 1:12:09reallocating memory.
    • 1:12:10We can use this function for the very first time.
    • 1:12:13Let me go ahead and say this--
    • 1:12:14the catch with realloc is you have to be smart about it,
    • 1:12:16because it returns a pointer.
    • 1:12:18So let me propose this code first.
    • 1:12:19First, just give me a temporary variable, call it, temp,
    • 1:12:23that's going to store the following.
    • 1:12:25Actually, no.
    • 1:12:26Let me start this more simply.
    • 1:12:28Let me go ahead and say, numbers should be reallocated please,
    • 1:12:34realloc by passing its self in.
    • 1:12:39And this time, give me the size of an int, times--
    • 1:12:43how many ints do I want this time?
    • 1:12:47How many numbers did the human just input presumably?
    • 1:12:52AUDIENCE: [INAUDIBLE].
    • 1:12:53DAVID J. MALAN: Just one.
    • 1:12:54Right.
    • 1:12:54Because literally, we've only called getInt once in this story.
    • 1:12:57So whatever the size of this array is now, we need to increase it by 1.
    • 1:13:03That's all.
    • 1:13:04So this line of code here is saying, hey computer,
    • 1:13:09go ahead and reallocate this array from whatever its current size is,
    • 1:13:15and make it this size instead.
    • 1:13:18The size of whatever it is, plus 1, times the size of an int.
    • 1:13:22Because that's what we're trying to store, is an int.
    • 1:13:24So we have to do that multiplication.
    • 1:13:26And realloc, as mentioned earlier, is pretty fancy.
    • 1:13:28It's going to take an pointer, whatever chunk of memory
    • 1:13:31you've already allocated, and it's going to then reallocate
    • 1:13:34a bigger chunk of memory.
    • 1:13:36Hopefully, what's going to happen is this--
    • 1:13:38if your chunk of memory initially looks like this,
    • 1:13:41it's going to hopefully notice, oh, this memory is free.
    • 1:13:44Let me just give you back the same address.
    • 1:13:46So if this is address 100, and you get lucky
    • 1:13:48and this address is also available, the realloc function's
    • 1:13:51going to remember that for the operating system.
    • 1:13:53It's going to return the number 100 again.
    • 1:13:55And you're good to go.
    • 1:13:56You can safely touch memory here.
    • 1:13:57Or if this is in use already, this chunk of memory, and therefore we
    • 1:14:03can't fit another byte there because some other code you wrote
    • 1:14:06is using that memory.
    • 1:14:08But there is twice as much memory available down here.
    • 1:14:11What realloc will do, is if you've stored the number 50,
    • 1:14:14it will handle the process of copying 50 to the new value.
    • 1:14:18This is going to be left as a garbage value for you to deal with.
    • 1:14:20And it's going to return to you the address of the new chunk of memory,
    • 1:14:24having done the copying for you.
    • 1:14:27So even though it's technically re-allocating the array,
    • 1:14:30it's not necessarily just going to grow it.
    • 1:14:32It might relocate it in memory to a bigger chunk,
    • 1:14:36and then give you the new address of that memory.
    • 1:14:39Question?
    • 1:14:39AUDIENCE: Is that process really preferable
    • 1:14:42to just creating extra memory in it's place.
    • 1:14:45And then saving the time and energy of reallocating them [? all at once. ?]
    • 1:14:50DAVID J. MALAN: That's a really good question.
    • 1:14:52Honestly, we could avoid this problem slightly by just doing, you know what,
    • 1:14:55give me at least--
    • 1:14:57go ahead and give me at least the size of an int, times--
    • 1:15:01I don't know, most humans are not going to type in more than 50 numbers.
    • 1:15:04Let's just pick 50.
    • 1:15:05So you could do this, and that would indeed save you time.
    • 1:15:08Because the approach I'm currently taking
    • 1:15:10is pretty inefficient because every damn time the user
    • 1:15:13calls getInt, and gives an int, we're resizing, resizing, resizing.
    • 1:15:17Very expensive.
    • 1:15:18As to what the best value is though--
    • 1:15:2050?
    • 1:15:20Should it be 25?
    • 1:15:21Should it be 1,000?
    • 1:15:23I'm either going to under bet or over bet.
    • 1:15:25And it just depends on you to decide which of those is the worst decisions.
    • 1:15:30AUDIENCE: But, like, in terms of programs,
    • 1:15:32is it also pretty expensive to have memory that you're not using
    • 1:15:36or generally, is it usually more OK?
    • 1:15:37DAVID J. MALAN: Good question.
    • 1:15:39In programs you're writing, is it better to have more memory than you're using,
    • 1:15:42or should you really be conservative?
    • 1:15:43These days, memory is cheap.
    • 1:15:45We all have gigabytes of memory.
    • 1:15:47And so wasting 50 bytes or 200 bytes, times 4, of memory, not a big deal.
    • 1:15:52Like, just get the job done quickly and easily.
    • 1:15:54But in resource constrained devices, maybe, things like phones
    • 1:15:57or little internet of things style devices
    • 1:16:00that have a lot fewer resources, you don't really want to go wasting bytes.
    • 1:16:04But honestly, the CPUs, the brains in our computers,
    • 1:16:06are so darned fast these days, even if you're calling malloc
    • 1:16:0810 times, 1,000 times, it's happening so darned fast
    • 1:16:11that the human doesn't even notice.
    • 1:16:13So there too.
    • 1:16:13These are what are called design decisions.
    • 1:16:15And these are the kinds of things that, in the real world,
    • 1:16:17you might actually debate with someone at a whiteboard,
    • 1:16:19saying, no, this is stupid because of this reason.
    • 1:16:21Or he or she might push back for other reasons.
    • 1:16:23And no one's necessarily right.
    • 1:16:25The whole goal is to just that thought process first
    • 1:16:27so you're at least confident in what you chose.
    • 1:16:29Yeah?
    • 1:16:29AUDIENCE: When we were writing to a file in the last PSET,
    • 1:16:32was it storing it in memory first or putting it right on the hard drive?
    • 1:16:36DAVID J. MALAN: When you were calling fread,
    • 1:16:39you were by definition in the forensics problem set
    • 1:16:42reading bytes from disk into memory.
    • 1:16:45When you were calling fwrite, you were copying bytes from memory back to disk.
    • 1:16:51If that answers the question.
    • 1:16:53OK.
    • 1:16:53Other questions?
    • 1:16:53Yeah?
    • 1:16:54AUDIENCE: Why did you say, size + 1, in line 16?
    • 1:16:58DAVID J. MALAN: Why do I say, size + 1, in line 16?
    • 1:17:01Because the whole goal is to make room in this array for the newly inputted
    • 1:17:06number that the human just typed in.
    • 1:17:08And so whatever the current size of the array is,
    • 1:17:10I clearly need one more space.
    • 1:17:14AUDIENCE: So that repeats on and on?
    • 1:17:17DAVID J. MALAN: It does repeat on and on.
    • 1:17:18Because at the moment, I'm inside of this while loop.
    • 1:17:22So we do need to ask a question, when is the human done inputting.
    • 1:17:26And it turns out-- and this is not obvious.
    • 1:17:28And it's not the best user experience on a keyboard for the human.
    • 1:17:31But we can actually detect the following sentiments--
    • 1:17:35if user is done inputting numbers, then let's go ahead and break.
    • 1:17:42But the question then is, how do you express that pseudo code?
    • 1:17:45Well, you could in some programs maybe type q for quit.
    • 1:17:49But is that going to work when using getInt?
    • 1:17:52Could we detect q?
    • 1:17:54Why not?
    • 1:17:58AUDIENCE: Because getInt immediately prompts you for another integer.
    • 1:18:01DAVID J. MALAN: Exactly.
    • 1:18:02Because getInt immediately prompts you for another int.
    • 1:18:04So because of the way we designed the CS50 library, you can't detect q,
    • 1:18:07or you can't have the human type quit unless you don't use getInt.
    • 1:18:11You instead use?
    • 1:18:14AUDIENCE: getString.
    • 1:18:15DAVID J. MALAN: We could use getString.
    • 1:18:17And then every time the human types in a number, we could use,
    • 1:18:20like, A2i to convert it to an int.
    • 1:18:23But if the human types in q or Q-U-I-T--
    • 1:18:26a string also-- we could just have an if condition with string compare and quit.
    • 1:18:30But honestly, then you're reimplementing getInt--
    • 1:18:32so trade-off.
    • 1:18:34Anyhow, a common way to work around this would
    • 1:18:36be, you know that Control-C quits programs, perhaps,
    • 1:18:39cancels out of your program.
    • 1:18:41There's another popular keystroke, Control-D,
    • 1:18:43that sends what's called end of file.
    • 1:18:45It simulates the end of a file.
    • 1:18:48It simulates the end of the human's input.
    • 1:18:50So it's kind of like the period at the end of an English sentence.
    • 1:18:52So if you want to signal to a computer that's waiting for input from you that
    • 1:18:56you don't want to quit the program-- that would be Control-C--
    • 1:18:59but you just want to be done inputting input to the computer,
    • 1:19:02you hit Control-D, otherwise known as EOF.
    • 1:19:04And the way to express this-- and you would only know this from
    • 1:19:07documentation-- would be to say something like this,
    • 1:19:09if the number the human typed in equals end of file--
    • 1:19:13but there is no such thing in this context--
    • 1:19:16you actually do this because of the CS50 library works.
    • 1:19:19It turns out that if the only values a function can return
    • 1:19:23are integers, that means you can return 0, 1, negative 1, 2 billion,
    • 1:19:28negative 2 billion give or take.
    • 1:19:30What humans did for years with old programming languages
    • 1:19:33is they would just steal one or a few numbers.
    • 1:19:35For instance, you'd steal the number two billion and call it intmax--
    • 1:19:39the maximum integer.
    • 1:19:41And you'd just say, you can never actually type 2 billion,
    • 1:19:43because we're using that as a special value to signify
    • 1:19:46that the human hit Control-D. Or you could do negative 2 billion,
    • 1:19:49or you could do 0, or 50.
    • 1:19:51But at some point, you have to steal one of the 4 billion available numbers
    • 1:19:55to use as a sentinel value, a special value
    • 1:19:58that you can then check for as a constant.
    • 1:20:00So anyhow, this just means, when the user is done typing input,
    • 1:20:04go ahead and break out of this while loop.
    • 1:20:06And as an aside, let me fix one thing.
    • 1:20:08It turns out things can go wrong with realloc.
    • 1:20:11And if realloc fails to allocate memory, it
    • 1:20:15can return null, a special value that just means, eh, something went wrong.
    • 1:20:19It's an invalid pointer.
    • 1:20:20It's the address 0.
    • 1:20:21And so it turns out there's a subtle bug here where, technically, I
    • 1:20:25should actually do this--
    • 1:20:27store realloc's return value in a temporary variable.
    • 1:20:31Because if temp = null, something went wrong.
    • 1:20:35And I should actually go ahead and quit out of this program.
    • 1:20:39But let me wave my hand at that for now because it's more of a corner case.
    • 1:20:42But you'll see in the online version of this program we have additional error
    • 1:20:46checking that just checks, in the rare case that realloc fails,
    • 1:20:50clean it up and return properly.
    • 1:20:52But I'll wave to the online code for that.
    • 1:20:54All right.
    • 1:20:55Any questions on that example before we move on?
    • 1:20:59Yeah?
    • 1:21:00AUDIENCE: So in realloc, when it creates the new pointer for the [INAUDIBLE],,
    • 1:21:06does it clear the memory from the original pointer?
    • 1:21:08Does it automatically clear it?
    • 1:21:09DAVID J. MALAN: Good question.
    • 1:21:10When you call realloc and it ends up allocating more space,
    • 1:21:13does it clear the original memory?
    • 1:21:16No.
    • 1:21:17And that is where garbage values come from, for instance.
    • 1:21:21Because they're just left in memory from the previous use.
    • 1:21:23Other questions?
    • 1:21:24Yeah?
    • 1:21:25AUDIENCE: What does the user actually type to break?
    • 1:21:29DAVID J. MALAN: Oh, Control-D. Control-D. And it's not break.
    • 1:21:33It is to send end of file, end of input.
    • 1:21:36Control-C kills or breaks out of the program itself.
    • 1:21:38AUDIENCE: And that's the same as the intmax kind of?
    • 1:21:41DAVID J. MALAN: Same as intmax?
    • 1:21:43Yes.
    • 1:21:44AUDIENCE: Because you're not adding, like, a giant value.
    • 1:21:46DAVID J. MALAN: Correct.
    • 1:21:47In the CS50 library, intmax, yes, is the symbol.
    • 1:21:50Yes.
    • 1:21:50Yeah?
    • 1:21:51AUDIENCE: Could you also just ask the user to say,
    • 1:21:55do you want to enter another number yes or no?
    • 1:21:57DAVID J. MALAN: Absolutely.
    • 1:21:58We could add more logic.
    • 1:21:59And you could use getString.
    • 1:22:00And we could prompt him or her, hey, do you want to input another number.
    • 1:22:02The only downside of that would be, now, I
    • 1:22:04have to type in not only my number, but yes or no constantly.
    • 1:22:07So it's just a trade-off user interface-wise.
    • 1:22:10All right.
    • 1:22:10So let me go ahead.
    • 1:22:12And let me go ahead and return 0 here just as my simple solution
    • 1:22:16to this problem of something going wrong.
    • 1:22:19I've just compiled this program.
    • 1:22:21Let me go ahead and run it.
    • 1:22:22I'm going to type in one number, two numbers, three numbers.
    • 1:22:27And now I'm bored.
    • 1:22:27I don't want to keep doing this.
    • 1:22:29How do I tell the computer I'm done?
    • 1:22:31AUDIENCE: Control-D.
    • 1:22:31DAVID J. MALAN: Control-D. Oops.
    • 1:22:34Oh, OK.
    • 1:22:35That's correct behavior because I forgot a key step.
    • 1:22:39What's that?
    • 1:22:41AUDIENCE: [INAUDIBLE].
    • 1:22:42DAVID J. MALAN: Yeah.
    • 1:22:42I'm not actually doing anything with the values.
    • 1:22:44I should probably for int I get 0, I less than size,
    • 1:22:49I + + code we had before.
    • 1:22:52And I should probably print out You inputted %I, this.
    • 1:22:57Save that.
    • 1:22:59Make list one.
    • 1:23:00So all I did was re-add the printing code.
    • 1:23:03Now if I rerun this-- one, two three, Control-D--
    • 1:23:08dammit.
    • 1:23:08AUDIENCE: [INAUDIBLE].
    • 1:23:10DAVID J. MALAN: Oh.
    • 1:23:11OK.
    • 1:23:11Now I broke my code here.
    • 1:23:14Let me do this.
    • 1:23:15We're going to get rid of this error checking
    • 1:23:17because I'm not actually ever resizing.
    • 1:23:19numbers gets realloc.
    • 1:23:21Oh, and maybe someone chiming in with this--
    • 1:23:24numbers bracket size gets the user's input.
    • 1:23:29Size + +-- was this a key detail someone wanted me to do?
    • 1:23:34OK.
    • 1:23:34So I didn't actually finish the program earlier.
    • 1:23:36Notice we left off as follows--
    • 1:23:39hey, computer, give me an array of size 0 initially
    • 1:23:43that's null-- there's no memory for it.
    • 1:23:45Therefore, the size of this array is 0.
    • 1:23:47Do the following forever.
    • 1:23:49Get a number from the human.
    • 1:23:51If the number equals this special value, intmax just
    • 1:23:54breakout because the program is done.
    • 1:23:57And actually, sorry.
    • 1:23:59This is why I write these in advance too.
    • 1:24:04OK.
    • 1:24:05Go ahead and prompt the user for a number.
    • 1:24:08If they have inputted the Control-D, just break out of this loop.
    • 1:24:12However, if the size of the array equals its current capacity,
    • 1:24:15go ahead and reallocate space for this thing being one number bigger than it
    • 1:24:21previously was.
    • 1:24:22Now, assuming that succeeded and we have memory,
    • 1:24:25go ahead, and just like our list 0 example, store in the numbers array
    • 1:24:30at the current location, which is 0, whatever number the human typed in.
    • 1:24:34And then increment the size by one to remember what we have done.
    • 1:24:38I'm also though going to need to do capacity + + here
    • 1:24:41to remember that we've increased the capacity of the array.
    • 1:24:44So again, two new measures.
    • 1:24:45capacity is how much space there is in total.
    • 1:24:48size is how much we're using.
    • 1:24:50They happen to be identical at the moment
    • 1:24:52because we're growing this thing step by step by step.
    • 1:24:56All right.
    • 1:24:57Let me go ahead and hit Save.
    • 1:24:58Let me go ahead and compile this one last time.
    • 1:25:01./list1 and input 1, 2, 3.
    • 1:25:05Control-D. OK.
    • 1:25:07Now it's just an aesthetic bug.
    • 1:25:08I forgot my /n.
    • 1:25:10So just to prove that I can actually program, ./list1; 1, 2, 3; Control-D.
    • 1:25:18Phew.
    • 1:25:19All right.
    • 1:25:20So you inputted 1.
    • 1:25:21And the reason it didn't move to another line
    • 1:25:23is because Control-D gets sent immediately without hitting Enter.
    • 1:25:26All right.
    • 1:25:27Phew.
    • 1:25:28That's all using arrays.
    • 1:25:29Now let's do the sort of cake baked already and pull it out of the oven.
    • 1:25:35The third and final example here is list two.
    • 1:25:38And actually, before we get there, let me note one thing.
    • 1:25:41Yeah, let's do one last thing here.
    • 1:25:43Let me go ahead and run, per earlier, our new friend valgrind on list1.
    • 1:25:49Enter.
    • 1:25:50It's waiting for me to type in 1, 2, 3.
    • 1:25:54Let me go ahead and hit Control-D. Interesting.
    • 1:25:57I seem to have a buggy program even though I claimed a moment ago that I
    • 1:26:01knew what I was doing.
    • 1:26:0212 bytes in one blocks are definitely lost in lost record one of one.
    • 1:26:06Again, I don't understand most of those words.
    • 1:26:07But 12 bytes definitely lost--
    • 1:26:10probably my fault. Why is it 12?
    • 1:26:13And what are those 12 bytes?
    • 1:26:16Yeah?
    • 1:26:17AUDIENCE: I think you made three integers.
    • 1:26:19DAVID J. MALAN: Yeah, 1, 2, and 3.
    • 1:26:21AUDIENCE: And each one is 4 bytes.
    • 1:26:22And you never freed them after you used malloc.
    • 1:26:23DAVID J. MALAN: Exactly.
    • 1:26:24I typed in three numbers--
    • 1:26:251, 2, and 3.
    • 1:26:25Each of those is 4 bytes on this computer.
    • 1:26:27That's 12-- 3 times 4.
    • 1:26:29And so I'd never freed them seems to be the source of the issue.
    • 1:26:32So at the end, let's just prove that valgrind
    • 1:26:36can detect correctness as well.
    • 1:26:37Free my numbers, semi-colon.
    • 1:26:40Let me go ahead and rerun make list1.
    • 1:26:43And now let me increase the size of this and do valgrind again
    • 1:26:47on list1, typing in the same values--
    • 1:26:491, 2, and 3.
    • 1:26:50Control-D. All he blocks were freed.
    • 1:26:53No leaks are possible.
    • 1:26:54So again, valgrind is your friend.
    • 1:26:56It finds problems that you didn't even necessarily notice.
    • 1:26:58And you didn't have to read through your lines of code again
    • 1:27:00and again to identify the source of the issue unnecessarily.
    • 1:27:03All right.
    • 1:27:04Any questions then on these arrays that are dynamically allocated
    • 1:27:08and the bugs we find therein with valgrind?
    • 1:27:12All right.
    • 1:27:12So the last demonstration of code is going to be this.
    • 1:27:17I have stolen, for this final example, some of the building blocks
    • 1:27:21that we had on the screen earlier.
    • 1:27:23In my code for list2.c, I need a structure called node.
    • 1:27:28And that node, as we claimed earlier with our human volunteers,
    • 1:27:31is going to contain a number called number,
    • 1:27:33we'll call it this time, instead of n.
    • 1:27:35And it's going to contain a ptr called next to another such node.
    • 1:27:40So that's copied and pasted earlier, albeit with the integer renamed
    • 1:27:43to number for clarity.
    • 1:27:45Now, notice in main what I'm doing first.
    • 1:27:47Go ahead and allocate an array of no space initially.
    • 1:27:53So this was like when Comey was holding up first and representing
    • 1:27:56the beginning of our data structure.
    • 1:27:58This is the analog using an array, that the piece of paper that
    • 1:28:02would be held up here would be numbers.
    • 1:28:04And it's just pointing at nothing, null-- like left hand
    • 1:28:06down on the floor.
    • 1:28:07Because there is no memory yet allocated.
    • 1:28:09But then, and while true, go ahead and get an integer
    • 1:28:13from the user with this code here.
    • 1:28:16Check if the user hit Control-D, as with this arcane technique.
    • 1:28:21And then our code is similar in spirit, but we
    • 1:28:25have to stitch these things together.
    • 1:28:27Allocate space for the number.
    • 1:28:29So when I malloc an additional volunteer from the audience
    • 1:28:32and he or she came down, the equivalent in code is this--
    • 1:28:35hey, computer, allocate with malloc enough space to fit the size of a node,
    • 1:28:41then store the results in a ptr called n.
    • 1:28:44So node *n just means, give me a pointer to a node, call it n,
    • 1:28:49and store the address that was just allocated from the audience as before.
    • 1:28:53Why do I have these lines of code here that I've highlighted in blue?
    • 1:28:56What's that expressing?
    • 1:29:03If bang n, or if not n would be how you pronounce it--
    • 1:29:08what's going on there?
    • 1:29:10Yeah?
    • 1:29:11AUDIENCE: If there is no more memory that you can point to, then it fails.
    • 1:29:15DAVID J. MALAN: Exactly.
    • 1:29:16This isn't going to happen all that often.
    • 1:29:18But if the computer is out of memory, and therefore malloc fails,
    • 1:29:21you don't want the program just to crash or freeze.
    • 1:29:23Like, all of us hate when that happens on Mac OS or Windows.
    • 1:29:26So check for it.
    • 1:29:27If not n, or equivalently, if n = = null, just return 1.
    • 1:29:33Quit gracefully, even though annoyingly.
    • 1:29:35But don't just crash or do something unexpected.
    • 1:29:37So you can simplify that check to just if not n-- if n is not a valid ptr,
    • 1:29:43return 1.
    • 1:29:44Now, here's the code with which we were implementing the demonstration
    • 1:29:49with our humans.
    • 1:29:49And this is the scariest looking or most cryptic at least
    • 1:29:52looking code we're going to see in C.
    • 1:29:54Today is our final day in C. We've been running up
    • 1:29:59a really steep hill of late, learning about memory,
    • 1:30:01and now data structures and syntax.
    • 1:30:03This is the last of our syntax in C.
    • 1:30:06So what are the symbols to be aware of?
    • 1:30:09This line of code here is how I handed one of our volunteers a piece of paper.
    • 1:30:15On the right-hand side is the number that was typed in--
    • 1:30:1855, or 5, or 20, or whatever the value is.
    • 1:30:21On the left-hand side is where you want to put it.
    • 1:30:24n and then literally an arrow number does this.
    • 1:30:29It has, with malloc a line or so prior, given me in memory
    • 1:30:34just one of these big rectangles.
    • 1:30:36And again, the top of this in this example is called the number
    • 1:30:40and the bottom is called next.
    • 1:30:42So that's our human having stood up from the back of the room.
    • 1:30:45When I hand that human a number, like 55, it visually goes there.
    • 1:30:49The line of code with which you achieve that is this here.
    • 1:30:53Because notice on line 31 here, when I malloc that node,
    • 1:30:57I stored its address in a variable called n.
    • 1:31:02And that's a pointer, as drawn with an arrow, to that big node.
    • 1:31:05Or if we really want to be nit-picky, if this is in address 100,
    • 1:31:08yes, then the pointer actually has the value 100 in it.
    • 1:31:11But again, that's rarely useful information.
    • 1:31:13So we can abstract away with just an arrow.
    • 1:31:16So line 31 is what creates those boxes on the screen.
    • 1:31:21Line 38 is what puts the number--
    • 1:31:25for instance, 55-- into the box exactly, much like I handed a piece of paper
    • 1:31:30over.
    • 1:31:31So what is this?
    • 1:31:32This is the only real new notation today,
    • 1:31:35even though we're using lots of stars elsewhere--
    • 1:31:38arrow This is wonderfully the first time in C it actually maps to our pictures.
    • 1:31:43If n is the variable and you do n arrow something,
    • 1:31:46that means follow the arrow--
    • 1:31:48kind of like Chutes and Ladders if you grew up playing that--
    • 1:31:50and then put the number where the arrow has led you in the field called number.
    • 1:31:57So as an aside, we can think about this a different way. n is what data type?
    • 1:32:02What is this thing in blue--
    • 1:32:04n?
    • 1:32:07AUDIENCE: Pointer.
    • 1:32:08DAVID J. MALAN: It's a pointer.
    • 1:32:10And it's a pointer to one of these things that we created earlier.
    • 1:32:12So we're not doing students anymore with our structures.
    • 1:32:15We're implementing nodes, which have numbers and next pointers.
    • 1:32:18So it turns out that if n is a pointer to a node--
    • 1:32:24recall that dot notation from before--
    • 1:32:27this is not how you access number in this case.
    • 1:32:29Because n is not a node itself.
    • 1:32:30It's a pointer.
    • 1:32:32But if n is a pointer, how do you go to a pointer?
    • 1:32:35How do you go to an address?
    • 1:32:37With what notation?
    • 1:32:38AUDIENCE: Star.
    • 1:32:39DAVID J. MALAN: Star.
    • 1:32:40So recall from last week, if we want to go to an address,
    • 1:32:44you could do syntax like this.
    • 1:32:45Ignore the parentheses for a moment.
    • 1:32:47Just *n means if n is an address of a chunk of memory, *n means go there.
    • 1:32:52Once you're there, you're conceptually right here-- top left-hand corner.
    • 1:32:56How do you access individual fields like number or next?
    • 1:32:59You use dot notation.
    • 1:33:01So if you literally do *n.number, that means go to the address and access
    • 1:33:08the number field.
    • 1:33:09There is nice syntactic sugar in C, which
    • 1:33:12is just a fancy way of saying shorthand notation, where it's just the arrow.
    • 1:33:16But that's all it is.
    • 1:33:17This arrow notation doesn't do anything new.
    • 1:33:19It just combines, go there, with, access a field in a struct, all in one breath
    • 1:33:24if you will.
    • 1:33:26And this just looks a little prettier.
    • 1:33:28When I told our volunteers earlier, point your hand
    • 1:33:30down at the floor, that's all that line of code is doing.
    • 1:33:33It's saying, go to n's address, which is here, access the next field,
    • 1:33:38and write in that field null, which is just
    • 1:33:41the address 0-- the default, special address, like pointing at the floor.
    • 1:33:46This line of code, 40, is just a quick error check.
    • 1:33:49if (numbers)-- what is that equivalent to?
    • 1:33:51That's actually just saying, if numbers, not equals null.
    • 1:33:54So if numbers is legitimate, if malloc worked correctly, then let's go ahead
    • 1:33:59and do the following.
    • 1:34:01Phew.
    • 1:34:02This is a mouthful.
    • 1:34:04What is going on here?
    • 1:34:05So this is a for-loop that's not using numbers.
    • 1:34:08Well, or is it?
    • 1:34:09Almost every for-loop we've written and you've probably written just
    • 1:34:12uses I, J, maybe K, but just integers probably.
    • 1:34:16But that doesn't have to be the case.
    • 1:34:18What is a pointer?
    • 1:34:19It's an address.
    • 1:34:20What is an address?
    • 1:34:23AUDIENCE: A place in memory.
    • 1:34:24DAVID J. MALAN: A place in memory, or a number really.
    • 1:34:26So you can certainly use for-loops just involving addresses.
    • 1:34:29But how?
    • 1:34:30So we'll consider this line of code.
    • 1:34:32This here looks different today, but it's everything
    • 1:34:35before that first semi-colon.
    • 1:34:36That's just where you initialize a value.
    • 1:34:38So this is like saying, hey, computer, go ahead and give me
    • 1:34:42a variable called ptr and initialize it to be the start of my list.
    • 1:34:51Then I'm saying, hey, computer, do this so long as ptr does not equal null.
    • 1:34:57And then what am I doing?
    • 1:34:59if-- and let's ignore this for now, it's an error check--
    • 1:35:02go ahead and-- sorry, let me think for one second.
    • 1:35:13OK.
    • 1:35:13Let's do this.
    • 1:35:17What are these lines of code doing?
    • 1:35:19This is the code that was actually suggested
    • 1:35:21at the very end of our human example.
    • 1:35:23Like, what if we wanted to insert all of the elements
    • 1:35:26at the end of the link list?
    • 1:35:28How do you express that?
    • 1:35:30So in this highlighted lines of code, we're asking the question,
    • 1:35:34if the current pointer's next field is null, we've found the end.
    • 1:35:38Go ahead and update that next field to equal n and then break.
    • 1:35:42So let me translate this to an actual picture,
    • 1:35:45but using smaller boxes that makes clear where something is going.
    • 1:35:49So suppose that this program's been running for a little while
    • 1:35:53and we have a length list that looks like this,
    • 1:35:57where this one is pointing here and maybe this one's pointing here.
    • 1:36:02And this says null here.
    • 1:36:04And this points here.
    • 1:36:05And the numbers are, as we've been using today, 42, 50, 13.
    • 1:36:11So the start of this list is called numbers.
    • 1:36:17This points to the start of the list.
    • 1:36:19What am I doing in this for-loop?
    • 1:36:21I am just implementing the following logic with this loop--
    • 1:36:24give me a variable called ptr, as represented
    • 1:36:27in the story by my left finger, here, and initialize
    • 1:36:30that to be the start of the list.
    • 1:36:33If that node's next pointer is equal to null, add a new node here.
    • 1:36:42But this is not null.
    • 1:36:43I want to follow the bread crumbs to here.
    • 1:36:46And then, oh, we're at the end of the list.
    • 1:36:48I want to insert this new thing here.
    • 1:36:50So how do express this code actually in C?
    • 1:36:55So if I look back up here, this is the line of code
    • 1:37:01that allocates my left finger here called ptr and initialize it
    • 1:37:05to equal numbers, which is the same as pointing at the first element.
    • 1:37:09It's kind of like Comey was representing first earlier.
    • 1:37:11But now our array is called numbers.
    • 1:37:13Next, what am I doing?
    • 1:37:16Does ptr equal null?
    • 1:37:17Well, no.
    • 1:37:18If my left hand is pointing here, it obviously doesn't equal null.
    • 1:37:21So we don't have to worry yet.
    • 1:37:23Then what do I want to do?
    • 1:37:25If ptr next equals null, well, what does that mean?
    • 1:37:28Well, ptr is here.
    • 1:37:30ptr arrow next means here.
    • 1:37:32Does this equal null in this story?
    • 1:37:35I mean, it literally doesn't.
    • 1:37:37Because null is not written there. null is way down there.
    • 1:37:39So the condition does not pass.
    • 1:37:42So what do I do next?
    • 1:37:44If ptr is equal to null doesn't apply, here's a weird update.
    • 1:37:50ptr gets ptr next.
    • 1:37:52So it's cryptic-looking syntax.
    • 1:37:54But if ptr is pointing here, what is ptr next?
    • 1:37:58That's just this, right?
    • 1:38:00This is n.
    • 1:38:00This is next.
    • 1:38:01Or this is number.
    • 1:38:02This is next.
    • 1:38:03So ptr next is this.
    • 1:38:05So what is this value?
    • 1:38:07Well, this is a pointer pointing here.
    • 1:38:09So that highlighted block of code, ptr equals ptr next,
    • 1:38:13has the effect visually of doing this.
    • 1:38:17Why?
    • 1:38:18If the arrows are a little too magical, just think about these being addresses.
    • 1:38:22If this is saying, the next address is location 100,
    • 1:38:25ptr equals ptr next is like saying, well, this also equals 100.
    • 1:38:30Whatever 100 is, for instance, over here is
    • 1:38:33what both arrows should now point out.
    • 1:38:35And if you now repeat this process and repeat this process,
    • 1:38:38eventually that question we asked earlier is going to apply--
    • 1:38:41if ptr next equals null, what do I want to do?
    • 1:38:46Well, if ptr x equals null, there's two lines going on. ptr next equals n.
    • 1:38:53So ptr next is no longer null.
    • 1:38:56It should instead be pointing at n, which is the new node.
    • 1:39:00And then that's it.
    • 1:39:02Because this was already initialized to null.
    • 1:39:04And let's suppose this was 55.
    • 1:39:06And we're done.
    • 1:39:07So much easier to do, obviously, in person with just humans,
    • 1:39:09and moving around, and pointing with their left hands.
    • 1:39:12But in code, you just have to think about the basic building blocks.
    • 1:39:16What is each of these values?
    • 1:39:17Where is each of it pointing?
    • 1:39:20And which of those fields do you need to update?
    • 1:39:22And the only new code here-- even though we're kind of combining it all in one
    • 1:39:25massive example--
    • 1:39:26is this.
    • 1:39:27We are actually using arrow notation to say, go to that address
    • 1:39:31and access some value therein.
    • 1:39:34And this condition down here, which I'll wave my hand out for now,
    • 1:39:37just handles this situation where the list is initially empty.
    • 1:39:41Any questions on this thus far?
    • 1:39:45All right.
    • 1:39:46So let's take a look more graphically at some final problems we can solve.
    • 1:39:52And what you'll see in the days ahead is the following
    • 1:39:56when it comes to these linked lists and more.
    • 1:39:58We now have the ability to actually allocate things in memory dynamically.
    • 1:40:01We don't necessarily know in advance how many numbers we have
    • 1:40:04or, in the case of the next problem set, how many words we have.
    • 1:40:06We have the ability though to use malloc, and maybe even realloc,
    • 1:40:10to grow and grow our data structure in memory.
    • 1:40:12And we have the ability in code to actually
    • 1:40:14traverse those values in such a way that we
    • 1:40:17can access memory that's all over the board now
    • 1:40:20and not necessarily back to back to back.
    • 1:40:22But what happens if we want to combine these ideas into fancier solutions
    • 1:40:26still?
    • 1:40:27Well, let's take a look at that.
    • 1:40:29In particular, if I go let's say over here to the following,
    • 1:40:35let's consider a problem we might now solve.
    • 1:40:38If I wanted to store everyone's name in this room in a data structure,
    • 1:40:42I could do what?
    • 1:40:44Well, we could use an array.
    • 1:40:46So I could actually decide how many people are in the room--
    • 1:40:48let's call it n--
    • 1:40:49and actually draw n boxes on the board, and then iteratively ask
    • 1:40:52everyone for their name, and actually write it down.
    • 1:40:55If I then wanted to take attendance thereafter and say, oh, is Alice here,
    • 1:40:59or is Bob here, or is Kareem here, or Brian,
    • 1:41:02I could just look through that array and say yes or no, that human is here.
    • 1:41:05But what's the running time of that algorithm?
    • 1:41:08How long would it take to look up a name in a data structure
    • 1:41:11where I've just drawn it as an array, a big list on the board?
    • 1:41:14AUDIENCE: A big O of n.
    • 1:41:15DAVID J. MALAN: What's that?
    • 1:41:15AUDIENCE: A big O of n.
    • 1:41:16DAVID J. MALAN: A big O of n, right?
    • 1:41:17Because if it's just a list of names, it's going to take big 0 of n.
    • 1:41:20And frankly, that seems a little slow.
    • 1:41:22How could I do an optimization?
    • 1:41:24Well, what if we combined some of these ideas?
    • 1:41:26Arrays are nice because they give me random sort
    • 1:41:28of instant access to memory locations.
    • 1:41:31But linked lists are nice because they allow me to dynamically add or subtract
    • 1:41:35elements even if I want from the list.
    • 1:41:38So you know what?
    • 1:41:38Instead of writing down everyone's names, like Alice, and Bob,
    • 1:41:44and Charlie, like this in just one big array of some fixed size that might
    • 1:41:52paint me into a corner-- now I only have room for one more name--
    • 1:41:55what if I instead do things a little more cleverly?
    • 1:41:57So when I'm actually jotting down everyone's name in the room, what
    • 1:42:01if I instead did, OK, is Alice here.
    • 1:42:04All right.
    • 1:42:04Alice is here.
    • 1:42:06And then Brian is here.
    • 1:42:07I'm going to put Brian here.
    • 1:42:09And then maybe Charlie is here.
    • 1:42:11All right.
    • 1:42:11So Charlie.
    • 1:42:13And then maybe Arnold is here.
    • 1:42:16Where should I put Arnold?
    • 1:42:17So also starts with A. You know what?
    • 1:42:19Let's just put Arnold here.
    • 1:42:21Arnold.
    • 1:42:23And Abby is here.
    • 1:42:24So you know what?
    • 1:42:25Let's just put Abby up here as well.
    • 1:42:27Bob came as well.
    • 1:42:29So Bob-- so what's the pattern I'm obviously following
    • 1:42:31as I'm hearing names called out?
    • 1:42:35AUDIENCE: Alphabetically sorted.
    • 1:42:36DAVID J. MALAN: Alphabetically sorted--
    • 1:42:37kind of.
    • 1:42:38Like, Abby kind of ended up in a weird place here.
    • 1:42:41But that's fine because I didn't hear her name first.
    • 1:42:44But I did kind of bucketize people into different rows of the board.
    • 1:42:49In other words, all of the A names I seem
    • 1:42:50to just write down for convenience at the top,
    • 1:42:53and then all of the B names together, and C names.
    • 1:42:54And probably if I kept going, I could do this all the way
    • 1:42:56through Z in the English alphabet.
    • 1:42:58So what's nice about this is that, yeah, I'm making lists of names,
    • 1:43:01but how long is each of those lists?
    • 1:43:03If there's n people in the room, each of my lists
    • 1:43:06is not going to be n long, which is slow.
    • 1:43:09It's going to be what? n divided by 26, give or take.
    • 1:43:14If we assume that there's an equal number of people
    • 1:43:16with Z names and A names, it's going to be roughly n divided by 26 so
    • 1:43:20that I have these chains of human names, but they're
    • 1:43:23much shorter than they would have been if I just grouped everyone together.
    • 1:43:26And this is a fundamental technique in programming called hashing.
    • 1:43:31It turns out there are things in this world called hash functions.
    • 1:43:34These are just mathematical, or verbal, or code-implemented functions
    • 1:43:39that take as input something and produce as output a number typically-- a number
    • 1:43:43from 0 to, say, 25, or from 1 to 26.
    • 1:43:46But they can also output strings in other contexts as well.
    • 1:43:49So my hash function here in my mind is, if you hand me a name,
    • 1:43:53I'm going to look at the first letter in your name.
    • 1:43:55And if it's A, I'm putting you in location 0.
    • 1:43:57If it's B, I'm going to put you in location 1.
    • 1:44:00If it's a Z, I'm going to put you in location 25 at the end.
    • 1:44:03So these are all buckets I've got, so to speak,
    • 1:44:05in computer science-- like 26 buckets or room
    • 1:44:08on the board that represent the starts of people's names.
    • 1:44:11So what is that?
    • 1:44:12Well, it would seem that if I don't know in advance how many A names I have,
    • 1:44:16that's kind of like drawing this as a linked list, if you will,
    • 1:44:19that might just get longer and longer.
    • 1:44:22But I do know that I only have a finite number of first letters.
    • 1:44:27So that-- at the risk of drawing a little messily--
    • 1:44:30is kind of like drawing what data structure?
    • 1:44:32AUDIENCE: An array.
    • 1:44:33DAVID J. MALAN: Yeah.
    • 1:44:34It's kind of like drawing an array that just has 26 spots.
    • 1:44:39And what's nice about an array is that I have random access.
    • 1:44:42I can jump right to any letter of the alphabet in constant time, one step.
    • 1:44:47And once I get there, I'm still going to see a list of names.
    • 1:44:50Thankfully, thanks to linked lists, that list can be short or long.
    • 1:44:54But on average, let's say it's going to be
    • 1:44:55126th the length that it would have been if I just used one array or one linked
    • 1:45:02list.
    • 1:45:02So this technique of using a hash function-- which, again,
    • 1:45:06I've defined as you give me a name; I take that as input;
    • 1:45:09I look at the first letter; and I return as output a number from 0 to 25--
    • 1:45:13a hash function lets you create a hash table.
    • 1:45:17And there's different ways to implement hash tables,
    • 1:45:19but perhaps one of the most common is indeed like this.
    • 1:45:22You decide in advance on the size of an array.
    • 1:45:26But that array does not contain the strings or the humans' names.
    • 1:45:30That array actually contains linked lists.
    • 1:45:34And it's the linked lists that contain the names.
    • 1:45:37So we borrow ideas from, like, week two.
    • 1:45:39We merge them with an idea today from week four of adding arrays
    • 1:45:42to linked list respectively.
    • 1:45:44And we kind of get the best of both worlds.
    • 1:45:46Because I can immediately jump to any letter of the alphabet super fast.
    • 1:45:49And once I'm there, yeah, there's a list,
    • 1:45:51but it's not nearly as long as it would have been if I didn't use this trick.
    • 1:45:55So what's the running time of all of this?
    • 1:45:57Well, it turns out that a hash table in the worst case
    • 1:46:01might still take you how many steps to find someone's name once it's
    • 1:46:04been added to the list?
    • 1:46:06In the very worst case, how many steps, if there's n people in the room?
    • 1:46:09AUDIENCE: n.
    • 1:46:10DAVID J. MALAN: Maybe n.
    • 1:46:11Why?
    • 1:46:12It's kind of a perverse situation.
    • 1:46:14But can you contrive a scenario in which,
    • 1:46:17even though we're doing this fanciness, it still
    • 1:46:19takes me n steps to confirm or deny that someone's here?
    • 1:46:21Yeah?
    • 1:46:21AUDIENCE: Everyone's name starts with the same letter.
    • 1:46:23DAVID J. MALAN: Everyone's name starts with the same letter
    • 1:46:25for some weird reason.
    • 1:46:26Now, it's a little silly in the human world.
    • 1:46:28But it could happen if you're just talking
    • 1:46:29data or whatever in the computer world.
    • 1:46:32This can devolve into, sure, an array with just one really linked list.
    • 1:46:37But in practice, that's not likely going to happen, right?
    • 1:46:39If we actually spent the time here and asked everyone for their name,
    • 1:46:42we'd probably get a reasonably uniform distribution of letters,
    • 1:46:46at least as is statistically likely with just human names.
    • 1:46:49So that would kind of spread things out.
    • 1:46:51And so there's this fundamental distinction between sort of real-world
    • 1:46:55running time, or wall clock time-- how many seconds are actually spinning
    • 1:46:58on the clock--
    • 1:46:59versus asymptotic running time.
    • 1:47:00We've talked for a couple of weeks now about running time as being big O of n.
    • 1:47:04And that might be still the case, that a hash table-- yes, in the worst case,
    • 1:47:08it's still a big O of n data structure.
    • 1:47:10Because in the worst case, it's going to take n steps.
    • 1:47:12But in the real world, big O of n is really big O of n divided by 26,
    • 1:47:18even though we always ignore those lower-order terms.
    • 1:47:21But when it's you, the human, running the code and analyzing the data,
    • 1:47:24running 26 times faster is actually real time saved,
    • 1:47:30even though a mathematician might say, ah, that's the same fundamentally.
    • 1:47:33And indeed, one of the problems ahead for the next problem set
    • 1:47:36is going to be to suss out exactly what the implications are
    • 1:47:39in your own code for actual wall clock running time.
    • 1:47:43And making smarter design decisions, like something like this,
    • 1:47:46can actually really speed up your code to be 26 times as fast, even
    • 1:47:50though, yes, a theoretician would say, ah,
    • 1:47:52but that's still asymptotically or mathematically
    • 1:47:55equivalent to just something linear.
    • 1:47:58So it's this fine tuning that will make your code even better and better.
    • 1:48:01Now, frankly, hashing on first names probably
    • 1:48:03isn't the smartest thing alone, right?
    • 1:48:05Like, does anyone's-- and this is going to be hard.
    • 1:48:09Does anyone's name start with X here?
    • 1:48:11AUDIENCE: [INAUDIBLE].
    • 1:48:12DAVID J. MALAN: [INAUDIBLE] is not here.
    • 1:48:13But thank you for that perfect counter-example.
    • 1:48:15But she's not here.
    • 1:48:16So look, there's no Zs.
    • 1:48:17So now we're down to 25 possible values.
    • 1:48:19And I could probably pick some less common letters too.
    • 1:48:22The point is there's probably a few more As than there are Zs
    • 1:48:25or a few more B's than there are Q's just by nature of human names.
    • 1:48:28So maybe just using the first letter isn't good enough.
    • 1:48:32And frankly, with 26 names-- suppose we did this for all of Harvard
    • 1:48:35and had thousands of names.
    • 1:48:37Each of my chains might still have hundreds or thousands of names.
    • 1:48:40So another design question is going to be, well, how many buckets should you
    • 1:48:43have, how big should the array be.
    • 1:48:45Maybe you shouldn't look at the first letter.
    • 1:48:47What if you look at the first and the second letter together-- so AA, and AB,
    • 1:48:50and AC, and then dot dot dot, BA, BB, BC, so you could come up
    • 1:48:55with more and more buckets?
    • 1:48:57But what else?
    • 1:48:57How else might we kind of uniformly distribute people?
    • 1:49:01What do all of you have that we could use as input to a hash function?
    • 1:49:06AUDIENCE: A last name.
    • 1:49:07DAVID J. MALAN: OK.
    • 1:49:07Well, you could do last name, which might give us
    • 1:49:09a different or similar distribution.
    • 1:49:11Yeah?
    • 1:49:11AUDIENCE: ID number.
    • 1:49:12DAVID J. MALAN: Whats that?
    • 1:49:12AUDIENCE: ID number.
    • 1:49:13DAVID J. MALAN: Yeah.
    • 1:49:13We could use your ID number and actually look at the first digit of your ID.
    • 1:49:17And odds are, it's 0 through 9.
    • 1:49:19So we could probably at least get 10 buckets that way.
    • 1:49:21And that's probably uniformly distributed.
    • 1:49:23I'm not sure.
    • 1:49:24We could use birth dates in some way.
    • 1:49:27Like, we could put all of the freshmen in one bucket,
    • 1:49:29all the seniors in another bucket, and everyone else,
    • 1:49:31and so forth, in their own buckets, which would also give us some input.
    • 1:49:34So again, a hash function is entirely up to you to program and design.
    • 1:49:38The goal though is to smooth things out.
    • 1:49:41You want to have roughly the same number of things in each linked list
    • 1:49:44just so that you have about the same performance
    • 1:49:48across all of these various inputs.
    • 1:49:50So let's take a look at a couple of other data structures,
    • 1:49:53again, in this abstract way.
    • 1:49:54Now that we know that, even though it's not obvious at first attempt,
    • 1:49:58we know how to construct arrays.
    • 1:49:59We kind of know now how to construct linked lists.
    • 1:50:02It stands to reason we could implement them together in code.
    • 1:50:05What else could we do now with these building blocks?
    • 1:50:08So for instance, this structure here is a very common one, known as a tree.
    • 1:50:13A tree like a family tree, where there's one patriarch or matriarch
    • 1:50:16at the top, and then their children, and then their grandchildren,
    • 1:50:19and great grandchildren, and so forth.
    • 1:50:21And what's nice about a tree structure is that, if you're storing data,
    • 1:50:25you can actually store the data in clever ways to the left child,
    • 1:50:28to the right child, and so forth, as follows.
    • 1:50:32Notice here, there's something curious about all the numbers in this data
    • 1:50:37structure.
    • 1:50:38What is noteworthy about them?
    • 1:50:44What is noteworthy?
    • 1:50:45Yeah?
    • 1:50:45AUDIENCE: Multiples of 11.
    • 1:50:46DAVID J. MALAN: What's that?
    • 1:50:47AUDIENCE: They're multiples of 11.
    • 1:50:48DAVID J. MALAN: They are multiples of 11.
    • 1:50:50That was just to make them look pretty though by the author here.
    • 1:50:53Yeah?
    • 1:50:53AUDIENCE: [INAUDIBLE].
    • 1:50:55DAVID J. MALAN: Yeah.
    • 1:50:56There's a mathematical significance too.
    • 1:50:58Like, no matter what node or circle you look at, the value in it
    • 1:51:02is bigger than the left child and it's smaller than the right child.
    • 1:51:08So it's kind of in-between.
    • 1:51:09Any circle you look at, the number to the left is smaller,
    • 1:51:11the number to the right is bigger.
    • 1:51:13And I think that applies universally all over the place.
    • 1:51:15Yes?
    • 1:51:16So what does that mean?
    • 1:51:17We'll recall from, like, week 0 when we had a whole bunch of phone book pages
    • 1:51:23that we were searching--
    • 1:51:241, 2, 3, 4, 5, 6.
    • 1:51:26Let's give ourselves a 7th one.
    • 1:51:27Recall that when we did divide and conquer, or binary search,
    • 1:51:30we did it on an array.
    • 1:51:31And what was nice about binary search was we started in the middle,
    • 1:51:34and then we maybe went left, or we maybe went right,
    • 1:51:36and we kind of divided and divided and divided
    • 1:51:38and conquered the problem much more efficiently in logarithmic time
    • 1:51:41than it would have been if we did it linearly.
    • 1:51:44But we know now weeks later that arrays are kind of limiting, right?
    • 1:51:48If I keep storing all of my values in an array,
    • 1:51:51what can I not do with the array?
    • 1:51:56Make it bigger, right?
    • 1:51:57I can't add an element to it without copying every darn element,
    • 1:52:00as we've discussed thus far today.
    • 1:52:02But what if I was a little smarter about it?
    • 1:52:04What if I stored my values, not just in an array,
    • 1:52:07but I started storing them in these circles--
    • 1:52:10let's call them nodes--
    • 1:52:12and each of those nodes is really just an integer plus two additional values?
    • 1:52:18How would we implement this data structure in memory?
    • 1:52:20Well, here's an int n-- could represent the number in question.
    • 1:52:24And we could put that in a data structure
    • 1:52:26called a node that just has the same syntax as earlier today,
    • 1:52:29but I've left room for two more fields.
    • 1:52:31What is it that I want to represent in code if I
    • 1:52:34want to start storing my numbers, not in this old-school week 0 array,
    • 1:52:38but in a tree?
    • 1:52:42AUDIENCE: Two pointers.
    • 1:52:43DAVID J. MALAN: Two--
    • 1:52:43AUDIENCE: Pointers.
    • 1:52:44DAVID J. MALAN: Two pointers.
    • 1:52:44Right?
    • 1:52:45A tree, as drawn here literally with arrows,
    • 1:52:49is just like saying every one of these nodes or circles
    • 1:52:51has a left child and a right child.
    • 1:52:53How do you implement children?
    • 1:52:55Well, you can literally just use pointer notation as well here.
    • 1:52:58A left child is just a pointer to another struct on the left.
    • 1:53:01And a right child is just another pointer to the child on the right.
    • 1:53:05And what's nice about this ultimately is that we can now
    • 1:53:08traverse this tree just as efficiently as we can traverse this array.
    • 1:53:14Because notice if I want to search for the number 66,
    • 1:53:18how many steps does it take me if I start at the top?
    • 1:53:23Just like Comey represented the start of our linked list,
    • 1:53:26so in the world of a tree does the root have special significance.
    • 1:53:29And that's where we always begin.
    • 1:53:31So how many steps does it take me to find 66 given the top?
    • 1:53:33AUDIENCE: Three.
    • 1:53:34AUDIENCE: Two.
    • 1:53:35DAVID J. MALAN: It looks like--
    • 1:53:36yeah, two or three, right?
    • 1:53:37I start at the top.
    • 1:53:38I look at it and say, hmm, 55, which way do I go.
    • 1:53:41I go to the right.
    • 1:53:42Then I see 77.
    • 1:53:43OK.
    • 1:53:43Which way do I go?
    • 1:53:44I go to the left.
    • 1:53:45So it's the same logic as week 0 in dividing and conquering the phone book
    • 1:53:49or an array a couple of weeks later.
    • 1:53:51But we get to the number we care about pretty quickly.
    • 1:53:54And it's not linear.
    • 1:53:55And in fact, if we actually did out the math, what's
    • 1:53:57really cool about a binary search tree is that if you have n elements,
    • 1:54:00n circles, the height of that tree is by definition mathematically log n.
    • 1:54:06So the height of the tree just so happens
    • 1:54:09to correspond to exactly how many times you can take n
    • 1:54:12and divide it, divide it, divide it, divide it in two.
    • 1:54:15And you can actually see this if you think about it the reverse direction.
    • 1:54:18On the bottom row, there are how many elements?
    • 1:54:21All right?
    • 1:54:21And on the middle row, there is?
    • 1:54:23AUDIENCE: Two.
    • 1:54:23DAVID J. MALAN: Two.
    • 1:54:24And on the top row, there's one.
    • 1:54:26So you can actually see it in the reverse direction.
    • 1:54:27This is like divide and conquer, but in a different conceptual way.
    • 1:54:33Every row in the tree has half as many elements as the one below it.
    • 1:54:38And so the implication of that is just like from week 0 in the phone book
    • 1:54:41when we're dividing, and dividing, and dividing in half, and half, and half.
    • 1:54:45So this is only to say, now that we have structures and pointers,
    • 1:54:48we can build something like this.
    • 1:54:50But let's try one other example here too.
    • 1:54:53This is a crazy looking example.
    • 1:54:55But it's kind of amazing.
    • 1:54:57Suppose that, if we wanted to store a dictionary of words--
    • 1:55:01so not humans' names this time, but English words.
    • 1:55:04So Merriam Webster or Oxford English Dictionary has what?
    • 1:55:06Thousands, hundreds of thousands of words
    • 1:55:08these days in English for instance?
    • 1:55:10How do you actually store those?
    • 1:55:12Well, if you just look up words in a dictionary back in yesteryear,
    • 1:55:15that is linear.
    • 1:55:16You have to start at the beginning and look through it
    • 1:55:18page by page, looking for words.
    • 1:55:19Or you could be a little smarter.
    • 1:55:21Because the words in any dictionary are hopefully alphabetized,
    • 1:55:23you can do the Mike Smith-style divide and conquer by going to the middle,
    • 1:55:27then the middle of the middle, and so forth--
    • 1:55:29log of n.
    • 1:55:30But what if I told you, you could look up words in constant time--
    • 1:55:34some fixed number of steps?
    • 1:55:36None of this divide and conquer complexity.
    • 1:55:38No log n.
    • 1:55:39Just constant time-- you want a word, go get it instantly.
    • 1:55:43That's where this last structure comes in, which is called a trie--
    • 1:55:46T-R-I-E-- short for retrieval, even though it's pronounced the opposite.
    • 1:55:51So a trie is a tree each of whose nodes is an array.
    • 1:55:57So it's like this weird Frankenstein's monster kind of data structure.
    • 1:56:00We're just really combining lots of different ideas, as follows.
    • 1:56:04And the way a trie works, as is implied by this partial diagram on the board,
    • 1:56:10is that if you want to store the name Brian, for instance,
    • 1:56:14in your dictionary-- it's the first word--
    • 1:56:15what you do is you start by creating a tree with just one node.
    • 1:56:19But that node is effectively an array.
    • 1:56:22That array is of size, let's say for simplicity, 26.
    • 1:56:26So A through Z. This location here therefore represents B for Brian.
    • 1:56:32So if I want to insert Brian into this tree, I create one node at the top.
    • 1:56:37And then for the second letter in his name, R,
    • 1:56:39I create another node, also an array, A through Z.
    • 1:56:44And so here, I put a pointer to this node here.
    • 1:56:48B-R-I. So I should have drawn some more boxes.
    • 1:56:51A, B, C, D, E, F, G, H, I. So here, I'm going to draw another pointer to B--
    • 1:57:01wait.
    • 1:57:02Bian.
    • 1:57:02[LAUGHTER]
    • 1:57:04OK.
    • 1:57:04That's wrong.
    • 1:57:07Billy shall be our name.
    • 1:57:09Billy is at B. Wait.
    • 1:57:12No.
    • 1:57:14Dammit.
    • 1:57:15B, B. B-I-A-- yes, this works.
    • 1:57:18This works.
    • 1:57:19OK.
    • 1:57:19Sorry.
    • 1:57:20So here we go.
    • 1:57:20We're inserting Billy into this fancy data structure.
    • 1:57:23So the first node represents the first letter.
    • 1:57:25The second node represents the second letter.
    • 1:57:27The third node represents the third letter.
    • 1:57:29And so forth.
    • 1:57:30But what's cool about this is the re-usability.
    • 1:57:32So notice if this is the second letter and I counted this out correctly,
    • 1:57:36I, this is going to lead to a third node deeper
    • 1:57:39in the tree where it's L that we care about next, and then another one
    • 1:57:44down here which represents another L.
    • 1:57:47And I'll start drawing the letters.
    • 1:57:49L. This is B. This is I. L. And we'll call this L.
    • 1:57:53And then, finally, another one over here, which is a Y. And this
    • 1:57:59gets pointing down here.
    • 1:58:00This gets pointing here.
    • 1:58:02And so forth.
    • 1:58:03So in short, we have one node essentially
    • 1:58:06for every letter in the word that we're inserting into the data structure.
    • 1:58:11Now, this looks stupidly inefficient at the moment.
    • 1:58:14Because to store B, I, L, L, Y, how much memory did I just use?
    • 1:58:2026 plus 26 plus 26 plus 26 plus 26.
    • 1:58:26Just to store five characters, I use 26 times 5.
    • 1:58:30But this is kind of thematic in computer science--
    • 1:58:33spend a little more space, and I bet I can decrease the amount of time
    • 1:58:36it takes to find anyone.
    • 1:58:37Because now no matter how many other students are in this data structure--
    • 1:58:42and for instance, let's do another one.
    • 1:58:44If we had another one, like Bob--
    • 1:58:48so B is the same first letter.
    • 1:58:51That leads us to this second node.
    • 1:58:52O is somewhere else in this array, say, over here.
    • 1:58:57So this represents O. And then Bob has another one.
    • 1:59:00So there's going to be another array here.
    • 1:59:02And this is why the picture above draws this so succinctly.
    • 1:59:06This is how we might store Bob.
    • 1:59:08So B, I, L, L, Y. Or you can follow a different route, B, O, B.
    • 1:59:16So we can start to reuse some of these arrays.
    • 1:59:19So there's where you start to get some of the efficiency.
    • 1:59:21Any time names share a few letters, then you start reusing those same nodes.
    • 1:59:25So it's not super, super wasteful.
    • 1:59:27But the question now is, if there's like 1,000 students in the class,
    • 1:59:30or 1,000 students in the room, we're going have a lot of nodes
    • 1:59:33there on the board.
    • 1:59:34But how many steps does it take to find Billy,
    • 1:59:38or Bob, or any name with this data structure, and to conclude yes or no
    • 1:59:44that student is in the class?
    • 1:59:48So, like, five for Billy, three for Bob.
    • 1:59:51And notice none of that math has any relationship
    • 1:59:56to how many students are in the room.
    • 1:59:58If we instead wrote out a long list of 1,000 names, in the worst case,
    • 2:00:02it might take me 1,000 steps to find Billy or Bob.
    • 2:00:04Maybe I could be a little smarter if I sort it.
    • 2:00:06But in the worst case, big O of n, it's linear.
    • 2:00:08Or if I used a hash table before, and maybe there's
    • 2:00:121,000 students in the room, but, OK, there's
    • 2:00:1426 letters in the English alphabet at least.
    • 2:00:16So that's 26 buckets.
    • 2:00:17So maybe it's 1,000 divided by 26, worst case,
    • 2:00:19if I'm using those linked lists inside my array.
    • 2:00:23But wait a minute.
    • 2:00:24If I'm using this structure, a trie, where every node in the tree
    • 2:00:28is just in an array that leads me to the next node, ala breadcrumbs, B, I, L, L,
    • 2:00:34Y is 5 and always 5.
    • 2:00:36B, O, B is always 3.
    • 2:00:38B, R, I, A, N would have been 5 as well.
    • 2:00:41None of these totals has any impact or any influence
    • 2:00:46from the number of total names in the data structure.
    • 2:00:49So a trie in some sense is this amazing holy grail
    • 2:00:54in that, by combining these various data structures, now you get constant time,
    • 2:00:57but you do pay a price.
    • 2:00:59And just to be clear, what is the price we seem to be paying?
    • 2:01:02AUDIENCE: Memory.
    • 2:01:03DAVID J. MALAN: Memory.
    • 2:01:04And in fact, this is why I'm not really drawing it much more.
    • 2:01:07Because it just becomes a big mess on the screen because it's
    • 2:01:09hard to draw such wide data structures.
    • 2:01:11It's taking a huge amount of memory.
    • 2:01:13But theoretically, it's coming faster.
    • 2:01:15Yeah?
    • 2:01:16Question.
    • 2:01:17AUDIENCE: So would you deal with a case if someone is in the Bob,
    • 2:01:20but then the other kid is in the Bobby?
    • 2:01:22DAVID J. MALAN: Good question.
    • 2:01:23So it's a bit of a simplification.
    • 2:01:25If you were storing both Bob and Bobby, you would actually keep going.
    • 2:01:28So each of these elements is not just one letter.
    • 2:01:31You also have essentially a node there or some other data structure
    • 2:01:35that says either stop here or continue.
    • 2:01:37And you'll see actually in the problems that we'll
    • 2:01:39propose to you how you can represent that idea if you
    • 2:01:42choose to go this route.
    • 2:01:43Indeed, the challenge ahead ultimately is something quite like this.
    • 2:01:46You will implement your very own spell checker.
    • 2:01:48And we will give you code that gets you started with this process.
    • 2:01:51And of course, a spell checker these days in Google Docs
    • 2:01:53and Microsoft Word just underlines in red misspelled words.
    • 2:01:56But what's going on?
    • 2:01:57And how is it that Word or Google Docs can
    • 2:01:59spell check your English or whatever language so quickly?
    • 2:02:02Well, it has a dictionary in memory, probably with tens of thousands
    • 2:02:05or hundreds of thousands of words.
    • 2:02:07And all they're doing constantly is, every time you type a word
    • 2:02:10and hit the Spacebar, or Period, or Enter,
    • 2:02:13it's quickly looking up that new word or those words in its dictionary
    • 2:02:16and saying, yes or no, should I squiggle a red line underneath this word.
    • 2:02:20And so what we're going to do is give you a big text file, ASCII text,
    • 2:02:23containing 100-plus thousand words.
    • 2:02:26You're going to have to decide how to load those
    • 2:02:28into memory, not just correctly, but in a way that's well designed.
    • 2:02:32And we'll even give you a tool, if you choose to use it,
    • 2:02:34that times how long your code takes.
    • 2:02:36And it even counts how much RAM you're actually using.
    • 2:02:39But the key goals for this week and our final week in C
    • 2:02:42is to take some of these basic building blocks,
    • 2:02:44like arrays, and pointers, and structures,
    • 2:02:48and decide for yourselves how you're most comfortable stitching them
    • 2:02:51together, to what extent you want to really fine tune your code beyond just
    • 2:02:55getting it correct, and to give you a better sense of the underlying code
    • 2:03:00that people have had to write for years in libraries
    • 2:03:02to make programming doable, ala Scratch.
    • 2:03:05Because in just a few weeks, we're going to transition to Python.
    • 2:03:07And the dozens of lines of code you've been writing now
    • 2:03:10are going to be whittled down to one line, two line,
    • 2:03:12because we're going to get a lot more features from these newer,
    • 2:03:15fancier languages.
    • 2:03:16But you'll hopefully have an appreciation of what is actually
    • 2:03:18going on underneath that hood.
    • 2:03:20So I'll stick around for any one-on-one questions.
    • 2:03:22Let's call it a day.
    • 2:03:22Take a duck on your way out for roommates as well.
    • 2:03:24And we'll see you next time.
  • CS50.ai
Shortcuts
Before using a shortcut, click at least once on the video itself (to give it "focus") after closing this window.
Play/Pause spacebar or k
Rewind 10 seconds left arrow or j
Fast forward 10 seconds right arrow or l
Previous frame (while paused) ,
Next frame (while paused) .
Decrease playback rate <
Increase playback rate >
Toggle captions on/off c
Toggle mute m
Toggle full screen f or double-click video