CS50 Video Player
    • 0:00:00Introduction
    • 0:00:50Enhance
    • 0:01:41Week 2 Recap
    • 0:05:10CS50 IDE
    • 0:14:24check50
    • 0:18:37debug50
    • 0:26:43Taking Off the Training Wheels
    • 0:27:46compare0.c
    • 0:29:54compare1.c
    • 0:32:09Strings Don't Exist
    • 0:41:40compare2.c
    • 0:47:50char *
    • 0:48:54compare3.c
    • 0:51:09compare4.c
    • 0:57:30compare6.c
    • 0:59:54copy0.c
    • 1:06:18copy1.c
    • 1:10:53copy2.c
    • 1:11:37malloc and free
    • 1:13:18scanf0.c
    • 1:15:36scanf1.c
    • 1:19:26scanf2.c
    • 1:25:07addresses.c
    • 1:30:51Hexadecimal
    • 1:33:38Milk and OJ
    • 1:36:44noswap.c
    • 1:44:51swap.c
    • 1:57:17Pointer Fun with Binky
    • 2:00:16Stack Overflow
    • 2:01:35struct0.c
    • 2:02:39struct1.c
    • 2:04:14struct2.c
    • 2:08:30Outro
    • 0:00:00[MUSIC PLAYING]
    • 0:00:49[VIDEO PLAYBACK]
    • 0:00:50- --we know?
    • 0:00:52- That at 9:15, Ray Santoya was at the ATM.
    • 0:00:56- So the question is, what was he doing at 9:16?
    • 0:00:59- Shooting the nine-millimeter at something.
    • 0:01:02Maybe he saw the sniper.
    • 0:01:04- Or he was working with him.
    • 0:01:06- Right.
    • 0:01:07Go back one.
    • 0:01:08- What do you see?
    • 0:01:17- Bring his face up.
    • 0:01:18Full screen.
    • 0:01:20- His glasses.
    • 0:01:22- There's a reflection.
    • 0:01:33- That's Neuvitas baseball team.
    • 0:01:35That's their logo.
    • 0:01:36- And he's talking to whoever is wearing that jacket.
    • 0:01:39- We may have a witness.
    • 0:01:40- To both shootings.
    • 0:01:41[END PLAYBACK]
    • 0:01:42DAVID MALAN: This is CS50, and this is lecture 3, and that
    • 0:01:45is not how computer science works.
    • 0:01:47And indeed, by the end of today, we'll make
    • 0:01:49clear exactly what's right, what's not right about that,
    • 0:01:52and hopefully give you some pause any time you watch TV or movies hereafter
    • 0:01:57and notice these little things that all too many writers seem
    • 0:02:00to take for granted.
    • 0:02:02So recall that last time, we took a lower-level look at what compiling actually
    • 0:02:06is.
    • 0:02:07And recall that it was a few things, these four steps of pre-processing
    • 0:02:10and compiling and assembling and linking,
    • 0:02:12so that when you start with your source code,
    • 0:02:14that might look like this code that we have written in the past,
    • 0:02:17you first have to preprocess it, and the first step in pre-processing was
    • 0:02:20converting all of those preprocessor instructions--
    • 0:02:23anything starting with a hash at the beginning-- to their equivalents.
    • 0:02:26So opening the files and effectively copying and pasting the contents
    • 0:02:29there so that programs and the compiler know what get_string
    • 0:02:32is and know what printf is.
    • 0:02:34The next step that came after that was actually
    • 0:02:36compiling, whereby compiling technically means taking that source
    • 0:02:39code, once it's been preprocessed, and printing and generating
    • 0:02:42this very cryptic-looking stuff called assembly code.
    • 0:02:45And those assembly codes or assembly instructions are really what the CPU--
    • 0:02:50the brain of your computer-- actually understands,
    • 0:02:52although technically the computer understands them only in the form
    • 0:02:55of 0's and 1's.
    • 0:02:57And so when you assemble-- step three--
    • 0:03:00that assembly code, you actually get out those 0's and 1's.
    • 0:03:02But even that simplest of programs where we just prompt the user for a string
    • 0:03:06and then print out their name still involved a couple more files.
    • 0:03:10There was not only cs50.h and stdio.h at the top,
    • 0:03:15somewhere in the computer system there's probably files called cs50.c,
    • 0:03:20and in the case of stdio, printf.c, in which actually the code is
    • 0:03:25for those two functions, those two have to get compiled down
    • 0:03:28to 0's and 1's, and then we need to link everything together,
    • 0:03:31merging those 0's and 1's so that the computer has access to your code
    • 0:03:35and to printf's code and to the cs50 library's code and so forth.
    • 0:03:39But all of that we can just generally wrap up in the descriptor of compiling.
    • 0:03:43And so that's one of the looks we took last week.
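The four steps recapped above can be tried by hand. What follows is a minimal sketch, not the course's own example: the function name say_hello and the GREETING macro are made up for illustration, and the commands in the comments assume the code lives in hello.c next to a main that calls say_hello.

```c
// Step 1, preprocessing: the #include below is replaced by the contents of
// stdio.h, so the compiler knows what printf is. To see the result:
//     clang -E hello.c
#include <stdio.h>

// The preprocessor also substitutes this (hypothetical) macro wherever it appears.
#define GREETING "hello, world\n"

// Step 2, compiling: the preprocessed C becomes assembly code (hello.s):
//     clang -S hello.c
// Step 3, assembling: the assembly code becomes 0's and 1's (hello.o):
//     clang -c hello.c
// Step 4, linking: your 0's and 1's are merged with printf's:
//     clang hello.o -o hello

// Prints the greeting and returns the number of characters printed.
int say_hello(void)
{
    return printf(GREETING);
}
```

Running make hello wraps all four steps into one command, which is why they can generally be described together as compiling.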
    • 0:03:45And we also have introduced, last week and previously, a few tools.
    • 0:03:49And odds are, you're having as many frustrations perhaps already
    • 0:03:52with the p-sets as you are accomplishments
    • 0:03:54and sense of satisfaction.
    • 0:03:55And that's normal, and rest assured that the scales will eventually tip more
    • 0:03:59toward happiness and away from sadness, but we'll
    • 0:04:01give you indeed more tools today than these for actually finding
    • 0:04:05problems or shortcomings in your code.
    • 0:04:07help50, recall, helps you with what process?
    • 0:04:10When should you instinctively consider using help50?
    • 0:04:14When you see error messages on the screen.
    • 0:04:15Something you don't understand that's the result of some mistake you
    • 0:04:18probably made but you don't quite understand what the computer is telling
    • 0:04:21you, run help50, and then that same command and we, the staff,
    • 0:04:24with our code will try to understand the message for you
    • 0:04:26and provide you with feedback.
    • 0:04:28style50 does exactly that.
    • 0:04:30It helps you see with red and green color coding exactly what spaces should
    • 0:04:34be there, shouldn't be there-- it just helps you prettify
    • 0:04:36your code so that you can read it better and other humans can as well.
    • 0:04:39And then printf, which is kind of like the coarsest tool in your tool box,
    • 0:04:44this is just helping you see not only messages you want to see,
    • 0:04:47but just the values of variables.
    • 0:04:49You can print ints and strings, whatever you want,
    • 0:04:51and then you can delete those lines of printf
    • 0:04:54once you're confident your program's working.
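As a small illustration of printf used as a debugging tool, here is a sketch under assumed names: sum_to is a hypothetical function, not something from the course. The printf inside the loop exists only to show the values of the variables as the loop runs.

```c
#include <stdio.h>

// Adds up the integers 1 through n and returns the total.
int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
    {
        total += i;
        // Temporary debugging output: delete this line once you're
        // confident the loop is working.
        printf("i = %i, total = %i\n", i, total);
    }
    return total;
}
```

Calling sum_to(4) from main prints one line per iteration and returns 10; each change to the printf line means recompiling and rerunning.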
    • 0:04:55But that gets a little tedious, and honestly, as our programs get bigger,
    • 0:04:59we're going to want more powerful tools than like manually printing things
    • 0:05:02out, recompiling, rerunning; it very quickly gets tedious.
    • 0:05:04And the goal of programming is not to be tedious, but to be empowering,
    • 0:05:07and that's where we'll step to today via this.
    • 0:05:10So CS50 IDE is sort of a fancier version of what
    • 0:05:14you've been using called CS50 Sandbox, and in turn, CS50 Lab.
    • 0:05:18Now recall that both of those tools, the Sandbox and the Lab,
    • 0:05:21have a terminal window where you can type commands,
    • 0:05:23they have a code editor where you can actually write your code,
    • 0:05:29and then they have a file browser with icons and such
    • 0:05:31where you can actually see your files and folders.
    • 0:05:34So it turns out that CS50 IDE is another tool that at first glance
    • 0:05:38is very, very similar, even though it's laid out a little differently,
    • 0:05:41but it has as many features as the Sandbox and the Lab, but some more.
    • 0:05:45More features that actually help you solve problems in your code
    • 0:05:49and even collaborate come final project time with others if you would like.
    • 0:05:53So this, we'll see, is the CS50 IDE.
    • 0:05:54It comes with the so-called night mode so you
    • 0:05:56can make everything a little darker on your screen, especially if p-setting
    • 0:05:58at night, and let's actually take a look then
    • 0:06:00at what you can do with this kind of tool.
    • 0:06:04When you log into this tool for the very first time in the next problem set,
    • 0:06:08you'll see an interface that's almost the same as before.
    • 0:06:10The colors are a little different, the font sizes are a little different,
    • 0:06:13but at the bottom by default, you have your so-called terminal window,
    • 0:06:16though instead of the dollar sign now, you'll
    • 0:06:17see a little more detailed workspace, but more on that in a bit.
    • 0:06:21Up here you just have the code editor window,
    • 0:06:23nothing's really going on there.
    • 0:06:25And then we have the added feature of Ceiling Cat
    • 0:06:27in the top right-hand corner.
    • 0:06:29And we'll also see some other features along the way.
    • 0:06:31So let's actually write a program in CS50 IDE, which, to be clear,
    • 0:06:35is just another web-based programming environment that also gives you
    • 0:06:39access to your own cloud-based server.
    • 0:06:42It, too, is running Ubuntu Linux, which is a popular operating system that
    • 0:06:45is not macOS and it's not Windows.
    • 0:06:47But unlike the sandbox environment where you don't even log in
    • 0:06:51and you lose your files eventually, as you
    • 0:06:53may know from when your cookies are lost or something goes wrong,
    • 0:06:56the IDE saves everything.
    • 0:06:57And you'll log in with your account, and whatever
    • 0:06:59you put there last week is going to be there this week and next week
    • 0:07:02and beyond.
    • 0:07:03So let me go ahead up to File, New File, or I could just click this little plus
    • 0:07:07icon in the top right-hand corner, and let me go ahead and preemptively hit
    • 0:07:10Control-S or Command-S or go to File, Save--
    • 0:07:13you should find the interface very similar to any Mac or PC program--
    • 0:07:17and let me go ahead and save this file as follows.
    • 0:07:20I'm going to call this hello.c.
    • 0:07:23And it's important to mention the file extension,
    • 0:07:25otherwise the IDE, like the Sandbox and the Lab,
    • 0:07:27won't know what type of program you're writing.
    • 0:07:29And then let me go ahead and just write my simplest of programs.
    • 0:07:32So let me go ahead and include stdio.h, int main void.
    • 0:07:37Let me go ahead and open my curly braces, printf--
    • 0:07:40hello, world, backslash n, and a semi-colon.
    • 0:07:43So you'll notice that almost everything is the same.
    • 0:07:46The colors are a little different, perhaps,
    • 0:07:48and you might see some different assistive
    • 0:07:50features as you're typing your code, but the end result is the same.
    • 0:07:53And the color coding you just get for free because it's helping
    • 0:07:55draw your attention to different parts of the code.
    • 0:07:57Let me go ahead now and--
    • 0:07:59oh notice this.
    • 0:08:00There's one difference.
    • 0:08:01The IDE is a more powerful tool, but as such, it's a more manual tool
    • 0:08:05and it's not just going to auto-save your code for you.
    • 0:08:07Nice as that's been with the Sandbox, such that you'd never
    • 0:08:10actually had to hit Command-S or Control-S--
    • 0:08:12and if you were, you didn't need to be, the IDE
    • 0:08:14is only going to save things when you want it to so that nothing
    • 0:08:18will happen magically anymore.
    • 0:08:20So what I'm going to have to do is go back up here, File, Save, or Command-S
    • 0:08:24or Control-S, you'll see a little green dot
    • 0:08:26briefly, and now I'm back at my prompt.
    • 0:08:29I'm going to go ahead now and type my familiar command, make hello, Enter,
    • 0:08:33and you'll see pretty much the same cryptic-looking clang
    • 0:08:36command as before because the IDE is configured quite like the Sandbox.
    • 0:08:40And if I want to go ahead and run this now, how do I run this program?
    • 0:08:44Quick check?
    • 0:08:46./hello, it's exactly the same as before.
    • 0:08:48./hello, and there we have it, hello, world.
    • 0:08:51So long story short, the user interface thus far is a little different,
    • 0:08:54but functionally it's the same.
    • 0:08:56We're just going to now start to see some more features.
    • 0:08:58So what are those features?
    • 0:08:59And let's introduce some new capabilities that were actually
    • 0:09:02possible in the Sandbox, we just didn't really introduce them at the time.
    • 0:09:05If I click this folder icon at top left, you'll see all of my files and folders.
    • 0:09:09And today for lecture I have a lot of pre-made examples
    • 0:09:12that are already on the course's website, some of which we'll look at,
    • 0:09:14some of which we'll refer to the website,
    • 0:09:16but these are just familiar files and folders.
    • 0:09:18And you can see that everything in my account
    • 0:09:21is apparently in something called Workspace, which
    • 0:09:23is just a folder name, or a directory.
    • 0:09:26Here's my sc3 directory, which again, comes
    • 0:09:28from the website for today's lecture, lecture 3.
    • 0:09:30And then here's the file I just compiled in the program and the file
    • 0:09:33that I wrote, hello.c.
    • 0:09:35You'll notice too that there's this funky symbol here, tilde,
    • 0:09:38that you might not have occasion to write often in English,
    • 0:09:41but in Spanish and other languages you might use this character.
    • 0:09:44This is actually a shorthand notation for what's called your home directory.
    • 0:09:48In this environment, CS50 IDE, you have your own home directory, which
    • 0:09:52means your folder of files and other folders that you get to create,
    • 0:09:55you own, and that persists every time you log in-- you're not
    • 0:09:59going to lose the contents therein.
    • 0:10:00So this just means that in your home directory, a.k.a. tilde,
    • 0:10:05there is a folder called workspace in which I'm currently working.
    • 0:10:09And that's just one folder in which all of my work is going to be done,
    • 0:10:12because there's so many other files and folders in this cloud environment,
    • 0:10:15just like there are in your Mac and PC, we just generally
    • 0:10:17don't care what they are.
    • 0:10:19But notice what we can do at this terminal window besides compile
    • 0:10:25and run code.
    • 0:10:26There are other commands.
    • 0:10:27For instance, this blue text here, similarly to the file browser up top,
    • 0:10:33indicates now not just that this is my prompt per the dollar sign,
    • 0:10:37but that I'm in my home directory's workspace directory.
    • 0:10:40So that means I can be elsewhere even though I haven't
    • 0:10:44specified where I want to go yet.
    • 0:10:46And in fact, I can do this. ls stands for list,
    • 0:10:49it's just shorthand notation for that.
    • 0:10:51And now I see a textual version of my file tree, so to speak.
    • 0:10:56So you'll see here, sc3 is a folder, and you
    • 0:10:59can tell as much because there's a slash at the end of it.
    • 0:11:01hello.c is of course the file I wrote a moment ago.
    • 0:11:04And then hello in green is my program that I compiled, and the star
    • 0:11:08or asterisk there is just--
    • 0:11:10it's not the name of the file, it's just indicating to me
    • 0:11:12visually that that is executable.
    • 0:11:14That's a program I can run just so I know what's compiled
    • 0:11:17and what maybe is source code.
    • 0:11:19So when you're running ./hello, the reason all this time this has been
    • 0:11:23working is because in dot, your current folder, there is a file called hello,
    • 0:11:28and when you hit Enter, you are running that program there.
    • 0:11:32So if after today you go back onto CS50 Sandbox or CS50 Lab and type ls,
    • 0:11:36you'll see exactly the same thing as you might by the little folder
    • 0:11:39icon in those programs as well.
    • 0:11:41But suppose I want to go into a directory.
    • 0:11:44In macOS or Windows or even the IDE, I could, of course,
    • 0:11:48go to my File icon, and then per the little triangle
    • 0:11:51here, which might seem intuitive, you just click it
    • 0:11:53and you can see what's going on inside, not surprising.
    • 0:11:56But how do you do that textually?
    • 0:11:57At a command prompt, well it's not all that hard.
    • 0:12:00You just need to change your directory.
    • 0:12:02So if I do cd space sc3, Enter, nothing seems to happen quite yet
    • 0:12:08except that my prompt changed.
    • 0:12:10Here's the indication that-- this is my prompt, but to the left of it
    • 0:12:13you see in blue that I'm now in my home directory's workspace folder,
    • 0:12:17in my sc3 folder there.
    • 0:12:19So it's just a text-based version of the GUIs, the Graphical User Interfaces
    • 0:12:23that all of us have certainly come to take
    • 0:12:25for granted in the world of macOS and Windows thus far.
    • 0:12:29Well, suppose that I'm a little done with my hello program
    • 0:12:33and I want to delete it.
    • 0:12:34Well in the IDE, like in the Sandbox, you can actually go up here and you can
    • 0:12:37click on it, and then you can typically right-click or control-click,
    • 0:12:41and you'll get a whole menu of other options, one of which is Delete--
    • 0:12:44and feel free to tinker like that in your own environment.
    • 0:12:46But what about the command line?
    • 0:12:48If I zoom in down here and I want to remove hello, you're
    • 0:12:52not going to type remove because that just feels a little verbose
    • 0:12:55and humans decades ago decided that's too tedious to type,
    • 0:12:58let's just call this command rm--
    • 0:13:00for remove-- hello, you're going to see a somewhat cryptic prompt.
    • 0:13:04rm: remove regular file 'hello'?
    • 0:13:07This is more arcane than it needs to be, but it's just asking,
    • 0:13:09are you sure you want to delete 'hello'?
    • 0:13:11Then it's just waiting for you.
    • 0:13:12And here you can type y or yes or sometimes other commands too,
    • 0:13:18now I've confirmed that my intentions were yes.
    • 0:13:20If I type ls again, I-- whoops, in the wrong folder.
    • 0:13:23If I type ls again after doing hello--
    • 0:13:27no-- after doing hello and do ls, now I'll
    • 0:13:31see just those two things-- sc3 and hello.c.
    • 0:13:34What if I want to make a folder?
    • 0:13:36Well notice this.
    • 0:13:37If I type at the bottom here, make directory--
    • 0:13:41mkdir-- test just to make a test folder, I'm
    • 0:13:45about to hit Enter, but watch the top left-hand corner
    • 0:13:48where I currently have those other files and folders, and when I hit Enter,
    • 0:13:51now I have a test folder.
    • 0:13:53So these things are identical.
    • 0:13:54One is graphical, one is command line, and there's even other commands
    • 0:13:57if I decide I don't want that.
    • 0:13:59rmdir is remove directory, and it just goes away
    • 0:14:02because it's empty and thus safe.
    • 0:14:04Any questions then on any of those commands
    • 0:14:06or just the overall layout of what it is we're looking at?
    • 0:14:11All right, so don't get hung up on any of those commands,
    • 0:14:13and the problem set and beyond will always
    • 0:14:15remind you of those kinds of features.
    • 0:14:16The point for now is just that we're in a somewhat new environment,
    • 0:14:19but it's fundamentally still the same, it has the same capabilities.
    • 0:14:23So what are other tools we looked at?
    • 0:14:24So you might have heard rumors about a tool called check50, and indeed,
    • 0:14:28this is a tool that the staff use to evaluate problem set 1 and problem set
    • 0:14:312 to evaluate the correctness of them so that we ourselves don't have to type
    • 0:14:35./mario or ./caesar again and again and again to test students' code.
    • 0:14:41But starting this week, you, too, have access to the same program.
    • 0:14:44check50 is a command from the staff that checks the correctness of your code
    • 0:14:48just like style50 checks the style of your code.
    • 0:14:51And in fact, if I go back over to my IDE,
    • 0:14:53let's try to use this for the first time by making the same version of hello
    • 0:14:57that you did perhaps for your first problem set.
    • 0:15:00So if I go ahead and include not just stdio, but cs50.h,
    • 0:15:04and I go ahead and get a string from the user
    • 0:15:07with get_string, prompting them for their name, and then go ahead
    • 0:15:10and print not just hello, world, but hello, percent s comma name,
    • 0:15:14this I believe was the same program you yourselves probably
    • 0:15:17wrote, or some variant thereof.
    • 0:15:20So if I go ahead now and test this myself--
    • 0:15:22make hello, Enter, seems OK, ./hello.
    • 0:15:26I'm going to go ahead and type in my name, and voila, hello, David.
    • 0:15:29Now suppose you're feeling pretty good, you're
    • 0:15:30pretty confident that your code is correct,
    • 0:15:32and most importantly, you have tested your code yourselves.
    • 0:15:36It's not sufficient to rely on our tool alone
    • 0:15:38to test your code because it, too, might not be exhaustive.
    • 0:15:41So once you've tried a few inputs, not just David, but perhaps
    • 0:15:45Veronica's name as well, seems to work.
    • 0:15:47Brian's name as well, seems to work.
    • 0:15:49No name at all, doesn't seem to work, maybe?
    • 0:15:52But we'll have to look back to the problem set
    • 0:15:54to see if that's actually a problem.
    • 0:15:56Let me go ahead now and run check50.
    • 0:15:58check50 expects a special slug, so to speak.
    • 0:16:02Just a unique identifier for the problem that you want to check.
    • 0:16:05And you would only know this from reading a problem
    • 0:16:07set or documentation online.
    • 0:16:09I just happened to recall that the command that the staff had been using
    • 0:16:12to grade and evaluate hello is just cs50/2018/fall/hello.
    • 0:16:18And the slash is to just kind of visually distinguish those words,
    • 0:16:21this isn't a folder or files or anything like that in your own account.
    • 0:16:24So I'm going to run check50 cs50/2018/fall/hello in the same
    • 0:16:29directory that hello.c is in.
    • 0:16:31Enter.
    • 0:16:32It's going to go ahead and connect to GitHub, which is the backend,
    • 0:16:35recall, that we use for storing your code.
    • 0:16:37It's authenticating me now, which means what's your username and password?
    • 0:16:40I'm going to go ahead and use one of my test accounts.
    • 0:16:43And now it's prompting me for my password,
    • 0:16:45and I'm going to go ahead and type that in.
    • 0:16:47You'll notice you're seeing stars like you see bullets in a website
    • 0:16:49just so that someone looking over your shoulder can't see what you're typing.
    • 0:16:52Now I'm going to go ahead and watch the progress.
    • 0:16:55It's preparing, let me go ahead and zoom in.
    • 0:16:58Dot-dot-dot.
    • 0:16:59It's looking at my code, it's getting ready for submission,
    • 0:17:03it's now uploading it to GitHub.com, and once it's on the servers,
    • 0:17:07then it's going to tell CS50 server, here is so-and-so's submission,
    • 0:17:11go ahead and run a few automated tests on it,
    • 0:17:14checking therefore its correctness, and hopefully we're about to see some
    • 0:17:17green, happy smiley faces, and voila, yes,
    • 0:17:21it looks like this check50 command for this problem--
    • 0:17:24or slug, so to speak--
    • 0:17:26checked that hello.c exists, because if I forgot to write the file
    • 0:17:29or if I misnamed it, nothing's going to work.
    • 0:17:32We checked that it compiles successfully,
    • 0:17:33so that, too, is a happy green face.
    • 0:17:35Then it apparently checked--
    • 0:17:37what if we type in Veronica?
    • 0:17:38Do we see hello, Veronica?
    • 0:17:40Apparently yes.
    • 0:17:41What if we typed in another word, Brian?
    • 0:17:42Yes, apparently we say hello, Brian.
    • 0:17:44And so with high probability, we're going
    • 0:17:46to conclude, based on those four tests, that your code is, in fact, correct,
    • 0:17:49at least with respect to those inputs.
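The idea behind such a check can be sketched in a few lines. This is not how check50 itself is implemented; it is a minimal illustration, with hypothetical function names greet and check_greeting, of comparing a program's output against the expected output for a given input.

```c
#include <stdio.h>
#include <string.h>

// Writes "hello, <name>\n" into out, using at most size bytes.
void greet(const char *name, char *out, size_t size)
{
    snprintf(out, size, "hello, %s\n", name);
}

// Returns 1 if greet produces exactly the expected text for name, else 0.
// This is the kind of comparison an automated test makes for each input.
int check_greeting(const char *name, const char *expected)
{
    char actual[64];
    greet(name, actual, sizeof actual);
    return strcmp(actual, expected) == 0;
}
```

Passing checks for Veronica and Brian says nothing about inputs that were never tried, which is why testing your own code yourself still matters.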
    • 0:17:51And there's often some more detail via URL at the bottom
    • 0:17:54where you can actually see more graphically just more
    • 0:17:56feedback on your code.
    • 0:17:57Of course, the first time, second time, third time maybe you run this command,
    • 0:18:01you might not see some green happy faces,
    • 0:18:03you might see some red unhappy faces or some yellow flat faces,
    • 0:18:06which just means we couldn't even run the checks because something else is
    • 0:18:09wrong.
    • 0:18:10But over time, this will help you feel more comfortable and more confident
    • 0:18:14that your code's correct before you actually use submit50 and submit.
    • 0:18:18Going into it you'll feel a little better or a little frustrated
    • 0:18:21to know in advance-- wait a minute, I'm about to submit this but nope,
    • 0:18:24it's not yet correct.
    • 0:18:25So realize it's a two-edged sword.
    • 0:18:28Any questions about check50 or any of these commands thus far?
    • 0:18:34Anything at all?
    • 0:18:36No?
    • 0:18:37All right.
    • 0:18:38So let's take a look at the final and most powerful
    • 0:18:41tool now available to you in the IDE environment.
    • 0:18:45Built in to CS50 IDE, which stands for Integrated Development
    • 0:18:49Environment, which isn't a CS50 thing-- this is a common term in industry
    • 0:18:52for tools that make it easier to write code,
    • 0:18:54it turns out that there's some other feature besides the cat over here.
    • 0:18:58Namely, one, you can share your workspace
    • 0:19:00with teaching fellows and course assistants
    • 0:19:03so they can perhaps help you in real time a la Google Docs, even chatting
    • 0:19:06with you in real time.
    • 0:19:07But it also provides you with what's called a debugger.
    • 0:19:09A debugger, as the name suggests, removes bugs--
    • 0:19:12or rather, helps you remove bugs from your code
    • 0:19:15by allowing you to not just resort to printf--
    • 0:19:17printing out ints and strings and whatever
    • 0:19:19is going on in your program-- it kind of automates
    • 0:19:22that very tedious process for you.
    • 0:19:24And it lets you walk through your code one
    • 0:19:26line at a time at your own comfortable pace
    • 0:19:29and see along the way all of the values of your variables in that program.
    • 0:19:33To activate this debugger, I'm going to go ahead and do the following.
    • 0:19:36I'm going to compile my code as always with make hello.
    • 0:19:39It has to compile, otherwise I might want
    • 0:19:42to use help50 and figure out why it's not compiling,
    • 0:19:44but it does seem to have compiled.
    • 0:19:46And now I'm going to go ahead and run debug50, space, and then
    • 0:19:50the name of the program I wanted to debug.
    • 0:19:52And the name of the program I wanted to debug at the moment
    • 0:19:54is the current directory's file called hello.
    • 0:19:56Let's assume that there's perhaps something wrong with it.
    • 0:19:59The first time I run this command, though, debug50
    • 0:20:01is not going to be happy with me because it's going to say,
    • 0:20:03it looks like you haven't set any breakpoints.
    • 0:20:06Set at least one breakpoint by clicking to the left of a line number
    • 0:20:09and then rerun debug50.
    • 0:20:10Well what is a breakpoint?
    • 0:20:12Well as the name kind of suggests, it allows
    • 0:20:14you to break or pause the running of your code at any of your lines.
    • 0:20:19And all this time for the past few weeks,
    • 0:20:21your code has been automatically line-numbered.
    • 0:20:23And this is useful because the most interesting line in this program,
    • 0:20:27once it really gets going, isn't this stuff at the top,
    • 0:20:29it's not int main void, right?
    • 0:20:31That's all copy-paste from past programs.
    • 0:20:33It's really the sixth line here where I actually have some logic of my own.
    • 0:20:37And so in CS50 IDE, what you can now do is
    • 0:20:41click to the left of one of these line numbers,
    • 0:20:43a little red light like a stop sign is going to appear saying,
    • 0:20:46break or pause my program on this line so
    • 0:20:49that I can poke around my actual code.
    • 0:20:52Sandbox and Lab cannot do this.
    • 0:20:54So now I'm going to go ahead and rerun debug50 in exactly the same way, hit
    • 0:20:58Enter, but now I have one breakpoint.
    • 0:21:01And you'll see on the right-hand side a fancier menu just popped up
    • 0:21:05by the cat that provides me with a bunch of features.
    • 0:21:07And at first glance, frankly, it's a little overwhelming
    • 0:21:10because there's a lot going on here, but you'll notice first,
    • 0:21:13and most importantly, there's some mention of my name variable.
    • 0:21:17I don't quite understand 0x0 or whatnot, but I do understand string.
    • 0:21:21And so what the debug50 program has realized is oh, on this line and below,
    • 0:21:26you have a variable called name.
    • 0:21:28It doesn't seem to have a value yet.
    • 0:21:290x0, it turns out, is just going to mean empty or null or 0.
    • 0:21:33But that's good, because now, when I actually execute this line,
    • 0:21:37hopefully it's going to take on the name David or Veronica or Brian.
    • 0:21:41So let's see what happens.
    • 0:21:42Notice that it's highlighted in yellow, line 6, which means it
    • 0:21:46has not yet executed this line of code.
    • 0:21:48My code has paused at this point because I set that breakpoint.
    • 0:21:52And then notice kind of like a music player up here, there's a few icons.
    • 0:21:57The Play button is just going to say, ah, play my program,
    • 0:21:59run it all the way through the end, kind of like Scratch with the green flag.
    • 0:22:03But more powerful is this.
    • 0:22:04You can step over this line, therefore executing it just once.
    • 0:22:09If it's a function, you can step into this line
    • 0:22:12and actually look inside of a function that you're using, like get_string,
    • 0:22:15or you can step out of another function, but more on that another time.
    • 0:22:19So what I'm going to do is this.
    • 0:22:20And the button I'm going to click most commonly when trying to understand
    • 0:22:23how my program is working is this--
    • 0:22:25Step Over.
    • 0:22:26So it's the second icon from the left, right next to the triangle.
    • 0:22:31So once I click this, watch what's going to happen,
    • 0:22:34even though it's a little small, on the right-hand side for my name variable.
    • 0:22:38Notice that I'm being prompted to type in my name because the program
    • 0:22:41is still running in my terminal window, but when I hit Enter now,
    • 0:22:44providing my own name, automatically you see on the right-hand side
    • 0:22:49that this name variable has a value now of, quote-unquote,
    • 0:22:53"David" of type string.
    • 0:22:55There's this 0x1083010-- more on that later, just a little cryptic,
    • 0:22:59but I didn't have to use printf now, I can actually see what's going on.
    • 0:23:02Now you can see that line 7 is highlighted,
    • 0:23:04because I set a breakpoint above it, so now I'm on the second line
    • 0:23:07because I just stepped over the first.
    • 0:23:08Let me go ahead and click Next again, and you'll
    • 0:23:11see that in my terminal window, hello, David just got executed.
    • 0:23:14And now if I just keep going, it's going to go ahead and run to the end
    • 0:23:17and close the debugger.
    • 0:23:18So not all that useful for this program because frankly, I'm
    • 0:23:21pretty sure this is correct, but the power of debug50 and a debugger more
    • 0:23:25generally is that it lets you, whether you're less comfy or more comfy,
    • 0:23:28walk through your own code at your pace just like a TF or a CA might say, OK,
    • 0:23:33what is this line doing?
    • 0:23:34What is this line doing?
    • 0:23:35You don't have to resort to printf, you can just very methodically
    • 0:23:38walk through your code and find that damn bug that's been bothering you
    • 0:23:41for minutes or even hours.
    • 0:23:43So henceforth, any time you have a bug in your code that is compiling
    • 0:23:47but it's just logically incorrect-- the pyramid in Mario isn't quite right,
    • 0:23:51your encryption of Caesar isn't quite right, or something else,
    • 0:23:53your first instinct now should be, let me compile it, run debug50 on it,
    • 0:23:58and just step through the code, setting a breakpoint wherever I want,
    • 0:24:01so you focus on just a few lines, not the whole thing--
    • 0:24:04like I just did--
    • 0:24:05and see if you can figure out logically when a value is not what you expected,
    • 0:24:08then oh--
    • 0:24:09go ahead and just click Resume, fix the bug, and retry.
    • 0:24:13Such a powerful tool.
    • 0:24:15Any questions?
    • 0:24:17Yeah?
    • 0:24:19What is it?
    • 0:24:19AUDIENCE: What does it look like when there is a bug?
    • 0:24:21DAVID MALAN: What does it look like when there is a bug?
    • 0:24:24So the debugger won't find your bugs and it won't show you your bugs, per se.
    • 0:24:28It's going to let you see what line is executing,
    • 0:24:31it's going to let you see what's outputting,
    • 0:24:32it's going to let you take input, but all it's
    • 0:24:34going to do on that right-hand side is just show
    • 0:24:36you the values of things along the way.
    • 0:24:38It's up to you to infer from that information what
    • 0:24:42it is that's going wrong, just like if you're using printf in past weeks
    • 0:24:46to see what's going on in your program.
    • 0:24:48Other questions?
    • 0:24:50And let me save this too.
    • 0:24:52It is so easy to get into the habit, especially when so many things have
    • 0:24:55been new over the past few weeks of just saying, ah,
    • 0:24:57this is just yet another thing to learn.
    • 0:24:59This is hands down the kind of tool that if you
    • 0:25:02spend a few extra minutes this week and next week just using it,
    • 0:25:05get a little more comfortable with it, it
    • 0:25:07will save you potentially hours in the long run,
    • 0:25:09because all the time you've been spending manually
    • 0:25:12trying to fix your bugs or posting questions online
    • 0:25:14trying to understand things, this is a tool
    • 0:25:16that if you invest those minutes upfront will just
    • 0:25:18help you understand everything going on inside of your program,
    • 0:25:21and will absolutely over the next few weeks save you more and more time.
    • 0:25:27All right, any questions? Yeah?
    • 0:25:30AUDIENCE: So you have a for loop that ran [INAUDIBLE] times,
    • 0:25:34[INAUDIBLE] separate break statements so you don't have to [INAUDIBLE].
    • 0:25:38DAVID MALAN: Ah, good question.
    • 0:25:39If you have something like a for loop or a while loop, something
    • 0:25:42that's happening a lot, can you set a breakpoint in such a way
    • 0:25:45that it only breaks so that you don't have to walk through it 100 times
    • 0:25:49just to see that value?
    • 0:25:50Short answer, yes.
    • 0:25:51And let me defer to section and online resources for just a few
    • 0:25:54of these features, but one, you can actually watch values,
    • 0:25:57and you can have what's called a watch expression.
    • 0:25:59You can say show me this value only when x is greater than 50
    • 0:26:03or something like that.
    • 0:26:04Or you yourself can just add some lines of code.
    • 0:26:06You could add a, if x equals-equals 50, then print out something,
    • 0:26:11and you can set a breakpoint on that new, if temporary line,
    • 0:26:14so there's a couple of ways to do that.
    • 0:26:15Good question to anticipate.
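The second approach mentioned here, adding a temporary line of code and setting the breakpoint on it, might look like the following minimal sketch. The function name `count_up` and the threshold 50 are hypothetical, chosen only for illustration; the idea is that a breakpoint placed on the `printf` line pauses the program on just the one iteration of interest.

```c
#include <stdio.h>

// Hypothetical loop, used only to illustrate the trick.
// Returns the sum 0 + 1 + ... + (n - 1).
int count_up(int n)
{
    int total = 0;
    for (int x = 0; x < n; x++)
    {
        // Temporary line: set the breakpoint on the printf below so the
        // debugger pauses only when x reaches 50, not on every iteration.
        if (x == 50)
        {
            printf("x is 50, total so far is %i\n", total);
        }
        total += x;
    }
    return total;
}
```

Once the bug is found, the temporary `if` block would be deleted again.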
    • 0:26:16Yeah?
    • 0:26:17Behind.
    • 0:26:18AUDIENCE: If you run debug50, aren't you adding
    • 0:26:22another argument with the [INAUDIBLE] in your main method at line 4?
    • 0:26:26DAVID MALAN: Really good question.
    • 0:26:28If you're running debug50, aren't you adding
    • 0:26:30another argument-- argv-- per our discussion last week of command line
    • 0:26:33arguments?
    • 0:26:34Short answer, no, because debug50 corrects for that,
    • 0:26:36so you don't have to worry about that.
    • 0:26:38It will not shift things over numerically.
    • 0:26:40Really good thought.
    • 0:26:41Other questions?
    • 0:26:43All right, so with that said, let's now take some training wheels off.
    • 0:26:50So the only reason I bought these training wheels years ago
    • 0:26:53is to make this very dramatic point of now taking the training wheels off
    • 0:26:57today.
    • 0:26:59OK, so what does this mean?
    • 0:27:01Well worth the trip to Target.
    • 0:27:03So what does this mean?
    • 0:27:04For the past few weeks, we have been using a whole bunch
    • 0:27:07of functions from CS50's library.
    • 0:27:09All of these were meant to just make it pretty easy, relatively speaking,
    • 0:27:12in the first few weeks to get input from the user.
    • 0:27:14Because it turns out, as we'll see today,
    • 0:27:16it's actually a kind of a pain in the neck to get input from users in C,
    • 0:27:20and frankly, even in other languages reliably.
    • 0:27:23Because you'll recall that get_string and get_int and all of these functions
    • 0:27:27take on the burden of like re-prompting the user if they don't actually
    • 0:27:30give you an int or don't give you a float
    • 0:27:32or don't give you a char that you're expecting, they'll re-prompt,
    • 0:27:34they're using a while loop or a do-while loop or the like,
    • 0:27:37so there's just a lot of error detection built into these functions.
    • 0:27:40But, most importantly-- and most misleadingly,
    • 0:27:44has been the last one on this list.
    • 0:27:46Recall that we introduced a couple weeks ago now the notion of a string.
    • 0:27:50And a string is in English what?
    • 0:27:53An array of characters, good.
    • 0:27:54It's a sequence of characters, and we learned last week that a sequence can
    • 0:27:57be implemented in an array, which is just a chunk of memory
    • 0:27:59back-to-back-to-back-to-back.
    • 0:28:01So string, though, is not quite like any of those other data types.
    • 0:28:06It turns out that it's not quite like int or char or even bool or float,
    • 0:28:10and we can start to see that now as follows.
    • 0:28:13I'm going to go ahead and go into the IDE today--
    • 0:28:15and henceforth we're going to just start using the IDE,
    • 0:28:17but you're welcome to keep using the Sandbox for quick and dirty programs,
    • 0:28:20but for anything you want to keep around,
    • 0:28:22your instinct should now be to open your IDE.
    • 0:28:24I'm going to go ahead and create a new file,
    • 0:28:26and I'm going to call it compare0.c from my first example of comparing things.
    • 0:28:31And I'm going to go ahead and whip up a relatively short program
    • 0:28:34that you would hope would work right out of the box.
    • 0:28:37So I'm going to go ahead and include the familiar cs50.h.
    • 0:28:40I'm going to go include stdio.h.
    • 0:28:42I'm going to go ahead and do int main void.
    • 0:28:44I'm going to go ahead and in here--
    • 0:28:46let me get a variable called i using get_int from the user,
    • 0:28:49and just prompt them for i.
    • 0:28:51Let me go ahead then and prompt the user for another get_int.
    • 0:28:55We'll call it j and get that from them.
    • 0:28:57And then let's just compare these things.
    • 0:28:59So if i equals-equals j, then go ahead and print out
    • 0:29:03with printf same and a new line.
    • 0:29:05Else go ahead and print out the opposite, which is different.
    • 0:29:10So the only place I think I could have screwed up, perhaps,
    • 0:29:13is if I did this, which is kind of reasonable if you
    • 0:29:16come in knowing what an equal sign is.
    • 0:29:17But again, in code, we typically need two equal signs
    • 0:29:20because that compares two values.
    • 0:29:21So I didn't make that mistake, I'm feeling pretty good about this.
    • 0:29:24Let me save it with Command-S or Control-S or via File,
    • 0:29:28Save; go to my prompt and run make compare0.
    • 0:29:31Good, everything compiled.
    • 0:29:33And let me go ahead and run compare0, Enter, and I'll type in 50,
    • 0:29:38and I'll type in 50, and they do seem to be the same.
    • 0:29:42Let me go ahead and do that again, let's type in 42 and 13,
    • 0:29:46and they are different.
    • 0:29:47And I should probably test a few more, maybe some negative values, maybe some
    • 0:29:500's, positive values and the like, but I'm
    • 0:29:52feeling pretty good about the correctness of this code.
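The comparison at the heart of compare0.c can be sketched as below. In the lecture, `i` and `j` come from `get_int` inside `main`; to keep this sketch free of the CS50 library, the logic is wrapped in a hypothetical helper named `compare_ints` that takes the two values as parameters.

```c
#include <stdbool.h>
#include <stdio.h>

// Prints "Same" or "Different" and reports whether the two ints matched.
bool compare_ints(int i, int j)
{
    // Two equal signs compare values; a single equal sign would assign.
    if (i == j)
    {
        printf("Same\n");
        return true;
    }
    else
    {
        printf("Different\n");
        return false;
    }
}
```

Calling it with 50 and 50 takes the first branch, and calling it with 42 and 13 takes the second, matching the two runs described above.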
    • 0:29:54All right.
    • 0:29:55So let's change this program a bit.
    • 0:29:57Let me go ahead and create another file, which
    • 0:29:59I can do with the little green plus or via File, New File.
    • 0:30:02I'm going to go ahead and save this one as compare1.c.
    • 0:30:04And for the moment I'm going to go ahead and just paste in that code
    • 0:30:08from before, but I'm going to make some changes now.
    • 0:30:11I'm going to go ahead and rename and retype my data types as strings.
    • 0:30:16So give me a string called s, and we'll prompt the user
    • 0:30:18for that using get_string, then I'm going
    • 0:30:21to go ahead and change this one to string t,
    • 0:30:23and I'm going to go ahead and get get_string.
    • 0:30:25I, of course, need to now compare s and t, not i and j.
    • 0:30:30And s is a common variable name for a string. t just comes after s,
    • 0:30:33so that's pretty reasonable too, but I should of course update that as well.
    • 0:30:36And so I think everything's now the same logically.
    • 0:30:39I just changed my data types and my variable names.
    • 0:30:41So I've saved this.
    • 0:30:42Let me go ahead and run make compare1.
    • 0:30:45Good, everything's correct.
    • 0:30:47Let me go ahead and do ./compare1.
    • 0:30:51Let me go ahead and type in Brian and Veronica.
    • 0:30:56And of course, those are different.
    • 0:30:58Now let me go ahead and type in David, let me type in David again,
    • 0:31:01and those of course are different?
    • 0:31:05Huh.
    • 0:31:06Maybe it's because I just hit the Spacebar or something.
    • 0:31:08So let's try Erin.
    • 0:31:11Her name's a little shorter.
    • 0:31:12Hmm.
    • 0:31:13OK, let's try-- oh, what's her name?
    • 0:31:16TJ.
    • 0:31:17OK, even shorter, perfect.
    • 0:31:19TJ, can't go wrong.
    • 0:31:21Different.
    • 0:31:21I mean, what is going on?
    • 0:31:23Let's just say i, i.
    • 0:31:25Different?
    • 0:31:27So where's the logical bug in this program?
    • 0:31:34What is it that's going on?
    • 0:31:36Yeah, what do you think?
    • 0:31:37AUDIENCE: Is it comparing integer values?
    • 0:31:39DAVID MALAN: Is it comparing integer values?
    • 0:31:40Well maybe.
    • 0:31:41I mean, thus far when we've used equal-equals
    • 0:31:43we've probably used it mostly for comparing integers,
    • 0:31:45so maybe I'm just misusing it, sure.
    • 0:31:47Other thoughts?
    • 0:31:48AUDIENCE: [INAUDIBLE]
    • 0:31:51DAVID MALAN: Oh, that's a big word that we'll get to in just a little bit.
    • 0:31:54But correct, correct-- but for very similar reasons.
    • 0:31:58So something's going on logically involving comparison,
    • 0:32:02because I'm using equal-equal, but maybe I'm using it for the wrong data types?
    • 0:32:06I mean, it's clearly broken for strings.
    • 0:32:09So why might that actually be?
    • 0:32:11Well it turns out that strings don't actually exist.
    • 0:32:16So a string that we know is just a sequence of characters
    • 0:32:19or an array of characters is not an actual data type.
    • 0:32:22int is, float is, double is, long is, bool is, and even more
    • 0:32:27are actual data types.
    • 0:32:28String is kind of a little white lie we've
    • 0:32:30been telling for a few weeks that's implemented only in the CS50 library.
    • 0:32:35Now the word string is super common in programming.
    • 0:32:37Like every programmer out there will know what you mean when you say string.
    • 0:32:40That is not a CS50 word, but our use of it in C is CS50-specific.
    • 0:32:44Because in that file called cs50.h, in addition
    • 0:32:47to declaring functions like get_string and get_int and get_float
    • 0:32:50and a bunch of other things, we also have a special line that says,
    • 0:32:53create a data type called string.
    • 0:32:57But what does it actually do or what does it actually mean?
    • 0:33:00Well let's go ahead and consider what might be going on underneath the hood
    • 0:33:04here.
    • 0:33:04So if I go ahead and draw the program that we just
    • 0:33:08ran, that program compare1 gets a string s from the user,
    • 0:33:12then gets a string t from the user, and then compares them.
    • 0:33:15So we know from last week what a string is, it's just an array.
    • 0:33:18So when I run that first line of code and get a string from the user--
    • 0:33:22for instance, Brian, I'm going to go ahead and see a B-R-I-A-N,
    • 0:33:28which we know from last week to actually be an array of memory that might look
    • 0:33:33pictorially like this-- and this, too, is a bit of a white lie,
    • 0:33:36there's something else.
    • 0:33:37AUDIENCE: The null.
    • 0:33:38DAVID MALAN: Yeah, the null character, so to speak, NUL,
    • 0:33:41which we typically just write with a backslash 0, which is just all 0 bits.
    • 0:33:45And it turns out, you might recall from the debugger earlier, you saw this--
    • 0:33:49that's the even more cryptic way of expressing the null character,
    • 0:33:52backslash 0.
    • 0:33:53Just different programs display it in different ways.
    • 0:33:55So when I get_string and type in Brian, this is what's allocated in memory.
    • 0:34:00And when I type Veronica, I can see a V-E-R-O-N-I-C-A.
    • 0:34:05I'm going to get that right preemptively.
    • 0:34:07Backslash 0.
    • 0:34:08That, too, is a chunk of memory, which I'll draw like this.
    • 0:34:121, 2, and split these up into individual characters or bytes.
    • 0:34:16And recall from last time that these bytes just come from my memory,
    • 0:34:20and that memory just has a bunch of bytes in it, maybe millions or even
    • 0:34:23billions these days.
    • 0:34:24And so honestly, if you just have that many things,
    • 0:34:26any human or computer can certainly number them.
    • 0:34:29Like this is byte 1, 2, 3, 4.
    • 0:34:31So let's just assume for the sake of discussion
    • 0:34:34that out of context of my computer's hardware,
    • 0:34:36Brian just ended up at location 100, and location 101, and 102, 103, 104, 105.
    • 0:34:46So this is the 100th byte in my computer,
    • 0:34:49this is the 105th byte in my computer, and Brian
    • 0:34:51is using that many characters in total.
    • 0:34:53Veronica, she ended up somewhere else.
    • 0:34:55Maybe she ended up farther away just because at location 900, 901, 902, 903,
    • 0:35:02904, 905, 906-- a lot more memory, 907, and 908--
    • 0:35:09but you can see even more visually now that the length of Brian's name--
    • 0:35:14strlen of Brian is what?
    • 0:35:18AUDIENCE: [INAUDIBLE]
    • 0:35:21DAVID MALAN: I hear five and I hear six.
    • 0:35:22The length of Brian's name--
    • 0:35:24Brian, how long is your name?
    • 0:35:25AUDIENCE: Five.
    • 0:35:26DAVID MALAN: OK, it is definitively five characters, that
    • 0:35:28is the length of Brian's name, but you have
    • 0:35:31to appreciate that in the computer, Brian's five-character name does indeed
    • 0:35:35take up six bytes.
    • 0:35:36So both answers are kind of correct, but the length of the string henceforth
    • 0:35:39is always the number of actual characters.
    • 0:35:41The amount of space it takes up is that plus 1 for the null character.
    • 0:35:45So you can actually see why Brian's name takes up six bytes in this picture
    • 0:35:49rather than just the actual length, which is five.
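The difference between a string's length and the space it occupies can be checked with a small sketch. `strlen` is the standard function from string.h; `bytes_needed` is a hypothetical helper introduced only for this illustration.

```c
#include <string.h>

// Number of bytes needed to store s, counting the terminating '\0'.
size_t bytes_needed(const char *s)
{
    return strlen(s) + 1;
}
```

For "Brian", `strlen` reports 5 while `bytes_needed` reports 6, the same two answers heard from the audience.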
    • 0:35:52So when you call get_string now, and when you call
    • 0:35:55get_string and get another string--
    • 0:35:57Brian and Veronica respectively, what is actually being handed back?
    • 0:36:01A couple weeks ago, Erin came up and she kind of like
    • 0:36:04handed me back a string, a student's name from the audience.
    • 0:36:07On that piece of paper we thought was the student's name.
    • 0:36:11But it's not.
    • 0:36:13It turns out that when a function returns a value,
    • 0:36:15it can pretty much only return 1 byte or maybe 2 or 4 bytes.
    • 0:36:20It can't return an arbitrary number of bytes, like six for Brian or 1, 2, 3,
    • 0:36:254, 5, 6, 7, 8, 9-- it cannot return 9 bytes for Veronica.
    • 0:36:29And if you even type a whole paragraph or page of text,
    • 0:36:32it can't return all of that text, it can only return a single value.
    • 0:36:37So to your instinct earlier, what might actually
    • 0:36:40be getting returned by get_string when the human has
    • 0:36:44typed in a name like Brian or Veronica?
    • 0:36:47AUDIENCE: [INAUDIBLE]
    • 0:36:49DAVID MALAN: The memory location.
    • 0:36:50Indeed, an integer, or as you called it, a pointer,
    • 0:36:53which we'll introduce more formally in just a moment.
    • 0:36:55So when get_string returns "Brian," quote-unquote,
    • 0:36:58it's actually not returning B-R-I-A-N backslash 0, it is just returning 100.
    • 0:37:05And when get_string returns Veronica, it's not returning her name,
    • 0:37:08it's returning 900.
    • 0:37:10And so if you realize that now, when you do does
    • 0:37:13s equal-equal t, what question more mundanely are you actually asking?
    • 0:37:19Yeah.
    • 0:37:20Memory location and memory location-- does 100 equal 900?
    • 0:37:24And obviously not.
    • 0:37:25And so that is why Brian's name, Veronica's name,
    • 0:37:28my name, TJ's name-- every word I typed in was of course different,
    • 0:37:32because each input was ending up at a different location in memory.
    • 0:37:36And even if I typed the same word like David twice, one David was going here,
    • 0:37:40one David was going somewhere else, they were ending up
    • 0:37:42at different memory locations.
    • 0:37:44Maybe 100, maybe 900, maybe something else,
    • 0:37:46but they were ending up in different locations in memory.
    • 0:37:48So equal-equals does compare values, but dammit
    • 0:37:51if it isn't comparing the wrong values.
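A minimal sketch of what `==` is really asking when both operands are strings: do the two addresses match? The helper name `same_address` is hypothetical.

```c
#include <stdbool.h>

// What s == t really asks for two strings: are these the same address?
// The characters stored at those addresses are never examined.
bool same_address(const char *s, const char *t)
{
    return s == t;
}
```

Given `char a[] = "David";` and `char b[] = "David";`, the call `same_address(a, b)` is false because the two arrays live at different locations, even though their contents are identical, while `same_address(a, a)` is true.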
    • 0:37:54Yeah?
    • 0:37:54AUDIENCE: Well what if you use some char*s?
    • 0:37:56DAVID MALAN: Ah, so we'll come back to that.
    • 0:37:58Let me come back to that in just a moment.
    • 0:38:00char* is actually intricately related.
    • 0:38:02More on that in a moment.
    • 0:38:03Yeah?
    • 0:38:04AUDIENCE: If you add two integers in memory--
    • 0:38:06DAVID MALAN: Uh huh?
    • 0:38:06AUDIENCE: Wouldn't they be in different places in memory?
    • 0:38:09So you would return--
    • 0:38:11so you need a different value.
    • 0:38:12DAVID MALAN: OK, really good question.
    • 0:38:14So wait a minute, this same logic that I'm returning the address of something
    • 0:38:19surely applies to integers as well or floating point values as well?
    • 0:38:23Because if I type in the number 50 like I
    • 0:38:25did earlier, that, too, is somewhere in memory-- like a box in memory,
    • 0:38:29and that, too, has an address somewhere in memory,
    • 0:38:32but it turns out, for reasons that you just alluded to, actually,
    • 0:38:36ints are returned as their values.
    • 0:38:38Chars are returned as their values.
    • 0:38:40Bools are returned as their values.
    • 0:38:42Floats are returned as their values.
    • 0:38:43Strings are different.
    • 0:38:45Strings are returned by their address.
    • 0:38:48And those addresses, it turns out, are ultimately going to be called
    • 0:38:54char*'s, which we'll see in just a moment.
    • 0:38:56So how do we go about then fixing this fundamentally?
    • 0:38:59Like even if you have no idea how to code this yet, just intuitively,
    • 0:39:03if I do actually want to delete--
    • 0:39:06if I do actually want to compare--
    • 0:39:09sorry.
    • 0:39:13OK.
    • 0:39:14If I do want to go ahead and compare Brian and Veronica for equality,
    • 0:39:19what do I want to do intuitively?
    • 0:39:21I can't just compare their addresses.
    • 0:39:23What do I need to do?
    • 0:39:25Isolate the characters and then do what with them?
    • 0:39:27AUDIENCE: [INAUDIBLE]
    • 0:39:30DAVID MALAN: Good.
    • 0:39:31Yeah, good instincts.
    • 0:39:32Use a for loop, use a while loop-- any kind of looping structure.
    • 0:39:35And intuitively, compare the first characters,
    • 0:39:37and if they're different, well then we know we don't have to go any further.
    • 0:39:40B is not a V, so surely these names are different.
    • 0:39:43But what about in my case?
    • 0:39:44If it was David and David, you would compare the first two.
    • 0:39:46D and D are the same.
    • 0:39:48Compare the second two, A and A are the same.
    • 0:39:50V and V, I and I, D and D, and then what am I going to hit last?
    • 0:39:55Null character.
    • 0:39:56And should I keep going beyond the null character?
    • 0:39:58No.
    • 0:39:59So this is the beauty of that super simple design for a string.
    • 0:40:02Insofar as strings are identified by their starting address, just the byte
    • 0:40:07at which they start, you still need to know
    • 0:40:10how long they are, because otherwise how do you know where one word begins and ends
    • 0:40:14and another word begins?
    • 0:40:16And so the simple decision we made last week-- as did humans decades ago--
    • 0:40:20to terminate all strings with backslash 0 or all 0's is a super handy trick,
    • 0:40:25so that if I tell you that Brian starts at 100,
    • 0:40:28you can infer that he ends where?
    • 0:40:33At byte number 105 or 104, if you will, however you want to think about it,
    • 0:40:37because all you need to do in linear time,
    • 0:40:40if you will, left or right, is check-- backslash 0, backslash 0-- ah!
    • 0:40:43Backslash 0, now I know how long Brian's name is.
    • 0:40:46So let's consider for a moment this program called string length.
    • 0:40:49How does strlen actually work?
    • 0:40:52When you pass to strlen, a variable containing a string, like Brian,
    • 0:40:57what is strlen probably doing?
    • 0:41:00AUDIENCE: [INAUDIBLE]
    • 0:41:02DAVID MALAN: Exactly.
    • 0:41:03It's looking at that null character's address
    • 0:41:05and subtracting the start address and the end address,
    • 0:41:09figuring out what the difference is, and actually returning
    • 0:41:12that minus 1 the total count.
    • 0:41:14And more mechanically, we'll see in a moment,
    • 0:41:16it's probably doing exactly the same thing I did,
    • 0:41:19which is, is this backslash 0?
    • 0:41:20Is this backslash 0?
    • 0:41:21Is this, is this, is this?
    • 0:41:22I asked that question five times before I saw backslash 0.
    • 0:41:25strlen is just a function some human wrote years ago
    • 0:41:29that probably just has a simple for loop and an if condition,
    • 0:41:31and then that's it.
    • 0:41:33Because that person understood before we even
    • 0:41:35did how strings are actually implemented.
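The "is this backslash 0?" walk described above can be sketched as follows. This is an illustration under the name `my_strlen`, not the actual library source, which may be written quite differently for speed.

```c
#include <stddef.h>

// Counts characters up to, but not including, the terminating '\0'.
size_t my_strlen(const char *s)
{
    size_t n = 0;
    // Ask "is this backslash 0?" one byte at a time.
    while (s[n] != '\0')
    {
        n++;
    }
    return n;
}
```

For "Brian" the loop body runs five times before the null character is found, so the function returns 5.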
    • 0:41:39Any questions then?
    • 0:41:41All right, so let's actually implement this.
    • 0:41:42Let me go ahead and into my editor here, and make one other example here
    • 0:41:46that I'm going to call compare2.
    • 0:41:48I'm going to go ahead and do include cs50.h and include stdio.h,
    • 0:41:55and then I'm going to do int main void, and I'm
    • 0:41:57going to quickly now grab my code from before where I got strings
    • 0:42:03and I compared them, but I have to obviously fix that comparison.
    • 0:42:06So here's my code from before.
    • 0:42:08I'm going to do this the right way.
    • 0:42:10I'm going to call a function called compare_strings passing in s and t.
    • 0:42:14Because as you proposed, we need to do some logic.
    • 0:42:16We don't have to pass it to a function, but we could.
    • 0:42:18We could just do a for loop here, but I'm
    • 0:42:20going to go ahead and implement compare_strings as follows.
    • 0:42:23If I want to write a function that returns a yes/no answer, what data type
    • 0:42:28should it return?
    • 0:42:29A bool.
    • 0:42:30So we've not necessarily done this yet, but you
    • 0:42:32can return a bool just like you can an int or a char or something else.
    • 0:42:36I'm going to call this function compare_strings.
    • 0:42:38It's going to take in one string called a and another string called b,
    • 0:42:42but I could call those anything I want.
    • 0:42:44And now what's the easiest thing to check?
    • 0:42:47If I pass two strings, a and b, or Brian and Veronica,
    • 0:42:50what's the easiest question you can ask and just immediately say, nope,
    • 0:42:53these are different?
    • 0:42:55String length, right?
    • 0:42:56Like if the B-R-I-A-N is not of the same length as Veronica's name,
    • 0:43:00we don't need to do any logic whatsoever beyond that,
    • 0:43:02we can just quit and say false.
    • 0:43:04So let me just do that.
    • 0:43:05If the strlen of a does not equal the strlen of b, you know what?
    • 0:43:10Let's just go ahead and return false and get out of here.
    • 0:43:13OK, but now, if we get past that gateway, so to speak,
    • 0:43:17that check, that question, that Boolean expression,
    • 0:43:19now I have to compare things character by character by character.
    • 0:43:23So I can do this in a bunch of ways, but I like the suggestion of a for loop.
    • 0:43:26So for int i at 0, n for efficiency-- actually,
    • 0:43:30let's do i is less than the string length--
    • 0:43:33should I do the string length of a or b?
    • 0:43:36And it doesn't matter, right?
    • 0:43:38So let's go with a.
    • 0:43:39And frankly, had I been smart early on, I
    • 0:43:41could have stored the value in a variable and then reused it,
    • 0:43:44but we'll just keep going ahead for now.
    • 0:43:45Then i plus-plus, but I remember from last time-- this is correct,
    • 0:43:48but this is not good design.
    • 0:43:49Why?
    • 0:43:52Yeah, I keep calling strlen again and again, because remember, in a for loop,
    • 0:43:55this condition is checked again and again
    • 0:43:58and again-- you're just wasting your own time.
    • 0:44:00So let me go ahead and actually do this.
    • 0:44:02n or any variable equals the strlen of a, then just compare i against n,
    • 0:44:09because now i is getting incremented, but n is never changing.
    • 0:44:12So now let me go ahead and implement this for loop.
    • 0:44:15So if-- how about the i-th character of a does not equal the i-th character
    • 0:44:21of b, I can immediately conclude--
    • 0:44:24nope, these strings can't be the same, because some letter, like a B,
    • 0:44:28is not the same as another, like a V, or whatever letter we're actually
    • 0:44:31comparing.
    • 0:44:32And then I think that's it.
    • 0:44:34If I get through these gauntlets of questions--
    • 0:44:37are your lengths different?
    • 0:44:38Are your characters different?
    • 0:44:40And I still haven't said false, what should I return by default?
    • 0:44:45Yeah.
    • 0:44:45Like if you make it through all of those questions and all is well,
    • 0:44:49then D-A-V-I-D must indeed equal D-A-V-I-D or whatever the user actually
    • 0:44:54typed in.
    • 0:44:54Now I'm not quite done yet.
    • 0:44:56When I've implemented a function or a helper function
    • 0:44:58like this, because it's helping me do my work,
    • 0:45:01what else do I have to add to the file?
    • 0:45:02Oh?
    • 0:45:03AUDIENCE: I've got a logical question.
    • 0:45:04DAVID MALAN: Sure.
    • 0:45:05AUDIENCE: In a computer, couldn't you just type in David with a capital D
    • 0:45:08and then david with a lowercase d, you're going to run [INAUDIBLE],
    • 0:45:11they're not going to sync because your first character's not
    • 0:45:12the same character.
    • 0:45:13DAVID MALAN: Correct.
    • 0:45:14So this is a feature, not a bug at the moment.
    • 0:45:17My program at the moment is case-sensitive.
    • 0:45:20If I type in DAVID in all caps, that is a different string
    • 0:45:22I claim for now than david in all lowercase.
    • 0:45:25If you want to tolerate uppercase and lowercase,
    • 0:45:27you're going to have to add more logic.
    • 0:45:29But for now that's a design decision that I intend.
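The "more logic" needed to tolerate uppercase and lowercase could look something like this sketch. It is not from the lecture: the function name `compare_strings_ignore_case` is hypothetical, and it relies on the standard `tolower` function from ctype.h.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical variant: true if a and b match when letter case is ignored.
bool compare_strings_ignore_case(const char *a, const char *b)
{
    size_t n = strlen(a);
    if (n != strlen(b))
    {
        return false;
    }
    for (size_t i = 0; i < n; i++)
    {
        // The unsigned char cast keeps tolower well-defined for any byte.
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i]))
        {
            return false;
        }
    }
    return true;
}
```

With this version, DAVID and david would be reported as the same, while Brian and Veronica would still differ.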
    • 0:45:32All right.
    • 0:45:32What else do I need to add to the program?
    • 0:45:36Yeah, the prototype at top.
    • 0:45:38You can literally copy and paste-- this is the only time copy and paste is
    • 0:45:41probably a legitimate thing to do--
    • 0:45:43at the top, and then semi-colon-- don't re-implement it.
    • 0:45:45But I do need one other header file.
    • 0:45:48I'm using a function that's not in cs50.h or in stdio.h.
    • 0:45:54String length?
    • 0:45:56Where was string length?
    • 0:45:58Yeah, string.h.
    • 0:45:59So I just need this, include string.h, save.
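Putting the pieces together, the helper described above might look like the sketch below. Two choices here are assumptions rather than the lecture's exact code: the parameters are written as `const char *` instead of the CS50 `string` type so the sketch compiles without cs50.h, and the loop counter is a `size_t` to match what `strlen` returns.

```c
#include <stdbool.h>
#include <string.h>

// Prototype, so callers such as main can appear above the definition.
bool compare_strings(const char *a, const char *b);

bool compare_strings(const char *a, const char *b)
{
    // Easiest check first: different lengths means different strings.
    if (strlen(a) != strlen(b))
    {
        return false;
    }

    // Compare character by character, calling strlen only once.
    for (size_t i = 0, n = strlen(a); i < n; i++)
    {
        if (a[i] != b[i])
        {
            return false;
        }
    }

    // Every check passed, so the strings are the same.
    return true;
}
```

In `main`, the condition would then read `if (compare_strings(s, t))` rather than `if (s == t)`. The standard library's `strcmp` in string.h performs a comparable character-by-character check.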
    • 0:46:03Now this I think is correct.
    • 0:46:05We'll see if I eat the word in a moment.
    • 0:46:08But realize that if you're writing this code yourself,
    • 0:46:10like this is not a natural thing to be writing a program in office hours
    • 0:46:13or at home in your dorm and just getting it right the first time.
    • 0:46:16This is after like 20 years of doing this, so realize we happen to be--
    • 0:46:19and I also have a cheat sheet right here--
    • 0:46:21we happen to be doing this correctly often,
    • 0:46:23but realize that's not going to be the common case.
    • 0:46:26So with that reassurance in mind, let's see
    • 0:46:28if I have to now take all that back. make compare2.
    • 0:46:33OK-- phew.
    • 0:46:3320 years worked out.
    • 0:46:34So now I'm going to go ahead and ./compare2.
    • 0:46:37Let's type in Brian, let's type in Veronica.
    • 0:46:40Those are indeed still different hopefully.
    • 0:46:42Now let's try myself, David and David.
    • 0:46:45Phew!
    • 0:46:46Those are the same.
    • 0:46:47And to your point, David capitalized and david in all lowercase,
    • 0:46:52different, but that's what I expect now.
    • 0:46:56Any questions on compare2?
    • 0:46:58Yeah?
    • 0:46:58AUDIENCE: [INAUDIBLE]
    • 0:47:01DAVID MALAN: OK.
    • 0:47:02AUDIENCE: [INAUDIBLE] string in the program and in general.
    • 0:47:06DAVID MALAN: OK.
    • 0:47:07AUDIENCE: Would that still work [INAUDIBLE]
    • 0:47:10DAVID MALAN: If you were to hard code the strings?
    • 0:47:12Short answer, yes, that would still work.
    • 0:47:14If you for whatever reason did not do this and using get_string,
    • 0:47:19but you did David, and here, for instance, David, that would work too.
    • 0:47:25And whatever your error is, if you can recreate it, just let us know.
    • 0:47:28AUDIENCE: It seems to be like a string that would be increased
    • 0:47:31for a set that was [INAUDIBLE] only?
    • 0:47:33And it was having issues in the little [INAUDIBLE].
    • 0:47:36DAVID MALAN: I'd have to see it to be sure, but happy to chat after.
    • 0:47:39All right, so let's see if we can't now clean this
    • 0:47:42up just a little bit as follows.
    • 0:47:45Let me go ahead here and reveal what it is that's actually going on.
    • 0:47:51So indeed, there is no such thing as a string.
    • 0:47:53And indeed, as you pointed out a moment ago,
    • 0:47:55it actually goes by a different name.
    • 0:47:57String is just a synonym for what's called a char*.
    • 0:48:01Now what does that even mean?
    • 0:48:02So char is the same as it's always been.
    • 0:48:04It's a single character.
    • 0:48:05Star in a program written in C could of course mean multiplication,
    • 0:48:09we have seen that.
    • 0:48:10This is another use of the star.
    • 0:48:12Whenever you see it after a data type like char,
    • 0:48:15this means that the data type in question is not just a char,
    • 0:48:19it's the address of a char.
    • 0:48:21So the star just means the address of whatever the data type is to the left,
    • 0:48:25and this is, as you pointed out earlier, what
    • 0:48:27we're going to start calling a pointer.
    • 0:48:29A pointer is, for all intents and purposes, an address.
    • 0:48:32It's just a buzzword to describe an address.
    • 0:48:34This data type here, char*, means I want a variable that doesn't store a char,
    • 0:48:40it stores the address of a char.
    • 0:48:42The number 100, the number 900.
    • 0:48:45But that address is just going to be called a pointer.
    • 0:48:48A pointer variable is a variable that stores the address of something.
    • 0:48:52A char or even other data types as well.
    • 0:48:54So with that in mind, let me actually quickly create compare3.c, paste this
    • 0:49:00in, and save it as compare3.c, and let me take off, if you will,
    • 0:49:05those training wheels.
    • 0:49:06It turns out that when you get a string with get_string,
    • 0:49:10it doesn't return a string, per se, because again,
    • 0:49:12that word doesn't exist in C, it actually returns a char*.
    • 0:49:16And when I call it again here and return another string, it, too,
    • 0:49:19returns a char*.
    • 0:49:21Now technically the star can have spaces around it.
    • 0:49:24Some people write it like this, but the sort of right way to do it
    • 0:49:27or the default way should just be to put the star next to the variable name
    • 0:49:30for clarity.
    • 0:49:31So I have to make a few other changes.
    • 0:49:33This should change too, because there is no more string as of today.
    • 0:49:37I'm going to change this to a char*; and then I also need to change it here,
    • 0:49:41char*; and then here, char*; and that is actually it.
    • 0:49:48And honestly, the only reason we didn't introduce this like two weeks ago
    • 0:49:53is because it just looks cryptic.
    • 0:49:55Like no one wants to program the first time they're ever touching a keyboard
    • 0:49:58and writing code and see char* and need to worry about what that means,
    • 0:50:01it's just a string conceptually.
    • 0:50:03But the only change I technically need to make to take those training wheels
    • 0:50:06off is just change all mentions of string as data types to char*.
    • 0:50:11And that just means that, you know what?
    • 0:50:12Yes it's a string, but more technically it's the address of a string.
    • 0:50:17Or more precisely, it is the address of the first byte of the string,
    • 0:50:21like 100 for Brian or 900 for Veronica, and I'm not even
    • 0:50:25going to tell you where the string ends because you, the programmer,
    • 0:50:28can figure that out by calling strlen or just by using a loop
    • 0:50:32and figuring out where that backslash 0 actually is.
    • 0:50:35So that is enough information to pass it around.
    • 0:50:37So if I go ahead now and compile this, make compare3,
    • 0:50:41and then I go ahead and do ./compare3, let's go ahead and type in Brian
    • 0:50:47and Veronica, those are indeed still different.
    • 0:50:49Now let me go ahead and type in David and David, those are in fact the same.
    • 0:50:53So the training wheels are off, there is no such thing as string,
    • 0:50:56henceforth it's a char*.
    • 0:50:57Let's go ahead and take a quick break here for five minutes,
    • 0:51:00and we'll come back and dive in more.
    • 0:51:02All right.
    • 0:51:03So we are back, and let's go ahead and simplify this now,
    • 0:51:06as our tendency has been.
    • 0:51:07It's kind of a bunch of code, but I think
    • 0:51:09we can make this a little tighter.
    • 0:51:10But rather than type this one out manually,
    • 0:51:12let me go ahead and just open one of our pre-made examples
    • 0:51:14from today, which is all in the course's website, called compare4.
    • 0:51:18And you'll see in compare4, that's it.
    • 0:51:21I only have a main function this time.
    • 0:51:23I've gotten rid of my compare_strings function because you know what?
    • 0:51:26I seem to be using something instead.
    • 0:51:29What function did I apparently deploy?
    • 0:51:33Yeah, S-T-R-C-M-P, or as some would pronounce it,
    • 0:51:35just str compare or strcmp.
    • 0:51:37So this, like strlen, also succinctly named,
    • 0:51:41is just a function that's actually declared
    • 0:51:43in one of our familiar libraries up top, string.h,
    • 0:51:46and it turns out if you look in the man page, so to speak,
    • 0:51:49by typing man strcmp, or if you go to CS50 reference and actually
    • 0:51:53look at the less comfortable description of the function there,
    • 0:51:55this is just a function whose sole purpose in life
    • 0:51:57is to compare strings for you.
    • 0:51:59But it's a little different in behavior because it's
    • 0:52:01a little fancier than the one I just wrote.
    • 0:52:03Let me zoom in on this, and you'll see that line 14 here, I'm
    • 0:52:07not quite treating it in the same way.
    • 0:52:11My logic is ever so slightly different.
    • 0:52:14What am I actually checking for in my Boolean expression this time?
    • 0:52:20AUDIENCE: [INAUDIBLE]
    • 0:52:21DAVID MALAN: Yeah, which is a little weird.
    • 0:52:23I'm checking explicitly-- if strcmp's return value equal-equal to 0.
    • 0:52:28Before I just said, if compare_strings s comma
    • 0:52:33t, because I was expecting back a bool-- true or false. strcmp, kind of weird,
    • 0:52:38acts the opposite way.
    • 0:52:40It turns out that strcmp doesn't return true and false.
    • 0:52:43If you read its documentation, it returns 0 if the strings are equal,
    • 0:52:48but super conveniently, it returns a negative value
    • 0:52:52if s is supposed to come before t, and it returns a positive value
    • 0:52:56if s is supposed to come after t alphabetically.
    • 0:52:59So it turns out that you can use strcmp not just to compare for equality,
    • 0:53:03but inequality--
    • 0:53:04less than or equal--
    • 0:53:05less than or greater than, so to speak, alphabetically,
    • 0:53:08or in ASCII order, so to speak.
    • 0:53:10It will actually compare character by character the ASCII values,
    • 0:53:13and that will make sure that B comes after A,
    • 0:53:16and C comes after B, and so forth.
    • 0:53:18So you can actually use strcmp to like sort a dictionary,
    • 0:53:20or to sort the contacts in your iPhone or your Android phone.
    • 0:53:24So long story short, this is a function we can use,
    • 0:53:27we don't have to reinvent this wheel, and thus, we have no more code
    • 0:53:30even after this.
    • 0:53:30We just have to use it correctly, and there, the documentation
    • 0:53:33is your friend.
    • 0:53:34So if I run this program it's going to work exactly the same way,
    • 0:53:37but let me go ahead and point out some flaws.
    • 0:53:40It turns out all this time, I've been a little lazy with my error checking--
    • 0:53:44checking for errors.
    • 0:53:46There's a whole bunch of things that can go wrong in week 1 of CS50
    • 0:53:49that we just kind of turn a blind eye to, because it would just
    • 0:53:52bloat our code, make it longer and sort of less interesting and fun to write
    • 0:53:56and less comprehensible.
    • 0:53:57But today, now that we know what's actually going on,
    • 0:53:59we can begin to ask some additional questions
    • 0:54:01and make our code stronger, more robust so
    • 0:54:04that nothing does, in fact, go wrong.
    • 0:54:05Turns out, if you read the documentation for get_string in the man page
    • 0:54:08or in CS50 reference, turns out get_string
    • 0:54:11does return a string-- uh, not really.
    • 0:54:14It returns the address of a string.
    • 0:54:15Uh, not really.
    • 0:54:16It returns the address of the first byte of a string, technically.
    • 0:54:22But if something goes wrong, it returns a special character called null.
    • 0:54:26Not to be confused with NUL, it returns a special address called null--
    • 0:54:32left hand wasn't talking to right hand decades ago.
    • 0:54:34So null, N-U-L-L, just means the address 0, which nothing should ever live at.
    • 0:54:41It's just a bogus, invalid address.
    • 0:54:44Insofar as get_string returns the address of a string in memory,
    • 0:54:49like 100 for Brian or 900 for Veronica, if get_string ever
    • 0:54:53runs into a problem and just something goes wrong with the computer,
    • 0:54:56if it ever returns 0, specifically 0, a.k.a.
    • 0:55:01null-- N-U-L-L, then you can detect that something has gone wrong.
    • 0:55:07So to do that, and it's going to get a little tedious,
    • 0:55:10but it's nonetheless the right thing to do,
    • 0:55:11I need to be a little more defensive.
    • 0:55:14If s equals-equals null, otherwise known as 0, otherwise known as 0x0,
    • 0:55:21but I'll write it conventionally like this,
    • 0:55:23I'm going to go ahead and return 1 as my exit code.
    • 0:55:27If t equals-equals null, I'm going to go ahead and return 1 as my exit code,
    • 0:55:32or I could return 2 or 3--
    • 0:55:34I just need to return some value to signal to the computer
    • 0:55:36that something went wrong, but by default we'll
    • 0:55:38just return 1 whenever something goes wrong, but if all went well,
    • 0:55:43I'm going to go ahead and return 0.
    • 0:55:44So recall again from last week, and we didn't spend a huge amount of time
    • 0:55:47on this--
    • 0:55:48main itself can return values.
    • 0:55:50By default, ever since week 1, if you don't return anything,
    • 0:55:53main is automatically and secretly returning 0 for you because 0 is good.
    • 0:55:58The reason for 0 is because there's only one 0 in the world, obviously,
    • 0:56:02but there is an infinite number to the left
    • 0:56:03and there's an infinite number of the right, negative and positive.
    • 0:56:06That's great, because as you've already experienced in the past few weeks,
    • 0:56:09it feels like there's an infinite number of things that can go wrong when you're
    • 0:56:11writing even the shortest of programs.
    • 0:56:13So that means we have a lot of numbers we can assign to error codes,
    • 0:56:17so to speak.
    • 0:56:18Now I don't really care what the error codes are,
    • 0:56:20so I'm just going to adopt the human convention at the moment--
    • 0:56:23if anything goes wrong, return anything other than 0.
    • 0:56:27And so I'm going to return 1 up here, but if nothing goes wrong, return 0.
    • 0:56:31The point here is that by adding these three lines here and these three
    • 0:56:36lines here, I'm going to avoid what's called
    • 0:56:38a segmentation fault or segfault. Did any of you
    • 0:56:42encounter this cryptic error?
    • 0:56:43OK.
    • 0:56:44So a decent number of you, and you probably had no idea what that means,
    • 0:56:46but starting today you will a bit more, and in the weeks to come,
    • 0:56:49you'll understand even more.
    • 0:56:50Segmentation fault means you touched memory you should not have.
    • 0:56:54Or something went wrong and you did not detect it.
    • 0:56:58It's kind of a catch-all phrase for memory-related problems.
    • 0:57:01This helps ward off those kinds of errors.
    • 0:57:03It's not the only way, but it's one such way.
    • 0:57:06So starting today with problem set programs and anything
    • 0:57:09you write in the course, you always want to be thinking about,
    • 0:57:12even if you go back and add it later, could this go wrong?
    • 0:57:14Could this go wrong?
    • 0:57:16And just add some additional ifs and else-ifs
    • 0:57:18and handle those situations so that your program doesn't just crash on you
    • 0:57:21or segfault or surprise someone who's actually using it.
    • 0:57:25All right, let's take a look at one final example,
    • 0:57:28because frankly this is a little tedious.
    • 0:57:30I'm going to go ahead and open up--
    • 0:57:32and this file can be found in compare5.c.
    • 0:57:34Let me go ahead and save this so that we have it-- compare5.c.
    • 0:57:39I'm going to make one final comparison example.
    • 0:57:41I'm going to save this as compare6.c.
    • 0:57:43Turns out that humans like their succinctness.
    • 0:57:46And null, because it is technically the 0 address,
    • 0:57:50you can actually be a little clever.
    • 0:57:52If not s and if not t is a sufficient way to express those same things.
    • 0:57:59Because what does the bang do?
    • 0:58:00The exclamation point in code if you recall?
    • 0:58:04It inverts something.
    • 0:58:05So like if this is saying, if s is not 0, a.k.a., if s not null, or rather--
    • 0:58:15if-- now I'm getting confused.
    • 0:58:18Yes.
    • 0:58:18If I had just said, if s, then it's a valid address
    • 0:58:21and I should go on with my business.
    • 0:58:23But if it's not s or if s is null, I want
    • 0:58:28to go ahead and return 1 because there's an error, and down here too.
    • 0:58:31So any time you're checking whether something equals null,
    • 0:58:33you can make it more succinct by just saying if not s; if it's null,
    • 0:58:37return 1.
    • 0:58:38If it's null, return 1.
    • 0:58:39It's just syntactic shorthand.
    • 0:58:42Phew!
    • 0:58:43I had to think about that one.
    • 0:58:45Any questions?
    • 0:58:45AUDIENCE: Why does [INAUDIBLE] will store some [INAUDIBLE]
    • 0:58:53DAVID MALAN: Correct.
    • 0:58:54You are storing an address, but if that address is 0.
    • 0:58:58Saying if it's not 0, 0 is like false, so not false means true,
    • 0:59:04and so it has the effect of inverting the logic.
    • 0:59:07That's all.
    • 0:59:08Anytime you use a bang or exclamation point, it changes a 0 to non-0--
    • 0:59:12AUDIENCE: [INAUDIBLE], but even--
    • 0:59:14I don't understand why [INAUDIBLE] implies that it's [INAUDIBLE]..
    • 0:59:20DAVID MALAN: So you can think about it this way.
    • 0:59:22If s-- previously we had this.
    • 0:59:24If s equals-equals null is like saying if s literally equals 0.
    • 0:59:30And you can kind of think of that informally as
    • 0:59:32if s doesn't have a valid pointer--
    • 0:59:340 is not a valid pointer, or it's not a valid address by definition.
    • 0:59:38100 is valid, 900 is valid, 0 is not valid just by a human convention.
    • 0:59:42So this is like saying, if s does not have a value that's valid.
    • 0:59:46So the way to succinctly say that, if not s,
    • 0:59:52and it's just shorthand for that is another way to think about it.
    • 0:59:55All right, so let's take a look at a very different program,
    • 0:59:58but that reveals the same kind of issue as follows.
    • 1:00:02I'm going to go ahead and open up an example called
    • 1:00:05copy0, whose purpose in life hopefully is to copy a string.
    • 1:00:09So notice that in my program here, which I
    • 1:00:11wrote in advance, I'm getting a string from the user on line 11,
    • 1:00:15and I'm storing it in a string called s.
    • 1:00:17I could change this to char* now, but we know what it is.
    • 1:00:20And I'm going to go ahead and copy the string's address from s into t.
    • 1:00:24And then I'm going to say, if the length of t is greater than 0,
    • 1:00:29then go ahead and just capitalize the first character.
    • 1:00:31So it's a little cryptic, but you might have
    • 1:00:32done something kind of like this with Caesar and with recent string
    • 1:00:35manipulation.
    • 1:00:35This is just making sure, do I have at least one character?
    • 1:00:38And if so, first character is t bracket 0, as you recall.
    • 1:00:42toupper is a function in ctype.h from last week
    • 1:00:45that just capitalizes this letter.
    • 1:00:46So this one line of code, 19, just capitalizes the first letter
    • 1:00:50in t, that's it.
    • 1:00:51And then at the very end we just print out what s is and print out what t is.
    • 1:00:54That's all.
    • 1:00:55So this program just copies s into t, capitalizes t, and that's it.
    • 1:00:59So let me go ahead and make copy0.
    • 1:01:02This is in our code from today.
    • 1:01:03So I'm going to do cd src3, because I already wrote it in that directory.
    • 1:01:07make copy0.
    • 1:01:08Went well. ./copy0.
    • 1:01:12Let's go ahead and type in tj again in lowercase.
    • 1:01:17Enter.
    • 1:01:18Huh.
    • 1:01:19TJ, TJ-- both are capitalized.
    • 1:01:21All right, maybe it's just a weird thing with initials.
    • 1:01:24So let's just do Veronica, all lowercase.
    • 1:01:28Huh, that's definitely capital.
    • 1:01:30Let's do even more obvious difference, Brian where
    • 1:01:32the B's really going to look different.
    • 1:01:35Yet I'm only capitalizing t.
    • 1:01:38Well let's consider what's actually going on here.
    • 1:01:40In this case, when I'm getting a string from the user, s and t, and I type in,
    • 1:01:46for instance, brian in all lowercase, backslash 0, this, of course,
    • 1:01:51is just an array underneath the hood.
    • 1:01:54This is taking up six bytes here.
    • 1:01:56And when I store in s, s is a string.
    • 1:01:58So you know what?
    • 1:01:59We didn't do this before.
    • 1:02:00Let me actually create a variable, a chunk of memory for s and call it s.
    • 1:02:05And suppose Brian is just where he was before--
    • 1:02:07100, 101, 102, 103, 104, and 105.
    • 1:02:13So if I do s equals get_string and get_string returns Brian,
    • 1:02:18what do I write in the box called s?
    • 1:02:21Yeah, just 100, right?
    • 1:02:22This is all that's been going on all this time
    • 1:02:24even though we didn't talk about it at this level.
    • 1:02:27And actually, it turns out-- pointer actually can be used pictorially.
    • 1:02:30If you actually prefer to think about a pointer as being an address
    • 1:02:34or like kind of a map that leads you somewhere, another way a human
    • 1:02:37would typically draw a pointer-- because honestly,
    • 1:02:40who really cares that Brian is at address 100?
    • 1:02:41Like that is way too low level, that's week 0 stuff.
    • 1:02:45He's just pointing there.
    • 1:02:46So s is a pointer to that chunk of memory.
    • 1:02:49It happens to be 100, whatever, the arrow is how you would literally
    • 1:02:52point at the chunk of memory if you were drawing this on some notes.
    • 1:02:56So that, too, is correct.
    • 1:02:57So the problem arises here with that line of code.
    • 1:03:00When I actually try to copy s and store in t, think about what's going on.
    • 1:03:07The right-hand side is just s's value, which happens to be 100.
    • 1:03:10The left-hand side is just saying, hey computer, give me
    • 1:03:13another variable, of type string, and call it t.
    • 1:03:16So that's like saying, hey, computer, give me another chunk of memory,
    • 1:03:19call it t, and then store s in it.
    • 1:03:22But what does it mean to store s?
    • 1:03:24Well what is s's value at this point in time?
    • 1:03:27It's the pointer to Brian, or it's technically--
    • 1:03:30I'll write both just for thoroughness-- it's literally the number 100.
    • 1:03:34So if you do t equals s, that is like saying put 100 there too,
    • 1:03:39and pictorially that's like saying this.
    • 1:03:42So at this point in the story, when I copy s into t,
    • 1:03:46the computer took me literally.
    • 1:03:48It did copy s into t, but what is s?
    • 1:03:51It's just the address.
    • 1:03:52It is not B-R-I-A-N backslash 0, it's just the address.
    • 1:03:56So when I then say, t bracket 0 gets toupper--
    • 1:04:00so let's look at this line of code.
    • 1:04:02The one line of code here that's highlighted,
    • 1:04:04when I say go to the 0th character of t and store
    • 1:04:07the uppercase version of that same character, you just follow the arrows.
    • 1:04:11If you ever played chutes and ladders as a kid,
    • 1:04:13you just kind of follow the arrow, see where you end up.
    • 1:04:16t bracket 0 is this location here, because again,
    • 1:04:19if this is a chunk of memory, per last week it's an array,
    • 1:04:22so you can also think of this as being bracket 0, this is bracket 1,
    • 1:04:26this is bracket 2, and so forth.
    • 1:04:30So it's just an array.
    • 1:04:31So t bracket 0 is lowercase b, and toupper of lowercase b,
    • 1:04:36of course, changes this little b to a B. But now
    • 1:04:40both s and t are still pointing at the same chunk of memory,
    • 1:04:43so of course s and t are both going to be Brian capitalized,
    • 1:04:47or TJ too in my first example.
    • 1:04:51Any questions then on what we just did and why that happens?
    • 1:04:57All right, so intuitively what's the fix?
    • 1:04:59Doesn't matter if you've no idea how to code it,
    • 1:05:01like what do we have to do to fundamentally copy a string, not
    • 1:05:04an address?
    • 1:05:06AUDIENCE: [INAUDIBLE]
    • 1:05:08DAVID MALAN: Create a new what?
    • 1:05:10AUDIENCE: Basically create the [INAUDIBLE]..
    • 1:05:12DAVID MALAN: Yeah.
    • 1:05:13Create the same string in a new chunk of memory.
    • 1:05:15What I really need to do is allocate or give myself
    • 1:05:17a bunch of more memory that's just as big as Brian,
    • 1:05:21including his backslash 0.
    • 1:05:24And then logically I just need to copy every character into that.
    • 1:05:28So if I go back to my original when it was a lowercase b,
    • 1:05:31I need to make a copy logically by using a for loop or a while loop
    • 1:05:34or whatever you prefer--
    • 1:05:35B-R-I-A-N backslash 0, so that when I copy the string and then store it in t,
    • 1:05:42it's not actually copying literally s.
    • 1:05:45And let's suppose that he ends up at location 300 just arbitrarily--
    • 1:05:49just making up easy numbers.
    • 1:05:51t now stores 300, points here.
    • 1:05:54So when I execute this line in this version of the story, t bracket 0
    • 1:05:59gets toupper, what am I actually doing?
    • 1:06:02I'm following a different arrow this time
    • 1:06:04because I gave myself a different chunk of memory, capitalizing this Brian,
    • 1:06:08thereby hopefully fixing the bug, albeit verbally only.
    • 1:06:13So how do we do this in code?
    • 1:06:14We need to do exactly that.
    • 1:06:16We need to give ourself some more memory,
    • 1:06:18so let's introduce one other feature of C. In copy1.c,
    • 1:06:23we see the solution to this problem.
    • 1:06:25Notice at the top I'm doing things a little lower level-- oop, surprise.
    • 1:06:31Notice in this version of the code, copy1.c,
    • 1:06:33see I've started off almost the same, but just to be super clear,
    • 1:06:37I'm just using char*.
    • 1:06:39I don't want any magic, so there's no string,
    • 1:06:41there's no training wheels here.
    • 1:06:42But this logically is the exact same as before--
    • 1:06:45plus the error-checking.
    • 1:06:46This line is new.
    • 1:06:47And it looks a little funky, but let's see what's going on.
    • 1:06:51And this line of code here, what am I doing?
    • 1:06:54The left-hand side, that's shorter, let's start with the easier one.
    • 1:06:57Char* t, just in layman's terms, what does that expression do? char*?
    • 1:07:02Hey computer, do what?
    • 1:07:06What's that?
    • 1:07:07AUDIENCE: [INAUDIBLE]
    • 1:07:08DAVID MALAN: Not quite yet.
    • 1:07:09Different formulation.
    • 1:07:14Hey computer, give me--
    • 1:07:17not quite.
    • 1:07:17Be more precise?
    • 1:07:19AUDIENCE: An array?
    • 1:07:21DAVID MALAN: Not quite an array, just this part.
    • 1:07:23So let me hide all this.
    • 1:07:25If the star wasn't there--
    • 1:07:27I can't really do this very well.
    • 1:07:28So this-- yeah?
    • 1:07:29AUDIENCE: [INAUDIBLE] character?
    • 1:07:31DAVID MALAN: Good, I'll take that.
    • 1:07:33So hey computer, give me a pointer to a character.
    • 1:07:35Or even more low level, hey computer, give me
    • 1:07:37a chunk of memory in which I can store the address of a character.
    • 1:07:41I mean, it is that mundane.
    • 1:07:42Draw a box on the screen, call it s-- or rather,
    • 1:07:46call it t, but just give me space for a pointer, as you said.
    • 1:07:49So that's all that's doing.
    • 1:07:50It's drawing a box on the screen and calling it t, and it's currently empty.
    • 1:07:54Now let's look at the scarier part on the right-hand side.
    • 1:07:56malloc, new function today.
    • 1:07:58Stands for memory allocate.
    • 1:08:00It's very cryptic-sounding, but it just means give me a chunk of memory.
    • 1:08:03It says exactly what you said in functional terms.
    • 1:08:05Then it just needs you to answer one question--
    • 1:08:07OK, how much memory do you want?
    • 1:08:09How many bytes do you want?
    • 1:08:11And now maybe the math, even though cryptic at first glance, makes sense.
    • 1:08:15Get the string length of s, add 1, and then multiply it
    • 1:08:19by the size of a character.
    • 1:08:21And we've not seen this before. sizeof literally does that.
    • 1:08:23It tells you how many bytes is a char.
    • 1:08:26Happens to be 1, and in fact, that's defined.
    • 1:08:28So if we simplify this in C, the char is always 1 byte,
    • 1:08:32so this is equivalent to just multiplying by 1.
    • 1:08:35And obviously mathematically that's a waste of time,
    • 1:08:37so we can whittle this down to be even simpler.
    • 1:08:39I was just being thorough.
    • 1:08:41So now, hey computer, allocate me this many bytes of memory.
    • 1:08:45Why is it plus 1?
    • 1:08:46AUDIENCE: You need the null character.
    • 1:08:48DAVID MALAN: I need that null character.
    • 1:08:50Brian is 1, 2, 3, 4, 5 as he said, but I need the sixth for his null character,
    • 1:08:54and I just know that's going to be there.
    • 1:08:56So at this point in the story, what has happened?
    • 1:08:59All that malloc does is it gives me this box of memory
    • 1:09:04containing room for as many bytes as are in Brian's name.
    • 1:09:07But it doesn't fill them just yet.
    • 1:09:09Now I need to logically fill those bytes with Brian's actual name.
    • 1:09:13So if we scroll down to my for loop here,
    • 1:09:15we can actually copy the string into that space.
    • 1:09:18And it's a little long, the expression, but nothing new here.
    • 1:09:21Initialize i to 0, n to the length of s, i is less than or equal to n--
    • 1:09:28we'll come back to that, i++.
    • 1:09:30So it's just a pretty standard for loop.
    • 1:09:32Then copy the i-th character of s into the i-th character of t.
    • 1:09:36The only thing that's making me a little nervous honestly is this thing here.
    • 1:09:40Like I feel like every time we do less than or equal to,
    • 1:09:43we create a bug like last week.
    • 1:09:45But this is correct, why?
    • 1:09:50Why do I want to go up to and through the length of this?
    • 1:09:54AUDIENCE: Is it the null character that adds--
    • 1:09:56DAVID MALAN: Exactly.
    • 1:09:57Because of the null character.
    • 1:09:58I actually don't want to stop at the strlen of s, so I could change this.
    • 1:10:02If you're just more comfortable using less than, because you just
    • 1:10:04got your mind wrapped around why we do that in the first place, that's fine,
    • 1:10:08we just need to do this instead.
    • 1:10:11So this is mathematically-- if you go to strlen plus 1, the same thing
    • 1:10:16as not doing that math but just going one step further.
    • 1:10:18Just whatever you want to think about it is fine.
    • 1:10:20However you want to think about it is fine.
    • 1:10:22OK, and then lastly, just a quick check, is the length
    • 1:10:25of t at least one or more characters?
    • 1:10:27Because otherwise there's nothing to capitalize, and if so,
    • 1:10:29go ahead and do it.
    • 1:10:31So if I now run this example, make-- oop, let me save it.
    • 1:10:34make copy1, that compiled.
    • 1:10:37./copy1, now let's type in tj, tj in lowercase comes back,
    • 1:10:42but now t is capitalized.
    • 1:10:44And let's go ahead and do Brian's name in all lowercase, only one of them
    • 1:10:49is now capitalized.
    • 1:10:51So does that make sense what's now happened?
    • 1:10:54All right.
    • 1:10:54So where can we go with this?
    • 1:10:57Well it turns out-- let me open up one final example here,
    • 1:11:00because honestly, that's incredibly tedious,
    • 1:11:02and no one's ever going to want to copy strings if you
    • 1:11:04have to go through all of that work.
    • 1:11:05Turns out that strcpy exists.
    • 1:11:08So when in doubt, check the man page.
    • 1:11:11When in doubt, check CS50 reference.
    • 1:11:12Does the function exist somewhere related
    • 1:11:15to some keywords you have in mind?
    • 1:11:16Like string copy, see if something comes back.
    • 1:11:18And indeed, we've had strlen, we've had strcmp, we now have strcpy,
    • 1:11:22and if you read the documentation, this is deliberately reversed like this.
    • 1:11:25The destination is this variable, the source or the origin string
    • 1:11:30is this one, and it copies from one end to the other,
    • 1:11:32and then I don't need that for loop.
    • 1:11:35It just saves me a few lines of code.
    • 1:11:37All right.
    • 1:11:38So let's take off one other detail here.
    • 1:11:41Oh, and you'll notice, actually, let me make one fix, one fix here.
    • 1:11:46It turns out that what I'm doing here is a little lazy.
    • 1:11:50It turns out that malloc does have an opposite.
    • 1:11:54So anytime you allocate memory, technically
    • 1:11:57you should also be freeing that memory.
    • 1:11:59And so C allows you to ask the computer for as much memory as you want,
    • 1:12:02but if you never give it back, have you ever experienced on your own Mac or PC,
    • 1:12:06like after your computer's been running a while
    • 1:12:08or using some new or bloated program like a browser,
    • 1:12:10it gets slower and slower and slower?
    • 1:12:13And in the worst case it just freezes or hangs or something?
    • 1:12:16It is quite possible that that program simply-- was made by humans,
    • 1:12:19of course--
    • 1:12:20just has a memory leak.
    • 1:12:21So some human wrote one or more lines of code that uses malloc
    • 1:12:25or some equivalent in another language that just kept allocating memory
    • 1:12:28for the user's input.
    • 1:12:29You're visiting one web page, two web pages,
    • 1:12:31that requires memory whatever the program is.
    • 1:12:33And if that human never calls the opposite of allocate-- deallocate,
    • 1:12:37otherwise known as free, you're never giving the memory back
    • 1:12:40to the operating system.
    • 1:12:41So it gets slower and slower because it's running lower and lower and lower
    • 1:12:45on memory, and it might have to move some things around
    • 1:12:47to make room for things, that's what's called a memory leak.
    • 1:12:50And so indeed, in this program, I should actually improve this a little bit.
    • 1:12:54If I go back into this version here and line 18, recall,
    • 1:12:58I allocated this memory just to make my copy,
    • 1:13:01the very last thing I should actually do in this program
    • 1:13:04is this line here-- free.
    • 1:13:05You don't have to tell the computer how many bytes you want to free,
    • 1:13:08it will remember for you so long as you just pass in the pointer--
    • 1:13:12the variable that's storing the address of the chunk of memory
    • 1:13:16that you allocated.
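The allocate, copy, free pattern described here can be sketched as follows. This is a minimal illustration rather than the lecture's own file: the helper name duplicate_string is hypothetical, and it assumes the caller frees the result when done with it.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a heap-allocated copy of s, or NULL if
// malloc fails. The caller owns the result and must free() it.
char *duplicate_string(const char *s)
{
    // One extra byte for the terminating '\0'.
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;
    }

    // strcpy takes the destination first and the source second.
    strcpy(t, s);
    return t;
}
```

A caller would pair each call with exactly one free, for example `char *t = duplicate_string(s);` and later `free(t);`. Note that free takes only the pointer, not a byte count.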
    • 1:13:18All right.
    • 1:13:19So let's now see why we've been using get_string,
    • 1:13:23since it's not just to kind of simplify the code,
    • 1:13:25it's also to defend against some very easy problems.
    • 1:13:28Here is a program called scanf0--
    • 1:13:31scanned formatted text, another arcane-sounding function,
    • 1:13:35but it's pretty straightforward.
    • 1:13:36This program simply gets an int from the user using scanf.
    • 1:13:39Up until now for the past three weeks, you've used get_int.
    • 1:13:42So this is an alternative to get_int that you could
    • 1:13:45have started using a few weeks ago.
    • 1:13:48Give me an int called x, print out x colon whatever--
    • 1:13:51that's just the prompt to the user.
    • 1:13:53scanf %i, &x, whatever that is, and then print out x's value using %i.
    • 1:14:01So what's going on here?
    • 1:14:02Now today we can actually start to wrap our minds around what get_int actually
    • 1:14:06does.
    • 1:14:06This is effectively get_int.
    • 1:14:07If you actually look at the source code for get_int, it's a little fancier.
    • 1:14:11But in essence, what get_int does is it declares a variable called x,
    • 1:14:13and it doesn't put anything there, because that's
    • 1:14:16supposed to come from you, the human.
    • 1:14:17It then prompts you for whatever string you pass to get_int,
    • 1:14:20so those are the first two lines.
    • 1:14:22And this is the only weird-looking one.
    • 1:14:24Scanf is like the opposite of printf.
    • 1:14:26You still use a formatted string-- %s, %i, %f or whatever,
    • 1:14:31but you're not going to output this, you're going to input this from
    • 1:14:35the human's keyboard.
    • 1:14:37And &x is the opposite of--
    • 1:14:41is the special symbol in C that says, go ahead and get me the address of x.
    • 1:14:49So don't pass in x, give me the address of x.
    • 1:14:52Now why is that?
    • 1:14:53We'll see, but this is the way where you can tell the computer,
    • 1:14:56I've made a variable for you called x, here is where it is.
    • 1:14:59It's a treasure map that leads you to x, go put a value here for me.
    • 1:15:03And so the end result is that we do, in fact, end up getting an int.
    • 1:15:06If I do make scanf0, and then ./scanf0, I'll type in 42, all right?
    • 1:15:13It's not an interesting program, it just spits back out what I got,
    • 1:15:16but that's literally all that get_int, of course,
    • 1:15:18is doing if you then print out the value.
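Because scanf reads from the keyboard, it is awkward to demonstrate in isolation, so the sketch below uses its sibling sscanf, which reads from a string instead. Passing &x works the same way in both. The helper name parse_int is an assumption made for illustration.

```c
#include <stdio.h>

// Hypothetical helper: parses an int from the start of text and
// returns it, or returns 0 if text does not start with a number.
int parse_int(const char *text)
{
    int x = 0;

    // &x is the address of x: the "treasure map" that tells sscanf
    // where to store the value it reads.
    if (sscanf(text, "%i", &x) != 1)
    {
        return 0;
    }
    return x;
}
```

With the keyboard as the source, the equivalent call would be `scanf("%i", &x);`.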
    • 1:15:20So if I stipulate this is correct, this is how you get an int from the user,
    • 1:15:24but honestly, the reason we don't do this in week 1 of the course is like,
    • 1:15:27my God, we just took the fun out of even getting a simple number from the user
    • 1:15:31by using these lines of code and whoever knows
    • 1:15:32what this symbol is-- we don't want you to think about that,
    • 1:15:35we want you to just get an int.
    • 1:15:36But today those training wheels are off, but we're
    • 1:15:39going to run into a problem super fast.
    • 1:15:41Let's try the same thing with a string.
    • 1:15:44If I were to do this, you would think that the result is the same.
    • 1:15:49Or let's just do it as char*.
    • 1:15:52But there's going to be one tweak.
    • 1:15:54If I go ahead and give myself space for the address of a character,
    • 1:15:59I don't need to use the ampersand now, because scanf
    • 1:16:01does need to be told where the chunk of memory is,
    • 1:16:04but it's already an address, so I don't need the ampersand here.
    • 1:16:08Recall earlier, I declared int x, which was just an int.
    • 1:16:11&x gets the address of that int.
    • 1:16:14Here, I'm saying from the get-go, get me the address of a char.
    • 1:16:19I don't need the ampersand cause I already have the address of a char
    • 1:16:22by definition of that star symbol.
    • 1:16:24So what's going on here?
    • 1:16:26Let me see now.
    • 1:16:27If I run scanf1, what happens?
    • 1:16:30So make scanf1 and--
    • 1:16:33oh, let's see.
    • 1:16:34Here's a warning I'm getting.
    • 1:16:35Variable s is uninitialized when used here.
    • 1:16:37All right, that's fine.
    • 1:16:38It wants me to initialize it because this is a very common mistake.
    • 1:16:41Those of you who alluded to segmentation faults
    • 1:16:43earlier might have encountered something similar in spirit to this.
    • 1:16:46So that squelched that error.
    • 1:16:47Let me go ahead and run scanf1.
    • 1:16:49All right, here we go, TJ.
    • 1:16:51Hmm.
    • 1:16:52That is not your name, but OK.
    • 1:16:54It didn't crash at least, it's just a little weird.
    • 1:16:57David.
    • 1:16:58Null, OK, that's a little weird.
    • 1:16:59Let's go ahead and do this again.
    • 1:17:01Let's type in a really long name.
    • 1:17:06Enter.
    • 1:17:07Dammit, that didn't work.
    • 1:17:09So let's try an even longer name.
    • 1:17:16I'm hitting paste a lot.
    • 1:17:19OK-- dammit.
    • 1:17:21Too many times.
    • 1:17:23Command not found, that's definitely not a command.
    • 1:17:26Wow, OK.
    • 1:17:30Well that's interesting.
    • 1:17:32Oh, there it is.
    • 1:17:32Null, same thing.
    • 1:17:33OK, so what's actually going on?
    • 1:17:36Well null, which is all lowercase here, which
    • 1:17:38is this kind of an aesthetic thing, well it's not working.
    • 1:17:41It's not working.
    • 1:17:42Well what am I actually doing?
    • 1:17:44In that first line of code, when I say give me s to be a char*,
    • 1:17:49otherwise known as a string, all that's doing is allocating this.
    • 1:17:52And it's technically the size of a pointer.
    • 1:17:54A pointer, we never mentioned this before, but now we can.
    • 1:17:57Turns out it is 64 bits or 8 bytes.
    • 1:18:028 bits is 1 byte, so a pointer is by definition on many computers these
    • 1:18:07days-- most of your Macs, most of your PCs, the IDE, the Sandbox, the Lab--
    • 1:18:11is 64-bit.
    • 1:18:12So that just means there's 64 bits here, but we initialized it to null,
    • 1:18:16so that just means there's 64 0's here, dot-dot-dot.
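The size of a pointer can be checked directly with sizeof. This is a small sketch, with the function name pointer_bits chosen here for illustration; the result is 64 on typical 64-bit machines, but C itself does not promise that.

```c
#include <limits.h>
#include <stddef.h>

// Number of bits in a char * on the machine this is compiled for.
// On typical 64-bit systems this is 64, but C does not guarantee it.
size_t pointer_bits(void)
{
    // sizeof counts bytes; CHAR_BIT is the number of bits per byte.
    return sizeof(char *) * CHAR_BIT;
}
```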
    • 1:18:20But when I get a string using scanf, what
    • 1:18:24I'm telling the computer to do with this line of code here,
    • 1:18:26notice, is hey computer, go to that address and put a string there.
    • 1:18:31So what's actually happening?
    • 1:18:34It turns out that there's just not enough room to type in TJ.
    • 1:18:37There's not enough room--
    • 1:18:38that's a bit of a white lie, because we could fit you in 64 bits,
    • 1:18:41but there's not enough room to type in the long sentence or paragraph of text
    • 1:18:45I did, right?
    • 1:18:46What did we not do?
    • 1:18:47We didn't allocate any space over here.
    • 1:18:49All we allocated space for was the address.
    • 1:18:51And so every time I use scanf saying, get me a string and put it here,
    • 1:18:55there's nowhere to put it.
    • 1:18:57And so the value just very defensively says, no, like no,
    • 1:19:00cannot store this anywhere for you.
    • 1:19:03So I actually need to be a little smarter about this.
    • 1:19:05I actually need to get myself some space so that I can actually store something
    • 1:19:10in the right place.
    • 1:19:11Let's do that.
    • 1:19:12Let me go ahead and create a new program.
    • 1:19:15I'm going to go ahead and call this scanf2.
    • 1:19:21We need a little secret code to remind me of that.
    • 1:19:25Oh, wrong file name.
    • 1:19:27So I'm going to go ahead and create a file called scanf2.
    • 1:19:30scanf2.c.
    • 1:19:32And I'm going to quickly recreate this stdio.h, int main void,
    • 1:19:37and then down here I'm going to go ahead and-- you know what?
    • 1:19:39Instead of a string s, which I know today to be a char* s,
    • 1:19:44what is this string really?
    • 1:19:45Well you said it earlier.
    • 1:19:46What is this string?
    • 1:19:48It's an array of characters.
    • 1:19:49Let me take you literally.
    • 1:19:51Just give me an array of let's say five characters.
    • 1:19:54The D-A-V-I-D, or one more, that's fine, just enough for my backslash 0.
    • 1:19:58Let me just create a string-- really low level,
    • 1:20:01but this time give myself the chunk of memory.
    • 1:20:03I don't want just the address of a character,
    • 1:20:05I want the actual characters themselves.
    • 1:20:08Let me go ahead and just prompt the human for their string with s,
    • 1:20:11just like before.
    • 1:20:12Then let me call scanf and get a string from the user using %s and then pass
    • 1:20:16in s.
    • 1:20:17And here's a little trick.
    • 1:20:18It turns out that because a string is really just an array,
    • 1:20:22but a string is also just a pointer, you can actually treat
    • 1:20:25an array as though it is a pointer--
    • 1:20:28an address.
    • 1:20:29And so even though this is a char array, this is OK.
    • 1:20:33This is the equivalent in this context to being just the address of a string.
    • 1:20:37Because strings are arrays, arrays can be treated as pointers as of now.
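The claim that an array can be treated as an address can be sketched like this. The function name is hypothetical, and sscanf with a fixed five-letter input stands in for scanf and the keyboard so that the read is known to fit.

```c
#include <stdbool.h>
#include <stdio.h>

// Reads a word into a six-byte array, then checks that the array's
// name behaves like the address of its first element.
bool array_name_is_address(void)
{
    char s[6];                  // room for D-A-V-I-D plus '\0'
    sscanf("David", "%s", s);   // no & needed: s already acts as an address

    char *p = s;                // the array converts to a pointer
    return p == &s[0] && p[0] == 'D' && *(p + 4) == 'd';
}
```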
    • 1:20:41And then let me go ahead and just print out whatever the human typed in.
    • 1:20:44S is actually this.
    • 1:20:46Pass in s, semicolon, save.
    • 1:20:49Yeah?
    • 1:20:49AUDIENCE: So [INAUDIBLE] char*?
    • 1:20:52DAVID MALAN: At this point it would be redundant to do char*,
    • 1:20:55because I literally want for this story six characters.
    • 1:20:58I want space, rather, for six characters.
    • 1:21:01So this is kind of week 2 stuff now, there's no pointers involved.
    • 1:21:05But again, just showing the equivalence of these ideas for now.
    • 1:21:08So if I now go into this, and this is in my other directory at the moment,
    • 1:21:12make scanf2, Enter, ./scanf2, s is going to type in--
    • 1:21:19I'll type in my name, I know I can fit that, we're back in business.
    • 1:21:22Like now it's working because I didn't just create the address for a string,
    • 1:21:26I created the space for the string.
    • 1:21:27But let me get a little dangerous--
    • 1:21:31David Malan?
    • 1:21:32OK, that kind of worked out OK.
    • 1:21:35David Malan or some really long other name?
    • 1:21:40OK, that worked out too.
    • 1:21:42Let me go ahead and run it again.
    • 1:21:44Let me try that really long string again, see what happens.
    • 1:21:48I know this didn't work very well last time.
    • 1:21:49All right, done.
    • 1:21:51Ooh, OK.
    • 1:21:53So now I'm in the club of those of you who have had segmentation faults.
    • 1:21:57So let's understand what's going on here.
    • 1:21:59Segmentation fault a moment ago I claimed
    • 1:22:01was touching a segment, a chunk of memory that's not your own.
    • 1:22:05So what just happened?
    • 1:22:06Well with this simple program, I told the computer, hey computer,
    • 1:22:09give me room for six characters, give me six bytes.
    • 1:22:13With the scanf line, I'm telling the computer, put the following user
    • 1:22:18input at that location, in that array of characters.
    • 1:22:22D-A-V-I-D backslash 0 fit.
    • 1:22:24David Malan didn't really, but it didn't seem to be a huge deal.
    • 1:22:27David Malan or some really long other name, also didn't crash the computer.
    • 1:22:33But that's because unbeknownst to us, usually when you ask for six bytes,
    • 1:22:36the computer is kind of sort of-- it's giving you a few extras.
    • 1:22:38It's not safe to use them, but it gives you enough
    • 1:22:41that you're not going to necessarily see a problem like a segmentation fault.
    • 1:22:44But it only allocates a few extra bytes typically,
    • 1:22:47so if you really keep pasting in long, long, long, long lines of text,
    • 1:22:51eventually you're going to exceed not only those six
    • 1:22:53bytes, but well past the special--
    • 1:22:55the secret bytes that you got back that you shouldn't be using anyway,
    • 1:22:58and at that point the computer just gives up and says,
    • 1:23:00you are touching memory you shouldn't, a.k.a.
    • 1:23:03segmentation fault.
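One common defense, which the lecture does not show, is to give the %s conversion a maximum field width so it can never write past the array. The sketch below assumes a six-byte buffer and uses sscanf on a string in place of scanf on the keyboard; read_short_word is a hypothetical name.

```c
#include <stdio.h>

// Hypothetical helper: copies at most 5 characters of the first word
// in text into buffer, which must have room for at least 6 bytes.
// Returns 1 if a word was read, 0 otherwise.
int read_short_word(const char *text, char buffer[6])
{
    buffer[0] = '\0';

    // The 5 in %5s stops reading after five characters; the sixth
    // byte is left for the '\0' terminator.
    return sscanf(text, "%5s", buffer) == 1;
}
```

Extra input is simply left unread rather than spilling into memory the program does not own.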
    • 1:23:04AUDIENCE: [INAUDIBLE] if the computer gives you
    • 1:23:06a few extra bytes, then why isn't it printing any of the other stuff?
    • 1:23:10After you said [INAUDIBLE] it just printed David.
    • 1:23:14DAVID MALAN: Really good question.
    • 1:23:15So even though I'm getting these sort of extra bytes,
    • 1:23:18why am I not seeing them after D-A-V-I-D?
    • 1:23:20I'm probably getting lucky.
    • 1:23:21Long story short, when you first run a program,
    • 1:23:24much of the memory that your program has access to is by default initialized
    • 1:23:28to 0's.
    • 1:23:290 is the same thing as backslash 0, and so I'm getting lucky.
    • 1:23:33When I had D-A-V-I-D and then excess space in that array,
    • 1:23:37a lot of them are initialized as 0's already,
    • 1:23:39and the string is getting secretly terminated for me.
    • 1:23:43Or the better answer is, it's undefined behavior.
    • 1:23:46Like you should not touch memory that is not your own.
    • 1:23:49What happens after that is your risk alone.
    • 1:23:52But that's a conjecture as to why that's happening.
    • 1:23:55All right, so what is the fundamental feature that get_int
    • 1:23:58is providing for us?
    • 1:23:59All of this time get_int has actually been dealing
    • 1:24:02with all of this headache for us.
    • 1:24:04I mean honestly, even I'm getting bored like thinking about, talking
    • 1:24:07about how you just get a damn string from the user,
    • 1:24:09because you need to figure out, well how many bytes do you need?
    • 1:24:12And what if the human types in one more byte than you were expecting?
    • 1:24:15Then you need to do a switcheroo and get more memory.
    • 1:24:17get_string is doing all of this headache for us.
    • 1:24:20And that's not to say you need to use it forever,
    • 1:24:22they are indeed training wheels, but that's
    • 1:24:23just because when you're using C or a lot of programming languages,
    • 1:24:27the computer will only do what you tell it to do.
    • 1:24:29And it turns out that even asking the user for input,
    • 1:24:31if you don't know how many characters he or she is
    • 1:24:34going to type in from the get-go, you have to deal with it.
    • 1:24:37And so underneath the hood-- and you're welcome to take a look at the source
    • 1:24:40code for CS50's library, which I'll post on the home page later today,
    • 1:24:44it turns out that with the way we're doing get_string is taking baby steps.
    • 1:24:48We literally like get one character at a time
    • 1:24:50from the user, kind of building the road as we go.
    • 1:24:54And if we don't have enough space, we ask the computer,
    • 1:24:56give me some more bytes so I can get more bytes,
    • 1:24:58and we just get one character at a time so
    • 1:25:01that we can handle the user maliciously or accidentally typing in way
    • 1:25:04more input than we actually expect.
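The "building the road as we go" idea can be sketched as below. This is not CS50's actual source: the name read_growing is hypothetical, the string typed stands in for the keyboard, and doubling the capacity is just one reasonable growth policy.

```c
#include <stdlib.h>

// Hypothetical helper: copies characters from typed (standing in for
// the keyboard) up to a newline or the end, growing the buffer as
// needed. Returns a heap-allocated string the caller must free(),
// or NULL if memory runs out.
char *read_growing(const char *typed)
{
    size_t capacity = 1;   // start with room for just the terminator
    size_t length = 0;
    char *buffer = malloc(capacity);
    if (buffer == NULL)
    {
        return NULL;
    }

    for (const char *c = typed; *c != '\0' && *c != '\n'; c++)
    {
        // Need room for this character plus the final '\0'.
        if (length + 1 >= capacity)
        {
            size_t bigger = capacity * 2;
            char *tmp = realloc(buffer, bigger);
            if (tmp == NULL)
            {
                free(buffer);
                return NULL;
            }
            buffer = tmp;
            capacity = bigger;
        }
        buffer[length++] = *c;
    }

    buffer[length] = '\0';
    return buffer;
}
```

However long the input is, the buffer is always at least one byte larger than what has been stored, so the terminator always fits.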
    • 1:25:08So let's contextualize all of this then.
    • 1:25:10Recall that we've been drawing these pictures the past couple of weeks.
    • 1:25:12Let's just make this super clear as to what's been going on.
    • 1:25:15This is a memory module in a computer.
    • 1:25:17It's just a green board, it's way blown out of scale here,
    • 1:25:20it's easily like yea big inside of your Mac or PC laptop or desktop,
    • 1:25:24though can vary in size.
    • 1:25:25One of these black chips is the actual memory or the bytes
    • 1:25:28to which we've been referring.
    • 1:25:29And if we zoom in on that, recall that I proposed last week
    • 1:25:32that you can just think about this as like a grid, an array.
    • 1:25:35And it doesn't have to be rectangular, this is just an artist's rendition,
    • 1:25:38but each of those squares represents, we claimed, a byte.
    • 1:25:41And each of those bytes can be addressed in some way with a number.
    • 1:25:44And that number is just its location, otherwise known as an address.
    • 1:25:49We can actually see this, it turns out, as follows.
    • 1:25:52Let me go ahead and open up this example here.
    • 1:25:54Or actually, you know, let's just write this one from scratch.
    • 1:25:57Let me write a program called addresses.c.
    • 1:26:01And that's going to use our old friends, the CS50 library and stdio.h and int
    • 1:26:09main void.
    • 1:26:11And let me go ahead and just do this.
    • 1:26:13I'm going to go ahead and get a string--
    • 1:26:15you know what?
    • 1:26:15No more string. char* from the user, get_string, ask the user for s.
    • 1:26:21And we get another string, a.k.a.
    • 1:26:23char*, get_string, call it t from the user.
    • 1:26:26And then, I want to print out not the strings, which I used to do like this,
    • 1:26:31printing out s.
    • 1:26:32I want to print out the pointer that s really is, that is the address.
    • 1:26:37Turns out %p for pointer will print out not the string at that memory location,
    • 1:26:42it will print the actual memory location for you of s.
    • 1:26:45And I can do the same thing here, %p, backslash n, paste in t.
    • 1:26:50And just so I know which is which, let me just prefix it
    • 1:26:52with some text-- s colon and t colon.
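Printing an address rather than the text stored there might look like the sketch below. The helper print_address is hypothetical; in the lecture the two strings come from get_string, whereas any char * works here. The exact text %p produces varies by system.

```c
#include <stdio.h>

// Prints a label followed by the address stored in s (not the text
// at that address). Returns the number of characters printed.
int print_address(const char *label, const char *s)
{
    // %p expects a void *, hence the cast.
    return printf("%s: %p\n", label, (void *) s);
}
```

Calling `print_address("s", s);` and `print_address("t", t);` would show two different locations for two different strings.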
    • 1:26:55Let me go ahead now down here and do make addresses.
    • 1:26:58Oh, I messed up, missed a semi-colon.
    • 1:27:02Let me do this again.
    • 1:27:03make addresses.
    • 1:27:07And get rid of this.
    • 1:27:09That compiled OK, ./addresses, and here we go.
    • 1:27:14Let's type in-- let's do Brian and Veronica like before.
    • 1:27:18Enter.
    • 1:27:18And this is a little funky, but it turns out the IDE and your Macs
    • 1:27:23and your PCs have a lot of memory.
    • 1:27:25So this is the address.
    • 1:27:26It's not quite as small as 100, it's not quite as small as 900.
    • 1:27:30It's actually kind of big.
    • 1:27:31It's 2331010 with this weird 0x.
    • 1:27:36Well it turns out, this is just a human convention.
    • 1:27:38In week 0 we talked about decimal and all of us
    • 1:27:40grew up with decimal, 10 digits from 0 to 9.
    • 1:27:43Talked a little bit about binary 0's and 1's.
    • 1:27:46Turns out there's an infinite number of base systems--
    • 1:27:48decimal/dec, binary/bi are just two of those infinite number of possibilities.
    • 1:27:53Turns out there's another one that's super common called hexadecimal.
    • 1:27:57Hexa meaning 16 in this case.
    • 1:27:59So base-16 actually has 16 letters in its alphabet.
    • 1:28:030, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f.
    • 1:28:11So it turns out that base systems that need to count higher than 10 characters
    • 1:28:15just start using letters of the alphabet by convention.
    • 1:28:18Humans just decided this.
    • 1:28:19So we're getting just numbers in this case,
    • 1:28:22but if these addresses were even bigger, we
    • 1:28:24might actually see some alphabetical letters between a and f there.
    • 1:28:30And frankly I don't know what address this is,
    • 1:28:32but Google's usually pretty good at this stuff,
    • 1:28:34so let me actually open up another browser window.
    • 1:28:39So Google is your friend when it comes to this stuff,
    • 1:28:41or any number of calculators.
    • 1:28:420x2331010 in decimal please.
    • 1:28:47And Google has translated that.
    • 1:28:48So Brian, I kind of underbid a bit earlier.
    • 1:28:51He is not at address location 0, he's actually
    • 1:28:54in the 36 millionth byte inside of my computer
    • 1:28:58right now, location 36,900,880.
    • 1:29:02So a little higher address than 100.
    • 1:29:05And then Veronica, if you really want to get into the weeds here,
    • 1:29:09we can say "in decimal," let Google translate that for us.
    • 1:29:12She's at location 36,900,944.
    • 1:29:16Why?
    • 1:29:16Who cares?
    • 1:29:17The computer is managing all of this for us, but when get_string used malloc,
    • 1:29:22these are literally the numbers that were being returned saying,
    • 1:29:26you may use this chunk of memory.
    • 1:29:28And why did humans use hexadecimal?
    • 1:29:30Like it's just slightly more compact to say 0x2331050, than 36900944--
    • 1:29:39like you just save a few digits, so it's just conventional.
    • 1:29:41That's all, there's no magic there.
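The conversion that Google performed can also be done in C with the standard strtol function. The wrapper name hex_to_decimal is an assumption for illustration.

```c
#include <stdlib.h>

// Converts hexadecimal text such as "0x2331010" to its numeric value.
// With base 16, strtol accepts an optional leading "0x".
long hex_to_decimal(const char *hex)
{
    return strtol(hex, NULL, 16);
}
```

Passing "0x2331010" gives 36900880, and "0x2331050" gives 36900944, matching the two locations above.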
    • 1:29:43But, recall earlier.
    • 1:29:44Do you recall that when I had the debugger open earlier,
    • 1:29:47you saw next to my name variable a value that was cryptically 0x0?
    • 1:29:51Then there was another value that I don't recall--
    • 1:29:530x-something?
    • 1:29:55That was just the numeric address of my name in hexadecimal.
    • 1:30:00And 0x0 is just the technical address being used by null.
    • 1:30:06Yeah?
    • 1:30:06AUDIENCE: You said the address printed out was [INAUDIBLE] x of the variable s
    • 1:30:12and--
    • 1:30:13DAVID MALAN: Sorry, could you say that again?
    • 1:30:13AUDIENCE: You said the address printed out on the screen was an x,
    • 1:30:17but x is [INAUDIBLE]
    • 1:30:20DAVID MALAN: Ah, I should've clarified.
    • 1:30:210x, humans years ago decided anytime you see anything
    • 1:30:24with 0x, that means whatever comes next is hexadecimal.
    • 1:30:29Just the convention.
    • 1:30:30It's also common too if it starts with a 0, it's an octal, which is base-8.
    • 1:30:35If you see a lowercase b at the end, it means binary.
    • 1:30:37So humans have just come up with symbology
    • 1:30:39as to kind of communicate this to readers, that's all.
    • 1:30:41Not part of the value.
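C itself follows the first two of these conventions, and strtol will apply them when asked to detect the base from the prefix. A small sketch, with parse_with_prefix as a hypothetical wrapper name:

```c
#include <stdlib.h>

// With base 0, strtol picks the base from the prefix: "0x" means
// hexadecimal, a leading "0" means octal, anything else is decimal.
long parse_with_prefix(const char *text)
{
    return strtol(text, NULL, 0);
}
```

So the same digits mean different amounts: "0x10" is 16, "010" is 8, and "10" is 10.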
    • 1:30:42So turns out that we can actually do this math ourselves.
    • 1:30:45And we won't really get into the weeds of this
    • 1:30:47because it's not a particularly useful life
    • 1:30:50skill, to be able to convert to various base systems,
    • 1:30:52but let's just do one example so that we've seen it.
    • 1:30:54Just to make clear that there's no magic here,
    • 1:30:56it's just a different way of thinking about numbers versus grade school.
    • 1:30:59So if back in the day we had three decimal numbers--
    • 1:31:01255, 216, and then another 255, if we rewound to week 0,
    • 1:31:06we could go through the math of converting that to binary.
    • 1:31:09And even if it might take you a little while, this is the binary equivalent.
    • 1:31:12And frankly, the first and last are kind of easy.
    • 1:31:15255 is kind of a special value because with 8 bits, all of which
    • 1:31:19are 1, that's what gives you 255.
    • 1:31:21So the only hard one is actually this.
    • 1:31:23But who cares about the math today.
    • 1:31:25We know from weeks ago that we can do this if we really tried.
    • 1:31:28But notice that bytes are eight bits, and of course, eight is a pair of four,
    • 1:31:35if you will.
    • 1:31:36Well what's really nice about hexadecimal is that it starts at 0
    • 1:31:40and ends at f.
    • 1:31:41And that's 0, 1, 2, 3, 4, 5, 6, 7, 8, 9--
    • 1:31:46wait-- yes, that's 10.
    • 1:31:47OK.
    • 1:31:48And then a, b, c, d, e, f.
    • 1:31:51I just held up 16 fingers in total, hence, hexadecimal.
    • 1:31:54What's nice about base-16 is that how many bits do I need to count from 0 up
    • 1:32:00to--
    • 1:32:02one, two, three, four--
    • 1:32:0315?
    • 1:32:05Just 4, right?
    • 1:32:06So if I have all 0 bits, that's 0.
    • 1:32:09And if I have 4 1-bits, that's--
    • 1:32:13let's see.
    • 1:32:13This is an 8 plus 4 plus 2 plus 1 gives me 15.
    • 1:32:18So long story short, hexadecimal's super convenient because 0 through f
    • 1:32:22maps wonderfully cleanly to 4 bits.
    • 1:32:25So it's just a nice way of thinking about the world not in units of 8
    • 1:32:28but in 4 instead.
    • 1:32:29So all I did here was I took my values and I just
    • 1:32:31added a little bit of whitespace to make clear
    • 1:32:33that 8 bits is like a pair of 4 bits.
    • 1:32:35It turns out now that 1 1 1 1 is f for the reasons I enumerated earlier.
    • 1:32:40All 1's is f, otherwise known as 15.
    • 1:32:44All 1's is again f, otherwise known as 15.
    • 1:32:47If we did the math, 1 1 0 1 is d, 1 0 0 0 is 8, and then all 1's is f and f.
    • 1:32:55So long story short, there is a way to convert from decimal
    • 1:32:58to binary, to hexadecimal, to any number of other base systems.
    • 1:33:01It all just boils down to what digits you care about.
    • 1:33:03And the way you write this, to your question earlier,
    • 1:33:05is by human convention.
    • 1:33:06Not just FFD8FF, but 0xFF 0xD8 0xFF just because.
    • 1:33:12Then it's clear to the user what it is.
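The mapping from each group of 4 bits to one hexadecimal digit can be sketched in a few lines. The function name byte_to_hex is hypothetical, and the sketch assumes the usual 8-bit byte.

```c
// Writes the two hexadecimal digits of value into out, followed by
// '\0'. Assumes 8-bit bytes, so value is between 0 and 255.
void byte_to_hex(unsigned char value, char out[3])
{
    const char digits[] = "0123456789ABCDEF";

    out[0] = digits[value >> 4];     // high 4 bits, e.g. 1101 -> D
    out[1] = digits[value & 0x0F];   // low 4 bits,  e.g. 1000 -> 8
    out[2] = '\0';
}
```

For the three values in the example, 255 becomes FF and 216 becomes D8.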
    • 1:33:14So a little levity now.
    • 1:33:16I'm sorry to do this to you, but now you will all hopefully
    • 1:33:19understand this famous comic.
    • 1:33:26OK, welcome to that club of people who understand things like this.
    • 1:33:29So let's now stumble upon just one last problem,
    • 1:33:34and we'll take it home by putting into the context
    • 1:33:36a very sexy field of forensics where all of these building blocks
    • 1:33:41will come into play.
    • 1:33:42But first let's start with a problem.
    • 1:33:43Suppose I want to implement a function here called swap whose purpose in life
    • 1:33:47is just to swap two values, a and b.
    • 1:33:49I just want to do a switcheroo.
    • 1:33:50Let's first do this with a sort of mid-lecture snack for at least
    • 1:33:54one person.
    • 1:33:55Would anyone be up for--
    • 1:33:56OK, that was fast.
    • 1:33:57Volunteering, come on up.
    • 1:34:00What's your name?
    • 1:34:01Kelly, all right.
    • 1:34:02Thank you for volunteering so suddenly.
    • 1:34:07Kelly, David, nice to meet you.
    • 1:34:09OK, so very simple task at hand.
    • 1:34:11I have here two empty cups, and we have some orange juice.
    • 1:34:19OK, put this in here.
    • 1:34:22And we've got some milk over here.
    • 1:34:26That should stand out, very different colors.
    • 1:34:29OK, I would just like you, Kelly, if you could, swap those two values.
    • 1:34:34Orange goes into milk, milk goes into orange please.
    • 1:34:42That is cheating, OK?
    • 1:34:44No, I mean literally the cups.
    • 1:34:45I put them in the wrong cup, I prefer my milk
    • 1:34:47in the other cup and my orange juice in the other cup, I'm sorry.
    • 1:34:53AUDIENCE: Pour it back in.
    • 1:34:54DAVID MALAN: No, that is not available to you, OK?
    • 1:34:56[LAUGHTER]
    • 1:34:57OK, so you're struggling.
    • 1:34:59Why are you struggling?
    • 1:35:00KELLY: Because I'm going to mix them.
    • 1:35:01And then it won't be the same.
    • 1:35:03DAVID MALAN: Right.
    • 1:35:03So I mean obviously, this is kind of a losing proposition.
    • 1:35:06You can't really do this.
    • 1:35:07What would make this easier for you besides putting them back
    • 1:35:09in the bottles?
    • 1:35:10KELLY: Having another container.
    • 1:35:10DAVID MALAN: Yeah.
    • 1:35:11So you need like a temporary storage space for this.
    • 1:35:14You know, let me--
    • 1:35:15Tara, can we get some more cups over here?
    • 1:35:18Ah, this will make it easier.
    • 1:35:20OK, so if I get you some temporary space--
    • 1:35:22here you go-- could you solve the problem now please?
    • 1:35:28Ah, very nice.
    • 1:35:30A little contamination, but that's OK.
    • 1:35:35But I need that temporary cup back for Tara.
    • 1:35:37Yeah, OK.
    • 1:35:38Thank you.
    • 1:35:39All right, a round of applause if we could for Kelly here.
    • 1:35:42[APPLAUSE]
    • 1:35:44Well here we go.
    • 1:35:44I'm guessing you don't want warm milk, but orange juice?
    • 1:35:47OK.
    • 1:35:47Thank you so much.
    • 1:35:48All right, so what's the point here?
    • 1:35:50This is pretty easy.
    • 1:35:51Like once you have some temporary storage
    • 1:35:53space-- a variable, if you will, like it's no problem to swap two values.
    • 1:35:57So let me go ahead and do that as follows.
    • 1:36:00I'm going to go ahead and just implement this swap function
    • 1:36:02in C exactly as Kelly ultimately just implemented it.
    • 1:36:05If the goal is to swap a and b, I can't just do a complete switcheroo,
    • 1:36:09it seems.
    • 1:36:10I need to put one of those values, like the milk, in another container,
    • 1:36:13and then swap and then swap.
    • 1:36:15So it takes three steps, not just one.
    • 1:36:17All right, so I could call this extra variable or cup
    • 1:36:19that Tara gave us anything we want-- tmp.
    • 1:36:22So I'm just going to put a in tmp.
    • 1:36:25Then I'm going to put b in a, because a is now empty.
    • 1:36:28Then I'm going to put tmp in b, and then I don't really
    • 1:36:31care what happens to tmp-- indeed, it's just still sitting there,
    • 1:36:33but the job is now done.
    • 1:36:35So let's go ahead and see this program in action, because obviously this
    • 1:36:39should be pretty straightforward.
    • 1:36:40So let me go ahead and open up this program
    • 1:36:44in the context of a main function so we can actually run it.
    • 1:36:47In this code here, I'm going to demonstrate it as follows.
    • 1:36:51Here's my main function.
    • 1:36:52I'm going to call variable x, give it 1, call variable y,
    • 1:36:55give it 2, go ahead and just print out just for a quick sanity check--
    • 1:36:58x is this, y is that.
    • 1:37:00Then I'm going to call this super simple swap function, x, y.
    • 1:37:04Then I'm going to print the exact same thing-- x is this, y is that,
    • 1:37:08just so I can see in those variables--
    • 1:37:09I could also use debug50, but this is meant to be a complete solution,
    • 1:37:12I want to see it on the screen.
    • 1:37:13Here is swap.
    • 1:37:14I copy-pasted that from before.
    • 1:37:16This feels like a no-brainer, super straightforward,
    • 1:37:18let's go into my directory and compile this program, which, slight spoiler,
    • 1:37:23noswap is the name.
    • 1:37:26./noswap.
    • 1:37:29Oof.
    • 1:37:32Let's zoom in.
    • 1:37:33Nope, that is not what I intended, right?
    • 1:37:35I really intended milk to become OJ, OJ to become milk,
    • 1:37:38or x become y, y become x, this doesn't seem to work.
    • 1:37:41And again, the only magic is this one call to swap.
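A sketch of the swap function as described, reconstructed from the narration rather than copied from the lecture's file. The printf inside is an addition made here to show what the function itself sees.

```c
#include <stdio.h>

// Swaps its own copies of a and b. The caller's variables are not
// affected, because C passes ints to functions by value.
void swap(int a, int b)
{
    int tmp = a;   // pour a into the spare cup
    a = b;         // pour b into a
    b = tmp;       // pour the spare cup into b

    printf("inside swap: a is %i, b is %i\n", a, b);
}
```

Calling `swap(x, y);` with x as 1 and y as 2 swaps a and b inside the function, yet x is still 1 and y is still 2 afterward, which matches the output seen here.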
    • 1:37:44All right, maybe it just works some of the time.
    • 1:37:46So nope, nope-- OK.
    • 1:37:49Now it's time for the debugger.
    • 1:37:50I don't understand what's going on in my program,
    • 1:37:52printf is not really illuminating here.
    • 1:37:54So let me go ahead and run debug50 ./noswap.
    • 1:37:58The little debugging panels get opened on the side,
    • 1:38:00but wait, I need a breakpoint.
    • 1:38:02I'm going to start a breakpoint at the very top, the first line I care about.
    • 1:38:05I don't really care about all the stuff at the super top.
    • 1:38:08Now I'm going to go ahead and rerun debug50 ./noswap, all right?
    • 1:38:12Now I see over here, the first line 9 is highlighted.
    • 1:38:15Notice on the right-hand side, and this perhaps
    • 1:38:17answers by example your question earlier.
    • 1:38:19x and y conveniently, but just by chance, were initialized to 0--
    • 1:38:23not by me, I shouldn't necessarily trust this in all contexts,
    • 1:38:26but that's why they had values.
    • 1:38:28They're otherwise known as garbage values, but I got lucky with 0's here.
    • 1:38:31Let me go ahead and step over that line, and if you watch, albeit small,
    • 1:38:34on the right-hand side, x should suddenly take on a value of 1.
    • 1:38:39And if I step over one more line, y should take on a value of 2.
    • 1:38:43OK, so I'm pretty confident the program is thus far correct.
    • 1:38:45I'm going to go ahead and step over printf.
    • 1:38:48And notice the blue terminal window, I see one output.
    • 1:38:51Now things get interesting.
    • 1:38:53If I continue stepping over lines, it's just going to finish running
    • 1:38:56and that's not enough.
    • 1:38:58So notice this time I'm going to hover over this third icon, Step Into.
    • 1:39:01Now I can kind of go down the rabbit hole,
    • 1:39:03so to speak, and go into the swap function, and notice,
    • 1:39:07the debugger jumps into that other function.
    • 1:39:09So here now, the context changed.
    • 1:39:11My local variables are now a, b, and tmp, and this is really weird.
    • 1:39:15a is 1, b is 2, as expected, because I passed in x, y.
    • 1:39:21And in the context of this function I'm just calling them a, b because.
    • 1:39:25But why is tmp 32,767?
    • 1:39:29It's just because it can't be trusted, it's a garbage value.
    • 1:39:31If you just give yourself a temporary value, who knows what's in there?
    • 1:39:35We got lucky and Tara did not have anything in this cup,
    • 1:39:38but it could have had a garbage value, maybe it had some Pepsi,
    • 1:39:41and then we would have had to replace that value somehow.
    • 1:39:44So to be clear, when you declare variables in a program,
    • 1:39:47quite often they have garbage values, just bogus values--
    • 1:39:50the 0's and 1's that are there underneath the hood in that chip,
    • 1:39:53but that you didn't set yourself.
    • 1:39:55But that's OK, because I'm explicitly in this next line setting tmp equal to a.
    • 1:39:59So it doesn't matter what its original weird value was, so if I click Next,
    • 1:40:03tmp is now 1, a.k.a.
    • 1:40:06a.
    • 1:40:07Now notice a is going to become b if you watch the right-hand side.
    • 1:40:11Now I seem to have a is 2, b is 2, which is a little worrisome but not as bad,
    • 1:40:15because I have that separate variable tmp, so I still have the one around.
    • 1:40:18So now b is about to become 1, and I've done the switcheroo.
    • 1:40:22OK, at this point in the story, line 22, my code seems correct.
    • 1:40:27b has become a, a has become b, and the values are swapped--
    • 1:40:30and the debugger is confirming that for me visually.
    • 1:40:34So now, let's do a step and--
    • 1:40:39dammit.
    • 1:40:42Lost.
    • 1:40:43What is going on?
    • 1:40:45Intuitively?
    • 1:40:46Even if you've never seen or done this before, like clearly there's a bug.
    • 1:40:53What is that bug?
    • 1:40:55What must be happening?
    • 1:40:56Yeah?
    • 1:40:57AUDIENCE: [INAUDIBLE] a new value [INAUDIBLE]
    • 1:41:01doesn't have the same address for the first one?
    • 1:41:03DAVID MALAN: Yeah.
    • 1:41:04What seems to be happening here is yes, you're passing in x and y
    • 1:41:07and calling it a and b, but a and b would seem to be copies of x and y.
    • 1:41:12And I am very successfully, very correctly swapping a and b,
    • 1:41:16but because they're copies, it has no effect on the original x and y.
    • 1:41:20So our metaphor here of juice isn't quite apt
    • 1:41:22because I didn't pass Kelly copies of the OJ and milk,
    • 1:41:27I handed her the actual OJ and milk and she was able to change the values.
    • 1:41:31But in the context of C and code, when you pass arguments to a function,
    • 1:41:35you're passing copies of those arguments to the function.
    • 1:41:38So intuitively, what is the solution?
    • 1:41:40We clearly cannot pass from one function to another copies of the values if we
    • 1:41:44expect the function swap, or a.k.a.
    • 1:41:47Kelly, to make some useful change for us.
    • 1:41:50What do we have to pass to the function or to Kelly instead?
    • 1:41:55The addresses of those values, right?
    • 1:41:57I told her where the milk and OJ were.
    • 1:41:59I didn't give her copies of them, I told her, here's the milk,
    • 1:42:02here's the OJ, swap those.
    • 1:42:04In this version of the code, I've just said,
    • 1:42:06here's a copy of x, here's a copy of y, you can call them a and b-- um-mmm.
    • 1:42:10We need to now use the ampersand or something like that to pass in a map,
    • 1:42:14if you will.
    • 1:42:15The treasure map to those values so that swap can change the original values.
    • 1:42:20And the way we do this is a little weird-looking,
    • 1:42:22but all we're going to have to do is make a little addition here
    • 1:42:27that looks as follows.
    • 1:42:29It's got to look like this instead.
    • 1:42:33So this is the broken version.
    • 1:42:35Or broken in that it doesn't have the effect we intend even though it works.
    • 1:42:39This is what we need to do instead, and it's the last piece
    • 1:42:41of new symbology for today.
    • 1:42:42We've seen star in a couple of different places
    • 1:42:44before, now we're using it in one final context.
    • 1:42:47When you specify a star here and here in the arguments to a function, that
    • 1:42:53is just the way you tell the computer, I'm
    • 1:42:55expecting not an int, but the address of an int.
    • 1:42:57I'm expecting not an int here, but the address of an int.
    • 1:43:00So two pointers, two addresses of integers.
    • 1:43:03Down here, tmp is still just an int.
    • 1:43:05I don't need to over think tmp, that's just an empty cup.
    • 1:43:08Give me an integer called tmp from week 1.
    • 1:43:11But, what do I want to store in tmp?
    • 1:43:14Both a and b in this version are addresses.
    • 1:43:17Do I want to remember the address a and the address b?
    • 1:43:22No, I want to remember the volume of OJ, the volume of milk,
    • 1:43:25I want to remember 1 and 2, I don't care where in memory they are.
    • 1:43:29So star in this context, when there's no mention of a data type,
    • 1:43:34there's just a star and a variable name.
    • 1:43:37That variable is a pointer and it's not multiplication,
    • 1:43:39there's no math going on.
    • 1:43:40That star is the dereference operator that says, go to this address
    • 1:43:46and get the value there.
    • 1:43:48So if this address a is at location, I don't know, 100 like Brian was,
    • 1:43:52and this address b is at location 900 like Veronica was,
    • 1:43:55*a means go to the 100th byte in memory and get me that value, which is 1.
    • 1:44:01This means, down here, go to the address b, get me that value at address 900,
    • 1:44:05which is 2.
    • 1:44:07And go ahead and store 1 in tmp.
    • 1:44:10Go ahead and go to that address and put whatever's
    • 1:44:13at b's address-- so get that address and put it over-- get that address,
    • 1:44:17get the value, and put it over at that address by dereferencing.
    • 1:44:20And then lastly, go to b in memory, like over there, put the tmp value there.
    • 1:44:26So whereas ampersand in our previous example means,
    • 1:44:28tell me what the address is of a variable, star is the opposite.
    • 1:44:32When you have an address, it says, go to that address.
    • 1:44:35Follow the treasure map, X marks the spot at that location in memory,
    • 1:44:39and get at its value.
    • 1:44:40So what is the net effect here?
    • 1:44:42If I actually now open up not this example, but swap.c--
    • 1:44:46spoiler, this one is going to actually work.
    • 1:44:50If I open up swap.c, we're going to see now the following instead.
    • 1:44:55The code is almost the same, except that I pasted it
    • 1:44:58in this new green version of the function.
    • 1:45:01And notice here, this had a change.
    • 1:45:03Why am I typing in &x now and &y instead of just x and y?
    • 1:45:11AUDIENCE: [INAUDIBLE] address [INAUDIBLE] functions [INAUDIBLE]..
    • 1:45:17DAVID MALAN: Exactly.
    • 1:45:18The swap function now, the new improved version
    • 1:45:20is expecting two addresses-- stars.
    • 1:45:22Each star, a.k.a. pointers, not just values.
    • 1:45:25So this means I know x and y are actually integers from week 1.
    • 1:45:29Now I need the address of x and the address of y
    • 1:45:31so that swap can follow those treasure maps,
    • 1:45:35so to speak, and go to those addresses.
    • 1:45:37So now, when I run this program, this is more like the metaphor with Kelly
    • 1:45:42where I told her where the milk and OJ were.
    • 1:45:44Now swap can go to those locations as follows: make swap.
    • 1:45:48Let me go ahead and then do ./swap, Enter--
    • 1:45:51ah!
    • 1:45:52Now it seems to be working.
    • 1:45:54And we can see as much even with the debugger.
    • 1:45:56Even though it doesn't seem to be buggy, I can still use debug50
    • 1:45:59to see and understand my program, if not obvious-- oh,
    • 1:46:03I still need a breakpoint.
    • 1:46:04Let's set a breakpoint as before.
    • 1:46:05Let's rerun debug50.
    • 1:46:07The right-hand panel will open automatically for me.
    • 1:46:11And let's go ahead and see, if I start stepping over this,
    • 1:46:14now I see that x is 1, y is 2, printf prints as much on the screen.
    • 1:46:23Now I'm going to go ahead and step into swap,
    • 1:46:25and now notice, it's a little weird-looking,
    • 1:46:28because now a is an address and b is an address,
    • 1:46:32but tmp is still an int with a garbage value, but I can fix that.
    • 1:46:36Now tmp is 1, but notice, a and b's values are not changing,
    • 1:46:41but what is clearly changing per the code?
    • 1:46:46So notice, this is weird and cryptic.
    • 1:46:48a is this 0x value.
    • 1:46:50That's a big hexadecimal address, like that is where in memory a is.
    • 1:46:54But you know what?
    • 1:46:55If I click the little triangle, I can kind of follow that pointer
    • 1:46:58and go to it.
    • 1:46:59The debugger is smart like that.
    • 1:47:01So *a, go to a is 2; and *b at the moment is 2, but if I keep going,
    • 1:47:06now I've done a switcheroo, and you can see that these values have changed.
    • 1:47:11And again, we don't care what these addresses are,
    • 1:47:13I don't care what the actual addresses are.
    • 1:47:15I do care that it gives me this functionality, because now when
    • 1:47:17I return up here in print, now the values have indeed
    • 1:47:20changed as I expected this whole time.
    • 1:47:23All right.
    • 1:47:24That was complex, but hopefully clear as to why it now works even though we've
    • 1:47:30made this code look more cryptic.
    • 1:47:33If not, any questions are welcome.
    • 1:47:34Yeah?
    • 1:47:35AUDIENCE: Is that from the spot where [INAUDIBLE]
    • 1:47:39DAVID MALAN: Uh huh.
    • 1:47:40AUDIENCE: [INAUDIBLE] the star [INAUDIBLE] pointers?
    • 1:47:44DAVID MALAN: Good question.
    • 1:47:46Do we really need to have these ampersands here because we already
    • 1:47:49have the stars here?
    • 1:47:50Short answer, yes, for symmetry.
    • 1:47:52This is telling the function what to expect on the way in;
    • 1:47:55this is what's telling the computer actually what to send in.
    • 1:48:00So what are the actual inputs to that function?
    • 1:48:03It has to be symmetric.
    • 1:48:05Yeah?
    • 1:48:05AUDIENCE: [INAUDIBLE] value is swapping addresses.
    • 1:48:11DAVID MALAN: We are swapping what is at the addresses.
    • 1:48:16AUDIENCE: So what if you change the address of [INAUDIBLE]
    • 1:48:24DAVID MALAN: OK.
    • 1:48:25AUDIENCE: And would we swap the addresses saying 2 is at 200 and 1
    • 1:48:29is at [INAUDIBLE] that could change.
    • 1:48:31DAVID MALAN: Short answer, you cannot for the following reason.
    • 1:48:35So technically, when you do &x and &y, these are converted to the address
    • 1:48:41of x, the address of y.
    • 1:48:42Technically swap is getting copies of something, C has not changed.
    • 1:48:46But swap is now getting copies of the address
    • 1:48:49of x, copies of the address of y, calling them a and b.
    • 1:48:53So sure, you could swap the addresses, but for the same reasons as before,
    • 1:48:57it's going to have no fundamental effect.
    • 1:48:58The difference here is because I'm passing in a map, so to speak,
    • 1:49:01to x and y, their addresses.
    • 1:49:03And again, an address is like--
    • 1:49:04we are at 45 Quincy Street I think right now--
    • 1:49:08Cambridge, Massachusetts 02138, USA.
    • 1:49:10That uniquely identifies the building.
    • 1:49:12These 0x hexadecimal numbers uniquely identify locations in memory.
    • 1:49:15So this is like saying now, get me the address of x, get me the address of y,
    • 1:49:19and I'm technically passing in copies of those addresses, but it doesn't matter,
    • 1:49:22because now with the star notation, I'm saying go to those addresses
    • 1:49:25and swap who is physically in this building and some other.
    • 1:49:30All right.
    • 1:49:31So let's just put this now into the context of what else
    • 1:49:34your computer actually has, just so that you've
    • 1:49:36seen some nomenclature around this computer's memory.
    • 1:49:39So this is the chip with a grid laid out on top of it
    • 1:49:41just to communicate that there's bytes here, and we could number them.
    • 1:49:44But let's think about this now more abstractly,
    • 1:49:47and let me just reveal that it turns out that the computer treats
    • 1:49:49different bytes, different squares in different ways just by convention.
    • 1:49:53It turns out that in your computer's memory--
    • 1:49:56and this is all just an artist's representation--
    • 1:49:58at the top of that chip of memory, so to speak,
    • 1:50:01is the so-called text of your program.
    • 1:50:03This is a fancy and non-obvious way of saying
    • 1:50:05the 0's and 1's that your code has been compiled into.
    • 1:50:09The text of a program is the code you wrote in binary,
    • 1:50:12that's where it's loaded in memory.
    • 1:50:14So in macOS and Windows, you double-click an icon,
    • 1:50:16that program is loaded into memory I said last week.
    • 1:50:18It's literally loaded into the top of your computer's memory conceptually.
    • 1:50:22What else?
    • 1:50:23Well the heap is the fancy name given to the chunk of memory in which memory
    • 1:50:29is coming from when you call malloc.
    • 1:50:31So when I called malloc earlier to get a bunch of space for some characters,
    • 1:50:34it was just coming from this big open area called the heap.
    • 1:50:37And that's what get_string is using and other functions as well.
    • 1:50:41Well it turns out that the reason for the problem we just ran into
    • 1:50:44is because the bottom part of memory is what's called the stack.
    • 1:50:48The stack is the area of memory that functions use when they are called.
    • 1:50:52And this is actually relevant to that very simple noswap example as follows.
    • 1:50:57If we now assume that anytime you call a function, the memory it uses
    • 1:51:01comes from the bottom of that big block of memory,
    • 1:51:04where you can draw that, for instance, here on the screen,
    • 1:51:07because it turns out that anytime you call a function, that function gets
    • 1:51:10a slice of its own memory.
    • 1:51:12So for instance, main is always the first function a program calls,
    • 1:51:15and so it gets the first slice of memory at the bottom of the screen here.
    • 1:51:20And so if main had two variables x and y, that's like saying,
    • 1:51:25OK, give me a chunk of memory called x and put the value 1 in it;
    • 1:51:29give me another chunk of memory, call it y, put a value in it here.
    • 1:51:33But remember, from the first noswap example, the swap function was called.
    • 1:51:38This is a stack in the literal sense.
    • 1:51:40You go into a dining hall, a cafeteria, one tray for food, goes on another,
    • 1:51:44goes on another, goes on another so that the humans can take it
    • 1:51:46and put food and plates on it.
    • 1:51:48Well similarly in this model, when you call a function,
    • 1:51:51it gets its own slice of memory, but literally above, conceptually,
    • 1:51:55the existing frame on the stack.
    • 1:51:58So this is the swap function's own chunk of memory,
    • 1:52:01and it, too, gets some space.
    • 1:52:03It gets some space for a variable called a.
    • 1:52:06It gets some space for a variable called b.
    • 1:52:08And guess what goes inside those of that first example?
    • 1:52:11A copy of x and a copy of y.
    • 1:52:15And you know what?
    • 1:52:15It had a temp variable, so that's got to have some space here.
    • 1:52:19So I'll call this tmp.
    • 1:52:20And recall that I set tmp equal to a, so that got 1.
    • 1:52:24And then what happened?
    • 1:52:25Well then I did what--
    • 1:52:27what did I?
    • 1:52:30Let me get this right.
    • 1:52:33We had a gets b.
    • 1:52:36So what happened there?
    • 1:52:38So in this example here, a gets the value b, so that changed.
    • 1:52:43And then what happens here, b got the value of tmp, so that changed.
    • 1:52:46So swap was working in the sense that it was swapping values,
    • 1:52:49but the problem is, when a function returns, this chunk of memory that it
    • 1:52:53was previously using gets reclaimed so that someone else can now use it,
    • 1:52:58another function.
    • 1:52:59So we did all that hard work and no swap, and we did it correctly,
    • 1:53:03we just did it in the wrong place.
    • 1:53:05So by contrast, this next example that we did, which was swap.c,
    • 1:53:11just treated the memory a little bit differently.
    • 1:53:13Main this time still had two variables called x, and this was a 1,
    • 1:53:18and then another one called y, and this was a 2.
    • 1:53:21And then when swap was called this time, it again
    • 1:53:23had a variable called a and a variable called
    • 1:53:26b, but what was stored in a and b?
    • 1:53:29Well now they're addresses.
    • 1:53:30And I don't know what it is, but let me just arbitrarily say that this
    • 1:53:34is location 100, this is location--
    • 1:53:37let's say 104.
    • 1:53:39But it could be anything, we just don't care at this point,
    • 1:53:41it would have 0x technically if the computer were showing us.
    • 1:53:44What's going in a here is 100, what's going in b here is 104.
    • 1:53:49And those are the addresses of x and y, and the code
    • 1:53:54we had using all of those new stars was saying,
    • 1:53:56go to address 100 and store whatever is at address 100 in tmp.
    • 1:54:04Then go to the address that's in b, or 104,
    • 1:54:07and store that at the location in a, whatever is there.
    • 1:54:12Then it was saying, go get that tmp value, by the way,
    • 1:54:15and go ahead and put that here, so that now we did
    • 1:54:20different work in a different place.
    • 1:54:23So now when swap is done running, it doesn't
    • 1:54:25matter if its memory disappears because it has now mutated or changed
    • 1:54:31the other memory
    • 1:54:32that it was passed in, just like Kelly changed or mutated the cups
    • 1:54:35I actually pointed her at rather than copies thereof.
    • 1:54:39Now as an aside, there's other chunks of memory that are actually used.
    • 1:54:43If you have global variables in a program,
    • 1:54:45turns out that in between the text and the heap
    • 1:54:48memory are your global variables, if they're initialized with values
    • 1:54:51or they're not initialized with values, as would happen with the equal sign,
    • 1:54:54but we don't care too much about that for today's purposes.
    • 1:54:56And if you've ever heard of environment variables, which
    • 1:54:58we will when we get to web programming, they, too,
    • 1:55:01are stored elsewhere in memory.
    • 1:55:02But the most interesting chunks of memory
    • 1:55:04are stack and heap, as in this case here.
    • 1:55:07But unfortunately it's so easy for things to go awry--
    • 1:55:10I mean, some of you experienced segmentation faults already,
    • 1:55:13and let's consider why that might happen.
    • 1:55:15So here's a contrived example of code that is by design buggy,
    • 1:55:18but let's just talk it through in English what these lines are doing.
    • 1:55:21This line here, int *x, is saying, hey, computer,
    • 1:55:25give me a variable that will store the address of an integer.
    • 1:55:31So give me a pointer to an int is the more casual way of saying it.
    • 1:55:34Hey computer, give me another variable that's
    • 1:55:37going to store the address of an int and call it y.
    • 1:55:40So x and y, that's it.
    • 1:55:42This line is new-ish.
    • 1:55:44Hey computer, allocate enough space that will fit an int.
    • 1:55:48So sizeof int is the new syntax we saw earlier for just figuring out
    • 1:55:51how many bytes is an int.
    • 1:55:52Odds are this is going to come back as 4 or 32 bits in most computers.
    • 1:55:56So this just says, hey computer, give me 4 bytes of memory
    • 1:55:59and store that in this location.
    • 1:56:02Or rather, store that address in this variable.
    • 1:56:06So maybe it's going to say, OK, here's four bytes at location 100,
    • 1:56:09or here's four bytes at location 900.
    • 1:56:11Or wherever, we don't care, we're just remembering that address in x.
    • 1:56:15*x says, go to that address--
    • 1:56:18100 or 900, whatever it is, put the number 42 there.
    • 1:56:22This next line says, go to the address in y and put the unlucky number-- hint,
    • 1:56:26hint--
    • 1:56:2713 there.
    • 1:56:30Well what is the address in y?
    • 1:56:35I haven't allocated it yet.
    • 1:56:36What's the address in x?
    • 1:56:38It's wherever malloc told me to use space.
    • 1:56:41That's safe, that was like 100, 900, whatever the value was,
    • 1:56:44but did I allocate space for y?
    • 1:56:46So what kind of value does it contain, so to speak?
    • 1:56:49A garbage value.
    • 1:56:50Maybe it's 0, maybe it's like 32,000-- we don't know,
    • 1:56:53because if you don't specify the value, it
    • 1:56:55is not safe to trust it or do anything with it.
    • 1:56:59This is going to give me probably one of those segmentation faults.
    • 1:57:02And indeed, if I run a program like this,
    • 1:57:04I'm quite likely going to see exactly that kind of problem.
    • 1:57:08It's perhaps better, though, to see this in a way that
    • 1:57:10will paint a more memorable picture, and for that, thought we'd take--
    • 1:57:14in our 10 minutes remaining, use a few of these minutes
    • 1:57:16to take a look at something our friends at Stanford
    • 1:57:18put together with a bit of claymation.
    • 1:57:20It's about three minutes long, well worth it
    • 1:57:22to paint a picture of exactly what goes wrong
    • 1:57:24when you don't use memory correctly.
    • 1:57:27If you could dim the lights.
    • 1:57:29[VIDEO PLAYBACK]
    • 1:57:29[MUSIC PLAYING]
    • 1:57:32- Hey, Binky.
    • 1:57:33Wake up!
    • 1:57:34It's time for pointer fun!
    • 1:57:36- What's that?
    • 1:57:37Learn about pointers?
    • 1:57:39Oh goody!
    • 1:57:41- Well to get started, I guess we're going to need a couple of pointers.
    • 1:57:44- OK.
    • 1:57:45This code allocates two pointers which can point to integers.
    • 1:57:48- OK.
    • 1:57:49Well I see the two pointers, but they don't seem to be pointing to anything.
    • 1:57:52- That's right.
    • 1:57:53Initially pointers don't point to anything.
    • 1:57:55The things they point to are called pointees,
    • 1:57:58and setting them up is a separate step.
    • 1:58:00- Oh right, right.
    • 1:58:00I knew that.
    • 1:58:01The pointees are separate.
    • 1:58:03So how do you allocate a pointee?
    • 1:58:05- OK.
    • 1:58:06Well this code allocates a new integer pointee,
    • 1:58:09and this part sets x to point to it.
    • 1:58:12- Hey, that looks better.
    • 1:58:14So make it do something.
    • 1:58:15- OK.
    • 1:58:16I'll dereference the pointer x to store the number 42 into its pointee.
    • 1:58:21For this trick, I'll need my magic wand of dereferencing.
    • 1:58:24- Your magic wand of dereferencing?
    • 1:58:27That-- that's great.
    • 1:58:30- This is what the code looks like.
    • 1:58:31I'll just set up the number and--
    • 1:58:33[POP]
    • 1:58:34- Hey look!
    • 1:58:35There it goes.
    • 1:58:36So doing a dereference on x follows the arrow to access its pointee.
    • 1:58:41In this case, to store 42 in there.
    • 1:58:43Hey, try using it to store the number 13 through the other pointer, y.
    • 1:58:48- OK.
    • 1:58:49I'll just go over here to y and get the number 13 set up,
    • 1:58:54and then take the wand of dereferencing and just--
    • 1:58:58[BUZZING] whoa!
    • 1:58:59- Oh hey, that didn't work.
    • 1:59:01Say, Binky, I don't think dereferencing y is a good idea,
    • 1:59:05cause setting up the pointee is a separate step
    • 1:59:08and I don't think we ever did it.
    • 1:59:10- Mmm, good point.
    • 1:59:12- Yeah.
    • 1:59:12We allocated the pointer y, but we never set it to point to a pointee.
    • 1:59:17- Mmm, very observant.
    • 1:59:19- Hey, you're looking good there, Binky.
    • 1:59:21Can you fix it so that y points to the same pointee as x?
    • 1:59:24- Sure.
    • 1:59:24I'll use my magic wand of pointer assignment.
    • 1:59:27- Is that going to be a problem like before?
    • 1:59:29- No, this doesn't touch the pointees.
    • 1:59:31It just changes one pointer to point to the same thing as another.
    • 1:59:35- Oh, I see.
    • 1:59:36Now y points to the same place as x.
    • 1:59:38So wait, now y is fixed.
    • 1:59:40It has a pointee.
    • 1:59:41So you can try the wand of dereferencing again to send the 13 over.
    • 1:59:46- OK.
    • 1:59:47Here goes.
    • 1:59:48- Hey, look at that.
    • 1:59:50Now dereferencing works on y.
    • 1:59:51And because the pointers are sharing that one pointee, they both see the 13.
    • 1:59:55- Yeah, sharing, whatever.
    • 1:59:57So we going to switch places now?
    • 1:59:59- Oh look, we're out of time.
    • 2:00:01- But--
    • 2:00:02[END PLAYBACK]
    • 2:00:02DAVID MALAN: All right.
    • 2:00:03So hopefully that puts a little more visual behind some of these ideas,
    • 2:00:07but let's now contextualize this in a domain that's perhaps
    • 2:00:12more familiar in a couple of ways.
    • 2:00:13So one, some of you might already know, especially
    • 2:00:16if you've had prior programming experience, of a very popular website
    • 2:00:18called Stack Overflow where lots of programmers
    • 2:00:20post questions and hopefully answers to common technical problems.
    • 2:00:24If you ever wondered why it's called Stack Overflow,
    • 2:00:26it turns out it reduces to this picture here.
    • 2:00:29This was not a mistake that I drew one arrow from the heap pointing down,
    • 2:00:33and one arrow from the stack growing up.
    • 2:00:34As you malloc, malloc, malloc more and more space,
    • 2:00:38starts up here, so to speak, and you just get more and more space
    • 2:00:41that's going this direction.
    • 2:00:42But the more functions you call-- function after function
    • 2:00:45after function after a function, each of them
    • 2:00:47gets its own slice or frame of memory, that, too, is growing up.
    • 2:00:50So this feels like a pretty bad design, but honestly, it's not really avoidable
    • 2:00:54because if you have a finite amount of memory,
    • 2:00:56they can't avoid each other forever.
    • 2:00:58And so there's this fundamental risk of overflowing the stack,
    • 2:01:03or even overflowing the heap in the reverse direction.
    • 2:01:06So Stack Overflow is an allusion to, for instance, calling too
    • 2:01:09many-- many, many, many, many, many, many, many, many functions,
    • 2:01:12so many so that it overlaps other chunks or segments of memory,
    • 2:01:15thereby inducing a segmentation fault, and heap overflow
    • 2:01:19is in the reverse direction, and these are more
    • 2:01:21generally known as buffer overflows, and we'll see more of these in the weeks
    • 2:01:26to come.
    • 2:01:27But now that we have the ability to discuss pointers,
    • 2:01:29let's introduce one final feature and then a familiar face.
    • 2:01:33So it turns out that you can actually come up with your own custom variables
    • 2:01:38kind of like we did with string, but even more sophisticated than that.
    • 2:01:42For instance, if I wanted to implement a program that
    • 2:01:46involves multiple students, I might do something like this.
    • 2:01:49Ask the user what is the enrollment in a class, then go ahead
    • 2:01:52and give myself an array of strings, a.k.a.
    • 2:01:55char*s today of that size, and then I could also have another array of dorms.
    • 2:01:59And I could have two arrays containing one for the students' names,
    • 2:02:03one for the students' dorms, and I can keep track of other things.
    • 2:02:05Another array for emails, another array for phone numbers--
    • 2:02:08but this gets messy quickly, because you can imagine,
    • 2:02:11if I need names and dorms and emails and phones,
    • 2:02:15that starts to become a lot of copy-paste.
    • 2:02:17And I just have this design where I have lots and lots of arrays
    • 2:02:20where each bracket location-- like bracket 0, bracket 1
    • 2:02:24presumably refers to the same student across all of these arrays, like mmm!
    • 2:02:28Messy, messy, messy design.
    • 2:02:30So with a wave of my hand, let me actually
    • 2:02:32fix that immediate problem out of the gate by introducing a new feature.
    • 2:02:36I can invent my own data types.
    • 2:02:38Let me just go ahead and declare an array
    • 2:02:40called students with this many students, but of data type student.
    • 2:02:46C comes with float, bool, char, int, not string, and definitely not student.
    • 2:02:51So you can make your own custom data types,
    • 2:02:54and you can put them in your own header files, which we've not done either.
    • 2:02:57But I can look, and you'll see more of this in the next problem set.
    • 2:03:00So not to worry if this feels quite brief,
    • 2:03:02it's just meant to be a teaser here.
    • 2:03:04And in struct.h is how you declare or define your own type.
    • 2:03:09The keyword is literally typedef struct for structure, or data structure
    • 2:03:13to be more complete.
    • 2:03:14The name of the data structure comes at the end after some curly braces.
    • 2:03:18And then inside the curly braces you just specify,
    • 2:03:20well what do you want a student to have?
    • 2:03:22I want them to have a name, a dorm, maybe a phone number, maybe
    • 2:03:25an email address, anything I want.
    • 2:03:27I can just add here.
    • 2:03:28So that now in my actual code, I can have an array of actual students,
    • 2:03:34and I can just access them with this new notation like this.
    • 2:03:37You know that you can index into an array with bracket notation.
    • 2:03:40What you didn't know until now, perhaps, is that if at that location
    • 2:03:45is a structure, a.k.a.
    • 2:03:46struct, you can get at the name, the dorm, or the phone, or the email,
    • 2:03:51or anything else there just by using a dot-notation, which is
    • 2:03:54our last piece of new syntax for today.
    • 2:03:56Everything else is the same.
    • 2:03:58I can write a program that says so and so is in such and such a dorm
    • 2:04:01by just saying get the i-th student's name and the i-th student's dorm.
    • 2:04:05And I can be even fancier, and if I don't want to just print those values,
    • 2:04:09I can even, now that I, you know, understand pointers--
    • 2:04:13or I've seen pointers and we'll soon understand them
    • 2:04:15by way of problem sets and practice, I can actually do this.
    • 2:04:19This is just a little sneak preview of a line of code
    • 2:04:21that uses a new function called fopen.
    • 2:04:23fopen is file open, and it takes in the name of the file to open.
    • 2:04:27You might know of CSV files, they're like simple spreadsheets,
    • 2:04:29comma separated values.
    • 2:04:31And quote-unquote "w" means write.
    • 2:04:33So this says open the file called students.csv in write mode,
    • 2:04:37so I can write to this file.
    • 2:04:38Because in this example, as you'll see in the days to come,
    • 2:04:40I want to write out to a file.
    • 2:04:42But it turns out to use files, I need to know what a pointer is,
    • 2:04:45and it's a little weird that it's all caps,
    • 2:04:47but there is a data type in C called "FILE," and it's a pointer.
    • 2:04:51So long story short, what you're going to see in the next problem set
    • 2:04:54as we explore the world of forensics is the ability
    • 2:04:56using pointers and a few new functions to open files and get back
    • 2:05:00the address of that file in memory so that you can go to that address,
    • 2:05:04change the contents of a file, and save it back out.
    • 2:05:07All of us take for granted these days that you can go to File, Open and File,
    • 2:05:10Save, but what's actually happening, pointers are involved,
    • 2:05:13stuff's getting loaded into memory, and the computer
    • 2:05:15is dereferencing or going to those addresses
    • 2:05:17and changing what's at those locations in memory.
    • 2:05:20Now why might you want to do this?
    • 2:05:22Well here, of course, is Zamila-- you might
    • 2:05:23recall from some of the problem sets and the walkthroughs.
    • 2:05:26Turns out we could try to enhance this picture of her by zooming in,
    • 2:05:30and here's about as much fidelity as there is in her eyes.
    • 2:05:33Like I do not see the glint of any criminal's logo
    • 2:05:38on his or her jacket in the glint of Zamila's eyes.
    • 2:05:40If you zoom in on an image, and an image, recall, from week 0
    • 2:05:43is just a grid of pixels or dots, that's all you get.
    • 2:05:47And you can maybe smooth it out a little bit or clean up the colors,
    • 2:05:50but you can't just "enhance," quote-unquote,
    • 2:05:53and see more of the glint in Zamila's eye,
    • 2:05:55because an image at the end of the day is just a bitmap, a map--
    • 2:05:59top-down, left-right-- of pixels.
    • 2:06:01For instance, here's a smiley face.
    • 2:06:03If you kind of take a step back, you can kind of see a black smiley
    • 2:06:06face against a white backdrop.
    • 2:06:08And if we just decide as humans, let's represent white dots
    • 2:06:11with 1's and black dots with 0's, this might be what's in the file,
    • 2:06:15this is what the human sees.
    • 2:06:16So if we have the ability to open that from a file, store it in memory,
    • 2:06:21and then using pointers go to those locations in memory,
    • 2:06:24we can even change the smiley face to an unhappy face, for instance, or color it
    • 2:06:27or do any number of things to it.
    • 2:06:29Now at quick glance, there's a lot going on in files,
    • 2:06:32because what a file is is a set of conventions that humans decided
    • 2:06:37on where humans years ago just decided in a bitmap file,
    • 2:06:40BMP file-- so an older but still popular file format for images, humans
    • 2:06:44just decided that, like, we're going to put a bunch of special values
    • 2:06:48at the first bytes of the file, then some more
    • 2:06:50special values, then the actual RGB pixels in the rest of the file.
    • 2:06:55So this is meant to look cryptic at first glance,
    • 2:06:58and the next homework assignment will walk you through this,
    • 2:07:00but all it is is a convention of what the 0's and 1's mean
    • 2:07:04in these different locations.
    • 2:07:05And indeed, the challenge ahead is going to be to do a number of things.
    • 2:07:08One is to first and foremost figure out--
    • 2:07:10whodunit?
    • 2:07:10A sort of murder mystery in which there's a clue hidden in an image,
    • 2:07:14but an image that's a little noisy and you're
    • 2:07:16going to have to figure out what secret message is in the image
    • 2:07:18by loading that image in, tweaking it, putting a sort of red filter
    • 2:07:22on top of it and seeing the secret message, but all digitally; two,
    • 2:07:25actually resizing images and taking this many pixels in this big
    • 2:07:29of a smiley face or something else and making it bigger,
    • 2:07:32or if more comfortable, making it even smaller
    • 2:07:34and figuring out how to make that work out;
    • 2:07:36and then lastly, we've been taking some photographs of all CS50 staff
    • 2:07:39in Cambridge and New Haven.
    • 2:07:41Unfortunately we accidentally corrupted or lost the memory card,
    • 2:07:45but we made a forensic image of it, a copy of all of the 0's and 1's with all
    • 2:07:49of the staff photos, and we're going to need
    • 2:07:50you to write code that actually recovers all of the JPEGs
    • 2:07:53or photographs from that digital card by opening a file,
    • 2:07:57reading in those 0's and 1's, understanding what they are
    • 2:08:00and where they are, and just writing them
    • 2:08:01back out to disk using functions we'll introduce you to in the problem
    • 2:08:05set itself.
    • 2:08:06But of course, all of this takes for granted that we can do this,
    • 2:08:09and you can only do so much.
    • 2:08:10And indeed, this week is as much about solving those problems
    • 2:08:13as it is realizing the limitations of computers,
    • 2:08:16and so we thought we'd end with the final few seconds of this very
    • 2:08:19real example from Futurama.
    • 2:08:21[VIDEO PLAYBACK]
    • 2:08:22- Magnify that death sphere.
    • 2:08:27Why is it still blurry?
    • 2:08:28- That's all the resolution we have.
    • 2:08:30Making it bigger doesn't make it clearer.
    • 2:08:32- It does on CSI Miami.
    • 2:08:34- Ugh.
    • 2:08:35[END PLAYBACK]
    • 2:08:35DAVID MALAN: And that's it for CS50, we'll see you next time.
    • 2:08:38[APPLAUSE]