CS50 Video Player
    • 0:00:00 Introduction
    • 0:00:24 File I/O
    • 0:01:17 lists
    • 0:05:54 open
    • 0:13:55 with
    • 0:21:39 sorted
    • 0:29:31 Comma-Separated Values
    • 0:46:37 Sort Keys
    • 0:53:01 Lambda Functions
    • 0:57:13 csv Library
    • 1:02:17 csv.reader
    • 1:07:49 csv.DictReader
    • 1:14:05 csv.writer
    • 1:16:28 csv.DictWriter
    • 1:23:00 Images, PIL library
    • 1:31:42 Conclusion
    • 0:00:00[CROWD MURMURING]
    • 0:00:00[MUSIC PLAYING]
    • 0:00:24DAVID MALAN: All right, this is CS50's Introduction
    • 0:00:27to Programming with Python.
    • 0:00:29My name is David Malan, and this is our week on File I/O, Input and Output
    • 0:00:33of files.
    • 0:00:34So up until now, most every program we've written just
    • 0:00:37stores all the information that it collects in memory--
    • 0:00:39that is, in variables or inside of the program itself, a downside of which
    • 0:00:43is that, as soon as the program exits, anything you typed in,
    • 0:00:46anything that you did with that program is lost.
    • 0:00:49Now, with files, of course, on your Mac or PC, you can hang on to information
    • 0:00:53long term.
    • 0:00:53And File I/O within the context of programming
    • 0:00:56is all about writing code that can read from, that is load information
    • 0:01:00from, or write to, that is save information to, files themselves.
    • 0:01:04So let's see if we can't transition then from only
    • 0:01:06using memory and variables and the like to actually writing
    • 0:01:10code that saves some files for us and, therefore, data persistently.
    • 0:01:14Well, to do this, let me propose that we first consider a familiar data
    • 0:01:18structure, a familiar type of variable that we've seen before, that of a list.
    • 0:01:21And using lists, we've been able to store more than one piece
    • 0:01:24of information in the past.
    • 0:01:26Using one variable, we typically store one value.
    • 0:01:28But if that variable is a list, we can store multiple values.
    • 0:01:31Unfortunately, lists are stored in the computer's memory.
    • 0:01:34And so once your program exits, even the contents of those disappear.
    • 0:01:38But let's at least give ourselves a starting point.
    • 0:01:40So I'm over here in VS Code.
    • 0:01:42And I'm going to go ahead and create a simple program using
    • 0:01:45code of names.py, a program that just collects people's names,
    • 0:01:49students' names, if you will.
    • 0:01:51And I'm going to do it super simply initially
    • 0:01:53in a manner consistent with what we've done in the past to get user input
    • 0:01:56and print it back out.
    • 0:01:57I'm going to say something like this, name equals input, quote/unquote,
    • 0:02:01what's your name?
    • 0:02:03Thereby storing in a variable called name
    • 0:02:06the return value of input, as always.
    • 0:02:08And as always, I'm going to go ahead and very simply
    • 0:02:11print out a nice f string that says, hello, comma,
    • 0:02:14and then, in curly braces, name to print out Hello, David, hello, world,
    • 0:02:17whoever happens to be using the program.
    • 0:02:20Let me go ahead and run this just to remind myself what I should expect.
    • 0:02:23And if I run python of names.py and hit Enter, type in my name like David,
    • 0:02:26of course, I now see Hello, comma, David.
    • 0:02:29Suppose, though, that we wanted to add support not just for one name,
    • 0:02:32but multiple names-- maybe three names for the sake of discussion
    • 0:02:35so that we can begin to accumulate some amount of information
    • 0:02:39in the program, such that it's really going
    • 0:02:42to be a downside if we keep throwing it away once the program exits.
    • 0:02:46Well, let me go back into names.py up here at top.
    • 0:02:49Let me proactively give myself a variable, this time called names,
    • 0:02:52plural.
    • 0:02:53And set it equal to an empty list.
    • 0:02:55Recall that the square bracket notation, especially if nothing's inside of it,
    • 0:02:58just means, give me an empty list that we can add things to over time.
    • 0:03:03Well, what do we want to add to it?
    • 0:03:04Well, let's add three names, each from the user.
    • 0:03:07And let me say something like this, for underscore in range of 3,
    • 0:03:11let me go ahead and prompt the user with the input function
    • 0:03:16and get their name in this variable.
    • 0:03:18And then, using list syntax, I can say names.append(name) to add it to that list.
    • 0:03:25And now I have, in that list, that given name--
    • 0:03:281, 2, 3 of them.
    • 0:03:30Another point to note is that I could use a variable here,
    • 0:03:32like i, which is conventional.
    • 0:03:34But if I'm not actually using i explicitly on any subsequent lines,
    • 0:03:37I might as well just use underscore, which is a Pythonic convention.
    • 0:03:40And actually, if I want to clean this up a little bit right now,
    • 0:03:43notice that my name variable doesn't really
    • 0:03:46need to exist because I'm assigning it a value
    • 0:03:48and then immediately appending it.
    • 0:03:50Well, I could tighten this up further by just getting rid of that variable
    • 0:03:54altogether and just appending immediately the return value of input.
    • 0:03:59I think we could go both ways in terms of design here.
    • 0:04:01On the one hand, it's a pretty short line, and it's readable.
    • 0:04:04On the other hand, if I were to eventually change this phrase
    • 0:04:06to be not what's your name but something longer,
    • 0:04:08we might want to break it out again into two lines.
    • 0:04:11But for now, I think it's pretty readable.
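A minimal sketch of the three-name loop just described. It wraps the loop in a hypothetical helper, `collect_names`, whose `ask` parameter defaults to the built-in `input` so the prompt can be swapped out (for instance, in a test); the lecture's own version simply calls `input` inline.

```python
def collect_names(count=3, ask=input):
    """Gather `count` names into a list held in memory.

    `ask` is a hypothetical hook that defaults to the built-in input();
    the lecture's program calls input() directly inside the loop.
    """
    names = []  # start with an empty list to accumulate names
    for _ in range(count):  # "_" signals that the loop variable is unused
        # Append input's return value directly, with no temporary `name` variable
        names.append(ask("What's your name? "))
    return names
```

Calling `collect_names()` with no arguments prompts three times at the keyboard, as names.py does; whatever it returns still lives only in memory and is gone once the program exits.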
    • 0:04:13Now later in the program, let's just go ahead and print out those same names,
    • 0:04:17but let's sort them alphabetically so that it makes sense
    • 0:04:20to be gathering them all together, then sorting them, and printing them.
    • 0:04:24So how can I do that?
    • 0:04:25Well, in Python, the simplest way to sort a list in a loop
    • 0:04:28is probably to do something like this.
    • 0:04:30For name in names--
    • 0:04:32but wait.
    • 0:04:33Let's sort the names first.
    • 0:04:34Recall that there's a function called sorted
    • 0:04:36which will return a sorted version of that list.
    • 0:04:40Now let's go ahead and print out an f string that says, again, hello,
    • 0:04:44bracket, name, close quotes.
    • 0:04:47All right, let me go ahead and run this.
    • 0:04:49So Python of names.py, and let me go ahead
    • 0:04:52and type in a few names this time.
    • 0:04:54How about Hermione?
    • 0:04:56How about Harry?
    • 0:04:57How about Ron?
    • 0:04:58And notice that they're not quite in alphabetical order.
    • 0:05:02But when I hit Enter and that loop kicks in,
    • 0:05:04it's going to print out, hello, Harry, hello, Hermione, hello,
    • 0:05:07Ron, in sorted order.
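The sorting step can be sketched as follows, with the three names hard-coded in place of the `input` calls so the example runs unattended. Note that `sorted` returns a new list rather than reordering `names` itself.

```python
# Hard-coded stand-ins for the three names typed at the prompt
names = ["Hermione", "Harry", "Ron"]

# sorted() builds a new, alphabetized list; `names` keeps its original order
for name in sorted(names):
    print(f"hello, {name}")
```

This prints the greetings for Harry, Hermione, and Ron, in that order.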
    • 0:05:10But of course, now, if I run this program again, all of the names
    • 0:05:13are lost.
    • 0:05:14And if this is a bigger program than this,
    • 0:05:16that might actually be pretty painful to have
    • 0:05:18to re-input the same information again, and again, and again.
    • 0:05:21Wouldn't it be nice, like most any program today
    • 0:05:23on a phone, or a laptop, or desktop, or cloud
    • 0:05:26to be able to save this information somehow instead?
    • 0:05:30And that's where File I/O comes in.
    • 0:05:32And that's where files come in.
    • 0:05:33They are a way of storing information persistently on your own phone, or Mac,
    • 0:05:37or PC, or some cloud server's disk so that they're there when you
    • 0:05:42come back and run the program again.
    • 0:05:44So how can we go about saving all three of these names in a file as opposed
    • 0:05:50to having to type them again and again?
    • 0:05:52Let me go ahead and simplify this file and, again,
    • 0:05:54give myself just a single variable called name,
    • 0:05:57and set that variable equal to the return value of input.
    • 0:06:01So what's your name, as before, quote/unquote.
    • 0:06:04And now let me go ahead, and let me do something more with this value.
    • 0:06:08Instead of just adding it to a list or printing it immediately out,
    • 0:06:11let's save the value of the person's name
    • 0:06:14that's just been typed in to a file.
    • 0:06:15Well, how do we go about doing that?
    • 0:06:17Well, in Python, there's this function called open whose purpose in life
    • 0:06:20is to do just that, to open a file, but to open it up programmatically
    • 0:06:25so that you, the programmer, can actually read information from it
    • 0:06:28or write information to it.
    • 0:06:30So open is like the programmer's equivalent of double clicking
    • 0:06:33on an icon on your Mac or PC.
    • 0:06:35But it's a programmer's technique because it's
    • 0:06:37going to allow you to specify exactly what you want
    • 0:06:40to read from or write to that file.
    • 0:06:42Formally, its documentation is here, and you'll
    • 0:06:45see that its usage is relatively straightforward.
    • 0:06:48It minimally just requires the name of the file that we want to open
    • 0:06:50and, optionally, how we want to open it.
    • 0:06:53So let me go back to VS Code here, and let me propose now that I do this.
    • 0:06:57I'm going to go ahead and call this function called open, passing
    • 0:07:01in an argument for names.txt, which is the name of the file I would
    • 0:07:05like to store all of these names in.
    • 0:07:07I could call it anything I want.
    • 0:07:08But because it's going to be just text, it's
    • 0:07:10conventional to call it something.txt.
    • 0:07:13But I'm also going to tell the open function
    • 0:07:15that I plan to write to this file.
    • 0:07:18So as a second argument to open, I'm going to put literally, quote/unquote,
    • 0:07:21w, for Write, and that's going to tell open to open
    • 0:07:25the file in a way that's going to allow me to change the content.
    • 0:07:28And better yet, if it doesn't even exist yet,
    • 0:07:29it's going to create the file for me.
    • 0:07:32Now, open returns what's called a file handle,
    • 0:07:35a special value that allows me to access that file subsequently.
    • 0:07:39So I'm going to go ahead and assign it to a variable like file.
    • 0:07:42And now I'm going to go ahead and, quite simply,
    • 0:07:45write this person's name to that file.
    • 0:07:47So I'm going to literally type file, which is the variable linking to that
    • 0:07:52file, .write, which is a function otherwise known as a method that comes
    • 0:07:57with open files that allows me to write that name to the file.
    • 0:08:00And then lastly, I'm quite simply going
    • 0:08:03to go ahead and say, file.close, which will close and effectively save
    • 0:08:07the file.
    • 0:08:08So these three lines of code here are essentially the programmer's equivalent
    • 0:08:11to double clicking an icon on your Mac or PC,
    • 0:08:13making some changes in Microsoft Word or some other program,
    • 0:08:16and going to File, Save.
    • 0:08:18We're doing that all in code with just these three lines here.
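Here is a sketch of those three lines. To keep it runnable without a keyboard or a real names.txt, it hard-codes the name and writes into a temporary directory from Python's `tempfile` module; both are assumptions of this example, not part of the lecture's program.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names.txt")
    name = "Hermione"  # stand-in for name = input("What's your name? ")

    file = open(path, "w")  # "w" = write mode; creates the file if it doesn't exist
    file.write(name)        # write exactly these characters to the file
    file.close()            # close the file, which saves what was written

    # Read the file back to confirm what landed on disk
    file = open(path)
    contents = file.read()
    file.close()

print(contents)
```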
    • 0:08:21Well, let's see, now, how this works.
    • 0:08:24Let me go ahead now and run python of names.py and Enter.
    • 0:08:30Let's type in a name.
    • 0:08:31I'll type in Hermione, Enter.
    • 0:08:34All right, where did she end up?
    • 0:08:37Well, let me go ahead now and type code of names.txt,
    • 0:08:41which is a file that happens now to exist
    • 0:08:43because I opened it in write mode.
    • 0:08:45And if I open this in a tab, we'll see there is Hermione.
    • 0:08:49Well, let's go ahead and run names.py once more.
    • 0:08:52I'm going to go ahead and run python of names.py, Enter, and this time,
    • 0:08:57I'll type in Harry.
    • 0:08:58Let me go ahead and run it one more time.
    • 0:09:00And this time, I'll type in Ron.
    • 0:09:02And now let me go up to names.txt, where, hopefully, I'll see all three
    • 0:09:07of them here.
    • 0:09:08But no.
    • 0:09:09I actually see only Ron.
    • 0:09:12What might explain what happened to Hermione and Harry,
    • 0:09:16even though I'm pretty sure I ran the program three times,
    • 0:09:19and I definitely wrote the code that writes their name to that file?
    • 0:09:24What's going on here, do you think?
    • 0:09:26AUDIENCE: I think because we're not appending them,
    • 0:09:28we should append the names.
    • 0:09:30Since we are writing directly, it is erasing the old content,
    • 0:09:34and it is replacing with the last set of characters that we mentioned.
    • 0:09:40DAVID MALAN: Exactly.
    • 0:09:41Unfortunately, quote/unquote w is a little dangerous.
    • 0:09:44Not only will it create the file for you,
    • 0:09:46it will also recreate the file for you every time you
    • 0:09:49open the file in that mode.
    • 0:09:50So if you open the file once and write Hermione,
    • 0:09:52that worked just fine, as we saw.
    • 0:09:54But if you do it again for Harry, if you do it again for Ron,
    • 0:09:57the code is working.
    • 0:09:58But each time, it's opening the file and recreating it with brand-new contents,
    • 0:10:02so we had one version with Hermione, and one version with Harry,
    • 0:10:04and one final version with Ron.
    • 0:10:06But ideally, I think we probably want to be appending,
    • 0:10:09as Vishal says, each of those names to the file,
    • 0:10:11not just clobbering-- that is, overwriting the file each time.
    • 0:10:15So how can I do this?
    • 0:10:16It's actually a relatively easy fix.
    • 0:10:18Let me go ahead and do this as follows.
    • 0:10:20I'm going to first remove the old version of names.txt.
    • 0:10:23And now I'm going to change my code to do this.
    • 0:10:26I'm going to change the w, quote/unquote, to just a,
    • 0:10:29quote/unquote-- a for Append, which means to add to the bottom,
    • 0:10:32to the bottom, to the bottom, again and again.
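The difference between the two modes shows up if you simulate three runs of the program against the same file. The helpers `save_name` and `simulate_runs` are hypothetical, introduced only to compare `"w"` and `"a"` side by side, and the temporary directory again stands in for the real working directory.

```python
import os
import tempfile


def save_name(name, path, mode):
    """Open `path` in the given mode, write one name, then close it."""
    file = open(path, mode)
    file.write(name)
    file.close()


def simulate_runs(mode):
    """Pretend names.py ran three times, once per name; return the file's contents."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "names.txt")
        for name in ["Hermione", "Harry", "Ron"]:
            save_name(name, path, mode)
        file = open(path)
        contents = file.read()
        file.close()
    return contents


print(repr(simulate_runs("w")))  # 'Ron': "w" recreates the file on every run
print(repr(simulate_runs("a")))  # 'HermioneHarryRon': "a" adds to the end each run
```

The `"a"` result already hints at the next problem: with nothing between the names, they run together.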
    • 0:10:34Now let me go ahead and rerun python of names.py, Enter.
    • 0:10:39I'll again start from scratch with Hermione
    • 0:10:41because I'm creating the file anew.
    • 0:10:44Notice that if I now do code of names.txt, Enter, we do
    • 0:10:49see that Hermione is back.
    • 0:10:51So after removing the file, it did get recreated,
    • 0:10:54even though I'm using append, which is good.
    • 0:10:56But now let's see what happens when I go back to my terminal.
    • 0:11:00And this time, I run python of names.py again--
    • 0:11:03this time, typing in Harry.
    • 0:11:04And let me run it one more time--
    • 0:11:06this time, typing in Ron.
    • 0:11:08So hopefully, this time, in that second tab, names.txt,
    • 0:11:10I should now see all three of them.
    • 0:11:13But, but, but, but this doesn't look ideal.
    • 0:11:17What have I clearly done wrong?
    • 0:11:21Something tells me, even though all three names are there,
    • 0:11:23it's not going to be easy to read those back unless you
    • 0:11:26know where each name ends and begins.
    • 0:11:29AUDIENCE: The English format is not correct.
    • 0:11:33The English format is not correct.
    • 0:11:35It's incorrect.
    • 0:11:36It's concatenating them.
    • 0:11:38DAVID MALAN: It is.
    • 0:11:40Well, it appears to be concatenating.
    • 0:11:43But technically speaking, it's just appending to the file--
    • 0:11:46first Hermione, then Harry, then Ron.
    • 0:11:48It has the effect of combining them back to back,
    • 0:11:50but it's not concatenating, per se.
    • 0:11:52It really is just appending.
    • 0:11:53Let's go to another hand here.
    • 0:11:55What really have I done wrong?
    • 0:11:58Or equivalently, how might I fix it?
    • 0:12:01It would be nice if there were some kind of gaps between each of the names,
    • 0:12:05so we could read them more cleanly.
    • 0:12:07AUDIENCE: Hello.
    • 0:12:08We should add a new line before we write new name.
    • 0:12:13DAVID MALAN: Good.
    • 0:12:13We want to add a new line ourselves.
    • 0:12:15So whereas print, recall, automatically outputs
    • 0:12:19a line ending of backslash n by default,
    • 0:12:20unless we override it with the named parameter called end,
    • 0:12:24write does not do that.
    • 0:12:25Write takes you literally.
    • 0:12:26And if you say write Hermione, that's it.
    • 0:12:29You're getting the H through the e.
    • 0:12:30If you say, write Harry, you get the H through the y.
    • 0:12:33You don't get any extra new lines automatically.
    • 0:12:36So if you want to have a new line at the end of each of these names,
    • 0:12:40we've got to do that manually.
    • 0:12:42So let me, again, close names.txt, and let me remove the current file.
    • 0:12:46And let me go back up to my code here.
    • 0:12:48And I can fix this in any number of ways,
    • 0:12:49but I'm just going to go ahead and do this.
    • 0:12:51I'm going to write out an f string that contains name and backslash
    • 0:12:55n at the end.
    • 0:12:56We could do this in different ways.
    • 0:12:57We could manually write just the new line separately or use some other technique,
    • 0:13:00but I'm going to go ahead and use my f strings, as I'm in the habit of doing,
    • 0:13:04and just write the name and the new line all at once.
    • 0:13:07I'm going to go ahead now and go down to my terminal window and run python of names.py
    • 0:13:11again, Enter.
    • 0:13:12We'll type in Hermione.
    • 0:13:13I'm going to run it again, type in Harry.
    • 0:13:15I'm going to run it again and, this time, type in Ron.
    • 0:13:18Now I'm going to run code of names.txt and open that file.
    • 0:13:22And now it looks like the file is a bit cleaner.
    • 0:13:25Indeed, I have each of the names on its own line
    • 0:13:28as well as a line ending, which ensures that we can separate one
    • 0:13:32from the other.
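A sketch of the fixed program, again simulating three runs with hard-coded names in a temporary directory (assumptions made only so the example is self-contained). Because `write` adds nothing on its own, the f-string supplies the `\n` after each name.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names.txt")

    # Simulate running names.py three times
    for name in ["Hermione", "Harry", "Ron"]:
        file = open(path, "a")   # append to the end, creating the file on first use
        file.write(f"{name}\n")  # unlike print(), write() adds no line ending itself
        file.close()

    file = open(path)
    contents = file.read()
    file.close()

print(contents, end="")  # one name per line
```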
    • 0:13:33Now, if I were writing code, I bet I could parse, that is, read
    • 0:13:38the previous file by looking at differences
    • 0:13:39between lowercase and uppercase letters.
    • 0:13:41But that's going to get messy quickly.
    • 0:13:43Generally speaking, when storing data long-term in a file,
    • 0:13:46you should probably do it somehow cleanly, like storing one name per line.
    • 0:13:50Well, let's now go back, and I'll propose
    • 0:13:52that this code is now working correctly, but we
    • 0:13:54can design it a little bit better.
    • 0:13:56It turns out that it's all too easy when writing code to sometimes forget
    • 0:14:00to close files.
    • 0:14:01And sometimes, this isn't necessarily a big deal.
    • 0:14:03But sometimes, it can create problems.
    • 0:14:05Data you've written might not actually get saved, or files could get corrupted or the like,
    • 0:14:08depending on what happens in your code.
    • 0:14:09So it turns out that you don't strictly need to call close on the file yourself
    • 0:14:14if you take another approach instead.
    • 0:14:16More Pythonic when manipulating files is to do this,
    • 0:14:21to introduce this other keyword called, quite simply,
    • 0:14:25with, which allows you to specify that, in this context,
    • 0:14:29I want you to open and automatically close some file.
    • 0:14:33So how do we use with?
    • 0:14:34It simply looks like this.
    • 0:14:35Let me go back to my code here.
    • 0:14:37I've gotten rid of the close line.
    • 0:14:39And I'm now just going to say this instead.
    • 0:14:41Instead of saying, file equals open, I'm going
    • 0:14:44to say, with open, then the same arguments as before,
    • 0:14:48and somewhat curiously, I'm going to put the variable at the end of the line.
    • 0:14:51Why?
    • 0:14:52That's just the way this is done.
    • 0:14:54You say, with, you call the function in question,
    • 0:14:56and then you say as and specify the name of the variable that should
    • 0:15:00be assigned the return value of open.
    • 0:15:03Then I'm going to go ahead and indent the line underneath so
    • 0:15:05that the line of code that's writing the name
    • 0:15:08is now in the context of this with statement, which just ensures that,
    • 0:15:12automatically, if I had more code in this file
    • 0:15:15down below no longer indented, the file would be automatically closed
    • 0:15:19as soon as line 4 is done executing.
    • 0:15:22So it doesn't change what has just happened,
    • 0:15:24but it does automate the process of at least closing things for us
    • 0:15:26just to ensure I don't forget and so that something doesn't go wrong.
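The same append step written with `with`, as a sketch; the hard-coded names and temporary directory are assumptions for a runnable example. A file object's `closed` attribute shows that the file is open inside the indented block and closed as soon as the block ends.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names.txt")

    for name in ["Hermione", "Harry", "Ron", "Draco"]:
        # "as file" assigns open's return value to the variable file
        with open(path, "a") as file:
            file.write(f"{name}\n")
            open_inside = not file.closed  # True: still inside the with block
        closed_after = file.closed         # True: closed automatically on exit

    with open(path) as file:
        contents = file.read()

print(open_inside, closed_after)
```

The file is closed on leaving the block even if an exception is raised inside it, which is what makes `with` safer than remembering to call `close` yourself.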
    • 0:15:31But suppose, now, that I wanted to read these names from the file.
    • 0:15:35All I've done thus far is write code that writes names to the file.
    • 0:15:38But let's assume, now, that we have all of these names in the file.
    • 0:15:41And heck, let's go ahead and add one more.
    • 0:15:43Let me go ahead and run this one more time-- python of names.py.
    • 0:15:47And let's add in Draco to the mix.
    • 0:15:49So now that we have all four of these names here,
    • 0:15:52how might we want to read them back?
    • 0:15:54Well, let me propose that we go into names.py now,
    • 0:15:57or we could create another program altogether.
    • 0:15:59But I'm going to keep reusing the same name just to keep us focused on this.
    • 0:16:02And now I'm going to write code that reads an existing file with Hermione,
    • 0:16:07Harry, Ron, and Draco together.
    • 0:16:10And how do I do this?
    • 0:16:11Well, it's similar in spirit.
    • 0:16:13I'm going to start this time with with open,
    • 0:16:15and then the first argument is going to be the name of the file
    • 0:16:18that I want to open, as before.
    • 0:16:19And I'm going to open it, this time, in read mode-- quote/unquote, r.
    • 0:16:23And to read a file just means to load it, not to save it.
    • 0:16:27And I'm going to name the return value file.
    • 0:16:30And now I'm going to do this.
    • 0:16:31And there are a number of ways I can do this,
    • 0:16:33but one way to read all of the lines from the file at once would be this.
    • 0:16:37Let me declare a variable called lines.
    • 0:16:39Let me access that file and call a function or a method that
    • 0:16:42comes with it called readlines.
    • 0:16:44So if you read the documentation on File I/O in Python,
    • 0:16:47you'll see that open files come with a special method whose purpose in life
    • 0:16:51is to read all the lines from the file and return them to me as a list.
    • 0:16:56So what this line 2 is doing is it's reading all of the lines
    • 0:16:59from that file, storing them in a variable called lines.
    • 0:17:03Now, suppose I want to iterate over all of those lines
    • 0:17:05and print out each of those names.
    • 0:17:07For line in lines, this is just a standard for loop in Python.
    • 0:17:12Lines is a list.
    • 0:17:13Line is the variable that will be automatically set
    • 0:17:16to each of those lines.
    • 0:17:17Let me go ahead and print out something like, oh, hello, comma,
    • 0:17:22and then I'll print out the line itself.
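A sketch of the `readlines` version, assuming names.txt already holds the four names; the example writes them into a temporary file first as set-up, and the exact `print` call is an assumption about what was typed. Each string that `readlines` returns keeps its trailing `\n`, which is worth keeping in mind when the output appears.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names.txt")
    with open(path, "w") as file:  # set-up: recreate the lecture's names.txt
        file.write("Hermione\nHarry\nRon\nDraco\n")

    with open(path, "r") as file:  # "r" = read mode
        lines = file.readlines()   # every line of the file, as a list of strings

for line in lines:
    print("hello,", line)  # each line already ends in "\n", and print adds another
```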
    • 0:17:25All right, so let me go to my terminal window, run python of names.py now--
    • 0:17:30I have not deleted names.txt, so it still contains all four
    • 0:17:34of those names-- and hit Enter, and OK, it's not bad,
    • 0:17:38but it's a little ugly here.
    • 0:17:41What's going on?
    • 0:17:42When I ran names.py, it's saying Hello to Hermione, to Harry, to Ron,
    • 0:17:45to Draco.
    • 0:17:46But there's these gaps now between the lines.
    • 0:17:50What explains that symptom?
    • 0:17:53If nothing else, it just looks ugly.
    • 0:17:55AUDIENCE: It happens because in the text file,
    • 0:17:57we have new line symbols in between those names,
    • 0:18:01and the print always adds another new line at the end.
    • 0:18:05So you use the same symbol twice.
    • 0:18:08DAVID MALAN: Perfect.
    • 0:18:09And here's a good example of a bug, a mistake in a program.
    • 0:18:12But if you just think about those first principles,
    • 0:18:14like, how do each of the lines of code work that I'm using?
    • 0:18:18You should be able to reason, exactly as Ripal did there, that, all right,
    • 0:18:21well, one of those new lines is coming from the file after each name.
    • 0:18:24And then, of course, print, all of these weeks later,
    • 0:18:26is still giving us for free that extra new line.
    • 0:18:29So there's a couple possible solutions.
    • 0:18:31I could certainly do this, which we've done in the past,
    • 0:18:34and pass in a named argument to print, like end="".
    • 0:18:38And that's fine.
    • 0:18:39I would argue a little better than that might actually
    • 0:18:41be to do this, to strip off of the end of the line the actual new line
    • 0:18:46itself so that print is handling the printing of everything, the person's
    • 0:18:50name as well as the new line.
    • 0:18:52But you're just stripping off what is really just an implementation
    • 0:18:55detail in the file.
    • 0:18:56We chose to use new lines in my text file to separate one name from another.
    • 0:19:01So arguably, it should be a little cleaner in terms of design
    • 0:19:05to strip that off and then let print print out
    • 0:19:07what is really just now a name.
    • 0:19:09But that's ultimately a design decision.
    • 0:19:10The effect is going to be exactly the same.
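The two fixes can be compared directly. This sketch uses `io.StringIO` (an in-memory text buffer) as the `file` target of `print`, purely so the two outputs can be checked against each other; the list of lines is hard-coded to match what `readlines` returned. Note that `rstrip()` with no argument removes all trailing whitespace, not only the newline; `rstrip("\n")` would be the narrower choice.

```python
import io

# What readlines() returned for the lecture's names.txt
lines = ["Hermione\n", "Harry\n", "Ron\n", "Draco\n"]

# Option 1: keep the file's newline and suppress print's own
with_end = io.StringIO()
for line in lines:
    print("hello,", line, end="", file=with_end)

# Option 2: strip the file's newline and let print add one
with_rstrip = io.StringIO()
for line in lines:
    print("hello,", line.rstrip(), file=with_rstrip)

print(with_end.getvalue() == with_rstrip.getvalue())  # True: same effect
```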
    • 0:19:14Well, if I'm going to open this file and read all the lines
    • 0:19:18and then iterate over all of those lines and print them each out,
    • 0:19:21I could actually combine this into one thing
    • 0:19:23because, right now, I'm doing twice as much work.
    • 0:19:26I'm reading all of the lines, then I'm iterating over all of the lines just
    • 0:19:30to print out each of them.
    • 0:19:32Well, in Python, with files, you can actually do this.
    • 0:19:34I'm going to erase almost all of these lines
    • 0:19:37now, keeping only the with statement at top.
    • 0:19:39And inside of this with statement, I'm going to say this, for line in file,
    • 0:19:45go ahead and print out, quote/unquote, hello, comma, and then line.rstrip.
    • 0:19:50So I'm going to take the approach of stripping off the end of the line.
    • 0:19:53But notice how elegant this is, so to speak.
    • 0:19:57I've opened the file in line 1.
    • 0:19:59And if I want to iterate over every line in the file,
    • 0:20:01I don't have to very explicitly read all the lines,
    • 0:20:05then iterate over all of the lines.
    • 0:20:06I can combine this into one thought.
    • 0:20:08In Python, you can simply say, for line in file,
    • 0:20:11and that's going to have the effect of giving you a for loop that iterates
    • 0:20:14over every line in the file, one at a time, and on each iteration,
    • 0:20:18updating the value of this variable line to be Hermione,
    • 0:20:22then Harry, then Ron, then Draco.
    • 0:20:24So this, again, is one of the appealing aspects of Python
    • 0:20:28is that it reads rather like English-- for line in file, print this.
    • 0:20:32It's a little more compact when written this way.
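A sketch of iterating over the file object directly. Besides the temporary-file set-up, it departs from the lecture in one small way: it collects the greetings in a list before printing them, only so the result can be checked.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names.txt")
    with open(path, "w") as file:  # set-up: the four names from the lecture
        file.write("Hermione\nHarry\nRon\nDraco\n")

    greetings = []
    with open(path, "r") as file:
        for line in file:  # a file object yields its lines one at a time
            greetings.append(f"hello, {line.rstrip()}")

for greeting in greetings:
    print(greeting)
```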
    • 0:20:35Well, what if, though, I don't want quite this behavior?
    • 0:20:38Because notice now, if I run python of names.py, it's correct.
    • 0:20:42I'm seeing each of the names and each of the hellos,
    • 0:20:45and there are no extra blank lines in between.
    • 0:20:47But just to be difficult, I'd really like us to be sorting these hellos.
    • 0:20:52Really, I'd like to see Draco first, then Harry, then Hermione, then Ron,
    • 0:20:56no matter what order they appear in the file.
    • 0:20:58So I could go in, of course, to the file and manually change the file.
    • 0:21:02But if that file is changing over time based
    • 0:21:03on who is typing their name into the program,
    • 0:21:06that's not really a good solution.
    • 0:21:07In code, I should be able to load the file, no matter what it looks
    • 0:21:10like, and just sort it all at once.
    • 0:21:12Now, here is a reason to not do what I've just done.
    • 0:21:17I can't iterate over each line in the file, printing it out as I go,
    • 0:21:21and also sort everything in advance.
    • 0:21:23Logically, if I'm looking at each line one at a time and printing it out,
    • 0:21:27it's too late to sort.
    • 0:21:29I really need to read all of the lines first without printing them,
    • 0:21:32sort them, then print them.
    • 0:21:34So we have to take a step back in order to add now this new feature.
    • 0:21:38So how can I do this?
    • 0:21:39Well, let me combine some ideas from before.
    • 0:21:42Let me go ahead and start fresh with this.
    • 0:21:44Let me give myself a list called names, and assign it an empty list,
    • 0:21:48just so I have a variable in which to accumulate all of these lines.
    • 0:21:52And now let me open the file with open, quote/unquote, names.txt.
    • 0:21:56And it turns out, I can tighten this up a little bit.
    • 0:21:58It turns out, if you're opening a file to read it,
    • 0:22:00you don't need to specify, quote/unquote, r.
    • 0:22:03That is the implicit default.
    • 0:22:05So you can tighten things up by just saying, open names.txt.
    • 0:22:08And you'll be able to read the file but not write it.
    • 0:22:10I'm going to give myself a variable called file, as before.
    • 0:22:13I am going to iterate over the file in the same way, for line in file.
    • 0:22:17But instead of printing each line, I'm going to do this.
    • 0:22:21I'm going to take my names list and append to it.
    • 0:22:25And this is appending to a list in memory,
    • 0:22:27not appending to the file itself.
    • 0:22:30I'm going to go ahead and append the current line,
    • 0:22:32but I'm going to strip off the new line at the end
    • 0:22:35so that all I'm adding to this list is each of the students' names.
    • 0:22:39Now I can use that familiar technique from before.
    • 0:22:42Let me go outside of this with statement because now I've read the entire file,
    • 0:22:46presumably.
    • 0:22:47So by the time I'm done with lines 4 and 5,
    • 0:22:50again, and again, and again, for each line in the file,
    • 0:22:52I'm done with the file.
    • 0:22:53It can close.
    • 0:22:54I now have all of the students' names in this list variable.
    • 0:22:57Let me do this.
    • 0:22:58For name in, not just names, but the sorted names,
    • 0:23:04using our Python function sorted, which does just that, and do print,
    • 0:23:08quote/unquote, with an f string, hello, comma,
    • 0:23:10and now I'll plug in bracket name.
    • 0:23:13So now, what have I done?
    • 0:23:15I'm creating a list at the beginning, just
    • 0:23:18so I have a place to gather my data.
    • 0:23:20I then, on lines 3 through 5, iterate over the file from top to bottom,
    • 0:23:23reading in each line, one at a time, stripping off the new line
    • 0:23:27and adding just the student's name to this list.
    • 0:23:29And the reason I'm doing that is so that on line 7,
    • 0:23:32I can sort all of those names, now that they're all in memory,
    • 0:23:35and print them in order.
    • 0:23:37I need to load them all into memory before I can sort them.
    • 0:23:40Otherwise, I'd be printing them out prematurely,
    • 0:23:42and Draco would end up last instead of first.
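Put together, the read-then-sort program looks roughly like this sketch, with the same temporary-file set-up as an assumption. `open(path)` with no mode argument reads, since `"r"` is the default.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "names.txt")
    with open(path, "w") as file:  # set-up: names in the order they were typed
        file.write("Hermione\nHarry\nRon\nDraco\n")

    names = []  # accumulate every name in memory first
    with open(path) as file:  # mode defaults to "r"
        for line in file:
            names.append(line.rstrip())  # drop the newline, keep just the name

# The file is closed by now; sort and print from the in-memory list
for name in sorted(names):
    print(f"hello, {name}")
```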
    • 0:23:45So let me go ahead in my terminal window and run python of names.py
    • 0:23:48now, and hit Enter.
    • 0:23:50And there we go.
    • 0:23:51The same list of four hellos, but now they're sorted.
    • 0:23:54And this is a very common technique.
    • 0:23:56When dealing with files and information more
    • 0:23:58generally, if you want to change that data in some way, like sorting it,
    • 0:24:03creating some kind of variable at the top of your program, like a list,
    • 0:24:06adding or appending information to it just to collect it in one place,
    • 0:24:10and then doing something interesting with that collection, that list,
    • 0:24:14is exactly what I've done here.
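The accumulate-then-sort pattern described above can be sketched as follows. This is a minimal sketch, not the lecture's exact code. `io.StringIO` stands in for `open("names.txt")` so that the example runs without the file, and the order of the four names in it is an assumption.

```python
import io

# Stand-in for open("names.txt"): iterating over it yields one line at a time,
# each still ending in "\n", just like a real file object.
file = io.StringIO("Hermione\nHarry\nRon\nDraco\n")

names = []                       # 1. a place to gather the data
for line in file:
    names.append(line.rstrip())  # 2. keep only the name, minus the newline

for name in sorted(names):       # 3. sort once everything is in memory
    print(f"hello, {name}")
```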
    • 0:24:16Now, I should note that if we just want to sort the file,
    • 0:24:18we can actually do this even more simply in Python, particularly
    • 0:24:21by not bothering with this names list, nor the second for loop.
    • 0:24:25And let me go ahead and, instead, just do more simply this.
    • 0:24:28Let me go ahead and tell Python that we want the file
    • 0:24:31itself to be sorted using that same sorted function,
    • 0:24:34but this time on the file itself.
    • 0:24:36And then inside of that for loop, let's just go ahead and print
    • 0:24:38right away our hello, comma, followed by the line itself,
    • 0:24:42but still stripping off of the end of it any white space therein.
    • 0:24:46If we go ahead and run this same program now
    • 0:24:48with python of names.py and hit Enter, we get the same result.
    • 0:24:51But of course, it's a lot more compact.
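The more compact version, which sorts the file object directly, might look like this. It's a sketch under assumptions: `io.StringIO` again stands in for names.txt, and the greetings are collected into a list only so that the result can be checked. The lecture prints each one right away.

```python
import io

file = io.StringIO("Hermione\nHarry\nRon\nDraco\n")  # stand-in for names.txt

# sorted() accepts any iterable, including the file object itself,
# and returns a new list of its lines in ascending order.
greetings = []
for line in sorted(file):
    greetings.append(f"hello, {line.rstrip()}")

for greeting in greetings:
    print(greeting)
```

Note that `sorted()` still reads every line into a list before it returns, so this shortcut saves code, not memory.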
    • 0:24:53But for the sake of discussion, let's assume
    • 0:24:55that we do actually want to potentially make some changes to the data
    • 0:24:59as we iterate over it.
    • 0:25:00So let me undo those changes, leave things as is.
    • 0:25:03Whereby now, we'll continue to accumulate all of the names first
    • 0:25:06into a list, maybe do something to them, maybe forcing them
    • 0:25:08to uppercase or lowercase or the like, and then sort and print out each item.
    • 0:25:13Let me pause and see if there's any questions
    • 0:25:15now on File I/O reading or writing or now accumulating all of these values
    • 0:25:21in some list.
    • 0:25:22AUDIENCE: Hi.
    • 0:25:22Is there a way to sort the files--
    • 0:25:25instead of sorting them alphabetically from A to Z,
    • 0:25:29is there a way to reverse it, from Z to A?
    • 0:25:32Is there a little extension that you can add to the end to do that?
    • 0:25:35Or would you have to create a new function?
    • 0:25:37DAVID MALAN: If you wanted to reverse the contents of the file?
    • 0:25:40AUDIENCE: Yeah, so if you, instead of sorting them from A to Z
    • 0:25:43in ascending order, if you wanted them in descending order,
    • 0:25:47is there an extension for that?
    • 0:25:49DAVID MALAN: There is, indeed.
    • 0:25:50And as always, the documentation is your friend.
    • 0:25:53So if the goal is to sort them, not in alphabetical order, which
    • 0:25:55is the default, but maybe reverse alphabetical order,
    • 0:25:58you can take a look, for instance, at the formal Python documentation there.
    • 0:26:01And what you'll see is this summary.
    • 0:26:03You'll see that the sorted function takes a first argument, generally
    • 0:26:06known as an iterable.
    • 0:26:08And something that's iterable means that you can iterate over it.
    • 0:26:11That is you can loop over it one thing at a time.
    • 0:26:13What the rest of this line here means is that you can specify a key, like,
    • 0:26:17how you want to sort it, but more on that later.
    • 0:26:19But this last named parameter here is reverse.
    • 0:26:22And by default, per the documentation, it's False.
    • 0:26:25It will not be reversed by default. But if we change that to True,
    • 0:26:28I bet we can do that.
    • 0:26:29So let me go back to VS Code here and do just that.
    • 0:26:32Let me go ahead and pass in a second argument
    • 0:26:34to sorted in addition to this iterable, which is my names list--
    • 0:26:38iterable, again, in the sense that it can be looped over.
    • 0:26:42And let me pass in reverse=True, thereby overriding the default of False.
    • 0:26:47Let me now run python of names.py.
    • 0:26:49And now Ron's at the top, and Draco's at the bottom.
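Here is a small sketch of the `reverse` named parameter just described, using the lecture's four names in an assumed starting order:

```python
names = ["Hermione", "Harry", "Ron", "Draco"]

ascending = sorted(names)                 # reverse defaults to False
descending = sorted(names, reverse=True)  # override the default

print(ascending)   # ['Draco', 'Harry', 'Hermione', 'Ron']
print(descending)  # ['Ron', 'Hermione', 'Harry', 'Draco']
```

`sorted()` returns a new list and leaves `names` unchanged. The list method `names.sort(reverse=True)` accepts the same parameter but sorts in place.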
    • 0:26:53So there, too, whenever you have a question like that moving forward,
    • 0:26:56consider, what does the documentation say?
    • 0:26:58And see if there's a germ of an idea there because,
    • 0:27:01if you have some problem, odds are, some programmer
    • 0:27:03before you has had the same question.
    • 0:27:05Other thoughts?
    • 0:27:07AUDIENCE: Can we limit the number of names?
    • 0:27:11And second question: can we find a specific name in the list?
    • 0:27:15DAVID MALAN: Really good question, can we
    • 0:27:17limit the number of the names in the file?
    • 0:27:19And can we find a specific one?
    • 0:27:20We absolutely could.
    • 0:27:22If we were to write code, we could, for instance,
    • 0:27:25open the file first, count how many lines are already there,
    • 0:27:29and then if there's too many already, we could just
    • 0:27:32exit with sys.exit and a message to indicate to the user
    • 0:27:35that, sorry, the class is full.
    • 0:27:37As for finding someone specifically, absolutely.
    • 0:27:40You could imagine opening the file, iterating over it with a for loop
    • 0:27:44again and again and then adding a conditional.
    • 0:27:46Like, if the current line equals equals Harry, then we found the chosen one.
    • 0:27:51And you can print something like that.
    • 0:27:52So you can absolutely combine these ideas with previous ideas,
    • 0:27:55like conditionals, to ask those same questions.
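Both answers can be sketched like this. This is a minimal sketch under stated assumptions: `MAX_STUDENTS`, `count_students`, `find_student`, and `ensure_room` are hypothetical names invented here, the limit of 4 is arbitrary, and an `io.StringIO` string stands in for names.txt.

```python
import io
import sys

MAX_STUDENTS = 4  # hypothetical class-size limit, not from the lecture

def count_students(file):
    """Count the non-blank lines in an already-open names file."""
    return sum(1 for line in file if line.strip())

def find_student(file, target):
    """Return True as soon as a line, minus its newline, equals target."""
    for line in file:
        if line.rstrip() == target:
            return True
    return False

def ensure_room(file):
    """Exit the whole program with a message if the roster is already full."""
    if count_students(file) >= MAX_STUDENTS:
        sys.exit("Sorry, the class is full.")

roster = "Hermione\nHarry\nRon\nDraco\n"  # stand-in for names.txt
if find_student(io.StringIO(roster), "Harry"):
    print("Found the chosen one")
```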
    • 0:27:58How about one other question on File I/O?
    • 0:28:02AUDIENCE: So I was just thinking about this function, readlines.
    • 0:28:08It looks like it separates all the lines
    • 0:28:14by this special character, backslash n.
    • 0:28:17But it looks like we don't need that character, and we always strip it.
    • 0:28:24That looks like bad design for the function.
    • 0:28:28Why wouldn't it just strip it inside the function?
    • 0:28:33DAVID MALAN: A really good question.
    • 0:28:35So we are, in my examples thus far, using rstrip
    • 0:28:40to strip from the end of the line all of this white space.
    • 0:28:43You might not want to do that.
    • 0:28:45In this case, I am stripping it away because I know that each of those lines
    • 0:28:49isn't some generic line of text.
    • 0:28:51Each line really represents a name that I have put there myself.
    • 0:28:55I'm using the new line just to separate one value from another.
    • 0:28:58In other scenarios, you might very well want
    • 0:29:00to keep that line ending because it's a very long series of text,
    • 0:29:03or a paragraph, or something like that, where you want
    • 0:29:06to keep it distinct from the others.
    • 0:29:07But it's just a convention.
    • 0:29:09We have to use something, presumably, to separate one chunk of text
    • 0:29:13from another.
    • 0:29:14There are other functions in Python that will, in fact, handle the removal
    • 0:29:18of that white space for you.
    • 0:29:20But readlines does literally what its name says:
    • 0:29:22it reads all of the lines as is.
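The lecture doesn't name those other functions. One standard option is `str.splitlines()`, which splits text on line boundaries and drops the line endings. The comparison below is a sketch under that assumption, with `io.StringIO` standing in for an open file. With a real file, you'd call `file.read().splitlines()`.

```python
import io

text = "Hermione\nHarry\nRon\nDraco\n"

with_newlines = io.StringIO(text).readlines()  # keeps each "\n"
without_newlines = text.splitlines()           # line endings removed

print(with_newlines)     # ['Hermione\n', 'Harry\n', 'Ron\n', 'Draco\n']
print(without_newlines)  # ['Hermione', 'Harry', 'Ron', 'Draco']
```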
    • 0:29:25Well, allow me to turn our attention back to where we left off here,
    • 0:29:28which is just names, to propose that, with names.txt, we have an ability,
    • 0:29:33it seems, to store each of these names pretty straightforwardly.
    • 0:29:36But what if we wanted to keep track of other information as well?
    • 0:29:39Suppose that we wanted to store information,
    • 0:29:42including a student's name and their house at Hogwarts,
    • 0:29:47be it Gryffindor, or Slytherin, or something else.
    • 0:29:50Well, where do we go about putting that?
    • 0:29:52Hermione lives in Gryffindor, so we could do something
    • 0:29:55like this in our text file.
    • 0:29:56Harry lives in Gryffindor, so we could do that.
    • 0:29:58Ron lives in Gryffindor, so we could do that.
    • 0:30:01And Draco lives in Slytherin, so we could do that.
    • 0:30:03But I worry now that we're mixing apples and oranges, so to speak.
    • 0:30:09Some lines are names.
    • 0:30:11Some lines are houses.
    • 0:30:12So this probably isn't the best design, if only because it's confusing,
    • 0:30:15or it's ambiguous.
    • 0:30:17So maybe what we could do is adopt a convention.
    • 0:30:19And indeed, this is, in fact, what a lot of programmers do.
    • 0:30:22They change the format of the file. So instead of names.txt, let
    • 0:30:26me create a new file called names.csv.
    • 0:30:28CSV stands for Comma-Separated Values.
    • 0:30:31And it's a very common convention to store multiple pieces of information
    • 0:30:35that are related in the same file.
    • 0:30:37And so to do this, I'm going to separate each of these types of data,
    • 0:30:41not with another new line, but simply with a comma.
    • 0:30:44I'm going to keep each student on their own line,
    • 0:30:46but I'm going to separate the information about each student using
    • 0:30:49a comma instead.
    • 0:30:51And so now we sort of have a two-dimensional file, if you will.
    • 0:30:54Row by row, we have our students.
    • 0:30:56And you can think of these commas as representing a column,
    • 0:30:59even though it's not perfectly straight: because of the lengths of these names,
    • 0:31:02it's a little jagged.
    • 0:31:07And it turns out, these CSV files are very commonly
    • 0:31:11used when you use something like Microsoft Excel, Apple Numbers,
    • 0:31:14or Google Sheets, and you want to export the data to share
    • 0:31:17with someone else as a CSV file.
    • 0:31:20Or conversely, if you want to import a CSV
    • 0:31:23file into your preferred spreadsheet software,
    • 0:31:25like Excel, or Numbers, or Google Sheets, you can do that as well.
    • 0:31:29So CSV is a very common, very simple text format
    • 0:31:33that just separates values with commas and separates one row of values
    • 0:31:37from the next with new lines.
    • 0:31:39Let me go ahead and run code of students.csv
    • 0:31:42to create a brand-new file that's initially empty.
    • 0:31:44And we'll add to it those same names but also some other information as well.
    • 0:31:48So if I now have this new file, students.csv, inside of which
    • 0:31:52is one column of names, so to speak, and one column of houses,
    • 0:31:56how do I go about changing my code to read not just the names but also
    • 0:32:00the houses, so that they're not all lumped together on one line, but
    • 0:32:03we somehow have access to both types of value separately?
    • 0:32:06Well, let me go ahead and create a new program here called students.py.
    • 0:32:11And in this program, let's go about reading,
    • 0:32:13not a text file, per se, but a specific type of text file, a CSV,
    • 0:32:17a Comma-Separated Values file.
    • 0:32:19And to do this, I'm going to use similar code as before.
    • 0:32:22I'm going to say with open, quote/unquote, students.csv.
    • 0:32:26I'm not going to bother specifying, quote/unquote,
    • 0:32:28r because, again, that's the default.
    • 0:32:30But I'm going to give myself a variable name of file.
    • 0:32:33And then in this file, I'm going to go ahead and do this.
    • 0:32:36For line in file, as before, and now I have to be a bit clever here.
    • 0:32:41Let me go back to students.csv, looking at this file,
    • 0:32:45and it seems that on each iteration of my loop,
    • 0:32:47I'm going to get access to the whole line of text.
    • 0:32:51I'm not going to automatically get access
    • 0:32:52to just Hermione or just Gryffindor.
    • 0:32:55Recall that the loop is going to give me each full line of text.
    • 0:32:58So logically, what would you propose that we
    • 0:33:01do inside of a for loop that's reading a whole line of text at once,
    • 0:33:05but we now want to get access to the individual values,
    • 0:33:08like Hermione and Gryffindor, Harry and Gryffindor?
    • 0:33:11How do we go about taking one line of text
    • 0:33:14and gaining access to those individual values, do you think?
    • 0:33:16Just instinctively, even if you're not sure what the name of the functions
    • 0:33:20would be.
    • 0:33:20AUDIENCE: You can access it as you would as if you were using a dictionary,
    • 0:33:24like using a key and value.
    • 0:33:26DAVID MALAN: So ideally, we would access it using a key and value.
    • 0:33:29But at this point in the story, all we have is this loop,
    • 0:33:32and this loop is giving me one line of text at a time.
    • 0:33:35I'm the programmer now.
    • 0:33:36I have to solve this.
    • 0:33:37There is no dictionary yet in question.
    • 0:33:39How about another suggestion here?
    • 0:33:41AUDIENCE: So you can somehow split the two words based on the comma?
    • 0:33:45DAVID MALAN: Yeah, even if you're not quite
    • 0:33:47sure what function is going to do this, intuitively,
    • 0:33:49you want to take this whole line of text--
    • 0:33:51Hermione, comma, Gryffindor, Harry, comma, Gryffindor, and so forth--
    • 0:33:55and split that line into two pieces, if you will.
    • 0:33:58And it turns out wonderfully, the function we'll use
    • 0:34:00is actually called split, which can split on any character;
    • 0:34:03you just tell it which character to use.
    • 0:34:06So I'm going to go back into students.py, and inside of this loop,
    • 0:34:09I'm going to go ahead and do this.
    • 0:34:11I'm going to take the current line.
    • 0:34:12I'm going to remove the white space at the end, as always, using rstrip here.
    • 0:34:17And then whatever the result of that is, I'm
    • 0:34:19going to now call split and, quote/unquote, comma.
    • 0:34:23So the split function or method comes with strings.
    • 0:34:27Strs in Python-- any str has this method built-in.
    • 0:34:31And if you pass in an argument, like a comma, what this split function will do
    • 0:34:36is split that current string into 1, 2, 3, maybe more pieces by looking
    • 0:34:41for that character again and again.
    • 0:34:46Ultimately, split is going to return to us
    • 0:34:48a list of all of the individual parts to the left
    • 0:34:51and to the right of those commas.
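Here is what `split` returns in a few cases, as a quick sketch. The first line is in the format of the lecture's students.csv, and the other two strings are made-up inputs.

```python
line = "Hermione,Gryffindor\n"

unstripped = line.split(",")    # ['Hermione', 'Gryffindor\n']: newline sticks to the last piece
row = line.rstrip().split(",")  # ['Hermione', 'Gryffindor']

pieces = "a,b,c".split(",")     # ['a', 'b', 'c']: one more piece than there are commas
whole = "Draco".split(",")      # ['Draco']: no comma, so a one-element list

print(row)
```

This is why the code strips the line first: otherwise the house would carry a trailing newline.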
    • 0:34:53So I can give myself a variable called row here.
    • 0:34:55And this is a common paradigm.
    • 0:34:57When you know you're iterating over a file, specifically a CSV,
    • 0:35:01it's common to think of each line of it as being
    • 0:35:04a row and each of the values therein separated by commas as columns,
    • 0:35:09so to speak.
    • 0:35:10So I'm going to deliberately name my variable row, just
    • 0:35:13to be consistent with that convention.
    • 0:35:14And now what do I want to print?
    • 0:35:17Well, I'm going to go ahead and say this.
    • 0:35:19Print, how about the following, an f string that starts with curly braces--
    • 0:35:26well, how do I get access to the first thing in that row?
    • 0:35:29Well, the row is going to have how many parts?
    • 0:35:31Two, because if I'm splitting on commas, and there's one comma per line,
    • 0:35:35that's going to give me a left part and a right part,
    • 0:35:37like Hermione and Gryffindor, Harry and Gryffindor.
    • 0:35:41When I have a list like row, how do I get access to individual values?
    • 0:35:45Well, I can do this.
    • 0:35:47I can say, row, bracket, 0.
    • 0:35:50And that's going to go to the first element of the list, which
    • 0:35:52should hopefully be the student's name.
    • 0:35:54Then after that, I'm going to say, is in,
    • 0:35:57and I'm going to have another curly brace here for row, bracket, 1.
    • 0:36:01And then I'm going to close my whole quote.
    • 0:36:03So it looks a little cryptic at first glance.
    • 0:36:05But most of this is just f string syntax with curly braces to plug in values.
    • 0:36:09And what values am I plugging in?
    • 0:36:11Well, row, again, is a list, and it has two elements, presumably--
    • 0:36:15Hermione in one and Gryffindor in the other, and so forth.
    • 0:36:19So bracket 0 is the first element because, remember,
    • 0:36:22we start indexing at 0 in Python.
    • 0:36:25And 1 is going to be the second element.
    • 0:36:27So let me go ahead and run this now and see what happens--
    • 0:36:30python of students.py, Enter.
    • 0:36:35And we see Hermione is in Gryffindor.
    • 0:36:37Harry's in Gryffindor.
    • 0:36:38Ron is in Gryffindor.
    • 0:36:39And Draco is in Slytherin.
    • 0:36:41So we have now implemented our own code from scratch that actually parses,
    • 0:36:48that is, reads and interprets a CSV file ultimately here.
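Put together, the hand-rolled CSV reader described above looks roughly like this. It's a sketch rather than the exact on-screen code. `io.StringIO` stands in for `open("students.csv")`, and the sentences are collected in a list so that they can be checked before printing.

```python
import io

# Stand-in for open("students.csv"), using the lecture's four rows.
file = io.StringIO(
    "Hermione,Gryffindor\n"
    "Harry,Gryffindor\n"
    "Ron,Gryffindor\n"
    "Draco,Slytherin\n"
)

sentences = []
for line in file:
    row = line.rstrip().split(",")                # e.g. ['Hermione', 'Gryffindor']
    sentences.append(f"{row[0]} is in {row[1]}")  # index 0: name, index 1: house

for sentence in sentences:
    print(sentence)
```

This parser assumes that no field contains a comma of its own. The standard library's `csv` module handles quoted fields like that.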
    • 0:36:53Now, let me pause to see if there's any questions.
    • 0:36:55But we'll make this even easier to read in just a moment.
    • 0:36:59Any questions on what we've just done here by splitting by comma?
    • 0:37:03AUDIENCE: So my question is, can we edit any line of the file any time we want?
    • 0:37:08Or is the only option that we have to append lines?
    • 0:37:13Let's say we want to change Harry's house
    • 0:37:18to Slytherin or some other house.
    • 0:37:22DAVID MALAN: Yeah, a really good question.
    • 0:37:24What if you want to, in Python, change a line in the file and not just
    • 0:37:28append to the end?
    • 0:37:30You would have to implement that logic yourself.
    • 0:37:32So for instance, you could imagine now opening the file
    • 0:37:35and reading all of the contents in, then maybe iterating over
    • 0:37:39each of those lines.
    • 0:37:40And as soon as you see that the current name equals equals Harry,
    • 0:37:43you could maybe change his house to Slytherin.
    • 0:37:47And then it would be up to you, though, to write all of those changes
    • 0:37:51back to the file.
    • 0:37:52So in that case, you might want to, in simplest form,
    • 0:37:54read the file once and let it close.
    • 0:37:56Then open it again, but open for writing, and change the whole file.
    • 0:38:00It's not easy to go in and change just part of the file,
    • 0:38:04though you can do it.
    • 0:38:05It's easier to actually read the whole file, make your changes in memory,
    • 0:38:09then write the whole file out.
    • 0:38:11But for larger files where that might be quite slow,
    • 0:38:13you can be more clever than that.
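The read-modify-write approach described above might look like this. It's a sketch, not code from the lecture. `change_house` is a hypothetical helper that assumes every line has exactly one comma, and it runs on a throwaway copy of students.csv created with `tempfile`, so no real file is overwritten.

```python
import os
import tempfile

def change_house(path, name, new_house):
    """Hypothetical helper: rewrite the CSV at path so name's house is new_house."""
    # 1. Read the whole file into memory, then let it close.
    with open(path) as file:
        lines = file.readlines()

    # 2. Make the change in memory.
    for i, line in enumerate(lines):
        student, house = line.rstrip().split(",")
        if student == name:
            lines[i] = f"{student},{new_house}\n"

    # 3. Reopen for writing (mode "w" truncates the file first) and write it all back.
    with open(path, "w") as file:
        file.writelines(lines)

# Demo on a throwaway copy of the lecture's students.csv.
path = os.path.join(tempfile.mkdtemp(), "students.csv")
with open(path, "w") as file:
    file.write("Hermione,Gryffindor\nHarry,Gryffindor\nRon,Gryffindor\nDraco,Slytherin\n")

change_house(path, "Harry", "Slytherin")
with open(path) as file:
    print(file.read())
```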
    • 0:38:16Well, let me propose now that we clean this up a little bit because I actually
    • 0:38:19think this is a little cryptic to read-- row, bracket, 0, row, bracket,
    • 0:38:231-- it's not that well-written at the moment, I would say.
    • 0:38:27But it turns out that instead of using a variable that's a list, like row,
    • 0:38:32you don't have to throw all of those values into a list at all.
    • 0:38:35You can actually unpack that whole sequence at once.
    • 0:38:38That is to say, if you know that a function like split returns a list,
    • 0:38:42but you know in advance that it's going to return
    • 0:38:45two values in a list, the first and the second,
    • 0:38:48you don't have to throw them all into a variable that itself is a list.
    • 0:38:51You can actually unpack them simultaneously into two variables,
    • 0:38:55doing name, comma, house.
    • 0:38:57So this is a nice Python technique to not only create, but assign,
    • 0:39:01automatically, in parallel, two variables at once,
    • 0:39:05rather than just one.
    • 0:39:06So this will have the effect of putting the name, Hermione, in the left variable,
    • 0:39:10and it will have the effect of putting the house, Gryffindor,
    • 0:39:12in the right variable.
    • 0:39:14And we now no longer have a row.
    • 0:39:15We can now make our code a little more readable by now literally just saying
    • 0:39:18name down here and, for instance, house down here.
    • 0:39:22So just a little more readable, even though, functionally, the code
    • 0:39:25now is exactly the same.
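The unpacking technique just described, as a short sketch on one line of the lecture's CSV:

```python
line = "Hermione,Gryffindor\n"

name, house = line.rstrip().split(",")  # two-element list unpacked into two variables
print(f"{name} is in {house}")          # Hermione is in Gryffindor
```

The number of variables on the left must match the number of pieces. A line with two commas, for instance, would raise `ValueError: too many values to unpack`.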
    • 0:39:28All right, so this now works.
    • 0:39:30And I'll confirm as much by just running it once more-- python of students.py,
    • 0:39:34Enter.
    • 0:39:34And we see that the text is as intended.
    • 0:39:37But suppose, for the sake of discussion, that I'd
    • 0:39:39like to sort this list of output.
    • 0:39:42I'd like to say hello, again, to Draco first, then hello to Harry,
    • 0:39:46then Hermione, then Ron.
    • 0:39:47How can I go about doing this?
    • 0:39:49Well, let's take some inspiration from the previous example, where
    • 0:39:52we were only dealing with names and, instead, do it with these full phrases.
    • 0:39:57So and so is in house.
    • 0:39:59Well, let me go ahead and do this.
    • 0:40:01I'm going to go ahead and start from scratch and give myself a list called students,
    • 0:40:05equal to an empty list, initially.
    • 0:40:07And then with open students.csv as file, I'm going to go ahead and say this--
    • 0:40:14for line in file.
    • 0:40:16And then below this, I'm going to do exactly as before-- name, comma,
    • 0:40:19house equals the current line, stripping off the white space at the end,
    • 0:40:23splitting it on a comma--
    • 0:40:24so that's exactly the same as before.
    • 0:40:26But this time, before I go about printing the sentence,
    • 0:40:32I'm going to store it temporarily in a list
    • 0:40:34so that I can accumulate all of these sentences and then sort them later.
    • 0:40:38So let me go ahead and do this.
    • 0:40:39Students, which is my list, .append--
    • 0:40:42let me append the actual sentence I want to show
    • 0:40:45on the screen-- so another f string.
    • 0:40:46So name is in house, just as before.
    • 0:40:50But notice, I'm not printing that sentence.
    • 0:40:52I'm appending it to my list-- not a file, but to my list.
    • 0:40:56Why am I doing this?
    • 0:40:58Well, just because, as before, I want to do this.
    • 0:41:00For student in the sorted students, I want
    • 0:41:04to go ahead and print out each student, like this.
    • 0:41:07Well, let me go ahead and run python of students.py, and hit Enter now.
    • 0:41:11And I think we'll see, indeed, Draco is now first.
    • 0:41:14Harry is second.
    • 0:41:15Hermione is third.
    • 0:41:16And Ron is fourth.
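The "build the sentences, then sort them" version can be sketched as follows, again with `io.StringIO` standing in for students.csv:

```python
import io

file = io.StringIO("Hermione,Gryffindor\nHarry,Gryffindor\nRon,Gryffindor\nDraco,Slytherin\n")

students = []
for line in file:
    name, house = line.rstrip().split(",")
    students.append(f"{name} is in {house}")  # store the sentence instead of printing it

# Sorts whole sentences, which only works because each one starts with the name.
for student in sorted(students):
    print(student)
```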
    • 0:41:18But this is arguably a little sloppy, right?
    • 0:41:21It seems a little hackish that I'm constructing these sentences.
    • 0:41:25And even though I technically want to sort by name,
    • 0:41:29I'm technically sorting by these whole English sentences.
    • 0:41:32So it's not wrong.
    • 0:41:33It's achieving the intended result, but it's not really
    • 0:41:36well designed because I'm just getting lucky that each English sentence
    • 0:41:39happens to start with the student's name.
    • 0:41:40And therefore, when I print this out, it's sorting properly.
    • 0:41:43It would be better, really, to come up with a technique for sorting
    • 0:41:46by the students' names, not by some English sentence
    • 0:41:50that I've constructed here on line 6.
    • 0:41:53So to achieve this, I'm going to need to make my life more complicated
    • 0:41:57for a moment.
    • 0:41:57And I'm going to need to collect information about each student
    • 0:42:02before I bother assembling that sentence.
    • 0:42:04So let me propose that we do this.
    • 0:42:06Let me go ahead and undo these last few lines of code
    • 0:42:09so that we currently have two variables, name and house, each of which
    • 0:42:14has the student's name and house, respectively.
    • 0:42:16And we still have our global variable, students.
    • 0:42:19But let me do this.
    • 0:42:20Recall that Python supports dictionaries.
    • 0:42:22And dictionaries are just collections of keys and values.
    • 0:42:25So you can associate something with something else,
    • 0:42:28like, a name with Hermione, like, a house with Gryffindor.
    • 0:42:32That really is a dictionary.
    • 0:42:33So let me do this.
    • 0:42:34Let me temporarily create a dictionary that stores this association of name
    • 0:42:39with house.
    • 0:42:40Let me go ahead and do this.
    • 0:42:42Let me say that the student here is going to be represented initially
    • 0:42:45by an empty dictionary.
    • 0:42:46And just like you can create an empty list with square brackets,
    • 0:42:49you can create an empty dictionary with curly braces.
    • 0:42:51So give me an empty dictionary that will soon have two keys, name and house.
    • 0:42:57How do I do that?
    • 0:42:58Well, I could do it this way-- student, open bracket,
    • 0:43:01name equals the student's name that we got from the line.
    • 0:43:05Student, bracket, house equals the house that we got from the line.
    • 0:43:10And now I'm going to append to the students list--
    • 0:43:14plural-- that particular student.
    • 0:43:17Now, why have I done this?
    • 0:43:18I've admittedly made my code more complicated.
    • 0:43:21It's more lines of code, but I've now collected
    • 0:43:23all of the information I have about students while still keeping
    • 0:43:27track-- what's a name, what's a house.
    • 0:43:29The list, meanwhile, has all of the students' names and houses together.
    • 0:43:34Now, why have I done this?
    • 0:43:35Well, let me, for the moment, just do something simple.
    • 0:43:38Let me do for student in students, and let me very simply now say, print
    • 0:43:43the following f string, the current student with this name
    • 0:43:48is in this current student's house.
    • 0:43:53And now notice one detail.
    • 0:43:55Inside of this f string, I'm using my curly braces, as always.
    • 0:43:59I'm using, inside of those curly braces, the name of a variable, as always.
    • 0:44:03But then I'm using not bracket 0 or 1 because these are dictionaries now,
    • 0:44:07not lists.
    • 0:44:08But why am I using single quotes to surround house and to surround name?
    • 0:44:16Why single quotes inside of this f string to access those keys?
    • 0:44:25AUDIENCE: Yes, because you have double quotes on line 12.
    • 0:44:30And so you have to tell Python to differentiate.
    • 0:44:34DAVID MALAN: Exactly, because I'm already
    • 0:44:35using double quotes outside of the f string, if I want to put quotes
    • 0:44:39around any strings on the inside, which I do
    • 0:44:41need to do for dictionaries because, recall, when you index
    • 0:44:44into a dictionary, you don't use numbers like lists--
    • 0:44:470, 1, 2, onward--
    • 0:44:49you, instead, use strings, which need to be quoted.
    • 0:44:51But if you're already using double quotes,
    • 0:44:53it's easiest to then use single quotes on the inside,
    • 0:44:55so Python doesn't get confused about what lines up with what.
    • 0:44:59So at the moment, when I run this program,
    • 0:45:02it's going to print out those hellos.
    • 0:45:04But they're not yet sorted.
    • 0:45:05In fact, what I now have is a list of dictionaries,
    • 0:45:10and nothing is yet sorted.
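The list-of-dictionaries version built so far might look like this. It's a sketch with `io.StringIO` standing in for students.csv. At this point in the lecture, the output is still in file order.

```python
import io

file = io.StringIO("Hermione,Gryffindor\nHarry,Gryffindor\nRon,Gryffindor\nDraco,Slytherin\n")

students = []
for line in file:
    name, house = line.rstrip().split(",")
    student = {}              # an empty dict, just as [] is an empty list
    student["name"] = name    # dictionary keys are strings, not numeric indices
    student["house"] = house
    students.append(student)

for student in students:
    # Single quotes inside the double-quoted f-string keep the string from ending early.
    print(f"{student['name']} is in {student['house']}")
```

Python 3.12 and later also allow reusing double quotes inside an f-string's braces (PEP 701), but single quotes work in every version that has f-strings.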
    • 0:45:12But let me tighten up the code too to point out that it
    • 0:45:14doesn't need to be quite as verbose.
    • 0:45:16If you're in the habit of creating an empty dictionary, like this on line 6,
    • 0:45:20and then immediately putting in two keys, name and house,
    • 0:45:23each with two values, name and house respectively, you
    • 0:45:26can actually do this all at once.
    • 0:45:27So let me show you a slightly different syntax.
    • 0:45:29I can do this.
    • 0:45:30Give me a variable called student, and let me use curly braces
    • 0:45:34on the right-hand side here.
    • 0:45:35But instead of leaving them empty, let's just define those keys
    • 0:45:38and those values now.
    • 0:45:40Quote/unquote name will be name, and quote/unquote house will be house.
    • 0:45:45This achieves the exact same effect in one line instead of three.
    • 0:45:49It creates a new non-empty dictionary containing a name key,
    • 0:45:53the value of which is the student's name,
    • 0:45:55and a house key, the value of which is the student's house.
    • 0:45:58Nothing else needs to change.
    • 0:45:59That will still just work so that if I, again, run python of students.py,
    • 0:46:03I'm still seeing those greetings, but they're still
    • 0:46:06not quite actually sorted.
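To show that the one-line dictionary literal described above is equivalent to the three-step version, here is a quick check on one student:

```python
name, house = "Hermione", "Gryffindor"

# Three statements...
student = {}
student["name"] = name
student["house"] = house

# ...or one dictionary literal with the same keys and values.
same_student = {"name": name, "house": house}

print(student == same_student)  # True
```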
    • 0:46:08Well, what could I do to improve upon this further?
    • 0:46:15Well, we need some mechanism now of sorting those students.
    • 0:46:19But unfortunately, you can't do this.
    • 0:46:22We can't sort all of the students now because those students are not names
    • 0:46:28like they were before.
    • 0:46:29They aren't sentences like they were before.
    • 0:46:31Each of the students is a dictionary, and it's not obvious
    • 0:46:34how you would sort a dictionary inside of a list.
    • 0:46:37So ideally, what do we want to do?
    • 0:46:40If at the moment we hit line 9, we have a list of all of these students,
    • 0:46:45and inside of that list is one dictionary per student,
    • 0:46:48and each of those dictionaries has two keys, name and house,
    • 0:46:52wouldn't it be nice if there were a way in code to tell Python, sort this list
    • 0:46:57by looking at this key in each dictionary?
    • 0:46:59Because that would give us the ability to sort either by name, or even
    • 0:47:03by house, or even by any other field that we add to that file.
    • 0:47:07So it turns out, we can do this.
    • 0:47:09We can tell the sorted function not just to reverse things or not.
    • 0:47:14It takes another named parameter called key,
    • 0:47:19where you can specify a function that tells sorted what to compare when sorting,
    • 0:47:23for instance, some list of dictionaries.
    • 0:47:25And I'm going to propose that we do this.
    • 0:47:27I'm going to first define a function-- temporarily, for now-- called get_name.
    • 0:47:31And this function's purpose in life, given a student,
    • 0:47:35is to, quite simply, return the student's name
    • 0:47:38from that particular dictionary.
    • 0:47:40So if student is a dictionary, this is going to return literally
    • 0:47:43the student's name, and that's it.
    • 0:47:45That's the sole purpose of this function in life.
    • 0:47:48What do I now want to do?
    • 0:47:50Well now that I have a function that, given a student,
    • 0:47:52will return to me the student's name, I can do this.
    • 0:47:56I can change sorted to say, use a key that's
    • 0:47:59the get_name function itself, which sorted will call on each student.
    • 0:48:03And this now is a feature of Python.
    • 0:48:05Python allows you to pass functions as arguments into other functions.
    • 0:48:12So get_name is a function.
    • 0:48:14Sorted is a function.
    • 0:48:15And I'm passing in get_name to sorted as the value of that key parameter.
    • 0:48:22Now, why am I doing that?
    • 0:48:24Well, if you think of the get_name function,
    • 0:48:26it's just a block of code that will get the name of a student.
    • 0:48:30That's handy because that's the capability that sorted needs.
    • 0:48:33When given a list of students, each of which is a dictionary,
    • 0:48:36sorted needs to know how to get the name of each student
    • 0:48:38in order to do alphabetical sorting for you.
    • 0:48:40The authors of Python didn't know that we
    • 0:48:42were going to be creating students here in this class,
    • 0:48:44so they couldn't have anticipated writing code in advance
    • 0:48:47that specifically sorts on a field called student, let alone called name,
    • 0:48:51let alone house.
    • 0:48:53So what did they do?
    • 0:48:54They instead built into the sorted function
    • 0:48:57this named parameter key that allows us, all these years later,
    • 0:49:01to tell their function sorted how to sort this list of dictionaries.
    • 0:49:06So now watch what happens.
    • 0:49:07If I run python of students.py and hit Enter,
    • 0:49:11I now have a sorted list of output.
    • 0:49:14Why?
    • 0:49:14Because now that list of dictionaries has all
    • 0:49:17been sorted by the student's name.
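A minimal sketch of the key-function approach described above. It assumes the students have already been loaded into a list of dictionaries with "name" and "house" keys. The sample data is illustrative and is not the lecture's exact students.csv.

```python
# Illustrative data standing in for what students.py reads from students.csv
students = [
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
]


def get_name(student):
    """Return the value sorted() should compare for this student."""
    return student["name"]


# Pass the function itself (no parentheses); sorted() calls it on each dictionary
ordered = sorted(students, key=get_name)
for student in ordered:
    print(f"{student['name']} is in {student['house']}")
```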
    • 0:49:20I can further do this.
    • 0:49:22If, as before, we want to reverse the whole thing by saying reverse
    • 0:49:24equals True, we can do that too.
    • 0:49:26Let me rerun python of students.py, and hit Enter.
    • 0:49:28Now it's reversed.
    • 0:49:29Now it's Ron, then Hermione, Harry, and Draco.
    • 0:49:32But we can do something different as well.
    • 0:49:34What if I want to sort, for instance, by house name reversed?
    • 0:49:39I could do this.
    • 0:49:40I could change this function from get_name to get_house.
    • 0:49:43I could change the implementation up here to be get_house.
    • 0:49:46And I can return not the student's name but the student's house.
    • 0:49:49And so now notice, if I run python of students.py, Enter, notice now
    • 0:49:56it is sorted by house in reverse order.
    • 0:49:59Slytherin is first, and then Gryffindor.
    • 0:50:02If I get rid of the reverse but keep the get_house and rerun this program,
    • 0:50:07now it's sorted by house.
    • 0:50:09Gryffindor is first, and Slytherin is last.
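A sketch of sorting by house, with and without reverse, using the same illustrative list of dictionaries. Python's sort is stable, so students who share a house keep their original relative order in both directions.

```python
students = [
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
]


def get_house(student):
    """Return the student's house so sorted() compares houses instead of names."""
    return student["house"]


# reverse=True flips the order, so Slytherin sorts before Gryffindor
by_house_desc = sorted(students, key=get_house, reverse=True)
# Without reverse, Gryffindor (alphabetically first) comes first
by_house_asc = sorted(students, key=get_house)

for student in by_house_desc:
    print(f"{student['name']} is in {student['house']}")
```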
    • 0:50:11And the upside now of this is, because I'm using this list of dictionaries
    • 0:50:15and keeping the students data together until the last minute
    • 0:50:19when I'm finally doing the printing, I now
    • 0:50:21have full control over the information itself, and I can sort by this or that.
    • 0:50:25I don't have to construct those sentences in advance, like I
    • 0:50:29rather hackishly did the first time.
    • 0:50:31All right, that was a lot.
    • 0:50:32Let me pause here to see if there are questions.
    • 0:50:36AUDIENCE: So when we are sorting the files, every time,
    • 0:50:40should we use the loops, or a text dictionary, or any kind of list?
    • 0:50:48Can we sort by just sorting, not looping or any kind of stuff?
    • 0:50:55DAVID MALAN: A good question, and the short answer is that with Python
    • 0:50:58alone, you're the programmer.
    • 0:51:00You need to do the sorting.
    • 0:51:01With libraries and other techniques, absolutely.
    • 0:51:05You can do more of this automatically because someone else
    • 0:51:08has written that code.
    • 0:51:09What we're doing at the moment is doing everything from scratch ourselves.
    • 0:51:12But absolutely, with other functions or libraries, some of this
    • 0:51:15could be made easier.
    • 0:51:20Other questions on this technique here?
    • 0:51:23AUDIENCE: If key is equal to the return value of the function,
    • 0:51:28can it be equal to just a variable or a value?
    • 0:51:36DAVID MALAN: Well, yes.
    • 0:51:37It should equal a value.
    • 0:51:39And I should clarify, actually, since this was not obvious.
    • 0:51:42So when you pass in a function like get_name or get_house
    • 0:51:46to the sorted function as the value of key,
    • 0:51:49that function is automatically called by the sorted function for you
    • 0:51:55on each of the dictionaries in the list.
    • 0:51:58And it uses the return value of get_name or get_house
    • 0:52:02to decide what strings to actually use to compare in order to decide
    • 0:52:07which is alphabetically correct.
    • 0:52:09So this function, which you pass just by name, without
    • 0:52:12parentheses at the end, is
    • 0:52:14called by the sorted function in order to figure out for you
    • 0:52:18how to compare these same values.
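A small sketch that makes the "sorted calls it for you" point visible. A hypothetical `calls` list records each time the key function runs: it runs once per dictionary. Writing `get_name()` with parentheses instead calls it immediately, with no student, and fails before sorted ever starts.

```python
calls = []


def get_name(student):
    # Record each call so we can see that sorted() invokes the key function itself
    calls.append(student["name"])
    return student["name"]


students = [{"name": "Ron"}, {"name": "Harry"}, {"name": "Draco"}]

result = sorted(students, key=get_name)  # pass get_name, not get_name()
print(calls)  # one entry per dictionary in the list
print([s["name"] for s in result])

failed = False
try:
    # get_name() runs right here with no argument, so Python raises TypeError
    sorted(students, key=get_name())
except TypeError:
    failed = True
```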
    • 0:52:21AUDIENCE: How can we use nested dictionaries?
    • 0:52:25I have read about nested dictionaries.
    • 0:52:28What is the difference between nested dictionaries
    • 0:52:31and the dictionary inside a list?
    • 0:52:34I think it is that.
    • 0:52:35DAVID MALAN: Sure.
    • 0:52:36So we are using a list of dictionaries.
    • 0:52:39Why?
    • 0:52:39Because each of those dictionaries represents a student.
    • 0:52:42And a student has a name and a house, and we want to, I claim,
    • 0:52:45maintain that association.
    • 0:52:46And it's a list of students because we've got multiple students-- four,
    • 0:52:49in this case.
    • 0:52:50You could create a structure that is a dictionary of dictionaries.
    • 0:52:54But I would argue, it just doesn't solve a problem.
    • 0:52:56I don't need a dictionary of dictionaries.
    • 0:52:58I need a list of dictionaries of key-value pairs right now.
    • 0:53:00That's all.
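To make the distinction from the question concrete, here is an illustrative sketch of both shapes holding the same data. The variable name `students_by_name` is hypothetical. A dictionary of dictionaries suits lookup by a unique key, while a list of dictionaries suits iterating and sorting, which is what this program does.

```python
# A list of dictionaries: one dictionary per student, order is meaningful
students = [
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
]

# A dictionary of dictionaries: outer keys (names) map to inner dictionaries,
# so the name is stored as a key rather than as a value inside each record
students_by_name = {
    "Harry": {"house": "Gryffindor"},
    "Draco": {"house": "Slytherin"},
}

print(students[0]["house"])               # index by position
print(students_by_name["Draco"]["house"])  # look up by name
```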
    • 0:53:01So let me propose, if we go back to students.py here,
    • 0:53:05and we revert back to the approach where we have get_name as the function,
    • 0:53:10both used and defined here, and that function returns the student's name,
    • 0:53:14what happens, to be clear, is that the sorted function will use the value
    • 0:53:19of key-- get_name, in this case--
    • 0:53:22calling that function on every dictionary in the list
    • 0:53:25that it's supposed to sort.
    • 0:53:27And that function, get_name, returns the string
    • 0:53:30that sorted will actually use to decide whether things
    • 0:53:33go in this order, left-right, or in this order, right-left.
    • 0:53:36It alphabetizes these things based on that return value.
    • 0:53:39So notice that I'm not calling the function get_name here
    • 0:53:43with parentheses.
    • 0:53:43I'm passing it in only by its name so that the sorted function
    • 0:53:47can call that get_name function for me.
    • 0:53:50Now, it turns out, as always, if you're defining something,
    • 0:53:53be it a variable or, in this case, a function, and then immediately using
    • 0:53:57it but never, once again, needing the name of that function,
    • 0:54:01like, get_name, we can actually tighten this code up further.
    • 0:54:04I can actually do this.
    • 0:54:06I can get rid of the get_name function altogether,
    • 0:54:09just like I could get rid of a variable that isn't strictly necessary.
    • 0:54:12And instead of passing key, the name of a function,
    • 0:54:16I can actually pass key what's called a lambda
    • 0:54:19function, which is an anonymous function, a function that
    • 0:54:22just has no name.
    • 0:54:23Why?
    • 0:54:24Because you don't need to give it a name if you're only going to call it in one
    • 0:54:27place.
    • 0:54:27And the syntax for this in Python is a little weird.
    • 0:54:30But if I do key equals literally the word lambda, then something
    • 0:54:35like student, which is the name of the parameter
    • 0:54:37I expect this function to take, then a colon, and then I don't even type the return keyword.
    • 0:54:41I instead just say, student, bracket, name.
    • 0:54:45So what am I doing here with my code?
    • 0:54:47This code here that I've highlighted is equivalent to the get_name function
    • 0:54:52I implemented a moment ago.
    • 0:54:54The syntax is admittedly a little different.
    • 0:54:56I don't use def.
    • 0:54:57I didn't even give it a name, like get_name.
    • 0:54:59I, instead, am using this other keyword in Python called lambda, which says,
    • 0:55:03hey, Python, here comes a function, but it has no name.
    • 0:55:06It's anonymous.
    • 0:55:07That function takes a parameter.
    • 0:55:10I could call it anything I want.
    • 0:55:11I'm calling it student.
    • 0:55:12Why?
    • 0:55:13Because this function that's passed in as key
    • 0:55:16is called on every one of the students in that list,
    • 0:55:20every one of the dictionaries in that list.
    • 0:55:22What do I want this anonymous function to return?
    • 0:55:24Well given a student, I want to index into that dictionary
    • 0:55:28and access their name so that the strings Hermione, and Harry, and Ron,
    • 0:55:32and Draco is ultimately returned.
    • 0:55:34And that's what the sorted function uses to decide
    • 0:55:37how to sort these bigger dictionaries that have other keys, like house,
    • 0:55:42as well.
    • 0:55:43So if I now go back to my terminal window and run python of students.py,
    • 0:55:47it still seems to work the same, but it's arguably a little better design
    • 0:55:52because I didn't waste lines of code by defining some other function,
    • 0:55:55calling it in one and only one place.
    • 0:55:57I've done it all sort of in one breath, if you will.
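A sketch of the lambda version, again with illustrative data. `lambda student: student["name"]` behaves like the earlier get_name function: one parameter before the colon, and a single expression after it whose value is returned implicitly.

```python
students = [
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
]

# Anonymous equivalent of get_name: no def, no name, no return keyword
ordered = sorted(students, key=lambda student: student["name"])
for student in ordered:
    print(f"{student['name']} is in {student['house']}")
```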
    • 0:56:00All right, let me pause here to see if there's any questions specifically
    • 0:56:03about lambda, or anonymous functions, and this tightening up of the code.
    • 0:56:10AUDIENCE: I have a question, like whether we could define lambda twice.
    • 0:56:14DAVID MALAN: You can use lambda twice.
    • 0:56:17You can create as many anonymous functions as you'd like.
    • 0:56:19And you generally use them in contexts like this,
    • 0:56:22where you want to pass to some other function
    • 0:56:25a function that itself does not need a name.
    • 0:56:27So you can absolutely use it in more than one place.
    • 0:56:30I just have only one use case for it.
    • 0:56:32How about one other question on lambda or anonymous functions specifically?
    • 0:56:36AUDIENCE: What if our lambda would take more than one line, for example?
    • 0:56:43DAVID MALAN: Sure, if your lambda function takes
    • 0:56:45multiple parameters, that is fine.
    • 0:56:48You can simply specify commas followed by the names of those parameters,
    • 0:56:52maybe x and y or so forth, after the name student.
    • 0:56:55So here too, lambda looks a little different
    • 0:56:58from def in that you don't have parentheses,
    • 0:57:00you don't have the keyword def, you don't have a function name.
    • 0:57:02But ultimately, they achieve that same effect.
    • 0:57:05They create a function anonymously and allow you to pass it in,
    • 0:57:08for instance, as some value here.
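A sketch of a lambda with more than one parameter, compared against an equivalent def. The example uses `functools.reduce` only because it expects a two-argument function; `combine` is a hypothetical name. One related constraint: a lambda's body must be a single expression, so logic that needs several statements or lines is better written with def.

```python
from functools import reduce


def combine(x, y):
    """Named two-parameter function."""
    return x + y


numbers = [1, 2, 3, 4]

# Same behavior, written anonymously: parameters before the colon, expression after
total_named = reduce(combine, numbers)
total_anonymous = reduce(lambda x, y: x + y, numbers)
print(total_named, total_anonymous)
```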
    • 0:57:11So let's now change students.csv to contain
    • 0:57:14not students' houses at Hogwarts, but their homes where they grew up.
    • 0:57:17So Draco, for instance, grew up in Malfoy Manor.
    • 0:57:21Ron grew up in The Burrow.
    • 0:57:24Harry grew up in Number Four, Privet Drive.
    • 0:57:29And according to the internet, no one knows where Hermione grew up.
    • 0:57:33The movies apparently took certain liberties with where she grew up.
    • 0:57:35So for this purpose, we're actually going
    • 0:57:37to remove Hermione because it is unknown exactly where she was born.
    • 0:57:40So we still have some three students.
    • 0:57:43But if anyone can spot the potential problem now,
    • 0:57:47how might this be a bad thing?
    • 0:57:49Well, let's go and try and run our own code here.
    • 0:57:51Let me go back to students.py here.
    • 0:57:53And let me propose that I just change my semantics
    • 0:57:56because I'm now not thinking about Hogwarts houses but the students'
    • 0:57:59own homes.
    • 0:58:00So I'm just going to change some variables.
    • 0:58:01I'm going to change this house to a home, this house to a home,
    • 0:58:06as well as this one here.
    • 0:58:07I'm still going to sort the students by name,
    • 0:58:09but I'm going to say that they're not in a house, but rather, from a home.
    • 0:58:13So I've just changed the names of my variables and my grammar in English
    • 0:58:17here, ultimately, to print out that, for instance, Harry
    • 0:58:20is from Number Four, Privet Drive, and so forth.
    • 0:58:23But let's see what happens here when I run
    • 0:58:25python of this version of students.py, having changed students.csv
    • 0:58:30to contain those homes and not houses.
    • 0:58:33Enter.
    • 0:58:34Huh, our first value error, like the program just doesn't work.
    • 0:58:40What might explain this value error?
    • 0:58:43The explanation of which rather cryptically
    • 0:58:45is, too many values to unpack.
    • 0:58:48And the line in question is this one involving split.
    • 0:58:52How, all of a sudden, after all of these successful runs of this program,
    • 0:58:57did line 5 now break?
    • 0:59:00AUDIENCE: In students.csv, you have three values
    • 0:59:04on one of the lines.
    • 0:59:07DAVID MALAN: Yeah, I spent a lot of time trying
    • 0:59:09to figure out where every student should be from so that we
    • 0:59:12could create this problem for us.
    • 0:59:14And wonderfully, like, the first sentence of the book
    • 0:59:16mentions Number Four, Privet Drive.
    • 0:59:19And so the fact that that address has a comma in it is problematic.
    • 0:59:23Why?
    • 0:59:23Because you and I decided some time ago to just standardize on commas
    • 0:59:27(CSV, Comma-Separated Values)
    • 0:59:33in order to delineate one value from another.
    • 0:59:37And if we have commas grammatically in the student's home,
    • 0:59:41we're clearly confusing it with this special symbol.
    • 0:59:44And the split function is now, for just Harry,
    • 0:59:47trying to split it into three values, not just two.
    • 0:59:50And that's why there's too many values to unpack
    • 0:59:53because we're only trying to assign two variables, name and house.
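A sketch reproducing the error. It assumes line 5 splits each line with `line.rstrip().split(",")`, as earlier versions of students.py did, and that Harry's row is written without quotes. Splitting yields three strings, but only two variables are waiting for them.

```python
line = "Harry,Number Four, Privet Drive\n"
caught = None

try:
    # Splitting on every comma yields three pieces, but we unpack into only two names
    name, home = line.rstrip().split(",")
except ValueError as error:
    caught = error
    print("ValueError:", error)
```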
    • 0:59:57Now, what could we do here?
    • 0:59:59Well, we could just change our approach, for instance.
    • 1:00:02One paradigm that is not uncommon is to use something a little less common,
    • 1:00:08like a vertical bar.
    • 1:00:10So I could go in and change all of my commas to vertical bars.
    • 1:00:13That, too, could eventually come back to bite us
    • 1:00:15in that if my file eventually has vertical bars somewhere,
    • 1:00:18it might still break.
    • 1:00:19So maybe that's not the best approach.
    • 1:00:21I could maybe do something like this.
    • 1:00:23I could escape the data, as I've done in the past.
    • 1:00:25And maybe I could put quotes around any English string
    • 1:00:30that itself contains a comma.
    • 1:00:32And that's fine.
    • 1:00:33I could do that, but then my code, students.py,
    • 1:00:36is going to have to change too because I can't just naively split on
    • 1:00:40a comma now.
    • 1:00:41I'm going to have to be smarter about it.
    • 1:00:43I'm going to have to take quotes into account and split
    • 1:00:45only on the commas that are not inside of them.
    • 1:00:48And oh, it's getting complicated fast.
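A quick illustration of why quoting alone doesn't save the naive approach. `str.split` knows nothing about quotes, so the comma inside the quoted home still splits the field, and the quote characters end up inside the pieces.

```python
line = 'Harry,"Number Four, Privet Drive"'

# Naive splitting ignores the quotes entirely
parts = line.split(",")
print(parts)
```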
    • 1:00:51And at this point, you need to take a step back and consider,
    • 1:00:53you know what, if we're having this problem, odds are, many other people
    • 1:00:57before us have had this same problem.
    • 1:00:59It is incredibly common to store data in files.
    • 1:01:02It is incredibly common to use CSV files specifically.
    • 1:01:06And so you know what.
    • 1:01:07Why don't we see if there's a library in Python that
    • 1:01:10exists to read and/or write CSV files?
    • 1:01:14Rather than reinvent the wheel, so to speak,
    • 1:01:16let's see if we can write better code by standing on the shoulders of others who
    • 1:01:20have come before us (programmers past)
    • 1:01:22and actually use their code to do the reading and writing of CSVs,
    • 1:01:26so we can focus on the part of our problem that you and I care about.
    • 1:01:30So let's propose that we go back to our code here
    • 1:01:32and see how we might use the CSV library.
    • 1:01:35Indeed, within Python, there is a module called csv.
    • 1:01:40The documentation for it is at this URL here
    • 1:01:43in Python's official documentation.
    • 1:01:44But there are a few functions that are pretty readily accessible if we just
    • 1:01:49dive right in.
    • 1:01:49And let me propose that we do this.
    • 1:01:52Let me go back to my code here.
    • 1:01:53And instead of re-inventing this wheel and reading the file line by line,
    • 1:01:58and splitting on commas, and dealing now with quotes, and Privet Drives,
    • 1:02:02and so forth, let's do this instead.
    • 1:02:04At the start of my program, let me go up and import the CSV module.
    • 1:02:10Let's use this library that someone else has
    • 1:02:12written that's dealing with all of these corner cases, if you will.
    • 1:02:16I'm still going to give myself a list, initially empty,
    • 1:02:18in which to store all these students.
    • 1:02:20But I'm going to change my approach here now just a little bit.
    • 1:02:23When I open this file with with, let me go in here
    • 1:02:28and change this a little bit.
    • 1:02:30I'm going to go in here now and say this.
    • 1:02:33Reader equals csv.reader, passing in file as input.
    • 1:02:38So it turns out, if you read the documentation for the CSV module,
    • 1:02:42it comes with a function called reader whose purpose in life
    • 1:02:45is to read a CSV file for you and figure out, where are the commas, where
    • 1:02:50are the quotes, where are all the potential corner cases,
    • 1:02:53and just deal with them for you.
    • 1:02:55You can override certain defaults or assumptions in case
    • 1:02:57you're using not a comma, but a pipe or something else.
    • 1:03:00But by default, I think it's just going to work.
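A sketch of `csv.reader` handling the quoted comma, plus one overridden default. `io.StringIO` stands in for an opened file so the example is self-contained. The `delimiter` parameter is how you tell the reader to expect a pipe instead of a comma.

```python
import csv
import io

# io.StringIO stands in for an opened students.csv
data = io.StringIO('Harry,"Number Four, Privet Drive"\nRon,The Burrow\n')
rows = list(csv.reader(data))
print(rows)

# Overriding the default: a pipe-separated file, via the delimiter parameter
piped = io.StringIO("Harry|Number Four, Privet Drive\nRon|The Burrow\n")
piped_rows = list(csv.reader(piped, delimiter="|"))
print(piped_rows)
```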
    • 1:03:02Now, how do I iterate over a reader and not the raw file itself?
    • 1:03:07It's almost the same.
    • 1:03:08The library allows you still to do this.
    • 1:03:10For each row in the reader--
    • 1:03:13so you're not iterating over the file directly now.
    • 1:03:15You're iterating over the reader, which is, again,
    • 1:03:18going to handle all of the parsing of commas, and new lines, and more.
    • 1:03:22For each row in the reader, what am I going to do?
    • 1:03:25Well, at the moment, I'm going to do this.
    • 1:03:27I'm going to append to my students list the following dictionary, a dictionary
    • 1:03:32that has a name whose value is the current row's first column,
    • 1:03:36and whose house, or rather, home now, is the row's second
    • 1:03:41column.
    • 1:03:41Now, it's worth noting that the reader for each line in the file,
    • 1:03:45indeed, returns to me a row.
    • 1:03:47But it returns to me a row that's a list, which
    • 1:03:50is to say that the first element of that list
    • 1:03:52is going to be the student's name, as before.
    • 1:03:54The second element of that list is going to be the student's home now,
    • 1:03:59rather than their house, as before.
    • 1:03:59But if I want to access each of those elements,
    • 1:04:02remember that lists are 0 indexed.
    • 1:04:04We start counting at 0 and then 1, rather than 1 and then 2.
    • 1:04:07So if I want to get at the student's name, I use row, bracket, 0.
    • 1:04:10And if I want to get at the student's home, I use row, bracket, 1.
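A sketch of the row-as-list version. It assumes a two-column file with the name first and the home second. Each row from `csv.reader` is a list, so the columns are at indexes 0 and 1.

```python
import csv
import io

# Stand-in for an opened students.csv with name,home columns
data = io.StringIO('Draco,Malfoy Manor\nHarry,"Number Four, Privet Drive"\n')

students = []
for row in csv.reader(data):
    # Lists are 0-indexed: row[0] is the first column, row[1] the second
    students.append({"name": row[0], "home": row[1]})
print(students)
```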
    • 1:04:13But in my for loop, we can do that same unpacking as before.
    • 1:04:17If I know the CSV is only going to have two columns,
    • 1:04:21I could even do this-- for name, home in reader.
    • 1:04:25And now I don't need to use list notation.
    • 1:04:27I can unpack things all at once and say, name here, and home here.
    • 1:04:32The rest of my code can stay exactly the same because,
    • 1:04:35what am I doing now on line 8?
    • 1:04:36I'm still constructing the same dictionary as before,
    • 1:04:39albeit for homes instead of houses.
    • 1:04:42And I'm grabbing those values now, not from the file itself
    • 1:04:45and my use of split, but the reader.
    • 1:04:47And again, what the reader is going to do
    • 1:04:48is figure out, where are those commas, where are the quotes?
    • 1:04:51And just solve that problem for you.
    • 1:04:53So let me go now down to my terminal window and run python of students.py,
    • 1:04:57and hit Enter.
    • 1:04:58And now we see successfully, sorted no less, that Draco is from Malfoy Manor.
    • 1:05:04Harry is from Number Four, comma, Privet Drive.
    • 1:05:07And Ron is from The Burrow.
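Putting the pieces together, here is a sketch of the program as described: unpack each row into name and home, build dictionaries, sort by name, and print. The in-memory file is an assumption that replaces `open("students.csv")` so the example runs on its own. Its contents only mirror the three students mentioned.

```python
import csv
import io

# Stand-in for open("students.csv"); contents assumed from the lecture's examples
file = io.StringIO(
    "Draco,Malfoy Manor\n"
    'Harry,"Number Four, Privet Drive"\n'
    "Ron,The Burrow\n"
)

students = []
# Unpack each two-column row directly into name and home
for name, home in csv.reader(file):
    students.append({"name": name, "home": home})

for student in sorted(students, key=lambda student: student["name"]):
    print(f"{student['name']} is from {student['home']}")
```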
    • 1:05:09Questions now on this technique of using CSV reader from that CSV module, which,
    • 1:05:17again, is just getting us out of the business of reading each line ourselves
    • 1:05:20and finding each of those commas and splitting?
    • 1:05:23AUDIENCE: So my questions are related to something in the past.
    • 1:05:27I recognize that you are reading a file every time--
    • 1:05:31well, we assume that we have the CSV file to hand already in this case.
    • 1:05:39Is it possible to make a file readable and writable?
    • 1:05:44So in this case, you could write such stuff to the file,
    • 1:05:50but then at the same time, you could have
    • 1:05:53another function that reads through the file and does changes to it
    • 1:05:57as you go along?
    • 1:05:58DAVID MALAN: A really good question.
    • 1:05:59And the short answer is, yes.
    • 1:06:01However, historically, the mental model for a file is that of a cassette tape.
    • 1:06:05Years ago, not really in use anymore, but cassette tapes
    • 1:06:08are sequential whereby they start at the beginning,
    • 1:06:10and if you want to get to the end, you kind of
    • 1:06:12have to unwind the tape to get to that point.
    • 1:06:14The closest analog nowadays would be something like Netflix or any streaming
    • 1:06:18service, where there's a scrubber that you have to go left to right.
    • 1:06:21You can't just jump there or jump there.
    • 1:06:22You don't have random access.
    • 1:06:24So the problem with files, if you want to read and write them,
    • 1:06:27you or some library needs to keep track of where you are in the file
    • 1:06:31so that if you're reading from the top and then you write at the bottom,
    • 1:06:34and you want to start reading again, you seek back to the beginning.
    • 1:06:37So it's not something we'll do here in class.
    • 1:06:39It's more involved, but it's absolutely doable.
    • 1:06:41For our purposes, we'll generally recommend that you read the file.
    • 1:06:44And then if you want to change it, write it back out,
    • 1:06:46rather than trying to make more piecemeal changes, which are worthwhile,
    • 1:06:49though, if the file is massive and it would just be very expensive
    • 1:06:53time-wise to rewrite the whole thing.
    • 1:06:55Other questions on this CSV reader?
    • 1:06:59AUDIENCE: Is it possible to write a paragraph in that file?
    • 1:07:05DAVID MALAN: Absolutely.
    • 1:07:06Right now, I'm writing very small strings, just names or houses,
    • 1:07:09as I did before.
    • 1:07:10But you can absolutely write as much text as you want, indeed.
    • 1:07:15Other questions on CSV reader?
    • 1:07:18AUDIENCE: Can a user choose a key himself?
    • 1:07:22Like, input key will be a name or code.
    • 1:07:26DAVID MALAN: So short answer, yes, we could absolutely
    • 1:07:29write a program that prompts the user for a name
    • 1:07:32and a home, a name and a home.
    • 1:07:34And we could write out those values.
    • 1:07:35And in a moment, we'll see how you can write to a CSV file.
    • 1:07:38For now, I'm assuming, as the programmer who created students.csv, that I
    • 1:07:44know what the columns are going to be.
    • 1:07:46And therefore, I'm naming my variables accordingly.
    • 1:07:48However, this is a good segue to one final feature of reading CSVs, which
    • 1:07:53is that you don't have to rely on getting a row as a list
    • 1:07:57and using bracket 0 or bracket 1, nor do you have
    • 1:08:00to unpack things manually in this way.
    • 1:08:02We could actually be smarter and start storing
    • 1:08:05the names of these columns in the CSV file itself.
    • 1:08:08And in fact, if any of you have ever opened a spreadsheet file before, be it
    • 1:08:12in Excel, Apple Numbers, Google Spreadsheets or the like, odds are,
    • 1:08:16you've noticed that the first row, very frequently, is a little different.
    • 1:08:20It actually is boldface sometimes, or it actually
    • 1:08:22contains the names of those columns, the names of those attributes below.
    • 1:08:26And we can do this here.
    • 1:08:27In students.csv, I don't have to just keep
    • 1:08:30assuming that the student's name is first
    • 1:08:32and that the student's home is second.
    • 1:08:34I can explicitly bake that information into the file just
    • 1:08:39to reduce the probability of mistakes down the road.
    • 1:08:41I can literally use the first row of this file and say, name, comma, home.
    • 1:08:46So notice that name is not literally someone's name,
    • 1:08:50and home is not literally someone's home.
    • 1:08:52It is literally the words, name and home, separated by comma.
    • 1:08:57And if I now go back into students.py and don't use CSV reader,
    • 1:09:01but instead, I use a dictionary reader, I
    • 1:09:04can actually treat my CSV file even more flexibly, not just for this,
    • 1:09:09but for other examples too.
    • 1:09:10Let me do this.
    • 1:09:11Instead of using a CSV reader, let me use
    • 1:09:14a CSV dict reader, which will now iterate over the file top to bottom,
    • 1:09:19loading in each line of text not as a list of columns
    • 1:09:24but as a dictionary of columns.
    • 1:09:26What's nice about this is that it's going
    • 1:09:28to give me automatic access now to those columns' names.
    • 1:09:32I'm going to revert to just saying, for row in reader,
    • 1:09:35and now I'm going to append a name and a home.
    • 1:09:38But how am I going to get access to the current row's
    • 1:09:41name and the current row's home?
    • 1:09:44Well, earlier, I used bracket 0 for the first and bracket 1 for the second
    • 1:09:48when I was using a reader.
    • 1:09:50A reader returns lists.
    • 1:09:52A dict reader or dictionary reader returns dictionaries, one at a time.
    • 1:09:57And so if I want to access the current row's name,
    • 1:10:01I can say, row, quote/unquote, name.
    • 1:10:03I can say here for home, row, quote/unquote, home.
    • 1:10:06And I now have access to those same values.
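A sketch of the dictionary-reader version. It assumes students.csv now begins with the header row name,home, as described. `csv.DictReader` consumes that first row and uses its words as the keys of each row dictionary.

```python
import csv
import io

# The first line is a header naming the columns; DictReader uses it as keys
file = io.StringIO(
    "name,home\n"
    'Harry,"Number Four, Privet Drive"\n'
    "Ron,The Burrow\n"
)

students = []
for row in csv.DictReader(file):
    # Each row is a dictionary keyed by the header's column names
    students.append({"name": row["name"], "home": row["home"]})
print(students)
```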
    • 1:10:09The only change I had to make, to be clear, was in my CSV file,
    • 1:10:12I had to include, on the very first row, little hints
    • 1:10:16as to what these columns are.
    • 1:10:17And if I now run this code, I think it should behave pretty much
    • 1:10:21the same-- python of students.py.
    • 1:10:23And indeed, we get the same sentences.
    • 1:10:25But now my code is more robust against changes in this data.
    • 1:10:29If I were to open the CSV file in Excel, or Google Spreadsheets, or Apple
    • 1:10:34Numbers, and for whatever reason change the columns around,
    • 1:10:37maybe this is a file that you're sharing with someone else,
    • 1:10:39and just because, they decide to sort things differently left
    • 1:10:42to right by moving the columns around, previously, my code
    • 1:10:46would have broken because I was assuming that name is always first,
    • 1:10:50and home is always second.
    • 1:10:51But if I did this--
    • 1:10:53be it manually in one of those programs or here-- home, comma, name,
    • 1:10:57and suppose I reversed all of this.
    • 1:10:59The home comes first, followed by the name: Number Four, Privet Drive, then Harry; The Burrow, then Ron;
    • 1:11:04and then lastly, Malfoy Manor, then Draco,
    • 1:11:08notice that my file is now completely flipped.
    • 1:11:10The first column is now the second, and the second's the first.
    • 1:11:12But I took care to update the header of that file, the first row.
    • 1:11:17Notice my Python code, I'm not going to touch it at all.
    • 1:11:21I'm going to rerun python of students.py, and hit Enter.
    • 1:11:24And it still just works.
    • 1:11:26And this, too, is an example of coding defensively.
    • 1:11:29What if someone changes your CSV file, your data file?
    • 1:11:32Ideally, that won't happen.
    • 1:11:33But even if it does now, because I'm using a dictionary reader that's
    • 1:11:37going to infer from that first row for me what the columns are called,
    • 1:11:42my code just keeps working.
    • 1:11:44And so it keeps getting, if you will, better and better.
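A sketch of the flipped file. Only the data and its header changed: home comes first now. The reading code still looks columns up by name, so it gets the same values without modification.

```python
import csv
import io

# Same data with the columns swapped; the header row was updated to match
file = io.StringIO(
    "home,name\n"
    '"Number Four, Privet Drive",Harry\n'
    "The Burrow,Ron\n"
    "Malfoy Manor,Draco\n"
)

# Lookups by column name don't care about column position
students = [{"name": row["name"], "home": row["home"]} for row in csv.DictReader(file)]
print(students)
```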
    • 1:11:47Any questions now on this approach?
    • 1:11:50AUDIENCE: Yeah, what is the importance of new line in the CSV file?
    • 1:11:54DAVID MALAN: What's the importance of the new line in the CSV file?
    • 1:11:56It's partly a convention.
    • 1:11:58In the world of text files, we humans have just
    • 1:12:00been, for decades, in the habit of storing data line by line.
    • 1:12:04It's visually convenient.
    • 1:12:06It's just easy to extract from the file because you just
    • 1:12:09look for the new lines.
    • 1:12:10So the new line just separates some data from some other data.
    • 1:12:14We could use any other symbol on the keyboard,
    • 1:12:17but it's just common to hit Enter to just move the data to the next line.
    • 1:12:21Just a convention.
    • 1:12:22Other questions?
    • 1:12:23AUDIENCE: It seems to be working fine if you just have name and home.
    • 1:12:28I'm wondering what will happen if you want to put in more data.
    • 1:12:34Say, you wanted to add a house to both the name and the home.
    • 1:12:40DAVID MALAN: Sure, if you wanted to add the house back-- so if I go in here
    • 1:12:43and add house last, and I go here and say, Gryffindor for Harry,
    • 1:12:47Gryffindor for Ron, and Slytherin for Draco, now I have three columns,
    • 1:12:53effectively, if you will-- home on the left, name in the middle,
    • 1:12:57house on the right, each separated by commas with weird things,
    • 1:13:00like Number Four, comma, Privet Drive still quoted.
    • 1:13:03Notice, if I go back to students.py, and I don't change the code at all
    • 1:13:07and run python of students.py, it still just works.
    • 1:13:11And this is what's so powerful about a dictionary reader.
    • 1:13:14It can change over time.
    • 1:13:15It can have more and more columns.
    • 1:13:17Your existing code is not going to break.
    • 1:13:20Your code would break, would be much more fragile, so to speak,
    • 1:13:23if you were making assumptions like, the first column's always going to be name.
    • 1:13:26The second column is always going to be house.
    • 1:13:28Things will break fast if those assumptions break down--
    • 1:13:32so not a problem in this case.
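A sketch of the three-column file from the answer, with house added last. Code that only asks for "name" and "home" keeps working. The extra key is present in each row dictionary but simply unused.

```python
import csv
import io

file = io.StringIO(
    "home,name,house\n"
    '"Number Four, Privet Drive",Harry,Gryffindor\n'
    "The Burrow,Ron,Gryffindor\n"
    "Malfoy Manor,Draco,Slytherin\n"
)

students = []
for row in csv.DictReader(file):
    # The new "house" column is ignored by code that never asks for it
    students.append({"name": row["name"], "home": row["home"]})
print(students)
```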
    • 1:13:34Well, let me propose that, besides reading CSVs,
    • 1:13:37let's at least take a peek at how we might write a CSV too.
    • 1:13:40If you're writing a program in which you want to store not just students' names,
    • 1:13:44but maybe their homes as well in a file, how can we keep adding to this file?
    • 1:13:48Let me go ahead and delete the contents of students.csv
    • 1:13:52and just re-add a single simple row, name, comma, home,
    • 1:13:56so as to anticipate inserting more names and homes into this file.
    • 1:14:00And then let me go to students.py, and let me just start fresh
    • 1:14:03so as to write out data this time.
    • 1:14:05I'm still going to go ahead and import csv.
    • 1:14:07I'm going to go ahead now and prompt the user for their name, so
    • 1:14:11name equals input, quote/unquote, What's your name?
    • 1:14:15And I'm going to go ahead and prompt the user for their home--
    • 1:14:18so home equals input, quote/unquote, Where's your home?
    • 1:14:23Now I'm going to go ahead and open the file,
    • 1:14:26but this time for writing instead of reading, as follows--
    • 1:14:29with open, quote/unquote, students.csv.
    • 1:14:32I'm going to open it in append mode so that I
    • 1:14:35keep adding more and more students and homes to the file,
    • 1:14:38rather than just overwriting the entire file itself.
    • 1:14:40And I'm going to use a variable name of file.
    • 1:14:43I'm then going to go ahead and give myself a variable called writer,
    • 1:14:46and I'm going to set it equal to the return value of another function
    • 1:14:49in the CSV module called csv.writer.
    • 1:14:53And that writer function takes as its sole argument the file variable there.
    • 1:14:59Now I'm going to go ahead and just do this.
    • 1:15:01I'm going to say, writer.writerow, and I'm
    • 1:15:04going to pass into writerow the line that I want to write to the file
    • 1:15:09specifically as a list.
    • 1:15:10So I'm going to give this a list of name, comma, home,
    • 1:15:13which, of course, are the contents of those variables.
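The steps so far can be sketched as follows. To keep the code reusable and testable, the writing step is wrapped in a hypothetical append_student function, and the lecture's input prompts appear in comments. The lecture doesn't mention the newline="" argument. It is what the Python csv documentation recommends when opening files for the csv module.

```python
import csv


def append_student(path, name, home):
    """Append a single name, home row to the CSV file at path."""
    # "a" opens in append mode: new rows go after existing ones instead of
    # overwriting the file (and the file is created if it doesn't exist yet).
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)
        # writerow expects the row as a list, in left-to-right column order.
        writer.writerow([name, home])


# Usage, mirroring the lecture's prompts:
#     name = input("What's your name? ")
#     home = input("Where's your home? ")
#     append_student("students.csv", name, home)
```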
    • 1:15:16Now I'm going to go ahead and save the file.
    • 1:15:18I'm going to go ahead and rerun python of students.py, hit Enter.
    • 1:15:22And what's your name?
    • 1:15:23Well, let me go ahead and type in Harry as my name and Number Four,
    • 1:15:28comma, Privet Drive, Enter.
    • 1:15:31Now notice that that input itself did have a comma.
    • 1:15:34And so if I go to my CSV file now, notice
    • 1:15:37that it's automatically been quoted for me so
    • 1:15:40that subsequent reads from this file don't
    • 1:15:41confuse that comma with the actual comma between Harry and his home.
    • 1:15:46Well, let me go ahead and run it a couple of more times.
    • 1:15:48Let me go ahead and rerun python of students.py.
    • 1:15:51Let me go ahead and input this time Ron and his home as The Burrow.
    • 1:15:55Let's go back to students.csv to see what it looks like.
    • 1:15:58Now we see Ron, comma, The Burrow has been added automatically to the file.
    • 1:16:02And let's do one more--
    • 1:16:03python of students.py, Enter.
    • 1:16:06Let's go ahead and give Draco's name and his home, which would be Malfoy Manor,
    • 1:16:10Enter.
    • 1:16:11And if we go back to students.csv, now, we
    • 1:16:14see that Draco is in the file itself.
    • 1:16:15And the library took care of not only writing each of those rows,
    • 1:16:19per the function's name,
    • 1:16:20but also the escaping, so to speak, of any strings
    • 1:16:23that themselves contained a comma, like Harry's own home.
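To see the escaping without touching students.csv, a sketch can write to an in-memory io.StringIO buffer instead. The buffer is only for illustration. With default settings, csv.writer quotes only the field that contains a comma and ends each row with \r\n. csv.reader then splits the text back into the original fields:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Harry", "Number Four, Privet Drive"])  # home contains a comma
writer.writerow(["Ron", "The Burrow"])                   # nothing to escape

text = buffer.getvalue()
print(repr(text))  # 'Harry,"Number Four, Privet Drive"\r\nRon,The Burrow\r\n'

# Reading it back: the quoted comma is treated as part of the field.
rows = list(csv.reader(io.StringIO(text)))
print(rows)  # [['Harry', 'Number Four, Privet Drive'], ['Ron', 'The Burrow']]
```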
    • 1:16:27Well, it turns out, there's yet another way
    • 1:16:28we could implement this same program without having to worry about precisely
    • 1:16:32that order again and again and just passing in a list.
    • 1:16:35It turns out, if we're keeping track of what's the name and what's the home,
    • 1:16:39we could use something like a dictionary to associate
    • 1:16:42those keys with those values.
    • 1:16:43So let me go ahead and back up and remove these students from the file,
    • 1:16:46leaving only the header row again-- name, comma, home.
    • 1:16:49And let me go over to students.py.
    • 1:16:51And this time, instead of using csv.writer,
    • 1:16:54I'm going to go ahead and use csv.DictWriter,
    • 1:16:57which is a dictionary writer. I'm going
    • 1:16:58to open the file in much the same way.
    • 1:17:00But rather than write a row as this list of name,
    • 1:17:04comma, home, what I'm now going to do is as follows.
    • 1:17:08I'm going to first output an actual dictionary,
    • 1:17:11the first key of which is name, colon, and then
    • 1:17:14the value thereof is going to be the name that was typed in.
    • 1:17:17And I'm going to pass in a key of home, quote/unquote,
    • 1:17:19the value of which, of course, is the home that was typed in.
    • 1:17:22But with DictWriter, I do need to give it
    • 1:17:24a hint as to the order in which those columns are when writing it out so
    • 1:17:29that, subsequently, they could be read, even if those orderings change.
    • 1:17:33Let me go ahead and pass in fieldnames, which
    • 1:17:36is a second argument to DictWriter, equals, and then
    • 1:17:39a list of the actual columns that I know are
    • 1:17:41in this file, which, of course, are name, comma, home.
    • 1:17:45Those are in quotes because those are, indeed,
    • 1:17:47the string names of the columns, so to speak,
    • 1:17:50that I intend to write to in that file.
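Here is a sketch of the same append step using csv.DictWriter. It again wraps the step in a hypothetical append_student function and assumes students.csv already holds the name,home header row:

```python
import csv


def append_student(path, name, home):
    """Append one student to the CSV file at path using a dictionary writer."""
    with open(path, "a", newline="") as file:
        # fieldnames tells DictWriter which key goes in which column, left to right.
        writer = csv.DictWriter(file, fieldnames=["name", "home"])
        writer.writerow({"name": name, "home": home})
```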
    • 1:17:52All right, now let me go ahead and go to my terminal window,
    • 1:17:55run python of students.py.
    • 1:17:57This time, I'll type in Harry's name again.
    • 1:17:59I'll, again, type in Number Four, comma, Privet Drive, Enter.
    • 1:18:05Let's now go back to students.csv.
    • 1:18:07And voila, Harry is back in the file, and it's properly escaped or quoted.
    • 1:18:11Let's do this again with Ron and The Burrow,
    • 1:18:14and let's go ahead and run it a third time with Draco and Malfoy Manor,
    • 1:18:20Enter.
    • 1:18:21Let's go back to students.csv.
    • 1:18:22And via this dictionary writer, we now have all three
    • 1:18:26of those students as well.
    • 1:18:27So whereas with csv.writer, the onus is on us
    • 1:18:31to pass in a list of all of the values that we
    • 1:18:34want to put from left to right, with a dictionary writer, technically,
    • 1:18:37they could be in any order in the dictionary.
    • 1:18:39In fact, I could just as correctly have done this,
    • 1:18:43passing in home followed by name.
    • 1:18:45But it's a dictionary.
    • 1:18:46And so the ordering in this case does not matter so long as the key is there
    • 1:18:50and the value is there.
    • 1:18:51And because I have passed in field names as the second argument to DictWriter,
    • 1:18:55it ensures that the library knows exactly which column
    • 1:18:59contains name or home, respectively.
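Here is a quick sketch of that ordering behavior. It writes to an in-memory buffer (io.StringIO, used only for illustration). It also calls DictWriter's writeheader method, which writes the fieldnames as the first row. The lecture doesn't use it because the header row was typed in by hand. By default, passing a dictionary with a key that isn't in fieldnames raises ValueError.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "home"])
writer.writeheader()  # writes the fieldnames as the first row
# Keys are given home-first here, but columns follow fieldnames, not the dict.
writer.writerow({"home": "The Burrow", "name": "Ron"})

print(repr(buffer.getvalue()))  # 'name,home\r\nRon,The Burrow\r\n'
```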
    • 1:19:02Are there any questions now on dictionary reading, dictionary writing,
    • 1:19:07or CSVs more generally?
    • 1:19:10AUDIENCE: Is there any specific situation in which I should
    • 1:19:14use single quotes versus double quotes?
    • 1:19:17Because in the print, we used single quotes
    • 1:19:20to represent the key of the dictionary.
    • 1:19:24But when reading or writing, we used double quotes.
    • 1:19:30DAVID MALAN: It's a good question.
    • 1:19:31In Python, you can generally use double quotes, or you can use single quotes.
    • 1:19:36And it doesn't matter.
    • 1:19:37You should just be self-consistent so that stylistically your code
    • 1:19:40looks the same all throughout.
    • 1:19:42Sometimes, though, it is necessary to alternate.
    • 1:19:45For instance, earlier I was using double quotes for a long f-string,
    • 1:19:49and inside that f-string, I was interpolating
    • 1:19:52the values of some variables using curly braces,
    • 1:19:55and those variables were dictionaries.
    • 1:19:57And in order to index into a dictionary, you use square brackets
    • 1:20:02and then quotes.
    • 1:20:03So if you're already using double quotes out here,
    • 1:20:05you should generally use single quotes in here, or vice versa.
    • 1:20:09But otherwise, I'm in the habit of using double quotes everywhere.
    • 1:20:12Others are in the habit of using single quotes everywhere.
    • 1:20:15It only matters sometimes if one might be confused for the other.
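Here is a small sketch of alternating quotes inside an f-string. The dictionary is hypothetical. Before Python 3.12, reusing the same quote character inside the braces is a SyntaxError. Python 3.12 and later allow it, but alternating keeps the code readable and works on every version.

```python
student = {"name": "Harry", "home": "Number Four, Privet Drive"}

# Double quotes delimit the f-string, so single quotes index the dictionary.
message = f"{student['name']} is from {student['home']}"
print(message)  # Harry is from Number Four, Privet Drive

# Or vice versa: single quotes outside, double quotes inside.
same = f'{student["name"]} is from {student["home"]}'
print(same)
```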
    • 1:20:20Other questions on dictionary writing or reading?
    • 1:20:24AUDIENCE: Yeah, my question is, can we use multiple CSV files in any program?
    • 1:20:30DAVID MALAN: Absolutely.
    • 1:20:31You can use as many CSV files as you want.
    • 1:20:33And it's just one of the formats that you can use to save data.
    • 1:20:37Other questions on CSVs or File I/O?
    • 1:20:40AUDIENCE: Thanks for taking my question.
    • 1:20:43So when you were reading from the file as a dictionary,
    • 1:20:49you referred to the fields by name.
    • 1:20:52When you're reading, couldn't you just use the row itself?
    • 1:20:55In the previous version of the students.py file, when you were reading each row,
    • 1:21:03you were splitting out the fields by name.
    • 1:21:10So when you're appending to the students list,
    • 1:21:13couldn't you just say for row in reader, students.append(row),
    • 1:21:20rather than naming each of the fields?
    • 1:21:22DAVID MALAN: Oh, very clever.
    • 1:21:23Short answer, yes. Insofar as DictReader returns
    • 1:21:28one dictionary at a time when you loop over it,
    • 1:21:32row is already going to be a dictionary.
    • 1:21:34So yes, you could actually get away with doing this.
    • 1:21:38And the effect would really be the same in this case.
    • 1:21:41Good observation.
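Here is a sketch of the audience member's suggestion. io.StringIO stands in for opening students.csv, and its contents are hypothetical. Each row from csv.DictReader is already a dictionary, so it can be appended as is:

```python
import csv
import io

# Stand-in for open("students.csv"); the contents are hypothetical.
file = io.StringIO("name,home\nHarry,Privet Drive\nRon,The Burrow\n")

students = []
for row in csv.DictReader(file):
    # Each row is already a dict keyed by the header row, so there's no need
    # to copy the fields out one by one.
    students.append(row)

for student in students:
    print(f"{student['name']} is from {student['home']}")
```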
    • 1:21:42How about one more question on CSVs?
    • 1:21:46AUDIENCE: Yeah, when reading in CSVs from my past work with data,
    • 1:21:51a lot of things can go wrong.
    • 1:21:53I don't know if it's a fair question that you can answer in a few sentences.
    • 1:21:57But are there any best practices to double check that no mistakes occurred?
    • 1:22:04DAVID MALAN: It's a really good question.
    • 1:22:06And I would say, in general, if you're using code to generate the CSVs
    • 1:22:10and to read the CSVs, and you're using a good library,
    • 1:22:14theoretically, nothing should go wrong.
    • 1:22:16It should be 100% correct if the libraries are 100% correct.
    • 1:22:20You and I tend to be the problem.
    • 1:22:22When you let a human touch the CSV, or when Excel, or Apple Numbers,
    • 1:22:27or some other tool is involved that might not
    • 1:22:29be aligned with your code's expectations,
    • 1:22:30things then, yes, can break.
    • 1:22:33Sometimes, honestly, the solution is manual fixes.
    • 1:22:37You go in and fix the CSV, or you have a lot of error checking,
    • 1:22:40or you have a lot of try and except just to tolerate mistakes in the data.
    • 1:22:44But generally, I would say, if you're using CSV or any file format
    • 1:22:47internally to a program to both read and write it,
    • 1:22:50you shouldn't have concerns there.
    • 1:22:52You and I, the humans, are the problem, generally
    • 1:22:55speaking-- not the programmers, but the users of those files.
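For the "lot of error checking" route, here is one possible sketch. It is not a technique from the lecture. A hypothetical load_students function checks the header row and sets aside rows with missing, blank, or extra fields instead of crashing. It relies on two documented DictReader defaults: extra fields are stored under the key None, and missing fields are filled with None.

```python
import csv
import io

EXPECTED_COLUMNS = ["name", "home"]  # hypothetical schema for students.csv


def load_students(file):
    """Return (valid rows, line numbers of rows that look wrong)."""
    reader = csv.DictReader(file)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"Unexpected header: {reader.fieldnames}")

    students, problems = [], []
    for row in reader:
        # Extra fields land under the key None; missing fields are None.
        extra = None in row
        blank = any(not (row[column] or "").strip() for column in EXPECTED_COLUMNS)
        if extra or blank:
            problems.append(reader.line_num)  # physical line last read
        else:
            students.append(row)
    return students, problems
```

A caller can then decide whether to fix those lines by hand, skip them, or stop.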
    • 1:22:59All right, allow me to propose that we leave CSVs behind but to note
    • 1:23:02that they're not the only file format you
    • 1:23:04can use in order to read or write data.
    • 1:23:07In fact, they're a popular format, as are raw text files--
    • 1:23:10.txt files.
    • 1:23:11But you can store data, really, any way that you want.
    • 1:23:14We've just picked CSVs because they're representative
    • 1:23:16of how you might read and write from a file
    • 1:23:18and do so in a structured way, where you can somehow have multiple keys,
    • 1:23:22multiple values all in the same file without having to resort to what would
    • 1:23:26be otherwise known as a binary file.
    • 1:23:29So a binary file is a file that's really just zeros and ones.
    • 1:23:32And they can be laid out in any pattern you might want, particularly
    • 1:23:36if you want to store not textual information,
    • 1:23:39but maybe graphical, or audio, or video information as well.
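Reading a binary file uses the same open function, just with "b" in the mode. That gives back raw bytes instead of decoded text. Here is a small sketch that isn't from the lecture. It checks whether a file starts with one of the two six-byte signatures defined by the GIF format, GIF87a or GIF89a. The function names are hypothetical.

```python
def read_header(path, size=6):
    """Return the first size bytes of a file, opened in binary mode."""
    # "rb" = read binary: the result is a bytes object, with no text decoding.
    with open(path, "rb") as file:
        return file.read(size)


def is_gif(path):
    """Guess whether path is a GIF by checking its signature bytes."""
    return read_header(path) in (b"GIF87a", b"GIF89a")


# Usage (hypothetical file): is_gif("costume1.gif")
```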
    • 1:23:43So it turns out that Python is really good
    • 1:23:45when it comes to having libraries for, really, everything.
    • 1:23:48And in fact, there's a popular library called
    • 1:23:50pillow that allows you to navigate image files as well
    • 1:23:55and to perform operations on image files.
    • 1:23:57You can apply filters, a la Instagram.
    • 1:24:00You can animate them as well.
    • 1:24:02And so what I thought we'd do is leave behind text files for now
    • 1:24:05and tackle one more demonstration, this time,
    • 1:24:08focusing on this particular library and image files instead.
    • 1:24:13So let me propose that we go over here to VS Code
    • 1:24:16and create a program, ultimately, that creates an animated GIF.
    • 1:24:19These things are everywhere nowadays in the form of memes, and animations,
    • 1:24:23and stickers, and the like.
    • 1:24:24And an animated GIF is really just an image file
    • 1:24:27that has multiple images inside of it.
    • 1:24:29And your computer or your phone shows you those images, one after another,
    • 1:24:34sometimes on an endless loop, again and again.
    • 1:24:37And so long as there's enough images, it creates the illusion of animation
    • 1:24:41because your mind and mine kind of fills in the gaps visually
    • 1:24:44and just assumes that if something is moving, even though you're only
    • 1:24:47seeing one frame per second, or some sequence thereof,
    • 1:24:51it looks like an animation.
    • 1:24:52So it's like a simplistic version of a video file.
    • 1:24:55Well, let me propose that we start with maybe a couple of costumes
    • 1:25:00from another popular programming language.
    • 1:25:02And let me go ahead and open up my first costume here, number 1.
    • 1:25:05So suppose here that this is a costume or, really, just a static image
    • 1:25:09here, costume1.gif.
    • 1:25:11And it's just a static picture of a cat, no movement at all.
    • 1:25:14Let me go ahead now and open up a second one, costume2.gif,
    • 1:25:18that looks a little bit different.
    • 1:25:20Notice-- and I'll go back and forth-- this cat's legs
    • 1:25:23are a little bit aligned differently so that this was version 1,
    • 1:25:27and this was version 2.
    • 1:25:29Now, these cats come from a programming language from MIT
    • 1:25:32called scratch that allows you, very graphically,
    • 1:25:34to animate all this and more.
    • 1:25:36But we'll use just these two static images, costume1 and costume2
    • 1:25:41to create our own animated GIF that, after this, you
    • 1:25:44could text to a friend or message them, much like any meme online.
    • 1:25:48Well, let me propose that we create this animated GIF, not
    • 1:25:52by just using some off-the-shelf program that we downloaded,
    • 1:25:54but by writing our own code.
    • 1:25:56Let me go ahead and run code of costumes.py
    • 1:25:59and create our very own program that's going to take,
    • 1:26:02as input, two or even more image files and then generate an animated GIF
    • 1:26:07from them by, essentially, toggling back and forth
    • 1:26:12endlessly between those two images.
    • 1:26:14Well, how am I going to do this?
    • 1:26:15Well, let's assume that this will be a program called costumes.py that
    • 1:26:19expects two command line arguments, the names
    • 1:26:22of the files, the individual costumes that we want to animate back and forth.
    • 1:26:26So to do that, I'm going to import sys so that we ultimately
    • 1:26:29have access to sys.argv.
    • 1:26:31I'm then, from this pillow library, going to import support for images
    • 1:26:35specifically.
    • 1:26:35So from PIL import Image-- capital I, as per the library's documentation.
    • 1:26:41Now I'm going to give myself an empty list called images,
    • 1:26:44just so I have a list in which to store one, or two, or more of these images.
    • 1:26:48And now let me do this.
    • 1:26:50For each argument in sys.argv, I'm going to go ahead and create a new image
    • 1:26:56variable, set it equal to the return value of Image.open, passing in arg.
    • 1:27:03Now, what is this doing?
    • 1:27:05I'm proposing that, eventually, I want to be
    • 1:27:07able to run python of costumes.py, and then
    • 1:27:10as command line argument, specify costume1.gif, space, costume2.gif.
    • 1:27:14So I want to take in those file names from the command line as my arguments.
    • 1:27:18So what am I doing here?
    • 1:27:20Well, I'm iterating over sys.argv, all of the words in my command line arguments.
    • 1:27:25I'm creating a variable called image, and I'm
    • 1:27:27passing to this function, Image.open from the pillow
    • 1:27:30library, that specific argument.
    • 1:27:32And that library is essentially going to open that image
    • 1:27:35in a way that gives me a lot of functionality for manipulating it,
    • 1:27:38like animating.
    • 1:27:40Now I'm going to go ahead and append to my images list that particular image.
    • 1:27:48And that's it.
    • 1:27:48So this loop's purpose in life is just to iterate over the command line
    • 1:27:51arguments and open those images using this library.
    • 1:27:55The last line is pretty straightforward.
    • 1:27:57I'm going to say this.
    • 1:27:58I'm going to grab the first of those images, which is going to be in my list
    • 1:28:02at location 0, and I'm going to save it to disk.
    • 1:28:05That is, I'm going to save this file.
    • 1:28:08Now, in the past when we used CSVs or text files,
    • 1:28:10I had to do the file opening.
    • 1:28:12I had to do the file writing, maybe even the closing.
    • 1:28:15I don't need to do that with this library.
    • 1:28:17The pillow library takes care of the opening, the closing, and the saving
    • 1:28:20for me by just calling save.
    • 1:28:23I'm going to call this save function.
    • 1:28:24And just to leave space, because I have a number of arguments to pass,
    • 1:28:27I'm going to move to another line so it fits.
    • 1:28:29I'm going to pass in the name of the file that I want to create,
    • 1:28:33costumes.gif--
    • 1:28:34that will be the name of my animated GIF.
    • 1:28:37I'm going to tell this library to save all of the frames
    • 1:28:41that I pass to it-- so the first costume, the second costume, and even
    • 1:28:44more if I gave them.
    • 1:28:46I'm going to then append to this first image--
    • 1:28:49images[0]-- the following images: append_images equals this list of images.
    • 1:28:55And this is a bit clever, but I'm going to do this.
    • 1:28:57I want to append the next image there, images[1].
    • 1:29:01And now I want to specify a duration of 200 milliseconds
    • 1:29:05for each of these frames, and I want this to loop forever.
    • 1:29:08And if you specify loop=0,
    • 1:29:12it means it's just not going to loop a finite number of times,
    • 1:29:15but an infinite number of times instead.
    • 1:29:18And I need to do one other thing.
    • 1:29:20Recall that sys.argv contains not just the words I
    • 1:29:24typed after my program's name, but what else does sys.argv contain?
    • 1:29:29If you think back to our discussion of command line arguments,
    • 1:29:33what else is sys.argv besides the words I'm about to type,
    • 1:29:38like costume1.gif and costume2.gif?
    • 1:29:41AUDIENCE: Yeah, so we'll actually get the original name of the program
    • 1:29:45we want to run, the costumes.py.
    • 1:29:48DAVID MALAN: Indeed, we'll get the original name of the program,
    • 1:29:50costumes.py in this case, which is not a GIF, obviously.
    • 1:29:53So remember that using slices in Python, we can do this.
    • 1:29:57If sys.argv is a list, and we want to get a slice of that list, everything
    • 1:30:01after the first element, we can do 1, colon, which says,
    • 1:30:05start at location 1, not 0, and take a slice all the way to the end.
    • 1:30:10So give me everything except the first thing
    • 1:30:12in that list, which, to McKenzie's point, is the name of the program.
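Here is the slice in isolation, as a sketch. The hard-coded argv list is hypothetical and just mimics running python costumes.py costume1.gif costume2.gif:

```python
import sys

# sys.argv[0] is the program's name; the slice [1:] keeps everything after it.
arguments = sys.argv[1:]
print(arguments)

# The same slice on a hard-coded, hypothetical argv:
argv = ["costumes.py", "costume1.gif", "costume2.gif"]
print(argv[1:])  # ['costume1.gif', 'costume2.gif']
```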
    • 1:30:16Now, if I haven't made any mistakes, let's see what happens.
    • 1:30:19I'm going to run python of costumes.py, and now I'm
    • 1:30:22going to specify the two images that I want to animate--
    • 1:30:25so costume1.gif and costume2.gif.
    • 1:30:30What is the code now going to do?
    • 1:30:32Well, to recap, we're using the sys library
    • 1:30:34to access those command line arguments.
    • 1:30:36We're using the pillow library to treat those files
    • 1:30:39as images, with all the functionality that comes with that library.
    • 1:30:42I'm using this images list just to accumulate all of these images, one
    • 1:30:46at a time from the command line.
    • 1:30:48And in lines 7 through 9, I'm just using a loop to iterate over all of them
    • 1:30:52and just add them to this list after opening them with the library.
    • 1:30:56And the last step, which is really just one line of code broken onto three so
    • 1:31:00that it all fits, I'm going to save the first image,
    • 1:31:02but I'm asking the library to append this other image to it
    • 1:31:07as well-- not bracket 0, but bracket 1.
    • 1:31:09And if I had more, I could express those as well.
    • 1:31:12I want to save all of these files together.
    • 1:31:14I want to pause 200 milliseconds-- a fifth of a second
    • 1:31:17in between each frame.
    • 1:31:18And I want it to loop infinitely many times.
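Putting the pieces together, here is a sketch of costumes.py as described. It assumes Pillow is installed, for example with pip install pillow. The comments mark two small departures from the lecture. First, the opening and saving are wrapped in a hypothetical save_animation function. Second, append_images gets images[1:], every frame after the first, rather than just [images[1]], so more than two costumes also work.

```python
import sys

from PIL import Image


def save_animation(paths, output, duration=200):
    """Open each image in paths and save them, in order, as one animated GIF."""
    images = []
    for path in paths:
        image = Image.open(path)
        images.append(image)

    # Save the first image, appending all the others as additional frames.
    images[0].save(
        output,
        save_all=True,             # write every frame, not just the first
        append_images=images[1:],  # generalizes the lecture's [images[1]]
        duration=duration,         # milliseconds per frame (200 = a fifth of a second)
        loop=0,                    # 0 means loop forever
    )


def main():
    paths = sys.argv[1:]  # skip sys.argv[0], the program's own name
    if not paths:
        print("Usage: python costumes.py IMAGE [IMAGE ...]")
        return
    save_animation(paths, "costumes.gif")


if __name__ == "__main__":
    main()
```

Run as in the lecture: python costumes.py costume1.gif costume2.gif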
    • 1:31:21So now if I cross my fingers as always, hit Enter,
    • 1:31:27nothing bad happened, and that's almost always a good thing.
    • 1:31:30Let me now run code of costumes.gif to open up in VS Code the final image.
    • 1:31:38And what I think I should see is a very happy cat?
    • 1:31:42And indeed.
    • 1:31:43So now we've seen not only that we can read and write files textually,
    • 1:31:47but also that we can read and now write files that are binary, zeros and ones.
    • 1:31:51We've just scratched the surface.
    • 1:31:52This is using the library called pillow.
    • 1:31:54But ultimately, this is going to give us the ability to read and write files
    • 1:31:58however we want.
    • 1:31:59So we've now seen that via File I/O, we can manipulate not just textual files,
    • 1:32:03be it TXT files, or CSVs, but even binary files as well.
    • 1:32:06In this case, they happen to be images.
    • 1:32:08But if we dived in deeper, we could explore audio, and video,
    • 1:32:11and so much more all by way of these simple primitives, this ability,
    • 1:32:15somehow, to read and write files.
    • 1:32:18That's it for now.
    • 1:32:19We'll see you next time.