CS50 Video Player
    • 0:00:00 We will begin shortly
    • 0:17:56 Introduction
    • 0:18:43 Data
    • 0:20:27 Spreadsheets
    • 0:27:22 Flat-File Databases
    • 0:29:26 CSV Files
    • 0:30:17 favorites.py
    • 0:39:44 Data Cleaning
    • 0:41:50 sorted
    • 1:06:28 Lambda Functions
    • 1:16:25 Relational Databases
    • 1:17:30 Break begins
    • 1:25:20 Break resumes
    • 1:25:21 SQLite
    • 1:32:24 SQL
    • 1:32:34 CRUD
    • 1:36:45 SELECT
    • 1:40:18 DISTINCT
    • 1:42:24 LIKE
    • 1:45:06 ORDER BY
    • 1:46:47 GROUP BY
    • 1:55:12 INSERT
    • 2:00:26 UPDATE
    • 2:01:17 DELETE
    • 2:02:37 Relational Data
    • 2:06:06 Data Types
    • 2:07:22 Constraints
    • 2:08:26 PRIMARY KEY
    • 2:09:50 FOREIGN KEY
    • 2:11:39 CS50 Library
    • 2:13:16 Many-to-Many Relationships
    • 2:27:34 Break begins
    • 2:32:06 Break resumes
    • 2:32:07 Many-to-Many Relationships (continued)
    • 2:39:50 IMDb
    • 2:47:42 Indexes
    • 2:50:07 JOINs
    • 2:55:44 SQL Injection
    • 3:01:13 Race Conditions
    • 3:07:09 Race Conditions (Demo)
    • 0:17:56 all right
    • 0:17:57 this is cs50 and this is week seven and
    • 0:18:00 today's focus is going to be entirely on
    • 0:18:02 data the process of collecting it the
    • 0:18:04 process of storing it the process of
    • 0:18:05 searching it and so much more you'll
    • 0:18:07 recall that last week we started off by
    • 0:18:09 playing around with a relatively small
    • 0:18:11 data set we asked everyone for
    • 0:18:12 what their preferred house at hogwarts
    • 0:18:14 might be and then we proceeded to
    • 0:18:16 analyze that data a little bit
    • 0:18:18 using some python and counting up how
    • 0:18:20 many people wanted gryffindor slytherin
    • 0:18:21 or the others as well
    • 0:18:23 and we ultimately did that by using a
    • 0:18:24 google form to collect it and we stored
    • 0:18:26 all of the data in a google spreadsheet
    • 0:18:28 which we then exported of course
    • 0:18:29 as a csv file so this week we thought
    • 0:18:32 we'd collect a little more data and see
    • 0:18:33 what kinds of problems arise when we
    • 0:18:35 start using
    • 0:18:36 only a spreadsheet or in turn a csv file
    • 0:18:39 to store the data that we care about so
    • 0:18:41 in fact if you could go ahead and go to
    • 0:18:42 this url
    • 0:18:43 here that you see you should see another
    • 0:18:45 google form
    • 0:18:46 this one asking you some different
    • 0:18:48 questions all of us probably have some
    • 0:18:50 preferred tv shows now more than ever
    • 0:18:52 perhaps
    • 0:18:53 and what we'd like to do is ask everyone
    • 0:18:55 to input into that form
    • 0:18:56 their favorite tv show followed by the
    • 0:18:59 genre or
    • 0:19:00 genres into which that particular tv
    • 0:19:03 show
    • 0:19:04 falls so go ahead and take a moment to
    • 0:19:06 do that
    • 0:19:07 and if you're unable to follow along at
    • 0:19:09 home what folks are looking at is a form
    • 0:19:11 quite like this one here
    • 0:19:12 whereby we're just asking them for the
    • 0:19:14 title of their preferred tv show
    • 0:19:16 and the genre or genres of that specific
    • 0:19:21 tv show so go ahead and fill that out if
    • 0:19:24 you could
    • 0:19:25 we'll keep an eye on the responses
    • 0:19:26 coming in we'll give everyone a few
    • 0:19:29 moments to think about
    • 0:19:31 their preferred tv show
    • 0:19:34 i myself have been re-watching a bit of
    • 0:19:37 the office
    • 0:19:38 i have been watching a lot
    • 0:19:42 of reruns of older shows i probably i
    • 0:19:45 think the point is been watching too
    • 0:19:46 much tv
    • 0:19:48 during all of this but in my defense
    • 0:19:50 it's on in the background while i'm
    • 0:19:51 doing work on my laptop so hopefully
    • 0:19:53 that makes it okay
    • 0:19:54 let me take a look at the responses that
    • 0:19:56 have come in all right we're getting a
    • 0:19:58 lot of good data here on the order
    • 0:19:59 of hundreds of responses give you just
    • 0:20:02 another moment or so
    • 0:20:04 the question at hand again is favorite
    • 0:20:06 tv show the title thereof
    • 0:20:08 and the genre or genres into which
    • 0:20:11 that tv show falls
    • 0:20:14 brian are you okay with my starting to
    • 0:20:16 look at the data it's okay if we keep
    • 0:20:17 collecting some more but i'm gonna go
    • 0:20:19 ahead and show the top
    • 0:20:20 uh few rows if that sounds good
    • 0:20:23 all right so let's go ahead and start to
    • 0:20:25 look at some of this data that's come in
    • 0:20:27 here is the resulting google spreadsheet
    • 0:20:29 that google forms has created for us and
    • 0:20:30 you'll notice that by default google
    • 0:20:32 forms
    • 0:20:32 this particular tool has three different
    • 0:20:34 columns at least for this form one is a
    • 0:20:36 time stamp and google automatically
    • 0:20:37 gives us that
    • 0:20:38 based on what day and time everyone was
    • 0:20:40 buzzing in with their responses
    • 0:20:42 then they have a header row beyond that
    • 0:20:44 for title
    • 0:20:45 and genres i've manually bold faced it
    • 0:20:48 in advance just to make it stand out but
    • 0:20:50 you'll notice that the
    • 0:20:51 headings here title and genres perfectly
    • 0:20:53 matches the question that we asked
    • 0:20:55 in the google form that allows us to
    • 0:20:57 therefore line up your responses
    • 0:20:59 with our questions and you can see here
    • 0:21:02 punisher was the first
    • 0:21:03 favorite tv show to be inputted followed
    • 0:21:05 by the office breaking bad new girl
    • 0:21:07 archer
    • 0:21:07 another office and so forth and in the
    • 0:21:10 third column
    • 0:21:11 under genres you'll see that there's
    • 0:21:12 something curious here while some of the
    • 0:21:15 cells that is the little boxes
    • 0:21:17 of text have just single words like
    • 0:21:19 comedy or drama
    • 0:21:20 you'll notice that some of them have a
    • 0:21:22 comma separated list and that comma
    • 0:21:23 separated list is because
    • 0:21:25 some of you checked as you could
    • 0:21:26 multiple check boxes to indicate that
    • 0:21:29 breaking bad is a crime genre drama
    • 0:21:33 and also thriller and so the way google
    • 0:21:35 forms handles this is a bit
    • 0:21:37 lazily in the sense that they just drop
    • 0:21:39 all of those values
    • 0:21:41 as a comma separated list inside of the
    • 0:21:44 spreadsheet itself and that's
    • 0:21:46 potentially a problem if we ultimately
    • 0:21:47 download this as
    • 0:21:48 a csv file comma separated values
    • 0:21:51 because if now you have
    • 0:21:52 commas inside in between the commas
    • 0:21:55 fortunately there's a solution to that
    • 0:21:56 that we'll ultimately see
    • 0:21:58 so we've got a good amount of data here
    • 0:21:59 in fact if i keep scrolling down we'll
    • 0:22:01 see a few hundred responses now
    • 0:22:03 and would be nice to analyze this data
    • 0:22:05 in some way and figure out what the most
    • 0:22:07 popular tv show is
    • 0:22:08 maybe search for new shows i might like
    • 0:22:10 via their genre
    • 0:22:11 so you can imagine some number of
    • 0:22:13 queries that could be answered by way of
    • 0:22:15 this data set but let's first consider
    • 0:22:17 the limitations of leaving this data in
    • 0:22:20 just
    • 0:22:20 a spreadsheet like this all of us are
    • 0:22:23 probably in the habit of using
    • 0:22:24 occasionally google spreadsheets apple
    • 0:22:26 numbers
    • 0:22:27 microsoft excel or some other tool so
    • 0:22:30 let's consider
    • 0:22:31 what spreadsheets are good at and what
    • 0:22:33 they are bad at
    • 0:22:34 would anyone like to volunteer and
    • 0:22:36 answer to the first of those what is a
    • 0:22:38 spreadsheet good at
    • 0:22:39 or good for or not quite sure how to
    • 0:22:42 answer that
    • 0:22:43 what do you use spreadsheets for what
    • 0:22:46 are some useful
    • 0:22:48 problems they solve for us
    • 0:22:52 uh yeah andrew what's your thinking on
    • 0:22:54 spreadsheets
    • 0:22:57 over to andrew park oh hey um they're
    • 0:23:01 very good for quickly sorting okay very
    • 0:23:04 good for quickly sorting i like that i
    • 0:23:06 could click on the top of the title
    • 0:23:07 column for instance and immediately sort
    • 0:23:09 all of those titles
    • 0:23:11 by alphabetically i like that other
    • 0:23:13 reasons to use
    • 0:23:14 a spreadsheet what problems do they
    • 0:23:16 solve what are they good at
    • 0:23:19 other thoughts on spreadsheets yeah how
    • 0:23:20 about peter
    • 0:23:28 storing large amounts of data that you
    • 0:23:30 can later analyze
    • 0:23:31 okay so storing large amounts of data
    • 0:23:33 that you can later analyze it's kind of
    • 0:23:35 a nice model for storing lots of
    • 0:23:37 rows of data so to speak i will say that
    • 0:23:39 there actually is a limit and in fact
    • 0:23:40 back in the day i learned what this
    • 0:23:42 limit is
    • 0:23:43 long story short in graduate school i
    • 0:23:44 was using a spreadsheet to analyze some
    • 0:23:46 research data
    • 0:23:47 and at one point i had more data than
    • 0:23:49 excel
    • 0:23:50 supported rows for specifically i had
    • 0:23:53 some 65,536
    • 0:23:55 rows which was too many at that
    • 0:23:58 point for excel at the time because
    • 0:24:00 long story short if you recall from a
    • 0:24:02 spreadsheet program like google
    • 0:24:03 spreadsheets
    • 0:24:04 every row is numbered from 1 on up well
    • 0:24:06 unfortunately at the time microsoft had
    • 0:24:08 used a 16-bit integer
    • 0:24:10 16 bits or two bytes to represent each
    • 0:24:13 of those numbers
    • 0:24:14 and it turns out the 2 to the 16th power
    • 0:24:16 is roughly 65,000
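As a quick check of that arithmetic, here is a minimal Python sketch. It assumes, as described above, that row numbers were stored in an unsigned 16-bit integer; the variable names are illustrative only.

```python
# Assumption: row numbers live in an unsigned 16-bit (two-byte) integer.
BITS = 16

# Each bit doubles the number of distinct values that can be represented.
distinct_values = 2 ** BITS

# Counting from zero, the largest representable value is one less than that.
largest_value = distinct_values - 1

print(distinct_values)  # 65536
print(largest_value)    # 65535
```

So 65,536 is exactly the number of distinct values such a counter can hold, which lines up with the "roughly 65,000" ceiling mentioned here.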
    • 0:24:18 so at that point i maxed out the total
    • 0:24:20 number of rows now to peter's point
    • 0:24:22 they've increased that in recent years
    • 0:24:24 and you can actually store a lot more
    • 0:24:25 data so spreadsheets are indeed good at
    • 0:24:27 that
    • 0:24:28 but they're not necessarily good at
    • 0:24:30 everything because at some point you're
    • 0:24:31 going to have more data potentially in a
    • 0:24:33 spreadsheet
    • 0:24:34 than your mac or pc can handle in fact
    • 0:24:37 if you're actually trying to build an
    • 0:24:38 application
    • 0:24:39 whether it's twitter or instagram or
    • 0:24:41 facebook or anything
    • 0:24:42 of that scale those companies are
    • 0:24:44 certainly not storing their data suffice
    • 0:24:46 it to say in a spreadsheet because there
    • 0:24:47 would just be way too much data to use
    • 0:24:49 and no one could literally open it
    • 0:24:51 on their computer so we'll need a
    • 0:24:52 solution to that problem of scale
    • 0:24:55 but i don't think we need to throw out
    • 0:24:57 what works well about spreadsheets so
    • 0:24:59 you can store indeed a lot of data in
    • 0:25:01 row form
    • 0:25:02 but it would seem that you can also
    • 0:25:03 store a lot of data in column form and
    • 0:25:06 even though i'm only showing columns a b
    • 0:25:08 and c of course you've probably used
    • 0:25:10 spreadsheets where you add more columns
    • 0:25:11 d
    • 0:25:12 e f and so forth so what's the right
    • 0:25:14 mental model
    • 0:25:15 for how to think about rows versus
    • 0:25:18 columns in a spreadsheet
    • 0:25:20 i feel like we probably use them in a
    • 0:25:23 somewhat
    • 0:25:24 different way conceptually we might
    • 0:25:27 think about them a little differently
    • 0:25:29 what's the difference between rows and
    • 0:25:32 columns in a spreadsheet sophia
    • 0:25:37 um adding more entries like adding more
    • 0:25:40 data is
    • 0:25:41 those are within the rows but then like
    • 0:25:42 the actual attributes or characteristics
    • 0:25:44 of the data should be in columns
    • 0:25:46 exactly when you add more data to the
    • 0:25:47 spreadsheet you should really be adding
    • 0:25:50 to the bottom of it adding more and more
    • 0:25:52 rows so these things sort of grow
    • 0:25:54 vertically even though of course that's
    • 0:25:55 just a human's perception of it
    • 0:25:57 they grow from top to bottom by adding
    • 0:25:59 more and more rows but to sophia's point
    • 0:26:01 your columns represent what we might
    • 0:26:03 call attributes or
    • 0:26:05 fields uh or any other such
    • 0:26:07 characteristic
    • 0:26:08 that kind of is a type of data that
    • 0:26:10 you're storing so in this case of our
    • 0:26:12 form
    • 0:26:12 timestamp is the first column title is
    • 0:26:14 the second column
    • 0:26:16 genres is the third column and those
    • 0:26:17 columns can indeed be thought of as
    • 0:26:19 fields or attributes properties of your
    • 0:26:21 data
    • 0:26:22 and those are properties that you should
    • 0:26:23 really decide on in advance when you're
    • 0:26:25 first creating the form in our case or
    • 0:26:27 when you're manually creating the
    • 0:26:28 spreadsheet in your in
    • 0:26:30 another case you should not really be in
    • 0:26:32 the habit when using spreadsheets
    • 0:26:34 be in the habit of adding data from left
    • 0:26:36 to right adding more and more columns
    • 0:26:38 unless
    • 0:26:39 you decide to collect more types of data
    • 0:26:43 so just because someone adds a new
    • 0:26:45 favorite tv show
    • 0:26:46 to your data set you shouldn't be adding
    • 0:26:48 that from left to right in a new column
    • 0:26:49 you should indeed be adding it from top
    • 0:26:51 to bottom
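The top-to-bottom growth described here can be sketched in Python with the standard `csv` module. This is only an illustration under stated assumptions: the column names mirror the form's headings, the timestamps are placeholders, and the genre given for The Office is assumed rather than taken from the actual responses.

```python
import csv
import io

# The columns (fields) are decided once, up front.
FIELDS = ["timestamp", "title", "genres"]

# An in-memory buffer stands in for a .csv file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()

# Each new response becomes a new row; the set of columns never changes.
responses = [
    {"timestamp": "t1", "title": "The Office", "genres": "Comedy"},
    {"timestamp": "t2", "title": "Breaking Bad", "genres": "Crime, Drama, Thriller"},
]
for response in responses:
    writer.writerow(response)

lines = buffer.getvalue().splitlines()
print(lines[0])        # timestamp,title,genres
print(len(lines) - 1)  # 2
```

Adding a third response would add a third data row, while the header line would stay exactly as it is.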
    • 0:26:52 but suppose that we actually decided to
    • 0:26:54 collect more information from everyone
    • 0:26:56 maybe that form had instead asked you
    • 0:26:57 for your name or your
    • 0:26:59 email address or any other questions
    • 0:27:01 those properties or attributes or fields
    • 0:27:04 would belong as new columns so this is
    • 0:27:06 to say we generally decide on the
    • 0:27:08 layout of our data the schema of our
    • 0:27:11 data in advance
    • 0:27:12 and then from there on out we proceed to
    • 0:27:14 add add add more rows
    • 0:27:16 not columns unless we change our mind
    • 0:27:18 and need to change the schema
    • 0:27:20 of our particular data so it turns out
    • 0:27:23 that spreadsheets are indeed wonderfully
    • 0:27:24 useful to peter's point for you know
    • 0:27:26 large or reasonably large
    • 0:27:28 data sets that we might collect and we
    • 0:27:31 can of course per last week export those
    • 0:27:33 data sets as
    • 0:27:34 csv files and so we can go from a
    • 0:27:36 spreadsheet to a simple text file rep
    • 0:27:39 stored in ascii or unicode more
    • 0:27:41 generally on your own hard drive or
    • 0:27:42 somewhere in the cloud
    • 0:27:44 and you can actually think of that file
    • 0:27:46 that dot csv
    • 0:27:47 file is what we might call a flat file
    • 0:27:50 database
    • 0:27:50 a database is generally speaking a file
    • 0:27:53 that stores data
    • 0:27:54 or it's a program that stores data for
    • 0:27:57 you and all of us have probably thought
    • 0:27:58 about or use databases in some sense
    • 0:28:00 you're probably
    • 0:28:01 uh familiar with the fact that all of
    • 0:28:03 those same big websites google and
    • 0:28:04 twitter and facebook and others
    • 0:28:06 use databases to store our data well
    • 0:28:08 those databases are either just really
    • 0:28:10 big files containing lots of data
    • 0:28:12 or special programs that are storing our
    • 0:28:14 data for us
    • 0:28:15 and a flat file is just referring to the
    • 0:28:17 fact that it really is a very simple
    • 0:28:19 design
    • 0:28:19 in fact years ago decades ago humans
    • 0:28:22 decided
    • 0:28:22 when storing data in simple text files
    • 0:28:25 that if you want to store different
    • 0:28:27 types of data like to sophia's point
    • 0:28:28 different properties or attributes
    • 0:28:30 well let's keep it simple let's just
    • 0:28:32 separate those columns
    • 0:28:33 with commas in our flat file database
    • 0:28:37 aka a csv
    • 0:28:38 you can use other things you can use
    • 0:28:40 tabs there's things called tsvs for tab
    • 0:28:42 separated values
    • 0:28:44 and frankly you can use anything you
    • 0:28:45 want but there is a corner case and
    • 0:28:47 we've already seen a preview of it
    • 0:28:49 what if your actual data has a comma in
    • 0:28:52 it what if the title of your favorite tv
    • 0:28:54 show has a comma
    • 0:28:55 what if google is presuming to store
    • 0:28:57 genres as a comma separated list
    • 0:28:59 bad things can happen if using a csv as
    • 0:29:02 your flat file database
    • 0:29:03 but there's solutions to that and in
    • 0:29:05 fact what the world typically does is
    • 0:29:07 whenever you have
    • 0:29:07 commas inside of your csv file
    • 0:29:10 you just make sure that the whole string
    • 0:29:13 is double quoted
    • 0:29:14 on the far left and far right and
    • 0:29:16 anything inside of double quotes
    • 0:29:17 is not mistaken thereafter as
    • 0:29:20 delineating a column
    • 0:29:22 as the other commas in the file might so
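The double-quoting convention just described can be seen with Python's standard `csv` module. The following is a minimal sketch, not the lecture's own code: the timestamp is a placeholder, and the genres are the ones mentioned earlier for Breaking Bad.

```python
import csv
import io

# One row whose third field itself contains commas.
row = ["t1", "Breaking Bad", "Crime, Drama, Thriller"]

# Write the row to an in-memory buffer instead of a real file.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
line = buffer.getvalue().strip()

# The writer wraps only the field that contains commas in double quotes.
print(line)  # t1,Breaking Bad,"Crime, Drama, Thriller"

# Reading the line back yields three fields, not five.
parsed = next(csv.reader(io.StringIO(line)))
print(len(parsed))  # 3
print(parsed[2])    # Crime, Drama, Thriller
```

Splitting the same line naively on every comma would instead produce five pieces, which is the "bad things can happen" case mentioned above.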
    • 0:29:24 that's all that's meant by a flat file
    • 0:29:26 database and csv is perhaps one of the
    • 0:29:28 most common the most common formats
    • 0:29:30 thereof
    • 0:29:30 if only because all of these programs
    • 0:29:32 like google spreadsheets and excel and
    • 0:29:34 numbers
    • 0:29:34 allow you to save your files as csvs now
    • 0:29:38 long story short those of you who have
    • 0:29:40 used fancier features of spreadsheets
    • 0:29:42 like built-in functions and formulas and
    • 0:29:44 those kinds of things
    • 0:29:45 those are built-in and proprietary to
    • 0:29:47 google spreadsheets and excel
    • 0:29:49 and numbers you cannot use formulas in a
    • 0:29:52 csv
    • 0:29:53 file or a tsv file or in a flat file
    • 0:29:55 database more generally
    • 0:29:57 you can only store static that is
    • 0:29:59 unchanging
    • 0:30:00 values so when you export the data what
    • 0:30:03 you see is what you get and that's why
    • 0:30:05 people use
    • 0:30:06 fancier programs like excel and numbers
    • 0:30:07 and google spreadsheets because you get
    • 0:30:08 more functionality
    • 0:30:10 but if you want to export the data you
    • 0:30:12 can only get indeed the raw
    • 0:30:14 textual data out of it but i dare say
    • 0:30:16 that's going to be okay in fact brian do
    • 0:30:17 you mind if i go ahead and download
    • 0:30:19 this spreadsheet as a csv file now yep
    • 0:30:21 go ahead
    • 0:30:22 all right i'm going to go ahead and
    • 0:30:23 google spreadsheets and go to file
    • 0:30:25 download and you can see a whole bunch
    • 0:30:27 of options pdf
    • 0:30:29 web page comma separated values which is
    • 0:30:32 the one i want so i'm going to indeed go
    • 0:30:33 ahead and choose
    • 0:30:34 csv from this drop-down in spreadsheets
    • 0:30:37 that of course downloaded that file for
    • 0:30:39 me and now i'm going to go ahead and go
    • 0:30:40 into our familiar cs50 ide you'll recall
    • 0:30:42 that last week
    • 0:30:43 i was able to upload a file into the ide
    • 0:30:46 and i'm going to go ahead and do the
    • 0:30:47 same here this week as well
    • 0:30:48 i'm going to go ahead and grab my file
    • 0:30:50 which ended up in my downloads folder on
    • 0:30:52 my particular computer here
    • 0:30:54 and i'm going to go ahead and drag and
    • 0:30:56 drop this
    • 0:30:57 into the ide such that it ends up in my
    • 0:31:01 home directory so to speak so now i have
    • 0:31:03 this file favorite tv shows forms and in
    • 0:31:06 fact if i double click this within the
    • 0:31:07 ide
    • 0:31:08 you'll see familiar data now timestamp
    • 0:31:11 comma
    • 0:31:11 title comma genres is our header row
    • 0:31:14 that contains the names of the
    • 0:31:16 properties or attributes in this file
    • 0:31:18 then we've got our timestamps comma
    • 0:31:20 favorite title
    • 0:31:21 comma and then a comma separated list of
    • 0:31:24 genres and here indeed notice
    • 0:31:26 that google took care to use double
    • 0:31:28 quotes around
    • 0:31:29 any values that themselves had commas so
    • 0:31:31 it's a relatively simple file format
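To make the quoting rule just described concrete, here is a minimal sketch. The rows are invented for illustration (they are not taken from the class's spreadsheet), and `io.StringIO` stands in for the downloaded file. Python's `csv` module treats a double-quoted field as a single value even when it contains commas:

```python
import csv
import io

# Hypothetical sample in the same shape as the exported file:
# a header row, then timestamp, title, and a quoted list of genres.
sample = io.StringIO(
    "Timestamp,title,genres\n"
    "10/19/2020 13:00:00,The Office,Comedy\n"
    '10/19/2020 13:00:05,Breaking Bad,"Crime, Drama"\n'
)

rows = list(csv.reader(sample))

# The quoted field stays together as one value despite its comma.
print(rows[2])  # ['10/19/2020 13:00:05', 'Breaking Bad', 'Crime, Drama']
```

Splitting each line on commas by hand would instead break "Crime, Drama" into two columns, which is one reason to let a CSV parser do the work.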
    • 0:31:33 and i could certainly just kind of skim
    • 0:31:35 through this figuring out who likes the
    • 0:31:37 office who likes breaking bad or other
    • 0:31:38 shows but per last week we now have a
    • 0:31:40 pretty useful programming language at
    • 0:31:42 our disposal python
    • 0:31:43 that could allow us to start
    • 0:31:44 manipulating and analyzing this data
    • 0:31:47 more readily and here to my point last
    • 0:31:49 week about using the right tool for the
    • 0:31:50 job
    • 0:31:51 you could absolutely do everything we're
    • 0:31:53 about to do in all weeks prior of cs50
    • 0:31:56 we could have used c
    • 0:31:57 for what we're about to do but as you
    • 0:31:59 can probably glean c tends to be painful
    • 0:32:01 for certain things like anything
    • 0:32:03 involving string manipulation changing
    • 0:32:06 strings
    • 0:32:07 analyzing strings is just a real pain
    • 0:32:09 right god forbid you had to take this
    • 0:32:10 csv file
    • 0:32:12 and load it all into memory not unlike
    • 0:32:14 your spell checker you would have to be
    • 0:32:15 using malloc
    • 0:32:16 all over the place or realloc or the
    • 0:32:17 like like there's just a lot of heavy
    • 0:32:19 lifting involved in just
    • 0:32:20 analyzing a text file so python does all
    • 0:32:23 of that for us by just giving us more
    • 0:32:25 functions at our disposal
    • 0:32:27 with which to start analyzing and
    • 0:32:29 opening data
    • 0:32:30 so let me go ahead and close this file
    • 0:32:32 let me go ahead and create a new one
    • 0:32:34 called favorites.py wherein i'm going to
    • 0:32:37 start playing with this data set and see
    • 0:32:38 if we can't start
    • 0:32:39 answering some questions about it and
    • 0:32:41 frankly to this day 20 plus years after
    • 0:32:43 learning how to program for the first
    • 0:32:44 time
    • 0:32:45 i myself am very much in the habit when
    • 0:32:47 writing a new program of just starting
    • 0:32:48 simple and not solving the problem i
    • 0:32:50 ultimately want to
    • 0:32:52 but something simpler just as a sort of
    • 0:32:54 proof of concept to make sure i have the
    • 0:32:56 right plumbing
    • 0:32:57 in place so by that i mean this let's go
    • 0:32:59 ahead and write a quick
    • 0:33:00 program that simply opens up this file
    • 0:33:03 the csv file
    • 0:33:04 iterates over it top to bottom and just
    • 0:33:07 prints out each of the titles just as a
    • 0:33:08 quick sanity check that i know what i'm
    • 0:33:10 doing and i have access to the data
    • 0:33:12 therein so let me go ahead and import
    • 0:33:14 csv and then i can do this in a few
    • 0:33:16 different ways but by now you've
    • 0:33:17 probably seen or remembered
    • 0:33:19 by using something like the open command
    • 0:33:21 and the with keyword to sort of open
    • 0:33:24 and eventually automatically close this
    • 0:33:25 file for me this file is called favorite
    • 0:33:28 tv
    • 0:33:28 shows dash form responses
    • 0:33:31 1 dot csv and i'm going to open this up
    • 0:33:34 in read mode
    • 0:33:36 strictly speaking the r is not required
    • 0:33:38 you might see examples online
    • 0:33:39 not including it that's because read is
    • 0:33:41 the default but for parity with c
    • 0:33:43 and fopen i'm going to be explicit and
    • 0:33:45 actually do quote-unquote r
    • 0:33:47 and i'm going to go ahead and give this
    • 0:33:48 a variable name of file so this line
    • 0:33:50 3 here has the effect of opening that
    • 0:33:53 csv file in read-only mode
    • 0:33:55 and creating a variable called file via
    • 0:33:57 which i can reference it
    • 0:33:59 now i'm going to go ahead and use some
    • 0:34:00 of that csv functionality i'm going to
    • 0:34:01 give myself what we keep calling a
    • 0:34:03 reader
    • 0:34:03 which i could call it xyz anything else
    • 0:34:06 but reader kind of describes
    • 0:34:07 what this variable is going to do and
    • 0:34:09 it's going to be the return value of
    • 0:34:11 calling csv.reader on that file
    • 0:34:14 and so essentially the csv library per
    • 0:34:17 last week
    • 0:34:18 has a lot of fancy features built in and
    • 0:34:19 all it needs as input
    • 0:34:21 is an already opened text file and then
    • 0:34:24 it will then wrap that file so to speak
    • 0:34:26 with a whole bunch of more useful
    • 0:34:27 functionality
    • 0:34:28 like the ability to read it column
    • 0:34:31 and row at a time all right now i'm
    • 0:34:33 going to go ahead and
    • 0:34:35 you know what just for now i'm going to
    • 0:34:37 skip the first
    • 0:34:38 row i'm going to skip the first row
    • 0:34:41 because the first row has my headings
    • 0:34:42 timestamp title and genres and i know
    • 0:34:45 what my columns are so i'm just going to
    • 0:34:47 ignore that
    • 0:34:48 line for now and now i'm going to do
    • 0:34:50 this for row
    • 0:34:51 in reader let me go ahead and print out
    • 0:34:54 quite simply row and i only want title
    • 0:34:57 so i think if it's three columns
    • 0:34:59 from left to right it's 0 1 2 so i want
    • 0:35:02 to print out column
    • 0:35:03 bracket 1 which is going to be the
    • 0:35:05 second column zero-indexed
    • 0:35:07 all right let me go ahead and save that
    • 0:35:08 go down to my terminal window
    • 0:35:10 and run python of favorites.py and cross
    • 0:35:13 my fingers
    • 0:35:14 okay voila it looks like it
    • 0:35:16 flew by super fast
    • 0:35:18 but it looks like indeed these are all
    • 0:35:20 of the tv shows that folks have inputted
    • 0:35:22 indeed there's a few hundred if i keep
    • 0:35:24 scrolling up
    • 0:35:24 so it looks like my program is working
    • 0:35:27 but let's improve it just a little bit
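The first version of favorites.py described above can be sketched roughly as follows. The code shown on screen is not part of this transcript, so this is a reconstruction from the narration. To keep the sketch runnable on its own, it first writes a tiny stand-in file with invented rows to a temporary directory, which is an assumption and not something the lecture does:

```python
import csv
import os
import tempfile

# Stand-in for the downloaded export; these rows are invented for illustration.
path = os.path.join(tempfile.mkdtemp(), "Favorite TV Shows - Form Responses 1.csv")
with open(path, "w", newline="") as file:
    file.write("Timestamp,title,genres\n")
    file.write("10/19/2020 13:00:00,The Office,Comedy\n")
    file.write('10/19/2020 13:00:05,Breaking Bad,"Crime, Drama"\n')

# "r" is the default mode, so open(path) would behave the same.
with open(path, "r") as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row (Timestamp, title, genres)
    for row in reader:
        print(row[1])  # column 1 is the title (columns are zero-indexed)
```

With the stand-in data this prints The Office and then Breaking Bad, one per line.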
    • 0:35:29 it turns out that using the csv
    • 0:35:32 reader isn't necessarily the best
    • 0:35:34 approach in python many of you have
    • 0:35:35 already discovered a dict reader a
    • 0:35:37 dictionary reader
    • 0:35:39 which is nice because then you don't
    • 0:35:40 have to know or keep double checking
    • 0:35:42 what number column your data is in you
    • 0:35:44 can instead refer to it by the
    • 0:35:46 header itself so by title quote unquote
    • 0:35:48 or by genres
    • 0:35:50 this is also good because if you or
    • 0:35:52 maybe a colleague are sort of messing
    • 0:35:53 around with the spreadsheet and they
    • 0:35:54 rearrange the columns by dragging them
    • 0:35:56 left or right
    • 0:35:57 any numbers you have used in your code 0
    • 0:36:00 1 2 on up
    • 0:36:01 could suddenly be incorrect if your
    • 0:36:03 colleague has reordered those columns
    • 0:36:05 so using a dictionary reader tends to be
    • 0:36:07 a little more robust because it uses the
    • 0:36:09 titles
    • 0:36:10 not the mere numbers it's still fallible
    • 0:36:12 if someone yourself or someone else
    • 0:36:14 changes the values in that very first
    • 0:36:17 row and renames title or genres then
    • 0:36:19 things are going to break but at that
    • 0:36:21 point
    • 0:36:21 we kind of have to blame you for not
    • 0:36:23 having kept track of your code versus
    • 0:36:24 your data but
    • 0:36:25 still a risk so i'm going to change this
    • 0:36:27 to dictionary reader or dict reader here
    • 0:36:29 and pretty much the rest of my code can
    • 0:36:31 be the same except i don't need this
    • 0:36:32 hack here on line five
    • 0:36:34 i don't need to just skip over to the
    • 0:36:36 next row from the get-go
    • 0:36:38 because i now want the dictionary reader
    • 0:36:40 to handle the process of
    • 0:36:42 reading that first row for me but
    • 0:36:43 otherwise everything else stays the same
    • 0:36:45 except for this last line where now i
    • 0:36:46 think i can now use
    • 0:36:48 row as a dictionary not as a list
    • 0:36:51 per se and print out specifically the
    • 0:36:54 title
    • 0:36:55 from each given row so let me go ahead
    • 0:36:56 and run python of favorites.py again
    • 0:36:59 and voila it looks like i got the same
    • 0:37:01 results several hundred of them but let
    • 0:37:03 me stipulate that it's doing the same
    • 0:37:04 thing if we actually compared both of
    • 0:37:06 those
    • 0:37:07 side by side all right before i forge
    • 0:37:09 ahead now to actually augment this with
    • 0:37:10 new functionality
    • 0:37:12 any questions or confusion on this
    • 0:37:15 python
    • 0:37:15 script we just wrote to open a file wrap
    • 0:37:18 it with a reader or dict reader
    • 0:37:20 and then iterate over the rows one at a
    • 0:37:22 time printing
    • 0:37:24 the titles any questions confusion on
    • 0:37:27 syntax at all it's okay we've only known
    • 0:37:29 or seen python for a week
    • 0:37:30 it's fine if it's still quite new
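The robustness argument for the dictionary reader can be illustrated with a small sketch. The two CSV texts below are invented, `io.StringIO` stands in for an opened file, and the helper name `titles` is hypothetical. Because `csv.DictReader` looks values up by header name, the same code keeps working when the columns are reordered:

```python
import csv
import io

# Two invented exports with the same data but a different column order.
original = "Timestamp,title,genres\n10/19/2020 13:00:00,The Office,Comedy\n"
reordered = "title,genres,Timestamp\nThe Office,Comedy,10/19/2020 13:00:00\n"


def titles(text):
    """Return the title of every row, looked up by header name."""
    reader = csv.DictReader(io.StringIO(text))  # first row supplies the keys
    return [row["title"] for row in reader]


print(titles(original))   # ['The Office']
print(titles(reordered))  # ['The Office']
```

A version that indexed with `row[1]` would return the title for the first text and the genre for the second. Renaming the header itself would still break either version, which is the remaining risk mentioned above.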
    • 0:37:33 anything brian we should
    • 0:37:34 address yeah so why is it that you don't
    • 0:37:38 need to
    • 0:37:38 close the file using the syntax that
    • 0:37:41 you're using right here
    • 0:37:42 really good question last week i more
    • 0:37:44 pedantically used
    • 0:37:46 open on its own and then i later used a
    • 0:37:48 close
    • 0:37:49 function that was associated with the
    • 0:37:51 file that i just opened
    • 0:37:52 now the more pythonic way to do things
    • 0:37:55 if you will
    • 0:37:56 is actually to use this with keyword
    • 0:37:58 which didn't exist in c
    • 0:37:59 and it just tends to be a useful feature
    • 0:38:01 in python whereby if you say
    • 0:38:03 with open dot dot dot it will open the
    • 0:38:06 file for you
    • 0:38:07 then it will remain open so long as your
    • 0:38:10 code is indented inside of that
    • 0:38:11 with keyword's block and as soon as you
    • 0:38:14 get to the end of your program
    • 0:38:16 it will automatically be closed for you
    • 0:38:17 so this is one of these features where
    • 0:38:19 python in some sense
    • 0:38:20 is trying to protect us from ourselves
    • 0:38:23 it's probably pretty common for humans
    • 0:38:24 myself included to forget to close your
    • 0:38:26 file
    • 0:38:27 that can create problems with saving
    • 0:38:28 things permanently it can create memory
    • 0:38:30 leaks as we know from c
    • 0:38:32 so the with keyword just assumes that
    • 0:38:34 i'm not going to be an idiot and forget
    • 0:38:35 to close the file
    • 0:38:36 python is going to do it for me
    • 0:38:38 automatically
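The two styles contrasted in this answer can be compared side by side in a short sketch. The scratch file name is hypothetical. Strictly, the close happens when execution leaves the indented block, which in a script as short as favorites.py coincides with the end of the program:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")  # hypothetical scratch file

# Explicit style: we must remember to call close() ourselves.
file = open(path, "w")
file.write("hello\n")
file.close()

# with style: the file is closed when the indented block ends.
with open(path, "r") as file:
    contents = file.read()
    print(file.closed)  # False: still open inside the block

print(file.closed)  # True: closed automatically once the block is left
```

The `with` form also closes the file if an exception is raised inside the block, which the explicit style does not do unless you add your own error handling.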
    • 0:38:40 other questions or confusions brian
    • 0:38:44 how does dict reader know that title is
    • 0:38:47 the name of the key
    • 0:38:48 inside of the dictionary really good
    • 0:38:50 question too so
    • 0:38:52 it is designed by the authors of the
    • 0:38:54 python language
    • 0:38:55 to look at the very first row in the
    • 0:38:58 file
    • 0:38:59 split it on the commas in that very
    • 0:39:02 first row
    • 0:39:03 and just assume that the first word or
    • 0:39:05 phrase
    • 0:39:06 before the first comma is the name of
    • 0:39:08 the first column
    • 0:39:09 that the second word or phrase after the
    • 0:39:12 first comma
    • 0:39:12 is the uh name of the second column
    • 0:39:16 and so forth so a dict reader just
    • 0:39:18 presumes
    • 0:39:19 as is the convention with csvs that
    • 0:39:21 your first row is going to contain
    • 0:39:23 the headings that you want to use to
    • 0:39:25 refer to those columns if your csv
    • 0:39:27 happens not to have such a heading
    • 0:39:29 whereby it just jumps right in on the
    • 0:39:31 first row to real data then you're not
    • 0:39:33 going to be able to use a dict reader
    • 0:39:34 correctly at least not without some
    • 0:39:36 manual configuration
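One form that manual configuration can take is the `fieldnames` parameter of `csv.DictReader`. The sketch below uses invented, headerless data, with `io.StringIO` standing in for an opened file. The column names passed in are assumptions chosen to match the lecture's spreadsheet:

```python
import csv
import io

# Invented data with no header row: the first line is already real data.
headerless = io.StringIO(
    "10/19/2020 13:00:00,The Office,Comedy\n"
    '10/19/2020 13:00:05,Breaking Bad,"Crime, Drama"\n'
)

# Supplying fieldnames means the first row is treated as data, not as headers.
reader = csv.DictReader(headerless, fieldnames=["Timestamp", "title", "genres"])
titles = [row["title"] for row in reader]
print(titles)  # ['The Office', 'Breaking Bad']
```

Without `fieldnames`, the first data row would be consumed as the header and "The Office" would become a key name instead of a value.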
    • 0:39:40 other questions brian nothing else here
    • 0:39:43 all right so
    • 0:39:44 let's go ahead and now i feel like
    • 0:39:46 there's a whole mess here and you know
    • 0:39:47 some of these shows are pretty popular
    • 0:39:49 and as i'm glancing over this i
    • 0:39:50 definitely see some duplication a whole
    • 0:39:51 bunch of you like
    • 0:39:52 the office a whole bunch of you like
    • 0:39:54 breaking bad game of thrones and a whole
    • 0:39:56 bunch of other shows as well
    • 0:39:57 so it would be nicer i think if we kind
    • 0:39:59 of narrow the scope of our look at this
    • 0:40:01 data by just looking at unique values
    • 0:40:04 you're looking at unique values so
    • 0:40:05 rather than just iterate over the file
    • 0:40:06 top to bottom printing out one title
    • 0:40:08 after another why don't we go ahead and
    • 0:40:10 sort of accumulate all of this data in
    • 0:40:13 some kind of data structure
    • 0:40:14 so that we can throw away duplicate
    • 0:40:16 values and then
    • 0:40:17 only print out the unique titles that
    • 0:40:20 we've accumulated
    • 0:40:21 so i bet we can do this in a few ways
    • 0:40:22 but if we think back to last week's
    • 0:40:24 demonstration of our dictionary
    • 0:40:26 you'll recall that i used what was
    • 0:40:28 called a set and i'm going to go ahead
    • 0:40:29 and create a variable called titles and
    • 0:40:31 set it equal to something called
    • 0:40:33 set and a set is just a collection of
    • 0:40:35 values it's kind of like a list
    • 0:40:37 but it eliminates duplicates for me and
    • 0:40:39 that would seem to be exactly the
    • 0:40:40 characteristic that i want
    • 0:40:42 for this program now instead of printing
    • 0:40:44 each title which is now premature if i
    • 0:40:47 want to first filter out duplicates
    • 0:40:49 i'm going to go ahead and do this i'm
    • 0:40:50 going to go ahead and add to the titles
    • 0:40:53 set using the add function the current
    • 0:40:56 row's
    • 0:40:57 title so again i'm not printing it now
    • 0:40:59 i'm instead
    • 0:41:00 adding to the titles set that particular
    • 0:41:03 title and if it's there already no big
    • 0:41:05 deal the set
    • 0:41:06 data structure in python is going to
    • 0:41:08 throw away the duplicates for me and
    • 0:41:09 it's only going to go ahead and keep the
    • 0:41:11 uniques
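The behavior of `add` described here can be seen in a very small sketch. The list of responses is invented and stands in for the rows read from the CSV file:

```python
titles = set()

# Invented responses; note that "The Office" appears twice.
for title in ["The Office", "Breaking Bad", "The Office"]:
    titles.add(title)  # adding a value that is already present changes nothing

print(len(titles))  # 2
```

In the lecture's program the loop would run over the rows of the dictionary reader and add each row's title instead of iterating over a hard-coded list.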
    • 0:41:11 now at the bottom of my file i need to
    • 0:41:14 do a little more work admittedly
    • 0:41:15 now i have to iterate over the set to
    • 0:41:17 print out only those unique titles so
    • 0:41:19 let me do this for title in
    • 0:41:21 titles go ahead and print out title and
    • 0:41:25 this is where python just gets really
    • 0:41:26 user friendly right you don't have to do
    • 0:41:28 int i get 0 i less than n or whatever
    • 0:41:31 you can just say for title in titles and
    • 0:41:34 if the titles variable
    • 0:41:36 is the type of data structure that you
    • 0:41:38 can iterate over
    • 0:41:39 which it will be if it's a list or if
    • 0:41:42 it's a set
    • 0:41:43 or even if it's a dictionary another
    • 0:41:44 data structure we saw last week in
    • 0:41:46 python
    • 0:41:47 the for loop in python will just know
    • 0:41:48 what to do this will loop over
    • 0:41:50 all of the titles in the titles
    • 0:41:53 set so let me go ahead and save this
    • 0:41:55 file and go ahead now and run python of
    • 0:41:58 favorites.py
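As a small illustration of the point about iterable data structures, the same `for` syntax works over a list, a set, and a dictionary. The variable names and values below are invented for the sketch:

```python
shows_list = ["The Office", "Friends"]
shows_set = {"The Office", "Friends"}
shows_dict = {"The Office": "Comedy", "Friends": "Comedy"}

for title in shows_list:  # a list yields its elements in order
    print(title)

for title in shows_set:  # a set yields its elements in no guaranteed order
    print(title)

for title in shows_dict:  # a dictionary yields its keys
    print(title)
```

Because a set has no guaranteed order, printing straight from it can look jumbled from one run to the next.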
    • 0:41:59 and it looks like yeah the list is
    • 0:42:01 different in some way
    • 0:42:03 but i'm seeing fewer results as i scroll
    • 0:42:06 up definitely fewer than before because
    • 0:42:08 my scroll bar didn't jump nearly
    • 0:42:09 as far down but honestly this is kind of
    • 0:42:11 a mess let's go ahead and sort this
    • 0:42:13 now in c it would have been kind of a
    • 0:42:15 pain to sort things we'd have to whip
    • 0:42:16 out the pseudocode probably for bubble
    • 0:42:18 sort selection sort or god forbid merge
    • 0:42:20 sort and then implement it ourselves
    • 0:42:21 but no with python comes really the
    • 0:42:23 proverbial kitchen sink of functions
    • 0:42:25 so if you want to sort this set you know
    • 0:42:27 what just say you want it sorted
    • 0:42:29 there is a function in python called
    • 0:42:31 sorted that will use one of those better
    • 0:42:34 algorithms maybe it's merge sort maybe
    • 0:42:36 it's something called quick sort maybe
    • 0:42:37 it's something else altogether it's not
    • 0:42:39 going to use a big o of n squared sort
    • 0:42:40 someone at python probably has spent
    • 0:42:42 the time implementing a better sort
    • 0:42:44 for us but it will go ahead and sort the
    • 0:42:46 set for me now let me go ahead and do
    • 0:42:48 this again let me increase the size of
    • 0:42:49 my terminal window
    • 0:42:50 and rerun python of favorites.py okay
    • 0:42:54 and now we have an interesting
    • 0:42:57 assortment of shows that's easier for me
    • 0:42:59 to wrap my mind around
    • 0:43:01 because i have it now sorted here
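A minimal sketch of `sorted` applied to a set follows. The titles are invented sample values, chosen so that one begins with a digit and one is just a period:

```python
titles = {"friends", "the office", "24", ".", "avatar"}  # invented sample

# sorted() accepts any iterable and returns a new list; the set is unchanged.
ordered = sorted(titles)

for title in ordered:
    print(title)

print(ordered)  # ['.', '24', 'avatar', 'friends', 'the office']
```

Strings are compared character by character by code point, so punctuation and digits come before letters. For reference, CPython's built-in sort is Timsort, a hybrid related to merge sort with O(n log n) worst-case running time, so it is indeed not one of the O(n²) algorithms.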
    • 0:43:04 and indeed if i scroll all the way up we
    • 0:43:06 should see all of the shows beginning
    • 0:43:07 with
    • 0:43:08 numbers or a period which might have
    • 0:43:11 just been someone playing around
    • 0:43:12 followed by the a words the b words and
    • 0:43:14 so forth so now it's a little easier to
    • 0:43:15 wrap our minds around this
    • 0:43:17 but something's up i feel like a lot of
    • 0:43:20 you like avatar the last airbender
    • 0:43:22 and yet i'm seeing it indeed four
    • 0:43:25 different times
    • 0:43:26 but i thought we were filtering this
    • 0:43:28 down to uniques
    • 0:43:29 by using that set structure so what's
    • 0:43:32 going on and in fact if i keep scrolling
    • 0:43:34 i'm pretty sure i saw more
    • 0:43:35 duplicates in here uh bojack horseman
    • 0:43:38 breaking bad breaking bad
    • 0:43:40 brooklyn nine-nine brooklyn nine-nine
    • 0:43:42 cs50
    • 0:43:43 in several different flavors um and
    • 0:43:47 yes uh keeps going friends so i see a
    • 0:43:50 lot of duplicate values so what's going
    • 0:43:52 on
    • 0:43:53 where did those come from
    • 0:43:56any thoughts here yeah uh kadana
    • 0:44:02yeah so your current sort is case
    • 0:44:05insensitive
    • 0:44:06this sorry is case sensitive meaning
    • 0:44:08that if someone
    • 0:44:09spells avatar with capital a's in some
    • 0:44:11places
    • 0:44:12then it's going to be a different result
    • 0:44:14each time yeah exactly some of you
    • 0:44:16weren't
    • 0:44:16quite diligent when it came to
    • 0:44:18capitalization and so in fact the
    • 0:44:20reality is as kadana notes that there's
    • 0:44:22differences in capitalization now we've
    • 0:44:24addressed this before in fact when you
    • 0:44:25implemented your own spell checker
    • 0:44:27you had to deal with this already when
    • 0:44:28you were spell checking an arbitrary
    • 0:44:30text
    • 0:44:30some words might be capitalized some
    • 0:44:32might be all lowercase all uppercase and
    • 0:44:34you wanted to tolerate
    • 0:44:35different casings and so we probably
    • 0:44:38solved this by just forcing everything
    • 0:44:39to uppercase or everything to lowercase
    • 0:44:41and doing things therefore case
    • 0:44:43insensitively so give me just a moment
    • 0:44:45here and i'm going to go ahead and make
    • 0:44:46a quick
    • 0:44:47change to my form here and i'm going to
    • 0:44:50go ahead and
    • 0:44:53i'm going to go ahead and change this in
    • 0:44:55such a way
    • 0:44:58so that we have instead
    • 0:45:02let me shrink back my code let's go
    • 0:45:04ahead and change this in such a way that
    • 0:45:06we actually force everything to
    • 0:45:08uppercase or lowercase doesn't really
    • 0:45:09matter which but we need to canonicalize
    • 0:45:11things so to speak in some way
    • 0:45:13and to canonicalize things just means to
    • 0:45:15format
    • 0:45:16all of your data in some standard way so
    • 0:45:18to kadana's point
    • 0:45:19let's just standardize the
    • 0:45:20capitalization of things maybe all
    • 0:45:22uppercase all
    • 0:45:23lowercase we just need to make a
    • 0:45:24judgment call so i'm going to go ahead
    • 0:45:26and make a few tweaks here
    • 0:45:27i'm still going to use a set i'm still
    • 0:45:29going to read the csv
    • 0:45:30as before but instead of just adding the
    • 0:45:32title
    • 0:45:34with row bracket title i'm going to go
    • 0:45:36ahead and force it to
    • 0:45:37uppercase just arbitrarily just for the
    • 0:45:39sake of uniformity
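The change described here might look roughly like the sketch below. It is not the lecture's exact file: the CSV contents are hypothetical and are read from an in-memory string (io.StringIO) so the example runs on its own, whereas the real program would open the survey file with open(). The column name "title" follows the spoken "row bracket title".

```python
import csv
import io

# Hypothetical stand-in for the survey CSV (assumed to have a "title" column).
data = io.StringIO("title\nThe Office\nthe office\nTHE OFFICE\nSherlock\n")

titles = set()
reader = csv.DictReader(data)
for row in reader:
    # Force every title to uppercase so that case variants collapse to one entry.
    titles.add(row["title"].upper())

for title in sorted(titles):
    print(title)
```

Choosing uppercase over lowercase is arbitrary; what matters is that every value goes through the same transformation before it reaches the set.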
    • 0:45:41and i'm also going to be and then let's
    • 0:45:44go ahead and check
    • 0:45:45what exactly has happened here i'm not
    • 0:45:46going to change anything else but let me
    • 0:45:48go ahead and increase the size of my
    • 0:45:49terminal window
    • 0:45:50rerun python of favorites.py and voila
    • 0:45:54it's a little harder to read just
    • 0:45:56because i'm not used to reading all caps
    • 0:45:57kind of looks like we're yelling at
    • 0:45:58ourselves but
    • 0:46:00i don't see wait a minute i still see
    • 0:46:03the office
    • 0:46:04over here twice if i keep scrolling
    • 0:46:08here so far i see strangers things and
    • 0:46:10strange
    • 0:46:11stranger things that just looks like a
    • 0:46:13typo i see two sherlocks though
    • 0:46:16this is a little suspicious so kadana
    • 0:46:19you and i don't seem to have solved
    • 0:46:20things fully
    • 0:46:22and this one's a little more subtle what
    • 0:46:25more should i perhaps do to my data
    • 0:46:28to ensure we get duplicates removed
    • 0:46:34olivia
    • 0:46:39maybe trimmer around the edges and we'll
    • 0:46:41trim around the edges i like the sound
    • 0:46:43of that but
    • 0:46:43what do you mean what does that do oh
    • 0:46:45like trim off the extra spaces in case
    • 0:46:47someone put a space before or after the
    • 0:46:49words
    • 0:46:49yeah exactly it's pretty common for
    • 0:46:52humans intentionally or accidentally to
    • 0:46:53hit the space bar where they shouldn't
    • 0:46:55and in fact i'm kind of inferring that i
    • 0:46:57bet one or more of you
    • 0:46:58accidentally typed sherlock space and
    • 0:47:01then decided nope that's it i'm not
    • 0:47:02typing anything else
    • 0:47:03but that space even though we can't
    • 0:47:05quite see it obviously is there
    • 0:47:07and when we do a string comparison or
    • 0:47:08when the set data structure does that
    • 0:47:10it's actually going to be noticed when
    • 0:47:13doing those comparisons and therefore
    • 0:47:15they're not going to be
    • 0:47:16the same so i can do this in a few
    • 0:47:17different ways but it turns out in
    • 0:47:19python
    • 0:47:19you can chain functions together which
    • 0:47:21is also too kind of a fancy feature
    • 0:47:24notice what i'm doing here i'm still
    • 0:47:25accessing the titles
    • 0:47:27set i'm adding the following value to it
    • 0:47:30i'm adding the value row bracket title
    • 0:47:32but not quite
    • 0:47:33i'm that is a string or an str in python
    • 0:47:36speak
    • 0:47:37i'm going to go ahead and strip it which
    • 0:47:39means if we look up the documentation
    • 0:47:41for this function to olivia's point it's
    • 0:47:43going to strip off or trim
    • 0:47:44all of the white space to the left all
    • 0:47:46of the white space to the right
    • 0:47:47whether that's the space bar or the
    • 0:47:49enter key or the tab
    • 0:47:51character or a few other things as well
    • 0:47:53it's just going to get rid of leading
    • 0:47:55and trailing white space and then
    • 0:47:57whatever's left over
    • 0:47:58i'm going to go ahead and force
    • 0:47:59everything to uppercase in the spirit of
    • 0:48:02kadana's suggestion too so we're sort of
    • 0:48:03combining two good ideas now
    • 0:48:05to really massage the data if you will
    • 0:48:07into a cleaner format and this is such a
    • 0:48:10real world reality
    • 0:48:11like humans you and i cannot be trusted
    • 0:48:13to input data
    • 0:48:14the way we are supposed to sometimes
    • 0:48:16it's all lowercase because we're being a
    • 0:48:18little lazy or a little social
    • 0:48:19media-like even if we're checking out
    • 0:48:21from amazon
    • 0:48:22and trying to input a valid postal
    • 0:48:24address sometimes it's all capitals
    • 0:48:26because
    • 0:48:27i can think of a few people in my life
    • 0:48:28who don't quite understand the caps lock
    • 0:48:30thing just yet and so things might be
    • 0:48:32all capitalized instead
    • 0:48:33this is not good for computer systems
    • 0:48:35that require precision to our emphasis
    • 0:48:38in week zero
    • 0:48:39and so massaging data means cleaning it
    • 0:48:41up doing some mutations
    • 0:48:43that don't really change the meaning of
    • 0:48:44the data but canonicalize it standardize
    • 0:48:47it
    • 0:48:47so that you're comparing apples and
    • 0:48:49apples so to speak not apples and
    • 0:48:51oranges
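As a small illustration of the canonicalization just described, the sketch below chains str.strip() and str.upper() on a few hypothetical raw inputs. The sample strings are invented for the example; only the two method calls come from the lecture.

```python
# Hypothetical raw responses with stray whitespace and inconsistent case.
raw_titles = ["Sherlock", "sherlock ", "  The Office\t", "the office\n"]

titles = set()
for raw in raw_titles:
    # strip() removes leading and trailing whitespace (spaces, tabs, newlines);
    # upper() then standardizes the capitalization of what is left.
    titles.add(raw.strip().upper())

print(sorted(titles))
```

Chaining works because strip() returns a new string, and upper() is then called on that result. Neither method changes the original string.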
    • 0:48:51well let me go ahead and run this again
    • 0:48:53in my bigger terminal window python of
    • 0:48:55favorites.py
    • 0:48:56voila and scrolling up up up i think
    • 0:49:00we're in a better place i only see one
    • 0:49:03office now
    • 0:49:04and if i keep scrolling up and up and up
    • 0:49:06i'm seeing typos still
    • 0:49:08but nothing related to white space and i
    • 0:49:10think we have a much
    • 0:49:11cleaner unique list of titles at this
    • 0:49:14point of course if we scroll up
    • 0:49:16i would have to be a lot more clever if
    • 0:49:19i want to detect things like
    • 0:49:20typographical errors it looks like one
    • 0:49:22of you
    • 0:49:22was very diligent about putting f period
    • 0:49:25r period i
    • 0:49:26period and so forth but then got bored
    • 0:49:28at the end and left off the last period
    • 0:49:29but that's going to happen when you're
    • 0:49:30taking in user input we've of course got
    • 0:49:32all these variants of cs50
    • 0:49:34that's going to be a mess to clean up
    • 0:49:36because now you can imagine having to
    • 0:49:38add a whole bunch of if conditions and
    • 0:49:40elses and elifs to sort of clean all
    • 0:49:42of that up if we do want to canonicalize
    • 0:49:45all different flavors of cs50 as
    • 0:49:46quote-unquote
    • 0:49:48cs50 so this is a very slippery slope
    • 0:49:50like you and i could start writing a
    • 0:49:51huge amount of code just to clean this
    • 0:49:53up
    • 0:49:53but that's the reality when dealing with
    • 0:49:56real world data
    • 0:49:57well let's go ahead now and improve this
    • 0:50:00program further
    • 0:50:01do something a little fancier because i
    • 0:50:04now can trust that my data has been
    • 0:50:06canonicalized except for the actual
    • 0:50:07typos or the weird variants of cs50 and
    • 0:50:09the like
    • 0:50:10let's go ahead and figure out what's the
    • 0:50:12most popular favorite tv show
    • 0:50:15among uh the audience here so i'm going
    • 0:50:17to start where i have before with my
    • 0:50:19current code because i think i have most
    • 0:50:21of the building blocks in place
    • 0:50:23i'm going to go ahead and clean up my
    • 0:50:24code a little bit in here i'm going to
    • 0:50:25go ahead and give myself a separate
    • 0:50:26variable now called title
    • 0:50:28just so that i can think about things in
    • 0:50:30a little more orderly fashion
    • 0:50:32but i'm not going to start adding things
    • 0:50:34to this set anymore
    • 0:50:35in fact a set i don't think is really
    • 0:50:38going to be sufficient
    • 0:50:40to keep track of the popularity of tv
    • 0:50:42shows because by definition the set is
    • 0:50:44throwing away duplicates
    • 0:50:45but the goal now is kind of the opposite
    • 0:50:47i want to know
    • 0:50:48which are the duplicates so that i can
    • 0:50:51tell you that this many people like the
    • 0:50:53office this many people like
    • 0:50:54uh breaking bad and the like so what
    • 0:50:57tools do we have in python's toolkit
    • 0:51:00via which we could accumulate or
    • 0:51:03figure out that information
    • 0:51:06any thoughts on what data structure
    • 0:51:08might help us here
    • 0:51:09if we want to figure out show
    • 0:51:12popularity show popularity and by
    • 0:51:14popularity i just mean the frequency of
    • 0:51:16it
    • 0:51:17in the csv file santiago
    • 0:51:22um i guess um one option could be to use
    • 0:51:25dictionaries so that you could have like
    • 0:51:27the office
    • 0:51:29i don't know 20 votes and then game of
    • 0:51:31thrones another one so that
    • 0:51:33a dictionary could really help you
    • 0:51:35visualize that
    • 0:51:36yeah perfect instincts recall that a
    • 0:51:38dictionary at the end of the day
    • 0:51:40no matter how sophisticated it's
    • 0:51:42implemented underneath the hood like
    • 0:51:43your spell checker
    • 0:51:44it's just a collection of key value
    • 0:51:46pairs and indeed it's
    • 0:51:48maybe one of the most useful data
    • 0:51:50structures in any language because this
    • 0:51:52ability to associate one piece of data
    • 0:51:54with another
    • 0:51:54is just a very general purpose solution
    • 0:51:57to problems and indeed to santiago's
    • 0:51:59point
    • 0:51:59if the problem at hand is to figure out
    • 0:52:01the popularity of shows well let's make
    • 0:52:03the keys the titles of our shows
    • 0:52:06and the frequencies thereof the votes so
    • 0:52:08to speak
    • 0:52:09the values of those keys we're going to
    • 0:52:11map title
    • 0:52:12to votes title to vote title to vote and
    • 0:52:15so forth so a dictionary is exactly that
    • 0:52:17so
    • 0:52:17let me go ahead and scroll up and i can
    • 0:52:18make a little tweak here instead of a
    • 0:52:20set
    • 0:52:20i can instead say dict and give myself
    • 0:52:23just an empty dictionary
    • 0:52:24there's actually shorthand notation for
    • 0:52:26that that's a little more common to use
    • 0:52:28too
    • 0:52:28empty curly braces that just means the
    • 0:52:30exact same thing
    • 0:52:32give me a dictionary that's initially
    • 0:52:33empty there's no fancy shortcut for a
    • 0:52:36set you have to literally type out set
    • 0:52:38open paren close paren but dictionaries
    • 0:52:40are so common so popular so powerful
    • 0:52:43they have this little syntactic shortcut
    • 0:52:45of just two
    • 0:52:46curly braces open and close so now that
    • 0:52:49i have that
    • 0:52:50let me go ahead and do this inside of my
    • 0:52:52for loop
    • 0:52:53instead of printing the title which i
    • 0:52:55don't want to do and instead of adding
    • 0:52:57it to the set
    • 0:52:58i now want to add it to the dictionary
    • 0:52:59so how do i do that well if my
    • 0:53:01dictionary is called titles
    • 0:53:03i think i can essentially do something
    • 0:53:04like this titles bracket
    • 0:53:06title uh equals or maybe plus
    • 0:53:10equals one maybe i can kind of use the
    • 0:53:13dictionary
    • 0:53:14as just a little cheat sheet of counts
    • 0:53:17numbers
    • 0:53:17that start at zero and then just add one
    • 0:53:20add two add three so every time i see
    • 0:53:22the office the office the office do plus
    • 0:53:25equals one
    • 0:53:26plus equals one we can't do plus plus
    • 0:53:28because that's not a thing in python it
    • 0:53:29only exists in c
    • 0:53:30but this would seem to go into the
    • 0:53:32dictionary called titles
    • 0:53:34look up the key that matches this
    • 0:53:37specific title
    • 0:53:38and then increment whatever value is
    • 0:53:40there
    • 0:53:41by one but i'm going to go ahead and run
    • 0:53:44this a little naively here
    • 0:53:46let me go ahead and run python of
    • 0:53:48favorites favorites.py
    • 0:53:49and wow or it broke already on line nine
    • 0:53:52so it's sort of an apt uh choice of show
    • 0:53:55to begin with
    • 0:53:56uh we have a key error with punisher so
    • 0:53:58punisher is bad something bad has just
    • 0:54:01happened but what does that mean
    • 0:54:02a key error is referring to the fact
    • 0:54:04that i tried to access
    • 0:54:06an invalid key in a dictionary this is
    • 0:54:08saying that literally in this line of
    • 0:54:10code here
    • 0:54:11even though titles is a dictionary and
    • 0:54:13even though the value of title
    • 0:54:15singular is quote unquote punisher i'm
    • 0:54:17getting a key error
    • 0:54:18because that title does not yet exist
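The failure described here can be reproduced in a few lines. This is a minimal sketch with a single hypothetical title, not the lecture's full program; the try/except is added only so the example can report the error instead of stopping.

```python
titles = {}          # an empty dictionary, written with the {} shorthand
title = "PUNISHER"   # hypothetical first title read from the file

raised = False
try:
    # += has to read titles[title] before adding 1; the key is not
    # there yet, so Python raises KeyError.
    titles[title] += 1
except KeyError as error:
    raised = True
    print("KeyError:", error)
```

The dictionary is left unchanged after the failed increment, which is why some form of initialization is needed before counting can start.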
    • 0:54:22so even if you're not sure of the python
    • 0:54:23syntax for fixing this problem
    • 0:54:26what's the intuitive solution here
    • 0:54:30i cannot increment the frequency of the
    • 0:54:33punisher
    • 0:54:34because punisher is not in the
    • 0:54:36dictionary it almost feels like a
    • 0:54:38catch-22
    • 0:54:38uh greg
    • 0:54:42i think that you need first of all to
    • 0:54:45create a for loop and maybe assign a
    • 0:54:47value to every
    • 0:54:49thing in the dictionary for example the
    • 0:54:51value zero and then add
    • 0:54:53one yeah so good instincts and here i
    • 0:54:56can use another metaphor i worry we
    • 0:54:57might have a chicken and the egg problem
    • 0:54:59there because i don't think i can go to
    • 0:55:00the top of my code
    • 0:55:01add a loop that initializes all of the
    • 0:55:05values in the dictionary to zero because
    • 0:55:07i would need to know
    • 0:55:08all of the names of the shows at that
    • 0:55:11point now that's fine
    • 0:55:12i think i could take you maybe more
    • 0:55:14literally greg
    • 0:55:15and open up the csv file iterate over it
    • 0:55:19top to bottom and anytime i see a title
    • 0:55:22just initialize it in the dictionary as
    • 0:55:24having a value of zero
    • 0:55:25zero zero then have another for loop
    • 0:55:28maybe reopen the file
    • 0:55:29and do the same and that would work but
    • 0:55:31it's arguably not very
    • 0:55:33efficient it is asymptotically in terms
    • 0:55:35of big o but that would seem to be doing
    • 0:55:37twice as much work
    • 0:55:38iterate over the file once just to
    • 0:55:40initialize everything to zero
    • 0:55:42then iterate over the file a second time
    • 0:55:44just to increment the counts
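For comparison, the two-pass idea described here could be sketched as follows. The CSV contents are hypothetical and held in an in-memory io.StringIO so the example is self-contained; seek(0) stands in for reopening the real file.

```python
import csv
import io

# Hypothetical stand-in for the survey CSV (assumed "title" column).
data = io.StringIO("title\nPunisher\nThe Office\nthe office \n")

titles = {}

# First pass: give every canonicalized title a starting count of zero.
for row in csv.DictReader(data):
    titles[row["title"].strip().upper()] = 0

# Rewind to the start, which plays the role of reopening the file.
data.seek(0)

# Second pass: every key now exists, so += no longer raises KeyError.
for row in csv.DictReader(data):
    titles[row["title"].strip().upper()] += 1

print(titles)
```

Both passes are linear in the number of rows, so the running time is still O(n), but the file is read twice, which is the extra work the lecture wants to avoid.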
    • 0:55:45i think we can do things a little more
    • 0:55:47efficiently
    • 0:55:48i think we can achieve not only
    • 0:55:49correctness but better design any
    • 0:55:51thoughts
    • 0:55:52on how we can still solve this problem
    • 0:55:55without having to iterate over the whole
    • 0:55:57thing twice
    • 0:56:01yeah some of it
    • 0:56:05um i think we can add in an if statement
    • 0:56:08to check if that
    • 0:56:09key is in the dictionary and if it's not
    • 0:56:11then add it and then go ahead and
    • 0:56:13increment the value after
    • 0:56:15nice and we can do exactly that so let's
    • 0:56:17just apply that intuition if the problem
    • 0:56:19is that i'm trying to access a key
    • 0:56:22that does not yet exist well let's just
    • 0:56:24be a little smarter about it and to
    • 0:56:25somehow its point
    • 0:56:26let's check whether the key exists and
    • 0:56:29if it does
    • 0:56:29then increment it but if it does not
    • 0:56:32then and only then to greg's advice
    • 0:56:34initialize it to zero so let me do that
    • 0:56:36let me go ahead and say
    • 0:56:37if title in titles which is the very
    • 0:56:40pythonic
    • 0:56:41beautiful way of asking a question like
    • 0:56:43that way cleaner than in c
    • 0:56:45let me go ahead then and say uh
    • 0:56:48exactly the line from before else though
    • 0:56:51if the
    • 0:56:51that title is not yet in the dictionary
    • 0:56:54called titles
    • 0:56:55well that's okay too i can go ahead and
    • 0:56:57say titles
    • 0:56:58bracket title equals zero
    • 0:57:02so the difference here is that i can bracket title equals zero
    • 0:57:04certainly inc i can certainly so the difference here is that i can
    • 0:57:06index into a dictionary using a key certainly inc i can certainly
    • 0:57:09that doesn't exist if i plan at that index into a dictionary using a key
    • 0:57:12moment to give it a value that doesn't exist if i plan at that
    • 0:57:13that's okay and that has always been moment to give it a value
    • 0:57:15okay since last week that's okay and that has always been
    • 0:57:17but however if i want to go ahead and okay since last week
    • 0:57:20increment the value that's there i'm but however if i want to go ahead and
    • 0:57:22going to go ahead and increment the value that's there i'm
    • 0:57:24do that in this separate line but i did going to go ahead and
    • 0:57:27introduce a bug do that in this separate line but i did
    • 0:57:28i did introduce a bug here i think i introduce a bug
    • 0:57:31need to go one step further logically i did introduce a bug here i think i
    • 0:57:34i don't think i want to initialize this need to go one step further logically
    • 0:57:35to zero i don't think i want to initialize this
    • 0:57:37per se does anyone see a subtle bug in to zero
    • 0:57:41my logic here per se does anyone see a subtle bug in
    • 0:57:44if the title is already in the my logic here
    • 0:57:45dictionary i'm incrementing it by one if the title is already in the
    • 0:57:47otherwise i'm initializing it to zero dictionary i'm incrementing it by one
    • 0:57:51any subtle catches here yeah olivia what
    • 0:57:54do you see
    • 0:58:01i think you should initialize it to one
    • 0:58:02since it's the first instance
    • 0:58:05exactly i should initialize it to one
    • 0:58:06otherwise i'm accidentally overlooking
    • 0:58:08this particular title and i'm going to
    • 0:58:10go ahead and undercount it so i can fix
    • 0:58:12this either by doing this
    • 0:58:14or frankly if you prefer i don't
    • 0:58:16technically need to use an if
    • 0:58:17else i can use just an if by doing
    • 0:58:19something like this instead i could say
    • 0:58:21if
    • 0:58:21title not in titles then i could go
    • 0:58:24ahead and say
    • 0:58:25titles bracket title gets zero and then
    • 0:58:28after that i can
    • 0:58:29blindly so to speak just do this so
    • 0:58:32which one is better i think this second
    • 0:58:34one is maybe a little better in that
    • 0:58:35i'm saving one line of code
    • 0:58:37but it's ensuring with that if condition
    • 0:58:39to someone's advice
    • 0:58:40that i'm not indexing into the titles
    • 0:58:43dictionary
    • 0:58:44until i'm sure that the title is in
    • 0:58:46there so let me go ahead and run this
    • 0:58:48now
    • 0:58:48python of favorites.py enter
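The two counting approaches described above can be sketched as follows. This is a minimal illustration, not the lecture's actual file: the list `sample_titles` and the dictionary names are hypothetical stand-ins for the titles read from the CSV.

```python
# Hypothetical sample data standing in for the titles read from the CSV.
sample_titles = ["the office", "friends", "the office", "game of thrones"]

# Version 1: if/else. A title seen for the first time starts at 1, not 0,
# otherwise every title would be undercounted by one.
counts_if_else = {}
for title in sample_titles:
    if title in counts_if_else:
        counts_if_else[title] += 1
    else:
        counts_if_else[title] = 1

# Version 2: just an if. A missing key is first set to 0, and then the
# increment runs unconditionally for every title.
counts_if_only = {}
for title in sample_titles:
    if title not in counts_if_only:
        counts_if_only[title] = 0
    counts_if_only[title] += 1

print(counts_if_else)
print(counts_if_only)
```

Both versions produce the same counts; the second simply guarantees the key exists before the increment touches it.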
    • 0:58:51and okay it didn't crash so that's good
    • 0:58:54but i'm not yet seeing any useful
    • 0:58:55information
    • 0:58:56but i now have access to a bit more let
    • 0:58:58me scroll down now to the bottom of this
    • 0:59:00program
    • 0:59:01where i have now this loop let me go
    • 0:59:03ahead and print out not just the title
    • 0:59:05but the value of that key in the
    • 0:59:07dictionary
    • 0:59:08by just indexing into it here and you
    • 0:59:10might not have seen this syntax before
    • 0:59:12but with print
    • 0:59:12you can actually pass in multiple
    • 0:59:14arguments and by default print will just
    • 0:59:15separate them with a space for you
    • 0:59:17you can override that behavior and
    • 0:59:18separate them with anything but this is
    • 0:59:20just meant to be a quick and dirty
    • 0:59:21program that prints out titles and now
    • 0:59:23the
    • 0:59:23popularity thereof so let me run this
    • 0:59:25again python favorites.py
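As a small sketch of the print behavior just described: passing several arguments to `print` separates them with a space by default, and the `sep` keyword argument overrides that separator. The dictionary contents here are hypothetical sample data, using counts mentioned in the lecture.

```python
# Hypothetical sample counts; in the lecture these come from the CSV.
titles = {"the office": 26, "big bang theory": 9}

for title in titles:
    # Two arguments: print joins them with a single space by default.
    print(title, titles[title])
    # The sep keyword argument replaces that default separator.
    print(title, titles[title], sep=": ")
```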
    • 0:59:27and voila it's kind of
    • 0:59:30all over the place office super popular
    • 0:59:33with 26 votes there a lot of
    • 0:59:35single votes here a lot of big bang
    • 0:59:38theory has nine
    • 0:59:39you know this is all nice and good but i
    • 0:59:41feel like this is going to take me
    • 0:59:42forever to wrap my mind around which are
    • 0:59:44the most
    • 0:59:44popular shows so of course how would we
    • 0:59:47do this well to the point made earlier
    • 0:59:48with spreadsheets my god in microsoft
    • 0:59:51excel or google spreadsheets or apple
    • 0:59:52numbers you just
    • 0:59:53click the column heading and boom sort
    • 0:59:55it we seem to have lost that capability
    • 0:59:57unless we now do it in code so
    • 0:59:58let me do that for us let me go ahead
    • 1:00:00and go back to my code
    • 1:00:02and it looks like sorted even though
    • 1:00:06it does work on dictionaries is actually
    • 1:00:09sorting by
    • 1:00:10key not by value and here's where our
    • 1:00:13python programming techniques need to
    • 1:00:15get a little more sophisticated and we
    • 1:00:16wanted to introduce another feature here
    • 1:00:18now of
    • 1:00:18python which is going to solve this
    • 1:00:20problem specifically but in a pretty
    • 1:00:22general way
    • 1:00:23so if we read the documentation for
    • 1:00:25sorted uh the sorted
    • 1:00:27function indeed sorts sets by the values
    • 1:00:30therein
    • 1:00:31it sorts lists by the values therein
    • 1:00:33it sorts dictionaries
    • 1:00:35by the keys therein because
    • 1:00:37dictionaries have two pieces of
    • 1:00:39information for every
    • 1:00:40element it has a key and a value not
    • 1:00:42just a value
    • 1:00:43so by default sorted sorts by key so we
    • 1:00:45somehow have to override that behavior
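To make that default behavior concrete, here is a brief sketch. The dictionary holds the three counts quoted later in the lecture; the variable names are illustrative assumptions.

```python
# A set and a list are sorted by the values they contain.
print(sorted({3, 1, 2}))        # [1, 2, 3]
print(sorted(["b", "c", "a"]))  # ['a', 'b', 'c']

# A dictionary is iterated by key, so sorted() orders the keys
# alphabetically and ignores the counts entirely.
titles = {"friends": 27, "the office": 26, "game of thrones": 33}
by_key = sorted(titles)
print(by_key)  # ['friends', 'game of thrones', 'the office']
```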
    • 1:00:48so how can we do this well it turns out
    • 1:00:50that the sorted function
    • 1:00:51takes another optional argument
    • 1:00:54literally called
    • 1:00:55key and the key argument
    • 1:00:58takes as its value the name of a
    • 1:01:00function
    • 1:01:01and this is where things get really
    • 1:01:03interesting if not confusing really
    • 1:01:04quickly
    • 1:01:05it turns out in python you can pass
    • 1:01:08around
    • 1:01:08functions as arguments by way of their
    • 1:01:11name
    • 1:01:11and technically you can do this in c
    • 1:01:13it's a lot more syntactically
    • 1:01:15involved but in python it's very common
    • 1:01:17in javascript it's very common in a lot
    • 1:01:19of languages it's very common
    • 1:01:21to think of functions as first-class
    • 1:01:22objects which is a fancy way of saying
    • 1:01:24you can pass them around just like they
    • 1:01:26are variables themselves
    • 1:01:28we're not calling them yet but you can
    • 1:01:30pass them around by their name
    • 1:01:31so what do i mean by this well i need a
    • 1:01:34function now
    • 1:01:35to sort my dictionary by its value
    • 1:01:39and only i know how to do this and
    • 1:01:41perhaps so let me go ahead and give
    • 1:01:42myself a generic function name just for
    • 1:01:44the moment called f f
    • 1:01:45for function kind of like in math
    • 1:01:47because we're going to get rid of it
    • 1:01:48eventually but let me go ahead and
    • 1:01:49temporarily
    • 1:01:50define a function called f that takes as
    • 1:01:52input a title
    • 1:01:54and then it returns for me the value
    • 1:01:57corresponding to that key so i'm going
    • 1:01:59to go ahead and return
    • 1:02:00titles bracket title so here we have a
    • 1:02:04function
    • 1:02:05whose purpose in life is super simple
    • 1:02:07you give it a title
    • 1:02:08it gives you the count thereof the
    • 1:02:11frequency the popularity thereof by just
    • 1:02:13looking it up
    • 1:02:14in that global dictionary so it's super
    • 1:02:16simple
    • 1:02:17but that's its only purpose in life but
    • 1:02:20now
    • 1:02:20according to the documentation for
    • 1:02:22sorted what it's now going to do because
    • 1:02:25i'm passing in a second argument called
    • 1:02:26key
    • 1:02:27the sorted function rather than just
    • 1:02:30presume you want everything sorted
    • 1:02:32alphabetically by
    • 1:02:33key it's instead going to call
    • 1:02:36that function f on every one
    • 1:02:39of the elements in your dictionary and
    • 1:02:42depending on your
    • 1:02:44answer the return value you give
    • 1:02:47with that f function that will be used
    • 1:02:50instead
    • 1:02:50to determine to determine the actual
    • 1:02:53ordering
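The key function described above can be sketched like this. The dictionary is a hypothetical three-entry stand-in for the full counts, and the name `by_value` is an assumption introduced for illustration.

```python
# Hypothetical sample of the counts built from the CSV.
titles = {"friends": 27, "the office": 26, "game of thrones": 33}


def f(title):
    # Look up the count for a title in the global dictionary.
    return titles[title]


# f is passed by name (no parentheses); sorted() calls it once per key
# and orders the keys by the values f returns, smallest first.
by_value = sorted(titles, key=f)

for title in by_value:
    print(title, titles[title])
```

With this sample data the smallest count prints first, which is why the most popular show ends up at the bottom.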
    • 1:02:54so by default sorted just looks at key
    • 1:02:56what i'm effectively doing with this
    • 1:02:58f function is instead returning the
    • 1:03:01value
    • 1:03:02corresponding to every key and so the
    • 1:03:04logical implication of this even though
    • 1:03:06the syntax is a little
    • 1:03:07new is that this dictionary of titles
    • 1:03:11will now be sorted by value instead of
    • 1:03:14by key
    • 1:03:15because again by default it sorts by key
    • 1:03:17but if i define my own key function
    • 1:03:20and override that behavior to return the
    • 1:03:22corresponding value
    • 1:03:23it's the values the numbers the counts
    • 1:03:26that will actually be used
    • 1:03:27to sort this thing all right let's go
    • 1:03:29ahead and see if that's true in practice
    • 1:03:31let me go ahead and rerun python
    • 1:03:32favorites.py
    • 1:03:33i should see all the titles and voila
    • 1:03:35conveniently the most popular show
    • 1:03:37seems to be game of thrones with 33
    • 1:03:39votes followed by friends
    • 1:03:41with 27 followed by the office with 26
    • 1:03:43and so forth
    • 1:03:44but of course the list is kind of
    • 1:03:46backwards i mean it's convenient that i
    • 1:03:48can see it at the bottom of my screen
    • 1:03:49but really if we're making a list it
    • 1:03:51should really be at the top so how can
    • 1:03:52we override that behavior
    • 1:03:54turns out the sorted function if you
    • 1:03:55read its documentation also takes
    • 1:03:57another
    • 1:03:58optional parameter called reverse and if
    • 1:04:01you set
    • 1:04:01reverse equal to true capital t in
    • 1:04:04python
    • 1:04:04that's going to go ahead and give us now
    • 1:04:08the reverse order of that same sort so
    • 1:04:10let me go ahead and maximize my terminal
    • 1:04:12window
    • 1:04:12rerun it again and voila if i scroll
    • 1:04:15back up to the top it's not
    • 1:04:16alphabetically sorted but if i keep
    • 1:04:18going keep going keep going keep going
    • 1:04:20the numbers are getting bigger and voila
    • 1:04:21now game of thrones with 33
    • 1:04:23is all the way at the top
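A short sketch of the reverse parameter in use, again with a hypothetical three-entry dictionary and an illustrative variable name `most_popular_first`:

```python
# Hypothetical sample of the counts built from the CSV.
titles = {"friends": 27, "the office": 26, "game of thrones": 33}


def f(title):
    # Return the count for a title so sorted() can order by it.
    return titles[title]


# reverse=True (capital T) flips the order, so the largest count is first.
most_popular_first = sorted(titles, key=f, reverse=True)

for title in most_popular_first:
    print(title, titles[title])
```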
    • 1:04:26all right so pretty cool and again the
    • 1:04:29new functionality here in python at
    • 1:04:31least
    • 1:04:31is that we can actually pass in
    • 1:04:33functions to functions
    • 1:04:35and leave it to the latter to call the
    • 1:04:38former
    • 1:04:39so that was complicated just to say but
    • 1:04:41any questions
    • 1:04:42or confusion now on how we are using
    • 1:04:45dictionaries
    • 1:04:46and how we are sorting things in this
    • 1:04:49reverse
    • 1:04:50value-based way
    • 1:04:54any questions or confusion anything in
    • 1:04:55the chat or verbally brian
    • 1:04:58uh looks like all questions are answered
    • 1:05:00here okay
    • 1:05:02then in that case let me point out a
    • 1:05:03common mistake notice that even though
    • 1:05:06f is a function notice that i did not
    • 1:05:08call it
    • 1:05:09there that would be incorrect the reason
    • 1:05:12being
    • 1:05:12we deliberately want to pass the
    • 1:05:14function f into
    • 1:05:17the sorted function so that the sorted
    • 1:05:19function can take it upon itself
    • 1:05:21to call f again and again and again we
    • 1:05:24don't want to just call it once by using
    • 1:05:25the parentheses ourselves
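The mistake described above can be demonstrated with a small sketch. The literal argument `"friends"` is an assumption chosen so the mistaken call has something to pass; the dictionary is hypothetical sample data.

```python
# Hypothetical sample of the counts built from the CSV.
titles = {"friends": 27, "the office": 26, "game of thrones": 33}


def f(title):
    return titles[title]


# Correct: pass the function itself so sorted() can call it for every key.
correct = sorted(titles, key=f)

# Mistake: f("friends") runs f once, right here, and hands sorted() the
# integer 27 instead of a function. sorted() then tries to call that
# integer on each key and fails.
raised_type_error = False
try:
    sorted(titles, key=f("friends"))
except TypeError as error:
    raised_type_error = True
    print("TypeError:", error)
```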
    • 1:05:27we want to just pass it in by name so
    • 1:05:29that the sorted function which comes
    • 1:05:31with python
    • 1:05:32can instead do it for us santiago did
    • 1:05:35you have a question
    • 1:05:38yes i was i was going to ask why didn't
    • 1:05:40we
    • 1:05:41put f of title uh so like why wouldn't
    • 1:05:46i was gonna ask that question
    • 1:05:47specifically oh the with the
    • 1:05:49with the parentheses oh okay perfect so
    • 1:05:52uh because that would call the function
    • 1:05:54once and only once we want
    • 1:05:56sorted to be able to call it again and
    • 1:05:57again now here's actually an example as
    • 1:05:59we've seen in the past
    • 1:06:00of a correct solution this is behaving
    • 1:06:02as i intend a list of sorted titles
    • 1:06:05from top to bottom in order of
    • 1:06:07popularity
    • 1:06:08but it's a little poorly designed
    • 1:06:10because i'm defining this function f
    • 1:06:12whose name in the first place is kind of
    • 1:06:14lame
    • 1:06:14but i'm defining a function only to use
    • 1:06:16it in one place and my god the
    • 1:06:18function's so tiny it just feels like a
    • 1:06:20waste
    • 1:06:21of keystrokes to have defined a new
    • 1:06:23function just to then pass it in
    • 1:06:25so it turns out in python if you have a
    • 1:06:27very short function
    • 1:06:29whose purpose in life is meant to be to
    • 1:06:31solve a local problem just once and
    • 1:06:33that's it
    • 1:06:34and it's short enough that you're pretty
    • 1:06:36sure you can fit it on one line of code
    • 1:06:38without things wrapping and starting to
    • 1:06:39get ugly stylistically
    • 1:06:41it turns out you can actually do this
    • 1:06:43instead
    • 1:06:44you can copy the code that you had in
    • 1:06:46mind like this
    • 1:06:48and instead of actually defining f as a
    • 1:06:51function name
    • 1:06:52you can actually use a special keyword
    • 1:06:53in python called lambda
    • 1:06:55you can specify the name of an argument
    • 1:06:57for your function as before
    • 1:06:59and then you can simply specify the
    • 1:07:01return value
    • 1:07:02thereafter deleting the function itself
    • 1:07:06so to be clear key is still an argument
    • 1:07:09to the sorted function
    • 1:07:11it expects as its value typically the
    • 1:07:14name of a function
    • 1:07:15but if you've decided that this seems
    • 1:07:17like a waste of extra a waste of effort
    • 1:07:18to define a function then pass the
    • 1:07:20function in especially when it's so
    • 1:07:22short
    • 1:07:22you can do it in a one-liner a lambda
    • 1:07:24function is an
    • 1:07:25anonymous function lambda literally says
    • 1:07:28python
    • 1:07:29give me a function i don't care about
    • 1:07:31its name
    • 1:07:32therefore you don't have to choose a
    • 1:07:33name for it but it does care
    • 1:07:35still about its arguments and its return
    • 1:07:38value
    • 1:07:39so it's still up to you to provide zero
    • 1:07:41or more arguments
    • 1:07:42and a return value and notice i've done
    • 1:07:45that i've specified the keyword lambda
    • 1:07:47followed by the name of the argument i
    • 1:07:49want this anonymous
    • 1:07:50nameless function to accept and then i'm
    • 1:07:53specifying the return value
    • 1:07:55and with lambda functions you do not
    • 1:07:57need to specify return
    • 1:07:59whatever you write after the colon is
    • 1:08:01literally what will be returned
    • 1:08:03automatically
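The lambda version described above can be sketched as follows, with the named function f removed. The dictionary and the name `ranked` are hypothetical, introduced only for this illustration.

```python
# Hypothetical sample of the counts built from the CSV.
titles = {"friends": 27, "the office": 26, "game of thrones": 33}

# The lambda takes one argument (title) and returns whatever follows the
# colon; no def, no name, and no return keyword are needed.
ranked = sorted(titles, key=lambda title: titles[title], reverse=True)

for title in ranked:
    print(title, titles[title])
```

The lambda behaves exactly like a one-line named function, but it lives only where it is used.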
    • 1:08:04so again this is a very pythonic thing
    • 1:08:06to do it's kind of a very
    • 1:08:08clever one-liner even though it's a
    • 1:08:10little cryptic to see for the very first
    • 1:08:11time but it allows you to condense your
    • 1:08:13thoughts into a succinct statement that
    • 1:08:15gets the job done
    • 1:08:16so you don't have to start defining more
    • 1:08:18and more functions that you or someone
    • 1:08:19else
    • 1:08:20then need to keep track of
    • 1:08:24all right any questions then on this and
    • 1:08:26i'm pretty sure this is as
    • 1:08:29complex or sophisticated as our python
    • 1:08:31code today will get
    • 1:08:34yeah over to sofia
    • 1:08:38i was wondering why lambda is used as
    • 1:08:41like specifically rather than some other
    • 1:08:42keyword
    • 1:08:44yeah so there's a long history in this
    • 1:08:45and if in fact you take a course on
    • 1:08:47functional programming
    • 1:08:48at harvard it's called cs51 um there's a
    • 1:08:52whole etymology behind keywords like
    • 1:08:53this
    • 1:08:54let me defer that one for another time
    • 1:08:56but indeed not only in python but in
    • 1:08:59other languages as well these things
    • 1:09:00have come to exist called
    • 1:09:02lambda functions so they're actually
    • 1:09:04quite commonplace
    • 1:09:05in other languages as well and so python
    • 1:09:07just adopted the term
    • 1:09:08of art mathematically lambda is often
    • 1:09:12used as a symbol for
    • 1:09:13functions and so they borrowed that same
    • 1:09:15idea in the world of programming
    • 1:09:18all right so seeing no other questions
    • 1:09:20let's go ahead and solve
    • 1:09:21a related problem still with some python
    • 1:09:24but that's going to
    • 1:09:25push up against the limits of efficiency
    • 1:09:28when it comes to storing our data in csv
    • 1:09:31files let me go ahead and start let me
    • 1:09:33go ahead and start fresh
    • 1:09:34in this file favorites.py all of the
    • 1:09:36code i've written thus far though is on
    • 1:09:37the course's website in advance so you
    • 1:09:39can see the incremental
    • 1:09:40improvements i'm going to go ahead and
    • 1:09:41again import csv at the top
    • 1:09:43and now this let's write a program this
    • 1:09:45time that doesn't just
    • 1:09:47automatically open up the csv and
    • 1:09:49analyze it looking for
    • 1:09:51the total popularity of shows let's
    • 1:09:53search for
    • 1:09:54a specific show in the csv and then go
    • 1:09:58ahead and
    • 1:09:58output the popularity thereof and i can
    • 1:10:02do this in a bunch of different ways but
    • 1:10:03i'm going to try to make this as concise
    • 1:10:04as possible
    • 1:10:05i'm first going to ask the user for to
    • 1:10:08input a title
    • 1:10:09i could use cs50's get_string function
    • 1:10:11but recall that it's pretty much the
    • 1:10:12same as python's input function
    • 1:10:14so i'm going to use python's input
    • 1:10:16function today
    • 1:10:18And then I'm going to go ahead and, as
    • 1:10:19before, open up that same CSV,
    • 1:10:21called Favorite TV Shows Form
    • 1:10:24Responses 1.csv, in read-only mode,
    • 1:10:28as a variable called file. I'm then going
    • 1:10:30to give myself a reader, and I'll use a
    • 1:10:32DictReader again so I don't have to
    • 1:10:34worry about
    • 1:10:35knowing which columns things are in,
    • 1:10:36passing in file.
    • 1:10:38And then, let's see: if I only care about
    • 1:10:40one title, I can keep this program
    • 1:10:42simpler. I don't need to figure out
    • 1:10:43the popularity of every show. I just need
    • 1:10:46to figure out the popularity of
    • 1:10:47one show, the title that the human has
    • 1:10:50typed in. So I'm going to go ahead and
    • 1:10:51give myself a very simple int
    • 1:10:53called counter and set it equal to zero.
    • 1:10:55I don't need a whole dictionary. Just one
    • 1:10:57variable suffices now.
    • 1:10:59And I'm going to go ahead and iterate
    • 1:11:00over the rows
    • 1:11:02in the reader as before, and then I'm
    • 1:11:05going to say:
    • 1:11:05if the current row's title
    • 1:11:08equals equals the title the human typed
    • 1:11:11in, let's go ahead and increment counter
    • 1:11:13by one.
    • 1:11:14And it's already initialized, because I
    • 1:11:15did that on line seven, so I think I'm
    • 1:11:17good.
    • 1:11:17And then at the end of this program,
    • 1:11:19let's very simply print out the value of
    • 1:11:21counter. So the purpose of this program
    • 1:11:24is to prompt the user for a title of a
    • 1:11:26show and then just
    • 1:11:28report the popularity thereof by
    • 1:11:30counting the number of instances of it
    • 1:11:32in the file. So let me go ahead and run
    • 1:11:34this with python of favorites.py,
    • 1:11:36Enter. Let me go ahead and type in "The
    • 1:11:39Office", Enter,
    • 1:11:41and 19. Now, I don't remember exactly
    • 1:11:45what the number was, but I remember The
    • 1:11:46Office was more popular than that.
    • 1:11:49I'm pretty sure it was not 19.
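The program as narrated so far can be sketched roughly as follows. This is a reconstruction from the spoken description, not the on-screen code: the column name `title`, the function name `count_exact`, and the inline sample rows are assumptions, and the sample stands in for the class's CSV so that the sketch runs on its own.

```python
import csv
import io

# Hypothetical sample standing in for the class's CSV of responses.
SAMPLE_CSV = "title\nThe Office\nthe office\n  THE OFFICE  \nFriends\n"


def count_exact(file, title):
    """Count rows whose title equals the given title exactly."""
    reader = csv.DictReader(file)
    counter = 0
    for row in reader:
        # Exact comparison: case and stray spaces both matter here.
        if row["title"] == title:
            counter += 1
    return counter


# In the lecture the title comes from input(); it is hard-coded here
# so that the sketch runs without prompting.
print(count_exact(io.StringIO(SAMPLE_CSV), "The Office"))
```

With this sample the count is 1 even though three rows plainly mean the same show, which is the same kind of undercount as the 19 seen in the demo.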
    • 1:11:52Any intuition as to why this program is
    • 1:11:55buggy,
    • 1:11:56or so it would seem? What have I done
    • 1:12:00wrong?
    • 1:12:04Any thoughts in the chat or...
    • 1:12:08A few people in the chat are saying you
    • 1:12:09need to remember to deal with
    • 1:12:11capitalization and whitespace again.
    • 1:12:12Yeah, so we need to practice those same
    • 1:12:15lessons learned from before. So I should
    • 1:12:17really canonicalize the input that the
    • 1:12:19human,
    • 1:12:19I, just typed in, and also the input
    • 1:12:22that's coming from the CSV file. Perhaps
    • 1:12:24the simplest way to do this is, up here,
    • 1:12:25to first strip off leading and trailing
    • 1:12:27whitespace, in case I get a little
    • 1:12:29sloppy and hit the space bar
    • 1:12:30where I shouldn't. And then let's go
    • 1:12:32ahead and force it to uppercase, just
    • 1:12:33because.
    • 1:12:34It doesn't matter if it's upper or lower,
    • 1:12:36but at least we'll standardize things
    • 1:12:37that way.
    • 1:12:38And then when I do this, look at the
    • 1:12:40current row's title,
    • 1:12:42I think I really need to do the same
    • 1:12:43thing. If I'm going to canonicalize one, I
    • 1:12:45need to canonicalize the other, and now
    • 1:12:47compare the all-caps, whitespace-stripped
    • 1:12:51versions of both strings. So now let me
    • 1:12:53rerun it.
    • 1:12:53Now I'm going to type in "the office",
    • 1:12:55Enter, and voila, now I'm at 26, which I
    • 1:12:58think is where we were at before. And in
    • 1:13:00fact, now I, the user, can be a little
    • 1:13:02sloppy. I can say "the office".
    • 1:13:04I can run it again and say "the office"
    • 1:13:06and then, for whatever reason, hit the
    • 1:13:07space bar a lot, Enter.
    • 1:13:09It's still going to work. And indeed,
    • 1:13:11though we seem to be, like, belaboring the
    • 1:13:12pedantic here with
    • 1:13:14trimming off whitespace and so forth,
    • 1:13:16just think: in a relatively small
    • 1:13:17audience here, how many of you
    • 1:13:18accidentally hit the space bar or
    • 1:13:20capitalized things differently? This
    • 1:13:22happens massively at scale, and you can
    • 1:13:24imagine this being
    • 1:13:25important when you're tagging friends in
    • 1:13:27some social media account. You're doing
    • 1:13:29"@"
    • 1:13:29Brian or the like. You don't want to have
    • 1:13:31to require the user to type "@",
    • 1:13:33capital B, lowercase r-i-a-n, and so forth.
    • 1:13:36So tolerating disparate, messy user input
    • 1:13:39is such
    • 1:13:40a common problem to solve, including
    • 1:13:43in today's apps that we all use.
    • 1:13:46All right, any questions then on this
    • 1:13:49program, which I think is correct?
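The fix just described, stripping whitespace and forcing uppercase on both sides of the comparison, might look like the sketch below. As before, the column name `title`, the function name `count_canonical`, and the sample rows are assumptions for illustration rather than the code shown on screen.

```python
import csv
import io

# Hypothetical sample standing in for the class's CSV of responses.
SAMPLE_CSV = "title\nThe Office\nthe office\n  THE OFFICE  \nFriends\n"


def count_canonical(file, title):
    """Count rows matching title, ignoring case and surrounding whitespace."""
    # Canonicalize what the human typed...
    title = title.strip().upper()
    reader = csv.DictReader(file)
    counter = 0
    for row in reader:
        # ...and canonicalize each title from the file in the same way.
        if row["title"].strip().upper() == title:
            counter += 1
    return counter


print(count_canonical(io.StringIO(SAMPLE_CSV), "the office   "))
```

On this sample the three spellings of the same show are now counted together, so the sketch prints 3 where exact comparison would find at most one match.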
    • 1:13:54Then let me ask a question of you. In
    • 1:13:56what sense is this program poorly
    • 1:13:58designed?
    • 1:14:00In what sense is this program poorly
    • 1:14:02designed?
    • 1:14:04This is more subtle, but think about the
    • 1:14:08running time of this program.
    • 1:14:09In terms of big O, what is the running
    • 1:14:12time of this program if the
    • 1:14:14CSV file has n different
    • 1:14:18shows in it, or n different submissions?
    • 1:14:21So n is the variable in question. Yeah,
    • 1:14:24what's the running time, Andrew?
    • 1:14:27It's big O of n, because you're using
    • 1:14:29linear search. Yeah, it's big O of n
    • 1:14:31because I'm literally using linear
    • 1:14:33search by way of the for loop. That's how
    • 1:14:35a for loop works in Python, just like in
    • 1:14:37C:
    • 1:14:37it starts at the beginning and potentially
    • 1:14:38goes all the way to the end. And so I'm
    • 1:14:41using, implicitly, linear search,
    • 1:14:43because I'm not using any fancy data
    • 1:14:44structures, no sets, no dictionaries. I'm
    • 1:14:46just
    • 1:14:47looping from top to bottom. So you can
    • 1:14:49imagine that
    • 1:14:50if we surveyed not just all of the
    • 1:14:51students here in class but maybe
    • 1:14:53everyone on campus or everyone in the
    • 1:14:55world, maybe we're the Internet Movie
    • 1:14:57Database, IMDb,
    • 1:14:58there could be a huge number of votes
    • 1:15:01and a huge number of shows.
    • 1:15:03And so writing a program, whether it's in
    • 1:15:05a terminal window like mine, or maybe on
    • 1:15:07a mobile device, or maybe on a web page
    • 1:15:09for your laptop or desktop,
    • 1:15:11it's probably not the best design to
    • 1:15:13constantly
    • 1:15:15loop over all of the shows in your
    • 1:15:17database from top to bottom
    • 1:15:19just to answer a single question. It
    • 1:15:22would be much nicer to do things in
    • 1:15:23log of n time or in constant time. And
    • 1:15:26thankfully, over the past few weeks, both
    • 1:15:27in C and in Python, we have seen
    • 1:15:29smarter ways to do this, but I'm not
    • 1:15:32practicing
    • 1:15:33what I've preached here. And in fact, at
    • 1:15:36some point,
    • 1:15:37this notion of a flat-file database
    • 1:15:40starts to get too primitive for us. Flat-file
    • 1:15:42databases like CSV files
    • 1:15:44are wonderfully useful when you just
    • 1:15:46want to do something quickly
    • 1:15:48or when you want to download data from
    • 1:15:50some third party like Google in a
    • 1:15:51standard,
    • 1:15:52portable way. Portable means that it can
    • 1:15:54be used by different people on different
    • 1:15:55systems.
    • 1:15:56CSV is about as simple as it gets,
    • 1:15:58because you don't need to own Microsoft
    • 1:15:59Word
    • 1:16:00or Apple Numbers or any particular
    • 1:16:02product. It's just a text file, so you can
    • 1:16:04use any text editing program
    • 1:16:06or any programming language to access it.
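Because a CSV is just text, a program can also read it once into a better in-memory structure instead of rescanning it for every question, which is one of the "smarter ways" alluded to a moment ago. The sketch below builds a dictionary of counts in a single pass, after which each lookup takes roughly constant time. The column name `title`, the function name `build_counts`, and the sample rows are assumptions for illustration.

```python
import csv
import io

# Hypothetical sample standing in for the class's CSV of responses.
SAMPLE_CSV = "title\nThe Office\nthe office\n  THE OFFICE  \nFriends\n"


def build_counts(file):
    """Make one O(n) pass over the file and tally canonicalized titles."""
    counts = {}
    for row in csv.DictReader(file):
        title = row["title"].strip().upper()
        counts[title] = counts.get(title, 0) + 1
    return counts


counts = build_counts(io.StringIO(SAMPLE_CSV))

# Each later question is a dictionary lookup rather than a rescan.
print(counts.get("THE OFFICE", 0))
print(counts.get("FRIENDS", 0))
```

The trade-off is that the whole tally lives in memory and has to be rebuilt whenever the file changes.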
    • 1:16:08But flat-file databases aren't
    • 1:16:10necessarily the best
    • 1:16:12structure to use ultimately for larger
    • 1:16:15data sets,
    • 1:16:16because they don't really lend
    • 1:16:17themselves to more efficient queries. So
    • 1:16:19with CSV files, pretty much at best you have
    • 1:16:21to search
    • 1:16:22top to bottom, left to right. But it turns
    • 1:16:24out that there
    • 1:16:25are better databases out there, generally
    • 1:16:27known as relational databases,
    • 1:16:29that, instead of being files in which you
    • 1:16:31store data, are instead
    • 1:16:33programs in which you store data. Now, to
    • 1:16:36be fair,
    • 1:16:36those programs use a lot of RAM, memory,
    • 1:16:39where they actually store your data,
    • 1:16:41and they do certainly persist your data,
    • 1:16:43they keep it long-term,
    • 1:16:44by storing your data also in files. But
    • 1:16:47between you and your data
    • 1:16:49there is this running program. And if
    • 1:16:50you've ever heard of Oracle or MySQL or
    • 1:16:53Postgres or SQL Server or Microsoft
    • 1:16:55Access or
    • 1:16:56bunches of other popular products, both
    • 1:16:58commercial
    • 1:16:59and free and open source alike,
    • 1:17:01relational databases are so similar in
    • 1:17:03spirit
    • 1:17:04to spreadsheets, but they are implemented
    • 1:17:07in software,
    • 1:17:08and they give us more and more features,
    • 1:17:10and they use more and more data
    • 1:17:11structures so that we can
    • 1:17:12search for data, insert data, delete data,
    • 1:17:15update data
    • 1:17:16much, much more efficiently than we could
    • 1:17:19if just using
    • 1:17:20something like a CSV file. So let's go
    • 1:17:22ahead and take our five-minute break
    • 1:17:23here, and when we come back, we'll look at
    • 1:17:25relational databases and, in turn, a
    • 1:17:27language called SQL.
    • 1:25:22All right, we are back, and the goal at
    • 1:25:25hand now is to transition from these
    • 1:25:27fairly simplistic
    • 1:25:28flat-file databases to a more proper
    • 1:25:30relational database. And relational
    • 1:25:32databases are indeed what power
    • 1:25:34so many of today's mobile applications,
    • 1:25:36web applications, and the like.
    • 1:25:38Now we're beginning to transition to
    • 1:25:39real-world software, with real-world
    • 1:25:41languages at that. And so now
    • 1:25:45let me introduce what we're going to
    • 1:25:46call SQLite. So it turns out that a
    • 1:25:49relational database
    • 1:25:50is a database that stores all of the
    • 1:25:52data still in rows and columns,
    • 1:25:55but it doesn't do so using spreadsheets
    • 1:25:57or sheets.
    • 1:25:58It instead does so using what we're
    • 1:26:00going to call tables. So it's pretty much
    • 1:26:02the same idea,
    • 1:26:03but with tables we get some
    • 1:26:05additional functionality.
    • 1:26:06With those tables, we'll have the ability
    • 1:26:08to search for data,
    • 1:26:10update data, delete data, insert new data,
    • 1:26:13and the like. And these are things that
    • 1:26:14we absolutely
    • 1:26:15can do with spreadsheets. But in the
    • 1:26:17world of spreadsheets, if you want to
    • 1:26:18search for something, it's you, the human,
    • 1:26:20doing it
    • 1:26:20by manually clicking and scrolling,
    • 1:26:22typically. If you want to insert data,
    • 1:26:24it's you, the human,
    • 1:26:25typing it in manually after adding a new
    • 1:26:27row. If you want to delete something, it's
    • 1:26:28you right-clicking or control-clicking
    • 1:26:30and
    • 1:26:30deleting a whole row or updating the
    • 1:26:32individual cells therein.
    • 1:26:34With SQL, Structured Query Language, we
    • 1:26:37have a new programming language that is
    • 1:26:39very often used in conjunction with
    • 1:26:41other programming languages. And so
    • 1:26:43today we'll see SQL used on its own
    • 1:26:45initially,
    • 1:26:46but we'll also see it in the context of
    • 1:26:48a Python program. So a language like
    • 1:26:50Python
    • 1:26:51can itself use SQL to
    • 1:26:54do more powerful things than Python
    • 1:26:56alone could do.
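As a preview of what Python using SQL can look like, here is a minimal sketch that relies on Python's built-in sqlite3 module and an in-memory database. It is an illustration under assumptions, not the lecture's code: the table name `shows` is the one chosen later in the lecture, but the column name `title`, the function name `popularity`, and the sample rows are hypothetical, and the query uses SELECT syntax that has not yet been introduced at this point.

```python
import sqlite3

# Hypothetical in-memory table standing in for the class's responses.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [("The Office",), ("the office",), ("  THE OFFICE  ",), ("Friends",)],
)


def popularity(db, title):
    """Let the database count matches, ignoring case and outer spaces."""
    row = db.execute(
        "SELECT COUNT(*) FROM shows WHERE UPPER(TRIM(title)) = ?",
        (title.strip().upper(),),
    ).fetchone()
    return row[0]


print(popularity(db, "the office"))
```

Here Python only supplies the title and prints the answer; the counting loop from the CSV version is replaced by a single query handed to the database.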
    • 1:26:58So with that said, SQLite is like a
    • 1:27:00light version of SQL. It's a more
    • 1:27:02user-friendly version. It's more portable.
    • 1:27:04It can be used on Macs and PCs and
    • 1:27:06phones and laptops and desktops and
    • 1:27:08servers. But it's incredibly common. In
    • 1:27:09fact, in your iPhone and your Android
    • 1:27:11phone,
    • 1:27:12many of the applications you are running
    • 1:27:14today on your own device
    • 1:27:15are using SQLite underneath the hood.
    • 1:27:17So it isn't a toy language per se.
    • 1:27:20It's instead a relatively simple
    • 1:27:21implementation of a language generally
    • 1:27:23known as SQL. But long story short,
    • 1:27:25there are other implementations of
    • 1:27:27relational databases out there, and I
    • 1:27:29rattled off several of them already:
    • 1:27:30Oracle and MySQL and Postgres and the
    • 1:27:32like. Those all
    • 1:27:34have slightly different flavors or
    • 1:27:36dialects of SQL.
    • 1:27:37So SQL is like a fairly standard
    • 1:27:40language
    • 1:27:40for interacting with databases, but
    • 1:27:42different companies, different
    • 1:27:43communities,
    • 1:27:44have kind of added or subtracted their
    • 1:27:46own preferred features.
    • 1:27:47And so the syntax you use is generally
    • 1:27:50constant across all platforms,
    • 1:27:52but we will standardize for our purposes
    • 1:27:54on SQLite. And indeed, this is what you
    • 1:27:56would use these days in the world of
    • 1:27:57mobile applications,
    • 1:27:59so it's very much germane there. So
    • 1:28:01with SQLite,
    • 1:28:03we're going to have ultimately the
    • 1:28:05ability to
    • 1:28:06query data and update data, delete data,
    • 1:28:08and the like. But to do so, we actually
    • 1:28:10need a program
    • 1:28:11with which to interact with our database.
    • 1:28:13So the way
    • 1:28:14SQLite works is that it stores all of
    • 1:28:17your data still
    • 1:28:18in a file, but it's a binary file now.
    • 1:28:21That is, it's a file containing zeros and
    • 1:28:23ones, and those zeros and ones might
    • 1:28:24represent text, they might represent
    • 1:28:25numbers,
    • 1:28:26but it's a more compact, efficient
    • 1:28:28representation than a mere CSV file
    • 1:28:30would be using ASCII
    • 1:28:32or Unicode. So that's the first
    • 1:28:33difference. SQLite
    • 1:28:35uses a single file, a binary file, to
    • 1:28:38store all of your data. And represented
    • 1:28:40inside of that file, by way of all of
    • 1:28:42those zeros and ones,
    • 1:28:43are the tables to which I alluded before,
    • 1:28:45which are the analog
    • 1:28:46in the database world of sheets or
    • 1:28:49spreadsheets in the spreadsheet world.
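One way to see that a SQLite database really is a single binary file is to create one and peek at its first bytes. The sketch below uses Python's built-in sqlite3 module and a temporary directory; the file name `shows.db` and the one-row table are hypothetical, chosen only for illustration.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "shows.db")  # hypothetical file name

    # Create a tiny database so that the file exists on disk.
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE shows (title TEXT)")
    db.execute("INSERT INTO shows (title) VALUES ('The Office')")
    db.commit()
    db.close()

    # Open the same file as raw bytes rather than as text.
    with open(path, "rb") as file:
        header = file.read(16)

print(header)
```

The file begins with the 16-byte signature `SQLite format 3` followed by a zero byte, and the rest is laid out in SQLite's own binary format rather than as comma-separated lines a text editor could meaningfully display.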
    • 1:28:51So to interact with that binary file,
    • 1:28:54wherein all of your data is stored, we
    • 1:28:56need some kind of user-facing program.
    • 1:28:59And there are many different tools to use,
    • 1:29:00but
    • 1:29:01the standard one that comes with
    • 1:29:04SQLite is called sqlite3, essentially
    • 1:29:06version three
    • 1:29:07of the tool. This is a command-line tool,
    • 1:29:10similar in spirit to any of the commands
    • 1:29:11you've run in a terminal window thus far,
    • 1:29:13that allows you to open up that binary
    • 1:29:15file and interact with all of your
    • 1:29:17tables. Now, here again, we kind of have a
    • 1:29:19chicken-and-egg problem. If I want to
    • 1:29:21use a database
    • 1:29:22but I don't yet have a database, and yet
    • 1:29:24I want to select data from my database,
    • 1:29:26how do I actually load things in?
    • 1:29:27Well, you can load data into a SQLite
    • 1:29:29database in at least two ways.
    • 1:29:31One, which I'll do in a moment: you can
    • 1:29:33just import
    • 1:29:34an existing flat-file database like a
    • 1:29:37CSV.
    • 1:29:38And what you do is you save the CSV on
    • 1:29:40your Mac or PC or your CS50 IDE,
    • 1:29:43you run a special command with sqlite3,
    • 1:29:45and it will just
    • 1:29:46load the CSV into memory, it will figure
    • 1:29:49out where all of the commas are,
    • 1:29:51and it will construct inside of that
    • 1:29:53binary file
    • 1:29:54the corresponding rows and columns using
    • 1:29:57the appropriate zeros and ones
    • 1:29:58to store all of that information. So it
    • 1:30:00just imports it for you automatically.
    • 1:30:03Approach two would be to actually write
    • 1:30:04code in a language like Python or any
    • 1:30:07other
    • 1:30:07that actually manually inserts all of
    • 1:30:10the data
    • 1:30:12into your database, and we'll do that as
    • 1:30:13well. But let's start simple. Let me go
    • 1:30:15ahead and run, for instance, sqlite3.
    • 1:30:18This is pre-installed on CS50 IDE,
    • 1:30:20and it's not that hard to get it up and
    • 1:30:22running on a Mac and PC as well.
    • 1:30:24I'm going to go ahead and run sqlite3 in
    • 1:30:26my terminal window here,
    • 1:30:27and voila, you just see some very simple
    • 1:30:30output.
    • 1:30:31It's telling me to type .help if I
    • 1:30:33want to see some usage hints, but I know
    • 1:30:35most of the commands, and we'll generally
    • 1:30:36give you all of the commands that you
    • 1:30:38might need.
    • 1:30:38In fact, one of the commands that we can
    • 1:30:40use is .mode, and another is .import.
    • 1:30:43So generally you won't use these that
    • 1:30:45frequently. You'll only use them when
    • 1:30:46creating a database for the first time,
    • 1:30:48when you are creating that database from
    • 1:30:51an existing CSV file. And indeed, that's
    • 1:30:53my goal at the moment. Let me take
    • 1:30:54our CSV file containing all of your
    • 1:30:57favorite TV shows
    • 1:30:58and load it into SQLite in a proper
    • 1:31:01relational database
    • 1:31:03so that we can do better than, for
    • 1:31:05instance, big O
    • 1:31:06of n when it comes to searching that
    • 1:31:08data and doing anything else on it.
    • 1:31:10so to do this i have to execute two data and doing anything else on it
    • 1:31:12commands one i need to put sql lite into so to do this i have to execute two
    • 1:31:14csv commands one i need to put sql lite into
    • 1:31:15mode and that's just to distinguish it csv
    • 1:31:17from other flat file formats like mode and that's just to distinguish it
    • 1:31:18tsv for tabs or some other format and from other flat file formats like
    • 1:31:21now i'm going to go ahead and run tsv for tabs or some other format and
    • 1:31:22import then i have to specify the name now i'm going to go ahead and run
    • 1:31:25of the file to import which is the csv import then i have to specify the name
    • 1:31:27and i'm going to go ahead and call my of the file to import which is the csv
    • 1:31:29table shows and i'm going to go ahead and call my
    • 1:31:31so dot import takes two arguments the table shows
    • 1:31:34name of the file that you want to import so dot import takes two arguments the
    • 1:31:36and the name of the table that you want name of the file that you want to import
    • 1:31:38to create and the name of the table that you want
    • 1:31:39out of that file and again tables have to create
    • 1:31:41rows and columns out of that file and again tables have
    • 1:31:42and the commas in the file are going to rows and columns
    • 1:31:45delineate and the commas in the file are going to
    • 1:31:45where those columns begin and end i'm delineate
    • 1:31:48going to go ahead and hit enter where those columns begin and end i'm
    • 1:31:49it looks like it flew by pretty fast going to go ahead and hit enter
    • 1:31:52nothing it looks like it flew by pretty fast
    • 1:31:52seems to have happened but i think nothing
    • 1:31:54that's okay seems to have happened but i think
    • 1:31:55because now we're going to go ahead and that's okay
    • 1:31:57have the ability to actually manipulate because now we're going to go ahead and
    • 1:31:59that data have the ability to actually manipulate
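The import step described above can be approximated in Python, the language used earlier in the lecture. This is a minimal sketch and not what sqlite3's dot import does internally: the sample rows, the in-memory database, and the helper name import_csv are all assumptions made for illustration. It mirrors the idea that the first CSV line supplies the column names and every column is treated as text.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the real CSV of favorite shows.
SAMPLE_CSV = """Timestamp,title,genres
10/19/2020 13:00:01,The Office,Comedy
10/19/2020 13:00:05,friends,Comedy
10/19/2020 13:00:09,Breaking Bad,"Crime, Drama"
"""


def import_csv(db, csv_file, table):
    """Create `table` from the CSV header row and load every data row into it."""
    reader = csv.reader(csv_file)
    header = next(reader)  # the first line of the CSV supplies the column names
    columns = ", ".join(f'"{name}" TEXT' for name in header)  # every column is TEXT
    db.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    db.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


db = sqlite3.connect(":memory:")  # throwaway in-memory database (an assumption)
import_csv(db, io.StringIO(SAMPLE_CSV), "shows")
row_count = db.execute("SELECT COUNT(*) FROM shows").fetchone()[0]
column_names = [row[1] for row in db.execute("PRAGMA table_info(shows)")]
print(row_count, column_names)
```

The commas inside the quoted genres value are handled by the csv module, so they do not split that value into extra columns.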
    • 1:32:00but how do we manipulate the data we
    • 1:32:02need a new language
    • 1:32:03sql structured query language is the
    • 1:32:06language
    • 1:32:07used by sqlite and oracle and mysql
    • 1:32:10and postgres and bunches of other
    • 1:32:12products whose names you don't need to
    • 1:32:13know or remember anytime soon
    • 1:32:15but sql is the language we'll use to
    • 1:32:18query the database
    • 1:32:20for information and do something with it
    • 1:32:22generally speaking
    • 1:32:23a relational database and in turn sql
    • 1:32:26which is a language by
    • 1:32:27which you can interact with relational
    • 1:32:29databases support four
    • 1:32:31fundamental operations and they're sort
    • 1:32:32of a crude acronym
    • 1:32:34pun intended that is just helpful for
    • 1:32:37remembering what those fundamental
    • 1:32:38operations are with relational databases
    • 1:32:40crud
    • 1:32:42stands for create read
    • 1:32:45update and delete and indeed the acronym
    • 1:32:48is crud crud
    • 1:32:49so it helps you remember that the four
    • 1:32:51basic operations supported by any
    • 1:32:52relational database are create
    • 1:32:54read update delete create means to
    • 1:32:57create or add new data
    • 1:32:58read means to access and load into
    • 1:33:01memory
    • 1:33:01new data we've seen read before with
    • 1:33:03opening files update and delete mean
    • 1:33:05exactly that as well if you want to
    • 1:33:07manipulate the data in your data set
    • 1:33:09now those are generic terms for any
    • 1:33:11relational database those are the four
    • 1:33:12properties typically supported by any
    • 1:33:14relational database
    • 1:33:15in the world of sql there are some very
    • 1:33:18specific
    • 1:33:19commands or functions if you will that
    • 1:33:22implement
    • 1:33:22those four functionalities
    • 1:33:26they are create and insert achieve the
    • 1:33:29same
    • 1:33:29thing as create more generally the
    • 1:33:31keyword select
    • 1:33:32is what's used to read data from a
    • 1:33:34database update and delete are the same
    • 1:33:37so it's kind of an annoying
    • 1:33:38inconsistency the acronym or the term of
    • 1:33:40art is crud
    • 1:33:41create read update delete but in the
    • 1:33:42world of sql the authors of the language
    • 1:33:44decided to
    • 1:33:45implement those four ideas by way of
    • 1:33:47these five keywords
    • 1:33:49or functions or commands if you will in
    • 1:33:52the language sql so what you are looking
    • 1:33:53at
    • 1:33:54are five of the keywords you can use
    • 1:33:57in this new
    • 1:33:57language called sql to actually do
    • 1:33:59something
    • 1:34:00with your database now what does that
    • 1:34:02mean well suppose that you wanted to
    • 1:34:04manually create a database for the very
    • 1:34:05first time
    • 1:34:06what do you do well back in the world of
    • 1:34:08spreadsheets it's pretty straightforward
    • 1:34:09right you like open up google
    • 1:34:11spreadsheets you go to like file new or
    • 1:34:13whatever
    • 1:34:14and then you just voila you get a new
    • 1:34:15spreadsheet into which you can start
    • 1:34:16creating rows and columns and the like
    • 1:34:18microsoft excel apple numbers same thing
    • 1:34:21file menu new spreadsheet or whatever
    • 1:34:23and boom you have a new spreadsheet now
    • 1:34:25in the world of sql
    • 1:34:26sql databases are generally meant to be
    • 1:34:28interacted with
    • 1:34:29code however there are graphical user
    • 1:34:32interfaces guis by which you can
    • 1:34:34interact with them as well but we're
    • 1:34:35going to use code today to do so
    • 1:34:37and programs at a command line it turns
    • 1:34:39out that you can
    • 1:34:42create tables programmatically by
    • 1:34:45running a command like this
    • 1:34:47so if you literally type out syntax
    • 1:34:49along the lines of create
    • 1:34:50table then the name of your table
    • 1:34:53indicated here in lowercase
    • 1:34:54then a parenthesis then the name of your
    • 1:34:57column that you want to create
    • 1:34:59and the type of that column a la
    • 1:35:02c and then comma dot dot dot some more
    • 1:35:05columns
    • 1:35:05this is generally speaking the syntax
    • 1:35:08you will use
    • 1:35:09to create in this language called sql a
    • 1:35:11new table now this is in the abstract
    • 1:35:13again
    • 1:35:14like table in lowercase is meant to
    • 1:35:16represent the name you want to give your
    • 1:35:17actual table
    • 1:35:18column in lowercase is meant to be the
    • 1:35:20name you want to give to your own column
    • 1:35:21maybe it's title maybe genres and
    • 1:35:23dot dot dot just means of course you can
    • 1:35:25have even more columns than that but
    • 1:35:26literally in a moment if i were to type
    • 1:35:28in
    • 1:35:29this kind of command into the terminal
    • 1:35:31window after running the sqlite3
    • 1:35:33program
    • 1:35:34i could start creating one or more
    • 1:35:36tables for myself
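As a concrete instance of the abstract create table syntax just described, here is a small sketch that issues the statement from Python's built-in sqlite3 module rather than from the sqlite3 prompt. The table name, the two columns, and the in-memory database are assumptions chosen to match the running example.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # throwaway in-memory database (an assumption)

# General shape: CREATE TABLE table (column type, ...);
# Here "shows" is the table name and title/genres are the columns, each of type TEXT.
db.execute("CREATE TABLE shows (title TEXT, genres TEXT)")

# sqlite_master is SQLite's built-in catalog of the tables that exist.
tables = [row[0] for row in db.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)
```

The keywords are written in capitals and the table and column names in lowercase purely as a style choice.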
    • 1:35:37and in fact that's what already happened
    • 1:35:39for me this dot import command which is
    • 1:35:42not part of sql
    • 1:35:43this is like the equivalent of a menu
    • 1:35:45option in excel or google spreadsheets
    • 1:35:47dot import just automates a certain
    • 1:35:49process for me and what it did for me is
    • 1:35:51this
    • 1:35:52if i type now dot schema which is
    • 1:35:54another sqlite-specific command
    • 1:35:56anything that starts with a dot
    • 1:35:57is specific only to sqlite3 this
    • 1:36:00terminal window program
    • 1:36:02notice what's outputted is this by
    • 1:36:04running dot import
    • 1:36:06that automatically for me created a
    • 1:36:08table
    • 1:36:09in my database called
    • 1:36:12shows and it gave it three columns
    • 1:36:15timestamp
    • 1:36:16title and genres where did those column
    • 1:36:18names come from
    • 1:36:19well they came from the very
    • 1:36:21first line in the csv
    • 1:36:22and they all looked like text so the
    • 1:36:25type of those values was just
    • 1:36:27assumed to be text text text now to be
    • 1:36:30clear
    • 1:36:31i could have manually typed this out
    • 1:36:32created these three columns in a new
    • 1:36:34table called shows for me
    • 1:36:36but again the dot import command just
    • 1:36:37automated that from a csv
    • 1:36:39but the sql is what we see here create
    • 1:36:42table shows
    • 1:36:43and so forth so that is to say now
    • 1:36:46in this database there is a file
    • 1:36:50or rather there is a table called
    • 1:36:53shows inside of which is all of the data
    • 1:36:56from that csv
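Dot schema only exists inside the sqlite3 command-line program, but the same column information can be read from code. The sketch below assumes a shows table with the three text columns named in the lecture and uses SQLite's PRAGMA table_info, which is a different mechanism from dot schema, to list each column's name and declared type.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # throwaway in-memory database (an assumption)

# Assumed to mirror the table that the import created: three TEXT columns.
db.execute("CREATE TABLE shows (Timestamp TEXT, title TEXT, genres TEXT)")

# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in db.execute("PRAGMA table_info(shows)")]
for name, declared_type in columns:
    print(name, declared_type)
```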
    • 1:36:57how do i actually get at that data well
    • 1:36:59it turns out there's other commands
    • 1:37:00recall not just create
    • 1:37:02but also select it turns out select is
    • 1:37:05the equivalent of
    • 1:37:06read getting data from the database and
    • 1:37:08this one's pretty powerful
    • 1:37:10and the reason that so many data
    • 1:37:11scientists and statisticians
    • 1:37:13use and like using languages like sql
    • 1:37:15is that they make it relatively easy to just get
    • 1:37:17data and filter that data and analyze
    • 1:37:19that data
    • 1:37:20using new syntax for us today but
    • 1:37:23relatively simple syntax relative to
    • 1:37:24other things we've seen
    • 1:37:26the select command in sql lets you
    • 1:37:28select one or more columns
    • 1:37:31from your table by the given name
    • 1:37:34so we'll see this now in just a moment
    • 1:37:37here how might i go about doing this
    • 1:37:38well let me go ahead and now
    • 1:37:41at my prompt after just clearing the
    • 1:37:42window to keep things neat let me try
    • 1:37:44this out
    • 1:37:44let me go ahead and select
    • 1:37:48uh let's say title from
    • 1:37:51shows semicolon so why am i doing this
    • 1:37:54well again the conventional format for
    • 1:37:56the select command
    • 1:37:57is to say select then the name of one or
    • 1:37:59more columns
    • 1:38:00then literally the preposition from and
    • 1:38:02then the name of the table
    • 1:38:04from which you want to select that data
    • 1:38:06so if my table is called
    • 1:38:08shows and the column is called title it
    • 1:38:11stands to reason that select title from
    • 1:38:13shows
    • 1:38:14should give me back the data i want now
    • 1:38:15notice a couple of stylistic choices
    • 1:38:17that aren't strictly required but are
    • 1:38:19good style
    • 1:38:20conventionally i would capitalize any
    • 1:38:22sql
    • 1:38:23keywords including select and from in
    • 1:38:25this case
    • 1:38:26and then lowercase anything that's a
    • 1:38:28column name
    • 1:38:29or a table name assuming you created
    • 1:38:32those columns and tables
    • 1:38:34in lowercase in fact there's different
    • 1:38:35conventions out there some people will
    • 1:38:36uppercase some people use something
    • 1:38:38called camel case or snake case or the
    • 1:38:40like
    • 1:38:40but generally speaking i would encourage
    • 1:38:42all caps for sql syntax
    • 1:38:44and lowercase for the column and table
    • 1:38:46names i'm going to go ahead now and hit
    • 1:38:47enter
    • 1:38:48and voila we see rapidly a whole list of
    • 1:38:51values
    • 1:38:52outputted from the database and if you
    • 1:38:54think way back
    • 1:38:56you might recognize that this
    • 1:38:58actually happens to be the same order
    • 1:39:00as before because the csv file was
    • 1:39:03loaded top to bottom
    • 1:39:04into this same database table and so
    • 1:39:06what we're seeing in fact is all of that
    • 1:39:08same data
    • 1:39:09duplicates and miscapitalizations and
    • 1:39:11weird spacing
    • 1:39:12and all but suppose i want to see all of
    • 1:39:15the data from the csv
    • 1:39:16well it turns out you can select
    • 1:39:17multiple columns you can select not only
    • 1:39:19title but maybe timestamp was of
    • 1:39:21interest and this one admittedly was
    • 1:39:23capitalized because that's what it was
    • 1:39:24in the spreadsheet
    • 1:39:26that was not something i chose manually
    • 1:39:27so if i just use a comma separated list
    • 1:39:29of column names notice what i can do now
    • 1:39:31it's a little hard to see for us humans
    • 1:39:33because there's a lot going on now
    • 1:39:35but notice that in double quotes on the
    • 1:39:37left there are all of the timestamps
    • 1:39:40which represent the time at which you
    • 1:39:41all submitted your favorite shows and on
    • 1:39:43the right of the comma there's another
    • 1:39:45quoted string
    • 1:39:47that is the title of the show that you
    • 1:39:48liked although sqlite omits
    • 1:39:50the quotes if it's just a single word
    • 1:39:52like friends just by convention
    • 1:39:54you know in fact if i want to get all of
    • 1:39:56the columns turns out there's some
    • 1:39:57shorthand syntax for that
    • 1:39:59star is the so-called wildcard operator
    • 1:40:01and it will get me
    • 1:40:02all of the columns from left to right in
    • 1:40:04my table and voila now i see all of the
    • 1:40:07data
    • 1:40:07including all of the genres
    • 1:40:10as well so now i effectively have three
    • 1:40:13columns being outputted
    • 1:40:14all at once here well this is not that
    • 1:40:18useful
    • 1:40:18thus far in fact all i've been doing is
    • 1:40:20really just outputting the contents of
    • 1:40:21the csv
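The three select forms covered so far (one column, a comma-separated list of columns, and the star wildcard) can be tried side by side. This is a sketch using Python's sqlite3 module; the three sample rows are invented for illustration and are not the class's actual submissions.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # throwaway in-memory database (an assumption)
db.execute("CREATE TABLE shows (Timestamp TEXT, title TEXT, genres TEXT)")
db.executemany("INSERT INTO shows VALUES (?, ?, ?)", [
    ("10/19/2020 13:00:01", "The Office", "Comedy"),       # hypothetical rows
    ("10/19/2020 13:00:05", "friends", "Comedy"),
    ("10/19/2020 13:00:09", "Breaking Bad", "Crime, Drama"),
])

# One column: SELECT column FROM table
titles = [row[0] for row in db.execute("SELECT title FROM shows")]

# Several columns: a comma-separated list of column names
pairs = db.execute("SELECT Timestamp, title FROM shows").fetchall()

# Every column, left to right: the * wildcard
everything = db.execute("SELECT * FROM shows").fetchall()

print(titles)
print(pairs)
print(everything)
```

Each row comes back as a tuple whose length matches the number of columns selected.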
    • 1:40:22but sql is powerful because it comes
    • 1:40:24with other features right out of the box
    • 1:40:26somewhat similar in spirit to functions
    • 1:40:28that are built into google spreadsheets
    • 1:40:30and excel
    • 1:40:31but now we can use them ultimately in
    • 1:40:32our own code so functions like average
    • 1:40:35count distinct lower max min
    • 1:40:37and upper and bunches more these are all
    • 1:40:39functions built into sql
    • 1:40:41that you can use as part of your query
    • 1:40:43to sort of
    • 1:40:44alter the data as it's coming back from
    • 1:40:47the database not permanently but as it's
    • 1:40:49coming back to you
    • 1:40:50so that it's in a format you actually
    • 1:40:51care about so for instance one of my
    • 1:40:53goals earlier was to get back just the
    • 1:40:55distinct
    • 1:40:56the unique titles and we had to write
    • 1:40:58all that annoying code
    • 1:40:59using a set and then add things to the
    • 1:41:01set and then loop over it again right
    • 1:41:02like
    • 1:41:03that was not a huge amount of code but
    • 1:41:04it definitely took us what five ten
    • 1:41:06minutes to get the job done at least
    • 1:41:08in sql you can do all of that in one
    • 1:41:10breath i'm going to go ahead now and do
    • 1:41:12this
    • 1:41:13select not just title from
    • 1:41:16shows let me go ahead and select
    • 1:41:19distinct
    • 1:41:20title from shows so distinct again is an
    • 1:41:23available function in sql
    • 1:41:25that does what the name says it's going
    • 1:41:26to filter out all of the titles to just
    • 1:41:28give me the distinct ones back so if i
    • 1:41:30hit enter now
    • 1:41:31you'll see a similarly messy list but
    • 1:41:35including no idea someone that doesn't
    • 1:41:37watch tv including an unsorted list
    • 1:41:40of those titles so i think we can
    • 1:41:42probably start to clean this thing up as
    • 1:41:44we did before
    • 1:41:45let me go ahead and now select not just
    • 1:41:47distinct but let me go ahead and
    • 1:41:49uppercase everything as well
    • 1:41:51and i can use upper as another function
    • 1:41:53and notice i'm just nesting things like
    • 1:41:55the output of one function as we've seen
    • 1:41:56in many languages now can be the input
    • 1:41:58to another
    • 1:41:59let me hit enter now and now it's
    • 1:42:01getting a little
    • 1:42:02more canonicalized so to speak because
    • 1:42:04i'm using capitalization for everything
    • 1:42:06but it would seem that things still
    • 1:42:09aren't really
    • 1:42:10sorted it's just the same order in which
    • 1:42:12you inputted them but without
    • 1:42:14duplicates this time so it turns out
    • 1:42:16that sql has
    • 1:42:17other syntax that we can use to make our
    • 1:42:20queries more precise and more powerful
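To see why uppercasing matters before deduplicating, the sketch below runs both queries on a few invented titles that differ only in capitalization. The rows are hypothetical; the queries are the ones dictated above, sent through Python's sqlite3 module.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # throwaway in-memory database (an assumption)
db.execute("CREATE TABLE shows (title TEXT)")
db.executemany("INSERT INTO shows VALUES (?)", [
    ("The Office",), ("the office",), ("The Office",), ("Friends",),  # hypothetical rows
])

# DISTINCT alone removes only exact duplicates, so both capitalizations survive.
distinct_titles = {row[0] for row in db.execute("SELECT DISTINCT title FROM shows")}

# Uppercasing first makes the two spellings identical, so DISTINCT collapses them.
distinct_upper = {row[0] for row in db.execute("SELECT DISTINCT UPPER(title) FROM shows")}

print(distinct_titles)
print(distinct_upper)
```

Neither query changes what is stored in the table; the uppercasing applies only to the values coming back.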
    • 1:42:23so in addition to these kinds of
    • 1:42:24functions that you can use to alter the
    • 1:42:26data that's being shown to you and
    • 1:42:28coming back
    • 1:42:28you can also use these kinds of clauses
    • 1:42:31or syntax in sql queries
    • 1:42:33you can say where which is the
    • 1:42:35equivalent of a condition
    • 1:42:36you can say select all of this data
    • 1:42:38where something is true
    • 1:42:40or false you can say like where you can
    • 1:42:42say give me data that isn't
    • 1:42:44exactly this but is like this
    • 1:42:46you can order the data by some column
    • 1:42:48you can limit the number of
    • 1:42:49rows that come back and you can group
    • 1:42:52identical values together in some way
    • 1:42:54so let's see a few examples of this let
    • 1:42:56me go back here
    • 1:42:57and play around now with uh how about
    • 1:42:59the office that was the one we looked at
    • 1:43:01earlier so let me go ahead and select
    • 1:43:02title from
    • 1:43:03shows where title equals
    • 1:43:07the office quote unquote semicolon so
    • 1:43:10i've added this
    • 1:43:10where predicate so to speak where title
    • 1:43:14equals quote unquote the office so sql
    • 1:43:16is nice similar in spirit to python
    • 1:43:18it's more user-friendly perhaps than c
    • 1:43:20where everything kinda sort of reads
    • 1:43:22like an english sentence even though
    • 1:43:23it's a little more
    • 1:43:24uh precise and it's a little more
    • 1:43:26succinct let me go ahead and hit enter
    • 1:43:28and voila that's how many of you
    • 1:43:31inputted the office
    • 1:43:33but notice it's not everyone is it we're
    • 1:43:36missing some still
    • 1:43:37it seems that i got back only those of
    • 1:43:40you who typed in literally
    • 1:43:41the office capital t capital o so what
    • 1:43:44if i want to be a little more resilient
    • 1:43:46than that well let me get back
    • 1:43:47any rows where you all typed in office
    • 1:43:50maybe you omitted the um
    • 1:43:53the article the so let me go ahead and
    • 1:43:55say not title
    • 1:43:56equals office but let me go ahead and
    • 1:43:58say where the title is
    • 1:44:00like office but i don't want it to just
    • 1:44:02be office i want to allow for maybe some
    • 1:44:04stuff at the beginning
    • 1:44:06maybe some stuff at the end and even
    • 1:44:07though this seems like a bit of an
    • 1:44:08inconsistency
    • 1:44:09in the context of using like there's
    • 1:44:12another wildcard character the percent
    • 1:44:15sign
    • 1:44:15represents zero or more characters to
    • 1:44:18the left
    • 1:44:19and this percent sign represents zero or
    • 1:44:21more characters to the right
    • 1:44:22so it's kind of this catch-all that will
    • 1:44:24now find me all
    • 1:44:26titles that somewhere have o f f i
    • 1:44:29c e inside of them and it turns out like
    • 1:44:31is case insensitive so i don't even need
    • 1:44:33to worry about capitalization with like
    • 1:44:34now let me hit enter and voila now i get to worry about capitalization with like
    • 1:44:37back more answers and you can really see now let me hit enter and voila now i get
    • 1:44:39the messiness now notice up here back more answers and you can really see
    • 1:44:41one of you used lowercase you know that the messiness now notice up here
    • 1:44:43tends to be common when one of you used lowercase you know that
    • 1:44:44typing things in quickly one of you did tends to be common when
    • 1:44:47it lowercase here and then also gave us typing things in quickly one of you did
    • 1:44:49an extra white space at the end it lowercase here and then also gave us
    • 1:44:50one of you just typed in office one of an extra white space at the end
    • 1:44:52you typed in the office again with a one of you just typed in office one of
    • 1:44:54space at the end you typed in the office again with a
    • 1:44:55and so there's a lot of variation here space at the end
    • 1:44:56and that's why when we forced everything and so there's a lot of variation here
    • 1:44:58to uppercase and we started trimming and that's why when we forced everything
    • 1:45:00things to uppercase and we started trimming
    • 1:45:00we were able to get rid of a lot of things
    • 1:45:03those redundancies we were able to get rid of a lot of
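The wildcard search described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the sample rows are made up to mimic the messy responses, and the single-column `shows` table is an assumption, not the class's actual data.

```python
import sqlite3

# Hypothetical sample rows standing in for the class's form responses.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [("The Office",), ("the office ",), ("office",), ("Friends",)],
)

# Each % matches zero or more characters, so the pattern finds "office"
# anywhere in the title. LIKE ignores case for ASCII letters in SQLite.
rows = db.execute(
    "SELECT title FROM shows WHERE title LIKE '%office%'"
).fetchall()
matches = [title for (title,) in rows]
print(matches)
```

All three spellings of the show come back, trailing space included, which is the messiness the query exposes.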
    • 1:45:05Well, in fact, let's go ahead and
    • 1:45:06order this now. So let me go back to
    • 1:45:09selecting
    • 1:45:10the distinct uppercase title. So SELECT
    • 1:45:12DISTINCT
    • 1:45:13UPPER of title
    • 1:45:17FROM shows, and let me now ORDER BY, which
    • 1:45:20is a new clause,
    • 1:45:22the uppercased version of title.
    • 1:45:25So now notice there's a few things going
    • 1:45:27on here, but I'm just building up more
    • 1:45:28complicated queries, similar to
    • 1:45:30Scratch, where we just started throwing
    • 1:45:31more and more puzzle pieces at a
    • 1:45:32problem.
    • 1:45:33I'm selecting all of the distinct
    • 1:45:35uppercase titles
    • 1:45:36from the shows table, but I'm going to
    • 1:45:38order the results this time
    • 1:45:40by the uppercase version of title. So
    • 1:45:44everything's going to be uppercase, and
    • 1:45:45then it's going to be sorted A through Z.
    • 1:45:47Hit Enter now, and now things are a
    • 1:45:49little easier to make sense of. Notice
    • 1:45:51the quotes
    • 1:45:51are there only when there are multiple
    • 1:45:53words in a title; otherwise sqlite3
    • 1:45:55doesn't bother showing them.
    • 1:45:57But notice here's all the "THE" shows, and
    • 1:45:59if we keep scrolling up, the P's, the N's,
    • 1:46:02the
    • 1:46:02M's, the L's, and so forth. It's indeed
    • 1:46:04alphabetized
    • 1:46:06thanks to using ORDER BY. All right, let
    • 1:46:09me pause for just a second, because I
    • 1:46:10know that's a lot all at once.
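As a recap of the query just built up, here is a minimal sketch using Python's `sqlite3` module. The rows are invented for illustration, and the one-column `shows` table is an assumption.

```python
import sqlite3

# Hypothetical sample data; the real table came from the class's CSV.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [("the office",), ("Friends",), ("The Office",), ("Lost",)],
)

# DISTINCT collapses duplicates after uppercasing; ORDER BY sorts A through Z.
rows = db.execute(
    "SELECT DISTINCT UPPER(title) FROM shows ORDER BY UPPER(title)"
).fetchall()
titles = [title for (title,) in rows]
print(titles)  # ['FRIENDS', 'LOST', 'THE OFFICE']
```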
    • 1:46:13Any questions thus far on SELECT,
    • 1:46:16or on DISTINCT, or on UPPER, table names,
    • 1:46:19the WHERE clause, the ORDER BY clause?
    • 1:46:22It's a lot quickly, but it just generally
    • 1:46:25expresses the kinds of problems we've
    • 1:46:27already seen
    • 1:46:28but solved in code. Anything on your end,
    • 1:46:34Brian?
    • 1:46:35No hands here. All right, well, let's start
    • 1:46:38to solve
    • 1:46:38more similar problems now in SQL by
    • 1:46:41writing way less code than we did
    • 1:46:44a bit ago in Python. Suppose I want to
    • 1:46:46actually figure out
    • 1:46:47the counts of these most popular shows.
    • 1:46:50So I want to
    • 1:46:51combine all of the identical shows and
    • 1:46:54figure out all of the corresponding
    • 1:46:55counts. Well, let me go ahead and try this.
    • 1:46:58Let me go ahead and SELECT again
    • 1:47:02the uppercase version of title, but I'm
    • 1:47:04not going to do DISTINCT this time,
    • 1:47:05because I want to do that a little
    • 1:47:06differently.
    • 1:47:07I'm going to select the uppercase
    • 1:47:09version of title, the
    • 1:47:10COUNT of those titles, so the number of
    • 1:47:13times a given title appears. So COUNT is
    • 1:47:15a new keyword now,
    • 1:47:16FROM shows. But now how do I figure out
    • 1:47:20what the count is?
    • 1:47:21Well, if you think about this table as
    • 1:47:23having a lot of titles,
    • 1:47:25title, title, title, title, title, it would
    • 1:47:27be nice to kind of group the identical
    • 1:47:30titles together
    • 1:47:31and then actually count
    • 1:47:35how many such titles we grouped together.
    • 1:47:38And the syntax for that is literally to
    • 1:47:40say
    • 1:47:40GROUP BY UPPER of title. This tells SQL
    • 1:47:44to group all of the uppercase titles
    • 1:47:46together, kind of collapse multiple rows
    • 1:47:48into one,
    • 1:47:49but keep track of the count of titles
    • 1:47:53after that collapse. Let me go ahead now
    • 1:47:55and
    • 1:47:56hit Enter, and you'll see, very similar
    • 1:47:59to one of the earlier Python programs we
    • 1:48:01wrote, all of the titles
    • 1:48:02on the left, followed by a comma, followed
    • 1:48:05by the count. So one of you really likes
    • 1:48:06Tom and Jerry, one of you really likes
    • 1:48:08Top
    • 1:48:09Gear. If I scroll up, though, two of
    • 1:48:11you really liked The Wire.
    • 1:48:12Twenty-three of you here like The Office, although
    • 1:48:15we still haven't fixed the trimming issue here,
    • 1:48:17so we could still combine that further
    • 1:48:19by trimming whitespace if we want. But
    • 1:48:21now we're getting these kinds of counts.
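The grouping step can be sketched as follows with Python's `sqlite3` module. The sample titles are made up, and one of them deliberately carries a trailing space to show why untrimmed titles still end up in separate groups.

```python
import sqlite3

# Hypothetical responses; note the trailing space on the third title.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [("The Office",), ("the office",), ("the office ",),
     ("Friends",), ("friends",)],
)

# GROUP BY collapses rows with the same uppercased title into one row,
# and COUNT reports how many rows were collapsed into each group.
rows = db.execute(
    "SELECT UPPER(title), COUNT(title) FROM shows GROUP BY UPPER(title)"
).fetchall()
counts = dict(rows)
print(counts)
```

Here "THE OFFICE" and "THE OFFICE " (with the space) are counted separately, which matches the leftover duplication noted above.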
    • 1:48:22Well, how can I go ahead and order this
    • 1:48:26as we did before? Let me go ahead here
    • 1:48:29and
    • 1:48:30add ORDER BY COUNT
    • 1:48:33of title, and then hit semicolon now.
    • 1:48:37And now notice, just as in Python,
    • 1:48:40everything is from smallest to largest
    • 1:48:42initially, with Game of Thrones here down
    • 1:48:44on the bottom.
    • 1:48:44How can I fix this? Well, it turns out
    • 1:48:47you can order things in descending order,
    • 1:48:49D-E-S-C for short, instead of ASC, which
    • 1:48:52is the default, for ascending.
    • 1:48:54So if I do it in descending order, now
    • 1:48:56I'd have to scroll all the way back up
    • 1:48:57to the A's, the very top,
    • 1:48:59to see where the lines begin.
    • 1:49:03Whoops. If I scroll all the way back up
    • 1:49:06to the top,
    • 1:49:07we'll see where all of the A words begin
    • 1:49:09up here.
    • 1:49:10And now if I want to... whoops,
    • 1:49:13whoops, did I do that right? Sorry, I don't
    • 1:49:16want to...
    • 1:49:17there we go, ORDER BY COUNT descending.
    • 1:49:19Now, let me go ahead and...
    • 1:49:20this is just a little too unwieldy to
    • 1:49:22see. Let me just limit myself to the top
    • 1:49:23ten and keep it simple and only look at
    • 1:49:25the top ten values here.
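A small sketch of the descending sort with a limit, again using Python's `sqlite3` module. The data is invented, and the limit is 2 rather than 10 only because the sample is tiny.

```python
import sqlite3

# Hypothetical tallies: three votes, two votes, one vote.
titles = ["Game of Thrones"] * 3 + ["Friends"] * 2 + ["Lost"]
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)", [(t,) for t in titles]
)

# ORDER BY defaults to ascending (ASC); DESC puts the largest counts first,
# and LIMIT keeps only the first few rows of the sorted result.
top = db.execute(
    "SELECT UPPER(title), COUNT(title) FROM shows "
    "GROUP BY UPPER(title) ORDER BY COUNT(title) DESC LIMIT 2"
).fetchall()
print(top)  # [('GAME OF THRONES', 3), ('FRIENDS', 2)]
```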
    • 1:49:26Voila, now I have Game of Thrones at 33,
    • 1:49:29Friends at 26, The Office at 23, though I
    • 1:49:32think I'm still missing a few. Brian, do
    • 1:49:34you recall the SQL function for
    • 1:49:36trimming leading and trailing white
    • 1:49:37space?
    • 1:49:40I think it's just TRIM. TRIM, okay.
    • 1:49:43I myself did not remember, so when in
    • 1:49:44doubt, Google or ask Brian. So let me go
    • 1:49:46ahead and fix
    • 1:49:47this. Let me go ahead and SELECT
    • 1:49:49uppercase of
    • 1:49:50trimming the title first, and then I'm
    • 1:49:53going to
    • 1:49:53GROUP BY trimming and then uppercasing
    • 1:49:57it there. And now Enter,
    • 1:49:58and voila, thank you, Brian. So now we're
    • 1:50:00up to our 26
    • 1:50:02Offices here. So in short, it took us a
    • 1:50:05little while to get to this point in the
    • 1:50:06story in SQL, but notice what we've done.
    • 1:50:09We've taken a program that took us a few
    • 1:50:10minutes and certainly a dozen or more
    • 1:50:12lines of code,
    • 1:50:13and we've distilled it into something
    • 1:50:15that, yes, is a new language, but it's just
    • 1:50:17kind of a one-liner.
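To make the comparison concrete, here is a sketch that runs the trimmed one-liner through Python's `sqlite3` module and checks it against a plain-Python tally. The sample titles are made up, and the `Counter` version is only one possible stand-in for the earlier Python program, not a copy of it.

```python
import sqlite3
from collections import Counter

# Hypothetical responses with inconsistent case and stray whitespace.
titles = ["The Office", "the office ", " THE OFFICE", "Friends"]
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)", [(t,) for t in titles]
)

# The SQL one-liner: trim, uppercase, group, count, sort, limit.
sql_counts = db.execute(
    "SELECT UPPER(TRIM(title)), COUNT(title) FROM shows "
    "GROUP BY UPPER(TRIM(title)) ORDER BY COUNT(title) DESC LIMIT 10"
).fetchall()

# A rough Python equivalent of the same cleaning and counting.
py_counts = Counter(t.strip().upper() for t in titles).most_common(10)

print(sql_counts)  # [('THE OFFICE', 3), ('FRIENDS', 1)]
```

With the whitespace trimmed, the three spellings of the same show finally land in a single group.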
    • 1:50:19And once you get comfortable with a
    • 1:50:20language like SQL, especially if you're
    • 1:50:22not even a computer scientist but maybe
    • 1:50:24a data scientist
    • 1:50:25or an analyst of some sort who spends a
    • 1:50:27lot of their day looking at financial
    • 1:50:28information or medical information or
    • 1:50:30really
    • 1:50:31any data set that can be loaded into
    • 1:50:33rows and columns,
    • 1:50:34once you start to speak and read SQL
    • 1:50:37as a human, you can start to express some
    • 1:50:39pretty powerful queries
    • 1:50:40relatively succinctly and, boom, get back
    • 1:50:43your answer. And by using a command-line
    • 1:50:45program like sqlite3,
    • 1:50:47you can immediately see the results
    • 1:50:48there, albeit as very simplistic text.
    • 1:50:50But as mentioned too,
    • 1:50:51there's also some graphical programs out
    • 1:50:53there, free and commercial,
    • 1:50:55that also support SQL, where you can
    • 1:50:57still type these commands and then it
    • 1:50:58will show the results to you in a more
    • 1:50:59user-friendly way,
    • 1:51:00much like Windows or macOS would by
    • 1:51:03default.
    • 1:51:04So any questions now on the syntax or
    • 1:51:07capabilities
    • 1:51:09of SELECT statements?
    • 1:51:13Any questions on SELECTs?
    • 1:51:19Seeing anything on your end, Brian? One
    • 1:51:22question came in: where is the file with
    • 1:51:24this data
    • 1:51:25actually being stored? Good
    • 1:51:27question, where is the
    • 1:51:28file actually being stored? So before
    • 1:51:30quitting,
    • 1:51:31I can actually save this file as
    • 1:51:33anything I want. The file extension would
    • 1:51:34typically be .db.
    • 1:51:36And in fact, Brian, do you mind just
    • 1:51:37checking what's the syntax for writing
    • 1:51:39the file manually with dot-something?
    • 1:51:41It would be under .help, I think.
    • 1:51:49I think it's .save, followed by the
    • 1:51:50name of the file. .save, so I'll call
    • 1:51:52this shows.db,
    • 1:51:53Enter. And now if I open up another
    • 1:51:57terminal window just to demonstrate...
    • 1:51:59whoops,
    • 1:52:01sorry, closed the whole thing. If I now
    • 1:52:05go ahead and open up another terminal
    • 1:52:07window and type our old friend
    • 1:52:08ls, you'll see that now I have a CSV file,
    • 1:52:11I have my Python file from before, and I
    • 1:52:13have a new file called
    • 1:52:14shows.db, which I've created. That is the
    • 1:52:16binary file
    • 1:52:18that contains the
    • 1:52:20table that I've loaded dynamically in
    • 1:52:22from that CSV file.
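The same idea, writing an in-memory database out to a file, can be sketched in Python. This assumes `Connection.backup` (available in Python 3.7 and later) as a rough analogue of the shell's `.save` command; the temporary directory and the sample row are made up for the demonstration.

```python
import os
import sqlite3
import tempfile

# Build a small in-memory database (hypothetical row, not the class data).
memory_db = sqlite3.connect(":memory:")
memory_db.execute("CREATE TABLE shows (title TEXT, genres TEXT)")
memory_db.execute("INSERT INTO shows VALUES ('The Office', 'Comedy')")
memory_db.commit()

# Copy every table into a file-backed database named shows.db.
path = os.path.join(tempfile.mkdtemp(), "shows.db")
file_db = sqlite3.connect(path)
memory_db.backup(file_db)
file_db.close()

# The result is a binary file; SQLite files start with a fixed header.
with open(path, "rb") as f:
    header = f.read(16)

# Reopening the file shows that the table survived.
reopened = sqlite3.connect(path)
saved = reopened.execute("SELECT title, genres FROM shows").fetchall()
reopened.close()
print(header, saved)
```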
    • 1:52:26Any other questions on SELECT queries or
    • 1:52:29what we can do with them?
    • 1:52:33Anything on your end, Brian? Yeah, a few
    • 1:52:36people are asking about what the run
    • 1:52:37time of this is.
    • 1:52:39Yeah, really good question. What is the
    • 1:52:41run time? I'm going to come back to that
    • 1:52:42question in just a little bit, if that's
    • 1:52:44okay.
    • 1:52:44Right now it's admittedly big O of n.
    • 1:52:47I've not actually done anything better
    • 1:52:49than we did with our CSV
    • 1:52:50file or our Python code. Right now it's
    • 1:52:53still big O of n by default, but there's
    • 1:52:55going to be a better answer to that
    • 1:52:56that's going to make it something much
    • 1:52:58more logarithmic. So let me come back to
    • 1:53:00that feature
    • 1:53:01when it's time to enable it. But in fact,
    • 1:53:03let's start to take some steps toward
    • 1:53:05that, because it turns out,
    • 1:53:06when loading in data, we're not always
    • 1:53:08going to have the luxury of just having
    • 1:53:09one big file in CSV format that we
    • 1:53:11import, and we go about our business.
    • 1:53:12We're going to have to decide in advance
    • 1:53:14how we want to store the data and what
    • 1:53:15data we want to store
    • 1:53:16and what the relationships might be
    • 1:53:18across not one single table
    • 1:53:20but multiple tables. So let me go ahead
    • 1:53:22and run one other command here that
    • 1:53:24actually introduces a first
    • 1:53:26problem. Let me go ahead and SELECT
    • 1:53:27title FROM shows WHERE
    • 1:53:31genres equals, for instance, Comedy. That
    • 1:53:33was one of the genres.
    • 1:53:35And notice that we get back a whole
    • 1:53:36bunch of results,
    • 1:53:38but I bet I'm missing some. I'm
    • 1:53:41skimming through this pretty quickly,
    • 1:53:42but I bet I'm missing some, because if I
    • 1:53:45check whether genres
    • 1:53:46equals Comedy, what am I omitting? Well,
    • 1:53:48those of you who checked multiple boxes
    • 1:53:50might have said something is a comedy
    • 1:53:52and a drama, or
    • 1:53:53comedy and romance, or maybe a couple of
    • 1:53:55other permutations of genres.
    • 1:53:57If I'm searching for equality here,
    • 1:53:59equals Comedy, I'm only going to get
    • 1:54:01those favorites from you where you only
    • 1:54:04said my favorite TV show is
    • 1:54:06a comedy. But what about something like,
    • 1:54:08uh,
    • 1:54:09what if we want to do
    • 1:54:11something like LIKE
    • 1:54:13Comedy instead? And we could say
    • 1:54:15something like, well, so long as the word
    • 1:54:17Comedy is in there,
    • 1:54:18then we should get back even more
    • 1:54:20results. And let me stipulate that indeed
    • 1:54:22I now have a longer list of results.
    • 1:54:24Now we have all shows where you checked
    • 1:54:25at least the Comedy box.
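The difference between the equality test and the LIKE test can be sketched like this, using Python's `sqlite3` module. The rows are invented, and the comma-plus-space separator in `genres` is an assumption about how the form exported multiple checkboxes.

```python
import sqlite3

# Hypothetical rows; genres holds a comma-separated list, as in the CSV.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT, genres TEXT)")
db.executemany(
    "INSERT INTO shows (title, genres) VALUES (?, ?)",
    [("Show A", "Comedy"),
     ("Show B", "Comedy, Romance"),
     ("Show C", "Crime, Drama")],
)

# Equality only matches rows whose genres value is exactly 'Comedy'.
exact = [t for (t,) in db.execute(
    "SELECT title FROM shows WHERE genres = 'Comedy'")]

# LIKE with wildcards matches any row that mentions Comedy at all.
fuzzy = [t for (t,) in db.execute(
    "SELECT title FROM shows WHERE genres LIKE '%Comedy%'")]

print(exact, fuzzy)
```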
    • 1:54:27But unfortunately this starts to get a
    • 1:54:29little sloppy, because recall what the
    • 1:54:31genres column looks like. Let me
    • 1:54:34SELECT
    • 1:54:35genres FROM shows, semicolon. Notice that
    • 1:54:38all of the genres that we loaded into
    • 1:54:40this table
    • 1:54:41from the CSV file are a
    • 1:54:44comma-separated list of genres. That's just the
    • 1:54:47way Google Forms did it,
    • 1:54:48and that's fine for CSV purposes. That's
    • 1:54:51kind of fine for SQL purposes, but this
    • 1:54:53is kind of messy.
    • 1:54:54Generally speaking, storing
    • 1:54:56comma-separated lists
    • 1:54:58of values in a SQL database
    • 1:55:01is not what you should be doing. The
    • 1:55:02whole point of using a SQL database
    • 1:55:04is to move away from commas and CSVs and
    • 1:55:07to actually store things more cleanly.
    • 1:55:09Because, in fact, let me propose a problem.
    • 1:55:11Suppose I want to search
    • 1:55:13not for Comedy but maybe also
    • 1:55:17Music, like this, thereby allowing me to
    • 1:55:20find any shows where the word Music is
    • 1:55:23somewhere
    • 1:55:24in the comma-separated list. There's a
    • 1:55:27subtle bug here,
    • 1:55:28and you might have to think back to
    • 1:55:30where we began,
    • 1:55:31the form that you
    • 1:55:33pulled up. I can't show the whole thing
    • 1:55:35here, but we started with Action,
    • 1:55:37Adventure, Animation, Biography, dot dot
    • 1:55:39dot,
    • 1:55:40Music. Musical was also there, so distinct.
    • 1:55:45A music video versus a musical are
    • 1:55:47two different types of genres. But notice
    • 1:55:50my query at the moment.
    • 1:55:51What's problematic with this? At the
    • 1:55:53moment we would seem to have a bug
    • 1:55:55whereby
    • 1:55:56this query will select not only Music
    • 1:55:58but also
    • 1:55:59Musical. And so this is just where things
    • 1:56:02are getting messy. Now, yeah, you know what,
    • 1:56:03we could kind of clean this up.
    • 1:56:05Maybe we could put a comma here so that
    • 1:56:08it can't just be
    • 1:56:09Music-something; it has to be Music, comma.
    • 1:56:12But what if Music is the last box that
    • 1:56:13you checked? Well, then it's Music,
    • 1:56:15nothing; there is no comma. So now I need
    • 1:56:17to, like, OR things together. So maybe I
    • 1:56:19have to do something like WHERE Music
    • 1:56:21like this,
    • 1:56:21OR genres LIKE, quote-unquote,
    • 1:56:25Music like this. But honestly, this is
    • 1:56:28just getting messy.
    • 1:56:29Like, this is poorly designed. If you're
    • 1:56:30just storing your data as a
    • 1:56:32comma-separated list of values inside of a
    • 1:56:34column
    • 1:56:34and you have to resort to this kind of
    • 1:56:36hack to figure out, well, maybe it's over
    • 1:56:38here or here or here, and thinking about
    • 1:56:40all the permutations of syntax,
    • 1:56:42you're doing it wrong. You're not using a
    • 1:56:44SQL database to its fullest potential.
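The Music versus Musical bug, and the kind of OR-based workaround being criticized, can be sketched as follows with Python's `sqlite3` module. The rows are made up, and the exact patterns in the workaround are an assumption (they presume genres are separated by a comma and a space); the lecture only gestures at what such a hack would look like.

```python
import sqlite3

# Hypothetical rows covering each position Music can occupy in the list.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT, genres TEXT)")
db.executemany(
    "INSERT INTO shows (title, genres) VALUES (?, ?)",
    [("Show A", "Music"),
     ("Show B", "Comedy, Musical"),
     ("Show C", "Drama, Music"),
     ("Show D", "Music, Reality-TV")],
)

# The naive pattern also matches Musical, which is the subtle bug.
naive = {t for (t,) in db.execute(
    "SELECT title FROM shows WHERE genres LIKE '%Music%'")}

# The hack: enumerate every position Music could occupy in the list.
hack = {t for (t,) in db.execute(
    "SELECT title FROM shows WHERE genres = 'Music' "
    "OR genres LIKE 'Music, %' "
    "OR genres LIKE '%, Music, %' "
    "OR genres LIKE '%, Music'")}

print(sorted(naive), sorted(hack))
```

The hack does exclude the Musical row, but only by spelling out every permutation, which is the design smell being described.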
    • 1:56:46So how do we go about designing this
    • 1:56:48thing better and actually load this CSV
    • 1:56:50into a database a little more cleanly? In
    • 1:56:53short,
    • 1:56:53how do we get rid of the stupid commas
    • 1:56:56in the genres
    • 1:56:57column and instead put one word,
    • 1:57:01Comedy or Music or Musical, in each of
    • 1:57:04those cells,
    • 1:57:04so to speak, not two, not three, one only,
    • 1:57:07without throwing away some of those
    • 1:57:09genres?
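One way the goal of "one genre per cell" might look is sketched below with Python's `sqlite3` module. Everything here is an assumption for illustration: the two-table layout, the `id` and `show_id` column names, and the comma-plus-space separator are hypothetical, not the design the lecture ultimately settles on.

```python
import sqlite3

# Hypothetical responses in the original comma-separated form.
responses = [("Show A", "Comedy, Musical"), ("Show B", "Drama, Music")]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("CREATE TABLE genres (show_id INTEGER, genre TEXT)")

for title, genre_list in responses:
    cursor = db.execute("INSERT INTO shows (title) VALUES (?)", (title,))
    # One row per genre, so each cell holds exactly one word.
    for genre in genre_list.split(", "):
        db.execute(
            "INSERT INTO genres (show_id, genre) VALUES (?, ?)",
            (cursor.lastrowid, genre),
        )

# An exact comparison now works; no wildcards, so Musical is not matched.
music = db.execute(
    "SELECT title FROM shows WHERE id IN "
    "(SELECT show_id FROM genres WHERE genre = 'Music')"
).fetchall()
print(music)  # [('Show B',)]
```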
    • 1:57:10Well, let me introduce a few building
    • 1:57:11blocks that'll get us there.
    • 1:57:13It turns out in SQL that when you
    • 1:57:15want to
    • 1:57:17create your own tables, we can...
    • 1:57:22sorry, let me just think for a moment.
    • 1:57:25It turns out, when creating your own
    • 1:57:27tables and loading data into a database
    • 1:57:29on your own,
    • 1:57:30we're going to need more than just
    • 1:57:32SELECT. SELECT, of course, is just for
    • 1:57:33reading.
    • 1:57:34But if we're going to do this better and
    • 1:57:36not just use sqlite3's
    • 1:57:37built-in .import command, but instead
    • 1:57:41we're going to write some code
    • 1:57:42to load all of our data into maybe two
    • 1:57:46tables,
    • 1:57:46one for the titles, one for the genres,
    • 1:57:49we're going to need a little more
    • 1:57:50expressiveness
    • 1:57:51when it comes to SQL. And so for that,
    • 1:57:55we're going to need, one, the ability to
    • 1:57:57create our own tables, and we've seen a
    • 1:57:58glimpse of this before.
    • 1:57:59But we're also going to need to see
    • 1:58:00another piece of syntax as well:
    • 1:58:02inserting.
    • 1:58:03Inserting is another command that you
    • 1:58:05can execute
    • 1:58:06on a SQL database in order to actually
    • 1:58:09add data to a database,
    • 1:58:11which is great, because if I want to
    • 1:58:13ultimately
    • 1:58:14iterate over that same CSV but this time
    • 1:58:17manually
    • 1:58:18add all of the rows to the database
    • 1:58:21myself,
    • 1:58:22well, then I'm going to need some way of
    • 1:58:23inserting. And the syntax for that is
    • 1:58:25as follows: INSERT INTO the name of the
    • 1:58:27table,
    • 1:58:28the column or columns that you want to
    • 1:58:30insert values into,
    • 1:58:32then literally the word VALUES, and then,
    • 1:58:34literally in parentheses again,
    • 1:58:36the actual list of values. So it's a
    • 1:58:38little abstract when we see it in this
    • 1:58:39generic form, but we'll see this more
    • 1:58:41explicitly in just a moment here
    • 1:58:44as well. So when it comes to inserting
    • 1:58:47something into a database let's go ahead as well so when it comes to inserting
    • 1:58:48and try this so suppose that um something into a database let's go ahead
    • 1:58:51let's see what's what's a show that the and try this so suppose that um
    • 1:58:54muppet show like i grew up loving the let's see what's what's a show that the
    • 1:58:55muppet show it was out in like the 70s muppet show like i grew up loving the
    • 1:58:57and i don't think it was on the list but muppet show it was out in like the 70s
    • 1:58:59i can check this for sure so and i don't think it was on the list but
    • 1:59:01select star from shows where i can check this for sure so
    • 1:59:05uh title like let's just search for select star from shows where
    • 1:59:08muppets with a wild card uh title like let's just search for
    • 1:59:10and i'm guessing no one put it there muppets with a wild card
    • 1:59:11good so it's a missed opportunity i and i'm guessing no one put it there
    • 1:59:13forgot to fill out the form good so it's a missed opportunity i
    • 1:59:14i could go back and fill out the form forgot to fill out the form
    • 1:59:15and re-import the csv but let's go ahead i could go back and fill out the form
    • 1:59:17and do this manually so let me go ahead and re-import the csv but let's go ahead
    • 1:59:18and and do this manually so let me go ahead
    • 1:59:19 INSERT INTO shows,
    • 1:59:22 what columns? title and genres,
    • 1:59:26 and I guess I could do a timestamp just
    • 1:59:28 for kicks.
    • 1:59:29 And then I'm going to insert what values?
    • 1:59:31 The values will be, well, I don't know
    • 1:59:33 whatever
    • 1:59:33 time it is now, so I'm going to cheat
    • 1:59:35 there, just rather than look up the date
    • 1:59:36 and the time.
    • 1:59:37 The title will be, like, The Muppet Show,
    • 1:59:40 and the genres will be, it was kind of a
    • 1:59:43 comedy, it was kind of a musical,
    • 1:59:45 so we'll kind of leave it at that.
    • 1:59:47 Semicolon.
    • 1:59:48 So again, this follows the standard
    • 1:59:50 syntax here of specifying the table you
    • 1:59:52 want to insert into,
    • 1:59:53 the columns you want to insert into, and
    • 1:59:55 the values you want to put into those
    • 1:59:56 columns.
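A minimal sketch of that INSERT syntax, using Python's built-in sqlite3 module against an in-memory database. The table layout (columns Timestamp, title, genres) and the placeholder values are assumptions made for illustration, not the actual data imported in the lecture.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind (an assumption;
# the lecture works against a database file instead).
db = sqlite3.connect(":memory:")

# Hypothetical table mirroring the three columns mentioned above.
db.execute("CREATE TABLE shows (Timestamp TEXT, title TEXT, genres TEXT)")

# General shape: INSERT INTO table (columns) VALUES (values);
db.execute(
    "INSERT INTO shows (Timestamp, title, genres) VALUES (?, ?, ?)",
    ("now", "The Muppet Show", "Comedy, Musical"),
)
db.commit()

# Read the row back to confirm it was stored.
rows = db.execute("SELECT Timestamp, title, genres FROM shows").fetchall()
print(rows)
```

The question marks are placeholders that sqlite3 fills in from the tuple, which avoids building the SQL string by hand.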
    • 1:59:57 And I'm going to go ahead and hit Enter.
    • 1:59:58 Now, nothing seems to have happened,
    • 2:00:01 but if I now select that same query...
    • 2:00:07 Oh, OK, it's still nothing because I
    • 2:00:09 made a subtle mistake.
    • 2:00:11 I'm not searching for Muppets,
    • 2:00:13 plural. I'm searching for Muppet,
    • 2:00:14 singular. The Muppet Show. Voila, now you
    • 2:00:17 see my row
    • 2:00:18 in this database. And so INSERT would
    • 2:00:20 give us the ability now to insert new
    • 2:00:22 rows into the database.
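The plural-versus-singular slip can be reproduced in a few lines. This is a sketch under assumed names (a two-column shows table in an in-memory database); the % wildcard matches any run of characters, so the pattern only helps if the fixed part actually occurs in the title.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT, genres TEXT)")
db.execute(
    "INSERT INTO shows (title, genres) VALUES (?, ?)",
    ("The Muppet Show", "Comedy, Musical"),
)

# "Muppets" (plural) never appears in the title, so nothing matches.
plural = db.execute(
    "SELECT title FROM shows WHERE title LIKE '%Muppets%'"
).fetchall()

# "Muppet" (singular) is a substring of the title, so the row is found.
singular = db.execute(
    "SELECT title FROM shows WHERE title LIKE '%Muppet%'"
).fetchall()

# In SQLite, LIKE ignores case for ASCII letters by default.
lowercase = db.execute(
    "SELECT title FROM shows WHERE title LIKE '%muppet%'"
).fetchall()

print(plural, singular, lowercase)
```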
    • 2:00:23 Suppose you want to update
    • 2:00:26 something. Maybe, you know, some of the
    • 2:00:28 Muppet Shows were actually pretty
    • 2:00:29 dramatic. So how might we do that?
    • 2:00:31 Well, I can say UPDATE shows SET,
    • 2:00:36 let's see, genres equal to Comedy,
    • 2:00:39 Drama, Musical, WHERE
    • 2:00:43 title equals The Muppet Show.
    • 2:00:46 So again, I'll pull up the canonical
    • 2:00:48 syntax for this in a bit, but for now,
    • 2:00:50 just a little teaser:
    • 2:00:51 you can update things pretty simply, and
    • 2:00:53 even though it takes a little getting
    • 2:00:54 used to the syntax,
    • 2:00:55 it kind of does what it says: UPDATE
    • 2:00:57 shows SET genres equal to this
    • 2:01:00 WHERE title equals that. And now I can go
    • 2:01:02 ahead and Enter.
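The UPDATE just described can be sketched as follows, again with sqlite3 and an assumed two-column shows table. The second row is only there to show that the WHERE clause limits the change to matching rows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT, genres TEXT)")
db.executemany(
    "INSERT INTO shows (title, genres) VALUES (?, ?)",
    [
        ("The Muppet Show", "Comedy, Musical"),
        ("The Office", "Comedy"),
    ],
)

# UPDATE table SET column = value WHERE condition;
cursor = db.execute(
    "UPDATE shows SET genres = ? WHERE title = ?",
    ("Comedy, Drama, Musical", "The Muppet Show"),
)
changed = cursor.rowcount  # number of rows the UPDATE modified

# Collect the result as a title -> genres mapping.
result = dict(db.execute("SELECT title, genres FROM shows").fetchall())
print(changed, result)
```

Leaving out the WHERE clause would rewrite the genres of every row in the table, so it is worth double-checking before pressing Enter.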
    • 2:01:03 If I go ahead and select the same thing...
    • 2:01:05 Just like in a terminal window, you can
    • 2:01:06 go up and down. That's how I'm typing so
    • 2:01:08 quickly.
    • 2:01:08 I'm just going up and down to previous
    • 2:01:10 commands. Voila, now I see that The Muppet
    • 2:01:12 Show is a comedy, a drama,
    • 2:01:13 and a musical. Well, I take
    • 2:01:16 issue, though, with one of the more
    • 2:01:18 popular shows that was in the list. A
    • 2:01:20 whole bunch of you liked,
    • 2:01:22 let's say, Friends, which I've never
    • 2:01:24 really been a fan of. And let me go ahead
    • 2:01:25 and SELECT
    • 2:01:26 title FROM shows WHERE
    • 2:01:30 title equals Friends. And maybe I should
    • 2:01:33 be a little more rigorous than that. I
    • 2:01:35 should say title
    • 2:01:36 LIKE Friends, just in case there were
    • 2:01:38 different capitalizations. Enter.
    • 2:01:40 A lot of you really liked Friends. In
    • 2:01:42 fact, how many of you? Well, recall that I
    • 2:01:44 can do this. I can say
    • 2:01:45 COUNT, and I can let SQL do the count
    • 2:01:47 for me. 26 of you I disagree with
    • 2:01:49 strongly. And there's a couple of you
    • 2:01:50 that even added all the dots, but we'll
    • 2:01:52 deal with you later.
    • 2:01:53 So suppose I do take issue with this.
    • 2:01:55 Well, DELETE FROM
    • 2:01:57 shows WHERE title equals, quote unquote,
    • 2:02:00 Friends. Actually, title LIKE Friends.
    • 2:02:03 Let's get them all.
    • 2:02:04 Enter. And now if we select this again,
    • 2:02:07 I'm sorry, Friends has been cancelled. So
    • 2:02:10 you can, again,
    • 2:02:11 execute these
    • 2:02:12 fundamental commands of CRUD, create, read,
    • 2:02:14 update, and delete,
    • 2:02:15 by using CREATE or INSERT, by using
    • 2:02:18 SELECT,
    • 2:02:19 by using UPDATE, literally, and DELETE,
    • 2:02:21 literally, as well.
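A short sketch of the COUNT and DELETE steps above, with made-up sample titles in an in-memory sqlite3 database. It also illustrates why the submissions "with all the dots" survive: LIKE without a wildcard ignores ASCII case in SQLite but still has to match the whole title.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT)")  # create
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [("Friends",), ("friends",), ("F.R.I.E.N.D.S",), ("The Office",)],
)

# Read: let SQL do the counting.
before = db.execute(
    "SELECT COUNT(*) FROM shows WHERE title LIKE 'Friends'"
).fetchone()[0]

# Delete: removes every row the WHERE clause matches.
db.execute("DELETE FROM shows WHERE title LIKE 'Friends'")

after = db.execute(
    "SELECT COUNT(*) FROM shows WHERE title LIKE 'Friends'"
).fetchone()[0]
remaining = [
    row[0] for row in db.execute("SELECT title FROM shows ORDER BY title")
]
print(before, after, remaining)
```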
    • 2:02:22 And that's about it. Like, even though
    • 2:02:23 this was a lot quickly, there really are
    • 2:02:25 just those four fundamental operations
    • 2:02:27 in SQL, plus some of these add-on
    • 2:02:29 features, like these additional functions
    • 2:02:31 like
    • 2:02:31 COUNT that you can use, and also some of
    • 2:02:34 these keywords like WHERE and the like.
    • 2:02:36 Well, let me propose that we now do
    • 2:02:38 better. If we have the ability to select
    • 2:02:40 data
    • 2:02:41 and create tables and insert data, let's
    • 2:02:43 go ahead and write our own
    • 2:02:45 Python script that uses SQL,
    • 2:02:49 as in a loop, to read over my CSV file
    • 2:02:52 and to INSERT, INSERT, INSERT, INSERT each
    • 2:02:54 of the rows manually. Because honestly, it
    • 2:02:56 will take me forever to, like, manually
    • 2:02:58 type out, like, hundreds of SQL queries to
    • 2:03:00 import all of your rows into a new
    • 2:03:01 database. I want to write a program that
    • 2:03:03 does this instead. And I'm going to
    • 2:03:05 propose that we design it
    • 2:03:06 in the following way. I'm going to have
    • 2:03:09 two tables this time,
    • 2:03:11 represented here with this artist's
    • 2:03:12 rendition. One is going to be called
    • 2:03:14 shows,
    • 2:03:15 one is going to be called genres. And
    • 2:03:17 this is a
    • 2:03:18 fundamental principle of designing
    • 2:03:21 relational databases: to figure out the
    • 2:03:24 relationships among data
    • 2:03:26 and to normalize your data. To normalize
    • 2:03:29 your data means to eliminate
    • 2:03:31 redundancies. To normalize your data
    • 2:03:33 means to
    • 2:03:35 eliminate mentions of the same words
    • 2:03:37 again and again
    • 2:03:38 and have just single sources of truth
    • 2:03:40 for your data, so to speak. So what do I
    • 2:03:42 mean by that?
    • 2:03:43 I'm going to propose that we instead
    • 2:03:45 create a simpler table called shows
    • 2:03:47 that has just two columns. One is going
    • 2:03:49 to be called id,
    • 2:03:50 which is new. The other is going to be
    • 2:03:52 called title, as before. Honestly, I don't
    • 2:03:54 care about timestamps, so we're just
    • 2:03:56 going to throw that value away, which is
    • 2:03:57 another upside of writing our own
    • 2:03:59 program: we can add or remove any data we
    • 2:04:01 want.
    • 2:04:02 For id, I'm introducing this, which is
    • 2:04:04 going to be a unique identifier,
    • 2:04:06 literally a simple integer: 1,
    • 2:04:08 2, 3, all the way up to a billion or
    • 2:04:09 two billion, however many favorites we
    • 2:04:11 have.
    • 2:04:12 I'm just going to let this auto-increment
    • 2:04:13 as we go. Why?
    • 2:04:15 I propose that we move to another table
    • 2:04:19 all of the genres, and that instead of
    • 2:04:22 having
    • 2:04:23 one or two or three or five genres
    • 2:04:26 in one column as a stupid
    • 2:04:28 comma-separated list,
    • 2:04:29 which is stupid only in the sense that
    • 2:04:31 it's just messy, right? It means that I
    • 2:04:32 have to run stupid commands where I'm
    • 2:04:34 checking for the comma here, the comma
    • 2:04:35 there.
    • 2:04:36 It's very hackish, so to speak. Bad design.
    • 2:04:39 Instead of doing that,
    • 2:04:40 I'm going to create another table that
    • 2:04:42 also has two columns.
    • 2:04:44 One is going to be called show_id, and
    • 2:04:46 the other is going to be called
    • 2:04:47 genre. And genre here is just going to be
    • 2:04:50 a single word now.
    • 2:04:52 That column will contain single words
    • 2:04:54 for genres, like
    • 2:04:55 comedy or music or musical. But
    • 2:04:59 we're going to associate all of those
    • 2:05:01 genres
    • 2:05:02 with the original show to which they
    • 2:05:04 belong, per your Google Form submissions,
    • 2:05:07 by using this show_id here. So what does
    • 2:05:10 this mean in particular?
    • 2:05:12 By adding to our first table, shows, this
    • 2:05:15 unique identifier, 1, 2, 3, 4,
    • 2:05:17 5, 6,
    • 2:05:18 I can now refer to that same show
    • 2:05:22 in a very efficient way, using a very
    • 2:05:24 simple number. Instead of redundantly
    • 2:05:26 having The Office, The Office, The Office
    • 2:05:28 again and again, I can refer to it by
    • 2:05:29 just one
    • 2:05:30 canonical number, which is only going to
    • 2:05:31 be, like, four bytes, or 32 bits.
    • 2:05:34 Pretty efficient. But I can still
    • 2:05:36 associate that show with
    • 2:05:37 one genre or two or three or more or
    • 2:05:40 even none.
    • 2:05:42 So in this way, every row in our current
    • 2:05:45 table
    • 2:05:46 is going to become one or more rows
    • 2:05:49 in our new pair of tables. We're
    • 2:05:52 factoring out the genres
    • 2:05:54 so that we can add multiple rows for
    • 2:05:57 every show, potentially,
    • 2:05:58 but still remap those genres back to the
    • 2:06:02 original show itself.
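The two-table design described above can be sketched like this. It is an illustration under assumptions rather than the script written in the lecture: the sample rows are invented, the column name show_id is assumed from the spoken "show id", and the standard-library sqlite3 module stands in for whichever library ends up being used.

```python
import sqlite3

# Hypothetical flat rows, shaped like the original one-table data.
flat_rows = [
    {"title": "The Office", "genres": "Comedy, Drama, Romance"},
    {"title": "The Muppet Show", "genres": "Comedy, Musical"},
]

db = sqlite3.connect(":memory:")
# In SQLite, an INTEGER PRIMARY KEY column is filled in automatically
# (1, 2, 3, ...) when no value is supplied.
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("CREATE TABLE genres (show_id INTEGER, genre TEXT)")

for row in flat_rows:
    cursor = db.execute("INSERT INTO shows (title) VALUES (?)", (row["title"],))
    show_id = cursor.lastrowid  # the id SQLite just assigned to this show
    # One genres row per genre instead of one comma-separated string.
    for genre in row["genres"].split(", "):
        db.execute(
            "INSERT INTO genres (show_id, genre) VALUES (?, ?)",
            (show_id, genre),
        )

shows = db.execute("SELECT id, title FROM shows ORDER BY id").fetchall()
genres = db.execute("SELECT show_id, genre FROM genres ORDER BY rowid").fetchall()
print(shows)
print(genres)
```

Each title now appears once, and every genre row points back to its show through a small integer.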
    • 2:06:04 So what are some of the buzzwords
    • 2:06:06 here? What's some of the language
    • 2:06:08 to be familiar with? Well, we need to know
    • 2:06:11 what kinds of
    • 2:06:12 types are at our disposal here. So for
    • 2:06:14 that, let me propose
    • 2:06:15 that we have
    • 2:06:19 this list here. It turns out in SQLite
    • 2:06:22 there are five main data types, and
    • 2:06:24 that's a bit of an oversimplification,
    • 2:06:25 but there's five main data types,
    • 2:06:27 some of which look familiar, a couple of
    • 2:06:28 which are a little weird.
    • 2:06:30 INTEGER is a thing. REAL
    • 2:06:33 is the same thing as float. So an integer
    • 2:06:35 might be a 32-bit, or four-byte,
    • 2:06:37 value like 1, 2, 3, or 4,
    • 2:06:39 positive or negative. A real number is
    • 2:06:40 going to have a decimal point in it, a
    • 2:06:42 floating-point value, probably 32 bits by
    • 2:06:44 default. But those kinds of things, the
    • 2:06:45 sizes of these types,
    • 2:06:47 vary by system. Just like they
    • 2:06:49 technically did in C, so do they vary by
    • 2:06:51 system in the world of SQL. But generally
    • 2:06:53 speaking, these are good rules of thumb.
    • 2:06:55 TEXT is just that. It's sort of the
    • 2:06:56 equivalent of a string of some length.
    • 2:06:58 But then in SQLite, it turns out
    • 2:07:00 there's two other data types we've not
    • 2:07:02 seen before, NUMERIC and BLOB.
    • 2:07:04 But more on those in just a little bit.
    • 2:07:05 BLOB is binary large object. It means you
    • 2:07:07 can store zeros and ones in your
    • 2:07:09 database.
    • 2:07:09 NUMERIC is going to be something that's
    • 2:07:11 number-like but isn't a number per se.
    • 2:07:13 It's like a year or a time,
    • 2:07:15 something that has numbers but isn't
    • 2:07:17 just a simple integer.
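To see the five types in action, here is a small sketch with an assumed table named samples. SQLite's typeof() function reports how each value was actually stored. As a side note on sizes, SQLite's own documentation describes INTEGER values as occupying up to 8 bytes and REAL values as 8-byte floating-point numbers, so the 32-bit figures are best read as rules of thumb.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One column per data type named in the list (the table itself is hypothetical).
db.execute("CREATE TABLE samples (i INTEGER, r REAL, t TEXT, n NUMERIC, b BLOB)")
db.execute(
    "INSERT INTO samples (i, r, t, n, b) VALUES (?, ?, ?, ?, ?)",
    (42, 3.14, "hello", "2020", b"\x00\x01"),
)

# typeof() returns the storage class SQLite chose for each stored value.
types = db.execute(
    "SELECT typeof(i), typeof(r), typeof(t), typeof(n), typeof(b) FROM samples"
).fetchone()
print(types)
```

The NUMERIC column is the interesting one: the text "2020" looks like a number, so SQLite stores it as an integer.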
    • 2:07:18 And then we propose, too, that SQLite
    • 2:07:21 is going to allow us to specify, too,
    • 2:07:23 when we create our own columns manually
    • 2:07:25 by executing the SQL code ourselves, we
    • 2:07:28 can specify that a column
    • 2:07:30 cannot be null. Thus far we've ignored
    • 2:07:32 this, but some of you might have taken
    • 2:07:34 the Fifth and just not given us the
    • 2:07:36 title of a show or a genre.
    • 2:07:37 Your answers might be blank. Some of
    • 2:07:40 you, maybe, in registering for a website,
    • 2:07:42 don't want to provide information like
    • 2:07:43 where you live or your phone number.
    • 2:07:45 So a database in general sometimes does
    • 2:07:47 want to support null values,
    • 2:07:49 but you might want to say that it can't
    • 2:07:50 be null. A website probably needs your
    • 2:07:52 email address, needs your
    • 2:07:55 password and a few other fields, but not
    • 2:07:56 everything. And there's another keyword
    • 2:07:58 in SQL, just so you've seen it, called
    • 2:08:00 UNIQUE,
    • 2:08:01 where you can additionally say that
    • 2:08:02 whatever values are in this column
    • 2:08:04 must be unique. So a website might
    • 2:08:06 also use that.
    • 2:08:07 If you want to make sure that the same
    • 2:08:09 email address can't register for your
    • 2:08:11 website multiple times,
    • 2:08:12 you just specify that the email column
    • 2:08:14 is unique. That way you can't put
    • 2:08:16 multiple people
    • 2:08:16 in with identical email addresses. So
    • 2:08:19 long story short, this is just more of
    • 2:08:21 the tools in our SQL toolkit, because
    • 2:08:23 we'll see some of these now indirectly.
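A sketch of both constraints on a hypothetical users table (the table, its columns, and the addresses are all invented for illustration). The email column must be present and distinct, while phone is allowed to stay NULL.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (email TEXT NOT NULL UNIQUE, phone TEXT)")

# A valid registration: phone is optional, so None (NULL) is accepted.
db.execute(
    "INSERT INTO users (email, phone) VALUES (?, ?)",
    ("alice@example.com", None),
)

errors = []
# First a missing email, then a duplicate one: both should be rejected.
for email in (None, "alice@example.com"):
    try:
        db.execute("INSERT INTO users (email) VALUES (?)", (email,))
    except sqlite3.IntegrityError as error:
        errors.append(str(error))

count = db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(errors, count)
```

The database itself refuses the bad rows, so the application does not have to remember to check every time.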
    • 2:08:25 And the last piece of jargon we need
    • 2:08:26 before designing our own tables
    • 2:08:28 is going to be this. It turns out that in
    • 2:08:31 SQL,
    • 2:08:32 there's this notion of primary keys and
    • 2:08:34 foreign keys, and we've not seen this in
    • 2:08:36 spreadsheets.
    • 2:08:37 Odds are, unless you've been working in
    • 2:08:39 the real world for some years and you
    • 2:08:41 have fairly fancy spreadsheets in front
    • 2:08:43 of you as an analyst or financial person
    • 2:08:44 or the like,
    • 2:08:45 odds are you've not seen keys or unique
    • 2:08:48 identifiers in quite the same way.
    • 2:08:50 But they're relatively simple. In fact,
    • 2:08:52 let me go back to
    • 2:08:54 our picture before and propose that when
    • 2:08:57 you have two tables like this
    • 2:09:00 and you want to use a simple integer to
    • 2:09:02 uniquely identify
    • 2:09:03 all of the rows in one of the tables,
    • 2:09:05 that's called,
    • 2:09:06 technically, an ID. That's what I'll call
    • 2:09:08 it by convention. You could call it
    • 2:09:09 anything you want,
    • 2:09:10 but ID just means it's a unique
    • 2:09:11 identifier. But
    • 2:09:13 semantically, this ID is what's called a
    • 2:09:15 primary key. A primary key is the column
    • 2:09:18 in a table that uniquely identifies
    • 2:09:21 every row.
    • 2:09:23 This means you can have multiple
    • 2:09:24 versions of The Office in that
    • 2:09:26 title field, but each of those rows is
    • 2:09:29 going to have its own number,
    • 2:09:30 uniquely, potentially. So a primary key
    • 2:09:32 uniquely identifies
    • 2:09:34 each row. In another table, like genres,
    • 2:09:37 which I'm proposing we
    • 2:09:38 create in just a moment, it turns out
    • 2:09:41 that you're
    • 2:09:42 welcome to refer back to another table
    • 2:09:45 by way of that unique identifier.
    • 2:09:48 But when it's in this context, that ID is
    • 2:09:51 called a foreign key.
    • 2:09:52 So even though I've called it show_id
    • 2:09:54 here, that's just a convention in a lot
    • 2:09:56 of SQL databases to imply that
    • 2:09:58 this is technically a column called id
    • 2:10:01 in a table
    • 2:10:02 called show, or shows, plural, in this case.
    • 2:10:05 So if there's a number 1 here, and
    • 2:10:08 suppose that
    • 2:10:09 The Office has a unique ID of 1,
    • 2:10:12 we would have a row in this table where
    • 2:10:14 id is 1,
    • 2:10:16 title is The Office. The Office might be
    • 2:10:18 in
    • 2:10:19 the comedy category, the drama category,
    • 2:10:22 and the romance category, so multiple
    • 2:10:24 ones.
    • 2:10:25 Therefore, in the genres table, we want to
    • 2:10:28 output
    • 2:10:29 three rows, the number 1, 1, 1
    • 2:10:32 in each of those rows, but the words
    • 2:10:34 comedy, drama,
    • 2:10:36 romance in each of those rows,
    • 2:10:38 respectively.
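The Office example can be written out as a sketch with explicit key declarations. The schema below is an assumption consistent with the description (shows.id as primary key, genres.show_id as foreign key), not necessarily the exact schema used later. One SQLite-specific detail: foreign keys are only enforced after PRAGMA foreign_keys = ON has been run on the connection.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

db.execute(
    "CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id))"
)
db.execute(
    "CREATE TABLE genres ("
    "show_id INTEGER, genre TEXT NOT NULL, "
    "FOREIGN KEY(show_id) REFERENCES shows(id))"
)

# One row in shows, three rows in genres, all pointing at id 1.
db.execute("INSERT INTO shows (id, title) VALUES (1, 'The Office')")
for genre in ("Comedy", "Drama", "Romance"):
    db.execute("INSERT INTO genres (show_id, genre) VALUES (?, ?)", (1, genre))

office_genres = db.execute(
    "SELECT show_id, genre FROM genres WHERE show_id = 1 ORDER BY rowid"
).fetchall()

# A genre that points at a show that does not exist is rejected.
try:
    db.execute("INSERT INTO genres (show_id, genre) VALUES (99, 'Horror')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(office_genres, rejected)
```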
    • 2:10:39 So again, the goal here is to just design
    • 2:10:41 our database better: not have these
    • 2:10:43 stupid comma-separated lists of values
    • 2:10:45 inside of a single column. We want to
    • 2:10:47 kind of blow that up, explode it,
    • 2:10:49 into individual rows. You might think,
    • 2:10:52 well, why don't we just use multiple
    • 2:10:54 columns?
    • 2:10:54 But again, per our principle from
    • 2:10:57 spreadsheets, you should not be in the
    • 2:10:58 habit of adding more and more columns
    • 2:11:00 when the data is all the same, like genre,
    • 2:11:02 genre, genre.
    • 2:11:03 Right? The sort of stupid way to do this
    • 2:11:05 in the spreadsheet world would be to
    • 2:11:06 have one column called
    • 2:11:07 genre 1, another column called genre 2,
    • 2:11:10 another column called
    • 2:11:11 genre 3, genre 4. And you can
    • 2:11:13 imagine just how stupid and inefficient
    • 2:11:15 this
    • 2:11:15 is. A lot of those columns are going to
    • 2:11:17 be empty for shows with very few
    • 2:11:19 genres, and it's just kind of messy at
    • 2:11:22 that point. So better,
    • 2:11:23 in the world of relational databases, to
    • 2:11:26 have something like a second table
    • 2:11:28 where you have multiple rows that
    • 2:11:30 somehow link back to
    • 2:11:32 that primary key by way of what we're
    • 2:11:34 calling, conceptually,
    • 2:11:35 a foreign key.
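To make the contrast concrete, the sketch below builds both designs side by side with invented data. The table shows_wide and its genre1 to genre3 columns are hypothetical stand-ins for the "more and more columns" approach; the second pair of tables follows the shows and genres design.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Design 1 (discouraged): one column per possible genre.
db.execute(
    "CREATE TABLE shows_wide (title TEXT, genre1 TEXT, genre2 TEXT, genre3 TEXT)"
)
db.executemany(
    "INSERT INTO shows_wide VALUES (?, ?, ?, ?)",
    [
        ("The Office", "Comedy", "Drama", "Romance"),
        ("The Muppet Show", "Musical", "Comedy", None),
    ],
)
# Every genre column has to be checked, and some cells sit empty.
wide_hits = db.execute(
    "SELECT title FROM shows_wide "
    "WHERE genre1 = 'Comedy' OR genre2 = 'Comedy' OR genre3 = 'Comedy' "
    "ORDER BY title"
).fetchall()
empty_cells = db.execute(
    "SELECT COUNT(*) FROM shows_wide WHERE genre3 IS NULL"
).fetchone()[0]

# Design 2: a second table with one row per (show, genre) pair.
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("CREATE TABLE genres (show_id INTEGER, genre TEXT)")
db.executemany(
    "INSERT INTO shows (id, title) VALUES (?, ?)",
    [(1, "The Office"), (2, "The Muppet Show")],
)
db.executemany(
    "INSERT INTO genres (show_id, genre) VALUES (?, ?)",
    [(1, "Comedy"), (1, "Drama"), (1, "Romance"), (2, "Musical"), (2, "Comedy")],
)
# One condition on one column, however many genres a show has.
normalized_hits = db.execute(
    "SELECT title FROM shows "
    "WHERE id IN (SELECT show_id FROM genres WHERE genre = 'Comedy') "
    "ORDER BY title"
).fetchall()

print(wide_hits, empty_cells, normalized_hits)
```

Both queries find the same shows, but only the first one has to change when a show turns up with a fourth genre.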
    • 2:11:39 All right, so let's go ahead now and try to write this code. Let
    • 2:11:40 me go back to my IDE.
    • 2:11:42 Let me quit out of SQLite now,
    • 2:11:46 and let me just move away, I'm going to
    • 2:11:50 move this away,
    • 2:11:53 my file, for just a moment, so that we're
    • 2:11:54 only left with our original data.
    • 2:11:56 Let's go about implementing a final
    • 2:11:58 version of my Python file
    • 2:12:00 that does this: creates two tables, one
    • 2:12:03 called
    • 2:12:03 shows, one called genres, and then, two,
    • 2:12:06 in a for loop, iterates over that CSV and
    • 2:12:09 inserts some data into the shows and
    • 2:12:11 other data into the genres.
    • 2:12:13 How can we do this programmatically? Well,
    • 2:12:15 there's a final piece of the puzzle that
    • 2:12:17 we need.
    • 2:12:17 We need some way of bridging the world
    • 2:12:19 of Python and SQL.
    • 2:12:21 And here we do need a library, because it
    • 2:12:22 would just be way too painful to do
    • 2:12:24 without a library.
    • 2:12:25 It can be CS50's. CS50's, as we'll see, makes
    • 2:12:28 this very simple.
    • 2:12:29 There are other third-party commercial
    • 2:12:31 and open source libraries that you can
    • 2:12:32 also use in the real world as well
    • 2:12:34 that do the same thing, but the syntax is
    • 2:12:36 a little less friendly.
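As one example of such a bridge, Python's standard library includes the sqlite3 module. The sketch below is not the CS50 library's interface; it only illustrates the general idea of looping over CSV rows in Python and handing each one to SQL. The CSV text and column names are invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content, held in a string so the sketch is self-contained.
csv_text = """title,genres
The Office,"Comedy, Drama, Romance"
The Muppet Show,"Comedy, Musical"
"""

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # lets rows be indexed by column name
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")

# Python does the looping; SQL does the storing.
for row in csv.DictReader(io.StringIO(csv_text)):
    db.execute("INSERT INTO shows (title) VALUES (?)", (row["title"],))
db.commit()

titles = [r["title"] for r in db.execute("SELECT title FROM shows ORDER BY id")]
print(titles)
```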
    • 2:12:37so we'll start by using the cs50 library a little uh less friendly
    • 2:12:39which in python recall has functions so we'll start by using the cs50 library
    • 2:12:41like getstring and getint and getfloat which in python recall has functions
    • 2:12:43but today it also has support it turns like getstring and getint and getfloat
    • 2:12:46out but today it also has support it turns
    • 2:12:46for sql capabilities as well so i'm out
    • 2:12:49going to go back to my favorites file for sql capabilities as well so i'm
    • 2:12:51and i'm going to import not only csv but going to go back to my favorites file
    • 2:12:54i'm also going to import and i'm going to import not only csv but
    • 2:12:56from the cs50 library a feature i'm also going to import
    • 2:12:59called sql so we have a a variable if from the cs50 library a feature
    • 2:13:02you will called sql so we have a a variable if
    • 2:13:03inside of the cs50 library or rather a you will
    • 2:13:05function inside of the cs50 library inside of the cs50 library or rather a
    • 2:13:07called sql function inside of the cs50 library
    • 2:13:09that if i call it will allow me to load called sql
    • 2:13:11a sql lite that if i call it will allow me to load
    • 2:13:12database into memory so how do i do this a sql lite
    • 2:13:15well let me go ahead and add a couple of database into memory so how do i do this
    • 2:13:16new lines of code well let me go ahead and add a couple of
    • 2:13:17Let me go ahead and open up a file called shows.db, but this time in write mode, and then, just for now, I'm going to go ahead and close it right away.
    • 2:13:30This is a Pythonic way of creating an empty file. It's kind of stupid looking, but opening a file called shows.db in write mode and then immediately closing it has the effect of creating the file and closing the file, so I now have an empty file with which to interact.
    • 2:13:45I could also do this, as an aside, by running touch shows.db. touch is kind of a strange command, but in a terminal window it means to create a file if it doesn't exist. So we could also do that instead, but that would be independent of Python.
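The open-then-close idiom just described can be sketched as follows. This is only an illustration: it writes into a temporary directory (an assumption made here so that no real shows.db is overwritten), and it shows pathlib's touch() as an in-Python analogue of the terminal's touch command.

```python
import os
import tempfile
from pathlib import Path

# Work in a throwaway directory so no real shows.db is clobbered (assumption for this sketch).
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "shows.db")

# Opening in write mode creates the file (or empties an existing one);
# closing it right away leaves an empty file behind.
open(db_path, "w").close()
size_after_open = os.path.getsize(db_path)

# Path.touch() mirrors the terminal's `touch`: it creates the file if it is missing.
other_path = Path(workdir) / "other.db"
other_path.touch()
```

Note that write mode truncates an existing file, whereas touch leaves existing contents alone, so the two are only interchangeable when the file does not exist yet.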
    • 2:14:01So once I've created this file, let me go ahead and open the file, now as a SQLite database.
    • 2:14:06I'm going to declare a variable called db, for database. I'm going to use the SQL function from CS50's library, and I'm going to open, via a somewhat cryptic string, this: sqlite:///shows.db.
    • 2:14:20Now, it looks like a URL, http://, but it's sqlite instead, and there are three slashes instead of the usual two.
    • 2:14:29But this line of code, line 6, has the result of opening that otherwise empty file, with nothing in it yet, as a SQLite database using CS50's library.
    • 2:14:40Why did I do that?
    • 2:14:44Well, I did that because I now want to create my first table. Let me go ahead and call db.execute.
    • 2:14:51So there's a function called execute inside of the CS50 SQL library, and I'm going to go ahead and run this: CREATE TABLE, called shows, the columns of which are an id, which is going to be an integer, and a title, which is going to be text, and the primary key of which is going to be the id column.
    • 2:15:12So this is a bit cryptic, but let's see what's happening. I seem to now, in line 8, be combining Python with SQL.
    • 2:15:19And this is where programming gets really powerful, fancy, cool, difficult, however you want to perceive it: I can actually use one language inside of another.
    • 2:15:29How? Well, SQL is just a bunch of textual commands. Up until now, I've been typing them out manually in this program called sqlite3. There's nothing stopping me, though, from storing those same commands in Python strings and then passing them to a database using code.
    • 2:15:44The code I'm using is a function called execute, and its purpose in life (and CS50's staff wrote this) is to pass the argument from your Python code into the database for execution.
    • 2:15:57So it's like the programmatic way of typing things manually at the SQLite prompt, as I did a few minutes ago.
    • 2:16:04So that's going to go ahead and create my table called shows, in which I'm going to store all of those unique ids and also the titles.
    • 2:16:09And then let me do this again: db.execute, CREATE TABLE genres, and that's going to have a column called show_id, which is an integer; also genre, which is text; and lastly, it's going to have a foreign key (it's going to wrap a little long here) on show_id, which references the shows table's id.
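The two CREATE TABLE statements described above can be sketched like this. To keep the example runnable without extra packages, it uses Python's built-in sqlite3 module and an in-memory database as stand-ins (both assumptions of this sketch); the comments show roughly how the same calls look with CS50's library.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind (assumption; the lecture uses shows.db).
conn = sqlite3.connect(":memory:")

# With CS50's library the equivalent would be roughly:
#   from cs50 import SQL
#   db = SQL("sqlite:///shows.db")
#   db.execute("CREATE TABLE shows (...)")
conn.execute("CREATE TABLE shows (id INTEGER, title TEXT, PRIMARY KEY(id))")
conn.execute(
    "CREATE TABLE genres (show_id INTEGER, genre TEXT, "
    "FOREIGN KEY(show_id) REFERENCES shows(id))"
)

# List the tables that now exist, much like running .schema at the sqlite3 prompt.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```

Either way, the SQL itself is just a Python string handed to the database for execution.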
    • 2:16:35All right, so this is a lot, so let's just recap, left to right. db.execute is my Python function that executes any SQL I want.
    • 2:16:42CREATE TABLE genres creates a table called genres. The columns in that table will be something called show_id, which is an integer, and genre, which is a text field, but it's going to be one genre at a time, not multiple.
    • 2:16:54And then here I'm specifying that a foreign key will be the show_id column, which happens to refer back to the shows table's id column.
    • 2:17:07Brian, question or comment?
    • 2:17:14Brian, over to you. Never mind? No? OK.
    • 2:17:17Did I just fix the bug? Yes. OK, Brian's very good at secretly messaging me when I've screwed up, but I saw it first. Well, I knew that something was wrong.
    • 2:17:22All right, so it's a little cryptic, but all this is doing is implementing for us the equivalent of this picture here.
    • 2:17:28I could have manually typed both of these SQL commands at that blinking prompt. But again, no, I want to write a program now in Python that creates the tables for me and now, more interestingly, loads the data into that database.
    • 2:17:43So let's go ahead and do this now.
    • 2:17:46I'm not going to ask the user for a title, because I want to import everything. I'm not going to use any counting or anything like that.
    • 2:17:51So let's go ahead and just go inside of my loop, as before, and this time: for row in reader, let's go ahead and get the current title, as we've always done. But let's also, as always, go ahead and strip it of whitespace and capitalize it, just to canonicalize it.
    • 2:18:08And now I'm going to go ahead and call db.execute, quote unquote, INSERT INTO shows, the title column, the value of title. So I want to put the title here.
    • 2:18:22It turns out that SQL libraries like ours support one final piece of syntax, which is a placeholder.
    • 2:18:29In C, we used %s. In Python, we just used curly braces and put the word right there. In SQL, we have a third approach to the same problem, syntactically different but conceptually the same: you put a question mark where you want to put a placeholder.
    • 2:18:43And then, outside of this string, I'm going to actually type in the value that I want to plug into that question mark.
    • 2:18:49So this is so similar to printf in week one, but instead of %s, it's a question mark now, and then a comma-separated list of the arguments you want to plug in for those placeholders.
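A minimal sketch of the question-mark placeholder, again using the built-in sqlite3 module as a stand-in for CS50's library. The sample titles are made up for illustration, and upper() is assumed here as the way of capitalizing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shows (id INTEGER, title TEXT, PRIMARY KEY(id))")

# Hypothetical raw inputs, as they might appear in the CSV's title column.
raw_titles = ["  the office ", "Grey's Anatomy"]

for raw_title in raw_titles:
    # Canonicalize: strip surrounding whitespace and force uppercase.
    title = raw_title.strip().upper()
    # The ? marks where a value goes; sqlite3 takes the values as a tuple
    # (CS50's db.execute takes them as extra arguments instead).
    conn.execute("INSERT INTO shows (title) VALUES(?)", (title,))

stored = conn.execute("SELECT title FROM shows ORDER BY id").fetchall()
```

Because the value travels separately from the SQL string, a title containing an apostrophe is stored as-is rather than breaking the statement.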
    • 2:19:00So now this line of code, 16, has just inserted all of those values into my database. And let's go ahead and run this before I go any further.
    • 2:19:08Let me go ahead and do this. I'm going to go ahead now and run python favorites.py and cross my fingers, as always.
    • 2:19:15It's taking a moment. That's because there's a decent-sized file there, or I screwed up.
    • 2:19:23This is taking too long. What did I do wrong?
    • 2:19:28Uh-huh. Checking.
    • 2:19:34OK, this is inexplicably taking too long. Oh, interesting. It's getting bigger and bigger for some reason.
    • 2:19:44Oh, OK, I should have just been more patient. All right, so it just seems my connection's a little slow.
    • 2:19:52So, as I expected, everything is 100% correct and it's working fine.
    • 2:19:55So now let's go ahead and see what I actually did. If I type ls, notice that I have a file called shows.db. This is brand new, because my Python program created it this time.
    • 2:20:05Let's go ahead and run sqlite3 on shows.db, just so I can now see what's inside of it.
    • 2:20:10Notice that I can do .schema just to see what tables exist, and indeed, the two tables that I created in my Python code seem to exist.
    • 2:20:19But notice, if I do SELECT * FROM shows, let's see all the data. Voilà, there is a table that's been programmatically created.
    • 2:20:30And it has, notice, this time no timestamps, no genres, but it has an id on the left and the title on the right.
    • 2:20:36And amazingly, all of the ids are monotonically increasing from 1 on up to 513 in this case. Why is that?
    • 2:20:45Well, one of the features you get in a SQL database is that if you define a column as being a primary key, in SQLite it's going to be auto-incremented for you.
    • 2:20:53Recall that nowhere in my code did I even have an integer, inserting 1, then 2, then 3.
    • 2:20:59I could absolutely do that. I could have done something like this: counter = 1. And then down here, I could have said id, title, given myself two placeholders, and then passed in the counter each time.
    • 2:21:16I could have implemented this myself and then, on each iteration, done counter += 1.
    • 2:21:20But with SQL databases, as we've seen, you get a lot more functionality built in. I don't have to do any of that, because if I've declared that id as being a primary key, SQLite is going to insert it for me and increment it for me as well. All right.
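The automatic numbering can be seen in a small sketch, using the built-in sqlite3 module and made-up titles. One detail worth stating: in SQLite this behavior applies when the primary key is a single column declared INTEGER, as id is here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shows (id INTEGER, title TEXT, PRIMARY KEY(id))")

# No id is supplied; SQLite fills in the next integer because id is the INTEGER primary key.
for title in ["THE OFFICE", "FRIENDS", "THE OFFICE"]:
    conn.execute("INSERT INTO shows (title) VALUES(?)", (title,))

rows = conn.execute("SELECT id, title FROM shows ORDER BY id").fetchall()
print(rows)
```

Duplicate titles each get their own id, which matches the lecture's choice to keep every submitted row.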
    • 2:21:37So if I go back to SQLite, though, notice that I do have ids and titles, but if I SELECT * FROM genres, there's of course nothing there yet.
    • 2:21:45So how now do I get all of the genres for each of these shows in? I need to finish my script.
    • 2:21:51So inside of this same loop, I have not only the title in my current row, but I also have genres in the current row. But the genres are separated by commas.
    • 2:22:02Recall that in the CSV, next to every title, there's a comma-separated list of genres. So how do I get at each genre individually?
    • 2:22:09Well, I'd like to be able to say for genre in row["genres"], but this is not going to work, because that's not going to be split up based on those commas. That's literally just going to iterate over all of the characters in that string, as we saw last week.
    • 2:22:28But it turns out that strings in Python have a fancy split function, whereby I can split on a comma followed by a space.
    • 2:22:36And what this function will do for me in Python is take a comma-separated list of genres and explode it, so to speak: split it on every comma-space into a Python list containing genre after genre, in an actual Python list, à la square brackets.
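The difference between iterating over the raw string and iterating over the split result can be sketched as follows; the sample cell value is hypothetical.

```python
# Hypothetical contents of one row's genres cell.
genres_field = "Comedy, Drama, Romance"

# Iterating over the string itself yields single characters, not genres.
first_chars = [c for c in genres_field][:3]

# split(", ") breaks the string on every comma-plus-space into a list of genres.
genres = genres_field.split(", ")

print(first_chars)
print(genres)
```

Splitting on ", " rather than "," assumes the separator always includes the space; otherwise each genre after the first would keep a leading blank.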
    • 2:22:55So now I can iterate over that list of individual genres, and inside of here I can do db.execute, INSERT INTO genres, show_id, genre, the values question mark, question mark.
    • 2:23:08But, huh, there's a problem. I can definitely plug in the current genre, which is this, but I need to put something here still for that first question mark. I need a value for the show_id.
    • 2:23:24How do I know what the id is of the current TV show? Well, it turns out the library can help you with this.
    • 2:23:31When you insert new rows into a table that has a primary key, it turns out that most libraries will return you that value in some way.
    • 2:23:40And if I go back to line 15 and I actually store the return value of db.execute after using INSERT, the library will tell me what integer was just used for this given show.
    • 2:23:53Maybe it's 1, 2, 3; I don't have to know or care as the programmer. But the return value I can store in a variable, and then down here I can literally put that same id.
    • 2:24:04So now, if I am inputting The Office, whose id is 1, into the shows table, and its genres are comedy, drama, romance, I can now, inside of this nested for loop, insert 1 followed by comedy, 1 followed by drama, 1 followed by romance: three rows all at once.
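Putting the pieces together, the nested loop might look roughly like the sketch below. It is not the lecture's exact file: it uses the built-in sqlite3 module, where cursor.lastrowid plays the role of the id that CS50's db.execute returns after an INSERT, and the two-row CSV and its column names (title, genres) are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical two-row CSV standing in for the favorites file; column names are assumed.
data = io.StringIO(
    "title,genres\n"
    'The Office,"Comedy, Drama, Romance"\n'
    'Game of Thrones,"Action, Adventure"\n'
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shows (id INTEGER, title TEXT, PRIMARY KEY(id))")
conn.execute(
    "CREATE TABLE genres (show_id INTEGER, genre TEXT, "
    "FOREIGN KEY(show_id) REFERENCES shows(id))"
)

for row in csv.DictReader(data):
    title = row["title"].strip().upper()
    # Remember which id the database just assigned to this show.
    cursor = conn.execute("INSERT INTO shows (title) VALUES(?)", (title,))
    show_id = cursor.lastrowid
    # One genres row per genre, each pointing back at the show's id.
    for genre in row["genres"].split(", "):
        conn.execute(
            "INSERT INTO genres (show_id, genre) VALUES(?, ?)", (show_id, genre)
        )

genre_rows = conn.execute(
    "SELECT show_id, genre FROM genres ORDER BY rowid"
).fetchall()
print(genre_rows)
```

The first show's three genres all carry show_id 1, which is the "1 followed by comedy, 1 followed by drama, 1 followed by romance" pattern described above.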
    • 2:24:25So now let's go back down here into my terminal window. Let me remove the old shows.db with rm, just to start fresh.
    • 2:24:31Let me go ahead and rerun python favorites.py. I'll be more patient this time, because the cloud's being a little slow. So it's doing some thinking, and in fact there's more work being done now.
    • 2:24:44At this point in the story, my program is presumably iterating over all of the rows in the CSV, and it's inserting into the shows table one at a time, and then it's inserting one or more genres into the genres table.
    • 2:25:02If we keep going and going... let me let that do its thing there.
    • 2:25:13It's a little slow. If we were on a faster system, or if I were doing it on my own Mac or PC, it would probably go more quickly.
    • 2:25:18But you can see here an example of why I used the .import command in the first place: that automated some of this process, but unfortunately it didn't allow me to change the format of my data.
    • 2:25:27But the key point to make here is that even though this is taking a little bit of time to insert these hundreds of rows all at once, I'm only going to have to do this once.
    • 2:25:37And what was asked a bit ago was the performance of this. It turns out that, now that we have full control over the SQL database, we're going to have the ability to actually improve the performance thereof.
    • 2:25:51And I really don't want to keep stalling here; I really just want this thing to finish. I should have used a faster computer, because this does not take nearly as long on some other systems.
    • 2:26:00But let me just stall for a few more seconds. Stalling, stalling, stalling, stalling.
    • 2:26:09Well, let me suspend our suspense for just a moment and use the time wisely. Let me switch over...
    • 2:26:17Oh, OK, as expected, it finished right on time. And let me go ahead now and run sqlite3 on shows.db.
    • 2:26:23All right, so now I'm back in my raw SQL environment. If I do SELECT * FROM shows, which I did before, we'll see all of this as before.
    • 2:26:31If I SELECT * FROM shows WHERE title equals, quote unquote, The Office, I'll see the actual unique ids of all of those.
    • 2:26:38We didn't bother eliminating duplicates; we just kept everything as is, but we gave everything a unique id.
    • 2:26:45But if I now do SELECT * FROM genres, we'll see all of the values there. And notice the key detail: there is only one genre per row here.
    • 2:26:58And so we can ultimately line those up with our titles. And our titles here, we had all of these here...
    • 2:27:10Um, something's wrong. Sorry, let me think for a while; I want to get this right.
    • 2:27:17Let's go ahead and take our second and final five-minute break here, and we'll come back and I will explain what's going on.
    • 2:32:06all right we are back and just before we
    • 2:32:09broke my own self-doubt was starting
    • 2:32:11to creep in but i'm happy to say with no
    • 2:32:13fancy magic behind the scenes everything
    • 2:32:15was actually working fine i was just
    • 2:32:17doubting
    • 2:32:17the correctness of this if i do select
    • 2:32:19star from shows
    • 2:32:20i indeed get back two columns one with
    • 2:32:22the unique id
    • 2:32:24the so-called primary key followed by
    • 2:32:26the title of each of those shows
    • 2:32:28and if i similarly select star from
    • 2:32:31genres
    • 2:32:32i get single genres at a time but on the
    • 2:32:34left hand side are not primary
    • 2:32:36keys per se but now those same numbers
    • 2:32:39here in this context called foreign keys
    • 2:32:41that map one to the other so for
    • 2:32:43instance whatever show 512
    • 2:32:45is it has five different genres
    • 2:32:49associated with it and in fact if i
    • 2:32:50go back a moment to shows
    • 2:32:52it looks like game of thrones was
    • 2:32:54decided by one of you
    • 2:32:55as belonging in thriller history
    • 2:32:58adventure
    • 2:32:58action and war as well
    • 2:33:02those five so now this is what's meant
    • 2:33:04by relational database you have this
    • 2:33:06relation or relationship
    • 2:33:08across multiple tables that link some
    • 2:33:10data in one
    • 2:33:11to some other data in the other the catch
    • 2:33:14though is that it would seem a little
    • 2:33:15harder now to answer
    • 2:33:16questions because now i have to kind of
    • 2:33:18query two tables or
    • 2:33:20execute two separate queries and then
    • 2:33:21combine the data but that's not actually
    • 2:33:23the case
    • 2:33:24suppose that i want to answer the
    • 2:33:26question of what are all of the musicals
    • 2:33:29among your favorite tv shows
    • 2:33:31i can't select just the shows because
    • 2:33:33there's no genres in there anymore
    • 2:33:34but i also can't select just the genres
    • 2:33:36table because there's no
    • 2:33:37titles in there but there is a value
    • 2:33:40that's bridging
    • 2:33:41one and the other that foreign key to
    • 2:33:43primary key relationship
    • 2:33:45so you know what i can do off the top of
    • 2:33:47my head i'm pretty sure i can select all
    • 2:33:49of the show ids
    • 2:33:50from the genres table where a specific
    • 2:33:53genre equals quote unquote musical
    • 2:33:55and i don't have to worry about commas
    • 2:33:56or spaces now because again in this new
    • 2:33:59version
    • 2:33:59that i have designed programmatically
    • 2:34:01with code musical and every other genre
    • 2:34:04is just a single word
    • 2:34:05if i hit enter all of these show ids
    • 2:34:08were decided by you all as belonging to
    • 2:34:11musicals but now this is not interesting
    • 2:34:13and i certainly don't want to execute 10
    • 2:34:15or so queries manually
    • 2:34:16to look up every one of those ids but
    • 2:34:18notice what we can do in sql as well
    • 2:34:21i can nest queries let me put this whole
    • 2:34:23query in parentheses for just a moment
    • 2:34:25and then prepend to it the following
    • 2:34:27select title
    • 2:34:28from shows where the primary key
    • 2:34:31id is in this subquery
    • 2:34:35so you can have nested queries similar
    • 2:34:37in spirit a bit like in python and c
    • 2:34:39when you have nested for loops
    • 2:34:41in this case just like in grade school
    • 2:34:42math whatever's in the parentheses will
    • 2:34:44be executed first
    • 2:34:45then the outer query will be executed
    • 2:34:48using
    • 2:34:48the results of that inner query so if i
    • 2:34:51select the title from shows where the id
    • 2:34:53is in
    • 2:34:53that list of ids voila it seems that
    • 2:34:57somewhat amusingly several of you think
    • 2:35:00that breaking bad supernatural glee
    • 2:35:02sherlock how i met your mother hawaii
    • 2:35:03five-0 twin peaks the lawyer
    • 2:35:05and my brother my brother and me are all
    • 2:35:07musicals i
    • 2:35:08take exception to a few of those but so
    • 2:35:10be it you checked the box for musical for
    • 2:35:12those shows
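
The nested query described above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the tiny in-memory tables, their rows, and the column names (`shows.id`, `shows.title`, `genres.show_id`, `genres.genre`) are assumptions made for the example, not the class's actual data.

```python
import sqlite3

# Hypothetical miniature of the two tables discussed: shows(id, title) and
# genres(show_id, genre), where show_id is a foreign key pointing at shows.id.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE genres (
        show_id INTEGER NOT NULL,
        genre TEXT NOT NULL,
        FOREIGN KEY (show_id) REFERENCES shows(id)
    );
    INSERT INTO shows (id, title) VALUES
        (1, 'The Office'), (2, 'Glee'), (3, 'Game of Thrones');
    INSERT INTO genres (show_id, genre) VALUES
        (1, 'Comedy'), (2, 'Comedy'), (2, 'Musical'),
        (3, 'Action'), (3, 'Adventure');
""")

# The inner query yields the ids of shows tagged Musical; the outer query
# then looks up the titles whose primary key is in that list.
musicals = [row[0] for row in db.execute("""
    SELECT title FROM shows
    WHERE id IN (SELECT show_id FROM genres WHERE genre = 'Musical')
""")]
print(musicals)
```

With these sample rows only one show carries the Musical tag, so the list holds a single title.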
    • 2:35:13so even though we've sort of done things
    • 2:35:15we've designed things better in the
    • 2:35:17sense that we've normalized our database
    • 2:35:19by factoring out commonalities or rather
    • 2:35:21we've cleaned up the data
    • 2:35:23there's still admittedly some redundancy
    • 2:35:27but i at least now have the data
    • 2:35:31in clean fashion so that every column
    • 2:35:34has just a single value in it and not
    • 2:35:36some contrived comma separated list
    • 2:35:38suppose i want to find out all of the
    • 2:35:39genres that you all thought the office
    • 2:35:41was in so let's
    • 2:35:42ask kind of the opposite question well
    • 2:35:43how might i do that well to figure out
    • 2:35:46the office i'm going to first need to
    • 2:35:47select
    • 2:35:48the id from shows where title
    • 2:35:51equals quote unquote the office because
    • 2:35:53a whole bunch of you
    • 2:35:54typed in the office and we gave each of
    • 2:35:56your answers a unique identifier so we
    • 2:35:58could keep track of it
    • 2:35:59and there's all of those numbers now
    • 2:36:00this is like dozens of responses i
    • 2:36:02certainly don't want to execute that
    • 2:36:03many queries
    • 2:36:04but i think a subquery will help us out
    • 2:36:06again let me put parentheses around this
    • 2:36:08whole thing
    • 2:36:09and now let me say select distinct
    • 2:36:12genre from genres where
    • 2:36:16the show id in the genres table is in
    • 2:36:19that query and just for kicks let me go
    • 2:36:22ahead and order by
    • 2:36:23uh genre so let me go ahead and execute
    • 2:36:26this
    • 2:36:27and okay somewhat amusingly those of you
    • 2:36:29who inputted the office
    • 2:36:30checked boxes for animation comedy
    • 2:36:33documentary drama family
    • 2:36:34horror reality tv romance and sci-fi i
    • 2:36:37take
    • 2:36:38exception to a few of those too
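
The opposite lookup, from a title to its genres, might look like the sketch below. The sample rows are invented for illustration; the point is that the same title can sit under several ids (one per response), so the subquery returns many ids and DISTINCT collapses the repeated genres.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE genres (
        show_id INTEGER NOT NULL,
        genre TEXT NOT NULL,
        FOREIGN KEY (show_id) REFERENCES shows(id)
    );
    -- Two hypothetical responses typed the same title, so it has two ids.
    INSERT INTO shows (id, title) VALUES
        (1, 'The Office'), (2, 'The Office'), (3, 'Sherlock');
    INSERT INTO genres (show_id, genre) VALUES
        (1, 'Comedy'), (1, 'Drama'),
        (2, 'Comedy'), (2, 'Documentary'),
        (3, 'Drama');
""")

# Inner query: every id whose title is The Office.
# Outer query: the genres attached to any of those ids, de-duplicated
# with DISTINCT and sorted alphabetically with ORDER BY.
office_genres = [row[0] for row in db.execute("""
    SELECT DISTINCT genre FROM genres
    WHERE show_id IN (SELECT id FROM shows WHERE title = 'The Office')
    ORDER BY genre
""")]
print(office_genres)
```

Without DISTINCT, Comedy would be listed twice here, once for each response that checked it.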
    • 2:36:40but this is what happens when you accept
    • 2:36:41user
    • 2:36:42input so here again we have with this
    • 2:36:45sql
    • 2:36:46language the ability to express fairly
    • 2:36:47succinctly even though it's a lot of new
    • 2:36:50features today all at once what would
    • 2:36:52otherwise take me a dozen or two lines
    • 2:36:54in python code to implement and god
    • 2:36:56knows how many lines of code
    • 2:36:57and how many hours it would take me to
    • 2:36:58implement something like this in
    • 2:37:00c now admittedly we could do better than
    • 2:37:03this design
    • 2:37:04this table or this picture represents
    • 2:37:06what we have now
    • 2:37:07but you'll notice a lot of redundancy
    • 2:37:09implicit in the genres table
    • 2:37:11anytime you check the comedy box i have
    • 2:37:13a row now that says comedy
    • 2:37:15comedy comedy comedy and the show id
    • 2:37:18differs but i have the word comedy again
    • 2:37:20and again
    • 2:37:21and now that tends to be frowned upon in
    • 2:37:23the world of relational databases
    • 2:37:25because if you have a
    • 2:37:26genre called comedy or one called
    • 2:37:28musical or anything else
    • 2:37:30you should ideally just have that living
    • 2:37:32in one place and so if we really wanted
    • 2:37:34to be particular
    • 2:37:36and really truly normalize this database
    • 2:37:38which is an academic term referring to
    • 2:37:40removing all such redundancies
    • 2:37:42we could actually do it like this we
    • 2:37:44could have a shows table still with an
    • 2:37:46id and title
    • 2:37:47no difference there but we could have a
    • 2:37:49genres table
    • 2:37:50with two columns id and name now this is
    • 2:37:53its own id it has no connection with the
    • 2:37:55show id it's just its own unique
    • 2:37:57identifier
    • 2:37:58a primary key here now and the name of
    • 2:38:00that genre so you would have one row in
    • 2:38:02the genres table for comedy for drama
    • 2:38:04for musical and everything else
    • 2:38:06and then you would use a third table
    • 2:38:08which is colloquially called a join
    • 2:38:10table which i'll draw here in the middle
    • 2:38:13and you can call it anything you want
    • 2:38:14but we've called it shows
    • 2:38:16underscore genres to make clear that
    • 2:38:18this table implements a relationship
    • 2:38:19between those two
    • 2:38:21tables and notice that in this table is
    • 2:38:24really no juicy data
    • 2:38:26it's just foreign keys show id genre id
    • 2:38:30and by having this third table we can
    • 2:38:33now make sure that the word comedy only
    • 2:38:35appears in one row
    • 2:38:36anywhere the word musical only appears
    • 2:38:38in one row anywhere
    • 2:38:40but we use these more efficient integers
    • 2:38:42called show id and genre id
    • 2:38:44which respectively point to those
    • 2:38:47primary keys
    • 2:38:48in their primary tables to link those
    • 2:38:50two together
    • 2:38:51and this is an example of what's called
    • 2:38:53in the world of databases a many-to-many
    • 2:38:55relationship
    • 2:38:56one show can have many genres one
    • 2:38:59genre can belong to many shows and so by
    • 2:39:02having this third table you can have
    • 2:39:03that many-to-many relationship
    • 2:39:05and again the third table now allows us
    • 2:39:07to truly normalize our data set
    • 2:39:09by getting rid of all of the duplicate
    • 2:39:11comedy comedy comedy
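
A rough sketch of that three-table design follows, again using Python's sqlite3 module with invented sample rows. The table name `shows_genres` and the `id`/`name` columns come from the description above; the `UNIQUE` constraint on the genre name is an extra assumption added here to make the "one row per genre" idea explicit.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Each genre name is stored exactly once, under its own primary key.
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- The join table holds no data of its own, only two foreign keys.
    CREATE TABLE shows_genres (
        show_id INTEGER NOT NULL,
        genre_id INTEGER NOT NULL,
        FOREIGN KEY (show_id) REFERENCES shows(id),
        FOREIGN KEY (genre_id) REFERENCES genres(id)
    );
    INSERT INTO shows (id, title) VALUES (1, 'The Office'), (2, 'Glee');
    INSERT INTO genres (id, name) VALUES (1, 'Comedy'), (2, 'Musical');
    INSERT INTO shows_genres (show_id, genre_id) VALUES (1, 1), (2, 1), (2, 2);
""")

# The word Comedy lives in a single row, however many shows use it.
comedy_rows = db.execute(
    "SELECT COUNT(*) FROM genres WHERE name = 'Comedy'"
).fetchone()[0]

# Many-to-many lookup: genre name -> genre id -> show ids -> titles.
comedies = [row[0] for row in db.execute("""
    SELECT title FROM shows WHERE id IN (
        SELECT show_id FROM shows_genres WHERE genre_id = (
            SELECT id FROM genres WHERE name = 'Comedy'))
    ORDER BY title
""")]
print(comedy_rows, comedies)
```

Two shows point at the Comedy genre through the join table, yet the text 'Comedy' is stored only once.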
    • 2:39:13why is this important probably not a
    • 2:39:15huge deal for genres
    • 2:39:16but imagine with my current design
    • 2:39:18if i made a spelling mistake and i
    • 2:39:20misnamed
    • 2:39:21comedy i would now have to change every
    • 2:39:23row with the word comedy again and again
    • 2:39:25or if maybe you change the
    • 2:39:27genres of the shows
    • 2:39:28you have to change it in multiple places
    • 2:39:31but with this other approach with three
    • 2:39:32tables
    • 2:39:33you can argue that now you only have to
    • 2:39:35change the name of a genre
    • 2:39:36in one place not all over the place and
    • 2:39:39that in general in c
    • 2:39:40and now in python and now sql has
    • 2:39:42generally been a good thing
    • 2:39:43not to copy-paste identical values all
    • 2:39:47over the place all right so with that
    • 2:39:50said
    • 2:39:51what other tools do we have at our
    • 2:39:53disposal well it turns out
    • 2:39:55that there are other data types out
    • 2:39:57there in the real world
    • 2:39:58using sql besides just these five blob
    • 2:40:01integer numeric
    • 2:40:02real and text blob again is for binary
    • 2:40:04stuff generally not used except for more
    • 2:40:06specialized applications let's say
    • 2:40:08integer which is an int typically 32
    • 2:40:10bits numeric which is something like a
    • 2:40:11date or a year or time
    • 2:40:13or something like that real numbers
    • 2:40:15which are floating point values and text
    • 2:40:17which are things like strings but if you
    • 2:40:20graduate ultimately from sqlite
    • 2:40:22on phones and on macs and pcs to actual
    • 2:40:25servers that run
    • 2:40:26oracle mysql and postgres if you're
    • 2:40:28actually running your own
    • 2:40:30internet style business well it turns
    • 2:40:32out that
    • 2:40:33more sophisticated even more powerful
    • 2:40:36databases
    • 2:40:37come with other subtypes if you will so
    • 2:40:40besides integer you can specify smallint
    • 2:40:42for small numbers maybe using just a
    • 2:40:44few bits instead of 32
    • 2:40:46integer itself or bigint which uses 64
    • 2:40:49bits instead of 32
    • 2:40:51the facebooks the twitters of the world
    • 2:40:52need to use bigints a lot because they
    • 2:40:54have so much
    • 2:40:55data you and i can get away with simple
    • 2:40:57integers because we're not going to have
    • 2:40:59more than 4 billion favorite tv shows in
    • 2:41:01a class certainly something like real
    • 2:41:03you can have 32-bit real numbers or a
    • 2:41:05little weirdly named
    • 2:41:06double precision which is like a double
    • 2:41:09was in c
    • 2:41:10using 64 bits instead for more precision
    • 2:41:12numeric is kind of this catch-all you
    • 2:41:14can have not only dates and date times
    • 2:41:16but things like boolean values
    • 2:41:18you can specify the total number of
    • 2:41:20digits to store using this numeric scale
    • 2:41:22and precision so it relates to
    • 2:41:24numbers that aren't just quite integers
    • 2:41:26and then you also have categories of
    • 2:41:28text
    • 2:41:29char followed by a number which
    • 2:41:31specifies that
    • 2:41:32every value in the column will have the
    • 2:41:34same number of characters
    • 2:41:36that's helpful for things where you know
    • 2:41:37the length in advance like in the u.s
    • 2:41:39all 50 states have
    • 2:41:41two character codes
    • 2:41:43like ma for massachusetts ca for
    • 2:41:45california
    • 2:41:46char(2) would be appropriate there
    • 2:41:49because you know every value in the
    • 2:41:50column is going to have two
    • 2:41:51characters when you don't know though
    • 2:41:53you can use varchar
    • 2:41:54and varchar specifies a maximum number
    • 2:41:56of characters
    • 2:41:57and so you might specify varchar of like 32
    • 2:42:01so no one would be able to type in a name
    • 2:42:03that's longer than 32 characters or
    • 2:42:05varchar of 200 if you want to allow for
    • 2:42:07something even bigger
    • 2:42:08but this is germane to our real world
    • 2:42:09experience with the web if you've ever
    • 2:42:11gone to a website
    • 2:42:12started filling out a form and all of a
    • 2:42:14sudden you can't type any more
    • 2:42:15characters your response is too long
    • 2:42:17why is that well one the programmers
    • 2:42:19just might not want you to keep
    • 2:42:21expressing yourself in more detail
    • 2:42:22especially if it's like a complaint form
    • 2:42:24on a customer service site
    • 2:42:25but pragmatically it's probably because
    • 2:42:28their database
    • 2:42:29was designed to store a finite number of
    • 2:42:31characters and you have hit that
    • 2:42:32threshold
    • 2:42:33and you certainly don't want to have a
    • 2:42:34buffer overflow like in c
    • 2:42:36so the database will enforce a maximum
    • 2:42:38value n and then text is for even bigger
    • 2:42:41chunks of text if you're letting people
    • 2:42:42copy paste their resumes or whole
    • 2:42:44documents
    • 2:42:45or even larger sets of text you might
    • 2:42:47use text instead
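
To make the type discussion concrete, here is a small sketch against SQLite specifically, with made-up table names (`demo`, `states`, `strict_states`). It rests on two facts about SQLite that go slightly beyond what is said above: SQLite accepts declarations such as CHAR(2) but does not enforce the length itself, and a CHECK constraint is one way to get a limit enforced. Server databases such as MySQL or Postgres behave differently and are not shown.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# One column for each of the five SQLite types named above.
db.execute("CREATE TABLE demo (b BLOB, i INTEGER, n NUMERIC, r REAL, t TEXT)")
db.execute(
    "INSERT INTO demo VALUES (?, ?, ?, ?, ?)",
    (b"\x00\x01", 42, 2005, 3.5, "hello"),
)
# typeof() reports how each value was actually stored.
stored = db.execute(
    "SELECT typeof(b), typeof(i), typeof(n), typeof(r), typeof(t) FROM demo"
).fetchone()
print(stored)

# SQLite parses CHAR(2) but does not limit the length on its own.
db.execute("CREATE TABLE states (code CHAR(2))")
db.execute("INSERT INTO states VALUES ('Massachusetts')")
length = db.execute("SELECT length(code) FROM states").fetchone()[0]

# A CHECK constraint (an assumption of this sketch) makes the limit real.
db.execute("CREATE TABLE strict_states (code TEXT CHECK (length(code) = 2))")
try:
    db.execute("INSERT INTO strict_states VALUES ('Massachusetts')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(length, rejected)
```

The whole number placed in the NUMERIC column is stored as an integer, which fits the description of numeric as a catch-all.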
    • 2:42:49so let's then consider a real world data
    • 2:42:52set
    • 2:42:52things get really interesting and all of
    • 2:42:54these very academic ideas and
    • 2:42:56recommendations
    • 2:42:57really come into play when we don't have
    • 2:42:59hundreds of favorites
    • 2:43:00but when we have uh thousands instead
    • 2:43:04and so what i'm going to go ahead and do
    • 2:43:06here is download a file here
    • 2:43:08give me just a moment to grab it from
    • 2:43:10the course's website
    • 2:43:12i'm going to go ahead and download a
    • 2:43:13file from today which is a
    • 2:43:16sqlite version of the imdb
    • 2:43:19internet movie database that some of you
    • 2:43:20might have used in website form in order
    • 2:43:22to look up movies and ratings thereof
    • 2:43:25and the like and what we've done in
    • 2:43:26advance is we wrote a script
    • 2:43:28i wrote a script that downloaded all of
    • 2:43:30that information in advance
    • 2:43:32as tsv files it turns out that the
    • 2:43:35internet movie database makes
    • 2:43:37all of their data available as tsv files
    • 2:43:40tab separated values
    • 2:43:42and we went ahead and imported it with a
    • 2:43:44script
    • 2:43:45into a file called shows.db
    • 2:43:48as follows so i'm going to go ahead in
    • 2:43:50just a moment and open up shows.db which
    • 2:43:52is not the version i created earlier
    • 2:43:54based on your favorites
    • 2:43:56this is now the version that we the
    • 2:43:57staff created in advance
    • 2:43:59by downloading hundreds of thousands of
    • 2:44:02movies and tv shows and actors and
    • 2:44:04directors from imdb.com under their
    • 2:44:06license
    • 2:44:07and then imported into a sqlite database
    • 2:44:11so how can i see what's in here well let
    • 2:44:12me go ahead and type dot schema recall so how can i see what's in here well let
    • 2:44:14and you'll see a whole bunch of data me go ahead and type dot schema recall
    • 2:44:17therein and you'll see a whole bunch of data
    • 2:44:17and in fact in pictorial form it therein
    • 2:44:19actually looks like this here's a and in fact in pictorial form it
    • 2:44:21picture that just gives you the lay of actually looks like this here's a
    • 2:44:22the land picture that just gives you the lay of
    • 2:44:23there's going to be a people table that the land
    • 2:44:25has an id for every person there's going to be a people table that
    • 2:44:27a name and their birth year uh there's has an id for every person
    • 2:44:29going to be a shows table just like a name and their birth year uh there's
    • 2:44:31we've been talking which is going to be a shows table just like
    • 2:44:32ids titles of shows also though the year we've been talking which is
    • 2:44:35that the show debuted and the number of ids titles of shows also though the year
    • 2:44:37episodes that the show had that the show debuted and the number of
    • 2:44:38then there's going to be genres similar episodes that the show had
    • 2:44:40in design to before so we didn't go all then there's going to be genres similar
    • 2:44:42out and factor it out into a third table in design to before so we didn't go all
    • 2:44:45we just have some duplication here out and factor it out into a third table
    • 2:44:46admittedly in genres we just have some duplication here
    • 2:44:48but then there's a ratings table and admittedly in genres
    • 2:44:49here's where you can see where but then there's a ratings table and
    • 2:44:50relational databases get interesting here's where you can see where
    • 2:44:52you can have a ratings table storing relational databases get interesting
    • 2:44:54ratings like one to five you can have a ratings table storing
    • 2:44:56uh but also associate those ratings with ratings like one to five
    • 2:44:58a show by way of its show id uh but also associate those ratings with
    • 2:45:00and then you can keep track of the a show by way of its show id
    • 2:45:01number of votes that that show got and then you can keep track of the
    • 2:45:03 Writers, notice, is a separate table,
    • 2:45:06 and notice this is kind of cool:
    • 2:45:07 this table, per the arrows, relates to
    • 2:45:11 the shows table and the people table,
    • 2:45:14 because this is a join table. A foreign
    • 2:45:16 key of show_id and a foreign key of
    • 2:45:18 person_id
    • 2:45:19 refer to the shows table and the people
    • 2:45:22 table, respectively,
    • 2:45:23 so that a person can be a
    • 2:45:26 writer for multiple shows and
    • 2:45:28 one show can have multiple writers:
    • 2:45:30 another many-to-many relationship.
    • 2:45:31 And then lastly stars, the actors in a
    • 2:45:34 show. Notice that this too is a join
    • 2:45:36 table. It's only got two foreign keys, a
    • 2:45:38 show_id
    • 2:45:38 and a person_id, that are referring back
    • 2:45:41 to those tables, respectively.
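The join-table design described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the actual IMDb schema: the table and column names follow the lecture, but the column types, the small ids other than Steve Carell's, and the second actor row are assumptions chosen for the example.

```python
import sqlite3

# In-memory database; column types are assumptions made for this sketch.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER);
    -- Join table: nothing but two foreign keys.
    CREATE TABLE stars (
        show_id INTEGER NOT NULL REFERENCES shows(id),
        person_id INTEGER NOT NULL REFERENCES people(id)
    );
""")

# Sample rows (ids 2, 10, and 11 are made up for illustration).
db.executemany("INSERT INTO people (id, name) VALUES (?, ?)",
               [(136797, "Steve Carell"), (2, "Rainn Wilson")])
db.executemany("INSERT INTO shows (id, title, year) VALUES (?, ?, ?)",
               [(10, "The Office", 2005), (11, "The Morning Show", 2019)])
db.executemany("INSERT INTO stars (show_id, person_id) VALUES (?, ?)",
               [(10, 136797), (10, 2), (11, 136797)])

# Many-to-many: one person stars in several shows, one show has several stars.
shows_for_steve = db.execute(
    "SELECT COUNT(*) FROM stars WHERE person_id = ?", (136797,)).fetchone()[0]
stars_in_office = db.execute(
    "SELECT COUNT(*) FROM stars WHERE show_id = ?", (10,)).fetchone()[0]
print(shows_for_steve, stars_in_office)
```

Each name is stored once in `people`; the `stars` rows only pair ids, which is what lets the same person appear in any number of shows without duplicating the name.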
    • 2:45:42 And here's where it really makes sense, a
    • 2:45:44 relational database. It would be pretty
    • 2:45:46 stupid and bad design
    • 2:45:47 if you had names of all of the directors
    • 2:45:51 and names of all of the writers and
    • 2:45:53 names of all of the stars of these shows
    • 2:45:55 in separate tables, in duplicate, like
    • 2:45:57 Steve Carell, Steve Carell, Steve Carell.
    • 2:45:59 All of those actors and directors and
    • 2:46:01 writers and every other
    • 2:46:03 role in the business are just people at
    • 2:46:05 the end of the day, so in a relational
    • 2:46:07 database the advice would be to put all
    • 2:46:09 of those people in a people table
    • 2:46:11 and then use primary and foreign keys to
    • 2:46:14 relate them to
    • 2:46:16 these other types of tables. The catch is,
    • 2:46:20 though, that when we do this,
    • 2:46:22 it turns out that things can be slow
    • 2:46:25 when we have lots of data. So, for
    • 2:46:27 instance, let me go into this.
    • 2:46:28 Let me go ahead and SELECT * FROM
    • 2:46:30 shows, semicolon.
    • 2:46:32 That's a lot of data. It's pretty fast on
    • 2:46:34 my Mac, and I switched from the IDE to my
    • 2:46:36 Mac just to save time, because it's a
    • 2:46:37 little faster doing things locally
    • 2:46:39 instead of in the cloud.
    • 2:46:40 Let me go ahead and count the number of
    • 2:46:41 shows in this IMDb
    • 2:46:43 database by using COUNT:
    • 2:46:46 153,331 TV shows. So that's a lot. How about
    • 2:46:50 the count
    • 2:46:50 of people from the people table?
    • 2:46:54 457,886 people who might be stars or
    • 2:46:59 writers or some other role as well. So
    • 2:47:01 this is a sizable data set.
    • 2:47:03 So let me go ahead and do something
    • 2:47:04 simple, though. Let me go ahead and SELECT
    • 2:47:06 * FROM
    • 2:47:06 shows WHERE title equals "The Office".
    • 2:47:09 And this time I don't have to worry
    • 2:47:11 about weird capitalization or spacing.
    • 2:47:13 This is IMDb; this is clean
    • 2:47:15 data from an authoritative source. Notice
    • 2:47:17 that there are actually different versions
    • 2:47:19 of The Office. You probably know the UK
    • 2:47:20 one and the US one.
    • 2:47:21 There are other shows that are unrelated
    • 2:47:23 to that particular
    • 2:47:24 type of show, but each of them is
    • 2:47:27 distinguished, notice, by the year
    • 2:47:29 here. All right, so that's kind of a lot,
    • 2:47:32 and let's do this again. Let me go ahead
    • 2:47:34 and turn on a feature temporarily, just
    • 2:47:36 to time this query, by turning on a timer
    • 2:47:38 in this program,
    • 2:47:39 and let me run it again. It looks like it
    • 2:47:41 took 0.012
    • 2:47:44 seconds of real time to do that search.
    • 2:47:47 That's pretty fast. I barely noticed,
    • 2:47:49 certainly, because it's so fast.
    • 2:47:50 But let me go ahead and do this. Let me
    • 2:47:52 go ahead and create
    • 2:47:53 an index called title_index on the table
    • 2:47:56 called shows,
    • 2:47:57 on its title column. Well, what am I doing?
    • 2:48:00 Well, to answer the question
    • 2:48:02 finally from before about performance: by
    • 2:48:04 default, everything we've been doing is
    • 2:48:05 indeed big O of n. It's just being
    • 2:48:07 linearly searched from top to bottom,
    • 2:48:09 which seems to call into question the
    • 2:48:10 whole purpose of SQL if we were doing no
    • 2:48:12 better than with CSVs.
    • 2:48:14 But an index is a clue to the database
    • 2:48:17 to sort of load the data more
    • 2:48:18 efficiently,
    • 2:48:19 in such a way that you get logarithmic
    • 2:48:21 time. An index is a fancy data structure
    • 2:48:23 that the SQLite database or the Oracle
    • 2:48:25 database or the MySQL database, whatever
    • 2:48:27 product you're using,
    • 2:48:28 builds up for you in memory, and then it
    • 2:48:31 does
    • 2:48:31 something using syntax like this that
    • 2:48:34 builds in memory
    • 2:48:35 generally something known as a B-tree.
    • 2:48:37 We've talked a bit about trees in the
    • 2:48:39 class. We talked about binary search
    • 2:48:40 trees,
    • 2:48:41 things that kind of look like family
    • 2:48:42 trees. A B-tree is essentially a family
    • 2:48:45 tree
    • 2:48:45 that's just very wide and not that tall.
    • 2:48:48 It's a data structure similar in spirit
    • 2:48:50 to what we looked at in C,
    • 2:48:51 but it tries to keep all of the leaf
    • 2:48:53 nodes, all of the children or
    • 2:48:55 grandchildren or great-grandchildren, so
    • 2:48:57 to speak,
    • 2:48:57 as close to the root as possible, and the
    • 2:49:00 algorithm it uses for that
    • 2:49:01 tends to be proprietary or documented
    • 2:49:03 based on the system you're using. But it
    • 2:49:05 doesn't store things in a list.
    • 2:49:07 It does not store things top to bottom
    • 2:49:10 like the tables we view them as.
    • 2:49:13 Underneath the hood, those tables that
    • 2:49:15 look like very tall structures
    • 2:49:17 are actually, underneath the hood,
    • 2:49:18 implemented with fancier things called
    • 2:49:20 trees.
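To see why a tree that is "very wide and not that tall" pays off, here is a rough back-of-the-envelope sketch. It only counts how many levels a balanced tree needs to cover a given number of rows; the fan-out of 100 is an illustrative assumption, not a figure from SQLite or any other product, and `levels_needed` is a helper invented for this example.

```python
def levels_needed(rows: int, fan_out: int) -> int:
    """Smallest number of levels such that fan_out ** levels >= rows."""
    levels, capacity = 0, 1
    while capacity < rows:
        capacity *= fan_out
        levels += 1
    return levels

rows = 153_331  # number of shows counted earlier in the lecture

linear_steps = rows                     # worst case when scanning top to bottom
binary_levels = levels_needed(rows, 2)  # tall, narrow tree (binary search tree)
wide_levels = levels_needed(rows, 100)  # short, wide tree (assumed fan-out of 100)

print(linear_steps, binary_levels, wide_levels)
```

Under these assumptions a lookup touches 18 levels in a balanced binary tree and only 3 in the wide tree, compared with up to 153,331 rows in a linear scan.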
    • 2:49:21 And if we create those trees by creating
    • 2:49:23 what they're properly called,
    • 2:49:24 indexes, like this, it might take us a
    • 2:49:26 moment, like 0.098 seconds, to create an
    • 2:49:29 index, but now notice what happens.
    • 2:49:31 Previously, when I searched the titles
    • 2:49:33 for The Office
    • 2:49:34 using linear search, it took 0.012
    • 2:49:37 seconds. If I do the same query again
    • 2:49:40 after having created the index
    • 2:49:42 and having told SQLite, build me this
    • 2:49:43 fancy tree in memory,
    • 2:49:45 voila, 0.001 seconds.
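One way to check that an index changes how a query is answered, without relying on timings, is to ask SQLite for its query plan. The sketch below uses a three-row stand-in for the `shows` table (an assumption; the real table has over 150,000 rows) and a small helper, `plan`, invented here to collect the output of `EXPLAIN QUERY PLAN`.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
db.executemany("INSERT INTO shows (title, year) VALUES (?, ?)",
               [("The Office", 2001), ("The Office", 2005), ("The Morning Show", 2019)])

def plan(sql, params=()):
    """Return SQLite's human-readable plan for a query (the 'detail' column)."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

query = "SELECT * FROM shows WHERE title = ?"

before = plan(query, ("The Office",))  # no index yet: full table scan
db.execute("CREATE INDEX title_index ON shows (title)")
after = plan(query, ("The Office",))   # now a search through title_index

print(before)
print(after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should describe a scan of `shows` and the second a search using `title_index`.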
    • 2:49:48 So orders of magnitude faster. Now, both
    • 2:49:51 are fast to us humans, certainly, but
    • 2:49:53 imagine the data set being even bigger,
    • 2:49:54 the query being even bigger.
    • 2:49:56 These indexes can get even larger than
    • 2:49:59 that,
    • 2:50:00 or rather, the queries can take
    • 2:50:02 longer than that and therefore take even
    • 2:50:04 more time
    • 2:50:05 than that. But unfortunately, if I've got
    • 2:50:07 all of my data all over the place, as in
    • 2:50:09 a chart,
    • 2:50:10 as in a diagram like this, my god, how do
    • 2:50:12 I actually get useful work done? How do I
    • 2:50:14 get back the people in a movie and the
    • 2:50:16 writers and the stars and the ratings
    • 2:50:18 if it's all over the place? I would seem
    • 2:50:19 to have created such a mess,
    • 2:50:21 and I now need to execute all of
    • 2:50:23 these queries. But notice it doesn't have
    • 2:50:25 to be
    • 2:50:26 that complicated. It turns out that
    • 2:50:28 there's another
    • 2:50:29 keyword in SQL, really the last that
    • 2:50:31 we'll look at here,
    • 2:50:32 called JOIN. The JOIN keyword, which you
    • 2:50:34 can use implicitly or explicitly,
    • 2:50:36 allows you to just join tables together
    • 2:50:39 and sort of reconstitute a bigger, more
    • 2:50:41 user-friendly table.
    • 2:50:43 So, for instance, suppose I want to get
    • 2:50:44 all of Steve Carell's TV shows, not just
    • 2:50:46 The Office.
    • 2:50:47 Well, recall that I can select Steve's id
    • 2:50:50 from the people table
    • 2:50:52 where name equals "Steve Carell". So again,
    • 2:50:55 he has a different id in this table
    • 2:50:56 because this is from IMDb,
    • 2:50:58 but there's his id. And let me go ahead
    • 2:51:00 and turn the timer off
    • 2:51:01 for now. All right, so there is his id:
    • 2:51:04 136797. I could copy-paste that into my
    • 2:51:08 code, but that's not necessary.
    • 2:51:10 Thanks to these nested queries, I can do
    • 2:51:12 something like this.
    • 2:51:13 Let me go ahead and now select all of
    • 2:51:15 the show_ids
    • 2:51:17 from the stars table where person_id
    • 2:51:21 from that table is in, or is equal to,
    • 2:51:23 this result.
    • 2:51:25 So there's that join table, stars, that
    • 2:51:26 links people and shows.
    • 2:51:28 So let me go ahead and execute. Oh, I hit
    • 2:51:31 the wrong key.
    • 2:51:31 Let me go ahead and execute that by
    • 2:51:34 retyping it:
    • 2:51:34 SELECT show_id FROM stars
    • 2:51:38 WHERE person_id equals
    • 2:51:41 whatever Steve Carell's id is. All right,
    • 2:51:43 so there's all of the show_ids of Steve
    • 2:51:45 Carell's TV shows. That's a lot,
    • 2:51:47 and it's very non-obvious what they are.
    • 2:51:49 So let me do another nested query by
    • 2:51:51 putting all of that in parentheses, and
    • 2:51:53 now SELECT title
    • 2:51:55 FROM shows WHERE the id of the show
    • 2:51:59 is in this big long list of show_ids.
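The three nested queries just described can be sketched as a single statement. The data below is a tiny stand-in (the ids other than 136797, the second actor, and "Some Other Show" are made up), and the `ORDER BY` is added here only so the output order is predictable; it was not part of the query in the lecture.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
    INSERT INTO people VALUES (136797, 'Steve Carell'), (2, 'Rainn Wilson');
    INSERT INTO shows VALUES (10, 'The Office'), (11, 'The Morning Show'),
                             (12, 'Some Other Show');
    INSERT INTO stars VALUES (10, 136797), (11, 136797), (10, 2), (12, 2);
""")

# Innermost query: the person's id. Middle query: the show_ids for that person.
# Outer query: the titles for those show_ids.
titles = [row[0] for row in db.execute("""
    SELECT title FROM shows WHERE id IN (
        SELECT show_id FROM stars WHERE person_id = (
            SELECT id FROM people WHERE name = ?
        )
    )
    ORDER BY title
""", ("Steve Carell",))]

print(titles)
```

With this sample data the result is the two shows linked to Steve Carell's id, and the show that only the other actor appears in is filtered out.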
    • 2:52:03 And there are all of the shows that he's
    • 2:52:05 in, including
    • 2:52:06 The Dana Carvey Show back when,
    • 2:52:09 The Office up at the top, and then most
    • 2:52:11 recently shows like The Morning Show
    • 2:52:13 on Apple TV. All right, so that's pretty
    • 2:52:16 cool that we can actually reconstitute
    • 2:52:17 the data like that, but it turns out
    • 2:52:19 there's different ways of doing that as
    • 2:52:21 well, and you'll see more of this in the
    • 2:52:22 coming weeks
    • 2:52:23 and in problem sets, in labs, and the like.
    • 2:52:25 But it turns out we can do other things
    • 2:52:27 as well, and let me just show this syntax,
    • 2:52:28 even though it'll look a little cryptic
    • 2:52:30 at first glance.
    • 2:52:31 You can also use that JOIN keyword as
    • 2:52:33 follows. I can select the title
    • 2:52:35 from the people table joined with
    • 2:52:39 the stars table on the people's id
    • 2:52:43 column
    • 2:52:43 equaling the stars' person_id column.
    • 2:52:47 So in other words, I can select a title
    • 2:52:49 from the result of joining
    • 2:52:51 people and stars like this, on the id
    • 2:52:53 column in one and the person_id column
    • 2:52:55 in the other.
    • 2:52:56 And I can join in the shows table
    • 2:53:00 on stars.show_id
    • 2:53:03 equaling shows.id.
    • 2:53:06 So again, now I'm joining the primary
    • 2:53:09 and foreign keys on these two tables
    • 2:53:11 where the name equals, quote unquote,
    • 2:53:14 "Steve Carell".
    • 2:53:15 So this is the most cryptic thing we've
    • 2:53:17 seen yet, but it just means take this
    • 2:53:18 table and join it with this one and then
    • 2:53:20 join it with this one
    • 2:53:21 and filter all of the resulting joined
    • 2:53:23 rows by a name of Steve Carell.
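Written out, the spoken JOIN query might look like the sketch below. The tables and sample rows are a small stand-in for the IMDb data (ids other than 136797, the second actor, and "Some Other Show" are assumptions), and the results are sorted in Python only to make the printed order predictable.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
    INSERT INTO people VALUES (136797, 'Steve Carell'), (2, 'Rainn Wilson');
    INSERT INTO shows VALUES (10, 'The Office'), (11, 'The Morning Show'),
                             (12, 'Some Other Show');
    INSERT INTO stars VALUES (10, 136797), (11, 136797), (10, 2), (12, 2);
""")

# people -> stars on the person's id, then stars -> shows on the show's id,
# and finally filter the joined rows by name.
joined_titles = sorted(row[0] for row in db.execute("""
    SELECT title FROM people
    JOIN stars ON people.id = stars.person_id
    JOIN shows ON stars.show_id = shows.id
    WHERE name = ?
""", ("Steve Carell",)))

print(joined_titles)
```

Each `ON` clause pairs a primary key with the foreign key that refers to it, which is what stitches the three tables back into one wider set of rows before the `WHERE` filter is applied.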
    • 2:53:26 And voila, there we have all of those
    • 2:53:28 answers
    • 2:53:29 as well. And there's other ways of doing
    • 2:53:31 this too.
    • 2:53:32 I'll leave unsaid now some of the syntax
    • 2:53:34 for that, but that felt a little slow.
    • 2:53:37 And in fact, let me go ahead and turn my
    • 2:53:38 timer back on. Let me re-execute this
    • 2:53:41 last query:
    • 2:53:42 SELECT title FROM people, joining on
    • 2:53:45 stars,
    • 2:53:46 joining on shows, WHERE name equals Steve
    • 2:53:49 Carell.
    • 2:53:50 That took over half a second, so that was
    • 2:53:52 actually, admittedly, kind of slow.
    • 2:53:55 But again, indexes come to the rescue,
    • 2:53:57 if, again, we don't
    • 2:53:58 allow linear search to dominate. Let
    • 2:54:00 me go ahead and create a few indexes:
    • 2:54:02 create an index called person_index
    • 2:54:06 on the stars table, the person_id column.
    • 2:54:09 Why?
    • 2:54:10 Well, my query a moment ago used the
    • 2:54:12 person_id column. It filtered on it, so
    • 2:54:14 that might be a bottleneck.
    • 2:54:15 I'm going to go ahead and create another
    • 2:54:16 index called show_index
    • 2:54:19 on the stars table on show_id. Similarly,
    • 2:54:23 a moment ago my query used the show_id
    • 2:54:25 column, and so that too might have been a
    • 2:54:27 bottleneck, linearly, top to bottom.
    • 2:54:29 So let me create that index. And then
    • 2:54:30 lastly, let me create an index called
    • 2:54:32 name_index, and this is perhaps the most
    • 2:54:33 obvious, similar to the show titles
    • 2:54:35 before,
    • 2:54:36 on the people table on the name column.
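The three indexes described above can be sketched as follows. The index names are written with underscores on the assumption that this is how the spoken names were typed, and the tiny sample tables are stand-ins for the IMDb data. The sketch checks only that the indexes exist and that the query returns the same rows before and after; it does not try to reproduce the timings.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
    INSERT INTO people VALUES (136797, 'Steve Carell'), (2, 'Rainn Wilson');
    INSERT INTO shows VALUES (10, 'The Office'), (11, 'The Morning Show');
    INSERT INTO stars VALUES (10, 136797), (11, 136797), (10, 2);
""")

JOIN_QUERY = """
    SELECT title FROM people
    JOIN stars ON people.id = stars.person_id
    JOIN shows ON stars.show_id = shows.id
    WHERE name = ?
"""

def titles_for(name):
    """Run the join and return the titles in sorted order."""
    return sorted(row[0] for row in db.execute(JOIN_QUERY, (name,)))

titles_before = titles_for("Steve Carell")

# One index per column that the join and the filter rely on.
db.executescript("""
    CREATE INDEX person_index ON stars (person_id);
    CREATE INDEX show_index ON stars (show_id);
    CREATE INDEX name_index ON people (name);
""")

titles_after = titles_for("Steve Carell")
index_names = sorted(row[0] for row in db.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"))

print(index_names)
```

An index changes how rows are found, not which rows come back, so the two result lists should match.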
    • 2:54:39 And that too took a moment. Now, in total,
    • 2:54:41 this took like almost a full second,
    • 2:54:42 but these indexes only get created
    • 2:54:45 once. They get maintained automatically
    • 2:54:46 over time,
    • 2:54:47 but you don't incur this with every
    • 2:54:49 query. Now let me do my
    • 2:54:51 SELECT again. Let me SELECT title FROM
    • 2:54:53 people, joining the stars table,
    • 2:54:56 joining the shows table, WHERE name
    • 2:54:59 equals "Steve Carell".
    • 2:55:00 Boom: 0.001 seconds.
    • 2:55:04 That's an order of magnitude faster
    • 2:55:06 than the, like, more than half a second it
    • 2:55:08 took us
    • 2:55:09 a little bit ago. So here too you see the
    • 2:55:11 power of a relational database.
    • 2:55:13 So even though we've created some
    • 2:55:14 problems for ourselves over time,
    • 2:55:16 we've solved them ultimately, granted
    • 2:55:18 with some more sophisticated features
    • 2:55:19 and additional syntax.
    • 2:55:20 But a relational database is indeed what
    • 2:55:23 you use in the real world for the
    • 2:55:24 Twitters, the Instagrams, the Facebooks,
    • 2:55:26 the Googles,
    • 2:55:27 because they can store data so
    • 2:55:29 efficiently,
    • 2:55:30 without redundancy, because you can
    • 2:55:32 normalize them and factor everything out,
    • 2:55:34 but they can still maintain the
    • 2:55:35 relations that you might have seen in a
    • 2:55:37 spreadsheet,
    • 2:55:38 but using something closer to
    • 2:55:39 logarithmic, thanks to those tree
    • 2:55:41 structures.
    • 2:55:42 But there are problems, and what we
    • 2:55:43 wanted to do is end today on two primary
    • 2:55:46 problems that are introduced with SQL,
    • 2:55:48 because they are just, unfortunately,
    • 2:55:50 so commonly done.
    • 2:55:51 Notice this here. There is something
    • 2:55:53 generally known as a SQL injection
    • 2:55:55 attack,
    • 2:55:56 which you are vulnerable to in any
    • 2:55:58 application where you're taking user
    • 2:55:59 input.
    • 2:56:00 That hasn't been an issue for my
    • 2:56:01 favorites.py file,
    • 2:56:03 where I only took input from a CSV. But
    • 2:56:06 if one of you were malicious, what if one
    • 2:56:07 of you had maliciously typed in the word
    • 2:56:09 DELETE
    • 2:56:10 or UPDATE or something else as the title
    • 2:56:13 of your show,
    • 2:56:13 and I accidentally plugged it into my
    • 2:56:16 own Python code when
    • 2:56:17 executing a query? You could potentially
    • 2:56:19 inject SQL
    • 2:56:20 into my own code. How might that be? Well,
    • 2:56:23 if logging in
    • 2:56:24 via Yale, you'll typically see a form
    • 2:56:26 like this, or logging in via Harvard,
    • 2:56:27 you'll see a form like this.
    • 2:56:29 Here's an example that I'm pretty sure
    • 2:56:30 neither Harvard nor Yale are vulnerable
    • 2:56:32 to. Suppose I type in my email address to
    • 2:56:35 this login form as malan@harvard.edu,
    • 2:56:38 single quote, dash dash. It turns out in
    • 2:56:41 SQL,
    • 2:56:41 dash dash is the symbol for commenting,
    • 2:56:44 if you want to comment something out.
    • 2:56:45 It turns out that the single quote is
    • 2:56:47 used when you want to search for
    • 2:56:48 something like
    • 2:56:49 "Steve Carell" or, in this case,
    • 2:56:51 malan@harvard.edu. It can be double quotes; it
    • 2:56:53 can be single quotes.
    • 2:56:54 In this case I'm using single quotes
    • 2:56:56 here. But let's consider some sample
    • 2:56:59 code, if you will, in Python. Here's a line
    • 2:57:01 of code that I propose
    • 2:57:02 might exist in the back end for
    • 2:57:04 Harvard's authentication or Yale's or
    • 2:57:06 anyone else's.
    • 2:57:07 Maybe someone wrote some Python code
    • 2:57:09 like this, using SELECT * FROM users
    • 2:57:12 WHERE username equals question mark AND
    • 2:57:14 password equals question mark,
    • 2:57:16 and they plugged in username and
    • 2:57:17 password. Whatever the user typed into
    • 2:57:19 that web form a moment ago
    • 2:57:21 gets plugged in here to these question
    • 2:57:23 marks. This is good.
    • 2:57:24 This is good code because you're using
    • 2:57:27 the SQL question marks.
    • 2:57:28 So if you literally just do what we
    • 2:57:30 preach today and use these question mark
    • 2:57:32 placeholders,
    • 2:57:33 you are safe from SQL injection attacks.
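A minimal sketch of the question-mark placeholders, using Python's built-in `sqlite3` module rather than the CS50 library shown in the lecture. The `users` table, the sample password, and the `login_rows` helper are assumptions for illustration; a real system would store password hashes, not plain text.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
# Plain-text password purely for illustration.
db.execute("INSERT INTO users VALUES (?, ?)", ("malan@harvard.edu", "12345"))

def login_rows(username, password):
    """Look up a user with placeholders; the inputs are bound as data, never as SQL."""
    return db.execute(
        "SELECT * FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchall()

honest = login_rows("malan@harvard.edu", "12345")
attack = login_rows("malan@harvard.edu'--", "")

print(len(honest), len(attack))
```

Because the malicious username is compared as a literal string, quote and dashes included, it matches no row and the attempted login finds nothing.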
    • 2:57:35 Unfortunately, there are too many
    • 2:57:36 developers in the world
    • 2:57:38 that don't practice this or don't
    • 2:57:40 realize this or do forget this.
    • 2:57:42 If you instead resort to Python
    • 2:57:45 approaches like this,
    • 2:57:46 where you use an f-string instead, which
    • 2:57:49 might be your instinct after last week
    • 2:57:51 because they're wonderfully convenient
    • 2:57:52 with the curly braces and all,
    • 2:57:53 suppose that you literally plug in
    • 2:57:55 username and password,
    • 2:57:58 not with the question mark placeholders
    • 2:58:00 but just literally
    • 2:58:01 in between those curly braces. Watch what
    • 2:58:03 happens if my username,
    • 2:58:04 malan@harvard.edu, was actually typed
    • 2:58:07 in by me maliciously as
    • 2:58:09 malan@harvard.edu, single quote,
    • 2:58:12 dash dash. That would have the effect of
    • 2:58:15 tricking
    • 2:58:15 this Python code into doing essentially
    • 2:58:18 this.
    • 2:58:19 Let me do a find and replace. It would
    • 2:58:21 trick Python
    • 2:58:23 into executing username equals quote
    • 2:58:26 malan@harvard.edu
    • 2:58:28 quote dash dash and then
    • 2:58:31 other stuff. Unfortunately, the dash dash
    • 2:58:33 again means comment,
    • 2:58:35 which means you could maybe trick a
    • 2:58:38 server
    • 2:58:38 into ignoring the whole password part of
    • 2:58:41 the SQL query.
    • 2:58:42 And if the SQL query's purpose in life
    • 2:58:44 is to check, is this username and
    • 2:58:46 password valid,
    • 2:58:47 so that you can decide to log the user
    • 2:58:49 in or to say no, you're not authorized,
    • 2:58:52 well, by essentially commenting out
    • 2:58:54 everything related to password,
    • 2:58:56 notice what I've done. I've just now,
    • 2:58:58 theoretically,
    • 2:58:59 logged myself in as malan@harvard.edu
    • 2:59:03 without even knowing or inputting a
    • 2:59:04 password,
    • 2:59:05 because I injected SQL syntax, the quote
    • 2:59:08 and the dash dash, into my query,
    • 2:59:10 tricking the server into just ignoring
    • 2:59:12 the password equality check.
    • 2:59:14and so it turns out that db.execute when
    • 2:59:17you execute an insert
    • 2:59:18it returns to you as said the id of the
    • 2:59:21newly inserted row
    • 2:59:22when you use db.execute to select rows
    • 2:59:26from a database table it returns to you
    • 2:59:29a list of rows each of which is a
    • 2:59:32dictionary so this is now pseudocode
    • 2:59:34down here with my comment
    • 2:59:35but if you get back one row that would
    • 2:59:38seem to imply that there is a user named
    • 2:59:40malan@harvard.edu
    • 2:59:42don't know what his password is because
    • 2:59:43whoever this person is maliciously
    • 2:59:45tricked the server into ignoring
    • 2:59:47that syntax so sql injection attacks are
    • 2:59:51unfortunately one of the most common
    • 2:59:52attacks against sql databases they are
    • 2:59:54completely preventable
    • 2:59:55if you simply use placeholders and use
    • 2:59:58libraries whether it's cs50's or other
    • 3:00:00third-party libraries that you may use
    • 3:00:02down the road
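The attack and its fix can be sketched in a few lines. This is a minimal illustration, not the code from the lecture slide: it uses Python's standard `sqlite3` module rather than CS50's library, and the `users` table, the stored password, and the `login_unsafe` / `login_safe` helper names are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical in-memory table with one user (plaintext password only for illustration).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users (username, password) VALUES (?, ?)",
           ("malan@harvard.edu", "12345"))


def login_unsafe(username, password):
    # Vulnerable: user input is pasted directly into the SQL text via an f-string.
    query = (f"SELECT * FROM users WHERE username = '{username}' "
             f"AND password = '{password}'")
    return db.execute(query).fetchall()


def login_safe(username, password):
    # Placeholders: the values are passed as data and never parsed as SQL.
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return db.execute(query, (username, password)).fetchall()


# The quote closes the string early and the dash dash comments out the password check.
malicious = "malan@harvard.edu'--"
unsafe_rows = login_unsafe(malicious, "anything")
safe_rows = login_safe(malicious, "anything")
print(len(unsafe_rows), len(safe_rows))
```

With the f-string version the query comes back with one row even though the password is wrong; with placeholders the whole input is compared as a literal username, so no row matches.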
    • 3:00:03a common meme on the internet is this
    • 3:00:04picture here uh
    • 3:00:06if we zoom in on this person's license
    • 3:00:08plate or where the license plate should
    • 3:00:09be
    • 3:00:10this is an example of someone
    • 3:00:12theoretically trying to trick some
    • 3:00:14camera on the highway
    • 3:00:16into like dropping the whole database
    • 3:00:18drop is another keyword in sql that
    • 3:00:19deletes a database table
    • 3:00:21and this person was either intentionally
    • 3:00:23or just humorously
    • 3:00:24trying to trick it into executing sql by
    • 3:00:27using syntax like this so
    • 3:00:29characters like single quotes dash dash
    • 3:00:31semicolons
    • 3:00:32are all potentially dangerous characters
    • 3:00:34in sql if they're passed through
    • 3:00:36unchanged to the database a very popular
    • 3:00:38xkcd comic let me give you a moment to
    • 3:00:40just read this
    • 3:00:41is another uh well-known meme of sorts
    • 3:00:45now
    • 3:00:45in computer science if you'd like to
    • 3:00:48read this one
    • 3:00:50on your own but henceforth you are now
    • 3:00:52in the
    • 3:00:53um family of of educated
    • 3:00:58learners who know who little bobby
    • 3:01:00tables
    • 3:01:01is unfortunately it's dead silence in
    • 3:01:03here so i can't tell if anyone is
    • 3:01:04actually laughing at this joke but
    • 3:01:06anyhow this is a very well-known meme so
    • 3:01:07if you're a computer scientist who knows
    • 3:01:09sql you know this one
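As a small sketch of why placeholders defuse inputs like the ones in the license plate and the comic: when the value is bound through a placeholder, the quote, semicolon, and dash dash are stored as ordinary text. The `students` table and the exact input string here are assumptions for illustration, and the example uses Python's standard `sqlite3` module rather than CS50's library.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE students (name TEXT)")

# Hypothetical hostile input containing a quote, a semicolon, and a comment marker.
name = "Robert'); DROP TABLE students;--"

# Bound through a placeholder, the input is treated purely as a value.
db.execute("INSERT INTO students (name) VALUES (?)", (name,))

# The table still exists and holds the string exactly as typed.
stored = db.execute("SELECT name FROM students").fetchall()
print(stored)
```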
    • 3:01:10and there's one last problem we'd like
    • 3:01:11to introduce if you don't mind just a
    • 3:01:13couple final moments here
    • 3:01:14and that is a fundamental problem in
    • 3:01:16computing called race conditions which
    • 3:01:18for the first time is now manifest
    • 3:01:20in our discussion of sql it turns out
    • 3:01:22that sql and sql databases
    • 3:01:24are very often used again in the real
    • 3:01:26world for very high performing
    • 3:01:28applications and by that i mean again
    • 3:01:30the googles the facebooks the twitters
    • 3:01:31of the world where lots and lots of data
    • 3:01:33is coming into servers
    • 3:01:35all at once and case in point some of
    • 3:01:37you might have clicked like
    • 3:01:38on this egg uh some time ago this is the
    • 3:01:41most liked instagram post ever
    • 3:01:43as of last night it was up to like 50
    • 3:01:45plus million
    • 3:01:46likes uh well eclipsed kim kardashian's
    • 3:01:49previous post which is still at like 18
    • 3:01:51million or so
    • 3:01:52this is to say this is a hard problem to
    • 3:01:54solve
    • 3:01:55this notion of likes coming in at such
    • 3:01:58an incredible rate
    • 3:01:59because suppose that long story short
    • 3:02:01instagram
    • 3:02:02actually has a server with a sql
    • 3:02:04database and they have code in python or
    • 3:02:06c
    • 3:02:07or whatever language that's talking to
    • 3:02:09that database
    • 3:02:10and suppose that they have code that's
    • 3:02:12trying to increment the total number of
    • 3:02:13likes well how might this work logically
    • 3:02:15well in order to increment the number of
    • 3:02:17likes that a picture like this egg has
    • 3:02:19you might first select from the database
    • 3:02:21the current number of likes
    • 3:02:23for the id of that egg photograph then
    • 3:02:26you might add one to it
    • 3:02:27then you might update the database and i
    • 3:02:29didn't use it before but just like
    • 3:02:30there's insert and delete there's update
    • 3:02:32as well
    • 3:02:33so you might update the database with
    • 3:02:35the new count plus one
    • 3:02:37so the code for that might look a little
    • 3:02:39something like this three lines of code
    • 3:02:41using cs50's library here where you
    • 3:02:43execute select
    • 3:02:44likes from posts where id equals
    • 3:02:46question mark
    • 3:02:47where id is the unique identifier for
    • 3:02:49that egg and then i'm storing the result
    • 3:02:51in a rows variable which again i claim
    • 3:02:54is a list
    • 3:02:55of rows i'm going to go into the first
    • 3:02:57row so that's rows bracket 0
    • 3:02:59and i'm going to go into the likes
    • 3:03:01column to get the actual number and that
    • 3:03:02number i'm going to store in a variable
    • 3:03:04called likes so this is gonna be like 50
    • 3:03:05million
    • 3:03:06and i want it to go to 50 million and one
    • 3:03:08so how do i do that
    • 3:03:09well i execute on the database update
    • 3:03:12posts
    • 3:03:13set likes equal to question mark and
    • 3:03:16then i just plug in likes plus one
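The three lines described above might look roughly like the following. This is a sketch rather than the slide itself: it uses Python's standard `sqlite3` module instead of CS50's library, the `posts` table contents and the `post_id` name are assumptions, and the `WHERE id = ?` clause on the update is added here so that only the egg's row changes (the spoken description leaves it out).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # lets each row be indexed by column name

# Hypothetical table holding one post, the egg, with 50 million likes.
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER)")
db.execute("INSERT INTO posts (id, likes) VALUES (?, ?)", (1, 50_000_000))

post_id = 1

# The three steps: select the current count, read it out of the first row, write back count + 1.
rows = db.execute("SELECT likes FROM posts WHERE id = ?", (post_id,)).fetchall()
likes = rows[0]["likes"]
db.execute("UPDATE posts SET likes = ? WHERE id = ?", (likes + 1, post_id))

new_likes = db.execute("SELECT likes FROM posts WHERE id = ?",
                       (post_id,)).fetchone()["likes"]
print(new_likes)
```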
    • 3:03:18the problem though with the instagrams
    • 3:03:20and googles and twitters of the world
    • 3:03:22is that they don't just have one server
    • 3:03:24they have many thousands of servers and
    • 3:03:26all of those servers might in parallel
    • 3:03:28be receiving clicks from you and i
    • 3:03:30on the internet and those clicks
    • 3:03:33translate into this code getting
    • 3:03:34executed executed executed
    • 3:03:36and the problem is that when you have
    • 3:03:38three lines of code and suppose brian
    • 3:03:40and i click on that egg at roughly the
    • 3:03:42same time
    • 3:03:43my three lines might not get executed
    • 3:03:45before his three lines or vice versa
    • 3:03:47they might get co-mingled
    • 3:03:49chronologically my first line might get
    • 3:03:51executed then brian's first line might
    • 3:03:53get executed my second line might get
    • 3:03:54executed brian's second line
    • 3:03:56so they might get interspersed on
    • 3:03:57different servers or just temporally
    • 3:03:59in time chronologically that's
    • 3:04:01problematic
    • 3:04:02because suppose brian and i click on
    • 3:04:04that egg roughly at the same time
    • 3:04:06and we get back the same answer to the
    • 3:04:07select query 50 million is the current
    • 3:04:10count
    • 3:04:10then our next lines of code execute on
    • 3:04:12the servers we happen to be on
    • 3:04:14which adds one to the likes the
    • 3:04:17server might accidentally end up
    • 3:04:19updating
    • 3:04:20the row for the egg with 50 million one
    • 3:04:24both times because the fundamental
    • 3:04:27problem is
    • 3:04:28if my code executes while brian's code
    • 3:04:31executes
    • 3:04:32we are both checking the value of a
    • 3:04:34variable at essentially the same time
    • 3:04:37and we are both then making a conclusion
    • 3:04:39oh
    • 3:04:40the current likes are 50 million we are
    • 3:04:43then making a decision let's add one to
    • 3:04:4550 million
    • 3:04:45we are then updating the value with 50
    • 3:04:47million one
    • 3:04:49the problem is though that really if
    • 3:04:52brian's code or the server he happens to
    • 3:04:54be connected to on instagram
    • 3:04:55happens to have selected the number of
    • 3:04:58likes first
    • 3:04:59he should be allowed to finish the code
    • 3:05:01that's being executed
    • 3:05:02so that when i select it i see 50
    • 3:05:04million one and i add one to that so the
    • 3:05:07new count is 50 million
    • 3:05:09two this is what's known as a race
    • 3:05:11condition when you write code
    • 3:05:13in a multi-server or fancily known as a
    • 3:05:15multi-threaded environment
    • 3:05:17lines of code chronologically can get
    • 3:05:20co-mingled
    • 3:05:21on different servers at any given time
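To make the interleaving concrete, here is a deterministic sketch that replays the problematic ordering by hand: both clicks read the count before either one writes. It assumes a hypothetical `posts` table and helper names (`read_likes`, `write_likes`), and it simulates the two servers in a single process with Python's standard `sqlite3` module instead of running real concurrent requests.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER)")
db.execute("INSERT INTO posts (id, likes) VALUES (1, 50000000)")


def read_likes():
    # Step 1 of each click: select the current count.
    return db.execute("SELECT likes FROM posts WHERE id = 1").fetchone()[0]


def write_likes(value):
    # Step 3 of each click: store the new count.
    db.execute("UPDATE posts SET likes = ? WHERE id = 1", (value,))


# Interleaved order: both selects happen before either update.
mine = read_likes()       # sees 50,000,000
brians = read_likes()     # also sees 50,000,000
write_likes(mine + 1)     # writes 50,000,001
write_likes(brians + 1)   # writes 50,000,001 again, so one like is lost

final_likes = read_likes()
print(final_likes)
```

Two clicks arrived, but the stored count only went up by one, which is the lost update described above.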
    • 3:05:23the problem fundamentally derives from
    • 3:05:25the fact that if brian's server is in
    • 3:05:27the middle of checking the state of a
    • 3:05:29variable
    • 3:05:30i should be locked out i should not be
    • 3:05:32allowed to click on that button at the
    • 3:05:34same time or my code
    • 3:05:36should not be allowed to execute
    • 3:05:37logically so there is a solution
    • 3:05:39when you have to write code like this as
    • 3:05:41is common for twitter and instagram and
    • 3:05:42facebook and the like
    • 3:05:44to use what are called transactions
    • 3:05:46transactions add a few new pieces of
    • 3:05:48syntax that we won't dwell on today and
    • 3:05:49you won't need to use in the coming days
    • 3:05:51but they do solve a fundamentally hard
    • 3:05:53problem
    • 3:05:53transactions essentially allow you to
    • 3:05:55lock a table
    • 3:05:57or really a row in a table so that if
    • 3:06:00brian's
    • 3:06:01click on that egg results in some code
    • 3:06:03executing that's in the process of
    • 3:06:04checking what is the total like count
    • 3:06:07my click on the egg will not get handled
    • 3:06:09by the server
    • 3:06:10until his code is done executing so in
    • 3:06:13green here i've proposed the way you
    • 3:06:15should do this
    • 3:06:16you shouldn't just execute the middle
    • 3:06:17three lines you being
    • 3:06:19in facebook in this case instagram
    • 3:06:21should execute begin transaction first
    • 3:06:24then commit the transaction at the end
    • 3:06:26and the design of transactions is that
    • 3:06:29all of the lines in between will either
    • 3:06:30succeed all together
    • 3:06:32or fail all together the database won't
    • 3:06:34get into this funky
    • 3:06:35state where we start losing track of
    • 3:06:37likes
    • 3:06:38on eggs and though this has not been an
    • 3:06:41issue in recent years back in the day
    • 3:06:42when twitter was first getting started
    • 3:06:44twitter was super popular
    • 3:06:45and super offline a lot of the time
    • 3:06:47there was this thing called a fail whale
    • 3:06:49which is like the picture they showed on
    • 3:06:50their website
    • 3:06:51when they were getting too much traffic
    • 3:06:53to handle that was because when people
    • 3:06:54are liking and tweeting and retweeting
    • 3:06:56things it's a huge amount of data coming
    • 3:06:58in
    • 3:06:59and it turns out it's very hard to solve
    • 3:07:01these problems but
    • 3:07:02locking the database table or the rows
    • 3:07:05with these transactions is one way
    • 3:07:06fundamentally to solve this
    • 3:07:08and in our final extra time today we
    • 3:07:10thought we would play this out in the
    • 3:07:11same example that i was taught
    • 3:07:13transactions in some years ago
    • 3:07:15suppose that the scenario at hand is
    • 3:07:16that you and your roommates have a nice
    • 3:07:19dorm fridge
    • 3:07:20and you're all in the habit of drinking
    • 3:07:21lots of milk and you want to be able to
    • 3:07:23drink some milk
    • 3:07:24but you go to the fridge like i'm about
    • 3:07:26to here and you realize
    • 3:07:28uh oh we're out of milk and so now i'm
    • 3:07:31inspecting the state of this
    • 3:07:32refrigerator which is quite old
    • 3:07:34but also quite empty and the state of
    • 3:07:36this variable
    • 3:07:37being empty tells me that i should go to
    • 3:07:38cvs and buy some more milk
    • 3:07:41so what do i then do i'm presumably
    • 3:07:43going to close the fridge
    • 3:07:44and i'm going to go and leave and go
    • 3:07:47head to cvs
    • 3:07:48unfortunately the same problem arises
    • 3:07:50that we'll act out here in our final 60
    • 3:07:52or so seconds together
    • 3:07:53whereby if brian now my roommate in this
    • 3:07:55story also wants some milk
    • 3:07:57he comes by when i'm already headed to
    • 3:07:59the store inspects the state of the
    • 3:08:00fridge
    • 3:08:01and realizes oh we're out of milk so he
    • 3:08:03nicely will go restock as well
    • 3:08:05so let's see how this plays out and
    • 3:08:07we'll see if there isn't
    • 3:08:08a similar analogous solution so
    • 3:08:11i've checked the state of the variable
    • 3:08:13we're indeed out of milk i'll be right
    • 3:08:15back
    • 3:08:15just going to go to cvs
    • 3:08:54do
    • 3:09:56all right i am now back from the store
    • 3:09:58i've picked up some milk
    • 3:09:59gonna go ahead and put it into the
    • 3:10:01fridge and oh how did this happen now
    • 3:10:03there's multiple jugs of milk and of
    • 3:10:05course you know milk does not last that
    • 3:10:07long and brian and i don't drink that
    • 3:10:08much milk so this is like a really
    • 3:10:09serious problem
    • 3:10:10we've sort of tried to update the very
    • 3:10:12value of this variable
    • 3:10:14at the same time so how do we go
    • 3:10:16about fixing this what's the
    • 3:10:17actual solution here well i dare say
    • 3:10:20that we can draw some
    • 3:10:21inspiration from the world of
    • 3:10:23transactions and the world of databases
    • 3:10:25and perhaps create a visual for here
    • 3:10:27that we hope you never forget if you
    • 3:10:28take nothing away from today let's go
    • 3:10:30ahead and act this out one last
    • 3:10:32time
    • 3:10:32where this time i'm gonna be a little
    • 3:10:33more extreme i go ahead and open the
    • 3:10:35fridge i realize ah we're out of milk
    • 3:10:37i'm gonna go to the store i do not want
    • 3:10:40to allow for this situation where brian
    • 3:10:42accidentally checks the fridge as well
    • 3:10:44so i'm going to
    • 3:10:47lock the refrigerator instead let me go
    • 3:10:50ahead and
    • 3:10:52drape this through here
    • 3:10:55a little extreme but i think so long as
    • 3:10:58he can't get
    • 3:10:59into the fridge this shouldn't be a
    • 3:11:02problem
    • 3:11:04let me go ahead now and just attach the
    • 3:11:06lock here
    • 3:11:08almost got it come on all right
    • 3:11:11now the fridge is locked now i'm gonna
    • 3:11:14go get some milk
    • 3:11:32i can come up on stage and just tell me
    • 3:11:35when and i'll just
    • 3:11:38oh all right that's it for cs50 sorry to
    • 3:11:41keep you late we will see you
    • 3:11:42next time
    • 3:12:23you