CS50 Video Player
    • 0:00:00 We will begin shortly
    • 0:17:56 Introduction
    • 0:18:43 Data
    • 0:20:27 Spreadsheets
    • 0:27:22 Flat-File Databases
    • 0:29:26 CSV Files
    • 0:30:17 favorites.py
    • 0:39:44 Data Cleaning
    • 0:41:50 sorted
    • 1:06:28 Lambda Functions
    • 1:16:25 Relational Databases
    • 1:17:30 Break begins
    • 1:25:20 Break resumes
    • 1:25:21 SQLite
    • 1:32:24 SQL
    • 1:32:34 CRUD
    • 1:36:45 SELECT
    • 1:40:18 DISTINCT
    • 1:42:24 LIKE
    • 1:45:06 ORDER BY
    • 1:46:47 GROUP BY
    • 1:55:12 INSERT
    • 2:00:26 UPDATE
    • 2:01:17 DELETE
    • 2:02:37 Relational Data
    • 2:06:06 Data Types
    • 2:07:22 Constraints
    • 2:08:26 PRIMARY KEY
    • 2:09:50 FOREIGN KEY
    • 2:11:39 CS50 Library
    • 2:13:16 Many-to-Many Relationships
    • 2:27:34 Break begins
    • 2:32:06 Break resumes
    • 2:32:07 Many-to-Many Relationships (continued)
    • 2:39:50 IMDb
    • 2:47:42 Indexes
    • 2:50:07 JOINs
    • 2:55:44 SQL Injection
    • 3:01:13 Race Conditions
    • 3:07:09 Race Conditions (Demo)
    • 0:17:58 All right,
    • 0:17:59 this is CS50, and this is week seven, and
    • 0:18:02 today's focus is going to be entirely on
    • 0:18:04 data: the process of collecting it, the
    • 0:18:06 process of storing it, the process of
    • 0:18:07 searching it, and so much more. You'll
    • 0:18:09 recall that last week we started off by
    • 0:18:11 playing around with a relatively small
    • 0:18:13 data set. We asked everyone for
    • 0:18:14 what their preferred house at Hogwarts
    • 0:18:16 might be, and then we proceeded to
    • 0:18:18 analyze that data a little bit
    • 0:18:20 using some Python and counting up how
    • 0:18:22 many people wanted Gryffindor, Slytherin,
    • 0:18:23 or the others as well.
    • 0:18:25 And we ultimately did that by using a
    • 0:18:26 Google Form to collect it, and we stored
    • 0:18:28 all of the data in a Google spreadsheet,
    • 0:18:30 which we then exported, of course,
    • 0:18:31 as a CSV file. So this week we thought
    • 0:18:34 we'd collect a little more data and see
    • 0:18:35 what kinds of problems arise when we
    • 0:18:37 start using
    • 0:18:38 only a spreadsheet or, in turn, a CSV file
    • 0:18:41 to store the data that we care about. So,
    • 0:18:43 in fact, if you could go ahead and go to
    • 0:18:44 this URL
    • 0:18:45 here that you see, you should see another
    • 0:18:47 Google Form,
    • 0:18:48 this one asking you some different
    • 0:18:50 questions. All of us probably have some
    • 0:18:52 preferred TV shows, now more than ever
    • 0:18:54 perhaps,
    • 0:18:55 and what we'd like to do is ask everyone
    • 0:18:57 to input into that form
    • 0:18:58 their favorite TV show, followed by the
    • 0:19:01 genre or
    • 0:19:02 genres into which that particular TV
    • 0:19:05 show
    • 0:19:06 falls. So go ahead and take a moment to
    • 0:19:08 do that.
    • 0:19:09 And if you're unable to follow along at
    • 0:19:11 home, what folks are looking at is a form
    • 0:19:13 quite like this one here,
    • 0:19:14 whereby we're just asking them for the
    • 0:19:16 title of their preferred TV show
    • 0:19:18 and the genre or genres of that specific
    • 0:19:23 TV show. So go ahead and fill that out if
    • 0:19:26 you could.
    • 0:19:27 We'll keep an eye on the responses
    • 0:19:28 coming in. We'll give everyone a few
    • 0:19:31 moments to think about
    • 0:19:33 their preferred TV show.
    • 0:19:36 I myself have been re-watching a bit of
    • 0:19:39 The Office.
    • 0:19:40 I have been watching a lot
    • 0:19:44 of reruns of older shows. I probably, I
    • 0:19:47 think the point is, been watching too
    • 0:19:48 much TV
    • 0:19:50 during all of this, but in my defense,
    • 0:19:52 it's on in the background while I'm
    • 0:19:53 doing work on my laptop, so hopefully
    • 0:19:55 that makes it okay.
    • 0:19:56 Let me take a look at the responses that
    • 0:19:58 have come in. All right, we're getting a
    • 0:20:00 lot of good data here, on the order
    • 0:20:01 of hundreds of responses. Give you just
    • 0:20:04 another moment or so.
    • 0:20:06 The question at hand again is favorite
    • 0:20:08 TV show, the title thereof,
    • 0:20:10 and the genre or genres into which
    • 0:20:13 that TV show falls.
    • 0:20:16 Brian, are you okay with my starting to
    • 0:20:18 look at the data? It's okay if we keep
    • 0:20:19 collecting some more, but I'm gonna go
    • 0:20:21 ahead and show the top,
    • 0:20:22 uh, few rows, if that sounds good.
    • 0:20:25 All right, so let's go ahead and start to
    • 0:20:27 look at some of this data that's come in.
    • 0:20:29 Here is the resulting Google spreadsheet
    • 0:20:31 that Google Forms has created for us, and
    • 0:20:32 you'll notice that by default Google
    • 0:20:34 Forms,
    • 0:20:34 this particular tool, has three different
    • 0:20:36 columns, at least for this form. One is a
    • 0:20:38 timestamp, and Google automatically
    • 0:20:39 gives us that
    • 0:20:40 based on what day and time everyone was
    • 0:20:42 buzzing in with their responses.
    • 0:20:44 Then they have a header row beyond that
    • 0:20:46 for title
    • 0:20:47 and genres. I've manually boldfaced it
    • 0:20:50 in advance just to make it stand out, but
    • 0:20:52 you'll notice that the
    • 0:20:53 headings here, title and genres, perfectly
    • 0:20:55 match the questions that we asked
    • 0:20:57 in the Google Form. That allows us to
    • 0:20:59 therefore line up your responses
    • 0:21:01 with our questions. And you can see here
    • 0:21:04 Punisher was the first
    • 0:21:05 favorite TV show to be inputted, followed
    • 0:21:07 by The Office, Breaking Bad, New Girl,
    • 0:21:09 Archer,
    • 0:21:09 another Office, and so forth. And in the
    • 0:21:12 third column,
    • 0:21:13 under genres, you'll see that there's
    • 0:21:14 something curious here. While some of the
    • 0:21:17 cells, that is, the little boxes
    • 0:21:19 of text, have just single words like
    • 0:21:21 comedy or drama,
    • 0:21:22 you'll notice that some of them have a
    • 0:21:24 comma-separated list, and that
    • 0:21:25 comma-separated list is because
    • 0:21:27 some of you checked, as you could,
    • 0:21:28 multiple check boxes to indicate that
    • 0:21:31 Breaking Bad is a crime genre, drama,
    • 0:21:35 and also thriller. And so the way Google
    • 0:21:37 Forms handles this is a bit
    • 0:21:39 lazily, in the sense that they just drop
    • 0:21:41 all of those values
    • 0:21:43 as a comma-separated list inside of the
    • 0:21:46 spreadsheet itself. And that's
    • 0:21:48 potentially a problem if we ultimately
    • 0:21:49 download this as
    • 0:21:50 a CSV file, comma-separated values,
    • 0:21:53 because now you have
    • 0:21:54 commas inside, in between the commas.
    • 0:21:57 Fortunately, there's a solution to that
    • 0:21:58 that we'll ultimately see.
    • 0:22:00 So we've got a good amount of data here.
    • 0:22:01 In fact, if I keep scrolling down, we'll
    • 0:22:03 see a few hundred responses now,
    • 0:22:05 and it would be nice to analyze this data
    • 0:22:07 in some way and figure out what the most
    • 0:22:09 popular TV show is,
    • 0:22:10 maybe search for new shows I might like
    • 0:22:12 via their genre.
    • 0:22:13 So you can imagine some number of
    • 0:22:15 queries that could be answered by way of
    • 0:22:17 this data set. But let's first consider
    • 0:22:19 the limitations of leaving this data in
    • 0:22:22 just
    • 0:22:22 a spreadsheet like this. All of us are
    • 0:22:25 probably in the habit of using,
    • 0:22:26 occasionally, Google Spreadsheets, Apple
    • 0:22:28 Numbers,
    • 0:22:29 Microsoft Excel, or some other tool. So
    • 0:22:32 let's consider
    • 0:22:33 what spreadsheets are good at and what
    • 0:22:35 they are bad at.
    • 0:22:36 Would anyone like to volunteer an
    • 0:22:38 answer to the first of those? What is a
    • 0:22:40 spreadsheet good at
    • 0:22:41 or good for? Or, not quite sure how to
    • 0:22:44 answer that:
    • 0:22:45 what do you use spreadsheets for? What
    • 0:22:48 are some useful
    • 0:22:50 problems they solve for us?
    • 0:22:54 Uh, yeah, Andrew, what's your thinking on
    • 0:22:56 spreadsheets?
    • 0:22:59 Over to Andrew Park. Oh, hey. Um, they're
    • 0:23:03 very good for quickly sorting. Okay, very
    • 0:23:06 good for quickly sorting. I like that. I
    • 0:23:08 could click on the top of the title
    • 0:23:09 column, for instance, and immediately sort
    • 0:23:11 all of those titles
    • 0:23:13 alphabetically. I like that. Other
    • 0:23:15 reasons to use
    • 0:23:16 a spreadsheet? What problems do they
    • 0:23:18 solve? What are they good at?
    • 0:23:21 Other thoughts on spreadsheets? Yeah, how
    • 0:23:22 about Peter?
    • 0:23:30 Storing large amounts of data that you
    • 0:23:32 can later analyze.
    • 0:23:33 Okay, so storing large amounts of data
    • 0:23:35 that you can later analyze. It's kind of
    • 0:23:37 a nice model for storing lots of
    • 0:23:39 rows of data, so to speak. I will say that
    • 0:23:41 there actually is a limit, and in fact,
    • 0:23:42 back in the day, I learned what this
    • 0:23:44 limit is.
    • 0:23:45 Long story short, in graduate school I
    • 0:23:46 was using a spreadsheet to analyze some
    • 0:23:48 research data,
    • 0:23:49 and at one point I had more data than
    • 0:23:51 Excel
    • 0:23:52 supported rows for. Specifically, I had
    • 0:23:55 some
    • 0:23:57 65,536 rows, which was too many at that
    • 0:24:00 point for Excel at the time, because,
    • 0:24:02 long story short, if you recall from a
    • 0:24:04 spreadsheet program like Google
    • 0:24:05 Spreadsheets,
    • 0:24:06 every row is numbered from 1 on up. Well,
    • 0:24:08 unfortunately, at the time Microsoft had
    • 0:24:10 used a 16-bit integer,
    • 0:24:12 16 bits or two bytes, to represent each
    • 0:24:15 of those numbers,
    • 0:24:16 and it turns out that 2 to the 16th power
    • 0:24:18 is roughly 65,000.
    • 0:24:20 So at that point I maxed out the total
    • 0:24:22 number of rows. Now, to Peter's point,
    • 0:24:24 they've increased that in recent years,
    • 0:24:26 and you can actually store a lot more
    • 0:24:27 data. So spreadsheets are indeed good at
    • 0:24:29 that.
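The 16-bit row limit mentioned above comes down to simple arithmetic. The snippet below is only a sketch of that arithmetic; the variable names are illustrative, and it says nothing about how Excel actually stored row numbers internally.

```python
# With 16 bits (two bytes) there are 2**16 distinct bit patterns,
# so at most that many different row numbers can be represented.
BITS = 16

distinct_values = 2 ** BITS          # how many different numbers fit in 16 bits
largest_value = distinct_values - 1  # largest unsigned value when counting from 0

print(distinct_values)  # 65536, the "roughly 65,000" from the lecture
print(largest_value)    # 65535
```

Each extra bit doubles the count, which is why a wider integer raises the limit so dramatically.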
    • 0:24:30 But they're not necessarily good at
    • 0:24:32 everything, because at some point you're
    • 0:24:33 going to have more data, potentially, in a
    • 0:24:35 spreadsheet
    • 0:24:36 than your Mac or PC can handle. In fact,
    • 0:24:39 if you're actually trying to build an
    • 0:24:40 application,
    • 0:24:41 whether it's Twitter or Instagram or
    • 0:24:43 Facebook or anything
    • 0:24:44 of that scale, those companies are
    • 0:24:46 certainly not storing their data, suffice
    • 0:24:48 it to say, in a spreadsheet, because there
    • 0:24:49 would just be way too much data to use,
    • 0:24:51 and no one could literally open it
    • 0:24:53 on their computer. So we'll need a
    • 0:24:54 solution to that problem of scale,
    • 0:24:57 but I don't think we need to throw out
    • 0:24:59 what works well about spreadsheets. So
    • 0:25:01 you can store, indeed, a lot of data in
    • 0:25:03 row form,
    • 0:25:04 but it would seem that you can also
    • 0:25:05 store a lot of data in column form. And
    • 0:25:08 even though I'm only showing columns A, B,
    • 0:25:10 and C, of course you've probably used
    • 0:25:12 spreadsheets where you add more columns:
    • 0:25:13 D,
    • 0:25:14 E, F, and so forth. So what's the right
    • 0:25:16 mental model
    • 0:25:17 for how to think about rows versus
    • 0:25:20 columns in a spreadsheet?
    • 0:25:22 I feel like we probably use them in a
    • 0:25:25 somewhat
    • 0:25:26 different way. Conceptually, we might
    • 0:25:29 think about them a little differently.
    • 0:25:31 What's the difference between rows and
    • 0:25:34 columns in a spreadsheet? Sophia?
    • 0:25:39 Um, adding more entries, like adding more
    • 0:25:42 data, is,
    • 0:25:43 those are within the rows, but then, like,
    • 0:25:44 the actual attributes or characteristics
    • 0:25:46 of the data should be in columns.
    • 0:25:48 Exactly. When you add more data to the
    • 0:25:49 spreadsheet, you should really be adding
    • 0:25:52 to the bottom of it, adding more and more
    • 0:25:54 rows. So these things sort of grow
    • 0:25:56 vertically, even though of course that's
    • 0:25:57 just a human's perception of it.
    • 0:25:59 They grow from top to bottom by adding
    • 0:26:01 more and more rows. But to Sophia's point,
    • 0:26:03 your columns represent what we might
    • 0:26:05 call attributes or
    • 0:26:07 fields, uh, or any other such
    • 0:26:09 characteristic
    • 0:26:10 that kind of is a type of data that
    • 0:26:12 you're storing. So in this case of our
    • 0:26:14 form,
    • 0:26:14 timestamp is the first column, title is
    • 0:26:16 the second column,
    • 0:26:18 genres is the third column, and those
    • 0:26:19 columns can indeed be thought of as
    • 0:26:21 fields or attributes, properties of your
    • 0:26:23 data.
    • 0:26:24 And those are properties that you should
    • 0:26:25 really decide on in advance, when you're
    • 0:26:27 first creating the form in our case, or
    • 0:26:29 when you're manually creating the
    • 0:26:30 spreadsheet in
    • 0:26:32 another case. You should not really be in
    • 0:26:34 the habit, when using spreadsheets,
    • 0:26:36 of adding data from left
    • 0:26:38 to right, adding more and more columns,
    • 0:26:40 unless
    • 0:26:41 you decide to collect more types of data.
    • 0:26:45 So just because someone adds a new
    • 0:26:47 favorite TV show
    • 0:26:48 to your data set, you shouldn't be adding
    • 0:26:50 that from left to right in a new column.
    • 0:26:51 You should indeed be adding it from top
    • 0:26:53 to bottom.
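The idea that columns are fixed in advance while rows keep growing can be sketched in Python with the standard `csv` module. This is only an illustration under stated assumptions: the exact column spellings, the placeholder timestamps, and the sample genres are hypothetical, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# The fields (columns) are decided once, up front. Exact spellings are assumed.
FIELDNAMES = ["Timestamp", "title", "genres"]

# In-memory stand-in for a .csv file so the sketch runs anywhere.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()

# Every new response becomes a new row; the set of columns never changes.
writer.writerow({"Timestamp": "(placeholder 1)", "title": "The Office", "genres": "Comedy"})
writer.writerow({"Timestamp": "(placeholder 2)", "title": "Archer", "genres": "Comedy"})

# Read the data back to confirm it grew downward, not sideways.
rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
print(len(rows), "rows with columns", list(rows[0].keys()))
```

Collecting a new kind of information, such as a name or email address, would mean changing `FIELDNAMES`, which is the code-level equivalent of adding a column.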
    • 0:26:54 But suppose that we actually decided to
    • 0:26:56 collect more information from everyone.
    • 0:26:58 Maybe that form had instead asked you
    • 0:26:59 for your name or your
    • 0:27:01 email address or any other questions.
    • 0:27:03 Those properties or attributes or fields
    • 0:27:06 would belong as new columns. So this is
    • 0:27:08 to say we generally decide on
    • 0:27:10 the layout of our data, the schema of our
    • 0:27:13 data, in advance,
    • 0:27:14 and then from there on out we proceed to
    • 0:27:16 add, add, add more rows,
    • 0:27:18 not columns, unless we change our mind
    • 0:27:20 and need to change the schema
    • 0:27:22 of our particular data. So it turns out
    • 0:27:25 that spreadsheets are indeed wonderfully
    • 0:27:26 useful, to Peter's point, for, you know,
    • 0:27:28 large or reasonably large
    • 0:27:30 data sets that we might collect. And we
    • 0:27:33 can, of course, per last week, export those
    • 0:27:35 data sets as
    • 0:27:36 CSV files. And so we can go from a
    • 0:27:38 spreadsheet to a simple text file
    • 0:27:41 stored in ASCII, or Unicode more
    • 0:27:43 generally, on your own hard drive or
    • 0:27:44 somewhere in the cloud.
    • 0:27:46 And you can actually think of that file,
    • 0:27:48 that .csv
    • 0:27:49 file, as what we might call a flat-file
    • 0:27:52 database.
    • 0:27:52 A database is, generally speaking, a file
    • 0:27:55 that stores data,
    • 0:27:56 or it's a program that stores data for
    • 0:27:59 you. And all of us have probably thought
    • 0:28:00 about or used databases in some sense.
    • 0:28:02 You're probably,
    • 0:28:03 uh, familiar with the fact that all of
    • 0:28:05 those same big websites, Google and
    • 0:28:06 Twitter and Facebook and others,
    • 0:28:08 use databases to store our data. Well,
    • 0:28:10 those databases are either just really
    • 0:28:12 big files containing lots of data
    • 0:28:14 or special programs that are storing our
    • 0:28:16 data for us.
    • 0:28:17 And a flat file is just referring to the
    • 0:28:19 fact that it really is a very simple
    • 0:28:21 design.
    • 0:28:21 In fact, years ago, decades ago, humans
    • 0:28:24 decided,
    • 0:28:24 when storing data in simple text files,
    • 0:28:27 that if you want to store different
    • 0:28:29 types of data, like, to Sophia's point,
    • 0:28:30 different properties or attributes,
    • 0:28:32 well, let's keep it simple. Let's just
    • 0:28:34 separate those columns
    • 0:28:35 with commas in our flat-file database,
    • 0:28:39 aka a CSV.
    • 0:28:40 You can use other things. You can use
    • 0:28:42 tabs; there's things called TSVs, for
    • 0:28:44 tab-separated values.
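As a small sketch of the tab-separated variant just mentioned, Python's `csv` module can read TSV data when given a different `delimiter`. The sample text below is made up for illustration.

```python
import csv
import io

# Hypothetical two-line TSV: fields are separated by tab characters, not commas.
tsv_text = "title\tgenres\nThe Office\tComedy\n"

# The same reader used for CSV works once the delimiter is a tab.
reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
tsv_rows = list(reader)

print(tsv_rows)  # [['title', 'genres'], ['The Office', 'Comedy']]
```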
    • 0:28:46 And frankly, you can use anything you
    • 0:28:47 want. But there is a corner case, and
    • 0:28:49 we've already seen a preview of it.
    • 0:28:51 What if your actual data has a comma in
    • 0:28:54 it? What if the title of your favorite TV
    • 0:28:56 show has a comma?
    • 0:28:57 What if Google is presuming to store
    • 0:28:59 genres as a comma-separated list?
    • 0:29:01 Bad things can happen if using a CSV as
    • 0:29:04 your flat-file database.
    • 0:29:05 But there's solutions to that, and in
    • 0:29:07 fact what the world typically does is,
    • 0:29:09 whenever you have
    • 0:29:09 commas inside of your CSV file,
    • 0:29:12 you just make sure that the whole string
    • 0:29:15 is double-quoted
    • 0:29:16 on the far left and far right, and
    • 0:29:18 anything inside of double quotes
    • 0:29:19 is not mistaken thereafter as
    • 0:29:22 delineating a column,
    • 0:29:24 as the other commas in the file might. So
    • 0:29:26 that's all that's meant by a flat-file
    • 0:29:28 database, and CSV is perhaps one of the
    • 0:29:30 most common formats
    • 0:29:32 thereof.
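The double-quoting convention described above can be seen by comparing a naive split on commas with Python's `csv` module, which understands the quotes. This is a sketch with a made-up sample line; the Breaking Bad genres are the ones mentioned earlier in the lecture.

```python
import csv
import io

# One CSV line whose second field contains commas, protected by double quotes.
line = 'Breaking Bad,"Crime, Drama, Thriller"'

# Naive approach: every comma is treated as a column separator.
naive = line.split(",")

# csv.reader treats everything inside the double quotes as one field.
parsed = next(csv.reader(io.StringIO(line)))

# Going the other way, csv.writer adds the quotes when a field contains a comma.
out = io.StringIO()
csv.writer(out).writerow(["Breaking Bad", "Crime, Drama, Thriller"])
written = out.getvalue().strip()

print(naive)    # ['Breaking Bad', '"Crime', ' Drama', ' Thriller"']
print(parsed)   # ['Breaking Bad', 'Crime, Drama, Thriller']
print(written)  # Breaking Bad,"Crime, Drama, Thriller"
```

The naive split yields four pieces instead of two columns, which is exactly the "bad things" the quoting rule prevents.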
    • 0:29:32 If only because all of these programs,
    • 0:29:34 like Google Spreadsheets and Excel and
    • 0:29:36 Numbers,
    • 0:29:36 allow you to save your files as CSVs. Now,
    • 0:29:40 long story short, those of you who have
    • 0:29:42 used fancier features of spreadsheets,
    • 0:29:44 like built-in functions and formulas and
    • 0:29:46 those kinds of things,
    • 0:29:47 those are built-in and proprietary to
    • 0:29:49 Google Spreadsheets and Excel
    • 0:29:51 and Numbers. You cannot use formulas in a
    • 0:29:54 CSV
    • 0:29:55 file or a TSV file or in a flat-file
    • 0:29:57 database more generally.
    • 0:29:59 You can only store static, that is,
    • 0:30:01 unchanging,
    • 0:30:02 values. So when you export the data, what
    • 0:30:05 you see is what you get. And that's why
    • 0:30:07 people use
    • 0:30:08 fancier programs like Excel and Numbers
    • 0:30:09 and Google Spreadsheets, because you get
    • 0:30:10 more functionality.
    • 0:30:12 But if you want to export the data, you
    • 0:30:14 can only get, indeed, the raw
    • 0:30:16 textual data out of it. But I dare say
    • 0:30:18 that's going to be okay. In fact, Brian, do
    • 0:30:19 you mind if I go ahead and download
    • 0:30:21 this spreadsheet as a CSV file now? Yep,
    • 0:30:23 go ahead.
    • 0:30:24 All right, I'm going to go ahead in
    • 0:30:25 Google Spreadsheets and go to File,
    • 0:30:27 Download, and you can see a whole bunch
    • 0:30:29 of options: PDF,
    • 0:30:31 web page, comma-separated values, which is
    • 0:30:34 the one I want. So I'm going to indeed go
    • 0:30:35 ahead and choose
    • 0:30:36 CSV from this drop-down in Spreadsheets.
    • 0:30:39 That of course downloaded that file for
    • 0:30:41 me, and now I'm going to go ahead and go
    • 0:30:42 into our familiar CS50 IDE. You'll recall
    • 0:30:44 that last week
    • 0:30:45 I was able to upload a file into the IDE,
    • 0:30:48 and I'm going to go ahead and do the
    • 0:30:49 same here this week as well.
    • 0:30:50 I'm going to go ahead and grab my file,
    • 0:30:52 which ended up in my Downloads folder on
    • 0:30:54 my particular computer here,
    • 0:30:56 and I'm going to go ahead and drag and
    • 0:30:58 drop this
    • 0:30:59 into the IDE such that it ends up in my
    • 0:31:03 home directory, so to speak. So now I have
    • 0:31:05 this file, favorite TV shows forms, and in
    • 0:31:08 fact, if I double-click this within the
    • 0:31:09 IDE,
    • 0:31:10 you'll see familiar data now. Timestamp,
    • 0:31:13 comma,
    • 0:31:13 title, comma, genres is our header row
    • 0:31:16 that contains the names of the
    • 0:31:18 properties or attributes in this file.
    • 0:31:20then we've got our timestamps comma properties or attributes in this file
    • 0:31:22favorite title then we've got our timestamps comma
    • 0:31:23comma and then a comma separated list of favorite title
    • 0:31:26genres and here indeed notice comma and then a comma separated list of
    • 0:31:28that google took care to use double genres and here indeed notice
    • 0:31:30quotes around that google took care to use double
    • 0:31:31any values that themselves had commas so quotes around
    • 0:31:33it's a relatively simple file format any values that themselves had commas so
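The double-quoting just described is what lets a comma appear inside a single value. A minimal sketch of how Python's csv module reads such a line, assuming a made-up sample row and header spelling (the real export will differ):

```python
import csv
import io

# Hypothetical sample mimicking the exported file; the real rows differ.
sample = io.StringIO(
    "Timestamp,title,genres\n"
    '10/19/2020 13:08:00,The Office,"Comedy, Drama"\n'
)

rows = list(csv.reader(sample))
header, first = rows[0], rows[1]

# The quoted genres value stays one field despite its inner comma.
print(header)
print(first)
```

Splitting the same line naively on every comma would yield four pieces instead of three, which is one reason to prefer the csv module over manual string splitting.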
    • 0:31:35and i could certainly just kind of skim
    • 0:31:37through this figuring out who likes the
    • 0:31:39office who likes breaking bad or other
    • 0:31:40shows but per last week we now have a
    • 0:31:42pretty useful programming language at
    • 0:31:44our disposal python
    • 0:31:45that could allow us to start
    • 0:31:46manipulating and analyzing this data
    • 0:31:49more readily and here to my point last
    • 0:31:51week about using the right tool for the
    • 0:31:52job
    • 0:31:53you could absolutely do everything we're
    • 0:31:55about to do in all weeks prior of cs50
    • 0:31:58we could have used c
    • 0:31:59for what we're about to do but as you
    • 0:32:01can probably glean c tends to be painful
    • 0:32:03for certain things like anything
    • 0:32:05involving string manipulation changing
    • 0:32:08strings
    • 0:32:09analyzing strings is just a real pain
    • 0:32:11right god forbid you had to take this
    • 0:32:12csv file
    • 0:32:14and load it all into memory not unlike
    • 0:32:16your spell checker you would have to be
    • 0:32:17using malloc
    • 0:32:18all over the place or realloc or the
    • 0:32:19like like there's just a lot of heavy
    • 0:32:21lifting involved in just
    • 0:32:22analyzing a text file so python does all
    • 0:32:25of that for us by just giving us more
    • 0:32:27functions at our disposal
    • 0:32:29with which to start analyzing and
    • 0:32:31opening data
    • 0:32:32so let me go ahead and close this file
    • 0:32:34let me go ahead and create a new one
    • 0:32:36called favorites.py wherein i'm going to
    • 0:32:39start playing with this data set and see
    • 0:32:40if we can't start
    • 0:32:41answering some questions about it and
    • 0:32:43frankly to this day 20 plus years after
    • 0:32:45learning how to program for the first
    • 0:32:46time
    • 0:32:47i myself am very much in the habit when
    • 0:32:49writing a new program of just starting
    • 0:32:50simple and not solving the problem i
    • 0:32:52ultimately want to
    • 0:32:54but something simpler just as a sort of
    • 0:32:56proof of concept to make sure i have the
    • 0:32:58right plumbing
    • 0:32:59in place so by that i mean this let's go
    • 0:33:01ahead and write a quick
    • 0:33:02program that simply opens up this file
    • 0:33:05the csv file
    • 0:33:06iterates over it top to bottom and just
    • 0:33:09prints out each of the titles just as a
    • 0:33:10quick sanity check that i know what i'm
    • 0:33:12doing and i have access to the data
    • 0:33:14therein so let me go ahead and import
    • 0:33:16csv and then i can do this in a few
    • 0:33:18different ways but by now you've
    • 0:33:19probably seen or remembered
    • 0:33:21by using something like the open command
    • 0:33:23and the with keyword to sort of open
    • 0:33:26and eventually automatically close this
    • 0:33:27file for me this file is called favorite
    • 0:33:30tv
    • 0:33:30shows dash form responses
    • 0:33:331 dot csv and i'm going to open this up
    • 0:33:36in read mode
    • 0:33:38strictly speaking the r is not required
    • 0:33:40you might see examples online
    • 0:33:41not including it that's because read is
    • 0:33:43the default but for parity with c
    • 0:33:45and fopen i'm going to be explicit and
    • 0:33:47actually do quote-unquote r
    • 0:33:49and i'm going to go ahead and give this
    • 0:33:50a variable name of file so this line
    • 0:33:523 here has the effect of opening that
    • 0:33:55csv file in read-only mode
    • 0:33:57and creating a variable called file via
    • 0:33:59which i can reference it
    • 0:34:01now i'm going to go ahead and use some
    • 0:34:02of that csv functionality i'm going to
    • 0:34:03give myself what we keep calling a
    • 0:34:05reader
    • 0:34:05which i could call it xyz anything else
    • 0:34:08but reader kind of describes
    • 0:34:09what this variable is going to do and
    • 0:34:11it's going to be the return value of
    • 0:34:13calling csv.reader on that file
    • 0:34:16and so essentially the csv library per
    • 0:34:19last week
    • 0:34:20has a lot of fancy features built in and
    • 0:34:21all it needs as input
    • 0:34:23is an already opened text file and then
    • 0:34:26it will then wrap that file so to speak
    • 0:34:28with a whole bunch of more useful
    • 0:34:29functionality
    • 0:34:30like the ability to read it column
    • 0:34:33and row at a time all right now i'm
    • 0:34:35going to go ahead and
    • 0:34:37you know what just for now i'm going to
    • 0:34:39skip the first
    • 0:34:40row i'm going to skip the first row
    • 0:34:43because the first row has my headings
    • 0:34:44timestamp title and genres and i i know
    • 0:34:47what my columns are so i'm just going to
    • 0:34:49ignore that
    • 0:34:50line for now and now i'm going to do
    • 0:34:52this for row
    • 0:34:53in reader let me go ahead and print out
    • 0:34:56quite simply row and i only want title
    • 0:34:59so i think if it's three columns
    • 0:35:01from left to right it's 0 1 2 so i want
    • 0:35:04to print out column
    • 0:35:05bracket 1 which is going to be the
    • 0:35:07second column zero-indexed
    • 0:35:09all right let me go ahead and save that
    • 0:35:10go down to my terminal window
    • 0:35:12and run python of favorites.py and cross
    • 0:35:15my fingers
    • 0:35:16okay voila it looks like it flied it
    • 0:35:18flew by super fast
    • 0:35:20but it looks like indeed these are all
    • 0:35:22of the tv shows that folks have inputted
    • 0:35:24indeed there's a few hundred if i keep
    • 0:35:26scrolling up
    • 0:35:26so it looks like my program is working
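The program narrated above can be sketched roughly as follows. This is a reconstruction, not the code shown on screen: the sample rows, the temporary file, and the name favorites.csv are assumptions added so the sketch runs on its own, and next(reader) is one common way to skip the header row.

```python
import csv
import os
import tempfile

# Made-up sample data standing in for the exported spreadsheet.
SAMPLE = (
    "Timestamp,title,genres\n"
    "10/19/2020 13:08:00,The Office,Comedy\n"
    "10/19/2020 13:09:00,Breaking Bad,Drama\n"
)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "favorites.csv")  # hypothetical filename
    with open(path, "w", newline="") as f:
        f.write(SAMPLE)

    titles_seen = []
    # "r" is the default mode; it is spelled out here as in the lecture.
    with open(path, "r", newline="") as file:
        reader = csv.reader(file)
        next(reader)  # skip the header row
        for row in reader:
            titles_seen.append(row[1])
            print(row[1])  # column 1 is the title (columns are 0-indexed)
```

With the real export, only the path passed to open would change; the reader, the skipped header row, and the row[1] lookup stay the same.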
    • 0:35:29but let's improve it just a little bit
    • 0:35:31it turns out that using the csv
    • 0:35:34reader isn't necessarily the best
    • 0:35:36approach in python many of you have
    • 0:35:37already discovered a DictReader a
    • 0:35:39dictionary reader
    • 0:35:41which is nice because then you don't
    • 0:35:42have to know or keep double checking
    • 0:35:44what number column your data is in you
    • 0:35:46can instead refer to it by the
    • 0:35:48header itself so by title quote unquote
    • 0:35:50or by genres
    • 0:35:52this is also good because if you or
    • 0:35:54maybe a colleague are sort of messing
    • 0:35:55around with the spreadsheet and they
    • 0:35:56rearrange the columns by dragging them
    • 0:35:58left or right
    • 0:35:59any numbers you have used in your code 0
    • 0:36:021 2 on up
    • 0:36:03could suddenly be incorrect if your
    • 0:36:05colleague has reordered those columns
    • 0:36:07so using a dictionary reader tends to be
    • 0:36:09a little more robust because it uses the
    • 0:36:11titles
    • 0:36:12not the mere numbers it's still fallible
    • 0:36:14if someone yourself or someone else
    • 0:36:16changes the values in that very first
    • 0:36:19row and renames titles or genres then
    • 0:36:21things are going to break but at that
    • 0:36:23point
    • 0:36:23we kind of have to blame you for not
    • 0:36:25having kept track of your code versus
    • 0:36:26your data but
    • 0:36:27still a risk so i'm going to change this
    • 0:36:29to dictionary reader or DictReader here
    • 0:36:31and pretty much the rest of my code can
    • 0:36:33be the same except i don't need this
    • 0:36:34hack here on line five
    • 0:36:36i don't need to just skip over to the
    • 0:36:38next row from the get-go
    • 0:36:40because i now want the dictionary reader
    • 0:36:42to handle the process of
    • 0:36:44reading that first row for me but
    • 0:36:45otherwise everything else stays the same
    • 0:36:47except for this last line where now i
    • 0:36:48think i can now use
    • 0:36:50row as a dictionary not as a list
    • 0:36:53per se and print out specifically the
    • 0:36:56title
    • 0:36:57from each given row so let me go ahead
    • 0:36:58and run python of favorites.py again
    • 0:37:01and voila it looks like i got the same
    • 0:37:03results several hundred of them but let
    • 0:37:05me stipulate that it's doing the same
    • 0:37:06thing if we actually compared both of
    • 0:37:08those
    • 0:37:09side by side all right before i forge
    • 0:37:11ahead now to actually augment this with
    • 0:37:12new functionality
    • 0:37:14any questions or confusion on this
    • 0:37:17python
    • 0:37:17script we just wrote to open a file wrap
    • 0:37:20it with a reader or DictReader
    • 0:37:22and then iterate over the rows one at a
    • 0:37:24time printing
    • 0:37:26the titles any questions confusion on
    • 0:37:29syntax at all it's okay we've only known
    • 0:37:31or seen python for a week
    • 0:37:32it's fine if it's still quite new
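For the DictReader version recapped above, here is a small sketch under the same assumptions as before (invented sample rows, a header spelled title). The helper titles_from is hypothetical and exists only to show that looking values up by header name survives a reordering of the columns.

```python
import csv
import io


def titles_from(text):
    """Return the title of every row, looked up by header name."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["title"] for row in reader]


original = (
    "Timestamp,title,genres\n"
    "10/19/2020 13:08:00,The Office,Comedy\n"
    "10/19/2020 13:09:00,Breaking Bad,Drama\n"
)

# Same data after someone drags the columns into a different order.
reordered = (
    "title,genres,Timestamp\n"
    "The Office,Comedy,10/19/2020 13:08:00\n"
    "Breaking Bad,Drama,10/19/2020 13:09:00\n"
)

for title in titles_from(original):
    print(title)
```

A row[1] lookup would return the genres for the reordered data, while row["title"] keeps working. Renaming the header itself would still break it, as the lecture notes.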
    • 0:37:35anything brian we should
    • 0:37:36address yeah so why is it that you don't
    • 0:37:40need to
    • 0:37:40close the file using the syntax that
    • 0:37:43you're using right here
    • 0:37:44really good question last week i more
    • 0:37:46pedantically used
    • 0:37:48open on its own and then i later used a
    • 0:37:50close
    • 0:37:51function that was associated with the
    • 0:37:53file that i just opened
    • 0:37:54now the more pythonic way to do things
    • 0:37:57if you will
    • 0:37:58is actually to use this with keyword
    • 0:38:00which didn't exist in c
    • 0:38:01and it just tends to be a useful feature
    • 0:38:03in python whereby if you say
    • 0:38:05with open dot dot dot it will open the
    • 0:38:08file for you
    • 0:38:09then it will remain open so long as your
    • 0:38:12code is indented inside of that
    • 0:38:13with keyword's block and as soon as you
    • 0:38:16get to the end of your program
    • 0:38:18it will automatically be closed for you
    • 0:38:19so this is one of these features where
    • 0:38:21python in some sense
    • 0:38:22is trying to protect us from ourselves
    • 0:38:25it's probably pretty common for humans
    • 0:38:26myself included to forget to close your
    • 0:38:28file
    • 0:38:29that can create problems with saving
    • 0:38:30things permanently it can create memory
    • 0:38:32leaks as we know from c
    • 0:38:34so the with keyword just assumes that
    • 0:38:36i'm not going to be an idiot and forget
    • 0:38:37to close the file
    • 0:38:38python is going to do it for me
    • 0:38:40automatically
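To make the answer above concrete, this sketch contrasts an explicit open and close with the with form. The temporary file and its name are assumptions for illustration. One detail worth noting: the file is closed as soon as the indented with block ends, which may be well before the program itself finishes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")  # hypothetical file
    with open(path, "w") as f:
        f.write("hello\n")

    # Explicit style: the programmer must remember to call close().
    file = open(path, "r")
    first = file.read()
    file.close()

    # with style: Python closes the file when the indented block ends.
    with open(path, "r") as file2:
        second = file2.read()
        open_inside = not file2.closed
    closed_after = file2.closed

print(open_inside, closed_after)
```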
    • 0:38:42other questions or confusions brian
    • 0:38:46how does DictReader know that title is
    • 0:38:49the name of the key
    • 0:38:50inside of the dictionary really good
    • 0:38:52question too so
    • 0:38:54it is designed by the authors of the
    • 0:38:56python language
    • 0:38:57to look at the very first row in the
    • 0:39:00file
    • 0:39:01split it on the commas in that very
    • 0:39:04first row
    • 0:39:05and just assume that the first word or
    • 0:39:07phrase
    • 0:39:08before the first comma is the name of
    • 0:39:10the first column
    • 0:39:11that the second word or phrase after the
    • 0:39:14first comma
    • 0:39:14is the uh name of the second column
    • 0:39:18and so forth so a DictReader just
    • 0:39:20presumes
    • 0:39:21as is the convention with csvs that
    • 0:39:23your first row is going to contain
    • 0:39:25the headings that you want to use to
    • 0:39:27refer to those columns if your csv
    • 0:39:29happens not to have such a heading
    • 0:39:31whereby it just jumps right in on the
    • 0:39:33first row to real data then you're not
    • 0:39:35going to be able to use a DictReader
    • 0:39:36correctly at least not without some
    • 0:39:38manual configuration
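The lecture does not say what that manual configuration looks like. One option the csv module offers is the fieldnames parameter of csv.DictReader, sketched here with an invented header-less row and column names chosen for illustration:

```python
import csv
import io

# A CSV that jumps straight into data, with no header row (made-up sample).
headerless = io.StringIO("10/19/2020 13:08:00,The Office,Comedy\n")

# Supplying fieldnames tells DictReader not to treat the first row as headers.
reader = csv.DictReader(headerless, fieldnames=["timestamp", "title", "genres"])
rows = list(reader)

print(rows[0]["title"])
```

Without fieldnames, DictReader would consume that first data row as the header, and no row would have a title key.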
    • 0:39:42other questions brian nothing else here
    • 0:39:45all right so
    • 0:39:46let's go ahead and now i feel like
    • 0:39:48there's a whole mess here and you know
    • 0:39:49some of these shows are pretty popular
    • 0:39:51and as i'm glancing over this i
    • 0:39:52definitely see some duplication a whole
    • 0:39:53bunch of you like
    • 0:39:54the office a whole bunch of you like
    • 0:39:56breaking bad game of thrones and a whole
    • 0:39:58bunch of other shows as well
    • 0:39:59so it would be nicer i think if we kind
    • 0:40:01of narrow the scope of our look at this
    • 0:40:03data by just looking at unique values
    • 0:40:06you're looking at unique values so
    • 0:40:07rather than just iterate over the file
    • 0:40:08top to bottom printing out one title
    • 0:40:10after another why don't we go ahead and
    • 0:40:12sort of accumulate all of this data in
    • 0:40:15some kind of data structure
    • 0:40:16so that we can throw away duplicate
    • 0:40:18values and then
    • 0:40:19only print out the unique titles that
    • 0:40:22we've accumulated
    • 0:40:23so i bet we can do this in a few ways
    • 0:40:24but if we think back to last week's
    • 0:40:26demonstration of our dictionary
    • 0:40:28you'll recall that i used what was
    • 0:40:30called a set and i'm going to go ahead
    • 0:40:31and create a variable called titles and
    • 0:40:33set it equal to something called
    • 0:40:35set and a set is just a collection of
    • 0:40:37values it's kind of like a list
    • 0:40:39but it eliminates duplicates for me and
    • 0:40:41that would seem to be exactly the
    • 0:40:42characteristic that i want
    • 0:40:44for this program now instead of printing
    • 0:40:46each title which is now premature if i
    • 0:40:49want to first filter out duplicates
    • 0:40:51i'm going to go ahead and do this i'm
    • 0:40:52going to go ahead and add to the titles
    • 0:40:55set using the add function the current
    • 0:40:58row's
    • 0:40:59title so again i'm not printing it now
    • 0:41:01i'm instead
    • 0:41:02adding to the titles set that particular
    • 0:41:05title and if it's there already no big
    • 0:41:07deal the set
    • 0:41:08data structure in python is going to
    • 0:41:10throw away the duplicates for me and
    • 0:41:11it's only going to go ahead and keep the
    • 0:41:13uniques
    • 0:41:13now at the bottom of my file i need to
    • 0:41:16do a little more work admittedly
    • 0:41:17now i have to iterate over the set to
    • 0:41:19print out only those unique titles so
    • 0:41:21let me do this for title in
    • 0:41:23titles go ahead and print out title and
    • 0:41:27this is where python just gets really
    • 0:41:28user friendly right you don't have to do
    • 0:41:30int i gets 0 i less than n or whatever
    • 0:41:33you can just say for title in titles and
    • 0:41:36if the titles variable
    • 0:41:38is the type of data structure that you
    • 0:41:40can iterate over
    • 0:41:41which it will be if it's a list or if
    • 0:41:44it's a set
    • 0:41:45or even if it's a dictionary another
    • 0:41:46data structure we saw last week in
    • 0:41:48python
    • 0:41:49the for loop in python will just know
    • 0:41:50what to do this will loop over
    • 0:41:52all of the titles in the titles
    • 0:41:55set so let me go ahead and save this
    • 0:41:57file and go ahead now and run python of
    • 0:42:00favorites.py
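The set-based change just described might look roughly like this. The rows are invented for illustration and read from an in-memory string rather than the real export.

```python
import csv
import io

# Made-up rows containing a repeated title.
file = io.StringIO(
    "Timestamp,title,genres\n"
    "10/19/2020 13:08:00,The Office,Comedy\n"
    "10/19/2020 13:09:00,Breaking Bad,Drama\n"
    "10/19/2020 13:10:00,The Office,Comedy\n"
)

titles = set()
for row in csv.DictReader(file):
    titles.add(row["title"])  # adding an existing value changes nothing

# A set can be looped over directly, just like a list or a dictionary.
for title in titles:
    print(title)
```

A set has no defined order, so the titles may print in a different sequence from the file.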
    • 0:42:01and it looks like yeah the list is
    • 0:42:03different in some way
    • 0:42:05but i'm seeing fewer results as i scroll
    • 0:42:08up definitely fewer than before because
    • 0:42:10my scroll bar didn't jump nearly
    • 0:42:11as far down but honestly this is kind of
    • 0:42:13a mess let's go ahead and sort this
    • 0:42:15now in c it would have been kind of a
    • 0:42:17pain to sort things we'd have to whip
    • 0:42:18out the pseudocode probably for bubble
    • 0:42:20sort selection sort or god forbid merge
    • 0:42:22sort and then implement it ourselves
    • 0:42:23but no with python comes really the
    • 0:42:25proverbial kitchen sink of functions
    • 0:42:27so if you want to sort this set you know
    • 0:42:29what just say you want it sorted
    • 0:42:31there is a function in python called
    • 0:42:33sorted that will use one of those better
    • 0:42:36algorithms maybe it's merge sort maybe
    • 0:42:38it's something called quick sort maybe
    • 0:42:39it's something else altogether it's not
    • 0:42:41going to use a big o of n squared sort
    • 0:42:42someone at python probably has spent
    • 0:42:44the time implementing a better sort
    • 0:42:46for us but it will go ahead and sort the
    • 0:42:48set for me now let me go ahead and do
    • 0:42:50this again let me increase the size of
    • 0:42:51my terminal window
    • 0:42:52and rerun python of favorites.py okay
    • 0:42:56and now we have an interesting
    • 0:42:59assortment of shows that's easier for me
    • 0:43:01to wrap my mind around
    • 0:43:03because i have it now sorted here
    • 0:43:06and indeed if i scroll all the way up we
    • 0:43:08should see all of the shows beginning
    • 0:43:09with
    • 0:43:10numbers or a period which might have
    • 0:43:13just been someone playing around
    • 0:43:14followed by the a words the b words and
    • 0:43:16so forth so now it's a little easier to
    • 0:43:17wrap our minds around this
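As a small illustration of sorted applied to a set, here is a sketch with invented titles. In CPython the built-in sort is Timsort, a hybrid of merge sort and insertion sort, so it is not an O(n²) algorithm in the worst case.

```python
# Made-up titles, including entries that start with a period and a digit.
titles = {"The Office", "Breaking Bad", "Friends", "24", "."}

# sorted returns a new list and leaves the set itself unchanged.
ordered = sorted(titles)

for title in ordered:
    print(title)
```

Strings are compared character by character by code point, which is why the period and the digits come before the letters.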
    • 0:43:19but something's up i feel like a lot of
    • 0:43:22you like avatar the last airbender
    • 0:43:24and yet i'm seeing it indeed four
    • 0:43:27different times
    • 0:43:28but i thought we were filtering this
    • 0:43:30down to uniques
    • 0:43:31by using that set structure so what's
    • 0:43:34going on and in fact if i keep scrolling
    • 0:43:36i'm pretty sure i saw more
    • 0:43:37duplicates in here uh bojack horseman
    • 0:43:40breaking bad breaking bad
    • 0:43:42brooklyn nine-nine brooklyn nine-nine
    • 0:43:44cs50
    • 0:43:45and several different flavors um and
    • 0:43:49yes uh keeps going friends so i see a
    • 0:43:52lot of duplicate values so what's going
    • 0:43:54on
    • 0:43:55where did those come from
    • 0:43:58any thoughts here yeah uh kadana
    • 0:44:04yeah so your current sort is case
    • 0:44:07insensitive
    • 0:44:08this sorry is case sensitive meaning
    • 0:44:10that if someone
    • 0:44:11spells avatar with capital a's in some
    • 0:44:13places
    • 0:44:14then it's going to be a different result
    • 0:44:16each time yeah exactly some of you
    • 0:44:18weren't
    • 0:44:18quite diligent when it came to
    • 0:44:20capitalization and so in fact the
    • 0:44:22reality is as kadana notes that there's
    • 0:44:24differences in capitalization now we've
    • 0:44:26addressed this before in fact when you
    • 0:44:27implemented your own spell checker
    • 0:44:29you had to deal with this already when
    • 0:44:30you were spell checking an arbitrary
    • 0:44:32text
    • 0:44:32some words might be capitalized some
    • 0:44:34might be all lowercase all uppercase and
    • 0:44:36you wanted to tolerate
    • 0:44:37different casings and so we probably
    • 0:44:40solved this by just forcing everything
    • 0:44:41to uppercase or everything to lowercase
    • 0:44:43and doing things therefore case
    • 0:44:45insensitively so give me just a moment
    • 0:44:47here and i'm going to go ahead and make
    • 0:44:48a quick
    • 0:44:49change to my form here and i'm going to
    • 0:44:52go ahead and
    • 0:44:55i'm going to go ahead and change this in
    • 0:44:57such a way
    • 0:45:00so that we have instead
    • 0:45:04let me shrink back my code let's go
    • 0:45:06ahead and change this in such a way that
    • 0:45:08we actually force everything to
    • 0:45:10uppercase or lowercase doesn't really
    • 0:45:11matter which but we need to canonicalize
    • 0:45:13things so to speak in some way
    • 0:45:15and to canonicalize things just means to
    • 0:45:17format
    • 0:45:18all of your data in some standard way so
    • 0:45:20to kadana's point
    • 0:45:21let's just standardize the
    • 0:45:22capitalization of things maybe all
    • 0:45:24uppercase all
    • 0:45:25lowercase we just need to make a
    • 0:45:26judgment call so i'm going to go ahead
    • 0:45:28and make a few tweaks here
    • 0:45:29i'm still going to use a set i'm still
    • 0:45:31going to read the csv
    • 0:45:32as before but instead of just adding the
    • 0:45:34title
    • 0:45:36with row bracket title i'm going to go
    • 0:45:38ahead and force it to
    • 0:45:39uppercase just arbitrarily just for the
    • 0:45:41sake of uniformity
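
A rough sketch of the change just described, under some assumptions: the survey file itself is not shown in this section, so an in-memory CSV with a `title` column stands in for it, and the sample rows are invented. Only the `.upper()` call on `row["title"]` comes from the description above.

```python
import csv
import io

# Hypothetical stand-in for the survey CSV; the real file is not shown here.
data = io.StringIO("title\nAvatar\navatar\nAVATAR\nThe Office\n")

titles = set()
reader = csv.DictReader(data)
for row in reader:
    # Force every title to uppercase so casing differences collapse.
    titles.add(row["title"].upper())

print(sorted(titles))
```

The three spellings of the same show become one entry, since the set only ever sees the uppercase form.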
    • 0:45:43and i'm also going to be and then let's
    • 0:45:46go ahead and check
    • 0:45:47what exactly has happened here i'm not
    • 0:45:48going to change anything else but let me
    • 0:45:50go ahead and increase the size of my
    • 0:45:51terminal window
    • 0:45:52rerun python of favorites.py and voila
    • 0:45:56it's a little harder to read just
    • 0:45:58because i'm not used to reading all caps
    • 0:45:59kind of looks like we're yelling at
    • 0:46:00ourselves but
    • 0:46:02i don't see wait a minute i still see
    • 0:46:05the office
    • 0:46:06over here twice if i keep scrolling
    • 0:46:10here so far i see strangers things and
    • 0:46:12strange
    • 0:46:13stranger things that just looks like a
    • 0:46:15typo i see two sherlocks though
    • 0:46:18this is a little suspicious so kadana
    • 0:46:21you and i don't seem to have solved
    • 0:46:22things fully
    • 0:46:24and this one's a little more subtle what
    • 0:46:27more should i perhaps do to my data
    • 0:46:30to ensure we get duplicates removed
    • 0:46:36olivia
    • 0:46:41maybe trim around the edges and we'll
    • 0:46:43trim around the edges i like the sound
    • 0:46:45of that but
    • 0:46:45what do you mean what does that do oh
    • 0:46:47like trim off the extra spaces in case
    • 0:46:49someone put a space before or after the
    • 0:46:51words
    • 0:46:51yeah exactly it's pretty common for
    • 0:46:54humans intentionally or accidentally to
    • 0:46:55hit the space bar where they shouldn't
    • 0:46:57and in fact i'm kind of inferring that i
    • 0:46:59bet one or more of you
    • 0:47:00accidentally typed sherlock space and
    • 0:47:03then decided nope that's it i'm not
    • 0:47:04typing anything else
    • 0:47:05but that space even though we can't
    • 0:47:07quite see it obviously is there
    • 0:47:09and when we do a string comparison or
    • 0:47:10when the set data structure does that
    • 0:47:12it's actually going to be noticed when
    • 0:47:15doing those comparisons and therefore
    • 0:47:17they're not going to be
    • 0:47:18the same so i can do this in a few
    • 0:47:19different ways but it turns out in
    • 0:47:21python
    • 0:47:21you can chain functions together which
    • 0:47:23is also kind of a fancy feature
    • 0:47:26notice what i'm doing here i'm still
    • 0:47:27accessing the titles
    • 0:47:29set i'm adding the following value to it
    • 0:47:32i'm adding the value row bracket title
    • 0:47:34but not quite
    • 0:47:35that is a string or an str in python
    • 0:47:38speak
    • 0:47:39i'm going to go ahead and strip it which
    • 0:47:41means if we look up the documentation
    • 0:47:43for this function to olivia's point it's
    • 0:47:45going to strip off or trim
    • 0:47:46all of the white space to the left all
    • 0:47:48of the white space to the right
    • 0:47:49whether that's the space bar or the
    • 0:47:51enter key or the tab
    • 0:47:53character or a few other things as well
    • 0:47:55it's just going to get rid of leading
    • 0:47:57and trailing white space and then
    • 0:47:59whatever's left over
    • 0:48:00i'm going to go ahead and force
    • 0:48:01everything to uppercase in the spirit of
    • 0:48:04kadana's suggestion too so we're sort of
    • 0:48:05combining two good ideas now
    • 0:48:07to really massage the data if you will
    • 0:48:09into a cleaner format and this is such a
    • 0:48:12real world reality
    • 0:48:13like humans you and i cannot be trusted
    • 0:48:15to input data
    • 0:48:16the way we are supposed to sometimes
    • 0:48:18it's all lowercase because we're being a
    • 0:48:20little lazy or a little social
    • 0:48:21media-like even if we're checking out
    • 0:48:23from amazon
    • 0:48:24and trying to input a valid postal
    • 0:48:26address sometimes it's all capitals
    • 0:48:28because
    • 0:48:29i can think of a few people in my life
    • 0:48:30who don't quite understand the caps lock
    • 0:48:32thing just yet and so things might be
    • 0:48:34all capitalized instead
    • 0:48:35this is not good for computer systems
    • 0:48:37that require precision to our emphasis
    • 0:48:40in week zero
    • 0:48:41and so massaging data means cleaning it
    • 0:48:43up doing some mutations
    • 0:48:45that don't really change the meaning of
    • 0:48:46the data but canonicalize it standardize
    • 0:48:49it
    • 0:48:49so that you're comparing apples and
    • 0:48:51apples so to speak not apples and
    • 0:48:53oranges
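
The chained cleanup described above, strip first and then uppercase, might look like this. It is only a sketch: the list `raw` and its messy values are invented for illustration, while `str.strip` and `str.upper` are the standard string methods being chained.

```python
# Hypothetical raw inputs with stray spaces, a tab, and a newline.
raw = ["Sherlock", "Sherlock ", " the office\t", "The Office\n"]

titles = set()
for value in raw:
    # strip() removes leading and trailing whitespace and returns a new
    # string; upper() is then called on that result.
    titles.add(value.strip().upper())

print(sorted(titles))
```

Each method returns a new string, which is what makes the left-to-right chaining possible. Typos such as a misspelled title would still survive this kind of cleaning.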
    • 0:48:53well let me go ahead and run this again
    • 0:48:55in my bigger terminal window python of
    • 0:48:57favorites.py
    • 0:48:58voila and scrolling up up up i think
    • 0:49:02we're in a better place i only see one
    • 0:49:05office now
    • 0:49:06and if i keep scrolling up and up and up
    • 0:49:08i'm seeing typos still
    • 0:49:10but nothing related to white space and i
    • 0:49:12think we have a much
    • 0:49:13cleaner unique list of titles at this
    • 0:49:16point of course if we scroll up
    • 0:49:18i would have to be a lot more clever if
    • 0:49:21i want to detect things like
    • 0:49:22typographical errors it looks like one
    • 0:49:24of you
    • 0:49:24was very diligent about putting f period
    • 0:49:27r period i
    • 0:49:28period and so forth but then got bored
    • 0:49:30at the end and left off the last period
    • 0:49:31but that's going to happen when you're
    • 0:49:32taking in user input we've of course got
    • 0:49:34all these variants of cs50
    • 0:49:36that's going to be a mess to clean up
    • 0:49:38because now you can imagine having to
    • 0:49:40add a whole bunch of if conditions and
    • 0:49:42elses and elifs to sort of clean all
    • 0:49:44of that up if we do want to canonicalize
    • 0:49:47all different flavors of cs50 as
    • 0:49:48quote-unquote
    • 0:49:50cs50 so this is a very slippery slope
    • 0:49:52like you and i could start writing a
    • 0:49:53huge amount of data just to clean this
    • 0:49:55up
    • 0:49:55but that's the reality when dealing with
    • 0:49:58real world data
    • 0:49:59well let's go ahead now and improve this
    • 0:50:02program further
    • 0:50:03do something a little fancier because i
    • 0:50:06now can trust that my data has been
    • 0:50:08canonicalized except for the actual
    • 0:50:09typos or the weird variants of cs50 and
    • 0:50:11the like
    • 0:50:12let's go ahead and figure out what's the
    • 0:50:14most popular favorite tv show
    • 0:50:17among uh the audience here so i'm going
    • 0:50:19to start where i have before with my
    • 0:50:21current code because i think i have most
    • 0:50:23of the building blocks in place
    • 0:50:25i'm going to go ahead and clean up my
    • 0:50:26code a little bit in here i'm going to
    • 0:50:27go ahead and give myself a separate
    • 0:50:28variable now called title
    • 0:50:30just so that i can think about things in
    • 0:50:32a little more orderly fashion
    • 0:50:34but i'm not going to start adding things
    • 0:50:36to this set anymore
    • 0:50:37in fact a set i don't think is really
    • 0:50:40going to be sufficient
    • 0:50:42to keep track of the popularity of tv
    • 0:50:44shows because by definition the set is
    • 0:50:46throwing away duplicates
    • 0:50:47but the goal now is kind of the opposite
    • 0:50:49i want to know
    • 0:50:50which are the duplicates so that i can
    • 0:50:53tell you that this many people like the
    • 0:50:55office this many people like
    • 0:50:56uh breaking bad and the like so what
    • 0:50:59tools do we have in python's toolkit
    • 0:51:02via which we could accumulate or
    • 0:51:05figure out that information
    • 0:51:08any thoughts on what data structure
    • 0:51:10might help us here
    • 0:51:11if we want to figure out show
    • 0:51:14popularity show popularity and by
    • 0:51:16popularity i just mean the frequency of
    • 0:51:18it
    • 0:51:19in the csv file santiago
    • 0:51:24um i guess um one option could be to use
    • 0:51:27dictionaries so that you could have like
    • 0:51:29the office
    • 0:51:31i don't know 20 votes and then game of
    • 0:51:33thrones another one so that
    • 0:51:35a dictionary could really help you
    • 0:51:37visualize that
    • 0:51:38yeah perfect instincts recall that a
    • 0:51:40dictionary at the end of the day
    • 0:51:42no matter how sophisticated it's
    • 0:51:44implemented underneath the hood like
    • 0:51:45your spell checker
    • 0:51:46it's just a collection of key value
    • 0:51:48pairs and indeed it's
    • 0:51:50maybe one of the most useful data
    • 0:51:52structures in any language because this
    • 0:51:54ability to associate one piece of data
    • 0:51:56with another
    • 0:51:56is just a very general purpose solution
    • 0:51:59to problems and indeed to santiago's
    • 0:52:01point
    • 0:52:01if the problem at hand is to figure out
    • 0:52:03the popularity of shows well let's make
    • 0:52:05the keys the titles of our shows
    • 0:52:08and the frequencies thereof the votes so
    • 0:52:10to speak
    • 0:52:11the values of those keys we're going to
    • 0:52:13map title
    • 0:52:14to votes title to vote title to vote and
    • 0:52:17so forth so a dictionary is exactly that
    • 0:52:19so
    • 0:52:19let me go ahead and scroll up and i can
    • 0:52:20make a little tweak here instead of a
    • 0:52:22set
    • 0:52:22i can instead say dict and give myself
    • 0:52:25just an empty dictionary
    • 0:52:26there's actually shorthand notation for
    • 0:52:28that that's a little more common to use
    • 0:52:30two
    • 0:52:30empty curly braces that just means the
    • 0:52:32exact same thing
    • 0:52:34give me a dictionary that's initially
    • 0:52:35empty there's no fancy shortcut for a
    • 0:52:38set you have to literally type out set
    • 0:52:40open paren close paren but dictionaries
    • 0:52:42are so common so popular so powerful
    • 0:52:45they have this little syntactic shortcut
    • 0:52:47of just two
    • 0:52:48curly braces open and close so now that
    • 0:52:51i have that
    • 0:52:52let me go ahead and do this inside of my
    • 0:52:54for loop
    • 0:52:55instead of printing the title which i
    • 0:52:57don't want to do and instead of adding
    • 0:52:59it to the set
    • 0:53:00i now want to add it to the dictionary
    • 0:53:01so how do i do that well if my
    • 0:53:03dictionary is called titles
    • 0:53:05i think i can essentially do something
    • 0:53:06like this titles bracket
    • 0:53:08title uh equals or maybe plus
    • 0:53:12equals one maybe i can kind of use the
    • 0:53:15dictionary
    • 0:53:16as just a little cheat sheet of counts
    • 0:53:19numbers
    • 0:53:19that start at zero and then just add one
    • 0:53:22add two add three so every time i see
    • 0:53:24the office the office the office do plus
    • 0:53:27equals one
    • 0:53:28plus equals one we can't do plus plus
    • 0:53:30because that's not a thing in python it
    • 0:53:31only exists in c
    • 0:53:32but this would seem to go into the
    • 0:53:34dictionary called titles
    • 0:53:36look up the key that matches this
    • 0:53:39specific title
    • 0:53:40and then increment whatever value is
    • 0:53:42there
    • 0:53:43by one but i'm going to go ahead and run
    • 0:53:46this a little naively here
    • 0:53:48let me go ahead and run python of
    • 0:53:50favorites.py
    • 0:53:51and wow it broke already on line nine
    • 0:53:54so it's sort of an apt uh choice of show
    • 0:53:57to begin with
    • 0:53:58uh we have a key error with punisher so
    • 0:54:00punisher is bad something bad has just
    • 0:54:03happened but what does that mean
    • 0:54:04a key error is referring to the fact
    • 0:54:06that i tried to access
    • 0:54:08an invalid key in a dictionary this is
    • 0:54:10saying that literally in this line of
    • 0:54:12code here
    • 0:54:13even though titles is a dictionary and
    • 0:54:15even though the value of title
    • 0:54:17singular is quote unquote punisher i'm
    • 0:54:19getting a key error
    • 0:54:20because that title does not yet exist
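
The failure described above can be reproduced in isolation. This is a small sketch rather than the lecture's file: the `try`/`except` wrapper and the name `caught` are additions so the example runs to completion instead of stopping at the error.

```python
titles = {}  # empty dictionary, equivalent to dict()
title = "PUNISHER"

caught = None
try:
    # += must first read titles[title], and that key does not exist yet.
    titles[title] += 1
except KeyError as error:
    caught = error
    print("KeyError:", error)
```

The read half of `+=` fails before anything is written, so the dictionary is still empty afterwards.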
    • 0:54:24so even if you're not sure of the python
    • 0:54:25syntax for fixing this problem
    • 0:54:28what's the intuitive solution here
    • 0:54:32i cannot increment the frequency of the
    • 0:54:35punisher
    • 0:54:36because punisher is not in the
    • 0:54:38dictionary it almost feels like a
    • 0:54:40catch
    • 0:54:4022 uh greg
    • 0:54:44i think that you need first of all to
    • 0:54:47create a for loop and maybe assign a
    • 0:54:49value to every
    • 0:54:51thing in the dictionary for example the
    • 0:54:53value zero and then add
    • 0:54:55one yeah so good instincts and here i
    • 0:54:58can use another metaphor i worry we
    • 0:54:59might have a chicken and the egg problem
    • 0:55:01there because i don't think i can go to
    • 0:55:02the top of my code
    • 0:55:03add a loop that initializes all of the
    • 0:55:07values in the dictionary to zero because
    • 0:55:09i would need to know
    • 0:55:10all of the names of the shows at that
    • 0:55:13point now that's fine
    • 0:55:14i think i could take you maybe more
    • 0:55:16literally greg
    • 0:55:17and open up the csv file iterate over it
    • 0:55:21top to bottom and anytime i see a title
    • 0:55:24just initialize it in the dictionary as
    • 0:55:26having a value of zero
    • 0:55:27zero zero then have another for loop
    • 0:55:30maybe reopen the file
    • 0:55:31and do the same and that would work but
    • 0:55:33it's arguably not very
    • 0:55:35efficient it is asymptotically in terms
    • 0:55:37of big o but that would seem to be doing
    • 0:55:39twice as much work
    • 0:55:40iterate over the file once just to
    • 0:55:42initialize everything to zero
    • 0:55:44then iterate over the file a second time
    • 0:55:46just to increment the counts
    • 0:55:47i think we can do things a little more
    • 0:55:49efficiently
    • 0:55:50i think we can achieve not only
    • 0:55:51correctness but better design any
    • 0:55:53thoughts
    • 0:55:54on how we can still solve this problem
    • 0:55:57without having to iterate over the whole
    • 0:55:59thing twice
    • 0:56:03yeah some of it
    • 0:56:07um i think we can add in an if statement
    • 0:56:10to check if that
    • 0:56:11key is in the dictionary and if it's not
    • 0:56:13then add it and then go ahead and
    • 0:56:15increment the value after
    • 0:56:17nice and we can do exactly that so let's
    • 0:56:19just apply that intuition if the problem
    • 0:56:21is that i'm trying to access a key
    • 0:56:24that does not yet exist well let's just
    • 0:56:26be a little smarter about it and to
    • 0:56:27somehow its point
    • 0:56:28let's check whether the key exists and
    • 0:56:31if it does
    • 0:56:31then increment it but if it does not
    • 0:56:34then and only then to greg's advice
    • 0:56:36initialize it to zero so let me do that
    • 0:56:38let me go ahead and say
    • 0:56:39if title in titles which is the very
    • 0:56:42pythonic
    • 0:56:43beautiful way of asking a question like
    • 0:56:45that way cleaner than in c
    • 0:56:47let me go ahead then and say uh
    • 0:56:50exactly the line from before else though
    • 0:56:53if the
    • 0:56:53that title is not yet in the dictionary
    • 0:56:56called titles
    • 0:56:57well that's okay too i can go ahead and
    • 0:56:59say titles
    • 0:57:00bracket title equals zero
    • 0:57:04so the difference here is that i can
    • 0:57:06certainly inc i can certainly
    • 0:57:08index into a dictionary using a key
    • 0:57:11that doesn't exist if i plan at that
    • 0:57:14moment to give it a value
    • 0:57:15that's okay and that has always been
    • 0:57:17okay since last week
    • 0:57:19but however if i want to go ahead and
    • 0:57:22increment the value that's there i'm
    • 0:57:24going to go ahead and
    • 0:57:26do that in this separate line but i did
    • 0:57:29introduce a bug
    • 0:57:30i did introduce a bug here i think i
    • 0:57:33need to go one step further logically
    • 0:57:36i don't think i want to initialize this
    • 0:57:37to zero
    • 0:57:39per se does anyone see a subtle bug in
    • 0:57:43my logic here
    • 0:57:46if the title is already in the
    • 0:57:47dictionary i'm incrementing it by one
    • 0:57:49otherwise i'm initializing it to zero
    • 0:57:53any subtle catches here yeah olivia what
    • 0:57:56do you see
    • 0:58:03i think you should initialize it to one
    • 0:58:04since it's the first instance
    • 0:58:07exactly i should initialize it to one
    • 0:58:08otherwise i'm accidentally overlooking
    • 0:58:10this particular title and i'm going to
    • 0:58:12go ahead and undercount it so i can fix
    • 0:58:14this either by doing this
    • 0:58:16or frankly if you prefer i don't
    • 0:58:18technically need to use an if
    • 0:58:19else i can use just an if by doing
    • 0:58:21something like this instead i could say
    • 0:58:23if
    • 0:58:23title not in titles then i could go
    • 0:58:26ahead and say
    • 0:58:27titles bracket title gets zero and then
    • 0:58:30after that i can
    • 0:58:31blindly so to speak just do this so
    • 0:58:34which one is better i think this second
    • 0:58:36one is maybe a little better in that
    • 0:58:37i'm saving one line of code
    • 0:58:39but it's ensuring with that if condition
    • 0:58:41to someone's advice
    • 0:58:42that i'm not indexing into the titles
    • 0:58:45dictionary
    • 0:58:46until i'm sure that the title is in
    • 0:58:48there so let me go ahead and run this
    • 0:58:50now
    • 0:58:50python of favorites.py enter
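The two counting approaches described above can be sketched as follows. This is a minimal sketch rather than the lecture's actual file: the list `rows` is made-up sample data standing in for the titles read from the CSV, and the function names `count_if_else` and `count_if_only` are hypothetical.

```python
# Made-up sample titles standing in for the CSV rows (an assumption).
rows = ["the office", "friends", "the office", "game of thrones", "friends", "the office"]


def count_if_else(rows):
    """Count titles with if/else: increment known keys, start new keys at 1."""
    titles = {}
    for title in rows:
        if title in titles:
            titles[title] += 1
        else:
            # Start at 1, not 0, so the first occurrence is not undercounted.
            titles[title] = 1
    return titles


def count_if_only(rows):
    """Count titles with a single if: ensure the key exists, then always increment."""
    titles = {}
    for title in rows:
        if title not in titles:
            titles[title] = 0
        # Safe to index here because the key is guaranteed to exist by now.
        titles[title] += 1
    return titles


print(count_if_else(rows))
print(count_if_only(rows))
```

Both versions should produce the same counts; the second simply moves the increment out of the conditional.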
    • 0:58:53and okay it didn't crash so that's good
    • 0:58:56but i'm not yet seeing any useful
    • 0:58:57information
    • 0:58:58but i now have access to a bit more let
    • 0:59:00me scroll down now to the bottom of this
    • 0:59:02program
    • 0:59:03where i have now this loop let me go
    • 0:59:05ahead and print out not just the title
    • 0:59:07but the value of that key in the
    • 0:59:09dictionary
    • 0:59:10by just indexing into it here and you
    • 0:59:12might not have seen the syntax before
    • 0:59:14but with print
    • 0:59:14you can actually pass in multiple
    • 0:59:16arguments and by default print will just
    • 0:59:17separate them with a space for you
    • 0:59:19you can override that behavior and
    • 0:59:20separate them with anything but this is
    • 0:59:22just meant to be a quick and dirty
    • 0:59:23program that prints out titles and now
    • 0:59:25the
    • 0:59:25popularity thereof so let me run this
    • 0:59:27again python favorites.py
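As a small illustration of passing multiple arguments to print, here is a sketch using made-up counts (not the lecture's data). The `sep` keyword argument is the standard way to override the default single-space separator; the `" | "` separator is just an arbitrary choice for the example.

```python
# Made-up counts for illustration only.
titles = {"the office": 3, "friends": 2}

for title in titles:
    # With multiple arguments, print separates them with one space by default.
    print(title, titles[title])

for title in titles:
    # The sep keyword argument replaces that default separator.
    print(title, titles[title], sep=" | ")
```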
    • 0:59:29and voila it's kind of
    • 0:59:32all over the place office super popular
    • 0:59:35with 26 votes there a lot of
    • 0:59:37single votes here a lot of big bang
    • 0:59:40theory has nine
    • 0:59:41you know this is all nice and good but i
    • 0:59:43feel like this is going to take me
    • 0:59:44forever to wrap my mind around which are
    • 0:59:46the most
    • 0:59:46popular shows so of course how would we
    • 0:59:49do this well to the point made earlier
    • 0:59:50with spreadsheets my god in microsoft
    • 0:59:53excel or google spreadsheets or apple
    • 0:59:54numbers you just
    • 0:59:55click the column heading and boom sort
    • 0:59:57it we seem to have lost that capability
    • 0:59:59unless we now do it in code so
    • 1:00:00let me do that for us let me go ahead
    • 1:00:02and go back to my code
    • 1:00:04and it looks like sorted even though
    • 1:00:08it does work on dictionaries is actually
    • 1:00:11sorting by
    • 1:00:12key not by value and here's where our
    • 1:00:15python programming techniques need to
    • 1:00:17get a little more sophisticated and we
    • 1:00:18wanted to introduce another feature here
    • 1:00:20now of
    • 1:00:20python which is going to solve this
    • 1:00:22problem specifically but in a pretty
    • 1:00:24general way
    • 1:00:25so if we read the documentation for
    • 1:00:27sorted uh the sorted
    • 1:00:29function indeed sorts sets by the values
    • 1:00:32therein
    • 1:00:33it sorts lists by the values therein
    • 1:00:35it sorts dictionaries
    • 1:00:37by the keys therein because
    • 1:00:39dictionaries have two pieces of
    • 1:00:41information for every
    • 1:00:42element it has a key and a value not
    • 1:00:44just a value
    • 1:00:45so by default sorted sorts by key so we
    • 1:00:47somehow have to override that behavior
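To make the default behavior concrete, here is a minimal sketch with made-up values showing what sorted returns for a list, a set, and a dictionary. The variable names are chosen only for this illustration.

```python
# Made-up values for illustration only.
numbers = sorted([3, 1, 2])        # a list is sorted by its values
letters = sorted({"b", "c", "a"})  # a set is sorted by its values

counts = {"friends": 2, "the office": 3, "game of thrones": 1}
keys = sorted(counts)              # a dict is sorted by its keys; the counts are ignored

print(numbers)  # [1, 2, 3]
print(letters)  # ['a', 'b', 'c']
print(keys)     # ['friends', 'game of thrones', 'the office']
```

Note that sorting the dictionary yields a list of its keys in alphabetical order, regardless of the values attached to them.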
    • 1:00:50so how can we do this well it turns out
    • 1:00:52that the sorted function
    • 1:00:53takes another optional argument
    • 1:00:56literally called
    • 1:00:57key and the key argument
    • 1:01:00takes as its value the name of a
    • 1:01:02function
    • 1:01:03and this is where things get really
    • 1:01:05interesting if not confusing really
    • 1:01:06quickly
    • 1:01:07it turns out in python you can pass
    • 1:01:10around
    • 1:01:10functions as arguments by way of their
    • 1:01:13name
    • 1:01:13and technically you can do this in c
    • 1:01:15it's a lot more syntactically
    • 1:01:17involved but in python it's very common
    • 1:01:19in javascript it's very common in a lot
    • 1:01:21of languages it's very common
    • 1:01:23to think of functions as first class
    • 1:01:24objects which is a fancy way of saying
    • 1:01:26you can pass them around just like they
    • 1:01:28are variables themselves
    • 1:01:30we're not calling them yet but you can
    • 1:01:32pass them around by their name
    • 1:01:33so what do i mean by this well i need a
    • 1:01:36function now
    • 1:01:37to sort my dictionary by its value
    • 1:01:41and only i know how to do this and
    • 1:01:43perhaps so let me go ahead and give
    • 1:01:44myself a generic function name just for
    • 1:01:46the moment called f f
    • 1:01:47for function kind of like in math
    • 1:01:49because we're going to get rid of it
    • 1:01:50eventually but let me go ahead and
    • 1:01:51temporarily
    • 1:01:52define a function called f that takes as
    • 1:01:54input a title
    • 1:01:56and then it returns for me the value
    • 1:01:59corresponding to that key so i'm going
    • 1:02:01to go ahead and return
    • 1:02:02titles bracket title so here we have a
    • 1:02:06function
    • 1:02:07whose purpose in life is super simple
    • 1:02:09you give it a title
    • 1:02:10it gives you the count thereof the
    • 1:02:13frequency the popularity thereof by just
    • 1:02:15looking it up
    • 1:02:16in that global dictionary so it's super
    • 1:02:18simple
    • 1:02:19but that's its only purpose in life but
    • 1:02:22now
    • 1:02:22according to the documentation for
    • 1:02:24sorted what it's now going to do because
    • 1:02:27i'm passing in a second argument called
    • 1:02:28key
    • 1:02:29the sorted function rather than just
    • 1:02:32presume you want everything sorted
    • 1:02:34alphabetically by
    • 1:02:35key it's instead going to call
    • 1:02:38that function f on every one
    • 1:02:41of the elements in your dictionary and
    • 1:02:44depending on your
    • 1:02:46answer the return value you give
    • 1:02:49with that f function that will be used
    • 1:02:52instead
    • 1:02:52to determine the actual
    • 1:02:55ordering
    • 1:02:56so by default sorted just looks at key
    • 1:02:58what i'm effectively doing with this
    • 1:03:00f function is instead returning the
    • 1:03:03value
    • 1:03:04corresponding to every key and so the
    • 1:03:06logical implication of this even though
    • 1:03:08the syntax is a little
    • 1:03:09new is that this dictionary of titles
    • 1:03:13will now be sorted by value instead of
    • 1:03:16by key
    • 1:03:17because again by default it sorts by key
    • 1:03:19but if i define my own key function
    • 1:03:22and override that behavior to return the
    • 1:03:24corresponding value
    • 1:03:25it's the values the numbers the counts
    • 1:03:28that will actually be used
    • 1:03:29to sort this thing all right let's go
    • 1:03:31ahead and see if that's true in practice
    • 1:03:33let me go ahead and rerun python
    • 1:03:34favorites.py
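The key-function idea described above might look like the following sketch. The dictionary contents are made up for illustration; the function name f and the expression it returns follow the description in the lecture, and `ordered` is a name introduced only for this example.

```python
# Made-up counts for illustration only.
titles = {"friends": 2, "the office": 3, "game of thrones": 1}


def f(title):
    # Given a title (a key), return its count (the value) from the global dictionary.
    return titles[title]


# Pass f by name; sorted calls it once per key and orders the keys by the results.
ordered = sorted(titles, key=f)

for title in ordered:
    print(title, titles[title])
```

With these sample counts the smallest count comes first, so the most popular title ends up at the bottom.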
    • 1:03:35i should see all the titles and voila
    • 1:03:37conveniently the most popular show
    • 1:03:39seems to be game of thrones with 33
    • 1:03:41votes followed by friends
    • 1:03:43with 27 followed by the office with 26
    • 1:03:45and so forth
    • 1:03:46but of course the list is kind of
    • 1:03:48backwards i mean it's convenient that i
    • 1:03:50can see it at the bottom of my screen
    • 1:03:51but really if we're making a list it
    • 1:03:53should really be at the top so how can
    • 1:03:54we override that behavior
    • 1:03:56turns out the sorted function if you
    • 1:03:57read its documentation also takes
    • 1:03:59another
    • 1:04:00optional parameter called reverse and if
    • 1:04:03you set
    • 1:04:03reverse equal to true capital t in
    • 1:04:06python
    • 1:04:06that's going to go ahead and give us now
    • 1:04:10the reverse order of that same sort so
    • 1:04:12let me go ahead and maximize my terminal
    • 1:04:14window
    • 1:04:14rerun it again and voila if i scroll
    • 1:04:17back up to the top it's not
    • 1:04:18alphabetically sorted but if i keep
    • 1:04:20going keep going keep going keep going
    • 1:04:22the numbers are getting bigger and voila
    • 1:04:23now game of thrones with 33
    • 1:04:25is all the way at the top
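A sketch of the same sort with the reverse parameter, again using made-up counts rather than the lecture's data. The name `ordered` is introduced only for this example.

```python
# Made-up counts for illustration only.
titles = {"friends": 2, "the office": 3, "game of thrones": 1}


def f(title):
    # Look up the count for a given title.
    return titles[title]


# reverse=True (capital T) flips the order, so the largest count comes first.
ordered = sorted(titles, key=f, reverse=True)

for title in ordered:
    print(title, titles[title])
```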
    • 1:04:28all right so pretty cool and again the
    • 1:04:31new functionality here in python at
    • 1:04:33least
    • 1:04:33is that we can actually pass in
    • 1:04:35functions to functions
    • 1:04:37and leave it to the latter to call the
    • 1:04:40former
    • 1:04:41so that was complicated just to say but
    • 1:04:43any questions
    • 1:04:44or confusion now on how we are using
    • 1:04:47dictionaries
    • 1:04:48and how we are sorting things in this
    • 1:04:51reverse
    • 1:04:52value-based way
    • 1:04:56any questions or confusion anything in
    • 1:04:57the chat or verbally brian
    • 1:05:00uh looks like all questions are answered
    • 1:05:02here okay
    • 1:05:04then in that case let me point out a
    • 1:05:05common mistake notice that even though
    • 1:05:08f is a function notice that i did not
    • 1:05:10call it
    • 1:05:11there that would be incorrect the reason
    • 1:05:14being
    • 1:05:14we deliberately want to pass the
    • 1:05:16function f into
    • 1:05:19the sorted function so that the sorted
    • 1:05:21function can take it upon itself
    • 1:05:23to call f again and again and again we
    • 1:05:26don't want to just call it once by using
    • 1:05:27the parentheses ourselves
    • 1:05:29we want to just pass it in by name so
    • 1:05:31that the sorted function which comes
    • 1:05:33with python
    • 1:05:34can instead do it for us santiago did
    • 1:05:37you have a question
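The common mistake just described can be illustrated with a short sketch. The data is made up, and calling f with the literal title "friends" is only an assumption chosen to show what happens when the parentheses are added; the names `by_count` and `mistake` are hypothetical.

```python
# Made-up counts for illustration only.
titles = {"friends": 2, "the office": 3}


def f(title):
    # Look up the count for a given title.
    return titles[title]


# Correct: pass the function itself, so sorted can call it once per key.
by_count = sorted(titles, key=f)
print(by_count)

# Mistake: calling f ourselves hands sorted an int (the return value), not a function.
mistake = None
try:
    sorted(titles, key=f("friends"))
except TypeError as error:
    mistake = error
    print("TypeError:", error)
```

In the second call, sorted tries to call the integer it was given as if it were a function, which is why a TypeError is raised.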
    • 1:05:40yes i was i was going to ask why didn't
    • 1:05:42we
    • 1:05:43put f of title uh so like why wouldn't
    • 1:05:48i was gonna ask that question
    • 1:05:49specifically oh the with the
    • 1:05:51with the parentheses oh okay perfect so
    • 1:05:54uh because that would call the function
    • 1:05:56once and only once we want
    • 1:05:58sorted to be able to call it again and
    • 1:05:59again now here's actually an example as
    • 1:06:01we've seen in the past
    • 1:06:02of a correct solution this is behaving
    • 1:06:04as i intend a list of sorted titles
    • 1:06:07from top to bottom in order of
    • 1:06:09popularity
    • 1:06:10but it's a little poorly designed
    • 1:06:12because i'm defining this function f
    • 1:06:14whose name in the first place is kind of
    • 1:06:16lame
    • 1:06:16but i'm defining a function only to use
    • 1:06:18it in one place and my god the
    • 1:06:20function's so tiny it just feels like a
    • 1:06:22waste
    • 1:06:23of keystrokes to have defined a new
    • 1:06:25function just to then pass it in
    • 1:06:27so it turns out in python if you have a
    • 1:06:29very short function
    • 1:06:31whose purpose in life is meant to be to
    • 1:06:33solve a local problem just once and
    • 1:06:35that's it
    • 1:06:36and it's short enough that you're pretty
    • 1:06:38sure you can fit it on one line of code
    • 1:06:40without things wrapping and starting to
    • 1:06:41get ugly stylistically
    • 1:06:43it turns out you can actually do this
    • 1:06:45instead
    • 1:06:46you can copy the code that you had in
    • 1:06:48mind like this
    • 1:06:50and instead of actually defining f as a
    • 1:06:53function name
    • 1:06:54you can actually use a special keyword
    • 1:06:55in python called lambda
    • 1:06:57you can specify the name of an argument
    • 1:06:59for your function as before
    • 1:07:01and then you can simply specify the
    • 1:07:03return value
    • 1:07:04thereafter deleting the function itself
    • 1:07:08so to be clear key is still an argument
    • 1:07:11to the sorted function
    • 1:07:13it expects as its value typically the
    • 1:07:16name of a function
    • 1:07:17but if you've decided that this seems
    • 1:07:19like a waste of extra a waste of effort
    • 1:07:20to define a function then pass the
    • 1:07:22function in especially when it's so
    • 1:07:24short
    • 1:07:24you can do it in a one-liner a lambda
    • 1:07:26function is an
    • 1:07:27anonymous function lambda literally says
    • 1:07:30python
    • 1:07:31give me a function i don't care about
    • 1:07:33its name
    • 1:07:34therefore you don't have to choose a
    • 1:07:35name for it but it does care
    • 1:07:37still about its arguments and its return
    • 1:07:40value
    • 1:07:41so it's still up to you to provide zero
    • 1:07:43or more arguments
    • 1:07:44and a return value and notice i've done
    • 1:07:47that i've specified the keyword lambda
    • 1:07:49followed by the name of the argument i
    • 1:07:51want this anonymous
    • 1:07:52nameless function to accept and then i'm
    • 1:07:55specifying the return value
    • 1:07:57and with lambda functions you do not
    • 1:07:59need to specify return
    • 1:08:01whatever you write after the colon is
    • 1:08:03literally what will be returned
    • 1:08:05automatically
    • 1:08:06so again this is a very pythonic thing
    • 1:08:08to do it's kind of a very
    • 1:08:10clever one-liner even though it's a
    • 1:08:12little cryptic to see for the very first
    • 1:08:13time but it allows you to condense your
    • 1:08:15thoughts into a succinct statement that
    • 1:08:17gets the job done
    • 1:08:18so you don't have to start defining more
    • 1:08:20and more functions that you or someone
    • 1:08:21else
    • 1:08:22then need to keep track of
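The lambda version described above could be sketched like this, with made-up counts standing in for the real data. The name `ordered` is introduced only for this example.

```python
# Made-up counts for illustration only.
titles = {"friends": 2, "the office": 3, "game of thrones": 1}

# lambda creates an anonymous function: one argument (title), and whatever
# follows the colon is returned automatically, with no return keyword needed.
ordered = sorted(titles, key=lambda title: titles[title], reverse=True)

for title in ordered:
    print(title, titles[title])
```

The lambda here does the same job as a separately defined one-line function, without needing a name that is used only once.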
    • 1:08:26all right any questions then on this and
    • 1:08:28i'm pretty sure this is as
    • 1:08:31complex or sophisticated as our python
    • 1:08:33code today will get
    • 1:08:36yeah over to sofia
    • 1:08:40i was wondering why lambda is used as
    • 1:08:43like specifically rather than some other
    • 1:08:44keyword
    • 1:08:46yeah so there's a long history in this
    • 1:08:47and if in fact you take a course on
    • 1:08:49functional programming
    • 1:08:50at harvard it's called cs51 um there's a
    • 1:08:54whole etymology behind keywords like
    • 1:08:55this
    • 1:08:56let me defer that one for another time
    • 1:08:58but indeed not only in python but in
    • 1:09:01other languages as well these things
    • 1:09:02have come to exist called
    • 1:09:04lambda functions so they're actually
    • 1:09:06quite commonplace
    • 1:09:07in other languages as well and so python
    • 1:09:09just adopted the term
    • 1:09:10of art mathematically lambda is often
    • 1:09:14used as a symbol for
    • 1:09:15functions and so they borrowed that same
    • 1:09:17idea in the world of programming
    • 1:09:20all right so seeing no other questions
    • 1:09:22let's go ahead and solve
    • 1:09:23a related problem still with some python
    • 1:09:26but that's going to
    • 1:09:27push up against the limits of efficiency
    • 1:09:30when it comes to storing our data in csv
    • 1:09:33files let me go ahead and start let me
    • 1:09:35go ahead and start fresh
    • 1:09:36in this file favorites.py all of the
    • 1:09:38code i've written thus far though is on
    • 1:09:39the course's website in advance so you
    • 1:09:41can see the incremental
    • 1:09:42improvement i'm going to go ahead and
    • 1:09:43again import csv at the top
    • 1:09:45and now this let's write a program this
    • 1:09:47time that doesn't just
    • 1:09:49automatically open up the csv and
    • 1:09:51analyze it looking for
    • 1:09:53the total popularity of shows let's
    • 1:09:55search for
    • 1:09:56a specific show in the csv and then go
    • 1:10:00ahead and
    • 1:10:00output the popularity thereof and i can
    • 1:10:04do this in a bunch of different ways but
    • 1:10:05i'm going to try to make this as concise
    • 1:10:06as possible
    • 1:10:07i'm first going to ask the user to
    • 1:10:10input a title
    • 1:10:11i could use cs50's get_string function
    • 1:10:13but recall that it's pretty much the
    • 1:10:14same as python's input function
    • 1:10:16so i'm going to use python's input
    • 1:10:18function today
    • 1:10:20and then i'm going to go ahead and as
    • 1:10:21before open up that same csv
    • 1:10:23called favorite tv shows form
    • 1:10:26responses 1.csv in read-only mode
    • 1:10:30as a variable called file i'm then going
    • 1:10:32to give myself a reader and i'll use a
    • 1:10:34DictReader again so i don't have to
    • 1:10:36worry about
    • 1:10:37knowing which columns things are in
    • 1:10:38passing in file
    • 1:10:40And then, let's see: if I only care about one title, I can keep this program simpler. I don't need to figure out the popularity of every show; I just need to figure out the popularity of one show, the title that the human has typed in.
    • 1:10:52So I'm going to go ahead and give myself a very simple int called counter and set it equal to zero. I don't need a whole dictionary; just one variable suffices now.
    • 1:11:01And I'm going to go ahead and iterate over the rows in the reader, as before, and then I'm going to say: if the current row's title equals equals the title the human typed in, let's go ahead and increment counter by one.
    • 1:11:16And it's already initialized, because I did that on line seven, so I think I'm good.
    • 1:11:19And then, at the end of this program, let's very simply print out the value of counter.
    • 1:11:23So the purpose of this program is to prompt the user for a title of a show and then just report the popularity thereof by counting the number of instances of it in the file.
    • 1:11:34So let me go ahead and run this with python of favorites.py. Enter. Let me go ahead and type in "The Office". Enter. And 19.
    • 1:11:43Now, I don't remember exactly what the number was, but I remember The Office was more popular than that. I'm pretty sure it was not 19.
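The program described up to this point can be sketched roughly as follows. This is a minimal reconstruction, not the exact code on screen: the function name count_exact, the sample rows, and the "title" column name are assumptions made for illustration, and the title is hard-coded where the lecture prompts for it with input.

```python
import csv
import io

# Made-up sample rows standing in for the survey CSV (an assumption; the real
# file has far more rows). The "title" column name is also assumed.
SAMPLE_CSV = "\n".join([
    "Timestamp,title",
    "10/19/2020 13:00:00,The Office",
    "10/19/2020 13:00:05,the office",
    "10/19/2020 13:00:09,The Office ",
    "10/19/2020 13:00:12,THE OFFICE",
    "10/19/2020 13:00:15,Friends",
])


def count_exact(file, title):
    """Count the rows whose title matches the given title exactly."""
    reader = csv.DictReader(file)
    counter = 0
    for row in reader:
        if row["title"] == title:
            counter += 1
    return counter


# In the lecture the title comes from input() and the rows from the real CSV;
# here both are hard-coded so that the sketch runs on its own.
print(count_exact(io.StringIO(SAMPLE_CSV), "The Office"))
```

Because the comparison is exact, the lowercase, all-caps, and trailing-space variants in the sample are not counted, which is the same kind of undercount seen in the demo.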
    • 1:11:54Any intuition as to why this program is buggy, or so it would seem? What have I done wrong?
    • 1:12:06Any thoughts in the chat, or...
    • 1:12:10A few people in the chat are saying you need to remember to deal with capitalization and whitespace again.
    • 1:12:14Yeah, so we need to practice those same lessons learned from before. So I should really canonicalize the input that the human, I, just typed in, and also the input that's coming from the CSV file.
    • 1:12:24Perhaps the simplest way to do this is, up here, to first strip off leading and trailing whitespace, in case I get a little sloppy and hit the space bar where I shouldn't, and then let's go ahead and force it to uppercase, just because. It doesn't matter if it's upper or lower, but at least we'll standardize things that way.
    • 1:12:40And then when I do this, look at the current row's title, I think I really need to do the same thing. If I'm going to canonicalize one, I need to canonicalize the other, and now compare the all-caps, whitespace-stripped versions of both strings.
    • 1:12:53So now let me rerun it.
    • 1:12:55Now I'm going to type in "the office". Enter. And voila: now I'm at 26, which I think is where we were at before.
    • 1:13:00And in fact, now I, the user, can be a little sloppy. I can say "the office". I can run it again and say "the office" and then, for whatever reason, hit the space bar a lot. Enter. It's still going to work.
    • 1:13:11And indeed, though we seem to be belaboring the pedantic here with trimming off whitespace and so forth, just think: in a relatively small audience here, how many of you accidentally hit the space bar or capitalized things differently?
    • 1:13:22This happens massively at scale, and you can imagine this being important when you're tagging friends in some social media account. You're doing @brian or the like; you don't want to have to require the user to type @, capital B, lowercase r-i-a-n, and so forth.
    • 1:13:38So tolerating disparate, messy user input is such a common problem to solve, including in today's apps that we all use.
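The canonicalization step described above, stripping whitespace and forcing uppercase on both sides of the comparison, might look like this. As before, this is a sketch under assumptions: count_canonical, the sample rows, and the "title" column name are illustrative, not taken from the lecture's file.

```python
import csv
import io

# Made-up sample rows (an assumption); the "title" column name is assumed too.
SAMPLE_CSV = "\n".join([
    "Timestamp,title",
    "10/19/2020 13:00:00,The Office",
    "10/19/2020 13:00:05,the office",
    "10/19/2020 13:00:09,The Office ",
    "10/19/2020 13:00:12,THE OFFICE",
    "10/19/2020 13:00:15,Friends",
])


def count_canonical(file, title):
    """Count matching rows after canonicalizing both sides of the comparison."""
    # Canonicalize the user's input once, up front.
    title = title.strip().upper()
    reader = csv.DictReader(file)
    counter = 0
    for row in reader:
        # Canonicalize each row's title the same way before comparing.
        if row["title"].strip().upper() == title:
            counter += 1
    return counter


# Sloppy input (extra spaces, lowercase) still matches every variant.
print(count_canonical(io.StringIO(SAMPLE_CSV), "  the office   "))
```

The key design point is symmetry: applying strip and upper to only one side would still miss rows, so both the typed title and each stored title go through the same transformation.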
    • 1:13:48All right, any questions, then, on this program, which I think is correct?
    • 1:13:56Then let me ask a question of you: in what sense is this program poorly designed?
    • 1:14:06This is more subtle, but think about the running time of this program. In terms of big O, what is the running time of this program if the CSV file has n different shows in it, or n different submissions? So n is the variable in question.
    • 1:14:23Yeah, what's the running time, Andrew?
    • 1:14:29It's big O of n, because you're using linear search.
    • 1:14:31Yeah, it's big O of n, because I'm literally using linear search by way of the for loop. That's how a for loop works in Python, just like in C: it starts at the beginning and potentially goes all the way to the end.
    • 1:14:40And so I'm implicitly using linear search, because I'm not using any fancy data structures, no sets, no dictionaries. I'm just looping from top to bottom.
    • 1:14:49So you can imagine that if we surveyed not just all of the students here in class, but maybe everyone on campus or everyone in the world, maybe we're Internet Movie Database, IMDb, there could be a huge number of votes and a huge number of shows.
    • 1:15:05And so, writing a program, whether it's in a terminal window like mine, or maybe on a mobile device, or maybe on a web page for your laptop or desktop, it's probably not the best design to constantly loop over all of the shows in your database, from top to bottom, just to answer a single question.
    • 1:15:21It would be much nicer to do things in log of n time or in constant time. And thankfully, over the past few weeks, both in C and in Python, we have seen smarter ways to do this, but I'm not practicing what I've preached here.
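One of the "smarter ways" alluded to here is a dictionary: pay the O(n) pass over the file once, then answer each later question in constant time on average. The following is a sketch of that idea only, not the approach the lecture goes on to take; build_counts, popularity, and the sample rows are hypothetical names and data.

```python
import csv
import io

# Made-up sample rows (an assumption); the "title" column name is assumed too.
SAMPLE_CSV = "\n".join([
    "Timestamp,title",
    "10/19/2020 13:00:00,The Office",
    "10/19/2020 13:00:05,the office",
    "10/19/2020 13:00:09,The Office ",
    "10/19/2020 13:00:12,Friends",
])


def build_counts(file):
    """Make one O(n) pass over the rows, tallying canonicalized titles."""
    counts = {}
    for row in csv.DictReader(file):
        title = row["title"].strip().upper()
        counts[title] = counts.get(title, 0) + 1
    return counts


def popularity(counts, title):
    """Look up one title; a dict lookup is constant time on average."""
    return counts.get(title.strip().upper(), 0)


counts = build_counts(io.StringIO(SAMPLE_CSV))
print(popularity(counts, "the office"))
print(popularity(counts, "friends"))
```

This only pays off when many questions are asked of the same data, and the dictionary has to be rebuilt every time the program starts.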
    • 1:15:35And in fact, at some point this notion of a flat-file database starts to get too primitive for us.
    • 1:15:42Flat-file databases like CSV files are wonderfully useful when you just want to do something quickly, or when you want to download data from some third party, like Google, in a standard, portable way. Portable means that it can be used by different people on different systems.
    • 1:15:58CSV is about as simple as it gets, because you don't need to own Microsoft Word or Apple Numbers or any particular product. It's just a text file, so you can use any text-editing program or any programming language to access it.
    • 1:16:10But flat-file databases aren't necessarily the best structure to use, ultimately, for larger data sets, because they don't really lend themselves to more efficient queries.
    • 1:16:19So with CSV files, pretty much at best you have to search top to bottom, left to right.
    • 1:16:24But it turns out that there are better databases out there, generally known as relational databases, that, instead of being files in which you store data, are programs in which you store data.
    • 1:16:35Now, to be fair, those programs use a lot of RAM, memory, where they actually store your data, and they do certainly persist your data, they keep it long-term, by storing your data also in files. But between you and your data there is this running program.
    • 1:16:51And if you've ever heard of Oracle or MySQL or Postgres or SQL Server or Microsoft Access or bunches of other popular products, both commercial and free and open source alike, relational databases are so similar in spirit to spreadsheets, but they are implemented in software, and they give us more and more features, and they use more and more data structures, so that we can search for data, insert data, delete data, update data much, much more efficiently than we could if just using something like a CSV file.
    • 1:17:22So let's go ahead and take our five-minute break here, and when we come back, we'll look at relational databases and, in turn, a language called SQL.
    • 1:25:24All right, we are back, and the goal at hand now is to transition from these fairly simplistic flat-file databases to a more proper relational database.
    • 1:25:32And relational databases are indeed what power so many of today's mobile applications, web applications, and the like. So now we're beginning to transition to real-world software, with real-world languages at that.
    • 1:25:43And so now let me introduce what we're going to call SQLite.
    • 1:25:48So it turns out that a relational database is a database that stores all of the data still in rows and columns, but it doesn't do so using spreadsheets or sheets. It instead does so using what we're going to call tables.
    • 1:26:02So it's pretty much the same idea, but with tables we get some additional functionality.
    • 1:26:08With those tables, we'll have the ability to search for data, update data, delete data, insert new data, and the like. And these are things that we absolutely can do with spreadsheets, but in the world of spreadsheets, if you want to search for something, it's you, the human, doing it, by manually clicking and scrolling, typically.
    • 1:26:24If you want to insert data, it's you, the human, typing it in manually after adding a new row. If you want to delete something, it's you right-clicking or control-clicking and deleting a whole row, or updating the individual cells therein.
    • 1:26:36With SQL, Structured Query Language, we have a new programming language that is very often used in conjunction with other programming languages. And so today we'll see SQL used on its own initially, but we'll also see it in the context of a Python program. So a language like Python can itself use SQL to do more powerful things than Python alone could do.
    • 1:27:00So, with that said, SQLite is like a light version of SQL. It's a more user-friendly version; it's more portable. It can be used on Macs and PCs and phones and laptops and desktops and servers, but it's incredibly common.
    • 1:27:10In fact, on your iPhone and your Android phone, many of the applications you are running today on your own device are using SQLite underneath the hood. So it isn't a toy language per se; it's instead a relatively simple implementation of a language generally known as SQL.
    • 1:27:25But, long story short, there are other implementations of relational databases out there, and I rattled off several of them already, Oracle and MySQL and Postgres and the like. Those all have slightly different flavors or dialects of SQL.
    • 1:27:39So SQL is a fairly standard language for interacting with databases, but different companies, different communities have kind of added or subtracted their own preferred features. And so the syntax you use is generally constant across all platforms, but we will standardize for our purposes on SQLite.
    • 1:27:56And indeed, this is what you would use these days in the world of mobile applications, so it's very much germane there.
    • 1:28:03So with SQLite, we're going to have, ultimately, the ability to query data and update data, delete data, and the like. But to do so, we actually need a program with which to interact with our database.
    • 1:28:15So the way SQLite works is that it stores all of your data still in a file, but it's a binary file now. That is, it's a file containing zeros and ones, and those zeros and ones might represent text, they might represent numbers, but it's a more compact, efficient representation than a mere CSV file would be using ASCII or Unicode.
    • 1:28:34So that's the first difference: SQLite uses a single file, a binary file, to store all of your data. And represented inside of that file, by way of all of those zeros and ones, are the tables to which I alluded before, which are the analog, in the database world, of sheets or spreadsheets in the spreadsheet world.
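To make the "single binary file" point concrete, the sketch below creates a small SQLite database in a temporary directory and reads back the first 16 bytes of the file, which in the SQLite file format are a fixed header string. The file name example.db, the helper sqlite_header, and the one-row shows table are assumptions for illustration; the SQL statements used here are only introduced later in the lecture.

```python
import os
import sqlite3
import tempfile


def sqlite_header(path):
    """Return the first 16 bytes of a file, read in binary mode."""
    with open(path, "rb") as f:
        return f.read(16)


with tempfile.TemporaryDirectory() as directory:
    # Hypothetical file name; any path works.
    path = os.path.join(directory, "example.db")
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE shows (title TEXT)")
    db.execute("INSERT INTO shows (title) VALUES (?)", ("The Office",))
    db.commit()
    db.close()
    header = sqlite_header(path)

# The header is a fixed byte string rather than comma-separated text.
print(header)
```

Opening the same file in a text editor would show mostly unreadable bytes after that header, which is why a tool such as sqlite3 is needed to work with it.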
    • 1:28:53So, to interact with that binary file wherein all of your data is stored, we need some kind of user-facing program. And there are many different tools to use, but the standard one that comes with SQLite is called sqlite3, essentially version three of the tool.
    • 1:29:09This is a command-line tool, similar in spirit to any of the commands you've run in a terminal window thus far, that allows you to open up that binary file and interact with all of your tables.
    • 1:29:19Now, here again, we kind of have a chicken-and-egg problem. If I want to use a database, but I don't yet have a database, and yet I want to select data from my database, how do I actually load things in?
    • 1:29:29Well, you can load data into a SQLite database in at least two ways. One, which I'll do in a moment, you can just import an existing flat-file database, like a CSV.
    • 1:29:40And what you do is you save the CSV on your Mac or PC or in CS50 IDE, you run a special command with sqlite3, and it will just load the CSV into memory. It will figure out where all of the commas are, and it will construct, inside of that binary file, the corresponding rows and columns, using the appropriate zeros and ones to store all of that information. So it just imports it for you automatically.
    • 1:30:05Approach two would be to actually write code, in a language like Python or any other, that actually manually inserts all of the data into your database. And we'll do that as well, but let's start simple.
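Approach two, inserting the rows from code, could look roughly like this using Python's standard csv and sqlite3 modules. This is a sketch under assumptions, not the code the lecture writes later (which may use a different library): load_shows, the single-column shows table, and the sample rows are hypothetical, and an in-memory database is used so that nothing is written to disk.

```python
import csv
import io
import sqlite3

# Made-up sample rows (an assumption); the "title" column name is assumed too.
SAMPLE_CSV = "\n".join([
    "Timestamp,title",
    "10/19/2020 13:00:00,The Office",
    "10/19/2020 13:00:05,Friends",
    "10/19/2020 13:00:09,The Office",
])


def load_shows(db, file):
    """Create a shows table and insert one database row per CSV row."""
    db.execute("CREATE TABLE shows (title TEXT)")
    for row in csv.DictReader(file):
        # The ? placeholder lets sqlite3 pass the value in safely.
        db.execute("INSERT INTO shows (title) VALUES (?)", (row["title"],))
    db.commit()


db = sqlite3.connect(":memory:")  # in-memory database, nothing saved to disk
load_shows(db, io.StringIO(SAMPLE_CSV))
total = db.execute("SELECT COUNT(*) FROM shows").fetchone()[0]
print(total)
```

Compared with a one-step import, writing the loop by hand leaves room to clean or restructure each row before it is stored.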
    • 1:30:15Let me go ahead and run, for instance, sqlite3. This is pre-installed on CS50 IDE, and it's not that hard to get it up and running on a Mac and PC as well.
    • 1:30:26I'm going to go ahead and run sqlite3 in my terminal window here, and voila, you just see some very simple output. It's telling me to type .help if I want to see some usage hints, but I know most of the commands, and we'll generally give you all of the commands that you might need.
    • 1:30:40In fact, one of the commands that we can use is .mode, and another is .import. Generally, you won't use these that frequently; you'll only use them when creating a database for the first time, when you are creating that database from an existing CSV file.
    • 1:30:53And indeed, that's my goal at the moment. Let me take our CSV file containing all of your favorite TV shows and load it into SQLite, in a proper relational database, so that we can do better than, for instance, big O of n when it comes to searching that data and doing anything else on it.
    • 1:31:12so to do this i have to execute two data and doing anything else on it
    • 1:31:14commands one i need to put sql lite into so to do this i have to execute two
    • 1:31:16csv commands one i need to put sql lite into
    • 1:31:17mode and that's just to distinguish it csv
    • 1:31:19from other flat file formats like mode and that's just to distinguish it
    • 1:31:20tsv for tabs or some other format and from other flat file formats like
    • 1:31:23now i'm going to go ahead and run tsv for tabs or some other format and
    • 1:31:24import then i have to specify the name now i'm going to go ahead and run
    • 1:31:27of the file to import which is the csv import then i have to specify the name
    • 1:31:29and i'm going to go ahead and call my of the file to import which is the csv
    • 1:31:31table shows and i'm going to go ahead and call my
    • 1:31:33so dot import takes two arguments the table shows
    • 1:31:36name of the file that you want to import so dot import takes two arguments the
    • 1:31:38and the name of the table that you want name of the file that you want to import
    • 1:31:40to create and the name of the table that you want
    • 1:31:41out of that file and again tables have to create
    • 1:31:43rows and columns out of that file and again tables have
    • 1:31:44and the commas in the file are going to rows and columns
    • 1:31:47delineate and the commas in the file are going to
    • 1:31:47where those columns begin and end i'm delineate
    • 1:31:50going to go ahead and hit enter where those columns begin and end i'm
    • 1:31:51it looks like it flew by pretty fast going to go ahead and hit enter
    • 1:31:54nothing it looks like it flew by pretty fast
    • 1:31:54seems to have happened but i think nothing
    • 1:31:56that's okay seems to have happened but i think
    • 1:31:57because now we're going to go ahead and that's okay
    • 1:31:59have the ability to actually manipulate because now we're going to go ahead and
    • 1:32:01that data have the ability to actually manipulate
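As a rough illustration of what the import step described above accomplishes, here is a minimal Python sketch using the standard `csv` and `sqlite3` modules. It is not how the sqlite3 program itself is implemented; `import_csv` is a hypothetical helper, and the sample rows (including the timestamps) are invented stand-ins for the real file.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for the favorites CSV; the real file's rows are not shown here.
CSV_TEXT = """Timestamp,title,genres
10/19/2020 13:00:00,The Office,Comedy
10/19/2020 13:00:05,friends,Comedy
10/19/2020 13:00:09,Breaking Bad,"Crime, Drama"
"""


def import_csv(conn, csv_file, table):
    """Create `table` with one TEXT column per header field, then insert every row.

    Assumption: the table and column names are trusted, since they are
    interpolated into the SQL text rather than bound as parameters.
    """
    reader = csv.reader(csv_file)
    header = next(reader)  # the first line of the CSV supplies the column names
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
import_csv(conn, io.StringIO(CSV_TEXT), "shows")
row_count = conn.execute("SELECT COUNT(*) FROM shows").fetchone()[0]
print(row_count)
```

The commas delimit the columns, and a quoted field such as "Crime, Drama" stays in one column because the `csv` module handles the quoting.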
    • 1:32:02but how do we manipulate the data we
    • 1:32:04need a new language
    • 1:32:05sql structured query language is the
    • 1:32:08language
    • 1:32:09used by sqlite and oracle and mysql
    • 1:32:12and postgres and bunches of other
    • 1:32:14products whose names you don't need to
    • 1:32:15know or remember anytime soon
    • 1:32:17but sql is the language we'll use to
    • 1:32:20query the database
    • 1:32:22for information and do something with it
    • 1:32:24generally speaking
    • 1:32:25a relational database and in turn sql
    • 1:32:28which is a language by
    • 1:32:29which you can interact with relational
    • 1:32:31databases support four
    • 1:32:33fundamental operations and they're sort
    • 1:32:34of a crude acronym
    • 1:32:36pun intended that is just helpful for
    • 1:32:39remembering what those fundamental
    • 1:32:40operations are with relational databases
    • 1:32:42crud
    • 1:32:44stands for create read
    • 1:32:47update and delete and indeed the acronym
    • 1:32:50is crud crud
    • 1:32:51so it helps you remember that the four
    • 1:32:53basic operations supported by any
    • 1:32:54relational database are create
    • 1:32:56read update delete create means to
    • 1:32:59create or add new data
    • 1:33:00read means to access and load into
    • 1:33:03memory
    • 1:33:03new data we've seen read before with
    • 1:33:05opening files update and delete mean
    • 1:33:07exactly that as well if you want to
    • 1:33:09manipulate the data in your data set
    • 1:33:11now those are generic terms for any
    • 1:33:13relational database those are the four
    • 1:33:14properties typically supported by any
    • 1:33:16relational database
    • 1:33:17in the world of sql there are some very
    • 1:33:20specific
    • 1:33:21commands or functions if you will that
    • 1:33:24implement
    • 1:33:24those four functionalities
    • 1:33:28they are create and insert achieve the
    • 1:33:31same
    • 1:33:31thing as create more generally the
    • 1:33:33keyword select
    • 1:33:34is what's used to read data from a
    • 1:33:36database update and delete are the same
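To make the mapping from the four CRUD operations to the five SQL keywords concrete, here is a small sketch using Python's standard `sqlite3` module. The two-column table and its single row are made up for illustration; only the keywords themselves come from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create: CREATE makes the table, INSERT adds a row to it.
conn.execute("CREATE TABLE shows (title TEXT, genres TEXT)")
conn.execute(
    "INSERT INTO shows (title, genres) VALUES (?, ?)",
    ("The Office", "Comedy"),
)

# Read: SELECT fetches data back out.
after_insert = conn.execute("SELECT title, genres FROM shows").fetchall()

# Update: UPDATE changes existing rows that match the condition.
conn.execute(
    "UPDATE shows SET genres = ? WHERE title = ?",
    ("Sitcom", "The Office"),
)
after_update = conn.execute("SELECT genres FROM shows").fetchall()

# Delete: DELETE removes rows that match the condition.
conn.execute("DELETE FROM shows WHERE title = ?", ("The Office",))
after_delete = conn.execute("SELECT COUNT(*) FROM shows").fetchone()[0]

print(after_insert, after_update, after_delete)
```

The `?` placeholders are the `sqlite3` module's way of passing values separately from the SQL text.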
    • 1:33:39so it's kind of an annoying
    • 1:33:40inconsistency the acronym or the term of
    • 1:33:42art is crud
    • 1:33:43create read update delete but in the
    • 1:33:44world of sql the authors of the language
    • 1:33:46decided to
    • 1:33:47implement those four ideas by way of
    • 1:33:49these five keywords
    • 1:33:51or functions or commands if you will in
    • 1:33:54the language sql so what you are looking
    • 1:33:55at
    • 1:33:56are five of the keywords you can use
    • 1:33:59in this new
    • 1:33:59language called sql to actually do
    • 1:34:01something
    • 1:34:02with your database now what does that
    • 1:34:04mean well suppose that you wanted to
    • 1:34:06manually create a database for the very
    • 1:34:07first time
    • 1:34:08what do you do well back in the world of
    • 1:34:10spreadsheets it's pretty straightforward
    • 1:34:11right you like open up google
    • 1:34:13spreadsheets you go to like file new or
    • 1:34:15whatever
    • 1:34:16and then you just voila you get a new
    • 1:34:17spreadsheet into which you can start
    • 1:34:18creating rows and columns and the like
    • 1:34:20microsoft excel apple numbers same thing
    • 1:34:23file menu new spreadsheet or whatever
    • 1:34:25and boom you have a new spreadsheet now
    • 1:34:27in the world of sql
    • 1:34:28sql databases are generally meant to be
    • 1:34:30interacted with
    • 1:34:31using code however there are graphical user
    • 1:34:34interfaces guis by which you can
    • 1:34:36interact with them as well but we're
    • 1:34:37going to use code today to do so
    • 1:34:39and programs at a command line it turns
    • 1:34:41out that you can
    • 1:34:44create tables programmatically by
    • 1:34:47running a command like this
    • 1:34:49so if you literally type out syntax
    • 1:34:51along the lines of create
    • 1:34:52table then the name of your table
    • 1:34:55indicated here in lowercase
    • 1:34:56then a parenthesis then the name of your
    • 1:34:59column that you want to create
    • 1:35:01and the type of that column a la
    • 1:35:04c and then comma dot dot dot some more
    • 1:35:07columns
    • 1:35:07this is generally speaking the syntax
    • 1:35:10you will use
    • 1:35:11to create in this language called sql a
    • 1:35:13new table now this is in the abstract
    • 1:35:15again
    • 1:35:16like table in lowercase is meant to
    • 1:35:18represent the name you want to give your
    • 1:35:19actual table
    • 1:35:20column in lowercase is meant to be the
    • 1:35:22name you want to give to your own column
    • 1:35:23maybe it's title maybe genres and
    • 1:35:25dot dot dot just means of course you can
    • 1:35:27have even more columns than that but
    • 1:35:28literally in a moment if i were to type
    • 1:35:30in
    • 1:35:31this kind of command into the terminal
    • 1:35:33window after running the sqlite3
    • 1:35:35program
    • 1:35:36i could start creating one or more
    • 1:35:38tables for myself
    • 1:35:39and in fact that's what already happened
    • 1:35:41for me this dot import command which is
    • 1:35:44not part of sql
    • 1:35:45this is like the equivalent of a menu
    • 1:35:47option in excel or google spreadsheets
    • 1:35:49dot import just automates a certain
    • 1:35:51process for me and what it did for me is
    • 1:35:53this
    • 1:35:54if i type now dot schema which is
    • 1:35:56another sqlite specific command
    • 1:35:58anything that starts with a dot
    • 1:35:59is specific only to sqlite3 this
    • 1:36:02terminal window program
    • 1:36:04notice what's outputted is this by
    • 1:36:06running dot import
    • 1:36:08that automatically for me created a
    • 1:36:10table
    • 1:36:11in my database called
    • 1:36:14shows and it gave it three columns
    • 1:36:17timestamp
    • 1:36:18title and genres where did those column
    • 1:36:20names come from
    • 1:36:21well they came from the very
    • 1:36:23first line in the csv
    • 1:36:24and they all looked like text so the
    • 1:36:27type of those values was just
    • 1:36:29assumed to be text text text now to be
    • 1:36:32clear
    • 1:36:33i could have manually typed this out
    • 1:36:34created these three columns in a new
    • 1:36:36table called shows for me
    • 1:36:38but again the dot import command just
    • 1:36:39automated that from a csv
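For readers following along outside the sqlite3 terminal program, the sketch below types out the table by hand and then reads its definition back. It assumes Python's standard `sqlite3` module; querying `sqlite_master` and `PRAGMA table_info` is used here as a rough programmatic stand-in for the dot schema command, not as a description of how that command works internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Manually typed equivalent of the table that the import created:
# three columns, all of type TEXT.
conn.execute('CREATE TABLE shows ("Timestamp" TEXT, "title" TEXT, "genres" TEXT)')

# SQLite keeps the original CREATE statement for each table in sqlite_master.
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'shows'"
).fetchone()[0]
print(schema)

# PRAGMA table_info lists one row per column: (cid, name, type, notnull, default, pk).
column_info = conn.execute("PRAGMA table_info(shows)").fetchall()
columns = [(name, col_type) for _cid, name, col_type, *_rest in column_info]
print(columns)
```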
    • 1:36:41but the sql is what we see here create
    • 1:36:44table shows
    • 1:36:45and so forth so that is to say now
    • 1:36:48in this database there is a file
    • 1:36:52or rather there is a table called
    • 1:36:55shows inside of which is all of the data
    • 1:36:58from that csv
    • 1:36:59how do i actually get at that data well
    • 1:37:01it turns out there's other commands
    • 1:37:02recall not just create
    • 1:37:04but also select it turns out select is
    • 1:37:07the equivalent of
    • 1:37:08read getting data from the database and
    • 1:37:10this one's pretty powerful
    • 1:37:12and the reason that so many data
    • 1:37:13scientists and statisticians
    • 1:37:15use and like using languages like sql
    • 1:37:17is that they make it relatively easy to just get
    • 1:37:19data and filter that data and analyze
    • 1:37:21that data
    • 1:37:22using new syntax for us today but
    • 1:37:25relatively simple syntax relative to
    • 1:37:26other things we've seen
    • 1:37:28the select command in sql lets you
    • 1:37:30select one or more columns
    • 1:37:33from your table by the given name
    • 1:37:36so we'll see this now in just a moment
    • 1:37:39here how might i go about doing this
    • 1:37:40well let me go ahead and now
    • 1:37:43at my prompt after just clearing the
    • 1:37:44window to keep things neat let me try
    • 1:37:46this out
    • 1:37:46let me go ahead and select
    • 1:37:50uh let's say title from
    • 1:37:53shows semicolon so why am i doing this
    • 1:37:56well again the conventional format for
    • 1:37:58the select command
    • 1:37:59is to say select then the name of one or
    • 1:38:01more columns
    • 1:38:02then literally the preposition from and
    • 1:38:04then the name of the table
    • 1:38:06from which you want to select that data
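The select format just described (select, one or more column names, from, then the table name) might look like this when run from Python's standard `sqlite3` module. The two sample rows are invented for illustration; the column names follow the ones mentioned in the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE shows ("Timestamp" TEXT, title TEXT, genres TEXT)')
conn.executemany(
    "INSERT INTO shows VALUES (?, ?, ?)",
    [
        # Hypothetical sample submissions.
        ("10/19/2020 13:00:00", "The Office", "Comedy"),
        ("10/19/2020 13:00:05", "friends", "Comedy"),
    ],
)

# SELECT <column> FROM <table>: a single column...
titles = [row[0] for row in conn.execute("SELECT title FROM shows")]

# ...or several columns, given as a comma-separated list.
pairs = conn.execute('SELECT "Timestamp", title FROM shows').fetchall()

print(titles)
print(pairs)
```

The keywords are written in capitals and the table and column names in lowercase, which is a style convention rather than a requirement.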
    • 1:38:08so if my table is called
    • 1:38:10shows and the column is called title it
    • 1:38:13stands to reason that select title from
    • 1:38:15shows
    • 1:38:16should give me back the data i want now
    • 1:38:17notice a couple of stylistic choices
    • 1:38:19that aren't strictly required but are
    • 1:38:21good style
    • 1:38:22conventionally i would capitalize any
    • 1:38:24sql
    • 1:38:25keywords including select and from in
    • 1:38:27this case
    • 1:38:28and then lowercase anything that's a
    • 1:38:30column name
    • 1:38:31or a table name assuming you created
    • 1:38:34those columns and tables
    • 1:38:36in lowercase there's different
    • 1:38:37conventions out there some people will
    • 1:38:38upper case some people use something
    • 1:38:40called camel case or snake case or the
    • 1:38:42like
    • 1:38:42but generally speaking i would encourage
    • 1:38:44all caps for sql syntax
    • 1:38:46and lowercase for the column and table
    • 1:38:48names i'm going to go ahead now and hit
    • 1:38:49enter
    • 1:38:50and voila we see rapidly a whole list of
    • 1:38:53values
    • 1:38:54outputted from the database and if you
    • 1:38:56think way back
    • 1:38:58you might recognize that this
    • 1:39:00actually happens to be the same order
    • 1:39:02as before because the csv file was
    • 1:39:05loaded top to bottom
    • 1:39:06into this same database table and so
    • 1:39:08what we're seeing in fact is all of that
    • 1:39:10same data
    • 1:39:11duplicates and miscapitalizations and
    • 1:39:13weird spacing
    • 1:39:14and all but suppose i want to see all of
    • 1:39:17the data from the csv
    • 1:39:18well it turns out you can select
    • 1:39:19multiple columns you can select not only
    • 1:39:21title but maybe timestamp was of
    • 1:39:23interest and this one admittedly was
    • 1:39:25capitalized because that's what it was
    • 1:39:26in the spreadsheet
    • 1:39:28that was not something i chose manually
    • 1:39:29so if i just use a comma separated list
    • 1:39:31of column names notice what i can do now
    • 1:39:33it's a little hard to see for us humans
    • 1:39:35because there's a lot going on now
    • 1:39:37but notice that in double quotes on the
    • 1:39:39left there are all of the timestamps
    • 1:39:42which represent the time at which you
    • 1:39:43all submitted your favorite shows and on
    • 1:39:45the right of the comma there's another
    • 1:39:47quoted string
    • 1:39:49that is the title of the show that you
    • 1:39:50liked although sqlite omits
    • 1:39:52the quotes if it's just a single word
    • 1:39:54like friends just by convention
    • 1:39:56you know in fact if i want to get all of
    • 1:39:58the columns turns out there's some
    • 1:39:59shorthand syntax for that
    • 1:40:01star is the so-called wildcard operator
    • 1:40:03and it will get me
    • 1:40:04all of the columns from left to right in
    • 1:40:06my table and voila now i see all of the
    • 1:40:09data
    • 1:40:09including all of the genres
    • 1:40:12as well so now i effectively have three
    • 1:40:15columns being outputted
    • 1:40:16all at once here well this is not that
    • 1:40:20useful
    • 1:40:20thus far in fact all i've been doing is
    • 1:40:22really just outputting the contents of
    • 1:40:23the csv
    • 1:40:24but sql is powerful because it comes
    • 1:40:26with other features right out of the box
    • 1:40:28somewhat similar in spirit to functions
    • 1:40:30that are built into google spreadsheets
    • 1:40:32and excel
    • 1:40:33but now we can use them ultimately in
    • 1:40:34our own code so functions like average
    • 1:40:37count distinct lower max min
    • 1:40:39and upper and bunches more these are all
    • 1:40:41functions built into sql
    • 1:40:43that you can use as part of your query
    • 1:40:45to sort of
    • 1:40:46alter the data as it's coming back from
    • 1:40:49the database not permanently but as it's
    • 1:40:51coming back to you
    • 1:40:52so that it's in a format you actually
    • 1:40:53care about so for instance one of my
    • 1:40:55goals earlier was to get back just the
    • 1:40:57distinct
    • 1:40:58the unique titles and we had to write
    • 1:41:00all that annoying code
    • 1:41:01using a set and then add things to the
    • 1:41:03set and then loop over it again right
    • 1:41:04like
    • 1:41:05that was not a huge amount of code but
    • 1:41:06it definitely took us what five ten
    • 1:41:08minutes to get the job done at least
    • 1:41:10in sql you can do all of that in one
    • 1:41:12breath i'm going to go ahead now and do
    • 1:41:14this
    • 1:41:15select not just title from
    • 1:41:18shows let me go ahead and select
    • 1:41:21distinct
    • 1:41:22title from shows so distinct again is an
    • 1:41:25available function in sql
    • 1:41:27that does what the name says it's going
    • 1:41:28to filter out all of the titles to just
    • 1:41:30give me the distinct ones back so if i
    • 1:41:32hit enter now
    • 1:41:33you'll see a similarly messy list but
    • 1:41:37including no idea someone that doesn't
    • 1:41:39watch tv including an unsorted list
    • 1:41:42of those titles so i think we can
    • 1:41:44probably start to clean this thing up as
    • 1:41:46we did before
    • 1:41:47let me go ahead and now select not just
    • 1:41:49distinct but let me go ahead and
    • 1:41:51uppercase everything as well
    • 1:41:53and i can use upper as another function
    • 1:41:55and notice i'm just nesting things like
    • 1:41:57the output of one function as we've seen
    • 1:41:58in many languages now can be the input
    • 1:42:00to another
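A minimal sketch of the distinct and upper queries described above, run through Python's standard `sqlite3` module with invented titles. In this sketch DISTINCT is written as a keyword placed before the column expression, which is one common way to spell it; the exact text typed in the lecture is not visible in the transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shows (title TEXT)")
conn.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    # Hypothetical submissions with inconsistent capitalization.
    [("The Office",), ("the office",), ("Friends",), ("THE OFFICE",), ("friends",)],
)

# DISTINCT alone only removes exact duplicates, so every spelling survives.
distinct_titles = [row[0] for row in conn.execute("SELECT DISTINCT title FROM shows")]

# Uppercasing first makes differently capitalized titles compare equal.
upper_titles = [
    row[0] for row in conn.execute("SELECT DISTINCT UPPER(title) FROM shows")
]

print(len(distinct_titles), len(upper_titles))
```

The data in the table is unchanged by either query; the uppercasing applies only to the values coming back.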
    • 1:42:01let me hit enter now and now it's
    • 1:42:03getting a little
    • 1:42:04more canonicalized so to speak because
    • 1:42:06i'm using capitalization for everything
    • 1:42:08but it would seem that things still
    • 1:42:11aren't really
    • 1:42:12sorted it's just the same order in which
    • 1:42:14you inputted them but without
    • 1:42:16duplicates this time so it turns out
    • 1:42:18that sql has
    • 1:42:19other syntax that we can use to make our
    • 1:42:22queries more precise and more powerful
    • 1:42:25so in addition to these kinds of
    • 1:42:26functions that you can use to alter the
    • 1:42:28data that's being shown to you and
    • 1:42:30coming back
    • 1:42:30you can also use these kinds of clauses
    • 1:42:33or syntax in sql queries
    • 1:42:35you can say where which is the
    • 1:42:37equivalent of a condition
    • 1:42:38you can say select all of this data
    • 1:42:40where something is true
    • 1:42:42or false you can say like where you can
    • 1:42:44say give me data that isn't
    • 1:42:46exactly this but is like this
    • 1:42:48you can order the data by some column
    • 1:42:50you can limit the number of
    • 1:42:51rows that come back and you can group
    • 1:42:54identical values together in some way
    • 1:42:56so let's see a few examples of this let
    • 1:42:58me go back here
    • 1:42:59and play around now with uh how about
    • 1:43:01the office that was the one we looked at
    • 1:43:03earlier so let me go ahead and select
    • 1:43:04title from
    • 1:43:05shows where title equals
    • 1:43:09the office quote unquote semicolon so
    • 1:43:12i've added this
    • 1:43:12where predicate so to speak where title
    • 1:43:16equals quote unquote the office so sql
    • 1:43:18is nice similar in spirit to python
    • 1:43:20it's more user-friendly perhaps than c
    • 1:43:22where everything kinda sort of reads
    • 1:43:24like an english sentence even though
    • 1:43:25it's a little more
    • 1:43:26uh precise and it's a little more
    • 1:43:28succinct let me go ahead and hit enter
    • 1:43:30and voila that's how many of you
    • 1:43:33inputted the office
    • 1:43:35but notice it's not everyone is it we're
    • 1:43:38missing some still
    • 1:43:39it seems that i got back only those of
    • 1:43:42you who typed in literally
    • 1:43:43the office capital t capital o so what
    • 1:43:46if i want to be a little more resilient
    • 1:43:48than that well let me get back
    • 1:43:49any rows where you all typed in office
    • 1:43:52maybe you omitted the um
    • 1:43:55the article the so let me go ahead and
    • 1:43:57say not title
    • 1:43:58equals office but let me go ahead and
    • 1:44:00say where the title is
    • 1:44:02like office but i don't want it to just
    • 1:44:04be office i want to allow for maybe some
    • 1:44:06stuff at the beginning
    • 1:44:08maybe some stuff at the end and even
    • 1:44:09though this seems like a bit of an
    • 1:44:10inconsistency
    • 1:44:11in the context of using like there's
    • 1:44:14another wildcard character the percent
    • 1:44:17sign
    • 1:44:17represents zero or more characters to
    • 1:44:20the left
    • 1:44:21and this percent sign represents zero or
    • 1:44:23more characters to the right
    • 1:44:24so it's kind of this catch-all that will
    • 1:44:26now find me all
    • 1:44:28titles that somewhere have o f f i
    • 1:44:31c e inside of them and it turns out like
    • 1:44:33is case insensitive so i don't even need
    • 1:44:35to worry about capitalization with like
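The difference between an exact where match and a like pattern can be sketched as follows, again with Python's standard `sqlite3` module and invented titles. In SQLite, LIKE is case-insensitive for ASCII letters by default, which is the behavior relied on here; other database products may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shows (title TEXT)")
conn.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    # Hypothetical submissions; note the trailing space in the fourth one.
    [("The Office",), ("the office",), ("Office",), ("The Office ",), ("Friends",)],
)

# Exact match: only rows that are literally 'The Office' come back.
exact = [
    row[0]
    for row in conn.execute("SELECT title FROM shows WHERE title = ?", ("The Office",))
]

# Pattern match: % stands for zero or more characters on either side.
like = [
    row[0]
    for row in conn.execute("SELECT title FROM shows WHERE title LIKE ?", ("%office%",))
]

print(exact)
print(like)
```

The exact match misses the lowercase entry, the bare "Office", and the entry with the trailing space, while the pattern match picks up all four variants and leaves out unrelated titles.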
    • 1:44:36now let me hit enter and voila now i get to worry about capitalization with like
    • 1:44:39back more answers and you can really see now let me hit enter and voila now i get
    • 1:44:41the messiness now notice up here back more answers and you can really see
    • 1:44:43one of you used lowercase you know that the messiness now notice up here
    • 1:44:45tends to be common when one of you used lowercase you know that
    • 1:44:46typing things in quickly one of you did tends to be common when
    • 1:44:49it lowercase here and then also gave us typing things in quickly one of you did
    • 1:44:51an extra white space at the end it lowercase here and then also gave us
    • 1:44:52one of you just typed in office one of an extra white space at the end
    • 1:44:54you typed in the office again with a one of you just typed in office one of
    • 1:44:56space at the end you typed in the office again with a
    • 1:44:57and so there's a lot of variation here space at the end
    • 1:44:58and that's why when we forced everything and so there's a lot of variation here
    • 1:45:00to uppercase and we started trimming and that's why when we forced everything
    • 1:45:02things to uppercase and we started trimming
    • 1:45:02we were able to get rid of a lot of things
    • 1:45:05those redundancies we were able to get rid of a lot of
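The wildcard search described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration rather than the lecture's actual session: the in-memory table and its four sample titles are made up, and only the `shows` table name and `title` column come from the lecture.

```python
import sqlite3

# In-memory stand-in for the class's shows table (sample rows are invented).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [("The Office",), ("the office ",), ("Office",), ("Friends",)],
)

# Each % matches zero or more characters, so "office" may appear anywhere.
# SQLite's LIKE ignores case for ASCII letters by default.
rows = db.execute(
    "SELECT title FROM shows WHERE title LIKE '%office%'"
).fetchall()
like_titles = [row[0] for row in rows]
print(like_titles)
```

All three spellings of the title match, including the one with trailing whitespace, which is exactly the messiness the query exposes.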
    • 1:45:07Well, in fact, let's go ahead and order this now. So let me go back to selecting the distinct uppercase title: SELECT DISTINCT UPPER of title FROM shows. And let me now ORDER BY, which is a new clause, the uppercased version of title.
    • 1:45:27So now notice there's a few things going on here, but I'm just building up more complicated queries, similar to Scratch, where we just started throwing more and more puzzle pieces at a problem.
    • 1:45:35I'm selecting all of the distinct uppercase titles from the shows table, but I'm going to order the results this time by the uppercase version of title. So everything's going to be uppercase, and then it's going to be sorted A through Z.
    • 1:45:49Hit Enter now, and now things are a little easier to make sense of. Notice the quotes are there only when there are multiple words in a title; otherwise sqlite3 doesn't bother showing us.
    • 1:45:59But notice here's all the "The" shows, and if we keep scrolling up, the P's, the N's, the M's, the L's, and so forth. It's indeed alphabetized, thanks to using ORDER BY.
    • 1:46:08All right, let me pause for just a second, because I know that's a lot all at once.
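As a recap of the ORDER BY query just built up, here is a small sketch using Python's sqlite3 module. The sample titles are invented for illustration; the query itself follows the one spoken in the lecture.

```python
import sqlite3

# In-memory stand-in for the shows table (sample rows are invented).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [("The Office",), ("the office",), ("Friends",), ("friends",),
     ("Game of Thrones",)],
)

# Uppercase every title, drop duplicates, then sort A through Z.
rows = db.execute(
    "SELECT DISTINCT UPPER(title) FROM shows ORDER BY UPPER(title)"
).fetchall()
sorted_titles = [row[0] for row in rows]
print(sorted_titles)
```

Five input rows collapse to three distinct uppercase titles, returned in alphabetical order.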
    • 1:46:15Any questions thus far on SELECT, or on DISTINCT, or on UPPER, table names, the WHERE clause, the ORDER BY clause?
    • 1:46:24It's a lot quickly, but it just generally expresses the kinds of problems we've already seen but solved in code. Anything on your end, Brian?
    • 1:46:37No hands here.
    • 1:46:40All right, well, let's start to solve more similar problems now in SQL by writing way less code than we did a bit ago in Python.
    • 1:46:46Suppose I want to actually figure out the counts of these most popular shows. So I want to combine all of the identical shows and figure out all of the corresponding counts.
    • 1:46:57Well, let me go ahead and try this. Let me go ahead and select again the uppercase version of title, but I'm not going to do DISTINCT this time, because I want to do that a little differently.
    • 1:47:09I'm going to select the uppercase version of title and the count of those titles, so the number of times a given title appears. So COUNT is a new keyword now. FROM shows.
    • 1:47:18But now, how do I figure out what the count is?
    • 1:47:23Well, if you think about this table as having a lot of titles (title, title, title, title, title), it would be nice to kind of group the identical titles together and then actually count how many such titles we grouped together.
    • 1:47:40And the syntax for that is literally to say GROUP BY UPPER of title. This tells SQL to group all of the uppercase titles together, kind of collapse multiple rows into one, but keep track of the count of titles after that collapse.
    • 1:47:55Let me go ahead now and hit Enter, and you'll see, very similar to one of the earlier Python programs we wrote, all of the titles on the left, followed by a comma, followed by the count.
    • 1:48:07So one of you really likes Tom and Jerry. One of you really likes Top Gear. If I scroll up, though, two of you really liked The Wire.
    • 1:48:1423 of you here like The Office, although we still haven't fixed the trimming issue here, so we could still combine that further by trimming whitespace if we want. But now we're getting these kinds of counts.
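To make the collapse-and-count idea concrete, here is a minimal sketch with Python's sqlite3 module. The sample rows are invented; note that one of them carries a trailing space, so it lands in its own group, mirroring the untrimmed issue mentioned above.

```python
import sqlite3

# In-memory stand-in for the shows table (sample rows are invented).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [("The Office",), ("the office",), ("THE OFFICE",),
     ("The Office ",),  # trailing space: not trimmed, so a separate group
     ("Friends",), ("friends",)],
)

# Collapse rows whose uppercased titles are identical and count each group.
rows = db.execute(
    "SELECT UPPER(title), COUNT(title) FROM shows GROUP BY UPPER(title)"
).fetchall()
counts = dict(rows)
print(counts)
```

The untrimmed "THE OFFICE " stays separate from "THE OFFICE", which is why a trimming step is still needed to merge them.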
    • 1:48:24Well, how can I go ahead and order this as we did before? Let me go ahead here and add ORDER BY COUNT of title, and then hit semicolon.
    • 1:48:39And now notice, just as in Python, everything is from smallest to largest initially, with Game of Thrones here down on the bottom.
    • 1:48:46How can I fix this? Well, it turns out you can order things in descending order, D-E-S-C for short, instead of ASC, which is the default, for ascending.
    • 1:48:56So if I do it in descending order now, I'd have to scroll all the way back up to the very top to see where the lines begin.
    • 1:49:12Whoops, did I do that right? Sorry. There we go: ORDER BY COUNT descending.
    • 1:49:21Now, this is just a little too unwieldy to see. Let me just limit myself to the top ten and keep it simple, and only look at the top ten values here.
    • 1:49:28Voila, now I have Game of Thrones at 33, Friends at 26, The Office at 23, though I think I'm still missing a few.
    • 1:49:34Brian, do you recall the SQL function for trimming leading and trailing whitespace?
    • 1:49:42Uh, I think it's just TRIM.
    • 1:49:45TRIM, OK. I myself did not remember, so when in doubt, Google or ask Brian.
    • 1:49:46So let me go ahead and fix this. Let me go ahead and select uppercase of, trimming the title first, and then I'm going to GROUP BY trimming and then uppercasing it there. And now Enter.
    • 1:50:00And voila, thank you, Brian. So now we're up to our 26 Offices here.
    • 1:50:04So in short, it took us a little while to get to this point in the story in SQL, but notice what we've done. We've taken a program that took us a few minutes and certainly a dozen or more lines of code, and we've distilled it into something that, yes, is a new language, but it's just kind of a one-liner.
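The one-liner arrived at above can be sketched as follows with Python's sqlite3 module. The sample rows are invented and the exact text typed in the lecture is not reproduced here; this is one plausible spelling of the trim, uppercase, group, order, and limit steps.

```python
import sqlite3

# In-memory stand-in for the shows table (sample rows are invented).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [("Game of Thrones",), ("game of thrones",), ("GAME OF THRONES ",),
     ("The Office",), ("the office ",), ("Friends",)],
)

# Trim whitespace, uppercase, group identical results, and show the
# ten largest groups first.
top = db.execute(
    """
    SELECT UPPER(TRIM(title)), COUNT(title)
    FROM shows
    GROUP BY UPPER(TRIM(title))
    ORDER BY COUNT(title) DESC
    LIMIT 10
    """
).fetchall()
print(top)
```

With TRIM applied before UPPER, the variants that differ only by trailing whitespace fall into the same group, so their counts combine.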
    • 1:50:21And once you get comfortable with a language like SQL, especially if you're not even a computer scientist but maybe a data scientist, or an analyst of some sort who spends a lot of their day looking at financial information or medical information or really any data set that can be loaded into rows and columns, once you start to speak and read SQL as a human, you can start to express some pretty powerful queries relatively succinctly and, boom, get back your answer.
    • 1:50:45And by using a command-line program like sqlite3, you can immediately see the results there, albeit as very simplistic text.
    • 1:50:52But as mentioned, too, there are also some graphical programs out there, free and commercial, that also support SQL, where you can still type these commands and then it will show the results to you in a more user-friendly way, much like Windows or macOS would by default.
    • 1:51:06So, any questions now on the syntax or capabilities of SELECT statements? Any questions on SELECTs?
    • 1:51:21Seeing anything on your end, Brian?
    • 1:51:24Uh, one question came in: where is the file with this data actually being stored?
    • 1:51:27Good question: where is the file actually being stored? So before quitting, I can actually save this file as anything I want. The file extension would typically be .db.
    • 1:51:38And in fact, Brian, do you mind just checking what's the syntax for writing the file manually, with dot something? It would be under .help, I think.
    • 1:51:51I think it's .save, followed by the name of the file.
    • 1:51:52.save, so I'll call this shows.db, Enter.
    • 1:51:55And now if I open up another terminal window just to demonstrate... whoops, sorry, closed the whole thing.
    • 1:52:03If I now go ahead and open up another terminal window and type our old friend ls, you'll see that now I have a CSV file, I have my Python file from before, and I have a new file called shows.db, which I've created. That is the binary file that contains the table that I've loaded dynamically in from that CSV file.
    • 1:52:28Any other questions on SELECT queries or what we can do with them?
    • 1:52:35Anything on your end, Brian?
    • 1:52:38Yeah, a few people are asking about what the run time of this is.
    • 1:52:41Yeah, really good question: what is the run time? I'm going to come back to that question in just a little bit, if that's OK.
    • 1:52:46Right now it's admittedly big O of n. I've not actually done anything better than we did with our CSV file or our Python code. Right now it's still big O of n by default, but there's going to be a better answer to that, one that's going to make it something much more logarithmic. So let me come back to that feature when it's time to enable it.
    • 1:53:03But in fact, let's start to take some steps toward that, because it turns out that when loading in data, we're not always going to have the luxury of just having one big file in CSV format that we import and we go about our business. We're going to have to decide in advance how we want to store the data, and what data we want to store, and what the relationships might be across not one single table but multiple tables.
    • 1:53:22So let me go ahead and run one other command here that actually introduces a first problem. Let me go ahead and SELECT title FROM shows WHERE genres equals, for instance, Comedy. That was one of the genres.
    • 1:53:37And notice that we get back a whole bunch of results.
    • 1:53:40But I bet I'm missing some. I'm skimming through this pretty quickly, but I bet I'm missing some, because if I check if genres equals Comedy, what am I omitting?
    • 1:53:50Well, those of you who checked multiple boxes might have said something is a comedy and a drama, or comedy and romance, or maybe a couple of other permutations of genres.
    • 1:53:59If I'm searching for equality here, equals Comedy, I'm only going to get those favorites from you where you only said, my favorite TV show is a comedy.
    • 1:54:08But what if we want to do something like LIKE Comedy instead? And we could say something like, well, so long as the word "comedy" is in there, then we should get back even more results.
    • 1:54:22And let me stipulate that indeed I now have a longer list of results. Now we have all shows where you checked at least the Comedy box.
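The difference between testing for equality and testing with LIKE can be sketched as follows, using Python's sqlite3 module. The three rows and their genre strings are invented for illustration; the comma-separated format mimics what the form export is described as producing.

```python
import sqlite3

# In-memory stand-in for the shows table (sample rows are invented).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT, genres TEXT)")
db.executemany(
    "INSERT INTO shows (title, genres) VALUES (?, ?)",
    [("The Office", "Comedy"),
     ("Friends", "Comedy, Romance"),
     ("Breaking Bad", "Crime, Drama")],
)

# Equality only matches rows whose genres value is exactly "Comedy".
exact = [row[0] for row in db.execute(
    "SELECT title FROM shows WHERE genres = 'Comedy' ORDER BY title")]

# LIKE with wildcards matches any row that mentions Comedy at all.
fuzzy = [row[0] for row in db.execute(
    "SELECT title FROM shows WHERE genres LIKE '%Comedy%' ORDER BY title")]

print(exact)
print(fuzzy)
```

The equality test misses the row where Comedy was one of several boxes checked, while the LIKE version picks it up.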
    • 1:54:29But unfortunately, this starts to get a little sloppy, because recall what the genres column looks like. Let me SELECT genres FROM shows, semicolon.
    • 1:54:37Notice that all of the genres that we loaded into this table from the CSV file are a comma-separated list of genres. That's just the way Google Forms did it.
    • 1:54:50And that's fine for CSV purposes. That's kind of fine for SQL purposes, but this is kind of messy.
    • 1:54:56Generally speaking, storing comma-separated lists of values in a SQL database is not what you should be doing. The whole point of using a SQL database is to move away from commas and CSVs and to actually store things more cleanly.
    • 1:55:11Because, in fact, let me propose a problem. Suppose I want to search not for Comedy but maybe also Music, like this, thereby allowing me to find any shows where the word "music" is somewhere in the comma-separated list.
    • 1:55:26There's a subtle bug here, and you might have to think back to where we began, the form that you pulled up. I can't show the whole thing here, but we started with Action, Adventure, Animation, Biography, dot dot dot, Music. Musical was also there. So a music video versus a musical are two distinct types of genres.
    • 1:55:49But notice my query at the moment. What's problematic with this? At the moment, we would seem to have a bug whereby this query will select not only Music but also Musical.
    • 1:56:01And so this is just where things are getting messy.
    • 1:56:04Now, yeah, you know what, we could kind of clean this up. Maybe we could put a comma here so that it can't just be Music-something; it has to be Music, comma.
    • 1:56:14But what if Music is the last box that you checked? Well, then it's Music, nothing; there is no comma. So now I need to OR things together. So maybe I have to do something like WHERE genres LIKE this, OR genres LIKE, quote-unquote, Music, like this.
    • 1:56:27But honestly, this is just getting messy. This is poorly designed. If you're just storing your data as a comma-separated list of values inside of a column, and you have to resort to this kind of hack to figure out, well, maybe it's over here or here or here, and thinking about all the permutations of syntax, you're doing it wrong. You're not using a SQL database to its fullest potential.
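The Music-versus-Musical bug, and the comma-based workaround, can be reproduced in a small sketch with Python's sqlite3 module. The show names and genre strings are hypothetical, and the two patterns in the second query are one plausible reading of the patterns described aloud, not a transcription of what was typed.

```python
import sqlite3

# In-memory stand-in for the shows table (rows are hypothetical).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT, genres TEXT)")
db.executemany(
    "INSERT INTO shows (title, genres) VALUES (?, ?)",
    [("Show A", "Music"),
     ("Show B", "Comedy, Musical"),
     ("Show C", "Music, Comedy"),
     ("Show D", "Drama")],
)

# Naive search: "Musical" contains "Music", so Show B is a false positive.
naive = [row[0] for row in db.execute(
    "SELECT title FROM shows WHERE genres LIKE '%Music%' ORDER BY title")]

# Workaround: require Music to be followed by a comma or by end of string.
patched = [row[0] for row in db.execute(
    "SELECT title FROM shows "
    "WHERE genres LIKE '%Music,%' OR genres LIKE '%Music' "
    "ORDER BY title")]

print(naive)
print(patched)
```

The patched query gives the right answer for these rows, but it depends on knowing every way the list might be punctuated, which is the design smell being pointed out.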
    • 1:56:48So how do we go about designing this thing better and actually load this CSV into a database a little more cleanly?
    • 1:56:55In short, how do we get rid of the stupid commas in the genres column and instead put one word (Comedy or Music or Musical) in each of those cells, so to speak? Not two, not three, one only, without throwing away some of those genres.
    • 1:57:12Well, let me introduce a few building blocks that'll get us there.
    • 1:57:15it turns out in sequel that when you blocks that'll get us there
    • 1:57:17want to it turns out in sequel that when you
    • 1:57:19create your own tables we can want to
    • 1:57:24sorry let me just think for a moment
    • 1:57:27it turns out when creating your own
    • 1:57:29tables and loading data into a database it turns out when creating your own
    • 1:57:31on your own tables and loading data into a database
    • 1:57:32we're going to need more than just on your own
    • 1:57:34select select of course is just for we're going to need more than just
    • 1:57:35reading select select of course is just for
    • 1:57:36but if we're going to do this better and reading
    • 1:57:38not just use sqlite 3's but if we're going to do this better and
    • 1:57:39built-in dot import command but instead not just use sqlite 3's
    • 1:57:43we're going to write some code built-in dot import command but instead
    • 1:57:44to load all of our data into maybe two we're going to write some code
    • 1:57:48tables to load all of our data into maybe two
    • 1:57:48one for the titles one for the genres tables
    • 1:57:51we're going to need a little more one for the titles one for the genres
    • 1:57:52expressiveness we're going to need a little more
    • 1:57:53when it comes to um sql and so for that expressiveness
    • 1:57:57we're going to need one the ability to when it comes to um sql and so for that
    • 1:57:59create our own tables and we've seen a we're going to need one the ability to
    • 1:58:00glimpse of this before create our own tables and we've seen a
    • 1:58:01but we're also going to need to see glimpse of this before
    • 1:58:02another piece of syntax as well so but we're also going to need to see
    • 1:58:04inserting another piece of syntax as well so
    • 1:58:05inserting is another command that you inserting
    • 1:58:07can execute inserting is another command that you
    • 1:58:08on a sql database in order to actually can execute
    • 1:58:11add data to a database on a sql database in order to actually
    • 1:58:13which is great because if i want to add data to a database
    • 1:58:15ultimately which is great because if i want to
    • 1:58:16iterate over that same csv but this time ultimately
    • 1:58:19manually
    • 1:58:20add all of the rows to the database
    • 1:58:23myself
    • 1:58:24well then i'm going to need some way of
    • 1:58:25inserting and the syntax for that is
    • 1:58:27as follows insert into the name of the
    • 1:58:29table
    • 1:58:30the column or columns that you want to
    • 1:58:32insert values into
    • 1:58:34then literally the word values and then
    • 1:58:36literally in parentheses again
    • 1:58:38the actual list of values so it's a
    • 1:58:40little abstract when we see it in this
    • 1:58:41generic form but we'll see this more
    • 1:58:43explicitly in just a moment here
    • 1:58:46as well so when it comes to inserting
    • 1:58:49something into a database let's go ahead
    • 1:58:50and try this so suppose that um
    • 1:58:53let's see what's what's a show that the
    • 1:58:56muppet show like i grew up loving the
    • 1:58:57muppet show it was out in like the 70s
    • 1:58:59and i don't think it was on the list but
    • 1:59:01i can check this for sure so
    • 1:59:03select star from shows where
    • 1:59:07uh title like let's just search for
    • 1:59:10muppets with a wild card
    • 1:59:12and i'm guessing no one put it there
    • 1:59:13good so it's a missed opportunity i
    • 1:59:15forgot to fill out the form
    • 1:59:16i could go back and fill out the form
    • 1:59:17and re-import the csv but let's go ahead
    • 1:59:19and do this manually so let me go ahead
    • 1:59:20and
    • 1:59:21insert into shows
    • 1:59:24what columns title and genres
    • 1:59:28and i guess i could do a time stamp just
    • 1:59:30for kicks
    • 1:59:31and then i'm going to insert what values
    • 1:59:33the values will be well i don't know
    • 1:59:35whatever
    • 1:59:35time it is now so i'm going to cheat
    • 1:59:37there just rather than look up the date
    • 1:59:38and the time
    • 1:59:39the title will be like the muppet show
    • 1:59:42and the genres will be it was kind of a
    • 1:59:45comedy it was kind of a musical
    • 1:59:47so we'll kind of leave it at that
    • 1:59:49semicolon
    • 1:59:50so again this follows the standard
    • 1:59:52syntax here of specifying the table you
    • 1:59:54want to insert into
    • 1:59:55the columns you want to insert into and
    • 1:59:57the values you want to put into those
    • 1:59:58columns
    • 1:59:59and i'm going to go ahead and hit enter
    • 2:00:00now nothing seems to have happened
    • 2:00:03but if i now select that same query
    • 2:00:09oh okay uh it's still nothing because i
    • 2:00:11made a subtle mistake
    • 2:00:13uh not i'm not searching for muppets
    • 2:00:15plural i'm searching for muppet
    • 2:00:16singular the muppet show voila now you
    • 2:00:19see my row
    • 2:00:20in this database and so insert would
    • 2:00:22give us the ability now to insert new
    • 2:00:24rows into the database
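The insert described above can be sketched in Python. This is only an illustration, not the lecture's own session: it uses Python's built-in sqlite3 module with an in-memory database, and the column names (Timestamp, title, genres) and the literal "now" are assumptions standing in for the imported form data. It also reproduces the plural versus singular search mistake.

```python
import sqlite3

# In-memory stand-in for the lecture's database; column names are assumptions.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (Timestamp TEXT, title TEXT, genres TEXT)")

# INSERT INTO table (columns) VALUES (values).
# The ? placeholders let sqlite3 handle the quoting of each value.
db.execute(
    "INSERT INTO shows (Timestamp, title, genres) VALUES (?, ?, ?)",
    ("now", "The Muppet Show", "Comedy, Musical"),
)

# LIKE with % wildcards: searching for the plural finds nothing,
# searching for the singular finds the new row.
plural = db.execute(
    "SELECT * FROM shows WHERE title LIKE '%Muppets%'"
).fetchall()
singular = db.execute(
    "SELECT * FROM shows WHERE title LIKE '%Muppet%'"
).fetchall()

print(plural)
print(singular)
```

The order of the values must match the order of the column names, which is why both are listed explicitly.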
    • 2:00:25suppose you want to update
    • 2:00:28something maybe you know some of the
    • 2:00:30muppet shows were actually pretty
    • 2:00:31dramatic so how might we do that
    • 2:00:33well i can say update shows set
    • 2:00:38let's see genres equal to comedy
    • 2:00:41drama musical where
    • 2:00:45title equals the muppet show
    • 2:00:48so again i'll pull up the canonical
    • 2:00:50syntax for this in a bit but for now
    • 2:00:52just a little teaser
    • 2:00:53you can update things pretty simply and
    • 2:00:55even though it takes a little getting
    • 2:00:56used to the syntax
    • 2:00:57it kind of does what it says update
    • 2:00:59shows set genres equal to this
    • 2:01:02where title equals that and now i can go
    • 2:01:04ahead and enter
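A minimal sketch of that update, again assuming Python's built-in sqlite3 module and a small made-up table rather than the lecture's real data. The second row is there only to show that the WHERE clause leaves other shows untouched.

```python
import sqlite3

# Hypothetical two-row table; the real one came from the imported CSV.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT, genres TEXT)")
db.executemany(
    "INSERT INTO shows (title, genres) VALUES (?, ?)",
    [("The Muppet Show", "Comedy, Musical"), ("The Office", "Comedy")],
)

# UPDATE table SET column = value WHERE condition.
# Without the WHERE clause, every row would be changed.
cursor = db.execute(
    "UPDATE shows SET genres = ? WHERE title = ?",
    ("Comedy, Drama, Musical", "The Muppet Show"),
)
changed = cursor.rowcount  # number of rows the UPDATE modified

rows = db.execute("SELECT title, genres FROM shows ORDER BY title").fetchall()
print(changed, rows)
```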
    • 2:01:05if i go ahead and select the same thing
    • 2:01:07just like in a terminal window you can
    • 2:01:08go up and down that's how i'm typing so
    • 2:01:10quickly
    • 2:01:10i'm just going up and down to previous
    • 2:01:12commands voila now i see that the muppet
    • 2:01:14show is a comedy a drama
    • 2:01:15drama and a musical well i i take
    • 2:01:18issue though with one of the more
    • 2:01:20popular shows that was in the list a
    • 2:01:22whole bunch of you liked um
    • 2:01:24let's say friends which i've never
    • 2:01:26really been a fan of and let me go ahead
    • 2:01:27and select
    • 2:01:28uh title from shows where
    • 2:01:32title equals friends and maybe i should
    • 2:01:35be a little more rigorous than that i
    • 2:01:37should say title
    • 2:01:38like friends just in case there was
    • 2:01:40different capitalizations enter
    • 2:01:42a lot of you really liked friends in
    • 2:01:44fact how many of you well recall that i
    • 2:01:46can do this i can say
    • 2:01:47count and i can let sql do the count
    • 2:01:49for me 26 of you i disagree with
    • 2:01:51strongly and there's a couple of you
    • 2:01:52that even added all the dots but we'll
    • 2:01:54deal with you later
    • 2:01:55so suppose i do take issue with this
    • 2:01:57well delete from
    • 2:01:59shows where title equals quote unquote
    • 2:02:02friends actually title like friends
    • 2:02:05let's get them all
    • 2:02:06enter and now if we select this again
    • 2:02:09i'm sorry friends has been cancelled so
    • 2:02:12you can again
    • 2:02:13update the you can execute these
    • 2:02:14fundamental commands of crud create read
    • 2:02:16update and delete
    • 2:02:17by using create or insert by using
    • 2:02:20select
    • 2:02:21by using update literally and delete
    • 2:02:23literally as well
    • 2:02:24and that's about it like even though
    • 2:02:25this was a lot quickly there really are
    • 2:02:27just those four fundamental operations
    • 2:02:29in sql plus some of these add-on
    • 2:02:31features like these additional functions
    • 2:02:33like
    • 2:02:33count that you can use and also some of
    • 2:02:36these keywords like where and the like
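The counting and deleting steps can be sketched the same way. The four titles below are made up for illustration (the lecture's table had 26 matching rows), and the sketch assumes SQLite's default behavior, in which LIKE ignores case for ASCII letters while = does not.

```python
import sqlite3

# Hypothetical submissions with inconsistent capitalization.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [("Friends",), ("friends",), ("FRIENDS",), ("The Office",)],
)

# = compares exactly; LIKE is case-insensitive for ASCII in SQLite.
exact = db.execute(
    "SELECT COUNT(*) FROM shows WHERE title = 'Friends'"
).fetchone()[0]
loose = db.execute(
    "SELECT COUNT(*) FROM shows WHERE title LIKE 'Friends'"
).fetchone()[0]

# DELETE FROM table WHERE condition removes every matching row.
db.execute("DELETE FROM shows WHERE title LIKE 'Friends'")
remaining = [row[0] for row in db.execute("SELECT title FROM shows")]

print(exact, loose, remaining)
```

A variant spelled with dots between the letters would not match this pattern, since LIKE 'Friends' contains no wildcards.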
    • 2:02:38well let me propose that we now do
    • 2:02:40better if we have the ability to select
    • 2:02:42data
    • 2:02:43and create tables and insert data let's
    • 2:02:45go ahead and write our own
    • 2:02:47python script that uses sql
    • 2:02:51as in a loop to read over my csv file
    • 2:02:54and to insert insert insert insert each
    • 2:02:56of the rows manually because honestly it
    • 2:02:58will take me forever to like manually
    • 2:03:00type out like hundreds of sql queries to
    • 2:03:02import all of your rows into a new
    • 2:03:03database i want to write a program that
    • 2:03:05does this instead and i'm going to
    • 2:03:07propose that we design it
    • 2:03:08in the following way i'm going to have
    • 2:03:11two tables this time
    • 2:03:13represented here with this artist's
    • 2:03:14rendition one is going to be called
    • 2:03:16shows
    • 2:03:17one is going to be called genres and
    • 2:03:19this is a
    • 2:03:20fundamental principle of designing
    • 2:03:23relational databases to figure out the
    • 2:03:26relationships among data
    • 2:03:28and to normalize your data to normalize
    • 2:03:31your data means to eliminate
    • 2:03:33redundancies to normalize your data
    • 2:03:35means to
    • 2:03:37eliminate mentions of the same words
    • 2:03:39again and again
    • 2:03:40and have just single sources of truth
    • 2:03:42for your data so to speak so what do i
    • 2:03:44mean by that
    • 2:03:45i'm going to propose that we instead
    • 2:03:47create a simpler table called shows
    • 2:03:49that has just two columns one is going
    • 2:03:51to be called id
    • 2:03:52which is new the other is going to be
    • 2:03:54called title as before honestly i don't
    • 2:03:56care about time stamps so we're just
    • 2:03:58going to throw that value away which is
    • 2:03:59another upside of writing our own
    • 2:04:01program we can add or remove any data we
    • 2:04:03want
    • 2:04:04for id i'm introducing this which is
    • 2:04:06going to be a unique identifier
    • 2:04:08literally a simple integer one
    • 2:04:10two three all the way up to a billion or
    • 2:04:11two billion however many favorites we
    • 2:04:13have
    • 2:04:14i'm just going to let this auto
    • 2:04:15increment as we go why
    • 2:04:17i propose that we move to another table
    • 2:04:21all of the genres and that instead of
    • 2:04:24having
    • 2:04:25one or two or three or five genres
    • 2:04:28in one column as a stupid comma
    • 2:04:30separated list
    • 2:04:31which is stupid only in the sense that
    • 2:04:33it's just messy right it means that i
    • 2:04:34have to run stupid commands where i'm
    • 2:04:36checking for the comma here the comma
    • 2:04:37there
    • 2:04:38it's very hackish so to speak bad design
    • 2:04:41instead of doing that
    • 2:04:42i'm going to create another table that
    • 2:04:44also has two columns
    • 2:04:46one is going to be called show_id and
    • 2:04:48the other is going to be called
    • 2:04:49genre and genre here is just going to be
    • 2:04:52a single word now
    • 2:04:54that column will contain single words
    • 2:04:56for genres like
    • 2:04:57comedy or music or musical but
    • 2:05:01we're going to associate all of those
    • 2:05:03genres
    • 2:05:04with the original show to which they
    • 2:05:06belong per your google form submissions
    • 2:05:09by using this show_id here so what does
    • 2:05:12this mean in particular
    • 2:05:14by adding to our first table shows this
    • 2:05:17unique identifier one two three four
    • 2:05:19five six
    • 2:05:20i can now refer to that same show
    • 2:05:24in a very efficient way using a very
    • 2:05:26simple number instead of redundantly
    • 2:05:28having the office the office the office
    • 2:05:30again and again i can refer to it by
    • 2:05:31just one
    • 2:05:32canonical number which is only going to
    • 2:05:33be like four bytes or 32 bits
    • 2:05:36pretty efficient but i can still
    • 2:05:38associate that show with
    • 2:05:39one genre or two or three or more or
    • 2:05:42even none
    • 2:05:44so in this way every row in our current
    • 2:05:47table
    • 2:05:48is going to become one or more rows
    • 2:05:51in our new pair of tables we're
    • 2:05:54factoring out the genres
    • 2:05:56so that we can add multiple rows for
    • 2:05:59every show potentially
    • 2:06:00but still remap those genres back to the
    • 2:06:04original show itself
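To make the factoring-out concrete, here is a small sketch in plain Python, with no database yet. The three submissions are invented sample data, and the names submissions, shows and genres are just illustrative variables mirroring the two proposed tables.

```python
# Hypothetical form submissions in the original flat shape:
# one row per submission, genres as a comma-separated string.
submissions = [
    {"title": "The Office", "genres": "Comedy, Drama, Romance"},
    {"title": "The Muppet Show", "genres": "Comedy, Musical"},
    {"title": "Untitled", "genres": ""},  # a show with no genres at all
]

shows = []   # rows of (id, title)
genres = []  # rows of (show_id, genre)

# Number the shows 1, 2, 3, ... the way an auto-incrementing id would.
for show_id, row in enumerate(submissions, start=1):
    shows.append((show_id, row["title"]))
    # Split the comma-separated list into one row per genre,
    # skipping empty pieces so a blank answer yields zero rows.
    for piece in row["genres"].split(","):
        genre = piece.strip()
        if genre:
            genres.append((show_id, genre))

print(shows)
print(genres)
```

Each title is now stored once, and every genre row points back to its show through the small integer alone.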
    • 2:06:06so what is some of the the buzzwords
    • 2:06:08here what's some of the the language
    • 2:06:10to be familiar with well we need to know
    • 2:06:13what kinds of
    • 2:06:14types are at our disposal here so for
    • 2:06:16that let me propose
    • 2:06:17this let me propose that we have
    • 2:06:21this list here it turns out in sqlite
    • 2:06:24there are five main data types and
    • 2:06:26that's a bit of an oversimplification
    • 2:06:27but there's five main data types
    • 2:06:29some of which look familiar a couple of
    • 2:06:30which are a little weird um
    • 2:06:32integer is a thing uh real
    • 2:06:35is the same thing as float so an integer
    • 2:06:37might be a 32 bit or four byte
    • 2:06:39value like one two three or four
    • 2:06:41positive or negative a real number is
    • 2:06:42going to have a decimal point in it a
    • 2:06:44floating point value probably 32 bits by
    • 2:06:46default but those kinds of things the
    • 2:06:47sizes of these types
    • 2:06:49vary by system just like they
    • 2:06:51technically did in c so do they vary by
    • 2:06:53system in the world of sql but generally
    • 2:06:55speaking these are good rules of thumb
    • 2:06:57text is just that it's sort of the
    • 2:06:58equivalent of a string of some length
    • 2:07:00but then in sqlite it turns out
    • 2:07:02there's two other data types we've not
    • 2:07:04seen before numeric and blob
    • 2:07:06but more on those in just a little bit
    • 2:07:07blob is binary large object it means you
    • 2:07:09can store zeros and ones in your
    • 2:07:11database
    • 2:07:11numeric is going to be something that's
    • 2:07:13number like but isn't a number per se
    • 2:07:15it's like a year or a time
    • 2:07:17something that has numbers but isn't
    • 2:07:19just a simple integer
    • 2:07:20at that and then we propose too that sqlite
    • 2:07:23is going to allow us to specify two
    • 2:07:25when we create our own columns manually
    • 2:07:27by executing the sql code ourselves we
    • 2:07:30can specify that a column
    • 2:07:32cannot be null thus far we've ignored
    • 2:07:34this but some of you might have uh taken
    • 2:07:36the fifth and just not given us the
    • 2:07:38title of a show or a genre
    • 2:07:39your answers might be blank uh some of
    • 2:07:42you maybe in registering for a website
    • 2:07:44don't want to provide information like
    • 2:07:45where you live or your phone number
    • 2:07:47so a database in general sometimes does
    • 2:07:49want to support null values
    • 2:07:51but you might want to say that it can't
    • 2:07:52be null a website probably needs your
    • 2:07:54email address needs your
    • 2:07:57password and a few other fields but not
    • 2:07:58everything and there's another keyword
    • 2:08:00in sql just so you've seen it called
    • 2:08:02unique
    • 2:08:03where you can additionally say that
    • 2:08:04whatever values are in this column
    • 2:08:06must be unique so a data a website might
    • 2:08:08also use that
    • 2:08:09if you want to make sure that the same
    • 2:08:11email address can't register for your
    • 2:08:13website multiple times
    • 2:08:14you just specify that the email column
    • 2:08:16is unique that way you can't put
    • 2:08:18multiple people
    • 2:08:18in with identical email addresses so
    • 2:08:21long story short this is just more of
    • 2:08:23the tools in our sql toolkit because
    • 2:08:25we'll see some of these now indirectly
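A sketch of the five types and the two constraints together, using Python's built-in sqlite3 module. The users table, its columns and the example.com address are hypothetical, chosen to match the website-registration example above rather than anything shown in the lecture. One SQLite-specific detail worth knowing: its documentation describes INTEGER values as occupying up to 8 bytes and REAL as an 8-byte floating point number, so the 32-bit figures are best read as general rules of thumb.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical registration table using all five types and both constraints.
db.execute(
    """
    CREATE TABLE users (
        id INTEGER,
        email TEXT NOT NULL UNIQUE,
        balance REAL,
        joined NUMERIC,
        avatar BLOB
    )
    """
)
db.execute(
    "INSERT INTO users (id, email, balance, joined, avatar) VALUES (?, ?, ?, ?, ?)",
    (1, "a@example.com", 1.5, "2020-10-19", b"\x00\x01"),
)

# typeof() reports how SQLite actually stored each value.
types = db.execute(
    "SELECT typeof(id), typeof(email), typeof(balance), typeof(avatar) FROM users"
).fetchone()

# A missing email violates NOT NULL; a repeated email violates UNIQUE.
errors = []
for email in (None, "a@example.com"):
    try:
        db.execute("INSERT INTO users (id, email) VALUES (?, ?)", (2, email))
    except sqlite3.IntegrityError as error:
        errors.append(str(error))

print(types)
print(errors)
```

Both rejected inserts raise sqlite3.IntegrityError, so the database itself enforces the rule instead of relying on every program that writes to it.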
    • 2:08:27and the last piece of jargon we need
    • 2:08:28before designing our own tables
    • 2:08:30is going to be this it turns out that in
    • 2:08:33sql
    • 2:08:34there's this notion of primary keys and
    • 2:08:36foreign keys and we've not seen this in
    • 2:08:38spreadsheets
    • 2:08:39odds are unless you've been working in
    • 2:08:41the real world for some years and you
    • 2:08:43have fairly fancy spreadsheets in front
    • 2:08:45of you as an analyst or financial person
    • 2:08:46or the like
    • 2:08:47odds are you've not seen keys or unique
    • 2:08:50identifiers in quite the same way
    • 2:08:52but they're relatively simple in fact
    • 2:08:54let me go back to
    • 2:08:56our picture before and propose that when
    • 2:08:59you have two tables like this
    • 2:09:02and you want to use a simple integer to
    • 2:09:04uniquely identify
    • 2:09:05all of the rows in one of the tables
    • 2:09:07that's called
    • 2:09:08technically an id that's what i'll call
    • 2:09:10it by convention you could call it
    • 2:09:11anything you want
    • 2:09:12but id just means it's a unique
    • 2:09:13identifier but
    • 2:09:15semantically this id is what's called a
    • 2:09:17primary key a primary key is the column
    • 2:09:20in a table that uniquely identifies
    • 2:09:23every row
    • 2:09:25this means you can have multiple
    • 2:09:26versions of the office in that
    • 2:09:28title field but each of those rows is
    • 2:09:31going to have its own number
    • 2:09:32uniquely potentially so primary key
    • 2:09:34uniquely identifies
    • 2:09:36each row in another table like genres
    • 2:09:39which i'm proposing we
    • 2:09:40create in just a moment it turns out
    • 2:09:43that you're
    • 2:09:44welcome to refer back to another table
    • 2:09:47by way of that unique identifier
    • 2:09:50but when it's in this context that id is
    • 2:09:53called a foreign key
    • 2:09:54so even though i've called it show_id
    • 2:09:56here that's just a convention in a lot
    • 2:09:58of sql databases to imply that
    • 2:10:00this is technically a column called id
    • 2:10:03in a table
    • 2:10:04called show or shows plural in this case
    • 2:10:07so if there's a number one here and
    • 2:10:10suppose that
    • 2:10:11the office has a unique id of one
    • 2:10:14we would have a row in this table called
    • 2:10:16id is one
    • 2:10:18title is the office the office might be
    • 2:10:20in
    • 2:10:21the comedy category the drama category
    • 2:10:24and the romance category so multiple
    • 2:10:26ones
    • 2:10:27therefore in the genres table we want to
    • 2:10:30output
    • 2:10:31three rows the number one one one
    • 2:10:34in each of those rows but the words
    • 2:10:36comedy drama
    • 2:10:38romance in each of those rows
    • 2:10:40respectively
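The Office example can be sketched with Python's built-in sqlite3 module. The exact CREATE TABLE statements are an assumption about how the two tables might be declared; the PRAGMA line is included because SQLite only enforces foreign keys when that setting is switched on for the connection.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# shows.id is the primary key; genres.show_id is a foreign key pointing at it.
db.execute("CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id))")
db.execute(
    "CREATE TABLE genres (show_id INTEGER, genre TEXT NOT NULL, "
    "FOREIGN KEY(show_id) REFERENCES shows(id))"
)

# The id is not supplied; SQLite assigns the next integer automatically.
office_id = db.execute(
    "INSERT INTO shows (title) VALUES (?)", ("The Office",)
).lastrowid

# One row per genre, each carrying the same show_id.
for genre in ("Comedy", "Drama", "Romance"):
    db.execute(
        "INSERT INTO genres (show_id, genre) VALUES (?, ?)", (office_id, genre)
    )

rows = db.execute(
    "SELECT show_id, genre FROM genres WHERE show_id = ? ORDER BY genre",
    (office_id,),
).fetchall()

# A genre row that points at a show that does not exist is refused.
try:
    db.execute("INSERT INTO genres (show_id, genre) VALUES (?, ?)", (99, "Horror"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(office_id, rows, rejected)
```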
    • 2:10:41so again the goal here is to just design
    • 2:10:43our database better not have these
    • 2:10:45stupid comma separated list of values
    • 2:10:47inside of a single column we want to
    • 2:10:49kind of blow that up explode it
    • 2:10:51into individual rows you might think
    • 2:10:54well why don't we just use multiple
    • 2:10:56columns
    • 2:10:56but again per our principle from
    • 2:10:59spreadsheets you should not be in the
    • 2:11:00habit of adding more and more columns
    • 2:11:02when the data is all the same like genre
    • 2:11:04genre genre
    • 2:11:05right the sort of stupid way to do this
    • 2:11:07in the spreadsheet world would be to
    • 2:11:08have one column called
    • 2:11:09genre one another column called genre
    • 2:11:12two another column called
    • 2:11:13genre three genre four and you can
    • 2:11:15imagine just how stupid and inefficient
    • 2:11:17this
    • 2:11:17is a lot of those columns are going to
    • 2:11:19be empty for shows with very few
    • 2:11:21genres and it's just kind of messy at
    • 2:11:24that point so better
    • 2:11:25in the world of relational databases to
    • 2:11:28have something like a second table
    • 2:11:30where you have multiple rows that
    • 2:11:32somehow link back to
    • 2:11:34that primary key by way of what we're
    • 2:11:36calling conceptually
    • 2:11:37a foreign key all right so let's go
    • 2:11:41ahead now and try to write this code let
    • 2:11:42me go back to my ide
    • 2:11:44let me quit out of sqlite now
    • 2:11:48and let me just move away i'm going to
    • 2:11:52move this away
    • 2:11:55my file for just a moment so that we're
    • 2:11:56only left with our original data
    • 2:11:58let's go about implementing a final
    • 2:12:00version of my python file
    • 2:12:02that does this creates two tables one
    • 2:12:05called
    • 2:12:05shows one called genres and then two
    • 2:12:08in a for loop iterates over that csv and
    • 2:12:11inserts some data into the shows and
    • 2:12:13other data into the genres
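The plan just outlined (create two tables, then loop over the CSV and insert into each) might look roughly like this. It is a sketch under stated assumptions, not the lecture's final program: it swaps in Python's built-in sqlite3 module where the lecture goes on to use the CS50 library, it reads from an in-memory list of made-up CSV lines instead of the real file, and the header names Timestamp, title and genres are assumed.

```python
import csv
import sqlite3

# Made-up stand-in for the exported form responses (header names assumed).
CSV_LINES = [
    "Timestamp,title,genres",
    '2020-01-01 00:00:00,The Office,"Comedy, Drama, Romance"',
    '2020-01-01 00:01:00,The Muppet Show,"Comedy, Musical"',
]

db = sqlite3.connect(":memory:")

# Step one: create the two tables.
db.execute("CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id))")
db.execute(
    "CREATE TABLE genres (show_id INTEGER, genre TEXT NOT NULL, "
    "FOREIGN KEY(show_id) REFERENCES shows(id))"
)

# Step two: iterate over the CSV, inserting one show row and
# one genre row per genre. The Timestamp column is simply ignored.
for row in csv.DictReader(CSV_LINES):
    show_id = db.execute(
        "INSERT INTO shows (title) VALUES (?)", (row["title"].strip(),)
    ).lastrowid
    for genre in row["genres"].split(", "):
        db.execute(
            "INSERT INTO genres (show_id, genre) VALUES (?, ?)", (show_id, genre)
        )
db.commit()

show_count = db.execute("SELECT COUNT(*) FROM shows").fetchone()[0]
genre_count = db.execute("SELECT COUNT(*) FROM genres").fetchone()[0]
print(show_count, genre_count)
```

The lastrowid attribute is what links the two inserts: it reports the id SQLite just assigned to the show, which then becomes the show_id of each genre row.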
    • 2:12:15how can we do this programmatically well other data into the genres
    • 2:12:17there's a final piece of the puzzle that how can we do this programmatically well
    • 2:12:19we need there's a final piece of the puzzle that
    • 2:12:19we need some way of bridging the world we need
    • 2:12:21of python and sql we need some way of bridging the world
    • 2:12:23and here we do need a library because it of python and sql
    • 2:12:24would just be way too painful to do and here we do need a library because it
    • 2:12:26without a library would just be way too painful to do
    • 2:12:27it can be cs50 cs50s as we'll see makes without a library
    • 2:12:30this very simple it can be cs50 cs50s as we'll see makes
    • 2:12:31there are other third-party commercial this very simple
    • 2:12:33and open source libraries that you can there are other third-party commercial
    • 2:12:34also use in the real world as well and open source libraries that you can
    • 2:12:36that do the same thing but the syntax is also use in the real world as well
    • 2:12:38a little uh less friendly that do the same thing but the syntax is
    • 2:12:39so we'll start by using the cs50 library a little uh less friendly
    • 2:12:41which in python recall has functions so we'll start by using the cs50 library
    • 2:12:43like getstring and getint and getfloat which in python recall has functions
    • 2:12:45but today it also has support it turns like getstring and getint and getfloat
    • 2:12:48out but today it also has support it turns
    • 2:12:48for sql capabilities as well so i'm out
    • 2:12:51going to go back to my favorites file for sql capabilities as well so i'm
    • 2:12:53and i'm going to import not only csv but going to go back to my favorites file
    • 2:12:56i'm also going to import and i'm going to import not only csv but
    • 2:12:58from the cs50 library a feature i'm also going to import
    • 2:13:01called sql so we have a a variable if from the cs50 library a feature
    • 2:13:04you will called sql so we have a a variable if
    • 2:13:05inside of the cs50 library or rather a you will
    • 2:13:07function inside of the cs50 library inside of the cs50 library or rather a
    • 2:13:09called sql function inside of the cs50 library
    • 2:13:11that if i call it will allow me to load called sql
    • 2:13:13a sql lite that if i call it will allow me to load
    • 2:13:14database into memory so how do i do this a sql lite
    • 2:13:17well let me go ahead and add a couple of
    • 2:13:18new lines of code
    • 2:13:19let me go ahead and open up
    • 2:13:23a file called shows.db
    • 2:13:26but this time in write mode and then
    • 2:13:28just for kick just for now rather i'm
    • 2:13:30going to go ahead and close it right
    • 2:13:31away
    • 2:13:32this is a pythonic way of creating an
    • 2:13:34empty file it's kind of stupid looking
    • 2:13:36but by opening a file called shows.db in
    • 2:13:39write mode and then immediately closing
    • 2:13:41it
    • 2:13:41it has the effect of creating the file
    • 2:13:43closing the file so i now have an empty
    • 2:13:45file with which to interact
    • 2:13:47i could also do this as an aside by
    • 2:13:49doing this touch
    • 2:13:50shows.db touch kind of a strange
    • 2:13:53command
    • 2:13:54but in a terminal window it means to
    • 2:13:56create a file
    • 2:13:57if it doesn't exist so we could also do
    • 2:13:59that instead but that would be on
    • 2:14:01uh that would be independent of python
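A minimal sketch of the empty-file idiom just described, assuming a throwaway temporary directory so that no real shows.db is overwritten. The names `workdir`, `db_path`, and `other_path` are illustrative and not from the lecture, and `Path.touch` is included only as a rough Python counterpart of the terminal's touch command.

```python
import os
import tempfile
from pathlib import Path

# Work in a temporary directory (an assumption, so nothing real is overwritten).
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "shows.db")

# Opening in write mode creates the file (or empties an existing one);
# closing it right away leaves an empty file behind.
open(db_path, "w").close()
size_after_open = os.path.getsize(db_path)

# Path.touch() creates the file only if it does not already exist,
# much like running `touch other.db` in a terminal window.
other_path = Path(workdir) / "other.db"
other_path.touch()
```

One difference worth keeping in mind: opening in write mode empties a file that already exists, whereas touch leaves existing contents alone.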
    • 2:14:03so once i've created this file
    • 2:14:05let me go ahead and open the file now as
    • 2:14:08a
    • 2:14:08sqlite database i'm going to declare a
    • 2:14:10variable called db for database
    • 2:14:12i'm going to use the sql function from
    • 2:14:14cs50's library
    • 2:14:16and i'm going to open via somewhat
    • 2:14:17cryptic string this
    • 2:14:19sqlite colon slash slash slash
    • 2:14:22shows.db now it looks like a url
    • 2:14:26http colon slash but it's sqlite
    • 2:14:29instead and there's three
    • 2:14:31slashes instead of the usual two but
    • 2:14:33this line of code line 6 has the result
    • 2:14:35of opening now that otherwise empty file
    • 2:14:38with nothing
    • 2:14:38in it yet as being a sqlite database
    • 2:14:42using cs50's library why did i do that
    • 2:14:46well i did that because i now want to
    • 2:14:48create my first table
    • 2:14:50let me go ahead and execute db.execute
    • 2:14:53so there's a function called execute
    • 2:14:54inside of the cs50 sql library
    • 2:14:57and i'm going to go ahead and run this
    • 2:14:59create table
    • 2:15:00called shows the type of which
    • 2:15:03the columns of which are an id which is
    • 2:15:06going to be an integer
    • 2:15:07a title which is going to be text the
    • 2:15:10primary key
    • 2:15:11of which is going to be the id column
    • 2:15:14so this is a bit cryptic but let's see
    • 2:15:16what's happening
    • 2:15:17i seem to now in line eight be combining
    • 2:15:21python with sql and this is where now
    • 2:15:24like programming gets really
    • 2:15:25powerful fancy cool difficult whatever
    • 2:15:28however you want to perceive it
    • 2:15:29i can actually use one language inside
    • 2:15:31of another how well sql is just a bunch
    • 2:15:33of textual commands
    • 2:15:34up until now i've been typing them out
    • 2:15:36manually in this program called
    • 2:15:37sqlite3
    • 2:15:38there's nothing stopping me though from
    • 2:15:40storing those same commands in python
    • 2:15:42strings
    • 2:15:43and then passing them to a database
    • 2:15:45using code
    • 2:15:46the code i'm using is a function called
    • 2:15:48execute and its purpose in life and cs50
    • 2:15:51staff wrote this
    • 2:15:52is to pass the argument
    • 2:15:55from your python code into the database
    • 2:15:58for execution
    • 2:15:59so it's like the programmatic way of
    • 2:16:02just typing things manually at the
    • 2:16:04sqlite prompt a few minutes ago
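A rough sketch of storing a SQL command in a Python string and passing it to a database. The lecture does this with the CS50 library's db.execute on a database opened with SQL("sqlite:///shows.db"); the version below is an assumption that swaps in Python's built-in sqlite3 module and an in-memory database, so it runs without the CS50 library and leaves no file behind.

```python
import sqlite3

# In-memory database: an assumption so the sketch leaves no file on disk.
db = sqlite3.connect(":memory:")

# The same text one could type at the sqlite3 prompt, stored in a Python string.
create_shows = "CREATE TABLE shows (id INTEGER, title TEXT, PRIMARY KEY(id))"

# Hand the string to the database for execution.
db.execute(create_shows)

# Ask SQLite which tables now exist.
tables = [row[0] for row in db.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
```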
    • 2:16:06so that's going to go ahead and create
    • 2:16:07my table called shows in which i'm going
    • 2:16:09to store all of those unique ids
    • 2:16:11and also the titles and then let me do
    • 2:16:13this again db.execute
    • 2:16:16create table genres and that's going to
    • 2:16:20have a column called
    • 2:16:21show_id which is an integer also genre
    • 2:16:24which is text and lastly it's going to
    • 2:16:27have a foreign key it's going to wrap
    • 2:16:29onto two
    • 2:16:30it's going to wrap a little long here on
    • 2:16:32show_id
    • 2:16:34which references the shows table id
    • 2:16:37all right so this is a lot so let's just
    • 2:16:39recap left to right
    • 2:16:40db.execute is my python function that
    • 2:16:42executes any sql i want
    • 2:16:44create table genres creates a table
    • 2:16:46called genres the columns
    • 2:16:48in that table will be something called
    • 2:16:50show_id which is an integer
    • 2:16:52and genre which is a text field but it's
    • 2:16:55going to be one genre at a time
    • 2:16:56not multiple and then here i'm
    • 2:16:59specifying a foreign key
    • 2:17:01will be the show_id column which happens
    • 2:17:04to refer back to
    • 2:17:05the shows table's id column
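The two tables recapped above might be created as sketched below. This again assumes Python's built-in sqlite3 module standing in for the CS50 library's db.execute, and an in-memory database; the PRAGMA query at the end is an addition for illustration, used only to confirm what the foreign key refers to.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # assumption: in-memory stand-in for shows.db

# One row per show, identified by id.
db.execute("CREATE TABLE shows (id INTEGER, title TEXT, PRIMARY KEY(id))")

# One row per (show, genre) pair; show_id refers back to shows.id.
db.execute(
    "CREATE TABLE genres ("
    "show_id INTEGER, "
    "genre TEXT, "
    "FOREIGN KEY(show_id) REFERENCES shows(id))"
)

# Each row of foreign_key_list is:
# (id, seq, table, from, to, on_update, on_delete, match)
fk_rows = db.execute("PRAGMA foreign_key_list(genres)").fetchall()
fk_table, fk_from, fk_to = fk_rows[0][2], fk_rows[0][3], fk_rows[0][4]
```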
    • 2:17:09uh brian question or comment
    • 2:17:16brian over to you uh never mind no okay
    • 2:17:19did i just fix the bug
    • 2:17:20yes okay brian's very good at secretly
    • 2:17:22messaging me when i screwed up but i saw
    • 2:17:23it first so
    • 2:17:24well i knew that something was wrong all
    • 2:17:26right so it's a little cryptic but all
    • 2:17:28this is doing is implementing for us
    • 2:17:30the equivalent of this picture here i
    • 2:17:32could have manually typed both of these
    • 2:17:34sql commands at that blinking prompt
    • 2:17:36but again no i want to write a program
    • 2:17:38now in python that creates the tables
    • 2:17:40for me and now
    • 2:17:41more interestingly loads the data into
    • 2:17:45that database so let's go ahead and
    • 2:17:47do this now
    • 2:17:48i'm not going to select a title from the
    • 2:17:50user because i want to import everything
    • 2:17:51i'm not going to use any counting or
    • 2:17:53anything like that so let's go ahead and
    • 2:17:54just go inside of my loop as before
    • 2:17:57and this time let's go ahead and for
    • 2:18:00row in reader let's go ahead and get the
    • 2:18:02current title as we've always done
    • 2:18:04but let's also as always go ahead and
    • 2:18:06strip it of white space
    • 2:18:08and capitalize it just to canonicalize
    • 2:18:10it and now i'm going to go ahead and
    • 2:18:12execute db.execute
    • 2:18:13quote unquote insert into
    • 2:18:17shows the title column the value of
    • 2:18:20quote-unquote uh title so i want to put
    • 2:18:24the title here it turns out
    • 2:18:26that sql libraries like ours support
    • 2:18:29one a final piece of syntax which is a
    • 2:18:31placeholder in c
    • 2:18:32we used percent s in python we just used
    • 2:18:35curly braces and put the word right
    • 2:18:36there
    • 2:18:37in sql we have a third approach to the
    • 2:18:39same problem just syntactically
    • 2:18:40different but
    • 2:18:41uh conceptually the same you put a
    • 2:18:43question mark where you want to put a
    • 2:18:44placeholder
    • 2:18:45and then outside of this string i'm
    • 2:18:47going to actually type in
    • 2:18:49the value that i want to plug into that
    • 2:18:51question mark so this is so similar to
    • 2:18:53printf in week one
    • 2:18:54but instead of percent s it's a question
    • 2:18:57mark now
    • 2:18:58and then a comma separated list of the
    • 2:18:59arguments you want to plug in for those
    • 2:19:01placeholders
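The question-mark placeholder can be sketched as follows, assuming Python's built-in sqlite3 module in place of the CS50 library. The sample title is made up, and `.upper()` is an assumption about how the title is capitalized. Note one difference in calling convention: sqlite3 takes the values as a single tuple, whereas the lecture's db.execute takes them as additional comma-separated arguments.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # assumption: in-memory stand-in for shows.db
db.execute("CREATE TABLE shows (id INTEGER, title TEXT, PRIMARY KEY(id))")

# Canonicalize a (made-up) title: strip whitespace, then uppercase it.
title = "  the office ".strip().upper()

# The ? marks where a value goes; the value itself is passed outside the string.
db.execute("INSERT INTO shows (title) VALUES (?)", (title,))

stored = db.execute("SELECT title FROM shows").fetchall()
```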
    • 2:19:02so now this line of code 16 has just
    • 2:19:06inserted
    • 2:19:07all of those values into my database and
    • 2:19:08let's go ahead and run this before i go
    • 2:19:10any further
    • 2:19:10let me go ahead and do this i'm going to
    • 2:19:12go ahead now and run python
    • 2:19:14favorites.py and cross my fingers as always
    • 2:19:17it's taking a moment
    • 2:19:18taking a moment that's because there's a
    • 2:19:20decent sized file there
    • 2:19:22or i screwed up take
    • 2:19:25this is taking too long what did i do
    • 2:19:28wrong
    • 2:19:30uh-huh checking
    • 2:19:36okay this is inexplicably taking too
    • 2:19:38long oh interesting
    • 2:19:42it's getting bigger and bigger for some
    • 2:19:44reason
    • 2:19:46uh oh okay i should have just been more
    • 2:19:49patient all right
    • 2:19:50so it just seems my connection's a
    • 2:19:52little slow so
    • 2:19:54uh as i expected everything is 100 percent
    • 2:19:56correct and it's working fine
    • 2:19:57so now let's go ahead and see what i
    • 2:19:59actually did if i type ls
    • 2:20:01notice that i have a file called
    • 2:20:03shows.db this is brand new because my
    • 2:20:05python program created it this time
    • 2:20:07let's go ahead and run sqlite3 of
    • 2:20:09shows.db just so i can now see what's
    • 2:20:11inside of it
    • 2:20:12notice that i can do dot schema just to
    • 2:20:14see what tables exist
    • 2:20:16and indeed the two tables that i created
    • 2:20:19in my python code seem to exist
    • 2:20:21but notice that there's if i do select
    • 2:20:24star from shows let's see all the data
    • 2:20:28voila there is a table that's been
    • 2:20:30programmatically created
    • 2:20:32and it has notice this time no
    • 2:20:34timestamps no genres
    • 2:20:35but it has an id on the left and the
    • 2:20:38title
    • 2:20:38on the right and amazingly all of the
    • 2:20:41ids
    • 2:20:42are monotonically increasing from 1 on
    • 2:20:44up to 513 in this case why is that
    • 2:20:47well one of the features you get in a
    • 2:20:48sql database is if you define a column
    • 2:20:51as being a primary key in sqlite
    • 2:20:53it's going to be auto-incremented for
    • 2:20:55you recall that nowhere in my code
    • 2:20:57did i even have a line an integer
    • 2:21:01inputting 1 then 2 then 3 i could
    • 2:21:03absolutely do that
    • 2:21:04i could have done something like this
    • 2:21:05counter uh rather
    • 2:21:07i could have done something like this
    • 2:21:09counter equals one
    • 2:21:11and then down here i could have said uh
    • 2:21:13id comma title
    • 2:21:15give myself two placeholders and then
    • 2:21:17pass in the counter each time
    • 2:21:18i could have implemented this myself and
    • 2:21:20then on each iteration done counter plus
    • 2:21:22equals one but with sql databases as
    • 2:21:24we've seen
    • 2:21:25you get a lot more functionality built
    • 2:21:27in i don't have to do any of that
    • 2:21:29because if i've declared that id as
    • 2:21:31being a primary key
    • 2:21:33sqlite is going to insert it for me and
    • 2:21:36increment it also for me as well all
    • 2:21:39right
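A small sketch of the automatic ids described above, assuming Python's built-in sqlite3 module in place of the CS50 library and three made-up titles. In SQLite this behavior applies when the primary key column is declared INTEGER, as id is here; no counter variable is needed.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # assumption: in-memory stand-in for shows.db
db.execute("CREATE TABLE shows (id INTEGER, title TEXT, PRIMARY KEY(id))")

# Only the title is supplied; SQLite fills in id on its own.
for title in ["THE OFFICE", "FRIENDS", "THE OFFICE"]:
    db.execute("INSERT INTO shows (title) VALUES (?)", (title,))

rows = db.execute("SELECT id, title FROM shows ORDER BY id").fetchall()
```

Duplicate titles are kept as they are; each simply receives its own id.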
    • 2:21:39so if i go back to sqlite though
    • 2:21:41notice that i do have ids and titles but
    • 2:21:43if i select star from
    • 2:21:45genres there's of course nothing there
    • 2:21:47yet so how now
    • 2:21:48do i get all of the genres for each of
    • 2:21:50these shows in i need to finish my
    • 2:21:52script
    • 2:21:53so inside of this same loop i have not
    • 2:21:55only the title
    • 2:21:56in my current row but i also have genres
    • 2:22:00in the current row
    • 2:22:01but the genres are separated by commas
    • 2:22:04recall that in the csv
    • 2:22:06next to every title there's a comma
    • 2:22:07separated list of genres so how do
    • 2:22:09i get at each genre
    • 2:22:11individually well i'd like to be able to
    • 2:22:13say for
    • 2:22:14genre in row bracket
    • 2:22:17genres but this is not going to work
    • 2:22:20because that's not
    • 2:22:21going to be split up based on those
    • 2:22:24commas that's literally just going to
    • 2:22:25iterate over
    • 2:22:26in fact all of the characters in that
    • 2:22:28string as we saw last week
    • 2:22:30but it turns out that strings in python
    • 2:22:32have a fancy split function
    • 2:22:34whereby i can split on a comma followed
    • 2:22:37by a space
    • 2:22:38and what this function will do for me in
    • 2:22:40python is take a comma separated list of
    • 2:22:42genres
    • 2:22:43and explode it so to speak split it
    • 2:22:46on every comma space into a python
    • 2:22:50list containing genre after
    • 2:22:53genre in an actual python list a la
    • 2:22:56square brackets
    • 2:22:57so now i can iterate over that list
    • 2:23:00of individual genres and inside of here
    • 2:23:02i can do db.execute
    • 2:23:03insert into genres
    • 2:23:06show_id genre the values
    • 2:23:10question mark question mark but huh
    • 2:23:14there's a problem i can definitely plug
    • 2:23:16in the current genre which is this
    • 2:23:19but i need to put something here still
    • 2:23:22for that first question mark i need a
    • 2:23:24value for the show_id
    • 2:23:26how do i know what the id is of the
    • 2:23:29current tv show well it turns out the
    • 2:23:31library can help you with this
    • 2:23:33when you insert new rows into a table
    • 2:23:36that has a primary key
    • 2:23:38it turns out that most libraries will
    • 2:23:40return you that value in some way
    • 2:23:42and if i go back to line 15 and i
    • 2:23:45actually store the return
    • 2:23:46value of db.execute after using insert
    • 2:23:50the library will tell me what was the
    • 2:23:53integer that was just used for this
    • 2:23:55given show maybe it's one two three
    • 2:23:57i don't have to know or care as the
    • 2:23:58programmer but the return value i can
    • 2:24:00store in a variable
    • 2:24:01and then down here i can literally put
    • 2:24:04that same id
    • 2:24:06so that now if i am inputting the office
    • 2:24:08whose id is one
    • 2:24:09into the shows table and its genres are
    • 2:24:12comedy drama
    • 2:24:13romance i can now inside of this for
    • 2:24:15loop this nested for loop
    • 2:24:17insert one followed by comedy
    • 2:24:20one followed by drama one followed by
    • 2:24:23romance
    • 2:24:24three rows all at once
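The whole loop might look roughly like the sketch below. Several things here are assumptions: Python's built-in sqlite3 module replaces the CS50 library, so the new row's id comes from the cursor's `lastrowid` attribute rather than from db.execute's return value; the two CSV rows are made up; and the column names title and genres are taken to match the survey file.

```python
import csv
import io
import sqlite3

# Two made-up rows standing in for the survey CSV (column names assumed).
sample_csv = io.StringIO(
    'title,genres\n'
    'The Office,"Comedy, Drama, Romance"\n'
    'friends ,Comedy\n'
)

db = sqlite3.connect(":memory:")  # assumption: in-memory stand-in for shows.db
db.execute("CREATE TABLE shows (id INTEGER, title TEXT, PRIMARY KEY(id))")
db.execute(
    "CREATE TABLE genres (show_id INTEGER, genre TEXT, "
    "FOREIGN KEY(show_id) REFERENCES shows(id))"
)

for row in csv.DictReader(sample_csv):
    title = row["title"].strip().upper()

    # Insert the show and remember which id the database assigned to it.
    cursor = db.execute("INSERT INTO shows (title) VALUES (?)", (title,))
    show_id = cursor.lastrowid

    # Nested loop: one genres row per genre, all pointing at the same show.
    for genre in row["genres"].split(", "):
        db.execute(
            "INSERT INTO genres (show_id, genre) VALUES (?, ?)",
            (show_id, genre),
        )

show_rows = db.execute("SELECT id, title FROM shows ORDER BY id").fetchall()
genre_rows = db.execute(
    "SELECT show_id, genre FROM genres ORDER BY rowid"
).fetchall()
```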
    • 2:24:27and so now let's go back down here
    • 2:24:29into my terminal window let me remove
    • 2:24:31the old shows.db with rm
    • 2:24:33just to start fresh let me go ahead and
    • 2:24:35rerun
    • 2:24:37python of favorites.py i'll be more
    • 2:24:39patient this time because
    • 2:24:41the cloud's being a little slow so it's
    • 2:24:43doing some thinking and in fact there's
    • 2:24:44more work being done now
    • 2:24:46at this point in the story my program is
    • 2:24:48presumably iterating over all of the
    • 2:24:50rows
    • 2:24:51in the csv and it's inserting into
    • 2:24:54the shows table one at a time and then
    • 2:24:56it's inserting
    • 2:24:57one or more genres into
    • 2:25:00the genres table and
    • 2:25:04if we keep going and going let me let
    • 2:25:07that
    • 2:25:07do its thing there let me let it
    • 2:25:10do its
    • 2:25:14thing there
    • 2:25:15it's a little slow if we're on a faster
    • 2:25:17system or if i were doing it on my own
    • 2:25:18mac or pc it would probably go
    • 2:25:20down more quickly but you can see here
    • 2:25:22an example of why i used the dot import
    • 2:25:24command in the first place that
    • 2:25:25automated some of this process but
    • 2:25:27unfortunately it didn't allow me to
    • 2:25:28change
    • 2:25:29the format of my data but the key point
    • 2:25:31to make here
    • 2:25:32is that even though this is taking a
    • 2:25:34little bit of time to insert these
    • 2:25:35hundreds of rows all at once
    • 2:25:37i'm only going to have to do this once
    • 2:25:39and what was asked a bit ago
    • 2:25:41was the performance of this it turns out
    • 2:25:43that now that we have full control
    • 2:25:46over the sql database it turns out we're
    • 2:25:48going to have the ability to
    • 2:25:50actually um improve the performance
    • 2:25:53thereof and i really don't want to keep
    • 2:25:55stalling here i really just want this
    • 2:25:57thing to finish i should have used a
    • 2:25:58faster computer because this does not
    • 2:26:00take nearly as long on
    • 2:26:02some other systems but let me just stall
    • 2:26:05for a few more seconds
    • 2:26:07stalling stalling stalling stalling
    • 2:26:11well let me uh suspend our
    • 2:26:14suspense for just a moment and use the
    • 2:26:17time wisely let me switch over
    • 2:26:19oh okay as expected it finished right on
    • 2:26:22time and let me go ahead now and run
    • 2:26:24sqlite3
    • 2:26:25on shows.db all right so now i'm back in
    • 2:26:28my raw sql environment
    • 2:26:30if i do select star from shows which i
    • 2:26:32did before
    • 2:26:33we'll see all of this as before if i
    • 2:26:35select star from shows where
    • 2:26:37title equals quote unquote the office
    • 2:26:39i'll see the actual
    • 2:26:40unique ids of all of those we didn't
    • 2:26:42bother eliminating duplicates we just
    • 2:26:44kept everything as
    • 2:26:45is but we gave everything a unique id
    • 2:26:47but if i now do select star
    • 2:26:49from um genres
    • 2:26:53we'll see all of the values there and
    • 2:26:54notice the key detail
    • 2:26:56there is only one genre per row here
    • 2:27:00and so we can ultimately line those up
    • 2:27:02with our titles and our titles here we
    • 2:27:04had all of these here
    • 2:27:12um something's wrong
    • 2:27:16sorry let me think for a while i want to
    • 2:27:19get this right
    • 2:27:19let's go ahead and take our second and
    • 2:27:20final five-minute break here and we'll
    • 2:27:22come back and i will explain
    • 2:27:23what's going on
    • 2:32:08all right we are back and just before we
    • 2:32:11broke up my own self-doubt was starting
    • 2:32:13to creep in but i'm happy to say with no
    • 2:32:15fancy magic behind the scenes everything
    • 2:32:17was actually working fine i was just
    • 2:32:19doubting
    • 2:32:19the correctness of this if i do select
    • 2:32:21star from shows
    • 2:32:22i indeed get back two columns one with
    • 2:32:24the unique id
    • 2:32:26the so-called primary key followed by
    • 2:32:28the title of each of those shows
    • 2:32:30and if i similarly select star from
    • 2:32:33genres
    • 2:32:34i get single genres at a time but on the
    • 2:32:36left hand side are not primary
    • 2:32:38keys per se but now those same numbers
    • 2:32:41here in this context called foreign keys
    • 2:32:43that map one to the other so for
    • 2:32:45instance whatever show
    • 2:32:47512 is has five different genres
    • 2:32:51associated with it and in fact if i
    • 2:32:52go back a moment to shows
    • 2:32:54it looks like game of thrones was
    • 2:32:56decided by one of you
    • 2:32:57as belonging in thriller history
    • 2:33:00adventure
    • 2:33:00action and war as well
    • 2:33:04those five so now this is what's meant action and war as well
    • 2:33:06by relational database you have this those five so now this is what's meant
    • 2:33:08relation or relationship by relational database you have this
    • 2:33:10across multiple tables that link some relation or relationship
    • 2:33:12data in one across multiple tables that link some
    • 2:33:13to some other data in the like the catch data in one
    • 2:33:16though is that it would seem a little to some other data in the like the catch
    • 2:33:17harder now to answer though is that it would seem a little
    • 2:33:18questions because now i have to kind of harder now to answer
    • 2:33:20query two tables or questions because now i have to kind of
    • 2:33:22execute two separate queries and then query two tables or
    • 2:33:23combine the data but that's not actually execute two separate queries and then
    • 2:33:25the case combine the data but that's not actually
    • 2:33:26 suppose that i want to answer the
    • 2:33:28 question of what are all of the musicals
    • 2:33:31 among your favorite tv shows
    • 2:33:33 i can't select just the shows because
    • 2:33:35 there's no genres in there anymore
    • 2:33:36 but i also can't select just the genres
    • 2:33:38 table because there's no
    • 2:33:39 titles in there but there is a value
    • 2:33:42 that's bridging
    • 2:33:43 one and the other that foreign key to
    • 2:33:45 primary key relationship
    • 2:33:47 so you know what i can do off the top of
    • 2:33:49 my head i'm pretty sure i can select all
    • 2:33:51 of the show ids
    • 2:33:52 from the genres table where a specific
    • 2:33:55 genre equals quote unquote musical
    • 2:33:57 and i don't have to worry about commas
    • 2:33:58 or spaces now because again in this new
    • 2:34:01 version
    • 2:34:01 that i have designed programmatically
    • 2:34:03 with code musical and every other genre
    • 2:34:06 is just a single word
    • 2:34:07 if i hit enter all of these show ids
    • 2:34:10 were decided by you all as belonging to
    • 2:34:13 musicals but now this is not interesting
    • 2:34:15 and i certainly don't want to execute 10
    • 2:34:17 or so queries manually
    • 2:34:18 to look up every one of those ids but
    • 2:34:20 notice what we can do in sql as well
    • 2:34:23 i can nest queries let me put this whole
    • 2:34:25 query in parentheses for just a moment
    • 2:34:27 and then prepend to it the following
    • 2:34:29 select title
    • 2:34:30 from shows where the primary key
    • 2:34:33 id is in this subquery
    • 2:34:37 so you can have nested queries similar
    • 2:34:39 in spirit a bit like in python and c
    • 2:34:41 when you have nested for loops
    • 2:34:43 in this case just like in grade school
    • 2:34:44 math whatever's in the parentheses will
    • 2:34:46 be executed first
    • 2:34:47 then the outer query will be executed
    • 2:34:50 using
    • 2:34:50 the results of that inner query so if i
    • 2:34:53 select the title from shows where the id
    • 2:34:55 is in
    • 2:34:55 that list of ids voila it seems that
    • 2:34:59 somewhat amusingly several of you think
    • 2:35:02 that breaking bad supernatural glee
    • 2:35:04 sherlock how i met your mother hawaii
    • 2:35:05 5-0 twin peaks the lawyer
    • 2:35:07 and my brother my brother and me are all
    • 2:35:09 musicals i
    • 2:35:10 take exception to a few of those but so
    • 2:35:12 be it you checked the box for musical for
    • 2:35:14 those shows
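The nested query described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the in-memory database and its three rows of shows are made up, and the column names (id, title, show_id, genre) follow the way the tables are described in the lecture rather than a confirmed schema.

```python
import sqlite3

# Hypothetical in-memory stand-in for the favorites database; the rows are made up.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE genres (
        show_id INTEGER,
        genre TEXT NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id)
    );
    INSERT INTO shows (id, title) VALUES (1, 'glee'), (2, 'the office'), (3, 'sherlock');
    INSERT INTO genres (show_id, genre) VALUES
        (1, 'musical'), (1, 'comedy'), (2, 'comedy'), (3, 'musical');
""")

# The inner query yields the show ids tagged "musical";
# the outer query maps those ids back to titles.
rows = db.execute("""
    SELECT title FROM shows
    WHERE id IN (SELECT show_id FROM genres WHERE genre = ?)
    ORDER BY title
""", ("musical",)).fetchall()

musicals = [title for (title,) in rows]
print(musicals)
```

The genre is passed as a placeholder (the question mark) rather than pasted into the SQL string, which keeps the example safe if the value ever comes from user input.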
    • 2:35:15 so even though we've sort of done things
    • 2:35:17 we've designed things better in the
    • 2:35:19 sense that we've normalized our database
    • 2:35:21 by factoring out commonalities or rather
    • 2:35:23 we've cleaned up the data
    • 2:35:25 there's still admittedly some redundancy
    • 2:35:29 but i at least now have the data
    • 2:35:33 in clean fashion so that every column
    • 2:35:36 has just a single value in it and not
    • 2:35:38 some contrived comma separated list
    • 2:35:40 suppose i want to find out all of the
    • 2:35:41 genres that you all thought the office
    • 2:35:43 was in so let's
    • 2:35:44 ask kind of the opposite question well
    • 2:35:45 how might i do that well to figure out
    • 2:35:48 the office i'm going to first need to
    • 2:35:49 select
    • 2:35:50 the id from shows where title
    • 2:35:53 equals quote unquote the office because
    • 2:35:55 a whole bunch of you
    • 2:35:56 typed in the office and we gave each of
    • 2:35:58 your answers a unique identifier so we
    • 2:36:00 could keep track of it
    • 2:36:01 and there's all of those numbers now
    • 2:36:02 this is like dozens of responses i
    • 2:36:04 certainly don't want to execute that
    • 2:36:05 many queries
    • 2:36:06 but i think a subquery will help us out
    • 2:36:08 again let me put parentheses around this
    • 2:36:10 whole thing
    • 2:36:11 and now let me say select distinct
    • 2:36:14 genre from genres where
    • 2:36:18 the show id in the genres table is in
    • 2:36:21 that query and just for kicks let me go
    • 2:36:24 ahead and order by
    • 2:36:25 genre so let me go ahead and execute
    • 2:36:28 this
    • 2:36:29 and okay somewhat amusingly those of you
    • 2:36:31 who inputted the office
    • 2:36:32 checked boxes for animation comedy
    • 2:36:35 documentary drama family
    • 2:36:36 horror reality tv romance and sci-fi i
    • 2:36:39 take
    • 2:36:40 exception to a few of those too
    • 2:36:42 but this is what happens when you accept
    • 2:36:43 user
    • 2:36:44 input so here again we have with this
    • 2:36:47 sql
    • 2:36:48 language the ability to express fairly
    • 2:36:49 succinctly even though it's a lot of new
    • 2:36:52 features today all at once what would
    • 2:36:54 otherwise take me a dozen or two lines
    • 2:36:56 in python code to implement and god
    • 2:36:58 knows how many lines of code
    • 2:36:59 and how many hours it would take me to
    • 2:37:00 implement something like this in
    • 2:37:02 c now admittedly we could do better than
    • 2:37:05 this design
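The "opposite question" asked a moment ago, which genres were checked for the office, might look like the sketch below when run from Python. The data is invented for illustration (two separate responses both titled "the office", each with its own id), and the column names are assumed from the lecture's description of the two tables.

```python
import sqlite3

# Hypothetical sample data: two responses typed "the office", so it has two ids.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE genres (
        show_id INTEGER,
        genre TEXT NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id)
    );
    INSERT INTO shows (id, title) VALUES (1, 'the office'), (2, 'the office'), (3, 'glee');
    INSERT INTO genres (show_id, genre) VALUES
        (1, 'comedy'), (1, 'drama'), (2, 'comedy'), (2, 'documentary'), (3, 'musical');
""")

# Inner query: every id whose title is "the office".
# Outer query: the genres attached to any of those ids, deduplicated and sorted.
rows = db.execute("""
    SELECT DISTINCT genre FROM genres
    WHERE show_id IN (SELECT id FROM shows WHERE title = ?)
    ORDER BY genre
""", ("the office",)).fetchall()

office_genres = [genre for (genre,) in rows]
print(office_genres)
```

DISTINCT matters here because "comedy" was checked in both responses; without it the word would come back twice.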
    • 2:37:06 this table or this picture represents
    • 2:37:08 what we have now
    • 2:37:09 but you'll notice a lot of redundancy
    • 2:37:11 implicit in the genres table
    • 2:37:13 anytime you check the comedy box i have
    • 2:37:15 a row now that says comedy
    • 2:37:17 comedy comedy comedy and the show id
    • 2:37:20 differs but i have the word comedy again
    • 2:37:22 and again
    • 2:37:23 and now that tends to be frowned upon in
    • 2:37:25 the world of relational databases
    • 2:37:27 because if you have a
    • 2:37:28 genre called comedy or one called
    • 2:37:30 musical or anything else
    • 2:37:32 you should ideally just have that living
    • 2:37:34 in one place and so if we really wanted
    • 2:37:36 to be particular
    • 2:37:38 and really truly normalize this database
    • 2:37:40 which is an academic term referring to
    • 2:37:42 removing all such redundancies
    • 2:37:44 we could actually do it like this we
    • 2:37:46 could have a shows table still with an
    • 2:37:48 id and title
    • 2:37:49 no difference there but we could have a
    • 2:37:51 genres table
    • 2:37:52 with two columns id and name now this is
    • 2:37:55 its own id it has no connection with the
    • 2:37:57 show id it's just its own unique
    • 2:37:59 identifier
    • 2:38:00 a primary key here now and the name of
    • 2:38:02 that genre so you would have one row in
    • 2:38:04 the genres table for comedy for drama
    • 2:38:06 musical and everything else
    • 2:38:08 and then you would use a third table
    • 2:38:10 which is colloquially called a join
    • 2:38:12 table which i'll draw here in the middle
    • 2:38:15 and you can call it anything you want
    • 2:38:16 but we've called it shows
    • 2:38:18 underscore genres to make clear that
    • 2:38:20 this table implements a relationship
    • 2:38:21 between those two
    • 2:38:23 tables and notice that in this table is
    • 2:38:26 really no juicy data
    • 2:38:28 it's just foreign keys show id genre id
    • 2:38:32 and by having this third table we can
    • 2:38:35 now make sure that the word comedy only
    • 2:38:37 appears in one row
    • 2:38:38 anywhere the word musical only appears
    • 2:38:40 in one row anywhere
    • 2:38:42 but we use these more efficient integers
    • 2:38:44 called show id and genre id
    • 2:38:46 which respectively point to those
    • 2:38:49 primary keys
    • 2:38:50 in their primary tables to link those
    • 2:38:52 two together
    • 2:38:53 and this is an example of what's called
    • 2:38:55 in the world of databases a many-to-many
    • 2:38:57 relationship
    • 2:38:58 one show can have many genres one
    • 2:39:01 genre can belong to many shows and so by
    • 2:39:04 having this third table you can have
    • 2:39:05 that many-to-many relationship
    • 2:39:07 and again the third table now allows us
    • 2:39:09 to truly normalize our data set
    • 2:39:11 by getting rid of all of the duplicate
    • 2:39:13 comedy comedy comedy
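A minimal sketch of the three-table design just described, again using Python's sqlite3 module. The table and column names (shows, genres, shows_genres, show_id, genre_id) come from the description above; the rows, the deliberate 'comdy' typo, and the helper genres_of are hypothetical and exist only to show that a genre name lives in exactly one place.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Join table: nothing but foreign keys linking a show to a genre.
    CREATE TABLE shows_genres (
        show_id INTEGER NOT NULL,
        genre_id INTEGER NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id),
        FOREIGN KEY(genre_id) REFERENCES genres(id)
    );
    INSERT INTO shows (id, title) VALUES (1, 'the office'), (2, 'glee');
    -- 'comdy' is a deliberate spelling mistake, stored once.
    INSERT INTO genres (id, name) VALUES (1, 'comdy'), (2, 'musical');
    INSERT INTO shows_genres (show_id, genre_id) VALUES (1, 1), (2, 1), (2, 2);
""")


def genres_of(title):
    """Return the sorted genre names linked to every show with this title."""
    rows = db.execute("""
        SELECT name FROM genres
        WHERE id IN (
            SELECT genre_id FROM shows_genres
            WHERE show_id IN (SELECT id FROM shows WHERE title = ?)
        )
        ORDER BY name
    """, (title,)).fetchall()
    return [name for (name,) in rows]


# Fixing the typo changes a single row, yet every linked show sees the fix.
cursor = db.execute("UPDATE genres SET name = 'comedy' WHERE name = 'comdy'")
fixed_rows = cursor.rowcount

print(fixed_rows, genres_of("the office"), genres_of("glee"))
```

In the earlier two-table design the same correction would have to touch one row per checked box, which is the redundancy the join table removes.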
    • 2:39:15 why is this important probably not a
    • 2:39:17 huge deal for genres
    • 2:39:18 but imagine with my current design if i
    • 2:39:20 made a spelling mistake and i
    • 2:39:22 misnamed
    • 2:39:23 comedy i would now have to change every
    • 2:39:25 row with the word comedy again and again
    • 2:39:27 or if maybe you change the
    • 2:39:29 genres of the shows
    • 2:39:30 you have to change it in multiple places
    • 2:39:33 but with this other approach with three
    • 2:39:34 tables
    • 2:39:35 you can argue that now you only have to
    • 2:39:37 change the name of a genre
    • 2:39:38 in one place not all over the place and
    • 2:39:41 that in general in c
    • 2:39:42 and now in python and now sql has
    • 2:39:44 generally been a good thing
    • 2:39:45 not to copy-paste identical values all
    • 2:39:49 over the place all right so with that
    • 2:39:52 said
    • 2:39:53 what other tools do we have at our
    • 2:39:55 disposal well it turns out
    • 2:39:57 that there are other data types out
    • 2:39:59 there in the real world
    • 2:40:00 using sql besides just these five blob
    • 2:40:03 integer numeric
    • 2:40:04 real and text blob again is for binary
    • 2:40:06 stuff generally not used except for more
    • 2:40:08 specialized applications let's say
    • 2:40:10 integer which is an int typically 32
    • 2:40:12 bits numeric which is something like a
    • 2:40:13 date or a year or time
    • 2:40:15 or something like that real numbers
    • 2:40:17 which are floating point values and text
    • 2:40:19 which are things like strings but if you
    • 2:40:22 graduate ultimately from sqlite
    • 2:40:24 on phones and on macs and pcs to actual
    • 2:40:27 servers that run
    • 2:40:28 oracle mysql and postgres if you're
    • 2:40:30 actually running your own
    • 2:40:32 internet style business well it turns
    • 2:40:34 out that
    • 2:40:35 more sophisticated even more powerful
    • 2:40:38 databases
    • 2:40:39 come with other subtypes if you will so
    • 2:40:42 besides integer you can specify small
    • 2:40:44 int for small numbers maybe using just a
    • 2:40:46 few bits instead of
    • 2:40:48 32 integer itself or bigint which uses
    • 2:40:51 64 bits instead of 32
    • 2:40:53 the facebooks the twitters of the world
    • 2:40:54 need to use bigints a lot because they
    • 2:40:56 have so much
    • 2:40:57 data you and i can get away with simple
    • 2:40:59 integers because we're not going to have
    • 2:41:01 more than 4 billion favorite tv shows in
    • 2:41:03 a class certainly something like real
    • 2:41:05 you can have 32-bit real numbers or a
    • 2:41:07 little weirdly named
    • 2:41:08 double precision which is like a double
    • 2:41:11 was in c
    • 2:41:12 using 64 bits instead for more precision
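The "4 billion" figure mentioned above for a 32-bit integer can be checked with a little arithmetic. The helper names below are hypothetical, and the sketch assumes the usual two's-complement layout for signed values; the roughly 4 billion limit corresponds to treating all 32 bits as unsigned.

```python
def signed_range(bits):
    """Smallest and largest values of a two's-complement integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_max(bits):
    """Largest value when every bit is used for magnitude."""
    return 2 ** bits - 1


for bits in (32, 64):
    low, high = signed_range(bits)
    print(f"{bits}-bit signed: {low} to {high}; unsigned max: {unsigned_max(bits)}")
```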
    • 2:41:14 numeric is kind of this catch-all you
    • 2:41:16 can have not only dates and date times
    • 2:41:18 but things like boolean values
    • 2:41:20 you can specify the total number of
    • 2:41:22 digits to store using this numeric scale
    • 2:41:24 and precision so it relates to
    • 2:41:26 numbers that aren't just quite integers
    • 2:41:28 and then you also have categories of
    • 2:41:30 text
    • 2:41:31 char followed by a number which
    • 2:41:33 specifies that
    • 2:41:34 every value in the column will have the
    • 2:41:36 same number of characters
    • 2:41:38 that's helpful for things where you know
    • 2:41:39 the length in advance like in the u.s
    • 2:41:41 all 50 states have
    • 2:41:43 two character codes
    • 2:41:45 like ma for massachusetts ca for
    • 2:41:47 california
    • 2:41:48 char two would be appropriate there
    • 2:41:51 because you know every value in the
    • 2:41:52 column is going to have two
    • 2:41:53 characters when you don't know though
    • 2:41:55 you can use varchar
    • 2:41:56 and varchar specifies a maximum number
    • 2:41:58 of characters
    • 2:41:59 and so you might specify varchar of like
    • 2:42:02 32
    • 2:42:03 no one might be able to type in a name
    • 2:42:05 that's longer than 32 characters or
    • 2:42:07 varchar 200 if you want to allow for
    • 2:42:09 something even bigger
    • 2:42:10 but this is germane to our real world
    • 2:42:11 experience with the web if you've ever
    • 2:42:13 gone to a website
    • 2:42:14 started filling out a form and all of a
    • 2:42:16 sudden you can't type any more
    • 2:42:17 characters your response is too long
    • 2:42:19 why is that well one the programmers
    • 2:42:21 just might not want you to keep
    • 2:42:23 expressing yourself in more detail
    • 2:42:24 especially if it's like a complaint form
    • 2:42:26 on a customer service site
    • 2:42:27 but pragmatically it's probably because
    • 2:42:30 their database
    • 2:42:31 was designed to store a finite number of
    • 2:42:33 characters and you have hit that
    • 2:42:34 threshold
    • 2:42:35 and you certainly don't want to have a
    • 2:42:36 buffer overflow like in c
    • 2:42:38 so the database will enforce a maximum
    • 2:42:40 value n and then text is for even bigger
    • 2:42:43 chunks of text if you're letting people
    • 2:42:44 copy paste their resumes or whole
    • 2:42:46 documents
    • 2:42:47 or even larger sets of text you might
    • 2:42:49 use text instead
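The idea of a column that rejects overly long input can be tried out locally, with one caveat that is an addition to the lecture: SQLite accepts declarations such as VARCHAR(32) and CHAR(2) but does not itself enforce the lengths, so this sketch emulates the limits with CHECK constraints. The contacts table and the try_insert helper are hypothetical names used only for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# SQLite ignores the declared lengths, so CHECK constraints stand in for the
# enforcement that the server databases named above would typically perform.
db.execute("""
    CREATE TABLE contacts (
        name VARCHAR(32) CHECK (length(name) <= 32),
        state CHAR(2) CHECK (length(state) = 2)
    )
""")


def try_insert(name, state):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        db.execute("INSERT INTO contacts (name, state) VALUES (?, ?)", (name, state))
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_insert("alice", "MA")        # fits both limits
too_long = try_insert("x" * 33, "CA")  # 33 characters exceeds the 32 limit
bad_state = try_insert("bob", "Mass")  # not a two-character code
print(ok, too_long, bad_state)
```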
    • 2:42:51 so let's then consider a real world data
    • 2:42:54 set
    • 2:42:54 things get really interesting and all of
    • 2:42:56 these very academic ideas and
    • 2:42:58 recommendations
    • 2:42:59 really come into play when we don't have
    • 2:43:01 hundreds of favorites
    • 2:43:02 but when we have thousands instead
    • 2:43:06 and so what i'm going to go ahead and do
    • 2:43:08 here is download a file here
    • 2:43:10 give me just a moment to grab it from
    • 2:43:12 the course's website
    • 2:43:14 i'm going to go ahead and download a
    • 2:43:15 file from today which is a
    • 2:43:18 sqlite version of the imdb
    • 2:43:21 internet movie database that some of you
    • 2:43:22 might have used in website form in order
    • 2:43:24 to look up movies and ratings thereof
    • 2:43:27 and the like and what we've done in
    • 2:43:28 advance is we wrote a script
    • 2:43:30 i wrote a script that downloaded all of
    • 2:43:32 that information in advance
    • 2:43:34 as tsv files it turns out that they
    • 2:43:37 internet movie database make it
    • 2:43:38 available
    • 2:43:39 all of their data available as tsv files
    • 2:43:42 tab separated values
    • 2:43:44 and we went ahead and imported it with a
    • 2:43:46 script
    • 2:43:47 called shows.db
    • 2:43:50 as follows so i'm going to go ahead in
    • 2:43:52 just a moment and open up shows.db which
    • 2:43:54 is not the version i created earlier
    • 2:43:56 based on your favorites
    • 2:43:58 this is now the version that we the
    • 2:43:59 staff created in advance
    • 2:44:01 by downloading hundreds of thousands of
    • 2:44:04 movies and tv shows and actors and
    • 2:44:06 directors from imdb.com under their
    • 2:44:08 license
    • 2:44:09 and then imported into a sqlite database
    • 2:44:13 so how can i see what's in here well let
    • 2:44:14 me go ahead and type .schema recall
    • 2:44:16 and you'll see a whole bunch of data
    • 2:44:19 therein
    • 2:44:19 and in fact in pictorial form it
    • 2:44:21 actually looks like this here's a
    • 2:44:23 picture that just gives you the lay of
    • 2:44:24 the land
    • 2:44:25there's going to be a people table that the land
    • 2:44:27has an id for every person there's going to be a people table that
    • 2:44:29a name and their birth year uh there's has an id for every person
    • 2:44:31going to be a shows table just like a name and their birth year uh there's
    • 2:44:33we've been talking which is going to be a shows table just like
    • 2:44:34ids titles of shows also though the year we've been talking which is
    • 2:44:37that the show debuted, and the number of
    • 2:44:39episodes that the show had.
    • 2:44:40Then there's going to be genres, similar
    • 2:44:42in design to before, so we didn't go all
    • 2:44:44out and factor it out into a third table;
    • 2:44:47we just have some duplication here,
    • 2:44:48admittedly, in genres.
    • 2:44:50But then there's a ratings table, and
    • 2:44:51here's where you can see where
    • 2:44:52relational databases get interesting.
    • 2:44:54You can have a ratings table storing
    • 2:44:56ratings, like one to five,
    • 2:44:58but also associate those ratings with
    • 2:45:00a show by way of its show_id,
    • 2:45:02and then you can keep track of the
    • 2:45:03number of votes that that show got.
    • 2:45:05Writers, notice, is a separate table,
    • 2:45:08and notice this is kind of cool:
    • 2:45:09this table, per the arrows, relates to
    • 2:45:13the shows table and the people table,
    • 2:45:16because this is a join table. A foreign
    • 2:45:18key of show_id and a foreign key of
    • 2:45:20person_id
    • 2:45:21refer to the shows table and the people
    • 2:45:24table, respectively,
    • 2:45:25so that a person can be a
    • 2:45:28writer for multiple shows and
    • 2:45:30one show can have multiple writers:
    • 2:45:32another many-to-many relationship.
    • 2:45:33And then lastly, stars, the actors in a
    • 2:45:36show. Notice that this too is a join
    • 2:45:38table. It's only got two foreign keys, a
    • 2:45:40show_id
    • 2:45:40and a person_id, that are referring back
    • 2:45:43to those tables, respectively.
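As a rough sketch of the join-table idea just described, the following uses Python's built-in sqlite3 module with an in-memory database. The table and column names (shows, people, stars, show_id, person_id) are assumed from the narration, and the rows are made up for illustration; this is not the actual IMDb schema file.

```python
import sqlite3

# In-memory stand-in for the IMDb data; names are assumed from the narration.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Join table: nothing but two foreign keys.
    CREATE TABLE stars (
        show_id INTEGER NOT NULL REFERENCES shows(id),
        person_id INTEGER NOT NULL REFERENCES people(id)
    );
""")
db.execute("INSERT INTO people (id, name) VALUES (1, 'Steve Carell')")
db.executemany("INSERT INTO shows (id, title) VALUES (?, ?)",
               [(10, "The Office"), (11, "The Morning Show")])
# One person, stored once, linked to two shows through the join table.
db.executemany("INSERT INTO stars (show_id, person_id) VALUES (?, ?)",
               [(10, 1), (11, 1)])

show_count = db.execute(
    "SELECT COUNT(*) FROM stars WHERE person_id = 1").fetchone()[0]

# A row pointing at a person who does not exist is rejected.
try:
    db.execute("INSERT INTO stars (show_id, person_id) VALUES (10, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(show_count, rejected)
```

The person's name lives in exactly one row of people; the stars table only pairs ids, which is what lets one person relate to many shows and one show to many people.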
    • 2:45:44And here's where it really makes sense, a
    • 2:45:46relational database. It would be pretty
    • 2:45:48stupid and bad design
    • 2:45:49if you had names of all of the directors
    • 2:45:53and names of all of the writers and
    • 2:45:55names of all of the stars of these shows
    • 2:45:57in separate tables, in duplicate, like
    • 2:45:59Steve Carell, Steve Carell, Steve Carell.
    • 2:46:01All of those actors and directors and
    • 2:46:03writers and every other
    • 2:46:05role in the business are just people at
    • 2:46:07the end of the day. So in a relational
    • 2:46:09database, the advice would be to put all
    • 2:46:11of those people in a people table
    • 2:46:13and then use primary and foreign keys to
    • 2:46:16relate them to
    • 2:46:18these other types of tables. The catch is,
    • 2:46:22though, that when we do this,
    • 2:46:24it turns out that things can be slow
    • 2:46:27when we have lots of data. So, for
    • 2:46:29instance, let me go into this.
    • 2:46:30Let me go ahead and SELECT * FROM
    • 2:46:32shows;
    • 2:46:34That's a lot of data. It's pretty fast on
    • 2:46:36my Mac, and I switched from the IDE to my
    • 2:46:38Mac just to save time, because it's a
    • 2:46:39little faster doing things locally
    • 2:46:41instead of in the cloud.
    • 2:46:42Let me go ahead and count the number of
    • 2:46:43shows in this IMDb
    • 2:46:45database by using COUNT:
    • 2:46:48153,331 TV shows. So that's a lot. How about
    • 2:46:52the count
    • 2:46:52of people from the people table?
    • 2:46:56457,886 people, who might be stars or
    • 2:47:01writers or some other role as well. So
    • 2:47:03this is a sizable data set.
    • 2:47:05So let me go ahead and do something
    • 2:47:06simple, though. Let me go ahead and SELECT
    • 2:47:08* FROM
    • 2:47:08shows WHERE title = 'The Office'.
    • 2:47:11And this time, I don't have to worry
    • 2:47:13about weird capitalization or spacing.
    • 2:47:15This is IMDb. This is clean
    • 2:47:17data from an authoritative source. Notice
    • 2:47:19that there are actually different versions
    • 2:47:21of The Office. You probably know the UK
    • 2:47:22one and the US one.
    • 2:47:23There are other shows that are unrelated
    • 2:47:25to that particular
    • 2:47:26type of show, but each of them is
    • 2:47:29distinguished, notice, by the year
    • 2:47:31here. All right, so that's kind of a lot.
    • 2:47:34And let's do this again. Let me go ahead
    • 2:47:36and turn on a feature temporarily, just
    • 2:47:38to time this query, by turning on a timer
    • 2:47:40in this program.
    • 2:47:41And let me run it again. It looks like it
    • 2:47:43took 0.012
    • 2:47:46seconds of real time to do that search.
    • 2:47:49That's pretty fast. I barely noticed,
    • 2:47:51certainly, because it's so fast.
    • 2:47:52But let me go ahead and do this. Let me
    • 2:47:54go ahead and create
    • 2:47:55an index called title_index on the table
    • 2:47:58called shows,
    • 2:47:59on its title column. Well, what am I doing?
    • 2:48:02Well, to answer the question,
    • 2:48:04finally, from before about performance: by
    • 2:48:06default, everything we've been doing is
    • 2:48:07indeed big O of n. It's just being
    • 2:48:09linearly searched from top to bottom,
    • 2:48:11which seems to call into question the
    • 2:48:12whole purpose of SQL if we were doing no
    • 2:48:14better than with CSVs.
    • 2:48:16But an index is a clue to the database
    • 2:48:19to sort of load the data more
    • 2:48:20efficiently,
    • 2:48:21in such a way that you get logarithmic
    • 2:48:23time. An index is a fancy data structure
    • 2:48:25that the SQLite database or the Oracle
    • 2:48:27database or the MySQL database, whatever
    • 2:48:29product you're using,
    • 2:48:30builds up for you in memory. And then it
    • 2:48:33does
    • 2:48:33something, using syntax like this, that
    • 2:48:36builds in memory
    • 2:48:37generally something known as a B-tree.
    • 2:48:39We've talked a bit about trees in the
    • 2:48:41class. We talked about binary search
    • 2:48:42trees,
    • 2:48:43things that kind of look like family
    • 2:48:44trees. A B-tree is essentially a family
    • 2:48:47tree
    • 2:48:47that's just very wide and not that tall.
    • 2:48:50It's a data structure similar in spirit
    • 2:48:52to what we looked at in C,
    • 2:48:53but it tries to keep all of the leaf
    • 2:48:55nodes, all of the children or
    • 2:48:57grandchildren or great-grandchildren, so
    • 2:48:59to speak,
    • 2:48:59as close to the root as possible. And the
    • 2:49:02algorithm it uses for that
    • 2:49:03tends to be proprietary or documented,
    • 2:49:05based on the system you're using. But it
    • 2:49:07doesn't store things in a list.
    • 2:49:09It does not store things top to bottom,
    • 2:49:12like the tables we view them as.
    • 2:49:15Underneath the hood, those tables that
    • 2:49:17look like very tall structures
    • 2:49:19are actually, underneath the hood,
    • 2:49:20implemented with fancier things called
    • 2:49:22trees.
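To put rough numbers on "wide and not that tall," here is a small back-of-the-envelope sketch in Python. It assumes an idealized, perfectly balanced tree and a hypothetical fan-out of 100 children per node; real B-tree fan-outs depend on the database and its page size, so treat the figures as illustrative only.

```python
def levels_needed(rows: int, fanout: int) -> int:
    """Smallest number of levels such that fanout ** levels >= rows."""
    levels, capacity = 0, 1
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels

ROWS = 153_331  # number of shows counted in the lecture

linear_worst_case = ROWS                      # scan every row, top to bottom
binary_tree_levels = levels_needed(ROWS, 2)   # tall and narrow: 2 children per node
wide_tree_levels = levels_needed(ROWS, 100)   # wide and short: fan-out of 100 is assumed

print(linear_worst_case, binary_tree_levels, wide_tree_levels)
```

Under these assumptions a lookup touches a handful of nodes rather than up to 153,331 rows, which is the intuition behind the logarithmic running time.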
    • 2:49:23And if we create those trees by creating
    • 2:49:25what they're properly called,
    • 2:49:26indexes, like this, it might take us a
    • 2:49:28moment, like 0.098 seconds, to create an
    • 2:49:31index. But now notice what happens.
    • 2:49:33Previously, when I searched the titles
    • 2:49:35for The Office
    • 2:49:36using linear search, it took 0.012
    • 2:49:39seconds. If I do the same query again
    • 2:49:42after having created the index
    • 2:49:44and having told SQLite, build me this
    • 2:49:45fancy tree in memory,
    • 2:49:47voila: 0.001 seconds.
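One way to see the same before-and-after effect without a stopwatch is to ask SQLite how it plans to run the query. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN; the three rows are made-up stand-ins for the real data, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
db.executemany(
    "INSERT INTO shows (title, year) VALUES (?, ?)",
    [("The Office", 2001), ("The Office", 2005), ("Some Other Show", 2010)],
)

def plan(sql, params=()):
    """Return SQLite's query-plan descriptions for a statement as one string."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the description

query = "SELECT * FROM shows WHERE title = ?"

before = plan(query, ("The Office",))            # no index yet: a full table scan
db.execute("CREATE INDEX title_index ON shows (title)")
after = plan(query, ("The Office",))             # now a search using title_index

matches = db.execute(query, ("The Office",)).fetchall()
print(before)
print(after)
print(len(matches))
```

The query text is identical both times; only the presence of the index changes how the database finds the rows.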
    • 2:49:50So about an order of magnitude faster. Now, both
    • 2:49:53are fast to us humans, certainly, but
    • 2:49:55imagine the data set being even bigger,
    • 2:49:56the query being even bigger.
    • 2:49:58These indexes can get even larger than
    • 2:50:01that,
    • 2:50:02or rather, the queries can take
    • 2:50:04longer than that, and therefore take even
    • 2:50:06more time
    • 2:50:07than that. But unfortunately, if I've got
    • 2:50:09all of my data all over the place, as in
    • 2:50:11a chart,
    • 2:50:12as in a diagram like this, my god, how do
    • 2:50:14I actually get useful work done? How do I
    • 2:50:16get back the people in a movie and the
    • 2:50:18writers and the stars and the ratings
    • 2:50:20if it's all over the place? I would seem
    • 2:50:21to have created such a mess,
    • 2:50:23and I now need to execute all of
    • 2:50:25these queries. But notice it doesn't have
    • 2:50:27to be
    • 2:50:28that complicated. It turns out that
    • 2:50:30there's another
    • 2:50:31keyword in SQL, really the last that
    • 2:50:33we'll look at here,
    • 2:50:34called JOIN. The JOIN keyword, which you
    • 2:50:36can use implicitly or explicitly,
    • 2:50:38allows you to just join tables together
    • 2:50:41and sort of reconstitute a bigger, more
    • 2:50:43user-friendly table.
    • 2:50:45So, for instance, suppose I want to get
    • 2:50:46all of Steve Carell's TV shows, not just
    • 2:50:48The Office.
    • 2:50:49Well, recall that I can select Steve's id
    • 2:50:52from the people table
    • 2:50:54where name equals Steve Carell. So again,
    • 2:50:57he has a different id in this table,
    • 2:50:58because this is from IMDb,
    • 2:51:00but there's his id. And let me go ahead
    • 2:51:02and turn the timer off
    • 2:51:03for now. All right, so there is his id,
    • 2:51:06136797. I could copy-paste that into my
    • 2:51:10code, but that's not necessary.
    • 2:51:12Thanks to these nested queries, I can do
    • 2:51:14something like this.
    • 2:51:15Let me go ahead and now select all of
    • 2:51:17the show ids
    • 2:51:19from the stars table where person_id
    • 2:51:23from that table is in, or is equal to,
    • 2:51:25this result.
    • 2:51:27So there's that join table, stars, that
    • 2:51:28links people and shows.
    • 2:51:30So let me go ahead and execute. Oh, I hit
    • 2:51:33the wrong key.
    • 2:51:33Let me go ahead and execute that by
    • 2:51:36retyping it:
    • 2:51:36SELECT show_id FROM stars
    • 2:51:40WHERE person_id equals
    • 2:51:43whatever Steve Carell's id is. All right,
    • 2:51:45so there's all of the show ids of Steve
    • 2:51:47Carell's TV shows. That's a lot,
    • 2:51:49and it's very non-obvious what they are.
    • 2:51:51So let me do another nested query by
    • 2:51:53putting all of that in parentheses and
    • 2:51:55now SELECT title
    • 2:51:57FROM shows WHERE the id of the show
    • 2:52:01is IN this big, long list of show ids.
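The three nested steps narrated here can be sketched as a single statement. The example below runs it through Python's sqlite3 module against a tiny made-up data set; the ids and the extra rows are hypothetical, and the column names are assumed from the narration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
    INSERT INTO people VALUES (1, 'Steve Carell'), (2, 'Someone Else');
    INSERT INTO shows VALUES (10, 'The Office'), (11, 'The Morning Show'),
                             (12, 'Unrelated Show');
    INSERT INTO stars VALUES (10, 1), (11, 1), (12, 2);
""")

# Innermost query: the person's id. Middle query: the show ids linked to that
# id in the join table. Outer query: the titles for those show ids.
nested = """
    SELECT title FROM shows WHERE id IN (
        SELECT show_id FROM stars WHERE person_id = (
            SELECT id FROM people WHERE name = ?
        )
    )
"""
titles = sorted(row[0] for row in db.execute(nested, ("Steve Carell",)))
print(titles)
```

Each level of parentheses feeds its result to the query around it, so no id ever has to be copied and pasted by hand.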
    • 2:52:05And there are all of the shows that he's
    • 2:52:07in, including
    • 2:52:08The Dana Carvey Show back when,
    • 2:52:11The Office up at the top, and then, most
    • 2:52:13recently, shows like The Morning Show
    • 2:52:15on Apple TV. All right, so that's pretty
    • 2:52:18cool, that we can actually reconstitute
    • 2:52:19the data like that. But it turns out
    • 2:52:21there are different ways of doing that as
    • 2:52:23well, and you'll see more of this in the
    • 2:52:24coming weeks
    • 2:52:25and in problem sets, in labs, and the like.
    • 2:52:27But it turns out we can do other things
    • 2:52:29as well, and let me just show this syntax,
    • 2:52:30even though it'll look a little cryptic
    • 2:52:32at first glance.
    • 2:52:33You can also use that JOIN keyword as
    • 2:52:35follows. I can select the title
    • 2:52:37from the people table joined with
    • 2:52:41the stars table on the people's id
    • 2:52:45column
    • 2:52:45equaling the stars' person_id column.
    • 2:52:49So in other words, I can select a title
    • 2:52:51from the result of joining
    • 2:52:53people and stars like this, on the id
    • 2:52:55column in one and the person_id column
    • 2:52:57in the other.
    • 2:52:58And I can join in the shows table
    • 2:53:02on stars.show_id
    • 2:53:05equaling shows.id.
    • 2:53:08So again, now I'm joining the primary
    • 2:53:11and foreign keys on these two tables,
    • 2:53:13where the name equals, quote unquote,
    • 2:53:16Steve Carell.
    • 2:53:17So this is the most cryptic thing we've
    • 2:53:19seen yet, but it just means take this
    • 2:53:20table and join it with this one, and then
    • 2:53:22join it with this one,
    • 2:53:23and filter all of the resulting joined
    • 2:53:25rows by a name of Steve Carell.
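Written out, the join described in words here might look like the following sketch. It uses Python's sqlite3 module with a small made-up data set; the rows and ids are hypothetical, and the column names are assumed from the narration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
    INSERT INTO people VALUES (1, 'Steve Carell'), (2, 'Someone Else');
    INSERT INTO shows VALUES (10, 'The Office'), (11, 'The Morning Show'),
                             (12, 'Unrelated Show');
    INSERT INTO stars VALUES (10, 1), (11, 1), (12, 2);
""")

# people -> stars on the person's id, then stars -> shows on the show's id,
# and finally filter the joined rows by name.
joined = """
    SELECT title FROM people
    JOIN stars ON people.id = stars.person_id
    JOIN shows ON stars.show_id = shows.id
    WHERE name = ?
"""
titles = sorted(row[0] for row in db.execute(joined, ("Steve Carell",)))
print(titles)
```

Each ON clause pairs a primary key in one table with the matching foreign key in the join table, which is how the separate tables are stitched back into one wider result.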
    • 2:53:28And voila, there we have all of those
    • 2:53:30answers
    • 2:53:31as well. And there are other ways of doing
    • 2:53:33this too.
    • 2:53:34I'll leave unsaid for now some of the syntax
    • 2:53:36for that. But that felt a little slow,
    • 2:53:39and in fact, let me go ahead and turn my
    • 2:53:40timer back on. Let me re-execute this
    • 2:53:43last query:
    • 2:53:44SELECT title FROM people, joining on
    • 2:53:47stars,
    • 2:53:48joining on shows, WHERE name equals Steve
    • 2:53:51Carell.
    • 2:53:52That took over half a second, so that was
    • 2:53:54actually, admittedly, kind of slow.
    • 2:53:57But again, indexes come to the rescue,
    • 2:53:59if, again, we don't
    • 2:54:00allow linear search to dominate. So let
    • 2:54:02me go ahead and create a few indexes.
    • 2:54:04Create an index called person_index
    • 2:54:08on the stars table, the person_id column.
    • 2:54:11Why?
    • 2:54:12Well, my query a moment ago used the
    • 2:54:14person_id column. It filtered on it, so
    • 2:54:16that might be a bottleneck.
    • 2:54:17I'm going to go ahead and create another
    • 2:54:18index called show_index
    • 2:54:21on the stars table, on show_id. Similarly,
    • 2:54:25a moment ago, my query used the show_id
    • 2:54:27column, and so that too might have been a
    • 2:54:29bottleneck, searched linearly, top to bottom.
    • 2:54:31So let me create that index. And then
    • 2:54:32lastly, let me create an index called
    • 2:54:34name_index, and this is perhaps the most
    • 2:54:35obvious, similar to the show titles
    • 2:54:37before,
    • 2:54:38on the people table, on the name column.
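The three indexes just named could be created as in the sketch below, again using Python's sqlite3 module on a tiny made-up data set. The index names follow the narration; the rows are hypothetical, and the printed plan text depends on the SQLite version and on what the planner chooses for such a small table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
    INSERT INTO people VALUES (1, 'Steve Carell');
    INSERT INTO shows VALUES (10, 'The Office');
    INSERT INTO stars VALUES (10, 1);

    -- One index per column that the join query filters or joins on.
    CREATE INDEX person_index ON stars (person_id);
    CREATE INDEX show_index ON stars (show_id);
    CREATE INDEX name_index ON people (name);
""")

index_names = {row[0] for row in db.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")}

query = """
    SELECT title FROM people
    JOIN stars ON people.id = stars.person_id
    JOIN shows ON stars.show_id = shows.id
    WHERE name = ?
"""
titles = [row[0] for row in db.execute(query, ("Steve Carell",))]

# Print how SQLite intends to execute the join now that the indexes exist.
for row in db.execute("EXPLAIN QUERY PLAN " + query, ("Steve Carell",)):
    print(row[-1])
```

The indexes are built once; the query itself is unchanged and simply gets a faster path to the matching rows.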
    • 2:54:41And that too took a moment. Now, in total,
    • 2:54:43this took, like, almost a full second,
    • 2:54:44but these indexes only get created
    • 2:54:47once. They get maintained automatically
    • 2:54:48over time,
    • 2:54:49but you don't incur this with every
    • 2:54:51query. Now let me do my
    • 2:54:53select again. Let me SELECT title FROM
    • 2:54:55people, joining the stars table,
    • 2:54:58joining the shows table, WHERE name
    • 2:55:01equals Steve Carell.
    • 2:55:02Boom: 0.001 seconds.
    • 2:55:06That's orders of magnitude faster
    • 2:55:08than the, like, more than half a second it
    • 2:55:10took us
    • 2:55:11a little bit ago. So here, too, you see the
    • 2:55:13power of a relational database.
    • 2:55:15So even though we've created some
    • 2:55:16problems for ourselves over time,
    • 2:55:18we've solved them ultimately, granted,
    • 2:55:20with some more sophisticated features
    • 2:55:21and additional syntax.
    • 2:55:22But this is indeed why
    • 2:55:25you use relational databases in the real world, for the
    • 2:55:26Twitters, the Instagrams, the Facebooks,
    • 2:55:28the Googles:
    • 2:55:29because they can store data so
    • 2:55:31efficiently,
    • 2:55:32without redundancy, because you can
    • 2:55:34normalize them and factor everything out.
    • 2:55:36But they can still maintain the
    • 2:55:37relations that you might have seen in a
    • 2:55:39spreadsheet,
    • 2:55:40but using something closer to
    • 2:55:41logarithmic, thanks to those tree
    • 2:55:43structures.
    • 2:55:44But there are problems, and what we
    • 2:55:45wanted to do is end today on two primary
    • 2:55:48problems that are introduced with SQL,
    • 2:55:50because they are just, unfortunately,
    • 2:55:52so commonly done.
    • 2:55:53Notice this here. There is something
    • 2:55:55generally known as a SQL injection
    • 2:55:57attack,
    • 2:55:58which you are vulnerable to in any
    • 2:56:00application where you're taking user
    • 2:56:01input.
    • 2:56:02That hasn't been an issue for my
    • 2:56:03favorites.py file,
    • 2:56:05where I only took input from a CSV. But
    • 2:56:08if one of you were malicious, what if one
    • 2:56:09of you had maliciously typed in the word
    • 2:56:11DELETE
    • 2:56:12or UPDATE or something else as the title
    • 2:56:15of your show,
    • 2:56:15and I accidentally plugged it into my
    • 2:56:18own Python code when
    • 2:56:19executing a query? You could potentially
    • 2:56:21inject SQL
    • 2:56:22into my own code. How might that be? Well,
    • 2:56:25if logging in
    • 2:56:26via Yale, you'll typically see a form
    • 2:56:28like this. Or logging in via Harvard to
    • 2:56:29something, you'll see a form like this.
    • 2:56:31Here's an example that I'm pretty sure
    • 2:56:32neither Harvard nor Yale is vulnerable
    • 2:56:34to. Suppose I type in my email address to
    • 2:56:37this login form as malan@harvard.edu,
    • 2:56:40single quote, dash dash. It turns out in
    • 2:56:43SQL,
    • 2:56:43dash dash is the symbol for commenting,
    • 2:56:46if you want to comment something out.
    • 2:56:47It turns out that the single quote is
    • 2:56:49used when you want to search for
    • 2:56:50something like
    • 2:56:51Steve Carell or, in this case,
    • 2:56:53malan@harvard.edu. It can be double quotes; it
    • 2:56:55can be single quotes.
    • 2:56:56In this case, I'm using single quotes
    • 2:56:58here. But let's consider some sample
    • 2:57:01code, if you will, in Python. Here's a line
    • 2:57:03of code that I propose
    • 2:57:04might exist in the back end for
    • 2:57:06Harvard's authentication or Yale's or
    • 2:57:08anyone else's.
    • 2:57:09Maybe someone wrote some Python code
    • 2:57:11like this, using SELECT * FROM users
    • 2:57:14WHERE username = ? AND
    • 2:57:16password = ?,
    • 2:57:18and they plugged in username and
    • 2:57:19password. Whatever the user typed into
    • 2:57:21that web form a moment ago
    • 2:57:23gets plugged in here to these question
    • 2:57:25marks. This is good.
    • 2:57:26This is good code because you're using
    • 2:57:29the SQL question marks.
    • 2:57:30So if you literally just do what we
    • 2:57:32preach today and use these question mark
    • 2:57:34placeholders,
    • 2:57:35you are safe from SQL injection attacks.
    • 2:57:37Unfortunately, there are too many
    • 2:57:38developers in the world
    • 2:57:40that don't practice this or don't
    • 2:57:42realize this or do forget this.
    • 2:57:44If you instead resort to Python
    • 2:57:47approaches like this,
    • 2:57:48where you use an f-string instead, which
    • 2:57:51might be your instinct after last week,
    • 2:57:53because they're wonderfully convenient
    • 2:57:54with the curly braces and all,
    • 2:57:55suppose that you literally plug in
    • 2:57:57username and password
    • 2:58:00not with the question mark placeholders
    • 2:58:02but just literally
    • 2:58:03in between those curly braces. Watch what
    • 2:58:05happens if my username,
    • 2:58:06malan@harvard.edu, was actually typed
    • 2:58:09in by me maliciously as
    • 2:58:11malan@harvard.edu, single quote,
    • 2:58:14dash dash. That would have the effect of
    • 2:58:17tricking
    • 2:58:17this Python code into doing essentially
    • 2:58:20this.
    • 2:58:21Let me do a find and replace. It would
    • 2:58:23trick Python
    • 2:58:25into executing username equals quote,
    • 2:58:28malan@harvard.edu,
    • 2:58:30quote, dash dash, and then
    • 2:58:33other stuff. Unfortunately, the dash dash,
    • 2:58:35again, means comment,
    • 2:58:37which means you could maybe trick a
    • 2:58:40server
    • 2:58:40into ignoring the whole password part of
    • 2:58:43the SQL query.
    • 2:58:44And if the SQL query's purpose in life
    • 2:58:46is to check, is this username and
    • 2:58:48password valid,
    • 2:58:49so that you can decide to log the user
    • 2:58:51in or to say, no, you're not authorized,
    • 2:58:54well, by essentially commenting out
    • 2:58:56everything related to password,
    • 2:58:58notice what I've done. I've just now,
    • 2:59:00theoretically,
    • 2:59:01logged myself in as malan@harvard.edu
    • 2:59:05without even knowing or inputting a
    • 2:59:06password,
    • 2:59:07because I injected SQL syntax, the quote
    • 2:59:10and the dash dash, into my query,
    • 2:59:12tricking the server into just ignoring
    • 2:59:14the password equality check.
    • 2:59:16and so it turns out that db.execute when
    • 2:59:19you execute an insert
    • 2:59:20it returns to you as said the id of the
    • 2:59:23newly inserted row
    • 2:59:24when you use db.execute to select rows
    • 2:59:28from a database table it returns to you
    • 2:59:31a list of rows each of which is a
    • 2:59:34dictionary so this is now pseudocode
    • 2:59:36down here with my comment
    • 2:59:37but if you get back one row that would
    • 2:59:40seem to imply that there is a user named
    • 2:59:42malan@harvard.edu
    • 2:59:44don't know what his password is because
    • 2:59:45whoever this person is maliciously
    • 2:59:47tricked the server into ignoring
    • 2:59:49that syntax so sql injection attacks are
    • 2:59:53unfortunately one of the most common
    • 2:59:54attacks against sql databases they are
    • 2:59:56completely preventable
    • 2:59:57if you simply use placeholders and use
    • 3:00:00libraries whether it's cs50's or other
    • 3:00:02third-party libraries that you may use
    • 3:00:04down the road
    • 3:00:05a common meme on the internet is this
    • 3:00:06picture here uh
    • 3:00:08if we zoom in on this person's license
    • 3:00:10plate or where the license plate should
    • 3:00:11be
    • 3:00:12this is an example of someone
    • 3:00:14theoretically trying to trick some
    • 3:00:16camera on the highway
    • 3:00:18into like dropping the whole database
    • 3:00:20drop is another keyword in sql that
    • 3:00:21deletes a database table
    • 3:00:23and this person was either intentionally
    • 3:00:25or just humorously
    • 3:00:26trying to trick it into executing sql by
    • 3:00:29using syntax like this so
    • 3:00:31characters like single quotes dash dash
    • 3:00:33semicolons
    • 3:00:34are all potentially dangerous characters
    • 3:00:36in sql if they're passed through
    • 3:00:38unchanged to the database a very popular
    • 3:00:40xkcd comic let me give you a moment to
    • 3:00:42just read this
    • 3:00:43is another uh well-known meme of sorts
    • 3:00:47now
    • 3:00:47in computer science if you'd like to
    • 3:00:50read this one
    • 3:00:52on your own but henceforth you are now
    • 3:00:54in the
    • 3:00:55um family of of educated
    • 3:01:00learners who know who little bobby
    • 3:01:02tables
    • 3:01:03is unfortunately it's dead silence in
    • 3:01:05here so i can't tell if anyone is
    • 3:01:06actually laughing at this joke but
    • 3:01:08anyhow this is a very well-known meme so
    • 3:01:09if you're a computer scientist who knows
    • 3:01:11sql you know this one
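The injection described in this segment can be reproduced in a few lines. The following is a minimal sketch, not the code shown in lecture: it assumes Python's standard `sqlite3` module in place of CS50's library, and the `users` table, the plaintext `password` column, and the `login_unsafe` / `login_safe` helpers are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical in-memory table for illustration only; a real application
# would store password hashes, never plaintext passwords.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('malan@harvard.edu', 'secret')")


def login_unsafe(username, password):
    """Builds the query with an f-string, so user input can rewrite the SQL."""
    query = (
        f"SELECT * FROM users WHERE username = '{username}' "
        f"AND password = '{password}'"
    )
    return db.execute(query).fetchall()


def login_safe(username, password):
    """Uses ? placeholders, so user input is always treated as data."""
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return db.execute(query, (username, password)).fetchall()


# The trailing quote closes the string early and -- comments out the rest,
# including the password check.
malicious = "malan@harvard.edu'--"
unsafe_rows = login_unsafe(malicious, "wrong guess")
safe_rows = login_safe(malicious, "wrong guess")

print(len(unsafe_rows))  # 1: the password check was commented out
print(len(safe_rows))    # 0: no user has that literal username
```

With placeholders, the quote and the dashes are compared as part of the username rather than interpreted as SQL syntax, which is why the second call finds no matching row.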
    • 3:01:12and there's one last problem we'd like
    • 3:01:13to introduce if you don't mind just a
    • 3:01:15couple final moments here
    • 3:01:16and that is a fundamental problem in
    • 3:01:18computing called race conditions which
    • 3:01:20for the first time is now manifest
    • 3:01:22in our discussion of sql it turns out
    • 3:01:24that sql and sql databases
    • 3:01:26are very often used again in the real
    • 3:01:28world for very high performing
    • 3:01:30applications and by that i mean again
    • 3:01:32the googles the facebooks the twitters
    • 3:01:33of the world where lots and lots of data
    • 3:01:35is coming into servers
    • 3:01:37all at once and case in point some of
    • 3:01:39you might have clicked like
    • 3:01:40on this egg uh some time ago this is the
    • 3:01:43most liked instagram post ever
    • 3:01:45as of last night it was up to like 50
    • 3:01:47plus million
    • 3:01:48likes uh well eclipsed kim kardashian's
    • 3:01:51previous post which is still at like 18
    • 3:01:53million or so
    • 3:01:54this is to say this is a hard problem to
    • 3:01:56solve
    • 3:01:57this notion of likes coming in at such
    • 3:02:00an incredible rate
    • 3:02:01because suppose that long story short
    • 3:02:03instagram
    • 3:02:04actually has a server with a sql
    • 3:02:06database and they have code in python or
    • 3:02:08c
    • 3:02:09or whatever language that's talking to
    • 3:02:11that database
    • 3:02:12and suppose that they have code that's
    • 3:02:14trying to increment the total number of
    • 3:02:15likes well how might this work logically
    • 3:02:17well in order to increment the number of
    • 3:02:19likes that a picture like this egg has
    • 3:02:21you might first select from the database
    • 3:02:23the current number of likes
    • 3:02:25for the id of that egg photograph then
    • 3:02:28you might add one to it
    • 3:02:29then you might update the database and i
    • 3:02:31didn't use it before but just like
    • 3:02:32there's insert and delete there's update
    • 3:02:34as well
    • 3:02:35so you might update the database with
    • 3:02:37the new count plus one
    • 3:02:39so the code for that might look a little
    • 3:02:41something like this three lines of code
    • 3:02:43using cs50's library here where you
    • 3:02:45execute select
    • 3:02:46likes from posts where id equals
    • 3:02:48question mark
    • 3:02:49where id is the unique identifier for
    • 3:02:51that egg and then i'm storing the result
    • 3:02:53in a rows variable which again i claim
    • 3:02:56is a list
    • 3:02:57of rows i'm going to go into the first
    • 3:02:59row so that's rows bracket 0
    • 3:03:01and i'm going to go into the likes
    • 3:03:03column to get the actual number and that
    • 3:03:04number i'm going to store in a variable
    • 3:03:06called likes so this is gonna be like 50
    • 3:03:07million
    • 3:03:08and i want it to go to 50 million and one
    • 3:03:10so how do i do that
    • 3:03:11well i execute on the database update
    • 3:03:14posts
    • 3:03:15set likes equal to question mark and
    • 3:03:18then i just plug in likes plus one
    • 3:03:20the problem though with the instagrams
    • 3:03:22and googles and twitters of the world
    • 3:03:24is that they don't just have one server
    • 3:03:26they have many thousands of servers and
    • 3:03:28all of those servers might in parallel
    • 3:03:30be receiving clicks from you and i
    • 3:03:32on the internet and those clicks
    • 3:03:35translate into this code getting
    • 3:03:36executed executed executed
    • 3:03:38and the problem is that when you have
    • 3:03:40three lines of code and suppose brian
    • 3:03:42and i click on that egg at roughly the
    • 3:03:44same time
    • 3:03:45my three lines might not get executed
    • 3:03:47before his three lines or vice versa
    • 3:03:49they might get co-mingled
    • 3:03:51chronologically my first line might get
    • 3:03:53executed then brian's first line might
    • 3:03:55get executed my second line might get
    • 3:03:56executed brian's second line
    • 3:03:58so they might get interspersed on
    • 3:03:59different servers or just temporally
    • 3:04:01in time chronologically that's
    • 3:04:03problematic
    • 3:04:04because suppose brian and i click on
    • 3:04:06that egg roughly at the same time
    • 3:04:08and we get back the same answer to the
    • 3:04:09select query 50 million is the current
    • 3:04:12count
    • 3:04:12then our next lines of code execute on
    • 3:04:14the servers we happen to be on
    • 3:04:16which adds one to the likes the
    • 3:04:19server might accidentally end up
    • 3:04:21updating
    • 3:04:22the row for the egg with 50 million one
    • 3:04:26both times because the fundamental
    • 3:04:29problem is
    • 3:04:30if my code executes while brian's code
    • 3:04:33executes
    • 3:04:34we are both checking the value of a
    • 3:04:36variable at essentially the same time
    • 3:04:39and we are both then making a conclusion
    • 3:04:41oh
    • 3:04:42the current likes are 50 million we are
    • 3:04:45then making a decision let's add one to
    • 3:04:4750 million
    • 3:04:47we are then updating the value with 50
    • 3:04:49million one
    • 3:04:51the problem is though that really if
    • 3:04:54brian's code or the server he happens to
    • 3:04:56be connected to on instagram
    • 3:04:57happens to have selected the number of
    • 3:05:00likes first
    • 3:05:01he should be allowed to finish the code
    • 3:05:03that's being executed
    • 3:05:04so that when i select it i see 50
    • 3:05:06million one and i add one to that so the
    • 3:05:09new count is 50 million
    • 3:05:11two this is what's known as a race
    • 3:05:13condition when you write code
    • 3:05:15in a multi-server or fancily known as a
    • 3:05:17multi-threaded environment
    • 3:05:19lines of code chronologically can get
    • 3:05:22co-mingled
    • 3:05:23on different servers at any given time
    • 3:05:25the problem fundamentally derives from
    • 3:05:27the fact that if brian's server is in
    • 3:05:29the middle of checking the state of a
    • 3:05:31variable
    • 3:05:32i should be locked out i should not be
    • 3:05:34allowed to click on that button at the
    • 3:05:36same time or my logic code my code might
    • 3:05:38should not be allowed to execute
    • 3:05:39logically so there is a solution
    • 3:05:41when you have to write code like this as
    • 3:05:43is common for twitter and instagram and
    • 3:05:44facebook and the like
    • 3:05:46to use what are called transactions
    • 3:05:48transactions add some few new pieces of
    • 3:05:50syntax that we won't dwell on today and
    • 3:05:51you won't need to use in the coming days
    • 3:05:53but they do solve a fundamentally hard
    • 3:05:55problem
    • 3:05:55transactions essentially allow you to
    • 3:05:57lock a table
    • 3:05:59or really a row in a table so that if
    • 3:06:02brian's
    • 3:06:03click on that egg results in some code
    • 3:06:05executing that's in the process of
    • 3:06:06checking what is the total like count
    • 3:06:09my click on the egg will not get handled
    • 3:06:11by the server
    • 3:06:12until his code is done executing so in
    • 3:06:15green here i've proposed the way you
    • 3:06:17should do this
    • 3:06:18you shouldn't just execute the middle
    • 3:06:19three lines you being
    • 3:06:21in facebook in this case instagram
    • 3:06:23should execute begin transaction first
    • 3:06:26then commit the transaction at the end
    • 3:06:28and the design of transactions is that
    • 3:06:31all of the lines in between will either
    • 3:06:32succeed all together
    • 3:06:34or fail all together the database won't
    • 3:06:36get into this funky
    • 3:06:37state where we start losing track of
    • 3:06:39likes
    • 3:06:40on eggs and though this has not been an
    • 3:06:43issue in recent years back in the day
    • 3:06:44when twitter was first getting started
    • 3:06:46twitter was super popular
    • 3:06:47and super offline a lot of the time
    • 3:06:49there was this thing called a fail whale
    • 3:06:51which is like the picture they showed on
    • 3:06:52their website
    • 3:06:53when they were getting too much traffic
    • 3:06:55to handle that was because when people
    • 3:06:56are liking and tweeting and retweeting
    • 3:06:58things it's a huge amount of data coming
    • 3:07:00in
    • 3:07:01and it turns out it's very hard to solve
    • 3:07:03these problems but
    • 3:07:04locking the database table or the rows
    • 3:07:07with these transactions is one way
    • 3:07:08fundamentally to solve this
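The lost update described in this segment, and the begin/commit fix, can be sketched as follows. This is an illustration under stated assumptions rather than the code shown on the lecture slide: it uses Python's standard `sqlite3` module instead of CS50's library, the `posts` table and the `read_likes`, `write_likes`, and `like_in_transaction` helpers are hypothetical, and the two "clicks" are interleaved by hand on a single connection so the outcome is deterministic. `BEGIN IMMEDIATE` is SQLite-specific syntax that takes the write lock up front; with several connections, a second writer would have to wait for the `COMMIT`.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN / COMMIT statements below control the transaction themselves.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER)")
db.execute("INSERT INTO posts (id, likes) VALUES (1, 50000000)")


def read_likes(post_id):
    """SELECT the current like count for one post."""
    row = db.execute("SELECT likes FROM posts WHERE id = ?", (post_id,)).fetchone()
    return row[0]


def write_likes(post_id, likes):
    """UPDATE the like count for one post."""
    db.execute("UPDATE posts SET likes = ? WHERE id = ?", (likes, post_id))


# Race condition, simulated by hand: both clicks read before either writes.
mine = read_likes(1)         # sees 50,000,000
brians = read_likes(1)       # also sees 50,000,000
write_likes(1, mine + 1)     # writes 50,000,001
write_likes(1, brians + 1)   # writes 50,000,001 again: one like is lost
lost_update_total = read_likes(1)


def like_in_transaction(post_id):
    """Read, add one, and write as a single all-or-nothing unit."""
    db.execute("BEGIN IMMEDIATE")
    try:
        write_likes(post_id, read_likes(post_id) + 1)
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise


# Reset the count, then handle the same two clicks one transaction at a time.
write_likes(1, 50000000)
like_in_transaction(1)
like_in_transaction(1)
transactional_total = read_likes(1)

print(lost_update_total)    # 50000001
print(transactional_total)  # 50000002
```

For a simple counter, a single statement such as `UPDATE posts SET likes = likes + 1 WHERE id = ?` avoids the separate read altogether; the transaction form generalizes to cases where several statements must succeed or fail together.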
    • 3:07:10and on our final extra time today we
    • 3:07:12thought we would play this out in the
    • 3:07:13same example that i was taught
    • 3:07:15transactions in some years ago
    • 3:07:17suppose that the scenario at hand is
    • 3:07:18that you and your roommates have a nice
    • 3:07:21dorm fridge
    • 3:07:22and you're all in the habit of drinking
    • 3:07:23lots of milk and you want to be able to
    • 3:07:25drink some milk
    • 3:07:26but you go to the fridge like i'm about
    • 3:07:28to here and you realize
    • 3:07:30uh oh we're out of milk and so now i'm
    • 3:07:33inspecting the state of this
    • 3:07:34refrigerator which is quite old
    • 3:07:36but also quite empty and the state of
    • 3:07:38this variable
    • 3:07:39being empty tells me that i should go to
    • 3:07:40cvs and buy some more milk
    • 3:07:43so what do i then do i'm presumably
    • 3:07:45going to close the fridge
    • 3:07:46and i'm going to go and leave and go
    • 3:07:49head to cvs
    • 3:07:50unfortunately the same problem arises
    • 3:07:52that we'll act out here in our final 60
    • 3:07:54or so seconds together
    • 3:07:55whereby if brian now my roommate in this
    • 3:07:57story also wants some milk
    • 3:07:59he comes by when i'm already headed to
    • 3:08:01the store inspects the state of the
    • 3:08:02fridge
    • 3:08:03and realizes oh we're out of milk so he
    • 3:08:05nicely will go restock as well
    • 3:08:07so let's see how this plays out and
    • 3:08:09we'll see if there isn't
    • 3:08:10a similar analogous solution so
    • 3:08:13i've checked the state of the variable
    • 3:08:15we're indeed out of milk i'll be right
    • 3:08:17back
    • 3:08:17just going to go to cvs
    • 3:08:56do
    • 3:09:58all right i am now back from the store
    • 3:10:00i've picked up some milk
    • 3:10:01gonna go ahead and put it into the
    • 3:10:03fridge and oh how did this happen now
    • 3:10:05there's multiple jugs of milk and of
    • 3:10:07course you know milk does not last that
    • 3:10:09long and brian and i don't drink that
    • 3:10:10much milk so this is like a really
    • 3:10:11serious problem
    • 3:10:12we've sort of tried to update the very
    • 3:10:14value of this variable
    • 3:10:16at the same time so so how do we go
    • 3:10:18about fixing this what's the
    • 3:10:19the actual solution here well i dare say
    • 3:10:22that we can draw some
    • 3:10:23inspiration from the world of
    • 3:10:25transactions and the world of databases
    • 3:10:27and perhaps create a visual for here
    • 3:10:29that we hope you never forget if you
    • 3:10:30take nothing away from today let's go
    • 3:10:32ahead and act this act this out one last
    • 3:10:34time
    • 3:10:34where this time i'm gonna be a little
    • 3:10:35more extreme i go ahead and open the
    • 3:10:37fridge i realize ah we're out of milk
    • 3:10:39i'm gonna go to the store i do not want
    • 3:10:42to allow for this situation where brian
    • 3:10:44accidentally checks the fridge as well
    • 3:10:46so i i'm going to
    • 3:10:49lock the refrigerator instead let me go
    • 3:10:52ahead and
    • 3:10:54drape this through here
    • 3:10:57a little extreme but i think so long as
    • 3:11:00he can't get
    • 3:11:01into the fridge this shouldn't be a
    • 3:11:04problem
    • 3:11:06let me go ahead now and just attach the
    • 3:11:08lock here
    • 3:11:10almost got it come on all right
    • 3:11:13now the fridge is locked now i'm gonna
    • 3:11:16go get some milk
    • 3:11:34i can come up on stage and just tell me
    • 3:11:37when and i'll just
    • 3:11:40oh all right that's it for cs50 sorry to
    • 3:11:43keep you late we will see you
    • 3:11:44next time
    • 3:12:25you
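The fridge demonstration maps onto a lock in code. The sketch below is an analogy only, assuming Python's standard `threading` module; the `simulate` function, the `fridge` dictionary, and the 0.05-second "trip to the store" are hypothetical stand-ins for the two roommates and are not part of the lecture.

```python
import threading
import time


def simulate(use_lock):
    """Two roommates check the fridge at about the same time."""
    fridge = {"milk": 0}
    lock = threading.Lock()

    def check_and_restock():
        if fridge["milk"] == 0:   # inspect the state of the fridge
            time.sleep(0.05)      # the trip to the store
            fridge["milk"] += 1   # put one jug in the fridge

    def roommate():
        if use_lock:
            with lock:            # padlock the fridge for the whole trip
                check_and_restock()
        else:
            check_and_restock()

    threads = [threading.Thread(target=roommate) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fridge["milk"]


unlocked_jugs = simulate(use_lock=False)  # typically 2: both saw an empty fridge
locked_jugs = simulate(use_lock=True)     # 1: the second roommate waits, then sees milk

print(unlocked_jugs, locked_jugs)
```

Holding the lock across both the check and the restock is the point: locking only the final write would still let both roommates see an empty fridge.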