CS50 Video Player
    • 0:00:00 Introduction
    • 0:00:17 Securing Data
    • 0:02:21 Hashing
    • 0:39:52 Secret-Key Cryptography
    • 1:03:21 Public-Key Cryptography
    • 1:16:14 Digital Signatures
    • 1:27:57 Passkeys
    • 1:34:47 Encryption in Transit
    • 1:40:10 Deletion
    • 1:46:28 Encryption at Rest
    • 1:50:59 Ransomware
    • 1:52:24 Quantum Computing
    • 0:00:00[MUSIC PLAYING]
    • 0:00:17DAVID J. MALAN: All right.
    • 0:00:18This is CS50's Introduction to Cybersecurity.
    • 0:00:21My name is David Malan, and this week, we'll focus on securing data.
    • 0:00:25Last week, recall, we focused on accounts,
    • 0:00:27and particularly one of the mechanisms by which we
    • 0:00:30protect our accounts is generally by way of these things called passwords.
    • 0:00:33But we focused last time really on our having the responsibility
    • 0:00:38to keep these things secure.
    • 0:00:39And yet, there's another party involved whenever
    • 0:00:41you have an account with a username and a password,
    • 0:00:43and that's the server or app that is actually
    • 0:00:46storing that password in some form long-term
    • 0:00:49so that you can actually authenticate yourself--
    • 0:00:51that is, prove to this application or website
    • 0:00:53that you are who you claim to be.
    • 0:00:56Well, in the simplest form, perhaps these servers
    • 0:01:00that are storing our usernames and passwords for which we have registered
    • 0:01:04are maybe doing something very simple like this.
    • 0:01:06For instance, if a website or app has two users at the moment, at least,
    • 0:01:10Alice and Bob, and suppose for simplicity
    • 0:01:12that Alice's password is apple and Bob's password is banana,
    • 0:01:16you could imagine that website or that app simply storing
    • 0:01:20in a very simple text file these key value pairs--
    • 0:01:25username, colon, password, new line.
    • 0:01:28Username, colon, password, new line.
    • 0:01:30And in fact, that's actually very commonly
    • 0:01:33how passwords are stored on systems, at least certain operating
    • 0:01:36systems like Linux, not necessarily as simply as this.
    • 0:01:39They often have a little more information off to the right there,
    • 0:01:42but in essence, it's the username and password.
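A minimal sketch of that naive approach, storing "username:password" lines in a plain text file, might look like this (illustrative only; the lecture shows no code here, and this is exactly what the next point argues against):

```python
# Naive credential storage: plaintext "username:password" lines.
# This sketch exists only to make the upcoming problem concrete.

def save_user(path, username, password):
    # Append one "username:password" line to the file.
    with open(path, "a") as f:
        f.write(f"{username}:{password}\n")

def check_login(path, username, password):
    # Compare the typed-in credentials directly against each stored line.
    with open(path) as f:
        for line in f:
            user, stored = line.rstrip("\n").split(":", 1)
            if user == username and stored == password:
                return True
    return False

save_user("passwords.txt", "alice", "apple")
save_user("passwords.txt", "bob", "banana")
print(check_login("passwords.txt", "alice", "apple"))  # True
```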
    • 0:01:44But it wouldn't be a good idea to store the passwords exactly like this.
    • 0:01:48Why?
    • 0:01:49Well, suppose that this website or this app and its database
    • 0:01:53are somehow hacked by an adversary.
    • 0:01:54That if someone gains access to that file containing these usernames
    • 0:01:58and passwords, well, at that point, they literally
    • 0:02:00have everyone's username and password.
    • 0:02:02And we talked last time about attacks like credential
    • 0:02:05stuffing whereby an adversary, once they know your username and password on one
    • 0:02:09system, they can try stuffing that username
    • 0:02:12and password into other systems, other websites,
    • 0:02:14other apps just in hopes that you are, unfortunately, using the same username
    • 0:02:18and password elsewhere as well.
    • 0:02:21So this is generally not a good thing if an adversary gets access
    • 0:02:23to everyone's usernames and passwords.
    • 0:02:26And even though, of course, in an ideal world, that would never happen,
    • 0:02:30we should probably, as the administrators,
    • 0:02:32as the creators of this website or app, we
    • 0:02:34should probably do everything we can to at least minimize
    • 0:02:37the fallout, the downsides, the damages that might result if,
    • 0:02:42and dare I say when, our database or this text file here is somehow compromised.
    • 0:02:47So how might we go about doing that here?
    • 0:02:49Rather than just storing apple and banana in clear text,
    • 0:02:53so to speak, literally in the English words themselves,
    • 0:02:57why don't we go ahead and employ a technique known as hashing?
    • 0:03:00Now if you've studied computer science before,
    • 0:03:02you might actually know this phrase in the context of hash tables and data
    • 0:03:06structures.
    • 0:03:06Well, it turns out the idea in this world of securing data is very similar,
    • 0:03:11and in fact, this is a technique that's incredibly common for solving
    • 0:03:14all sorts of problems.
    • 0:03:15Well, what do we mean by hashing in this context?
    • 0:03:17Hashing is the process of taking a password as input
    • 0:03:20and somehow converting it to a so-called hash or hash value.
    • 0:03:25Now these hash values don't look like English.
    • 0:03:27They're typically strings of text that might have letters, might have numbers,
    • 0:03:32but they're typically of some fixed length.
    • 0:03:35And in this case here, when we go about taking our password as input,
    • 0:03:39converting it somehow via an algorithm or some code that we wrote,
    • 0:03:43we want to convert it into this hash value
    • 0:03:45and then store that hash value in that database of passwords instead.
    • 0:03:50So here's a proverbial black box, and let's stipulate for the moment
    • 0:03:53that I have no idea how hashing works, but I do know that this box can do it.
    • 0:03:57So how do I think about this process?
    • 0:03:59Well generally speaking, there's going to be some input to this box.
    • 0:04:03Ultimately, I want to get some output from that box.
    • 0:04:06And what this box really represents is, in fact, a hash function.
    • 0:04:11You can think of this as a device like some kind of machine;
    • 0:04:14you can think of it like a program, some piece of software;
    • 0:04:16or you can even think about it as a mathematical function that operates
    • 0:04:20simply on numbers coming in as input.
    • 0:04:22In fact, if you're mathematically inclined,
    • 0:04:24though we won't use this syntax often, you can think of that hash function
    • 0:04:28as being represented by f, you can think of the input as being represented by x,
    • 0:04:33and you can think of the output of this process as being so-called f of x.
    • 0:04:37If you're not familiar with that notation, that's fine,
    • 0:04:39but this directly connects hashing to the basic mathematics
    • 0:04:43that you might encounter before long.
    • 0:04:45But what we care about is passing into this black box a password
    • 0:04:49and getting out a hash, and then storing that hash and not
    • 0:04:53the password in our database or text file of usernames and passwords.
    • 0:04:57So how might we go about doing this?
    • 0:04:59Well, if I were to provide apple as an input to this hash function,
    • 0:05:03let's think about the simplest hash function possible
    • 0:05:06that doesn't output apple, but some representation of apple
    • 0:05:12that I can eventually store in that database.
    • 0:05:14So I'm going to propose very simply that maybe the simplest hash function
    • 0:05:17we can come up with-- and indeed, if you've studied computer science
    • 0:05:20or taken CS50 itself, you might recall that we
    • 0:05:23can hash our inputs based on specific letters therein.
    • 0:05:27So apple starts with A. So you know what?
    • 0:05:29A is the first letter of the English alphabet.
    • 0:05:31So I'm going to create a hash function here
    • 0:05:33pictorially that outputs 1 whenever the input happens to start with an A,
    • 0:05:38as does apple.
    • 0:05:39Meanwhile, if we pass in banana, I'm going
    • 0:05:41to have this hash function output 2 because B
    • 0:05:44is the second letter of the English alphabet.
    • 0:05:46And dot-dot-dot, we might get to cherry or other passwords as well that might
    • 0:05:51output 3 and beyond.
    • 0:05:53And you could imagine doing this for all letters of the English alphabet.
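The toy hash function just described can be written in a couple of lines; this sketch (not the lecture's own code) maps a password to the alphabet position of its first letter:

```python
# Toy hash function: 1 for words starting with A, 2 for B, 3 for C, ...
def first_letter_hash(password):
    return ord(password[0].lower()) - ord("a") + 1

print(first_letter_hash("apple"))   # 1
print(first_letter_hash("banana"))  # 2
print(first_letter_hash("cherry"))  # 3
```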
    • 0:05:56Now unfortunately, this isn't the best hash function
    • 0:05:59because it's fairly simplistic.
    • 0:06:01And in fact, I can quickly think of some other fruits like avocados
    • 0:06:05that also start with A and that would give me the same hash value.
    • 0:06:08And that's actually a characteristic we'll come back
    • 0:06:11to whereby when you hash values, there can actually be ambiguities,
    • 0:06:15potentially, whereby two inputs might actually have the same output,
    • 0:06:20and we'll consider eventually what the implications of that might be.
    • 0:06:23But for now, I dare say that's a little too simplistic.
    • 0:06:26And what might be better than outputting 1 or 2 or 3
    • 0:06:30is a little something more cryptic, because that's just too helpful.
    • 0:06:34That's too much of a hint.
    • 0:06:35If I see that your hash value is 1, I at least
    • 0:06:38know that your password now clearly starts with an A, which means at best,
    • 0:06:42I can do 1/26th the amount of work to figure out what it actually is.
    • 0:06:46So we want these hashes generally to be a little weird-looking and really
    • 0:06:50unguessable and not leak any information.
    • 0:06:53So for instance, a very common older hash function for apple might actually
    • 0:06:57output this-- ..ekWXa83dhiA with some mixed uppercase and lowercase letters
    • 0:07:05therein.
    • 0:07:06Now it looks weird, you probably can't and shouldn't see any kind of pattern
    • 0:07:10in there.
    • 0:07:10There is a fancy math formula that took as input apple
    • 0:07:14and outputted as its hash value that string of text
    • 0:07:18there, but in and of itself, it doesn't really leak any information
    • 0:07:22like the number 1 or 2 or 3 would.
    • 0:07:24So we've already made an improvement.
    • 0:07:25Banana, meanwhile, would look like this.
    • 0:07:28And cherry, meanwhile, would look like that.
    • 0:07:31So notice that these values are indeed quite different.
    • 0:07:34So using this better hash function, I claim, that doesn't just
    • 0:06:37look at the first letter of the word,
    • 0:07:39but looks at maybe all of the letters in the input--
    • 0:07:42C-H-E-R-R-Y in this case, we can probably come up with something more
    • 0:07:46interesting, more cryptic-looking, if you will,
    • 0:07:48like the values that we've just seen.
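The on-screen outputs resemble the classic Unix crypt format; as a stand-in for whatever function produced them, here is SHA-256 from Python's standard library, which likewise maps any input to a fixed-length, cryptic-looking string that leaks no obvious pattern:

```python
import hashlib

# Hash each fruit; note every digest is the same length (64 hex chars)
# and reveals nothing about the input's first letter or anything else.
digests = {w: hashlib.sha256(w.encode()).hexdigest()
           for w in ["apple", "banana", "cherry"]}
for word, digest in digests.items():
    print(word, digest)
```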
    • 0:07:50So let me propose now that what we should do in our database of passwords
    • 0:07:54is not store alice, apple, bob, banana, but let's instead
    • 0:07:59store the hashes of apple and banana respectively.
    • 0:08:03So instead in this password database, I'm going to store this instead.
    • 0:08:07The exact same values that we just saw coming as outputs from that black box,
    • 0:08:11but in this case now, I'm storing in my database
    • 0:08:13of passwords usernames and hash values.
    • 0:08:18Now why is this perhaps a good thing?
    • 0:08:21Well, one, if someone now attacks this server
    • 0:08:23and somehow gains access to all of these usernames and hashes, what they don't
    • 0:08:28have is an entire list of passwords.
    • 0:08:31So they can't quite as easily go about credential stuffing and figuring out
    • 0:08:35whether this database will give them access to accounts somewhere else.
    • 0:08:39I'm at least creating some work for the adversary.
    • 0:08:42But at the same time, I feel like I've kind of broken the whole system
    • 0:08:46because previously, presumably, when you log into a website or app
    • 0:08:49and you type in your username and then you type in your password, what
    • 0:08:53is the website or app probably do?
    • 0:08:55Well, once that username and password are sent over the internet,
    • 0:08:58typically to that server, well, the server probably
    • 0:09:00compares what you typed in against the username in their database,
    • 0:09:04or their text file, and the server compares
    • 0:09:07what you typed in as your password against whatever
    • 0:09:10password is in their database.
    • 0:09:12But now we have a problem.
    • 0:09:13We have you typing the username and we do have the username
    • 0:09:16still in the database.
    • 0:09:18Case in point, Alice and Bob are still here.
    • 0:09:22But what we don't have is apple and banana.
    • 0:09:25We've replaced those altogether with hashes.
    • 0:09:27So even if you type in-- or Alice types in apple,
    • 0:09:30well we don't want to compare A-P-P-L-E to this because it obviously
    • 0:09:34doesn't match; and Bob's banana, we don't want to compare against this
    • 0:09:37because it's not going to match; and so forth.
    • 0:09:40So what can we do?
    • 0:09:41Well, the way authentication typically works on the server side
    • 0:09:46when using hashing is as follows.
    • 0:09:49When you first create an account or register for this website or app,
    • 0:09:52you type in, if you're Alice, Alice, Enter, and then apple, for instance,
    • 0:09:58Enter.
    • 0:09:58That username, Alice, that password, apple, are sent to the server.
    • 0:10:02But what the server does before saving the username and password
    • 0:10:06is it runs that hash function on Alice's password,
    • 0:10:10which is apple, converts it thereafter to this value, and stores Alice's
    • 0:10:16username and the hash of Alice's password only and throws away apple,
    • 0:10:21deletes it, forgets it in memory.
    • 0:10:23What then happens next?
    • 0:10:25Well, the next time Alice tries to log into this website--
    • 0:10:29maybe the next day, a week from then, a year from then for the second or third
    • 0:10:33or more time, what happens?
    • 0:10:35Well, Alice types in Alice as her username, hopefully
    • 0:10:38apple as her password, hits Enter, those get sent to the server as usual,
    • 0:10:41and obviously the server can't just compare
    • 0:10:44username against username and password against password
    • 0:10:46because it doesn't have the password in its database, so
    • 0:10:49what can the server do?
    • 0:10:51The server can repeat the very same process,
    • 0:10:53taking Alice's password as inputted, A-P-P-L-E,
    • 0:10:57run it through the exact same hash function a day, a week, a year later,
    • 0:11:02and then compare that resulting hash value to whatever is stored in this
    • 0:11:07text file or database.
    • 0:11:09And now admittedly, we're creating a whole lot more work for ourselves,
    • 0:11:13but it's not that big a deal because this is just a math function,
    • 0:11:16or if you know how to program, it's just a few lines of code
    • 0:11:19that you've written in software that converts passwords to hash values.
    • 0:11:23And honestly, nowadays, you wouldn't even be writing most of this code
    • 0:11:26yourself, you'd be using a library, third-party code that someone
    • 0:11:29else smarter than you, maybe, has written and gotten it just right,
    • 0:11:32no bugs or mistakes, so you're just relying on someone else's code
    • 0:11:36anyway to achieve this goal.
    • 0:11:37But the upside now, to be clear, is if this file is compromised somehow,
    • 0:11:42the server's hacked into and this data is leaked,
    • 0:11:45at least they only know the usernames on your system, not the actual passwords.
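The register-then-login flow just described can be sketched as follows, with SHA-256 standing in for whatever hash function the server actually uses (real systems use a dedicated, slower password-hashing scheme):

```python
import hashlib

DB = {}  # username -> hash value; only the hash is ever stored

def h(password):
    # Stand-in hash function for this sketch.
    return hashlib.sha256(password.encode()).hexdigest()

def register(username, password):
    # Hash the password, store the hash, and "forget" the plaintext.
    DB[username] = h(password)

def login(username, password):
    # Re-run the same hash on the attempt and compare hash to hash.
    return DB.get(username) == h(password)

register("alice", "apple")
register("bob", "banana")
print(login("alice", "apple"))  # True
print(login("alice", "grape"))  # False
```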
    • 0:11:52And let me pause here and see if there's any questions on this technique
    • 0:11:55of hashing for passwords specifically.
    • 0:12:01STUDENT: You said yourself, we are using libraries
    • 0:12:04more often than writing the hash functions ourselves if we are not
    • 0:12:09taking the course on CS50.
    • 0:12:13So then it's easy to hack these hashes, right?
    • 0:12:17Because we can go through 10, 40, I don't
    • 0:12:20know, hash functions that are available in the libraries,
    • 0:12:24and then you can reverse the hash results, is that right?
    • 0:12:29DAVID J. MALAN: Almost.
    • 0:12:30You can do exactly what you described first, whereby
    • 0:12:32you use the same library, the same code, to create hash values to then compare
    • 0:12:37those against what's in the database, but generally, these hashes
    • 0:12:41are not reversible, per se.
    • 0:12:43You can compare them, but you can't reverse the process
    • 0:12:46for reasons we'll come back to.
    • 0:12:47But your intuition is right.
    • 0:12:49And so really, the takeaway here is that we
    • 0:12:51haven't made our system absolutely secure,
    • 0:12:54we've made it relatively more secure.
    • 0:12:56Why?
    • 0:12:57Because we've increased the cost to the adversary, to the hacker.
    • 0:13:01They now have to do more work to figure out what the actual passwords are
    • 0:13:05if they want to benefit from this hack.
    • 0:13:07So again, it just raises the bar, it does not
    • 0:13:10keep the adversary necessarily out or even
    • 0:13:12stop them from figuring out one person's password,
    • 0:13:15but it might take them a lot more time, it
    • 0:13:17might take them a lot more resources like server or cloud costs or money,
    • 0:13:21or it might even heighten the risk before they actually are successful.
    • 0:13:25How about one other question here on hashing?
    • 0:13:28STUDENT: If the password is intercepted before--
    • 0:13:32after the website is hacked and the password
    • 0:13:34is intercepted before it's encrypted, wouldn't that pose a problem?
    • 0:13:40DAVID J. MALAN: Yes, absolutely.
    • 0:13:42Then all bets are off.
    • 0:13:44Everything we just discussed is not useful
    • 0:13:45at all if the adversary has actually intercepted the password
    • 0:13:49before it has even been hashed.
    • 0:13:50Now thankfully, there's going to be solutions to that problem, too,
    • 0:13:53and we'll come to them today, but for now, focusing only on hashes,
    • 0:13:57it solves one problem but not all.
    • 0:13:59In fact, it turns out that those attacks we talked about last time with respect
    • 0:14:04to our accounts are still possible.
    • 0:14:06You can still use a dictionary, for instance, of English words,
    • 0:14:09or better yet, a dictionary of English fruits,
    • 0:14:12and you could, one fruit at a time, run each of those values as input
    • 0:14:18into the same hash function, the library or code
    • 0:14:20that you're using to achieve this, and then
    • 0:14:22that's going to give you one hash value after another.
    • 0:14:25And you could compare each of those hash values
    • 0:14:28against whatever is in the database or the file of passwords
    • 0:14:31that you, the hacker in this story, might have actually stolen somehow.
    • 0:14:35You have to do more work though, because it's
    • 0:14:37no longer as simple as just comparing apple against apple and banana
    • 0:14:41against banana.
    • 0:14:42You actually have to do some work.
    • 0:14:44You have to do some computational work.
    • 0:14:46And if the file has only a few values, of course, it's not a big deal.
    • 0:14:50If it's thousands or millions of rows, it might actually take a lot more
    • 0:14:54time, energy, and effort.
    • 0:14:55So again, we're just raising the bar, but not keeping the adversary
    • 0:14:58out altogether.
    • 0:15:00And even if you don't have a dictionary available,
    • 0:15:02and even if the passwords are not all fruits in English,
    • 0:15:06well, you can still, as the adversary, resort to brute-force attacks.
    • 0:15:10And you can try even the simplest of passwords like 0000 or maybe eight
    • 0:15:150's instead, and you can hash that and see what the resulting hash value is
    • 0:15:19and compare that against what's in the database.
    • 0:15:22Then you can try 00000001, hash that, compare that against
    • 0:15:28what's in the database, and then move on to the next and the next,
    • 0:15:31doing this not just for numbers, but for letters as well.
    • 0:15:34A, A, A, A, A, A, A, A, A, hash that and compare.
    • 0:15:38Eventually, apple will be on that list.
    • 0:15:40Eventually, banana will be on that list.
    • 0:15:42But there, too, the brute force attack is still
    • 0:15:44going to take some amount of time.
    • 0:15:46So it's just increasing the cost or the complexity
    • 0:15:48for the adversary in this particular case.
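A brute-force attack works the same way but enumerates every possible string instead of a word list; a sketch for tiny lowercase passwords (realistic lengths make this astronomically expensive, which is exactly the point):

```python
import hashlib
import string
from itertools import product

def h(password):
    return hashlib.sha256(password.encode()).hexdigest()

def brute_force(target_hash, length):
    # Try every lowercase string of the given length: hash and compare.
    for combo in product(string.ascii_lowercase, repeat=length):
        guess = "".join(combo)
        if h(guess) == target_hash:
            return guess
    return None

print(brute_force(h("abc"), 3))  # abc
```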
    • 0:15:51But there's yet another threat that's possible in the context now
    • 0:15:54of the hashes, which is worth knowing about.
    • 0:15:58There's a term of art known as a rainbow table, which
    • 0:16:00is a very beautiful way of saying that adversaries in advance
    • 0:16:05might have already hashed all possible English words in a dictionary.
    • 0:16:09Adversaries might have already hashed all possible passwords of length 4
    • 0:16:13or 5 or 6 or 7 or 8 or something else.
    • 0:16:16And maybe if they have a big enough hard drive,
    • 0:16:18they are storing a big table, like an Excel file
    • 0:16:21or a CSV file of all of the words that they've tried, all of the passwords
    • 0:16:25they've tried, and all of the hash values they've already computed.
    • 0:16:30Then it's even easier.
    • 0:16:31Then they don't even need to do a brute-force attack, per se,
    • 0:16:34hashing and hashing and hashing and hashing.
    • 0:16:36Then they can just compare, compare, compare.
    • 0:16:38Because indeed, a rainbow table simply contains
    • 0:16:41all of the passwords they've tried, all of the hash values they've generated,
    • 0:16:46and so they just compare left to right whatever
    • 0:16:48the user typed in against the hash value they've already computed.
    • 0:16:52Now for certain hash functions, this threat of a rainbow table
    • 0:16:56is just not feasible.
    • 0:16:57You might need terabytes or petabytes of data, which means a lot of hard drives
    • 0:17:03and a lot of money, so there are potential downward pressures
    • 0:17:06on this kind of an attack, but it can certainly speed things up.
    • 0:17:09Certainly if you're pre-computing-- that is,
    • 0:17:12pre-calculating some of the hashes for at least words
    • 0:17:14in an English dictionary, and certainly some short list like all
    • 0:17:17of the fruits in the world.
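The core idea behind such a precomputed table can be sketched like this (a true rainbow table is more space-efficient, using hash chains, but the payoff is the same: look up instead of compute):

```python
import hashlib

def h(password):
    return hashlib.sha256(password.encode()).hexdigest()

# Precompute once: hash every candidate and index by hash value.
candidates = ["apple", "banana", "cherry", "1234", "12345678"]
table = {h(word): word for word in candidates}

# Cracking a stolen hash is now just a dictionary lookup.
stolen_hash = h("banana")
print(table.get(stolen_hash))  # banana
```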
    • 0:17:19But there's another problem that we might
    • 0:17:21encounter on the server with regard to our passwords.
    • 0:17:25Alice might have a password of apple, Bob might have a password of banana,
    • 0:17:28but suppose that both Carol and Charlie have a password of cherry.
    • 0:17:34And just by coincidence, they both chose the same password
    • 0:17:38and are in this same database.
    • 0:17:39Now we've already concluded, I think, that we definitely don't
    • 0:17:42want to store the plaintext passwords.
    • 0:17:45We don't want to store literally in the clear apple, banana, cherry, and cherry
    • 0:17:50because this is just too easy for the adversary to do bad things with it.
    • 0:17:53So we at least want to hash this, but here's
    • 0:17:56where hashing can leak information, so to speak.
    • 0:18:00If I go ahead and use the same function I've
    • 0:18:02been using to hash apple and banana and now cherry,
    • 0:18:06what do you notice about Carol's and Charlie's hash values?
    • 0:18:12Curiously, but maybe not surprisingly, they're exactly the same.
    • 0:18:17That's, after all, how functions typically work,
    • 0:18:19be it in math or in software, in code.
    • 0:18:21If you pass the exact same input, unless there's some randomness going on,
    • 0:18:25you're going to get the same output again and again.
    • 0:18:27Now why is this a big deal?
    • 0:18:29Well, if some adversary attacks this database and gains
    • 0:18:33access to all of these usernames and hashes,
    • 0:18:36we have leaked information in the sense that the adversary, just
    • 0:18:40by glancing at this file, knows that, OK, I
    • 0:18:43don't know what Carol's password is or what Charlie's password is,
    • 0:18:46but I know it's the same password, and that alone
    • 0:18:50might be enough information to figure out with higher probability what it is.
    • 0:18:54Maybe Carol and Charlie are related.
    • 0:18:56So maybe you focus on words or numbers that are common to both of them.
    • 0:19:01Maybe there's some information that's implied by this if they both are--
    • 0:19:05they both like the same TV shows, they both like the same movies.
    • 0:19:08You can try to find, in your mind, maybe the intersection of information that
    • 0:19:12might lead you, with higher probability, to figure out,
    • 0:19:15without brute force, even, what Carol's password is and Charlie's password is.
    • 0:19:20So this is a common problem, and we only have four users in this database.
    • 0:19:24You can imagine having many more.
    • 0:19:25Odds are, some of us are going to have the same username-- not
    • 0:19:28the same username, some of us are going to have the same passwords.
    • 0:19:31In fact, without raising your hands or admitting to this for the whole world
    • 0:18:35to see, do any of you have a password of 1234 in some website or app?
    • 0:19:43Maybe a little harder?
    • 0:19:4412345678?
    • 0:19:48Something very simple like this.
    • 0:19:49Maybe it's an account you don't really care about.
    • 0:19:51Well, that's a perfect example of where, if you
    • 0:19:54have an account on the same system as someone else here in the classroom,
    • 0:19:57you're going to have, in that database, presumably, the same hash values,
    • 0:20:01and that might be alone enough information to leak and increase
    • 0:20:06the probability that you, and not Alice or Bob,
    • 0:20:09are actually compromised with respect to your account.
    • 0:20:12So how can we fix this?
    • 0:20:13Well, it turns out there's another technique in the world of data
    • 0:20:16that we can use to perturb this process.
    • 0:20:20And you can think of it metaphorically as
    • 0:20:21like sprinkling a little bit of salt on the hash function
    • 0:20:25so as to change what its output is.
    • 0:20:27It's not random, per se, but you are perturbing the output
    • 0:20:31so that it's much less likely that two people with the same passwords
    • 0:20:34are going to have the same hash value.
    • 0:20:36So how does this work?
    • 0:20:38In this case before, when we passed in cherry as our input,
    • 0:20:42we got the same hash again and again.
    • 0:20:45But let me propose that we modify our hash function to take two inputs now.
    • 0:20:50Not just the password, but also a salt value, so to speak.
    • 0:20:55A little bit of a sprinkling of, in this case, just two characters--
    • 0:20:58two numbers, two letters, or a combination thereof.
    • 0:21:01Now this hash function that I'm describing is still
    • 0:21:04going to output a hash value, but notice,
    • 0:21:06it's different from the one before, and even if you don't quite remember
    • 0:21:09what it was before, it was not this.
    • 0:21:11But worth noting is that in the output of this hash function
    • 0:21:15now is the salt itself.
    • 0:21:18So the salt isn't something that's meant to be private or secret or secure, it's
    • 0:21:22just sprinkled in there to make sure that whatever hash value comes out
    • 0:21:26of this black box is a little bit different than if you
    • 0:21:29had put a different salt value instead.
    • 0:21:32So for instance, suppose that for Carol and for Charlie,
    • 0:21:38we use different salts.
    • 0:21:39And that's the idea.
    • 0:21:40Different users should have different salt values
    • 0:21:43just in case they choose the same passwords.
    • 0:21:45So instead of 50 and cherry, suppose that Charlie
    • 0:21:48uses a salt value of, say, 49.
    • 0:21:5149 is not a number that Charlie or you or I have to pick.
    • 0:21:55This is all done by the server automatically,
    • 0:21:57picking a random two characters like 4-9 or 5-0.
    • 0:22:00But notice what just happened.
    • 0:22:02If I rewind to cherry with a salt of 50, this was the hash value, the first two
    • 0:22:07characters of which are the salt. If, though,
    • 0:22:10I change the salt from 50 to 49, the hash changes completely,
    • 0:22:16and it prefixes it with now 49 instead of 50.
    • 0:22:19This ensures that even if Carol and Charlie have the exact same password,
    • 0:22:24there's no way I, the adversary, am going to know by looking at it.
    • 0:22:28Because indeed, what ends up in the file now are these two values.
    • 0:22:32One is prefixed with 50, one is prefixed with 49, the rest of the hash values
    • 0:22:37clearly are completely different.
    • 0:22:39So again, the upside is this approach where the hash function
    • 0:22:42takes two inputs, the password and a salt,
    • 0:22:46and then outputs one hash value means that we're not leaking information
    • 0:22:51except--
    • 0:22:53except-- so there is a corner case--
    • 0:22:55if by chance, by bad luck, the system chooses
    • 0:22:58the same salt for both Carol and Charlie,
    • 0:23:01yes, there might still be information leaked.
    • 0:23:03And honestly, that may very well happen if you've
    • 0:23:06got thousands, millions of users, then you're
    • 0:23:09going to run out of two-character possibilities,
    • 0:23:11you're going to have to reuse salt.
    • 0:23:13But the idea is that we're just trying to put
    • 0:23:15downward pressure on the probability of being attacked successfully.
    • 0:23:20We're trying to equivalently raise the bar to the adversary
    • 0:23:23so that they are not as likely to gain access to my data or, in turn,
    • 0:23:28my account.
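The two-input, salted hash function just described might be sketched like this (SHA-256 and a two-digit salt stand in for the lecture's actual scheme):

```python
import hashlib
import secrets

def hash_with_salt(password, salt=None):
    # Two-character salt, as in the example; the salt is not secret and
    # is stored in the clear as a prefix of the output.
    if salt is None:
        salt = "".join(secrets.choice("0123456789") for _ in range(2))
    return salt + hashlib.sha256((salt + password).encode()).hexdigest()

# Same password, different salts: completely different stored values.
carol = hash_with_salt("cherry", "50")
charlie = hash_with_salt("cherry", "49")
print(carol[:2], charlie[:2])    # 50 49
print(carol[2:] == charlie[2:])  # False
```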
    • 0:23:29Questions now on salting or hashing itself?
    • 0:23:34STUDENT: Oh, I'm curious.
    • 0:23:35Where do we store the salt?
    • 0:23:37DAVID J. MALAN: So where do you store the salt?
    • 0:23:39The salt is actually stored in the hash value itself,
    • 0:23:43according to this algorithm, in the first two characters.
    • 0:23:46And the value of storing the salt in the first two characters of the hash
    • 0:23:50is as follows.
    • 0:23:51The next time Carol logs in, she types in her username, Carol, and hits Enter.
    • 0:23:56The server now knows, OK, I'm expecting a password from Carol,
    • 0:23:59let's see what she types in.
    • 0:24:01Suppose that she types in correctly cherry.
    • 0:24:04Now the system is not storing cherry, so it's not
    • 0:24:06going to compare literally what Carol typed in,
    • 0:24:08but it is going to hash cherry, but first, the system is going to check,
    • 0:24:14what is Carol's hash-- what is Carol's salt?
    • 0:24:17And it's going to infer as much by looking at Carol's hash value
    • 0:24:21and looking only at the first two characters by convention.
    • 0:24:24Then what the server is going to do, it's going to take whatever Carol typed
    • 0:24:27in, cherry, C-H-E-R-R-Y, it's going to pass in 50, 5-0, and then hopefully,
    • 0:24:33it's going to get back to this same value here,
    • 0:24:36this whole string in yellow.
    • 0:24:38And if those match, then Carol will be considered authenticated.
    • 0:24:43By contrast, if the username happens to be Charlie and Charlie hits Enter,
    • 0:24:47then what the server is going to do is look at Charlie's hash value,
    • 0:24:50grab the first two characters for Charlie's salt,
    • 0:24:53use that salt and cherry as the input to the hash function,
    • 0:24:57and hope that the result is Charlie's value, not Carol's.
    • 0:25:02Really good question.
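That verification step, extracting the salt from the stored value itself and re-hashing, can be sketched as follows (same illustrative SHA-256 stand-in as above):

```python
import hashlib

def hash_with_salt(password, salt):
    # Salt is stored in the clear as a prefix of the output.
    return salt + hashlib.sha256((salt + password).encode()).hexdigest()

def verify(stored_value, attempt):
    # By convention here, the first two characters of the stored value
    # are the salt: extract it, re-hash the attempt with that same
    # salt, and compare against the whole stored value.
    salt = stored_value[:2]
    return hash_with_salt(attempt, salt) == stored_value

stored = hash_with_salt("cherry", "50")
print(verify(stored, "cherry"))  # True
print(verify(stored, "grape"))   # False
```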
    • 0:25:04Other questions on salting or hashing?
    • 0:25:07STUDENT: Is there any sense in rehashing a password?
    • 0:25:09So hashing it a first time to get a string,
    • 0:25:14then rehashing it for a second string?
    • 0:25:17Or it's just impractical?
    • 0:25:19DAVID J. MALAN: No, you could certainly hash the value multiple times,
    • 0:25:22but a good hash function should not require that of you.
    • 0:25:25Especially now, more recent modern hashes, one of which
    • 0:25:28we'll look at in a moment, they should have sufficiently well-tested
    • 0:25:32and proven characteristics that allow you to hash it just once
    • 0:25:36and you will get a seemingly random string
    • 0:25:39that represents whatever that input is.
    • 0:25:41And here, too, is where I should emphasize
    • 0:25:44that when it comes to this world of hashing and salting
    • 0:25:47and today's other topics ultimately, these are not wheels
    • 0:25:51that you or I should be reinventing.
    • 0:25:54Unless you are the researcher or the company that's actually
    • 0:25:57developing the algorithm, stress-testing them, analyzing them theoretically
    • 0:26:02and practically so often in industry or the real world,
    • 0:26:05when people like you and me invent our own systems for storing information,
    • 0:26:10we just haven't spent nearly as much time
    • 0:26:13or we're just not nearly as sharp as some of the security researchers
    • 0:26:16out there who have really given this some thought.
    • 0:26:18So when it comes to all things security-- and let me get on my soapbox
    • 0:26:22here and say, you and I should not be solving these problems unless it is
    • 0:26:26your full-time job or calling in life.
    • 0:26:29There's just too many corner cases unless you're
    • 0:26:31collaborating with a smart team.
    • 0:26:35All right.
    • 0:26:35With that said, here is what hashes generally
    • 0:26:40look like nowadays in practice.
    • 0:26:41For the sake of discussion, I deliberately
    • 0:26:43chose a fairly simple hash function that was using a fairly short salt,
    • 0:26:48just two characters, and a fairly short hash value as output.
    • 0:26:52Here, in a smaller font, no less, is how Alice's and Bob's and Carol's
    • 0:26:57and Charlie's passwords would probably be
    • 0:27:00stored nowadays using a more recent modern hash function
    • 0:27:03that, notice, by the sheer length of the text on the screen,
    • 0:27:06outputs a much larger value.
    • 0:27:09If you're familiar from computer science with the notion of bits,
    • 0:27:120's and 1's that are used to store information in systems,
    • 0:27:15these hash values use many more bits, many more 0's and 1's.
    • 0:27:19You and I as humans are seeing them as alphabetical letters and as numbers,
    • 0:27:23but underneath the hood, these are just more and more 0's and 1's
    • 0:27:27that the computer is storing, which means it's much, much less likely
    • 0:27:31that someone who steals this kind of file
    • 0:27:33is going to be able to figure out efficiently what
    • 0:27:37those original passwords were.
    • 0:27:38And you can see, too, that for both Carol and Charlie,
    • 0:27:41even though their passwords are still cherry,
    • 0:27:43these two strings along the bottom look completely different.
    • 0:27:47Except in one location here.
    • 0:27:49It turns out that the scheme a lot of systems have adopted
    • 0:27:52is that if you look between dollar signs at the beginning of what
    • 0:27:56seems to be the hash value, you'll see a code like y,
    • 0:28:01or other numbers or letters as well.
    • 0:28:03That's a little cheat sheet that tells the system exactly what hash function
    • 0:28:07was used to generate the rest of it.
    • 0:28:10And that's in the documentation that you can read online
    • 0:28:12for any number of hash functions.
    • 0:28:14So that's just to say, when you create an account on some new website or app,
    • 0:28:18if they are doing things well in a manner consistent with best practices
    • 0:28:22and they are being mindful of your security, they are probably in a file
    • 0:28:27or in a database or some other mechanism storing
    • 0:28:29values that look quite like these based on whatever password you actually
    • 0:28:34typed in.
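As a sketch of that dollar-sign convention: the sample string below is invented, but its shape follows crypt(3)-style strings, in which the identifier between the first dollar signs names the hash scheme (for example, "6" conventionally indicates SHA-512 crypt):

```python
# A crypt(3)-style string packs the scheme id, parameters, salt, and hash
# into one $-delimited value. This particular string is made up for illustration.
stored = "$6$rounds=5000$mysalt$notarealhashvalue"

fields = stored.split("$")[1:]  # drop the empty string before the first $
scheme_id = fields[0]           # the "cheat sheet" naming the hash function used

assert scheme_id == "6"
```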
    • 0:28:35In fact, just to give you a sense of how easy or difficult
    • 0:28:38it might be to crack passwords-- that is, figure out what they are based only
    • 0:28:43on these hashes, in the case of our first hash function
    • 0:28:47whereby we had a fairly short hash value being outputted
    • 0:28:50with or without the salt, turns out, there's
    • 0:28:5218 quintillion possible hash values.
    • 0:28:56Now that's a lot.
    • 0:28:57That's bigger than last time's quadrillion value.
    • 0:29:00But, with enough time, enough money, and enough cloud computing,
    • 0:29:04those early hash functions can be broken.
    • 0:29:07That is, with enough time and energy, you can probably
    • 0:29:10figure out what someone's password is.
    • 0:29:11If you fast forward to the other strings that I showed you
    • 0:29:14on the screen, the much longer ones that use more bits, so to speak,
    • 0:29:18then you have this many possible hash values nowadays.
    • 0:29:22And I actually did look up how to pronounce this,
    • 0:29:25but based on reading it on my screen, I wasn't actually sure
    • 0:29:28how to say the word since this is a really big number that my mathematician
    • 0:29:31colleagues could do a better job pronouncing.
    • 0:29:33But given how many digits are on the screen,
    • 0:29:36given how many commas are on the screen here,
    • 0:29:38this is a really big number such that you
    • 0:29:40and I probably don't need to worry about an adversary using brute force figuring
    • 0:29:45out and still being able to figure out by the end of time
    • 0:29:49what the corresponding password might be unless there
    • 0:29:52are other weaknesses in the system.
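For a sense of scale: 18 quintillion matches the number of values a 64-bit output can take (an assumption here about the earlier function's output size, since the lecture doesn't state it), while a 256-bit output, used below purely as a representative modern size, is astronomically larger:

```python
# A 64-bit output space: roughly 18 quintillion possible hash values.
old_space = 2 ** 64
print(f"{old_space:,}")      # 18,446,744,073,709,551,616

# A representative modern 256-bit output space: a 78-digit number of possibilities.
new_space = 2 ** 256
print(len(str(new_space)))   # 78
```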
    • 0:29:55Now speaking of weaknesses.
    • 0:29:57Has anyone ever forgotten your password?
    • 0:30:00Yes, of course.
    • 0:30:00But have you ever gone to a website or app, clicked that link that says,
    • 0:30:04Forgot Password, question mark, in hopes of getting an email of some sort
    • 0:30:09so that you can reset the password?
    • 0:30:10I mean, odds are, almost everyone here has experienced that.
    • 0:30:13But has anyone ever clicked on that link, gotten back
    • 0:30:17an email that actually contains your password
    • 0:30:22so that you're just immediately reminded what it is?
    • 0:30:24I'm seeing a few nods of the head.
    • 0:30:26You can copy-paste it, then, into the website.
    • 0:30:28Do not use that website anymore.
    • 0:30:30That is evidence of-- that is a symptom of a website or application not
    • 0:30:36practicing best practices.
    • 0:30:38Why?
    • 0:30:39Well, if it is the case that the website can email you your password,
    • 0:30:43that means they can see and they know what your password is.
    • 0:30:46That means this database, this text file we've
    • 0:30:49been talking about is probably vulnerable to some hacker
    • 0:30:52eventually getting into it and stealing all of those usernames
    • 0:30:55and passwords in the clear, no less.
    • 0:30:57Because recall what these hashes are.
    • 0:30:59They're generally meant to be irreversible.
    • 0:31:02When you take as input apple, banana, and cherry,
    • 0:31:04the output looks completely different with no obvious relationship to what
    • 0:31:08those original passwords actually were.
    • 0:31:11And so if that's what's being stored in the database,
    • 0:31:14the company who made that website, the person who made that website or app,
    • 0:31:18they should not be able to reverse that process either,
    • 0:31:21otherwise surely, the adversary can.
    • 0:31:23So it is the case, and I've experienced this myself, often
    • 0:31:27from smaller shops or companies that maybe haven't really
    • 0:31:30invested a lot of time or care into their website,
    • 0:31:33if they are able to email you your original password,
    • 0:31:37it is, by definition, not secure.
    • 0:31:39And it's certainly not up to today's standards, it's just too easy
    • 0:31:43for it to be compromised.
    • 0:31:44So maybe minimally stop using that service
    • 0:31:46and make sure you're not using that password anywhere else.
    • 0:31:49Maximally, maybe send them a note explaining your concern
    • 0:31:52and maybe linking them to some reference online--
    • 0:31:55maybe this video-- in which you explain why you have that concern.
    • 0:32:01Questions, then, on forgetting passwords or hashing or salting?
    • 0:32:07STUDENT: So as you said, some companies may not be practicing these hashes
    • 0:32:13and maybe practicing something very bad.
    • 0:32:15So if I were, let's say, a company and I--
    • 0:32:19because of my practices, I had a leak of passwords and all the data,
    • 0:32:24do I as a company have any obligations or responsibility for what
    • 0:32:30happened since I have all the customer's data and all their passwords,
    • 0:32:35do I have any obligations or responsibilities?
    • 0:32:38DAVID J. MALAN: It's a really good, a noble question.
    • 0:32:41The answer to that ethically is probably yes you should, quite simply.
    • 0:32:45However, the more nuanced answer is that it's probably
    • 0:32:48going to depend on the industry that you're in, the country that you're in,
    • 0:32:51any regulatory requirements that your company faces which might
    • 0:32:55oblige you to report out in that way.
    • 0:32:58So I would read up on the context that's specific to you yourself.
    • 0:33:02And I will say, unfortunately, it is not that common in the world, I dare say,
    • 0:33:08that companies document and detail publicly
    • 0:33:11when there have been security exploits.
    • 0:33:13They might announce that something indeed has happened,
    • 0:33:15but it is rare that companies will go into any amount of detail.
    • 0:33:19Now this is understandable because, one, they're already embarrassed,
    • 0:33:22or if not in legal trouble or financial trouble because that has happened
    • 0:33:26already, but they probably, typically, don't
    • 0:33:29want to provide other adversaries-- other future attackers-- with more
    • 0:33:33information about their systems and the weaknesses that those systems have.
    • 0:33:38The downside, of course, societally, is that each of us
    • 0:33:42is secretly getting attacked in ways we didn't
    • 0:33:45expect, each learning things that would be ideal to share
    • 0:33:49with others in the world.
    • 0:33:51This itself is actually a big question in the world of cybersecurity,
    • 0:33:54just how much and how often to share, especially when
    • 0:33:57you discover a bug or a mistake in someone's system,
    • 0:33:59do you tell them privately, do you tell the world publicly?
    • 0:34:02These are ethical questions that we'll touch on indeed in the coming days
    • 0:34:06as well.
    • 0:34:07Allow me to propose that separate from these concerns here,
    • 0:34:12we can come back to some of those recommendations
    • 0:34:14that we started the class with from this, the National Institute
    • 0:34:17of Standards and Technology.
    • 0:34:19Notice that this was one other quote we did not share last time.
    • 0:34:22A recommendation from NIST is that "Verifiers
    • 0:34:25SHALL store memorized secrets in a form that
    • 0:34:28is resistant to offline attacks.
    • 0:34:30Memorized secrets SHALL be salted and hashed
    • 0:34:33using a suitable one-way key derivation function.
    • 0:34:36Their purpose is to make each password guessing trial
    • 0:34:41by an attacker who has obtained a password hash file expensive,
    • 0:34:45and therefore, the cost of guessing attack--
    • 0:34:48of a guessing attack high or prohibitive."
    • 0:34:51So when I refer to best practices, I'm really
    • 0:34:54referring to actual documentation like this, either from the United States,
    • 0:34:58from other countries, from other companies.
    • 0:35:00There are indeed these best practices, and among our goals
    • 0:35:03for this class is to expose you to some of those,
    • 0:35:06both on the consumer side-- you and me as individual computer users,
    • 0:35:10but also on the corporate or the academic side
    • 0:35:13as well as to what you should be doing when
    • 0:35:15you are in a position of being responsible for someone else's data as
    • 0:35:19well.
    • 0:35:19Now as for the actual hash functions to use nowadays,
    • 0:35:22these are just some of them that are generally recommended nowadays
    • 0:35:26that can be categorized as SHA-2 and SHA-3.
    • 0:35:29These refer to fairly sophisticated mathematical functions that
    • 0:35:32take as input, typically, a password, or some input more generally,
    • 0:35:36and then output a hash value thereof.
    • 0:35:40There are other algorithms, too, that can even
    • 0:35:43be used to verify the authenticity and integrity of messages as well.
    • 0:35:49In fact, today, we'll also focus on how we can use primitives like these
    • 0:35:53to ensure that data was not actually changed in transit when you sent it
    • 0:35:57over the internet from one person to another.
    • 0:36:00But ultimately, what we've been focusing on and what you've seen on this list
    • 0:36:03here are what are generally known as one-way hash functions.
    • 0:36:07That is, these are mathematical functions,
    • 0:36:09or, in the context of programming, these are
    • 0:36:12functions written in code, languages like Python
    • 0:36:15or otherwise, that take as input a string of arbitrary length.
    • 0:36:20That is, a password that's this long, maybe this long, maybe this long,
    • 0:36:23but what's key to these cryptographic functions
    • 0:36:25is they output a hash value of fixed length
    • 0:36:29that is always this many bytes or characters or this many bytes
    • 0:36:33or characters.
    • 0:36:34That is, it doesn't matter how short or how long the password is,
    • 0:36:37these cryptographic, these one-way hash functions are one-way in the sense
    • 0:36:42that they take a potentially infinite domain, if you
    • 0:36:46know this term for mathematics, and condense it into a finite range.
    • 0:36:51That is, a huge number of values, all possible passwords in the world,
    • 0:36:55to just a finite list of possible hash values.
    • 0:36:58It might be a long list of possible hash values,
    • 0:37:01but indeed, no matter how long a string of text
    • 0:37:03is, if it's of some fixed length--
    • 0:37:0616 characters, 32 characters, something else,
    • 0:37:08there's only a finite number of those values.
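Python's standard hashlib can demonstrate this fixed-length property with one SHA-2 function and one SHA-3 function:

```python
import hashlib

# Inputs of arbitrary length -- one character, a word, ten thousand characters...
for message in ["a", "banana", "x" * 10_000]:
    sha2 = hashlib.sha256(message.encode()).hexdigest()
    sha3 = hashlib.sha3_256(message.encode()).hexdigest()
    # ...always produce a fixed-length digest: 32 bytes, i.e. 64 hex characters.
    assert len(sha2) == 64
    assert len(sha3) == 64
```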
    • 0:37:11Now there's an implication of this.
    • 0:37:13When you take a really large input space or domain mathematically
    • 0:37:19and map it to a smaller finite range, so to speak,
    • 0:37:24mathematically, it turns out that if you do try to reverse the process,
    • 0:37:28there will be multiple inputs that yield the same output.
    • 0:37:33Think about it this way.
    • 0:37:34If you've got 100 possible passwords in the world,
    • 0:37:37but you only have 10 possible hash values--
    • 0:37:41so 100 passwords, 10 hash values, you have
    • 0:37:44to figure out how to put all of those passwords into 10 buckets, so to speak.
    • 0:37:49So surely, some of those passwords are going to be in the same bucket.
    • 0:37:52Think about it in terms of the English alphabet.
    • 0:37:54If we stuck with that original hash function where A was 1, B was 2,
    • 0:37:59C was 3, presumably Z was 26, there's more than one fruit
    • 0:38:05that starts with a--
    • 0:38:06apple, avocado, and so forth.
    • 0:38:09So there, too, you are going to have multiple fruits mapping
    • 0:38:13to the same finite range of values, hash values 1 through 26.
    • 0:38:17What that means is that if an adversary, or even you, the owner of the system,
    • 0:38:21look at that hash value and see the number 1,
    • 0:38:24you don't know if the password was apple or avocado or some other word that
    • 0:38:30started with A. And so that's what we mean by one-way hash functions.
    • 0:38:34You cannot reliably reverse the process by any means and know definitively what
    • 0:38:39the original input is.
    • 0:38:41Now there is a catch.
    • 0:38:42That technically means on some systems, it
    • 0:38:45might be possible to log in with apple or avocado, or more
    • 0:38:50generally, your actual password and some other seemingly random password that
    • 0:38:54might make no sense to you, but just because mathematically it
    • 0:38:58has the same hash value, that password, too, might let you into the system.
    • 0:39:03But the idea is, especially as we're using really large numbers of bits,
    • 0:39:07really long hash values, the probability of you or me figuring
    • 0:39:11out or an adversary even guessing what that other hash
    • 0:39:14value or what those other inputs--
    • 0:39:17passwords might be is just so small that we tend not to worry about it as well.
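A quick sketch of that pigeonhole effect, using the lecture's first-letter toy hash (A is 1, B is 2, and so on):

```python
# The toy hash from earlier: map a word's first letter to a number 1 through 26.
def toy_hash(word: str) -> int:
    return ord(word.lower()[0]) - ord("a") + 1

assert toy_hash("apple") == 1
assert toy_hash("avocado") == 1   # collision: same bucket as "apple"
assert toy_hash("banana") == 2

# Seeing only the hash value 1, you cannot tell whether the input
# was "apple", "avocado", or some other word starting with A.
```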
    • 0:39:22The algorithms we've looked at on the board
    • 0:39:24here are also known as cryptographic hash functions, which
    • 0:39:27means they have utility in the world of cryptography
    • 0:39:31where the world of cryptography is all about the practice and the study
    • 0:39:34of securing data.
    • 0:39:36Securing data while in transit from one point to another or while
    • 0:39:41at rest on your own system.
    • 0:39:43Let's go ahead here and take a five-minute break,
    • 0:39:45and when we come back, we'll explore precisely that world of cryptography
    • 0:39:50with respect to our data.
    • 0:39:52All right.
    • 0:39:53We're back.
    • 0:39:54And indeed, cryptography is all about the practice and study
    • 0:39:57of securing our data, particularly when we want to transmit it
    • 0:40:01from one person to another.
    • 0:40:03So cryptography can be broken down into a couple of different categories, one
    • 0:40:07of which are codes.
    • 0:40:08And codes are not the type of code that you might write in Python or the like.
    • 0:40:12It has nothing to do with software, but rather,
    • 0:40:15a mapping between what we'll call code words
    • 0:40:17and the actual message or true reading that those words represent.
    • 0:40:21Here, for instance, is an actual book from over 100 years ago
    • 0:40:25that was used to map these code words in the left column
    • 0:40:28to these, indeed, messages or true readings on the right.
    • 0:40:32The idea is, that if that one party wanted
    • 0:40:35to send a secure message to another party,
    • 0:40:37they wouldn't just write it out in plain English.
    • 0:40:39Why?
    • 0:40:40Because if that message, written on a piece of paper or parchment,
    • 0:40:43were intercepted by another human, that other human,
    • 0:40:46assuming they, too, know English, could just
    • 0:40:48read the actual message, the so-called plaintext.
    • 0:40:51In a code, though, you can convert the words
    • 0:40:55that you want to say to code words that make no sense necessarily
    • 0:41:00to someone who's intercepted the message in and of itself
    • 0:41:03unless they, too, have this book.
    • 0:41:05Now you can imagine this being a fairly time-consuming process
    • 0:41:08because when the recipient receives that message, unless they've memorized
    • 0:41:12all of these pages, these code words and the meanings thereof,
    • 0:41:15they have to do quite a bit of work flipping through their copy of the book
    • 0:41:19in order to figure out what that message is.
    • 0:41:21But the fact that they have a copy of the book, too,
    • 0:41:24is a potential threat because if one party or another had their code
    • 0:41:28book stolen, then any of the messages they've sent can now be decoded,
    • 0:41:33so to speak, by looking them up retrospectively.
    • 0:41:36And any future messages, if the owners of the book
    • 0:41:38don't realize that code book has been taken, so, too,
    • 0:41:41could those messages be translated.
    • 0:41:43Not to mention the fact, it's fairly cumbersome.
    • 0:41:46This alone is page 187.
    • 0:41:48And so that's quite a bit of codes and quite a bit of work
    • 0:41:51just to achieve this layer of indirection.
    • 0:41:54But there are some terms of art here that are worth knowing,
    • 0:41:57and you might actually use in everyday context,
    • 0:41:59but not necessarily for the same purpose.
    • 0:42:01So encode, what do we mean by that?
    • 0:42:03It means taking a plaintext text message,
    • 0:42:06be it in English or any human language, and taking that as input
    • 0:42:10and producing as output codetext.
    • 0:42:12So the codetext might be a short succinct
    • 0:42:15sequence of words that might actually be English words,
    • 0:42:19but they're not meant to mean what they normally mean.
    • 0:42:22They're meant to be looked up in the code book
    • 0:42:24to figure out what the message is actually trying to say.
    • 0:42:27Meanwhile, decode, as you might expect, is the opposite.
    • 0:42:29You take as input the codetext that you have received as the recipient,
    • 0:42:33you use that same code book to look up the code words
    • 0:42:38and figure out what the actual message is in order
    • 0:42:41to get the original plaintext, be it in English or any other human language
    • 0:42:45that the code book is designed for.
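A code book is essentially a lookup table in both directions; here is a minimal sketch in Python, with invented entries rather than ones from any real code book:

```python
# Code words mapped to their true readings -- entries invented for illustration.
codebook = {"pumpkin": "attack at dawn", "walnut": "retreat"}

# The sender needs the reverse mapping, from true reading back to code word.
reverse = {meaning: word for word, meaning in codebook.items()}

def encode(plaintext: str) -> str:
    return reverse[plaintext]   # plaintext -> codetext

def decode(codetext: str) -> str:
    return codebook[codetext]   # codetext -> plaintext

assert encode("attack at dawn") == "pumpkin"
assert decode(encode("attack at dawn")) == "attack at dawn"
```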
    • 0:42:47But there's an alternative to codes, if only because those code books can
    • 0:42:51get very cumbersome indeed, they can be taken and compromised and the like.
    • 0:42:56So it's not necessarily the best system in that you
    • 0:42:58need to physically keep something like that secure, let alone
    • 0:43:01do so efficiently when converting.
    • 0:43:03So there are also what we'll call ciphers.
    • 0:43:05And ciphers are more algorithmic in nature.
    • 0:43:09So if you have taken a computer science or a programming course,
    • 0:43:11you already have the predisposition to thinking algorithmically and taking
    • 0:43:16a big problem and breaking it down into smaller pieces
    • 0:43:20and then applying some kind of logic, sometimes again and again,
    • 0:43:23in order to solve some problem.
    • 0:43:25So ciphers focus on exactly that.
    • 0:43:28They don't focus on maybe words or phrases.
    • 0:43:30They might focus on individual letters instead or even bits
    • 0:43:34if it's in the context nowadays of computers.
    • 0:43:37So in the world of ciphers, you might have actually
    • 0:43:39seen them in popular culture.
    • 0:43:41So here, for instance, is just one frame from a famous film known as A Christmas
    • 0:43:46Story, at least here in the US.
    • 0:43:47It plays like every day all day long on a couple of TV channels
    • 0:43:51around Christmas time, but this here is Ralphie,
    • 0:43:55one of the main characters in the movie, and in his hands
    • 0:43:59here is this secret decoder pin that he tried so hard to get through the mail,
    • 0:44:04and the secret decoder pin was from little Orphan Annie herself.
    • 0:44:09And what it does is implement mechanically a cipher,
    • 0:44:13converting one letter to a number and back.
    • 0:44:17But the thing twists left and right so that you can actually
    • 0:44:20figure out what the mapping might be.
    • 0:44:22So this is more of a cipher because it's operating at a lower level--
    • 0:44:26not in entire words or phrases, but one letter at a time.
    • 0:44:30And it's a repeatable process that Ralphie, in this case,
    • 0:44:32can apply again and again to all of the letters of the secret message.
    • 0:44:37In World War II, the Germans, for instance,
    • 0:44:39had the Enigma Machine that you might have read about or seen
    • 0:44:41depicted in films, and this was a mechanical implementation
    • 0:44:45of this same idea of a cipher.
    • 0:44:47But instead of using mathematics or gears turning just this way and that,
    • 0:44:51it was much more mechanical.
    • 0:44:52It was with rotors and lights and the like,
    • 0:44:55but it, too, was implementing a cipher and could
    • 0:44:57be configured with different inputs in order
    • 0:45:00to influence exactly what the output would be.
    • 0:45:03But that, too, is a physical device, and we'll focus here for the most part,
    • 0:45:07though, on things more digital, things that you can ultimately, for instance,
    • 0:45:10nowadays implement much more readily and much more scalably in software.
    • 0:45:15But the words we'll use are pretty much the same.
    • 0:45:17To encipher a message means to take that message in English or any other
    • 0:45:21language, or so-called plaintext, and convert it, not surprisingly,
    • 0:45:25to ciphertext as output. Meanwhile, the reverse--
    • 0:45:29or rather, an equivalent term here that you might know as well is to encrypt.
    • 0:45:33Same idea, synonyms for our purposes, plaintext to ciphertext.
    • 0:45:38To encipher or to encrypt.
    • 0:45:39Nowadays, encrypt is probably the more common of those terms.
    • 0:45:43Meanwhile, decipher would be the opposite of that,
    • 0:45:46to actually take the ciphertext that someone else has sent to you,
    • 0:45:49run it through an algorithm or cipher, and get back the plaintext.
    • 0:45:54Meanwhile, decrypt would be a synonym for that phrase, which
    • 0:45:57refers to exactly the same process of taking ciphertext as input
    • 0:46:00and outputting plaintext as output.
    • 0:46:03So how do we configure these ciphers so that you and I
    • 0:46:06can use the same algorithm but customize them, not only with our own messages,
    • 0:46:10but also with our own settings so that just because you and I might
    • 0:46:14want to send the same plaintext doesn't mean that the ciphertext has
    • 0:46:18to actually be identical?
    • 0:46:20And indeed, in the world of cryptography,
    • 0:46:22it's quite recommended that you and I use public and well-documented,
    • 0:46:27well-tried-and-tested algorithms publicly,
    • 0:46:30but we do keep one piece of information secret so that our use of that cipher,
    • 0:46:35that algorithm is specific to us.
    • 0:46:37And this customization, this configuration
    • 0:46:40are generally known as keys.
    • 0:46:42Now keys, much like a physical key to a lock on an actual door to your home,
    • 0:46:47a key is what unlocks the capabilities of this cipher,
    • 0:46:51but it's a key that needs to be known and used not only by you, typically,
    • 0:46:55but also by the recipient.
    • 0:46:57So that by having copies of the same key,
    • 0:46:59you can not only encrypt messages or encipher them,
    • 0:47:03but you can also decrypt or decipher those messages, too.
    • 0:47:07Now what are these keys in practice?
    • 0:47:08They're not physical objects in the virtual world,
    • 0:47:10but really just really big numbers.
    • 0:47:13And often, there's some mathematical significance of these numbers,
    • 0:47:16and sometimes those numbers don't even look like numbers.
    • 0:47:19They might be presented on your phone or your laptop
    • 0:47:21or desktop actually as letters of an alphabet
    • 0:47:24and maybe even with some punctuation, too.
    • 0:47:26But at the end of the day, they're really just numbers, or, of course,
    • 0:47:29if you know a bit of computer science already,
    • 0:47:31they're really just 0's and 1's.
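A sketch of generating such a key in Python; the 256-bit size below is just a representative choice, not something the lecture prescribes:

```python
import secrets

# A key is ultimately just a big random number, even though it's often
# displayed as letters and digits (here, hexadecimal).
key = secrets.randbits(256)   # a 256-bit key, as a plain integer

print(key)                    # ...just a (very large) number
print(hex(key))               # the same number, shown as "letters"
```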
    • 0:47:33But it's perhaps helpful to think about them metaphorically as
    • 0:47:36akin to these physical keys.
    • 0:47:38Now how are these keys actually used?
    • 0:47:40Well, within the world of cryptography, there
    • 0:47:43are different types of encryption.
    • 0:47:45And the first we'll look at is known as secret key cryptography.
    • 0:47:49The presumption is that the security of your data
    • 0:47:53relies on the secrecy of some key.
    • 0:47:56So if A wants to send a message to B, then A and B
    • 0:48:00must keep secret whatever key they are using to configure
    • 0:48:05their choice of algorithms.
    • 0:48:06So what do we mean by that?
    • 0:48:08Well, secret key cryptography, specifically
    • 0:48:10in the context of encryption and scrambling data,
    • 0:48:13is also known as symmetric key encryption for the reason
    • 0:48:16that both A and B in this story are going to use the exact same key.
    • 0:48:21And we'll contrast this in just a bit with asymmetric key encryption,
    • 0:48:24which solves other problems as well.
    • 0:48:26So let's consider the process of encryption,
    • 0:48:29much like the process of hashing, as being this black box.
    • 0:48:32Somehow or other, this black box is going to encrypt information for me.
    • 0:48:37Taking as input my plaintext and hopefully producing as output
    • 0:48:40my ciphertext that I can actually send over the internet or some other channel
    • 0:48:45to a recipient as well.
    • 0:48:47So in the context, then, of secret key encryption,
    • 0:48:50the picture looks a little something like this.
    • 0:48:52Not only do you pass as input to the algorithm
    • 0:48:55your plaintext message in English or any other human language,
    • 0:48:59you also pass a key.
    • 0:49:00And for now, just think of that key as a number that you and the other person
    • 0:49:04have somehow agreed upon in advance.
    • 0:49:07That algorithm, then, will ultimately output the ciphertext.
    • 0:49:10And to be clear, the motivation for that key
    • 0:49:13is to ensure that if I and you and you and you and you are all
    • 0:49:18using the exact same encryption algorithm,
    • 0:49:21it's not going to be obvious if and when we're
    • 0:49:23sending the exact same messages because that,
    • 0:49:26too, per our discussion of passwords, would leak information.
    • 0:49:30Maybe you don't care about the information being leaked,
    • 0:49:33but it's probably not a good thing if-- just because someone else is getting
    • 0:49:26some message, that makes it more likely that an adversary can
    • 0:49:41infer what it is you sent because the ciphertext just so happens
    • 0:49:46to look the same.
    • 0:49:47We want our ciphertext to be unique to each of our transmissions.
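To make the shared-key idea concrete, here is a deliberately insecure toy cipher (XOR with a repeating key, not any real algorithm) showing the same secret key both encrypting and decrypting:

```python
# Toy symmetric cipher: XOR each byte of the message with the key, repeating
# the key as needed. NOT secure -- only meant to show that the exact same key
# works in both directions.
def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"agreed-upon secret"                 # shared in advance by both parties
ciphertext = xor_cipher(b"HELLO", key)      # sender encrypts...
plaintext = xor_cipher(ciphertext, key)     # ...recipient decrypts with the same key

assert plaintext == b"HELLO"
assert ciphertext != b"HELLO"               # the ciphertext looks nothing like the message
```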
    • 0:49:51So, let's consider a simple, simple example.
    • 0:49:54Suppose that the message I want to send is just as short as the capital letter
    • 0:49:59A, and suppose that the key that I want to use is as simple as the number 1.
    • 0:50:04These are not good best practices, but we'll
    • 0:50:06use them for the sake of discussion.
    • 0:50:08Let me propose that the simplest algorithm I can perhaps think of
    • 0:50:11is actually one that would take A as input and 1 as input and output B.
    • 0:50:17And you can perhaps infer where this is going.
    • 0:50:19If I instead provide B as input and 1 as input for the plaintext and key
    • 0:50:24respectively, then the output is C.
    • 0:50:27So believe it or not, in yesteryear, Julius Caesar
    • 0:50:30was known to use an algorithm like this whereby this algorithm, Caesar Cipher,
    • 0:50:36is what's generally known as a rotational cipher,
    • 0:50:39because you're rotating the letters of the English alphabet.
    • 0:50:42A becomes B, B becomes C. And I bet if we continue this logic,
    • 0:50:46we can go around from Z becoming A as well.
    • 0:50:50Now this, of course, is being applied at the moment
    • 0:50:52to very short messages that are not that useful.
    • 0:50:55Sending A or B or C is not particularly useful in general,
    • 0:50:58but it's demonstrating how we can encipher or encrypt
    • 0:51:02our plaintext into our ciphertext.
    • 0:51:05However, when someone receives this message,
    • 0:51:09they need to know not only what algorithm I used to encrypt it--
    • 0:51:13in this case, Caesar Cipher or a rotational cipher more generally,
    • 0:51:17but they also need to know what the key is.
    • 0:51:20And the key might not be as simple as 1.
    • 0:51:22Here, for instance, is an example of 13.
    • 0:51:24If your key is 13 and your plaintext is A,
    • 0:51:27then your ciphertext should be N, because that is 13 places away from A,
    • 0:51:33and so now the algorithm seems a little less obvious.
    • 0:51:3813 is also representative of something that's long been known on the internet
    • 0:51:41as ROT13 for R-O-T-1-3-- rotate 13 places.
    • 0:51:46It's a very popular way of scrambling information
    • 0:51:49but not in a way that you intend to be secure.
    • 0:51:52Historically, it was often used for like movie spoilers online.
    • 0:51:55If you want to make something a spoiler before there was CSS and blurring
    • 0:51:59effects on websites and whatnot, you could just
    • 0:52:01scramble it so it looks completely encrypted,
    • 0:52:04but it's very easy for someone else with a click of a button
    • 0:52:07even to just decrypt it.
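ROT13 is so well known, in fact, that Python's standard library ships it as a codec; note how applying it twice gets you right back where you started, since rotating by 13 twice is rotating by 26. (The spoiler text below is just a hypothetical example.)

```python
import codecs

# ROT13 scrambles text but is trivially reversible: rotating by 13 twice
# is rotating by 26, i.e., not at all, for the 26-letter English alphabet.
spoiler = codecs.encode("THE BUTLER DID IT", "rot13")
print(spoiler)                          # GUR OHGYRE QVQ VG
print(codecs.encode(spoiler, "rot13"))  # THE BUTLER DID IT
```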
    • 0:52:08However, I would recommend that you not use a key of 26 because why?
    • 0:52:17Well, at least in English, there's only 26 letters of the alphabet, capital A
    • 0:52:21through capital Z in this case.
    • 0:52:22So a key of 26 is going to output for your ciphertext
    • 0:52:26the exact same thing as your plaintext.
    • 0:52:29So there's another joke on the internet whereby ROT26 is twice
    • 0:52:35as secure as ROT13 because 13 times 2 is 26,
    • 0:52:39and obviously, that's not the case deductively here.
    • 0:52:43Now of course, this particular algorithm and keys of this small size,
    • 0:52:471 through 26, not at all secure.
    • 0:52:49Why?
    • 0:52:50Well honestly, I don't even need a computer to crack this cipher.
    • 0:52:53I can probably take out a piece of paper and pencil
    • 0:52:55and just try all possible numbers from 1 to 25--
    • 0:53:00I don't need to even waste my time with 26--
    • 0:53:02and just figure out via brute force what keys someone might have used
    • 0:53:06to send a message using this algorithm.
    • 0:53:08Not just on single letters, but on every individual letter
    • 0:53:13of their message.
    • 0:53:14Wouldn't take me that long to probably figure this out by brute force by hand.
    • 0:53:18And with code, my gosh, I could write some Python code probably
    • 0:53:21that does it even faster than that.
    • 0:53:23So here on the screen is some ciphertext that I created in advance.
    • 0:53:27And I'll stipulate that this ciphertext was enciphered
    • 0:53:30using that same rotational cipher, but I'm not
    • 0:53:33going to tell you just yet what key I actually used.
    • 0:53:36It was originally an English message in all capital letters.
    • 0:53:40So the task at hand now is to decrypt this, I dare say.
    • 0:53:43Whether you are the intended recipient of the message or maybe maliciously,
    • 0:53:47you've intercepted my transmission with this message in it,
    • 0:53:51and now you're trying to brute force your way through by trying,
    • 0:53:54and by the looks of some heads going down and some scribbling, 1 or 2 or 3.
    • 0:53:58I bet we could also brute force our way through this algorithm, but how?
    • 0:54:01How does the decrypting process work?
    • 0:54:03It's really just the same thing in reverse.
    • 0:54:06If this now is our picture and you have ciphertext as your input,
    • 0:54:10you should be able to pass the same key as input--
    • 0:54:131, for instance or 13 or, with no good reason,
    • 0:54:1726, and get back out the plaintext.
    • 0:54:20But of course, the decryption algorithm is indeed the opposite
    • 0:54:23because you don't want to just add one position
    • 0:54:25or add two positions or three positions, you want to subtract 1 or 2 or 3 or 13.
    • 0:54:32You want to go in reverse, so to speak.
    • 0:54:34And so, if I were to pass in B as the ciphertext and 1 as the key,
    • 0:54:40well, the plaintext decrypted should, of course, be A.
    • 0:54:44And that holds now for all of the other letters of the alphabet,
    • 0:54:47assuming I'm reversing this process, in order to decrypt.
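Per the earlier remark about writing Python code to try keys faster than by hand, here is a sketch of that brute-force attack; the ciphertext below is a hypothetical stand-in, not the one from the lecture's screen:

```python
# Brute-forcing a rotational cipher: try every key from 1 to 25 and
# print each candidate plaintext for a human to eyeball.
def decrypt(ciphertext, key):
    plaintext = ""
    for c in ciphertext:
        if c.isalpha():
            # Subtract the key, wrapping around from A back to Z
            plaintext += chr((ord(c) - ord("A") - key) % 26 + ord("A"))
        else:
            plaintext += c
    return plaintext

ciphertext = "IFMMP"  # a hypothetical intercepted message
for key in range(1, 26):
    print(key, decrypt(ciphertext, key))  # key 1 reveals HELLO
```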
    • 0:54:50And now, I'll let you glance at the screen here for just a moment
    • 0:54:54and see if you yourselves can't figure out
    • 0:54:58what this ciphertext is trying to say.
    • 0:55:02And if you like the idea of figuring this out,
    • 0:55:05if you want to get better at this particular skill,
    • 0:55:08you are an aspiring cryptanalyst, I dare say, focusing
    • 0:55:13on this world of cryptanalysis.
    • 0:55:14And this, too, itself is a job, I dare say particularly with governments,
    • 0:55:18trying to decrypt messages that might very well have been encrypted.
    • 0:55:23Now hopefully the world is using more secure algorithms
    • 0:55:26than these simple rotational ciphers.
    • 0:55:28And what do I mean by secure?
    • 0:55:30Hopefully they're using keys that are much bigger than small numbers like 1
    • 0:55:35through 25.
    • 0:55:36Hopefully they're using much, much, much larger numbers, many more bits, if only
    • 0:55:41so that when you and I try to apply cryptanalysis to ciphertext,
    • 0:55:47it takes us way, way longer than with this particular algorithm alone.
    • 0:55:52Now I don't want to keep you in suspense,
    • 0:55:55but I also don't want to spoil this if you'd like to try your hand at this.
    • 0:55:58So go ahead and close your eyes if you don't want to see the answer to this,
    • 0:56:03or I suppose you can just look away from your screen.
    • 0:56:05But in five seconds, I'll reveal what the plaintext actually is--
    • 0:56:10and some of you, if you've seen that movie I mentioned,
    • 0:56:12will know immediately why this is the way it is,
    • 0:56:15but otherwise, you might just see this as an advertisement of sorts.
    • 0:56:19So here we go.
    • 0:56:20Your chance to close your eyes in 5, 4, 3, 2, 1.
    • 0:56:34From some faces, some of you have seen this movie around the holidays,
    • 0:56:37but now, I've taken it off the screen and we'll move on now
    • 0:56:40with some actual algorithms.
    • 0:56:41If you'd like to come back on replay and actually see what the answer is,
    • 0:56:45we'll, of course, leave it on-demand.
    • 0:56:47So what are some of the actual algorithms
    • 0:56:49used nowadays for encryption that are best practices?
    • 0:56:53This rotational cipher that I described earlier, Caesar's simple one,
    • 0:56:57is not to be recommended.
    • 0:56:58It's wonderful for demonstration sake and discussion's sake,
    • 0:57:01but it's not something you should be using in practice unless, for instance,
    • 0:57:05you're in, say, middle school trying to send a message on a piece of paper
    • 0:57:08through your classroom of classmates and worried
    • 0:57:11that the teacher might intercept it and the teacher probably
    • 0:57:14doesn't have the instinct to or the care to actually
    • 0:57:18brute force their way through it and figure out what the key is.
    • 0:57:20But that's the level of security you're getting with something
    • 0:57:23like that rotational cipher.
    • 0:57:25But in the real world, with our phones and desktops and laptops today,
    • 0:57:29we generally use AES or Triple DES, both of which
    • 0:57:33are popular algorithms that have been vetted by the world
    • 0:57:36and are very commonly used as secret key encryption ciphers
    • 0:57:41or symmetric key encryption ciphers, which, to be clear,
    • 0:57:44require that both the sender and the receiver
    • 0:57:47know and use the exact same key.
    • 0:57:50And for our purposes today, let me just stipulate
    • 0:57:52that the mathematics of these two and other algorithms
    • 0:57:55are much more sophisticated and documented in textbooks,
    • 0:57:58but, therefore, it makes it much harder for the adversary
    • 0:58:02to figure out, as by trying 25 different keys, what the actual key in use
    • 0:58:08might be.
    • 0:58:10Questions now about secret key cryptography or any of the primitives
    • 0:58:17we've just discussed?
    • 0:58:19STUDENT: So is it possible that if someone hacks the-- like
    • 0:58:22gets to know about the hash value-- the hash function of a company that it
    • 0:58:26is using, he might be able to use the hash values
    • 0:58:29and use-- like find a reverse function and then get the passwords for that?
    • 0:58:34DAVID J. MALAN: A good question.
    • 0:58:35I wouldn't worry mathematically about someone reversing the hash functions,
    • 0:58:39if only because with all of the ones that are in popular use
    • 0:58:43today in modern systems, there are a lot of smart mathematicians, computer
    • 0:58:47scientists, professionals who have vetted, if not proven mathematically,
    • 0:58:52that these things work as expected.
    • 0:58:54However, if the passwords that have been hashed are relatively easy to guess,
    • 0:59:00or if the adversary just gets lucky with whatever technique
    • 0:59:03they are using, it is absolutely possible to find at least a password,
    • 0:59:07an input that maps to that hash value, but often
    • 0:59:10not without significant effort.
    • 0:59:12And so generally, a company does not want to--
    • 0:59:15should not try to-- keep proprietary or secret what
    • 0:59:19hash function they're using, what encryption algorithm they're using.
    • 0:59:22If anything, I dare say, it should be reassuring
    • 0:59:25to the public if and when companies are using best practices and de facto
    • 0:59:30standards, all of these algorithms are designed
    • 0:59:32to keep secret not the algorithm itself, which literally can be found
    • 0:59:36in like university textbooks nowadays and on Wikipedia and beyond,
    • 0:59:40but rather, to keep secret the thing that's designed to be secret,
    • 0:59:44which is the key.
    • 0:59:45And now, if you're using too small of a key like I did originally,
    • 0:59:49well, then you're just using the algorithm poorly, perhaps.
    • 0:59:52But so long as you're adhering to best practices
    • 0:59:54and picking a really big, recommended-sized key,
    • 0:59:57then things mathematically should be trustworthy.
    • 1:00:01STUDENT: For an attacker, rather than like basically cracking a hash
    • 1:00:05or cracking an algorithm, wouldn't it be easier
    • 1:00:08to just try and access the basic server database
    • 1:00:12and access the hash function like generated code?
    • 1:00:16So rather, access how the specific algorithm works.
    • 1:00:20That way, they can basically just reverse-engineer it?
    • 1:00:24DAVID J. MALAN: Everything you described is possible.
    • 1:00:27However, I would push back on this assumption
    • 1:00:31that the company should try to keep its hash algorithm secure or hidden.
    • 1:00:36You should trust in the mathematics of what
    • 1:00:38we're discussing today, both in the context of hashes
    • 1:00:41and in the context of encryption.
    • 1:00:43And I've pulled back up on the screen here the number of possible hashes
    • 1:00:48that exist when using one of the most modern standards for hashing passwords.
    • 1:00:53This is such a big number--
    • 1:00:55I dare say, I don't remember how many atoms are in the universe,
    • 1:00:58but I'm going to guess it's fewer than this, maybe.
    • 1:01:01The idea is, intuitively, that if the search space of possible hash values
    • 1:01:07or the search space of possible keys is so darn big,
    • 1:01:11both you and I, not to speak darkly, are going
    • 1:01:14to be dead before the attacker actually figures out what
    • 1:01:18that password or that hash actually is.
    • 1:01:22So that's generally the presumption.
    • 1:01:24Most of what we do today in terms of security all boils down
    • 1:00:28to probabilities and trying to drive the probability of being exploited way,
    • 1:01:33way, way down, even though, if your password is still 00000000,
    • 1:01:40doesn't matter if there's this many or more possibilities if the adversary
    • 1:01:44tries that one first.
    • 1:01:45So keeping algorithms secret, keeping ciphers secret
    • 1:01:49is generally not best practice.
    • 1:01:52You should be trusting that the math and the probabilities
    • 1:01:55will protect your data if you are using these algorithms correctly.
    • 1:02:00And how about one more question before we resume?
    • 1:02:03STUDENT: How does a cipher work with words?
    • 1:02:07Not numbers-- like with words, how does it work?
    • 1:02:12How can we cipher-- or encrypt-- our letters with words, not numbers?
    • 1:02:22How can that work?
    • 1:02:25DAVID J. MALAN: OK, so if your key is a word and not a number,
    • 1:02:28let me first say that generally when it comes to encryption,
    • 1:02:32the keys are not words.
    • 1:02:34These are not passwords, they're not meant to be used in quite the same way.
    • 1:02:37These keys are generally generated by the computer for you,
    • 1:02:41and so as such, they're just random numbers for the most part.
    • 1:02:45With that said, even if it is a word like apple, there are ways--
    • 1:02:50and you would learn this in a class like CS50
    • 1:02:53itself-- to convert a word to the underlying numeric representation.
    • 1:02:58There's a system called ASCII or Unicode.
    • 1:03:00So capital A is actually the number 65 in most systems.
    • 1:03:04Capital B is the number 66.
    • 1:03:06But we can go one level deeper.
    • 1:03:07There's actually a pattern of 0's and 1's that represent A's and B's and C's
    • 1:03:11and so forth, so we can convert everything in the world of computers
    • 1:03:15to numbers.
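That conversion is built into most languages; a quick Python sketch of the ASCII/Unicode mapping just described:

```python
# Every character has an underlying number (ASCII/Unicode), and every
# number has an underlying pattern of 0s and 1s.
print(ord("A"))                   # 65, per the discussion
print(ord("B"))                   # 66
print([ord(c) for c in "apple"])  # the word apple, as numbers
print(format(ord("A"), "08b"))    # 01000001, capital A's bits
```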
    • 1:03:16And for that, let me encourage you to take CS50x online.
    • 1:03:20So that, then, is secret key cryptography
    • 1:03:24or symmetric key cryptography, but it doesn't solve all of our problems,
    • 1:03:28because I've taken for granted throughout this whole discussion
    • 1:03:31that the sender and the receiver have a shared secret between them.
    • 1:03:36Whether it's a simple key like 1 or 2 or 13--
    • 1:03:40hopefully not 26-- or hopefully some much bigger value.
    • 1:03:43But there's kind of a chicken and the egg problem there,
    • 1:03:46so to speak, in English whereby how do you actually establish
    • 1:03:50a shared secret between parties A and B if A and B have never talked before,
    • 1:03:57in fact?
    • 1:03:58So for instance, if you're visiting Amazon.com
    • 1:04:01for the first time, a popular e-commerce website, or gmail.com for your email,
    • 1:04:05ideally, and you probably know this already
    • 1:04:08from just living in the real world nowadays,
    • 1:04:10ideally you want that connection to Amazon or Gmail to be encrypted,
    • 1:04:15to be scrambled in some way.
    • 1:04:16Why?
    • 1:04:17Well, you don't want your password being stolen by someone.
    • 1:04:19You don't want your credit card number being intercepted by someone.
    • 1:04:22You don't want your personal emails being read by other people.
    • 1:04:25So it stands to reason that encryption is generally a good thing.
    • 1:04:29And you've seen this, perhaps, in the URL bar
    • 1:04:31via something called HTTPS where the S literally is meant to mean Secure.
    • 1:04:36But odds are, you don't know anyone personally at amazon.com
    • 1:04:41and you don't know anyone personally at gmail.com.
    • 1:04:43So what key are you going to use to communicate securely
    • 1:04:47with these websites, not to mention new websites that don't even exist today
    • 1:04:51but might come online tomorrow, how do you establish a shared secret
    • 1:04:56with someone else?
    • 1:04:57So that's a fundamental gotcha or caveat with symmetric key
    • 1:05:02or secret key encryption: it assumes
    • 1:05:05that you have a shared secret between you and the other person.
    • 1:05:09But the chicken and the egg scenario comes
    • 1:05:11in whereby the only way to establish a shared secret
    • 1:05:15would be to send it to the other person securely,
    • 1:05:18but if you can't communicate securely, you can't even
    • 1:05:21send them the secret you want to use.
    • 1:05:23So you're caught in this deadlock.
    • 1:05:26Thankfully, thanks to math, there are ways
    • 1:05:28that we can solve this, too, via not symmetric key cryptography,
    • 1:05:32but public key cryptography, otherwise known as asymmetric key cryptography.
    • 1:05:37And among the algorithms here might be these, something called Diffie-Hellman,
    • 1:05:40MQV, RSA, and others as well.
    • 1:05:43And I dare say, on this list, maybe RSA is among the most well-known.
    • 1:05:47It's perhaps an acronym you've actually seen in the wild.
    • 1:05:50Now what do we mean by public key cryptography,
    • 1:05:53or more specifically, public key encryption?
    • 1:05:56Well, in the world of public key encryption,
    • 1:05:58or asymmetric key encryption, the asymmetry
    • 1:06:02is implying that you actually don't use one key between the two people
    • 1:06:07A and B. You actually use two keys.
    • 1:06:10In the world of public key encryption, everyone in the world
    • 1:06:14has both a public key and a private key.
    • 1:06:17And these two are just really big numbers.
    • 1:06:19There is a mathematical relationship between these numbers, the public key
    • 1:06:23and the private key, but that's a relationship
    • 1:06:25that your phone or your laptop or your desktop
    • 1:06:27figures out when generating these values for you.
    • 1:06:30So unlike our previous discussion of passwords, which you and I as humans do
    • 1:06:35choose and memorize or store in our password managers,
    • 1:06:39when it comes to keys, these are generally,
    • 1:06:42in the world of public key cryptography, generated for you.
    • 1:06:45And as the name suggests, the whole purpose of these keys
    • 1:06:48is to tell the whole world, if you want, what your public key is.
    • 1:06:52It is not in any way secret.
    • 1:06:54You can literally email it out, you can put it in the signature of every email,
    • 1:06:59you can post it on your website, on social media.
    • 1:07:01The whole point of the public key is to make it, indeed, public.
    • 1:07:05But, suffice it to say, the private key should be kept secret by you,
    • 1:07:10private by you on your own device.
    • 1:07:13That should never be shared with anyone else.
    • 1:07:15But the cool thing about public key cryptography and the mathematics
    • 1:07:19underlying it is that if you share your public key
    • 1:07:23with someone else on the internet, they can use that public key
    • 1:07:27to encrypt a message and then send it to you over email
    • 1:07:30or chat or any other technology.
    • 1:07:33And if you had to guess, what is the only key
    • 1:07:36in the world that can decrypt a message that has
    • 1:07:40been encrypted with your public key?
    • 1:07:42The only key in the world that can decrypt
    • 1:07:46a message that has been encrypted with your public key is your private key.
    • 1:07:51That's what the mathematical relationship ultimately does for you.
    • 1:07:54So, pictorially here, if this is our algorithm that
    • 1:07:57implements this idea of public key encryption,
    • 1:07:59let's see what the inputs and outputs should be.
    • 1:08:01If the goal is to send a message to you and you
    • 1:08:04have shared with the world your public key, whoever is sending you
    • 1:08:07this message uses your public key, their plaintext message, and out of that
    • 1:08:13comes ciphertext.
    • 1:08:15That, then, is how asymmetric key encryption works.
    • 1:08:18Meanwhile, when you receive that message,
    • 1:08:21you can use your own private key and the ciphertext you've just
    • 1:08:26received to get back the plaintext.
    • 1:08:28And this is what we mean by asymmetric.
    • 1:08:30Unlike secret key cryptography or symmetric key cryptography where
    • 1:08:35you're using the same key back and forth, plus 1 or minus 1
    • 1:08:39in the case of the rotational cipher, with asymmetric encryption,
    • 1:08:44you are using one key for one process and another key for the decryption
    • 1:08:49process.
    • 1:08:50So that's what's fundamentally different.
    • 1:08:52RSA is one of the most popular algorithms for this.
    • 1:08:55The browsers you probably use every day are probably
    • 1:08:58using some variant of RSA underneath the hood.
    • 1:09:01We won't get into great detail about the mathematics,
    • 1:09:03but one of the most important details about RSA
    • 1:09:06is that it relies on really big prime numbers.
    • 1:09:10In fact, in a nutshell, what happens with RSA is your computer or your phone
    • 1:09:15chooses a really big prime number called p.
    • 1:09:18It then chooses a really big other prime number called q.
    • 1:09:21Then it multiplies them together to get a new value, we'll call it n.
    • 1:09:25And it uses that value n in the resulting mathematics
    • 1:09:30that the algorithm's authors came up with, dot-dot-dot.
    • 1:09:33The presumption here is that when you take a really big prime number
    • 1:09:37and multiply it against a really big other prime number,
    • 1:09:40it is really hard to figure out from the product of those numbers
    • 1:09:45what the original p and q were.
    • 1:09:48And if you're a little hazy on prime numbers,
    • 1:09:50it's a number that can be only-- that can only be divided by itself and 1.
    • 1:09:55And indeed, we can use those, coming up with two big ones and
    • 1:09:59multiplying them together in order to get this value n that is subsequently
    • 1:10:03used in the rest of the mathematics.
    • 1:10:05What are the rest of those mathematics?
    • 1:10:07In essence, this.
    • 1:10:08And this will be the scariest-looking formulas you perhaps
    • 1:10:10see over the course of this class.
    • 1:10:12The value n I just described is ultimately used to divide values.
    • 1:10:18If you're unfamiliar with mod here, it means, in this context,
    • 1:10:22to take the remainder after dividing some value.
    • 1:10:24So what are we doing?
    • 1:10:25Here is a quick summary of how encryption and decryption works
    • 1:10:29with RSA.
    • 1:10:30If you have some message m that you want to send to another person,
    • 1:10:35and somehow, via that dot-dot-dot process
    • 1:10:39I alluded to earlier, you've come up with your own public key e there.
    • 1:10:45Well then, someone can take their message,
    • 1:10:47encrypt it by raising that message to the power of e, the exponent of e,
    • 1:10:53and then divide it, divide it, divide it, divide it by n
    • 1:10:56and figure out what the remainder is when dividing by n.
    • 1:11:00That then gives you a value called c for ciphertext.
    • 1:11:03When you then receive that message c, you can use your private key,
    • 1:11:08known here as d, and you raise the ciphertext,
    • 1:11:12its numeric value, to the power of d-- that is, the exponent in d, and you
    • 1:11:16divide, divide, divide by n in order to figure out
    • 1:11:19that remainder, which will give you back the original message.
    • 1:11:22Now that is a significant oversimplification of what's going on,
    • 1:11:26but that's the essence of the algorithm.
    • 1:11:28It has to do with picking two very large prime numbers,
    • 1:11:32multiplying them together to get that value n,
    • 1:11:35and then using n as well as other values that, dot-dot-dot, are generated
    • 1:11:39by the algorithm for you, e and d, in order to encrypt and decrypt messages
    • 1:11:46ultimately.
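Here is a toy numeric sketch of that encrypt/decrypt round trip, using deliberately tiny primes so the numbers fit on screen; real RSA keys use primes hundreds of digits long, and the particular values of p, q, e, and d below are illustrative only:

```python
# A toy RSA sketch with deliberately tiny primes (real keys use primes
# hundreds of digits long). The values below are illustrative only.
p, q = 61, 53             # two small primes
n = p * q                 # 3233, shared publicly
e = 17                    # public exponent
d = 2753                  # private exponent: e*d leaves remainder 1 when divided by (p-1)*(q-1)

m = 65                    # the message, e.g. ASCII for capital A
c = pow(m, e, n)          # encrypt: c = m^e mod n
print(c)                  # 2790, the ciphertext
print(pow(c, d, n))       # 65, decrypted: m = c^d mod n
```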
    • 1:11:47And this is what's generally known as modular arithmetic.
    • 1:11:49It involves lots of division and division and division
    • 1:11:52in order to come up with these remainders,
    • 1:11:53but ultimately, it is a very secure way to asymmetrically share information
    • 1:11:59without having to agree on one shared key in advance,
    • 1:12:02but rather, using a public and a private key instead.
    • 1:12:06Now there are other techniques that come with this world
    • 1:12:10of public key cryptography, and another technique is that of key exchange.
    • 1:12:14So by contrast, if you do actually want to establish
    • 1:12:18some kind of shared secret, there are alternative algorithms
    • 1:12:22that different humans have invented over the years.
    • 1:12:24So there are alternatives to one algorithm or another,
    • 1:12:27and one of these alternatives is actually
    • 1:12:29called Diffie-Hellman, named after another pair of authors here.
    • 1:12:33So here is the essence of the mathematics for this algorithm,
    • 1:12:37the goal of which is indeed key exchange.
    • 1:12:40To figure out, using fancy mathematics, how both A and B can come up
    • 1:12:45with the same value that they can then use as a shared secret,
    • 1:12:49but without anyone who intercepts any of their messages
    • 1:12:52being able to figure out what is that shared value, that shared secret.
    • 1:12:57So what's the essence of the math here?
    • 1:12:59Well, you first pick a value g, which is called a generator.
    • 1:13:02It can be as simple as the number 2.
    • 1:13:04And you pick a big prime number, call it p here.
    • 1:13:07And those are agreed-upon in advance.
    • 1:13:10Meanwhile, person A, say Alice, picks her own private key A,
    • 1:13:15which is another really big number, and then she does this math: g
    • 1:13:18to the power of A, mod p.
    • 1:13:20And again, mod refers to taking the remainder of some value.
    • 1:13:23Meanwhile, B, or Bob, still uses the same g, still uses the same p,
    • 1:13:28picks his own private key called B and raises g to the power of B modulo p,
    • 1:13:34and that gives him back this value capital
    • 1:13:36B, whereas Alice had capital A. Then, it turns out that Alice and Bob can
    • 1:13:41send those values across the internet--
    • 1:13:44A one way, B the other way, and thanks to some fancy modular arithmetic
    • 1:13:50here, too, Alice can take Bob's B value and raise it
    • 1:13:54to the power of her A value, which effectively gives you
    • 1:13:58g to the power of A times B mod p.
    • 1:14:02Bob, meanwhile, can take Alice's A value that was sent to him,
    • 1:14:05raise it to the power of his private key B, and then mod p.
    • 1:14:10So calculate the remainder with respect to p.
    • 1:14:13The end result, and it's totally fine if these mathematics
    • 1:14:16are uncomfortable for you or whoo!
    • 1:14:18Just know that, thanks to some basic principles of mathematics,
    • 1:14:22this results in both Alice and Bob having the exact same value--
    • 1:14:27we'll call it s for shared secret--
    • 1:14:30even though the value never went across the internet in its entirety.
    • 1:14:35Alice sent part of it this way, Bob sent part of it this way,
    • 1:14:38but because Alice and Bob held on to private values, the little A
    • 1:14:43and the little B, they kept that to themselves, they're
    • 1:14:46able to do these mathematics that ensure that they both came up
    • 1:14:49with the same value even though you or I, if we intercepted
    • 1:14:53any one of those messages, we could not figure out what it is.
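The whole exchange can be sketched in a few lines of Python, with a toy prime for readability; real deployments use primes of thousands of bits, and the private values a and b below are arbitrary picks for illustration:

```python
# A sketch of the Diffie-Hellman exchange described above, with tiny
# toy values (real deployments use far larger primes).
g, p = 2, 23            # generator and prime, agreed upon publicly
a = 6                   # Alice's private value (kept secret)
b = 15                  # Bob's private value (kept secret)

A = pow(g, a, p)        # Alice sends this across the internet
B = pow(g, b, p)        # Bob sends this across the internet

s_alice = pow(B, a, p)  # Alice computes g^(a*b) mod p
s_bob = pow(A, b, p)    # Bob computes the exact same value
print(s_alice == s_bob) # True: a shared secret, never sent in its entirety
```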
    • 1:14:57And now that they have a shared secret s,
    • 1:14:59they can use that with any of those other symmetric
    • 1:15:02ciphers we talked about earlier.
    • 1:15:04AES I put on the board briefly, triple DES I put on the board briefly.
    • 1:15:09Heck, we could even use this in a rotational cipher
    • 1:15:12if we really wanted to, but not, indeed, best practice.
    • 1:15:15So again, don't worry so much about focusing on the mathematics,
    • 1:15:19but if you were to take a higher-level class in theoretical computer science,
    • 1:15:22these are intellectual rabbit holes that you could go down to better understand
    • 1:15:26how the software works.
    • 1:15:27And now to my comments earlier about not trying
    • 1:15:30to invent your own cryptographic functions,
    • 1:15:33this is the kind of reason why.
    • 1:15:35This is the degree of sophistication that you and I take
    • 1:15:37for granted in our phones, our laptops, and desktops
    • 1:15:41that have been vetted by industry and academics alike.
    • 1:15:44Generally best practice is to rely on standards
    • 1:15:47that have been tried and tested rather than
    • 1:15:50try to come up with your own creative cryptosystem, so to speak,
    • 1:15:53that may very well have faults that you yourself do not know.
    • 1:15:57And the icing on the cake is that this is ultimately, if curious as
    • 1:16:01to the underlying mathematics, what value ultimately
    • 1:16:04Alice and Bob are both calculating, g to the power A times B mod p.
    • 1:16:10But more on that in a higher-level mathematics course if indeed
    • 1:16:13of interest.
    • 1:16:14How about one final building block that you
    • 1:16:17get from this world of public key cryptography,
    • 1:16:19and this is one that's going to be increasingly omnipresent,
    • 1:16:23I do think, in our world, especially as we move away
    • 1:16:25from very archaic paper-pencil signatures
    • 1:16:28that you might write with a pen on a paper,
    • 1:16:31and rather, moving to what we'll call digital signatures as well.
    • 1:16:35It turns out that once you're comfortable with the idea
    • 1:16:38of public key cryptography generally involving a public key
    • 1:16:42and a private key, the first of which is literally public,
    • 1:16:46you can share it with the world; the second of which is meant to be private,
    • 1:16:49kept only to you.
    • 1:16:50And if you can take at face value my claim
    • 1:16:53that through appropriate mathematics, there's
    • 1:16:56a relationship possible between these two numbers,
    • 1:16:59that whereas one can encrypt data, the other can decrypt,
    • 1:17:02even if you don't care to get into the specifics of the mathematics,
    • 1:17:06but you just agree that, OK, that sounds reasonable to me,
    • 1:17:09that that math can work, we can now use that building block
    • 1:17:15of a public key and a private key to solve other problems as well.
    • 1:17:18Not just encrypt messages from point A to point B
    • 1:17:21and back, but rather, to sign information, sign documents,
    • 1:17:25even, and say, yes, this was signed by David or someone else.
    • 1:17:30So how does this work?
    • 1:17:31In the world of digital signatures, here's
    • 1:17:33a few more acronyms of algorithms that are commonly
    • 1:17:36used even though we'll continue to simplify them in our discussion.
    • 1:17:39DSA, ECDSA, RSA, and others can be used to give you
    • 1:17:43the ability to sign documents or other pieces of information digitally.
    • 1:17:48So what does it mean to sign something digitally?
    • 1:17:51It's not at all like this with a unique signature,
    • 1:17:53it's all mathematics involved.
    • 1:17:56So, here, then, might be our algorithm for digitally signing
    • 1:18:00some document or piece of information.
    • 1:18:02And I claim that the input to this process is a message.
    • 1:18:06A letter that you've written, a contract that you want to sign,
    • 1:18:09something that you want to put your digital signature on.
    • 1:18:11And the output of this process initially is going to be a hash.
    • 1:18:16So we can use any number of the hash functions
    • 1:18:18we talked about earlier that take an arbitrary-length
    • 1:18:23input, like a message, a document, an essay, a contract,
    • 1:18:27and produce as output a fixed-length hash value.
    • 1:18:32So we've seen that and we've stipulated that is indeed
    • 1:18:34possible, similar in spirit to our password discussion earlier.
    • 1:18:38You can even do it for larger inputs than passwords.
    • 1:18:40You can do it for entire documents as well.
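That fixed-length property is easy to see in Python using, for example, the standard library's SHA-256 function from hashlib (a quick aside, not one of the lecture's own demos):

```python
import hashlib

# SHA-256 maps input of ANY length to a fixed-length digest:
# 32 bytes, i.e., 64 hexadecimal characters.
short = hashlib.sha256(b"apple").hexdigest()
long_ = hashlib.sha256(b"A very long contract... " * 10_000).hexdigest()

print(len(short), len(long_))  # 64 64
```

Whether the input is one word or an entire contract, the hash is the same, manageable size.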
    • 1:18:43Once you have that hash, here's how you digitally sign the document.
    • 1:18:47You use your private key, you pass that as input, as well as the hash value
    • 1:18:54you just computed a moment ago into the digital signature algorithm,
    • 1:18:58and the output of that process is a signature.
    • 1:19:01So if you think about this intuitively, what are we doing?
    • 1:19:04Well, we're taking an arbitrary-sized document.
    • 1:19:07Maybe it's a letter that you've written, maybe it's
    • 1:19:09a contract that you've written that you need to sign that might be short
    • 1:19:12or it might be really long.
    • 1:19:14Here's where the value of cryptographic hash functions comes in.
    • 1:19:17Recall that a cryptographic hash function, by definition,
    • 1:19:19takes an arbitrary-sized input and reduces it to a fixed-sized output.
    • 1:19:25So it doesn't matter how big the original
    • 1:19:27was, you can distill it into a succinct representation that's shorter.
    • 1:19:31So, per this diagram, if you take that hash value
    • 1:19:35and you encrypt it with your private key, what we say
    • 1:19:39is that the output of that process, which
    • 1:19:41is just a really big number or some sequence of weird-looking text,
    • 1:19:45is your digital signature.
    • 1:19:47Now this is a little weird because what we're doing now
    • 1:19:50is the opposite of public key encryption.
    • 1:19:53With public key encryption, remember, someone else
    • 1:19:56used your public key to encrypt a message to you
    • 1:19:59and you used your private key to decrypt it.
    • 1:20:02But in the case of digital signatures, the story gets flipped upside-down.
    • 1:20:06You use your private key and a hash of your message
    • 1:20:11to digitally sign your document and the output of that is a signature-- again,
    • 1:20:15a number or some string of text.
    • 1:20:17And you send that signature to the recipient saying, this
    • 1:20:20is my digital signature, you can verify it now if you so choose.
    • 1:20:24And they should.
    • 1:20:25So that invites the question, well, how does the recipient
    • 1:20:29verify your digital signature?
    • 1:20:31How do they know that this weird-looking sequence of characters or numbers
    • 1:20:34actually was signed by you?
    • 1:20:36Well, recall that you have not only a private key, but a public key as well.
    • 1:20:41And that public key is accessible to everyone, including that recipient.
    • 1:20:44And so, what happens is this.
    • 1:20:46When that recipient gets your document and your digital signature,
    • 1:20:51so to speak, they probably want to and should verify the digital signature
    • 1:20:55to confirm that, yes, you signed off on that document or contract.
    • 1:20:59So what does that box look like?
    • 1:21:01Well, they have received not only the document itself, the so-called message,
    • 1:21:05they've also received your digital signature.
    • 1:21:07So you've sent them two things.
    • 1:21:08And the digital signature, you can think of it like a human signature,
    • 1:21:11but it's, of course, a big number or a string of text.
    • 1:21:14But they've sent you two things-- the document and that signature.
    • 1:21:17So what do you do?
    • 1:21:18You take the document you've received and you run it
    • 1:21:20through the exact same publicly available hash
    • 1:21:22function, because the document might be long,
    • 1:21:24so you want to collapse it into a short hash representation
    • 1:21:28thereof, just like our use of passwords.
    • 1:21:31So that you can just do easily, no private information involved.
    • 1:21:35But then what do you do?
    • 1:21:36You then take the public key of the person who signed this document, you
    • 1:21:42take the signature that they claim is their signature,
    • 1:21:45and you decrypt their signature with their public key.
    • 1:21:50That should output the exact same hash that you just calculated.
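The whole sign-then-verify pipeline can be sketched in Python with a toy, deliberately insecure "textbook RSA" keypair (tiny primes chosen only for readability; real signature schemes like the DSA, ECDSA, and RSA standards use enormous keys plus padding):

```python
import hashlib

# Toy textbook-RSA keypair (illustration only; NOT secure).
p, q = 61, 53
n = p * q            # modulus, part of both keys
e = 17               # public exponent
d = 2753             # private exponent: (e * d) % ((p-1)*(q-1)) == 1

def toy_hash(message: bytes) -> int:
    # Reduce a real SHA-256 digest into our tiny modulus's range.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    # Signing: "encrypt" the hash of the message with the PRIVATE key.
    return pow(toy_hash(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    # Verifying: "decrypt" the signature with the PUBLIC key
    # and compare against a freshly computed hash of the message.
    return pow(signature, e, n) == toy_hash(message)

contract = b"Alice agrees to pay Bob 100 dollars."
sig = sign(contract)
print(verify(contract, sig))  # True
```

If the recipient hashes a tampered copy of the contract, the decrypted signature would almost certainly no longer match, and verification fails.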
    • 1:21:58So to summarize, the message itself, the document in this story, is public.
    • 1:22:04It's not encrypted, it's not something you really worry about being private.
    • 1:22:07What you really care about in this story is
    • 1:22:09that it was signed by a specific person.
    • 1:22:11So if that message, that document is available to both the sender
    • 1:22:15and the receiver, both of them do this first process of hashing the message,
    • 1:22:20hashing the document just to get some succinct representation thereof.
    • 1:22:24So it's not this big, it's this big.
    • 1:22:26Makes the math quicker and easier.
    • 1:22:28However, what the recipient does is upon receiving not only that message, which
    • 1:22:33they just hashed, but also your claimed digital signature,
    • 1:22:37they try to decrypt your signature using your public key.
    • 1:22:42And here, too, just as the private key can
    • 1:22:45reverse the encryption done by a public key,
    • 1:22:48so can the public key reverse the encryption done by a private key.
    • 1:22:53So if the recipient mathematically gets the exact same hash
    • 1:22:58after decrypting what you sent them, it must be the case
    • 1:23:01mathematically that the only person in the world who
    • 1:23:04could have signed this document is, in fact, you
    • 1:23:07because they have your public key.
    • 1:23:09And maybe some third party, some registry,
    • 1:23:11some company has said, yes, that is David Malan's public key,
    • 1:23:15you can trust that.
    • 1:23:16And so, if David Malan's private key has not been compromised,
    • 1:23:20you can trust that any signature that you can decrypt with my public key
    • 1:23:26must have been encrypted with my private key.
    • 1:23:31And it takes a while, I think, for these ideas, and certainly the mathematics
    • 1:23:34to sink in, but for now, if you just trust
    • 1:23:36that there's two big numbers in the world, one public, one private,
    • 1:23:39there's a mathematical relationship between them such that one can reverse
    • 1:23:43the effects of the other in either direction,
    • 1:23:46we humans can use this now not only to secure
    • 1:23:49our messages per our discussion of encryption,
    • 1:23:53we can also use it to authenticate messages
    • 1:23:56and attest, yes, this came from David Malan or did not.
    • 1:24:01And unlike a human signature on a piece of paper
    • 1:24:03that can obviously just be photographed, duplicated, traced over,
    • 1:24:07the security of digital signatures relies on keeping your private key private,
    • 1:24:13and that notion does not exist in the world of human signatures,
    • 1:24:16and so in that sense, digital signatures are objectively better
    • 1:24:20than our old-form human ones.
    • 1:24:24Questions now?
    • 1:24:25And I know that's a lot, and it's OK if it didn't all go down at once.
    • 1:24:29Questions on digital signatures, public key encryption or decryption,
    • 1:24:34or anything prior?
    • 1:24:36STUDENT: Would these public and private keys be attributed to, what,
    • 1:24:39your IP address?
    • 1:24:41DAVID J. MALAN: A good question.
    • 1:24:42To what are they attributed?
    • 1:24:43Not to your IP address typically.
    • 1:24:45They are typically stored in a registry, like a central registry that
    • 1:24:49knows that this is Vlad's public key, this is David's public key and so
    • 1:24:53forth.
    • 1:24:53And it relies on a system of trust and transitivity.
    • 1:24:56So if you trust this third party company that is storing all of our public keys,
    • 1:25:01then you can trust whoever it is "they" are, in turn, trusting.
    • 1:25:05Or it can be more distributed.
    • 1:25:07Your public key can literally be distributed
    • 1:25:09in the footer of your emails.
    • 1:25:10It can be posted on your website.
    • 1:25:12It can be on your LinkedIn profile or the like.
    • 1:25:14And so long as other people in the world trust
    • 1:25:17your emails or your website or LinkedIn, they
    • 1:25:21can trust that that is, in fact, your public key.
    • 1:25:23So different ways to implement that system of trust.
    • 1:25:26Other questions?
    • 1:25:28STUDENT: Hashing uses a mathematical function and encryption uses
    • 1:25:33a mathematical function plus a key.
    • 1:25:36Like the Caesar cipher basically uses a simple function plus the key.
    • 1:25:42Is that analogy correct?
    • 1:25:44DAVID J. MALAN: Yes, that is correct.
    • 1:25:46And if it helps you-- this is an oversimplification,
    • 1:25:49but it's generally helpful, I think, to think of hashing as one-way.
    • 1:25:54So you can only convert a value to a hash value but not the opposite.
    • 1:26:00But encryption is like two-way--
    • 1:26:04it's reversible hashing, so to speak.
    • 1:26:06The output still looks weird and random, but you can undo the process.
    • 1:26:10And one way to think about this is in the world of hashing,
    • 1:26:14because I claim that you can take an infinite domain,
    • 1:26:18like any possible message you want to send, and convert it
    • 1:26:22to a finite range--
    • 1:26:23for instance, all A-words could have a hash value of 1,
    • 1:26:27all B-words could have a hash value of 2.
    • 1:26:29That simple example already captures the reality
    • 1:26:34that if you only have the hash value 1 or 2,
    • 1:26:38I have no idea what the original input is.
    • 1:26:41And it doesn't matter how hard I try, I'm never going to figure it out
    • 1:26:44because it could be apple or avocado or something else that starts with A.
    • 1:26:48So hashing in that sense, one-way hashing throws away information such
    • 1:26:54that it's not recoverable.
    • 1:26:55But encryption does the opposite.
    • 1:26:58It would be pretty useless if encryption threw away information
    • 1:27:02because the whole point of encryption is to secure messages and information
    • 1:27:06we want to send.
    • 1:27:07So encryption is reversible; hashing, in general, is not.
    • 1:27:13And, as you know, the key, no pun intended, to encryption
    • 1:27:17is necessary so that you can reverse the process in a way that
    • 1:27:21remains secret to other people.
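The contrast the question raises can be sketched in Python with deliberately simple stand-ins (a toy first-letter "hash" and a Caesar cipher; neither is cryptographically meaningful):

```python
def first_letter_hash(word: str) -> int:
    # A (terrible) hash: every word starting with 'a' maps to 1,
    # 'b' to 2, and so on. Information is thrown away irrecoverably.
    return ord(word[0].lower()) - ord("a") + 1

def caesar(text: str, key: int) -> str:
    # A (weak) cipher: shift each lowercase letter by `key` places.
    # Unlike hashing, knowing the key lets you reverse the process exactly.
    return "".join(
        chr((ord(c) - ord("a") + key) % 26 + ord("a")) if c.islower() else c
        for c in text
    )

print(first_letter_hash("apple"), first_letter_hash("avocado"))  # 1 1
secret = caesar("hello", 3)        # "khoor"
print(caesar(secret, -3))          # "hello" -- reversible with the key
```

From the hash value 1 alone there's no way back to "apple" versus "avocado," but the ciphertext "khoor" plus the key 3 recovers "hello" every time.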
    • 1:27:24How about one more question, and then we'll take a short break
    • 1:27:26and then we'll come back and wrap up.
    • 1:27:28STUDENT: Is there any possibility to spoof the signatures?
    • 1:27:32DAVID J. MALAN: Short answer, no.
    • 1:27:34Like so long as you are using a standard that we believe
    • 1:27:38to be correct and not compromised, so long as your private key has not
    • 1:27:43been stolen by someone or no one's taken it off of your phone or your computer,
    • 1:27:47they should not-- it should not be possible to forge it.
    • 1:27:50The probability is so, so, so low that it should be
    • 1:27:55the least of your concerns, is the idea.
    • 1:27:57Now it turns out, there is yet one other application
    • 1:28:00of this world of public key cryptography that solves a problem from last time.
    • 1:28:05Recall that we ended our first class on a note of emphasizing
    • 1:28:09that passwords and password managers can improve our security if used properly,
    • 1:28:14but there's another technology that's becoming increasingly available.
    • 1:28:18And it's colloquially called passkeys.
    • 1:28:21Or more technically, it's an implementation
    • 1:28:23of a standard called web authentication.
    • 1:28:25And it turns out that these passkeys, which
    • 1:28:27are available on certain platforms and certain websites, and will
    • 1:28:31be available on ever more quite shortly, they, too,
    • 1:28:34rely on public and private keys as follows.
    • 1:28:38And thankfully now, as fancy as the mathematics
    • 1:28:41we're alluding to today sound, there really are only two ways
    • 1:28:44to use these public and private keys--
    • 1:28:46to either encrypt with one and decrypt with the other or vice versa.
    • 1:28:50So we have just a fairly basic building block
    • 1:28:53that we can use in one direction or another.
    • 1:28:55So how do passkeys work?
    • 1:28:57In the near future, as you will find, when
    • 1:29:00you go to certain websites or applications,
    • 1:29:02you probably will not be prompted as frequently to type in a username
    • 1:29:07and pick a password, which is to say, you
    • 1:29:09don't have to generate a hard-to-guess password,
    • 1:29:11you don't have to memorize a hard-to-guess password.
    • 1:29:14You don't have to even store a hard-to-guess password in a password
    • 1:29:17manager because passkeys eliminate passwords.
    • 1:29:21It moves us more toward a world of passwordless accounts.
    • 1:29:26Now how can that be?
    • 1:29:27Because up until now, we've been using usernames and passwords
    • 1:29:30to authenticate ourselves.
    • 1:29:32Well, it turns out, we humans have been getting really good at this math,
    • 1:29:35even if it doesn't feel like it today, we've
    • 1:29:37been getting really good at using mathematics
    • 1:29:39to solve these problems as well.
    • 1:29:41So imagine the following scenario.
    • 1:29:43When you go to a website in the future or app,
    • 1:29:46rather than being prompted to create a username and password,
    • 1:29:49you'll just be prompted to create a passkey.
    • 1:29:52What that means is your laptop or desktop or phone will probably
    • 1:29:56prompt you for some authentication factor.
    • 1:29:58They'll ask you for your fingerprint or they'll ask you for a scan of your face
    • 1:30:02or maybe a pin code, a short number that you type in just
    • 1:30:06to demonstrate with high probability that you
    • 1:30:08are authorized to be using this device and creating this account.
    • 1:30:11What then will your device and the website do?
    • 1:30:14Your device will generate a public key and a private key
    • 1:30:19just for that one website or app.
    • 1:30:22Your device will send the public key to that new website, along with your user
    • 1:30:29ID or username, some identifying information
    • 1:30:32so that they know you're David or someone else.
    • 1:30:35But you don't send a password.
    • 1:30:37You only send to the website or app your public key.
    • 1:30:41And you keep private, within your browser
    • 1:30:43or some other piece of software, your corresponding private key.
    • 1:30:47And to be clear, this public-private key pair is used only for this one website.
    • 1:30:52You'll do this repeatedly, but automatically
    • 1:30:54for every other website in the world in this model.
    • 1:30:57So what happens when you want not to register for that website, which
    • 1:31:01you've just done, but to log into it tomorrow,
    • 1:31:04next week, or next year?
    • 1:31:06Well, assuming you still have that same device
    • 1:31:08or you're using some kind of cloud service
    • 1:31:11that synchronizes all of your passkeys, your public and private keys,
    • 1:31:15across devices--
    • 1:31:16so you haven't lost these passkeys, here's
    • 1:31:19how you would log in to the website tomorrow, next week, or next year.
    • 1:31:22The website would send you when you visit a challenge,
    • 1:31:26and a challenge is like some little message.
    • 1:31:28It's like a number or a word or a phrase.
    • 1:31:31It's some piece of randomly-generated data
    • 1:31:33that the website wants you to digitally sign.
    • 1:31:37Well, how do you digitally sign information?
    • 1:31:39I proposed earlier that you can use your private key
    • 1:31:42and pass that key and that challenge, which is just a random input given
    • 1:31:47to you by the website, into your digital signature algorithm, this black box.
    • 1:31:51And the output of that, as before, is your signature.
    • 1:31:54And what does your device do?
    • 1:31:55It sends that signature for that challenge to the website.
    • 1:32:00And if you followed along earlier well enough,
    • 1:32:03you might now realize where we're going with this.
    • 1:32:05How does the website now verify that that is, in fact, your signature?
    • 1:32:11That this did come from David's device and not some adversary online?
    • 1:32:15The website, because it's stored yesterday,
    • 1:32:19last week, last year, your public key, it
    • 1:32:22will use your public key to decrypt your signature
    • 1:32:26using the same algorithm to get back hopefully the same challenge value.
    • 1:32:31And if the output of this verification process
    • 1:32:35matches the challenge the website sent you a second before,
    • 1:32:39it must be the case mathematically that you
    • 1:32:42are, in fact, who you claim to be because it
    • 1:32:44was your device that registered for this website a day, a week,
    • 1:32:48a year ago as well.
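That challenge-response flow can be sketched with the same toy textbook-RSA numbers from before (purely illustrative; real passkeys use vetted algorithms such as ECDSA under the WebAuthn standard, and the keys are generated securely per site):

```python
import hashlib
import secrets

# Toy textbook-RSA keypair (illustration only; NOT secure).
n, e, d = 3233, 17, 2753   # public key: (n, e); private key: (n, d)

# -- Registration: the device sends ONLY the public key to the website.
server_db = {"david": (n, e)}

# -- Login, later: the website issues a random challenge...
challenge = secrets.token_bytes(16)

# ...the device signs it with the private key it kept to itself...
h = int.from_bytes(hashlib.sha256(challenge).digest(), "big") % n
signature = pow(h, d, n)

# ...and the website verifies the signature with the stored public key.
srv_n, srv_e = server_db["david"]
expected = int.from_bytes(hashlib.sha256(challenge).digest(), "big") % srv_n
assert pow(signature, srv_e, srv_n) == expected
print("login accepted")
```

No password is ever stored or transmitted; only a signature over a one-time random challenge crosses the wire.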
    • 1:32:50So again, if we trust in the mathematics here
    • 1:32:53and we trust that these algorithms allow us to encrypt information and decrypt
    • 1:32:58it using a public key and private key, or conversely,
    • 1:33:01a private key and public key, we can, with very, very high confidence,
    • 1:33:06probabilistically say, yes, this is David Malan,
    • 1:33:09I'm going to allow him back into this account.
    • 1:33:12So what's the implication of this passwordless world that
    • 1:33:15uses passkeys, or web authentication more technically?
    • 1:33:18It means that we're getting out of the business, potentially,
    • 1:33:21as a society of having to remember dozens
    • 1:33:24or hundreds or thousands of different passwords for all of our accounts.
    • 1:33:28It does require, though, that we don't lose the device or the devices that
    • 1:33:33registered for these websites or apps, but again, increasingly,
    • 1:33:37the world is providing cloud services, whether from Apple or Microsoft
    • 1:33:41or Google or others, that presumably can synchronize
    • 1:33:45your passkeys across devices. And we'll conclude ultimately today
    • 1:33:47by talking about how they can be synchronized securely, even
    • 1:33:51without Google and Microsoft and Apple knowing what your own passkeys are, so
    • 1:33:56long as they provide us with a certain technical guarantee.
    • 1:33:59So the upside of this is we can move away from passwords,
    • 1:34:03and you can even share these passkeys with other people if you so choose.
    • 1:34:08The catch is, right now, they're not omnipresently
    • 1:34:12available on every website out there.
    • 1:34:14It's probably going to take some time for the world to come on board,
    • 1:34:17but I do dare say, in the coming weeks, months, and years,
    • 1:34:20you will see passkeys increasingly offered to you.
    • 1:34:23And so indeed, the next time you visit a website
    • 1:34:25that asks you, hey, do you want to register with your fingerprint
    • 1:34:28or with your face or with a PIN code?
    • 1:34:30And you're never even asked for a password, odds are,
    • 1:34:33it's using this passkey technology instead.
    • 1:34:37Well, let's go ahead and take one more five-minute break here,
    • 1:34:40and when we come back, we'll talk about securing data
    • 1:34:43as it's moving back and forth and sitting on our own systems.
    • 1:34:47All right, so we are back.
    • 1:34:49And allow me to claim that we now have a bunch of ways
    • 1:34:53to hash data and also encrypt data and also now, decrypt data.
    • 1:34:57So how can we use these building blocks to solve
    • 1:34:59some other perhaps familiar problems?
    • 1:35:01Well, there's this notion of encryption in transit,
    • 1:35:04which is a fancy way of saying that you and I probably prefer nowadays
    • 1:35:08that our data be encrypted whenever it's traveling from point A
    • 1:35:11to point B. Whether that point B is Amazon.com, Gmail.com,
    • 1:35:15WhatsApp, or any other service that we're communicating with,
    • 1:35:18we ideally want no one in between us-- some machine in the middle, so
    • 1:35:23to speak, to be able to get at that same data.
    • 1:35:26Because in particular, what you should be worried about
    • 1:35:28is a scenario like this where if Alice is trying to communicate with Bob,
    • 1:35:32you might worry that there's some eavesdropper, so to speak,
    • 1:35:35named Eve between Alice and Bob.
    • 1:35:38And maybe this is via wires nowadays on the internet.
    • 1:35:40Maybe it's somehow wirelessly.
    • 1:35:42Maybe Eve actually represents a company that Alice and Bob
    • 1:35:46are communicating through, like Gmail or Outlook or the like.
    • 1:35:51So encryption in transit, though, is important to distinguish
    • 1:35:54from other forms of encryption.
    • 1:35:56In particular here, Alice might very well
    • 1:35:59have an encrypted connection not to an eavesdropper, per se, but just
    • 1:36:03a third party like Gmail.
    • 1:36:05So assume that Eve here is Gmail.
    • 1:36:07And meanwhile, Bob, when checking his email account,
    • 1:36:10has an encrypted connection to Eve as well, which, in this story now,
    • 1:36:14is Gmail.
    • 1:36:15So Alice has a secure connection to Gmail and Bob
    • 1:36:18has a secure connection to Gmail as well,
    • 1:36:21but that does not mean necessarily that Alice has a secure connection to Bob.
    • 1:36:26Security does not really work through transitivity, so to speak.
    • 1:36:31This might very well mean that the data is only
    • 1:36:34encrypted while in transit from A to E and from B to E,
    • 1:36:39but that doesn't mean that Eve, or Gmail in this story,
    • 1:36:43can't be reading all of Alice's and Bob's emails.
    • 1:36:46And indeed, that is technically possible on Google's end.
    • 1:36:49They, of course, run all of the servers that your Gmail accounts might be on.
    • 1:36:53There's nothing technically probably stopping them
    • 1:36:57from reading anything and everything.
    • 1:36:59Now hopefully they have policies.
    • 1:37:00Hopefully very few humans actually have the privileges or the authorization
    • 1:37:05to even do anything close to that.
    • 1:37:07But technically speaking, just because Alice has a secure connection to Gmail
    • 1:37:11and Bob has a secure connection to Gmail,
    • 1:37:13that doesn't mean that their communications will
    • 1:37:16be encrypted entirely between A and B. And there are lots of examples of this
    • 1:37:22as well.
    • 1:37:23Zoom, for instance, when it comes to video conferencing,
    • 1:37:25you might have an encrypted connection to Zoom,
    • 1:37:27I might have an encrypted connection to Zoom.
    • 1:37:29That does not necessarily mean that Zoom couldn't be Eve in this story
    • 1:37:34listening and watching everything that we're saying while video conferencing
    • 1:37:39as well.
    • 1:37:40So encryption in transit is good in that it at least keeps random people out
    • 1:37:44of the picture because they don't have access to these encrypted channels,
    • 1:37:48but if there is this third party, this machine in the middle
    • 1:37:51or company in the middle, even they might have access to data that we
    • 1:37:55do not want them to have access to.
    • 1:37:58So what, then, is a stronger alternative?
    • 1:38:00Increasingly possible, increasingly available, and something you as a user
    • 1:38:06should be looking for with greater frequency is what
    • 1:38:09we would call an end-to-end encryption.
    • 1:38:11This is a stronger guarantee whereby you can
    • 1:38:14trust that Alice's connection to Bob is, in fact, secure
    • 1:38:19even if-- not pictured here, there are 1, 2, 3, 4 machines in the middle,
    • 1:38:25companies in the middle, eavesdroppers in the middle.
    • 1:38:28If you use encryption properly end-to-end,
    • 1:38:32you can ensure that the only thing Eve or Google or Zoom can see
    • 1:38:38is just your ciphertext, the seemingly random strings of text
    • 1:38:43or 0's and 1's that represent your encrypted data, but without your key,
    • 1:38:47they have no idea what that data actually is.
    • 1:38:51So end-to-end encryption isn't necessarily in most
    • 1:38:54companies' best interest.
    • 1:38:55Why?
    • 1:38:55Well, companies like Gmail tend to presumably mine our data,
    • 1:38:59whether it's for advertising purposes or otherwise.
    • 1:39:01And so it's sometimes in companies' interest to have access to your data
    • 1:39:05to keep it secure on their servers, but still
    • 1:39:08in a way that they have access to it.
    • 1:39:10Now that might be not comfortable for you.
    • 1:39:13And so there are alternatives.
    • 1:39:15For instance, iMessage for Apple users and WhatsApp
    • 1:39:18internationally are known in particular for offering end-to-end encryption
    • 1:39:23which, if implemented truthfully and technically correctly,
    • 1:39:27should guarantee that even though your messages might
    • 1:39:29be going through WhatsApp servers, no employee at WhatsApp
    • 1:39:33can actually see your messages because it's encrypted
    • 1:39:36all the way from A to B, even though it's
    • 1:39:39going through a potential eavesdropper.
    • 1:39:42But that depends on exactly what form of encryption you're using,
    • 1:39:45and if it's not end-to-end, it might only
    • 1:39:47be encrypted in transit such that Eve's, that eavesdropper,
    • 1:39:51might indeed have access to the data.
    • 1:39:54So as to how you can use end-to-end encryption,
    • 1:39:56it's an option that a service must provide to you in this case
    • 1:40:00or you must choose services that offer it.
    • 1:40:02It's not necessarily something that's always available,
    • 1:40:05but it is increasingly available in different software.
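The difference between the two models can be sketched with a toy symmetric cipher (plain XOR here, just to contrast who can read what; real systems use protocols like TLS for encryption in transit, and designs like the Signal protocol for end-to-end encryption):

```python
# Toy XOR "cipher" -- illustrative only, trivially breakable.
def xor(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

alice_server_key = b"k1"   # shared by Alice and the server
bob_server_key   = b"k2"   # shared by Bob and the server
alice_bob_key    = b"k3"   # shared ONLY by Alice and Bob (end-to-end)

msg = b"meet at noon"

# Encryption in transit: the server decrypts on one hop and re-encrypts
# on the next -- so the server sees the plaintext in the middle.
hop1 = xor(msg, alice_server_key)
seen_by_server = xor(hop1, alice_server_key)   # the server CAN read this
hop2 = xor(seen_by_server, bob_server_key)
print(seen_by_server)                          # b'meet at noon'

# End-to-end: Alice encrypts under the key only Bob has; the server
# merely relays ciphertext it cannot interpret.
e2e = xor(msg, alice_bob_key)
print(xor(e2e, alice_bob_key))                 # only Bob recovers the message
```

In the first model the middleman holds a key to each hop and so sees everything; in the second, Eve sees only seemingly random bytes.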
    • 1:40:09So let's now consider a fairly mundane operation,
    • 1:40:12but one that has implications for these same technologies and solutions.
    • 1:40:16That is, deleting a file, be it on your Mac or your PC
    • 1:40:19or your phone or some other device.
    • 1:40:22Now where is data stored in your devices?
    • 1:40:24Well generally, it might be in a device like this,
    • 1:40:26a large, somewhat older but large hard drive that
    • 1:40:29can store lots and lots of files and folders,
    • 1:40:31or perhaps something smaller known as a solid state
    • 1:40:34drive that might store information entirely digitally
    • 1:40:37without any moving parts.
    • 1:40:39And even smaller might be something like this
    • 1:40:41that you carry around like a USB stick, and they are even smaller nowadays,
    • 1:40:44too, that similarly stores some data digitally.
    • 1:40:48Now how do we go about deleting files from a computer or any
    • 1:40:51of these devices?
    • 1:40:52Well, you typically click it and drag it somewhere, or maybe you right-click it
    • 1:40:56or maybe you tap and drag it to some trash or the like.
    • 1:40:59There's any number of user interface mechanisms for deleting files,
    • 1:41:02but let's consider for our purposes what happens underneath the hood.
    • 1:41:06So let me stipulate that your hard drive, your solid state
    • 1:41:09drive, your USB stick just contains ultimately
    • 1:41:12a whole bunch of 0's and 1's, and those 0's and 1's represent your files
    • 1:41:17and folders.
    • 1:41:18So when you go about deleting a file, by dragging it
    • 1:41:22to the recycle bin on Windows, or dragging it to the Trash
    • 1:41:26Can on macOS, what actually happens?
    • 1:41:29Well, it turns out, not anything at all, really.
    • 1:41:33When you recycle a file on Windows or when you trash a file on macOS,
    • 1:41:37it doesn't actually get deleted in the sense that you and I might expect.
    • 1:41:42By delete it, I mean it's gone.
    • 1:41:43I don't want to be able to find it anywhere.
    • 1:41:45OK, wait a minute, though.
    • 1:41:47Of course, we all know by now, at least on computers,
    • 1:41:49you at least have to empty the Recycle Bin or empty the Trash Can.
    • 1:41:53So OK, maybe I missed that step.
    • 1:41:55But even then, contrary to what you might expect,
    • 1:41:58emptying the Recycle Bin, emptying the Trash Can also does not generally
    • 1:42:02delete the data.
    • 1:42:04And here's where I'd, again, emphasize, wait a minute,
    • 1:42:06when I delete a file, I want it gone, removed from my computer altogether.
    • 1:42:10But what macOS and Windows and operating systems in general tend to do instead,
    • 1:42:16when you even empty the Recycle Bin or Trash Can,
    • 1:42:19they don't actually get rid of the file, per se, they just forget where it is.
    • 1:42:23Somewhere in the computer's memory, there's
    • 1:42:26like a spreadsheet of sorts, some kind of database or table
    • 1:42:29with at least two columns, one of which has the name of your file
    • 1:42:32or the location of your file, the other of which
    • 1:42:35has some kind of reference to which 0's and 1's on your actual computer
    • 1:42:40implement that specific file.
    • 1:42:43Maybe these 0's and 1's are for one file, these 0's and 1's are
    • 1:42:46for another file, and so forth.
    • 1:42:48So somewhere, your computer is keeping track of what
    • 1:42:50is where physically on your computer.
    • 1:42:53But when you delete a file by emptying the Trash or Recycle Bin,
    • 1:42:56the computer just, eh, forgets where it is.
    • 1:42:58And more importantly, it frees up the space so it can be used later.
    • 1:43:02So what do I mean by that?
    • 1:43:04Well, suppose I do go ahead and delete a file
    • 1:43:07and empty the Recycle Bin or Trash Can, and suppose
    • 1:43:09that these yellow 0's and 1's represent the file that I no longer care about.
    • 1:43:15Well, what's actually going to happen underneath the hood, so to speak,
    • 1:43:19of the computer?
    • 1:43:19Well eventually, some of those yellow 0's and 1's might just
    • 1:43:24get reused for other files.
    • 1:43:26In other words, these 0's and 1's highlighted in yellow
    • 1:43:29represent a file that used to be there, but is not.
    • 1:43:32That is equivalent to saying some other file can now use those same
    • 1:43:360's and 1's.
    • 1:43:37And so here's some random 0's and 1's that maybe overwrite some of the file,
    • 1:43:41but not all of it.
    • 1:43:42Notice, there's still a bunch of yellow 0's and 1's here
    • 1:43:45in my depiction of my computer.
    • 1:43:48So it turns out that over time, yes, your file will probably
    • 1:43:53get actually deleted.
    • 1:43:55What do I mean by that?
    • 1:43:56Eventually those 0's and 1's will be repurposed, changed from 1 to 0,
    • 1:44:00changed from 0 to 1 such that your file, for all intents and purposes,
    • 1:44:05is actually gone, because it's been repurposed, that space, altogether.
    • 1:44:09But notice, at least at this point in time,
    • 1:44:12and shortly after you delete a file, even if you've created or downloaded
    • 1:44:15new files, there might still be parts of your files
    • 1:44:18around, which means that sensitive word document or Excel file or images
    • 1:44:23that you had on your computer, there might still be remnants of them,
    • 1:44:27just a few lines from any of those.
    • 1:44:29So you should realize that deleting a file doesn't really get rid of it
    • 1:44:33in the way you might expect or hope.
    • 1:44:35To do that, you need to adopt some better practices.
    • 1:44:39Now what do I mean by this?
    • 1:44:41Secure deletion is another beast altogether.
    • 1:44:44And typically when we delete files, they're not deleted securely.
    • 1:44:48They're not deleted typically in a way that you would hope.
    • 1:44:51So secure deletion does what you might really hope for, get rid of this file
    • 1:44:55altogether.
    • 1:44:56So if we go back to the original contents of my computer
    • 1:44:59with all of these here 0's and 1's, and suppose
    • 1:45:02that I want to delete this file here at the top of the screen,
    • 1:45:05in an extreme ideal world, those 0's and 1's would just be gone.
    • 1:45:10Like that's pretty darn secure.
    • 1:45:11Those bits, those 0's and 1's, they don't even exist anymore.
    • 1:45:15Now this is probably not the best way to securely delete information
    • 1:45:18because if I just got rid of those 0's and 1's somehow, my hard drive
    • 1:45:23would be getting literally smaller and smaller
    • 1:45:25in terms of how much stuff I can put on it, since I wouldn't have as many bits,
    • 1:45:29or 0's and 1's, available.
    • 1:45:30So that's probably not the best long-term solution
    • 1:45:32because it's expensive.
    • 1:45:34It's like getting rid of some of my capacity.
    • 1:45:36So we don't actually do that, but how might we securely delete a file?
    • 1:45:41I don't think we want to just wait and hope that those 0's and 1's eventually
    • 1:45:46get reused by the system because we might still
    • 1:45:49be left with some remnants which might not be ideal.
    • 1:45:52So what we can do when securely deleting a file is something like this--
    • 1:45:56change all of the 0's and 1's that we don't care about anymore or want,
    • 1:46:00change them all to 0's.
    • 1:46:01And this will effectively securely delete the file
    • 1:46:05because now the 1's that were previously there
    • 1:46:09that represented some piece of information are just completely gone.
    • 1:46:12Or equivalently, I could change them all to 1's.
    • 1:46:15Or I could even change it to random 0's and 1's.
    • 1:46:18The point is, to securely delete a file, you
    • 1:46:20should change all of the 0's and 1's to at least some other pattern
    • 1:46:26so that the file is effectively gone.
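The overwriting idea above can be sketched in code. This is a minimal, illustrative Python sketch, not a vetted tool: the function name is my own, and on modern SSDs, wear leveling and firmware remapping can quietly leave copies of the bits behind, so full-disk encryption remains the more reliable safeguard.

```python
# Illustrative "secure deletion" by overwriting a file's bytes before
# removing it. CAVEAT: on modern SSDs the firmware may remap blocks,
# so overwritten data can survive; this only shows the idea.
import os
import secrets

def overwrite_and_delete(path):
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.write(b"\x00" * size)              # pass 1: all 0's
        f.seek(0)
        f.write(b"\xff" * size)              # pass 2: all 1's
        f.seek(0)
        f.write(secrets.token_bytes(size))   # pass 3: random 0's and 1's
        f.flush()
        os.fsync(f.fileno())                 # push the overwrites to disk
    os.remove(path)                          # only now "forget" the file
```

Only after the bits themselves have been changed does the sketch ask the operating system to forget where the file was.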
    • 1:46:28Now how can you use this to your benefit?
    • 1:46:31Well, some operating systems nowadays support
    • 1:46:34what's called full-disk encryption, and this is good for a number of reasons.
    • 1:46:38One, if you enable a feature called full-disk encryption,
    • 1:46:41which is actually a specific incarnation of an idea known as encryption at rest.
    • 1:46:46Encryption in transit refers, of course, to your data going back and forth
    • 1:46:49from point A to point B. Encryption at rest
    • 1:46:52means it's just sitting there on your device, in your pocket, or on your lap
    • 1:46:56or on your desktop, sitting unused, maybe on or off.
    • 1:47:00So when it comes to full-disk encryption or encryption at rest,
    • 1:47:04you ideally want all of your data somehow encrypted on your Mac,
    • 1:47:09on your PC, on your phone.
    • 1:47:11And only when you log in with your password or maybe
    • 1:47:14your fingerprint or your face should that data be decrypted automatically,
    • 1:47:19which can happen pretty darn fast nowadays with modern hardware,
    • 1:47:23so you can actually
    • 1:47:25use it and interact with that device.
    • 1:47:28So why is this advantageous?
    • 1:47:31Well, one, if your device gets stolen, so long
    • 1:47:34as you're not logged into it, so long as it's locked,
    • 1:47:37so long as the lid is closed, so long as it's unplugged or any other number
    • 1:47:41of scenarios, at least if someone takes your laptop from the table in Starbucks
    • 1:47:45or the cafe, well, hopefully, if you have
    • 1:47:48a good password or good biometrics, they're
    • 1:47:51not going to be able to get any of your data.
    • 1:47:53They can maybe delete all of your data, they can
    • 1:47:56sell your computer, they can use your computer, but they probably,
    • 1:47:59if you're practicing best practices, don't have access
    • 1:48:03to the data that's on the system.
    • 1:48:04Why?
    • 1:48:05Because it's completely encrypted at rest and they don't know your password,
    • 1:48:09they don't have your fingerprint, they don't have your face,
    • 1:48:11they should not be able to decrypt that data.
    • 1:48:14So in other words, if this is my unencrypted data,
    • 1:48:17the way I want it and need it when I'm using my computer,
    • 1:48:20full-disk encryption, at rest, would change my entire computer
    • 1:48:25to look random.
    • 1:48:26These are random 0's and 1's now that I generated by using,
    • 1:48:30for instance, my password or my fingerprint or my face.
    • 1:48:34And this is what your hard drive or your solid state drive
    • 1:48:37should look like when the lid is closed, when the power is off.
    • 1:48:41When you are logged out of it, it should be random 0's and 1's.
    • 1:48:46And the upside of this now is that, again,
    • 1:48:49if it's stolen while in this state, there's no data to be used
    • 1:48:53by the adversary because it looks like random 0's and 1's.
    • 1:48:56Better yet, if you deliberately want to get rid of the device
    • 1:49:00because you want to trade it in for resale value,
    • 1:49:02because you want to donate it to someone else,
    • 1:49:04because you want to sell it to someone online,
    • 1:49:06when using full-disk encryption, the upside
    • 1:49:09is that so long as you had a really hard-to-guess password, your data is,
    • 1:49:14for all intents and purposes, securely deleted already.
    • 1:49:17Because unless the new buyer figures out or knows
    • 1:49:21your password or has your same fingerprint or has your same face,
    • 1:49:24they're not going to be able to access any of your data anyway.
    • 1:49:26And this is important nowadays because it turns out, with modern hardware,
    • 1:49:31even if you might want to change all of the 0's and 1's to all 0's or all 1's
    • 1:49:36or all random data, it turns out that today's hardware can fail over time.
    • 1:49:42So even little USB sticks or solid state drives over time can kind of wear out.
    • 1:49:47But they're smart enough, thanks to software
    • 1:49:49known as firmware inside of it, as soon as the device realizes, wait a minute,
    • 1:49:53those bits over there aren't working properly anymore,
    • 1:49:56the device might not let you change them to all 0's or all 1's or random 0's
    • 1:50:02and 1's anymore.
    • 1:50:03It might just leave them as is forever.
    • 1:50:06Which is to say, it's even more important to start
    • 1:50:09using full-disk encryption, encryption at rest,
    • 1:50:11when you first get a device because that way,
    • 1:50:14you can trust that even if parts of the device degrade over time,
    • 1:50:18all of the data that's there and has been there
    • 1:50:20was at least encrypted with one of your passwords or one of your biometrics
    • 1:50:25in the past.
    • 1:50:26So this is the kind of feature to look for in your Mac, your PC, or your phone
    • 1:50:30to ensure that it is somehow enabled.
    • 1:50:33Thankfully, once you log back in with your password,
    • 1:50:36it goes back to the original data and you can use it.
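The idea of encryption at rest keyed to your password can be sketched as follows. This is a toy illustration, assuming nothing about how any real operating system implements it: real full-disk encryption uses vetted ciphers like AES, often accelerated in hardware, whereas the hash-based keystream here exists only to show the shape of the idea, that one password-derived key turns readable data into seemingly random bits and back.

```python
# Toy sketch of encryption at rest: derive a key from a password,
# then use it to scramble (and later unscramble) data. NOT real
# cryptography; real systems use AES, not a SHA-256 keystream.
import hashlib

def derive_key(password, salt):
    # Deliberately slow key derivation, so password guessing is costly.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def keystream_xor(key, data):
    # XOR the data with a keystream of hash(key + block counter).
    # Applying the same function twice with the same key round-trips.
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)

key = derive_key("correct horse battery staple", b"per-device-salt")
plaintext = b"files sitting on my disk"
ciphertext = keystream_xor(key, plaintext)   # looks like random 0's and 1's
recovered = keystream_xor(key, ciphertext)   # same key decrypts it again
```

Without the password (and hence the key), the ciphertext is just noise, which is exactly the state a stolen, locked laptop's drive should be in.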
    • 1:50:38Of course, then, an implication of this best practice
    • 1:50:42is that if you lose your laptop or your phone
    • 1:50:45or your desktop's password, or your fingerprint somehow changed,
    • 1:50:48or your face sufficiently changes, you might be locked out
    • 1:50:51of all of your data, too, but again, that's
    • 1:50:54just another example of this trade-off between usability and security as well.
    • 1:50:59Now a downside, an evil side to full-disk encryption
    • 1:51:02is ransomware, which is how adversaries are monetizing attacks.
    • 1:51:06It's not uncommon nowadays for hackers, for adversaries,
    • 1:51:09when they get into a system, whether it's your laptop
    • 1:51:12or, for instance, a corporate network, or in some cases, hospital
    • 1:51:16systems or a city's own computer networks, to not try to do any damage
    • 1:51:21or just do something like spam or cryptocurrency mining,
    • 1:51:24but to actually encrypt all of the data on these systems they somehow
    • 1:51:30accessed online.
    • 1:51:32Why?
    • 1:51:33Well, if they encrypt all of the data they can then ask for a ransom
    • 1:51:36and say, listen, if you don't give me this many bitcoins,
    • 1:51:39I'm not going to give you the key that I used to encrypt your data.
    • 1:51:44And if you poke around online, there have been many examples of this,
    • 1:51:47unfortunately, where hackers have gotten into systems that were not
    • 1:51:51very well-protected, all of the data therein was encrypted,
    • 1:51:55and this is an opportunity for the adversaries
    • 1:51:58to try to extort, say, financial gain from a situation
    • 1:52:02by then only handing you the keys, if ever, once you've actually paid up.
    • 1:52:07And there, too, there's the risk, as in any ransom scenario,
    • 1:52:10where who even knows if they're going to give you the proper key in the end,
    • 1:52:14but this is increasingly a concern for municipalities, for companies,
    • 1:52:17for universities, and the like.
    • 1:52:19So just as we have some upsides here, there,
    • 1:52:22too, is this trade-off in what you can do.
    • 1:52:24And lastly, we thought we'd end on a note about the future
    • 1:52:27because this is a topic that will come up
    • 1:52:29and has come up over time, this topic of quantum computing.
    • 1:52:33So for those less familiar, we've been talking a lot
    • 1:52:35about bits, 0's and 1's today, and at the end
    • 1:52:38of the day that's how today's computer systems are implemented.
    • 1:52:41Patterns of 0's and 1's to represent numbers and letters and colors
    • 1:52:44and videos and sounds and everything.
    • 1:52:47We've been discussing today data more generally.
    • 1:52:50Now typically, in our world now, a bit, a binary digit, can either be a 0
    • 1:52:56or it can be a 1, as per the diagram we had on the screen in these examples.
    • 1:53:02Either a 0 or a 1.
    • 1:53:04In the world of quantum computing, thanks to some very fancy physics
    • 1:53:08and quantum mechanics in particular, it is possible,
    • 1:53:12it seems, physically, for us to implement the idea of bits a little bit
    • 1:53:17differently using quantum techniques.
    • 1:53:20And there's this idea of not just a bit, but a quantum bit or qubit whose power
    • 1:53:26derives from the reality that physically, you
    • 1:53:28can implement a qubit in such a way that it is representing both a 0
    • 1:53:33and a 1 at the exact same time.
    • 1:53:37So it can be not in just one state, so to speak,
    • 1:53:39one condition at once, but two states at once.
    • 1:53:44And if you have two qubits, they can be in four states at once.
    • 1:53:47If you have three, they can be in eight states at once.
    • 1:53:50If you have 32 of them, they can be in 4 billion states at once.
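The doubling just described is plain exponential growth, which a few lines of Python can make concrete:

```python
# n qubits can be in 2**n states at once: each added qubit
# doubles the number of simultaneous states.
def qubit_states(n):
    return 2 ** n

print(qubit_states(1))   # 2
print(qubit_states(2))   # 4
print(qubit_states(3))   # 8
print(qubit_states(32))  # 4294967296, roughly 4 billion
```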
    • 1:53:55Now what's the implication of this?
    • 1:53:57Well, when we talk about cryptography, when
    • 1:53:59we talk about hashing, when we talk about just very large numbers
    • 1:54:02and trying to figure out via brute force or some other mechanism
    • 1:54:05what some input to a function was, if you have exponentially more computing
    • 1:54:12capabilities by being able to do not just one or two
    • 1:54:15things at a time with individual bits, but two or four or eight or 4
    • 1:54:20billion things at once, it stands to reason
    • 1:54:23that if adversaries have access to quantum computing before you
    • 1:54:27and I do, then all of the security you and I now rely on
    • 1:54:31and that we've talked about today could suddenly become insecure.
    • 1:54:35Because we're trusting right now that it's just
    • 1:54:38going to take the adversary a lot, a lot,
    • 1:54:40a lot of time, maybe money, maybe resources,
    • 1:54:42maybe risk to attack our accounts.
    • 1:54:44But if they have exponentially more resources than you and me,
    • 1:54:49then our data really is at risk.
    • 1:54:51And all of the mathematics we've been trusting would need to be hardened instead.
    • 1:54:56Now hopefully you and I will have access to quantum computing at the same time
    • 1:55:00as or ideally before all of these adversaries,
    • 1:55:03so hopefully our algorithms for securing information
    • 1:55:06will continue to evolve along with these technologies.
    • 1:55:08So this isn't necessarily something you need to worry about for now.
    • 1:55:11Indeed, I think after today, we have more than enough to worry about.
    • 1:55:15So for today, that's all.
    • 1:55:17We'll see you next time.