CS50 Video Player
    • 0:00:00Introduction
    • 0:00:15Cryptography
    • 0:01:25Ciphers
    • 0:02:25Substitution Cipher
    • 0:09:25Caesar Cipher
    • 0:11:08Vigenere Cipher
    • 0:17:52Substitution Cipher
    • 0:18:20Frequency Analysis
    • 0:20:20Ciphers (continued)
    • 0:22:20Hashes
    • 0:25:46Hash Function
    • 0:32:28Modern Cryptography
    • 0:33:52Cryptographic Hash Functions
    • 0:35:52SHA-1
    • 0:42:00Modern Cryptography (continued)
    • 0:43:54Cryptography (continued)
    • 0:46:40Public-Key Cryptography
    • 0:51:47Asymmetric Encryption
    • 0:57:45Digital Signatures
    • 1:06:02Blockchain
    • 0:00:00[MUSIC PLAYING]
    • 0:00:17SPEAKER 1: Cryptography.
    • 0:00:18What is it, and why is it important?
    • 0:00:20We're going to answer those two questions in exactly that order.
    • 0:00:23Let's start with what cryptography is.
    • 0:00:25It's the art and science of obscuring, and ideally protecting, information.
    • 0:00:31Now it's a science because there's math involved with it.
    • 0:00:34It's pretty straightforward to manipulate characters in some way
    • 0:00:37by adding some constant number to them or to change them
    • 0:00:39in some systematic manner.
    • 0:00:42But it's an art, because doing so in a way to defend against potential attacks
    • 0:00:47is not as easy as it might first appear.
    • 0:00:50There's a lot of guesswork and calculation
    • 0:00:54that needs to go into play to find a really strong cipher.
    • 0:00:59Cryptography gives us the opportunity to have
    • 0:01:01a basic level of security against an adversary who might
    • 0:01:04do bad things with the information.
    • 0:01:06We usually contrast enciphered information
    • 0:01:11with information that is presented in the clear, which
    • 0:01:14is to say there's no protection surrounding it at all.
    • 0:01:17And it's generally considered better to protect information using cryptography
    • 0:01:21than to have information just freely available out there.
    • 0:01:26Now a cipher, we're going to start by talking about cryptography
    • 0:01:28sort of through history.
    • 0:01:29We'll lead up to more modern forms of cryptography,
    • 0:01:32which are derived from more ancient forms of cryptography.
    • 0:01:36But a cipher is one of the most fundamental forms of cryptography.
    • 0:01:40And ciphers are algorithms.
    • 0:01:41And recall that an algorithm is just a step-by-step set of instructions
    • 0:01:46that we use to complete a task.
    • 0:01:49And in this case, the task is to obscure or encipher information.
    • 0:01:54And ciphers can also be used in reverse to unobscure, or decipher,
    • 0:01:59that same information that was previously encoded or enciphered.
    • 0:02:04Now there are many different ciphers out there
    • 0:02:07that have varying levels of security potential.
    • 0:02:10Some of the more ancient ciphers that we're going to start with
    • 0:02:13should be [INAUDIBLE] be considered to have no security potential at all
    • 0:02:16considering how easy they are to crack.
    • 0:02:18But again, this leads into the more modern approach to cryptography,
    • 0:02:22which is much more secure than some of these basic ones.
    • 0:02:25And now let's start by imagining that we have possession of this device.
    • 0:02:29Now if you're looking at this device and it seems somewhat familiar to you,
    • 0:02:32it may be because you've recently seen the movie A Christmas Story,
    • 0:02:35where Ralphie, the character there, obtains
    • 0:02:39one of these, which is a Little Orphan Annie's Secret Society decoder pin.
    • 0:02:45And this decoder pin has a set of numbers going sequentially one
    • 0:02:50through 26 around the inner edge, and a set
    • 0:02:52of letters, which is not presented in any particular order,
    • 0:02:55around the outer edge.
    • 0:02:57And what would happen is the radio announcer would provide,
    • 0:03:01set your pins to some combination.
    • 0:03:02So line up one number with one letter.
    • 0:03:05And then it would read off some secret message
    • 0:03:07that, ostensibly, only individuals who possessed this pin,
    • 0:03:10or many of the duplicate versions of this pin that were distributed
    • 0:03:13to children around the country, could then decipher
    • 0:03:16by taking the numbers that were given over the radio
    • 0:03:19and transforming them back into letters so that it makes sense.
    • 0:03:22So if you can, if you zoom in on this image,
    • 0:03:24it might be a little difficult to see, but you
    • 0:03:27can see that the 3 corresponds to the letter L, and the 4 corresponds to an M
    • 0:03:31based on this particular setting of this decoder pin.
    • 0:03:36So this is one potential, what we would call a substitution cipher,
    • 0:03:40where we're changing, we're substituting a letter in this case for a number,
    • 0:03:45and that number will henceforth represent that letter
    • 0:03:47for the rest of this message.
    • 0:03:51But what is the problem with this cipher?
    • 0:03:53Or more generally, when we think about issues in computer science
    • 0:03:56where we have adversaries who are trying to penetrate some system,
    • 0:04:00or break a code, or break in, or hack into anything,
    • 0:04:04hack your password, we sometimes frame this in terms of asking the question,
    • 0:04:08what is the attack vector?
    • 0:04:10Where is the vulnerability that is potentially
    • 0:04:13part of this particular cipher?
    • 0:04:17And in this case, it's that anybody who has access to this pin
    • 0:04:22is able to break any cipher that is made with this pin.
    • 0:04:26And again, this pin was distributed pretty extensively in the 1930s and 40s
    • 0:04:29to children who listened to this very popular radio program.
    • 0:04:32So these pins were in the hands of many people.
    • 0:04:35And anybody who had access to the pin would
    • 0:04:37be able to understand the message.
    • 0:04:39And so that is how we might frame this attack vector:
    • 0:04:43the key, in this case the pin, which we will call a key for this purpose,
    • 0:04:48is just very prevalent.
    • 0:04:50It's pretty well known how to use this key and manipulate this key.
    • 0:04:54A lot of people have access to that key.
    • 0:04:58But that's just one example of a substitution cipher.
    • 0:05:00We have many different examples of substitution ciphers that we could use.
    • 0:05:04Let's just take another very simple, straightforward one,
    • 0:05:07which is imagine we have all of the letters of the alphabet
    • 0:05:10and we're just going to assign the ordinal position of that letter
    • 0:05:13as its cipher value.
    • 0:05:15So with the secret society pin, there was this sort of random element
    • 0:05:18to it, right?
    • 0:05:19The letters were being skipped.
    • 0:05:20There wasn't a rhyme or reason to them, although the numbers were sequential.
    • 0:05:24Here let's just line up both.
    • 0:05:25Let's use sequential letters and map them to their sequential numbers.
    • 0:05:29So A becomes 1, B becomes 2, and so on.
    • 0:05:33Both of these things are increasing linearly.
    • 0:05:36Now you may recall that as computer scientists,
    • 0:05:39we ordinarily start counting from zero rather than counting from one.
    • 0:05:42I'm counting from one here because this mapping of A to 1 and Z to 26
    • 0:05:46is much more familiar to us intuitively as humans,
    • 0:05:49and I want to keep us grounded in this discussion of cryptography right now.
    • 0:05:54But ordinarily, you might actually instead see this as 0 to 25, 0 being A,
    • 0:05:59through Z being 25 as opposed to 1 through 26.
    • 0:06:02But this cipher would work exactly the same
    • 0:06:05and has roughly the same security potential
    • 0:06:08as Annie's secret society cipher does.
    • 0:06:11And we can actually make this a little bit better because we are consistently
    • 0:06:17increasing the letters, A through Z, and consistently increasing
    • 0:06:20the numbers, 1 through 26.
    • 0:06:22We could also, instead of just doing this direct mapping,
    • 0:06:25we could rotate around.
    • 0:06:27We could start the 1 somewhere else as opposed to being A.
    • 0:06:31And now instead of having just one cipher where A maps to 1, B maps to 2,
    • 0:06:36we have a variety of different ciphers, depending
    • 0:06:38on where we decide we want to have our starting point.
    • 0:06:42So for example, we might instead add two to every number.
    • 0:06:47So instead of going from 1 to 26, we go from 3 to 28.
    • 0:06:54Now think about it.
    • 0:06:55If you're trying to break this cipher and you see patterns
    • 0:06:58like this with all these numbers in them, what might jump out at you?
    • 0:07:03Well, if you're used to seeing ciphers that are 1 through 26, for example,
    • 0:07:07something where you don't see any 1s or 2s
    • 0:07:09and suddenly you're seeing 27s and 28s potentially in the message that might
    • 0:07:13be long enough to have, in this case, Ys or Zs in it
    • 0:07:18might seem to you that this is slightly off.
    • 0:07:21Like this cipher must be shifted in some way.
    • 0:07:24Instead of being this straightforward line,
    • 0:07:26there's some modification that's been made to it.
    • 0:07:29That's kind of a tip off if you're trying to defend
    • 0:07:31against somebody figuring that out.
    • 0:07:33And so instead of going 27, 28 at the end,
    • 0:07:37we might instead wrap around the alphabet.
    • 0:07:40Once we have exhausted the 26 possible values that we started with,
    • 0:07:45the 26 letters of the alphabet, we might instead, once we have X is 26,
    • 0:07:49say, well, instead of Y being 27, Y is 1 and Z is 2.
    • 0:07:54And this is not a massive improvement on the security of this cipher.
    • 0:07:59Like I said, it's still quite fragile and quite easy to break.
    • 0:08:02But it doesn't give quite as much of a clue to a potential adversary
    • 0:08:06as to how to crack it, how to decipher the message.
    • 0:08:11And this can be done for any different value
    • 0:08:14to obtain any number of different ciphers.
    • 0:08:16Instead of going forward by two positions,
    • 0:08:18we could add 20 to every letter's value, again,
    • 0:08:21wrapping around the alphabet when we exhaust,
    • 0:08:24when we get to 26, instead of having 27, 28, we would just reset at 1
    • 0:08:28and continue on.
    • 0:08:32But we can also add 26 to it.
    • 0:08:35But that doesn't look very different than what we had before.
    • 0:08:37And that's where this cipher's vulnerability comes into play.
    • 0:08:42There's only 26 possible ways to rotate the alphabet
    • 0:08:46while keeping the order of the letters preserved, right?
    • 0:08:50Unless we start skipping A, D, G, and then,
    • 0:08:55you know, rearranging the other letters in some other way.
    • 0:08:57If we want to keep everything straightforward in a line,
    • 0:09:00again, wrapping around 26 when necessary, there's
    • 0:09:04only 26 ways to do it.
    • 0:09:06That is to say that shifting the alphabet forward by 26
    • 0:09:09is exactly the same as shifting the alphabet forward by 0.
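
The rotation described above can be sketched in Python. This is a minimal sketch, not code from the lecture: the function name `rotate_to_numbers` is made up for illustration, and it assumes the message contains only the uppercase letters A through Z.

```python
def rotate_to_numbers(message, shift):
    """Map each letter to a number from 1 to 26, rotated by `shift`."""
    numbers = []
    for letter in message:
        position = ord(letter) - ord("A")   # A is 0, Z is 25
        rotated = (position + shift) % 26   # wrap around past the end
        numbers.append(rotated + 1)         # report as 1 through 26
    return numbers


print(rotate_to_numbers("AXYZ", 0))   # [1, 24, 25, 26]
print(rotate_to_numbers("AXYZ", 2))   # [3, 26, 1, 2]
# Shifting by 26 gives the same result as shifting by 0.
print(rotate_to_numbers("AXYZ", 26) == rotate_to_numbers("AXYZ", 0))   # True
```

The modulo step is what keeps Y and Z from showing up as 27 and 28, and it is also why there are only 26 distinct shifts.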
    • 0:09:12And so that's our limitation.
    • 0:09:14We have a very small number of, again, this word keys that can
    • 0:09:18be used to decipher using this cipher.
    • 0:09:22Now this is an example of something called a rotational cipher,
    • 0:09:25and it's actually a rather famous rotational cipher
    • 0:09:27known as the Caesar Cipher.
    • 0:09:30It's attributed to Julius Caesar and was apparently used
    • 0:09:32more than two millennia ago for him to encode messages to his troops
    • 0:09:38on the line.
    • 0:09:39And at the time, this was revolutionary.
    • 0:09:41And generally what you're going to find with cryptography
    • 0:09:44is there's just this pattern of breaking the mold and doing something new
    • 0:09:50and trying to stay one step ahead.
    • 0:09:52And oftentimes, other people will then catch up.
    • 0:09:55And this cipher, which was once, you know,
    • 0:09:58lauded as being a wonderful cipher, is no longer as strong
    • 0:10:04as it once was thought to be.
    • 0:10:06And so we keep having to advance and improve and get ahead
    • 0:10:09of it for whatever kind of adversary that is, whether that's
    • 0:10:12a potential enemy on the battle line, as might have been the case with Julius
    • 0:10:16Caesar, or whether that's a hacker who's trying to break into your system
    • 0:10:20as might be the case today.
    • 0:10:21And fortunately, again, we're not using Caesar Cipher today
    • 0:10:25to encipher any of our information.
    • 0:10:26We're using much more modern techniques.
    • 0:10:28But these modern techniques evolved from seeing
    • 0:10:30codes being created, ciphers being created and broken,
    • 0:10:34and then having to be created anew to try
    • 0:10:36and defend against new vulnerabilities that have been exposed.
    • 0:10:39So like I said previously, very easy to decipher or to crack the Caesar Cipher,
    • 0:10:45but at the time, very, very difficult.
    • 0:10:48The limitation, again, limited number of keys.
    • 0:10:50There's only 26 ways to rotate the alphabet for it to make sense.
    • 0:10:53In the English alphabet, of course.
    • 0:10:55If you're using a different alphabet, you're
    • 0:10:56your number of keys might be different if you're
    • 0:10:58using the same rotational approach.
    • 0:10:59But the fundamental limitation is you are
    • 0:11:01confined by how many letters are in your alphabet
    • 0:11:04that you're using to encipher information.
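
To see why 26 keys is so few, here is a rough sketch of an adversary simply trying every key. The helper names `caesar_shift` and `all_candidates` are hypothetical, and this letter-to-letter version assumes uppercase English text.

```python
def caesar_shift(text, key):
    """Rotate each uppercase letter forward by `key` positions."""
    result = []
    for ch in text:
        if "A" <= ch <= "Z":
            result.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        else:
            result.append(ch)   # leave spaces and punctuation alone
    return "".join(result)


def all_candidates(ciphertext):
    """Undo every possible key; one of the 26 results is the plain text."""
    return [caesar_shift(ciphertext, -key) for key in range(26)]


secret = caesar_shift("HELLO", 2)   # "JGNNQ"
for key, guess in enumerate(all_candidates(secret)):
    print(key, guess)               # the line for key 2 reads HELLO
```

A person scanning those 26 lines can pick out the readable one almost immediately, which is the whole attack.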
    • 0:11:08So let's take things one step further.
    • 0:11:10What is an improvement that we might be able to make to Caesar?
    • 0:11:15That would lead us to this idea potentially of the Vigenere Cipher.
    • 0:11:17So Caesar had this limitation of there's one key
    • 0:11:20and there's only 26 possible values for that key.
    • 0:11:23What Vigenere Cipher does is it, instead of using a single key,
    • 0:11:27uses multiple keys.
    • 0:11:29Instead of picking a number to shift by, we're
    • 0:11:31instead going to define a keyword.
    • 0:11:34And we're going to use the letters of that keyword in sequence as we
    • 0:11:37go to change what our key is at any given
    • 0:11:41time, such that our enciphered message, instead of being enciphered using one
    • 0:11:45key, might use three keys or five keys or 10 keys,
    • 0:11:48depending on the length of the keyword that we use,
    • 0:11:50if that keyword is three or five or 10 letters long.
    • 0:11:54So this keyword becomes the interesting twist
    • 0:11:57that made Caesar much more challenging for an adversary
    • 0:12:01to crack by using different keys.
    • 0:12:03Now let's walk through an example of how the Vigenere Cipher works
    • 0:12:06because I think it makes more sense to see this visually rather than just
    • 0:12:10discussing it verbally.
    • 0:12:11So what we want to do here is encrypt the message HELLO
    • 0:12:15using the keyword LAW.
    • 0:12:17So here our message HELLO is what also might be called plain text.
    • 0:12:22It is in the clear.
    • 0:12:23It is not enciphered.
    • 0:12:23It is not hidden against any adversary.
    • 0:12:27And our key is LAW.
    • 0:12:30All right, so let's take a look at how we might do this.
    • 0:12:32So it oftentimes helps, especially when trying to encipher or decipher
    • 0:12:36using the Vigenere Cipher, to consider all
    • 0:12:40of the inputs that go into determining the final outputted character.
    • 0:12:44So we're going to take a look at plain text,
    • 0:12:46and we're going to convert it, just like we did
    • 0:12:47with Caesar, to its ordinal position.
    • 0:12:49We're going to see where in the alphabet that is.
    • 0:12:51Is it the first letter?
    • 0:12:52Then it's 1.
    • 0:12:53If it's the last letter, it's 26.
    • 0:12:54And so on.
    • 0:12:56We're going to do the exact same thing with each letter of our keyword.
    • 0:12:59So we're going to take a look at the keyword,
    • 0:13:01figure out what that letter's numerical correspondence would be.
    • 0:13:05We're going to then add those two things together.
    • 0:13:07If we go over 26, just as we did with the Caesar Cipher,
    • 0:13:10we're going to wrap back around such that we're confining ourselves
    • 0:13:13to that range of 1 through 26.
    • 0:13:15And then we're going to take that number and transform it into a letter.
    • 0:13:18So for example, if the result there is 2, we're going to change that into a B.
    • 0:13:23And the reason for that is that B is the second letter of the alphabet.
    • 0:13:26So let's walk through this with HELLO as our plain text and LAW as our key.
    • 0:13:32So the first letter of our plain text is H, and the ordinal position
    • 0:13:36of that H is 8.
    • 0:13:37It is the eighth letter of the alphabet.
    • 0:13:39We do the same thing with the first L for LAW, the first letter of LAW.
    • 0:13:43L is 12, it's the 12th letter of the alphabet.
    • 0:13:46So our next step is to add those two values, eight and 12 together.
    • 0:13:49We get 20.
    • 0:13:51We don't need to wrap around.
    • 0:13:52We didn't go over 26, so we're still OK.
    • 0:13:54And the 20th letter of the alphabet is T. So the first step of this
    • 0:13:59enciphering process with HELLO, using the Vigenere Cipher, using the key LAW,
    • 0:14:04is to turn the H into a T.
    • 0:14:06So we can do this again, we can take a look
    • 0:14:08at the E, the second letter of our plain text.
    • 0:14:10We use the second letter of our keyword now.
    • 0:14:12So we're not using the same key.
    • 0:14:14We're not using 12 over and over and over.
    • 0:14:16We're using a different key.
    • 0:14:17We're now using the A, the second letter of our keyword,
    • 0:14:20whose ordinal position is 1.
    • 0:14:21So 5 plus 1 is 6, and that results in F.
    • 0:14:26Next, we use the first L of HELLO, and the W of LAW.
    • 0:14:31So L is 12, W is the 23rd letter of the alphabet, we add those together,
    • 0:14:35we're at 35.
    • 0:14:3635 is not a legal value in terms of this cipher.
    • 0:14:39We are confined to 1 through 26.
    • 0:14:43And so we just subtract 26 and we get down to 9, and now we have I.
    • 0:14:47So now what do we do?
    • 0:14:47We've exhausted our keyword, but we still
    • 0:14:50have plain text that we need to encipher.
    • 0:14:53Well, as you might expect, the logical thing to do
    • 0:14:55is just go back to the beginning of the keyword and continue on.
    • 0:14:59And so we will.
    • 0:15:00So we'll use the L, the second L of our plain text, and the first L--
    • 0:15:05because we've now exhausted all of those letters,
    • 0:15:07we have to go back to the beginning--
    • 0:15:09the L for LAW.
    • 0:15:1012 plus 12 is 24.
    • 0:15:1224, the 24th letter of the alphabet is X.
    • 0:15:15And we do that finally as well for the O, advancing it one position, because
    • 0:15:19of the A in LAW, to 16, and that is P.
    • 0:15:23So ultimately, HELLO in this case becomes
    • 0:15:27this random set of characters, TFIXP.
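
The walkthrough above can be reproduced with a short Python sketch. It follows this lecture's convention of counting letters from 1 (so A adds one position); many other descriptions of the cipher count from 0, so their output for the same inputs may differ. The function names are made up for illustration, and only uppercase letters are assumed.

```python
def ordinal(letter):
    """Position in the alphabet, counting from 1: A is 1, Z is 26."""
    return ord(letter) - ord("A") + 1


def vigenere_encipher(plaintext, keyword):
    """Add each plain text letter to the matching keyword letter."""
    ciphertext = []
    for i, letter in enumerate(plaintext):
        # Go back to the start of the keyword once it is exhausted.
        key_letter = keyword[i % len(keyword)]
        total = ordinal(letter) + ordinal(key_letter)
        if total > 26:
            total -= 26             # wrap back into 1 through 26
        ciphertext.append(chr(total - 1 + ord("A")))
    return "".join(ciphertext)


print(vigenere_encipher("HELLO", "LAW"))   # TFIXP
```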
    • 0:15:30And some advantages might also immediately jump out at you.
    • 0:15:34With the Caesar Cipher, anytime we changed a letter,
    • 0:15:37it always was that same letter every time we
    • 0:15:42saw it in the enciphered message.
    • 0:15:43So if we had a B and we were advancing everything by two characters,
    • 0:15:47every B in the original message would always
    • 0:15:49be a D because D comes two letters after B.
    • 0:15:53So again, if our Caesar Cipher key is two, every time we see a B,
    • 0:15:58it becomes a D, every time we have an A, it becomes a C, always.
    • 0:16:02Here with the Vigenere Cipher, because we have different keys
    • 0:16:05and we're rotating these keys differently,
    • 0:16:08depending on which letter of the keyword we are
    • 0:16:09and which letter of the plain text we are,
    • 0:16:12those two Ls are not the same, right?
    • 0:16:15Instead of H-E-L-L-O, we don't have some mapping.
    • 0:16:19Those two Ls are I and X. They are not the same character.
    • 0:16:22And so already we're seeing a bit more security here
    • 0:16:25because there's not this potential to guess.
    • 0:16:29Vigenere is also much more secure when you consider
    • 0:16:33how many keys are available to you.
    • 0:16:35With the Caesar Cipher we had 26 keys available to us.
    • 0:16:39With the Vigenere Cipher we have 26 to the n keys,
    • 0:16:42where n is the length of our keyword.
    • 0:16:44So for example, if we're using a two letter long keyword,
    • 0:16:47for example, AA or AB or all the way up, that leaves us with 26 squared,
    • 0:16:52or 676 possibilities.
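
As a quick check on that arithmetic, the growth of the key space can be tabulated. The function name `key_count` is just an illustrative choice.

```python
def key_count(keyword_length):
    """Number of possible keywords of a given length: 26 to the n."""
    return 26 ** keyword_length


for n in range(1, 5):
    print(n, key_count(n))
# 1 26
# 2 676
# 3 17576
# 4 456976
```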
    • 0:16:54Now if we extend to three letter keywords or four letter keywords,
    • 0:16:57we're getting even more and more possibilities.
    • 0:17:00And as we start to increase the number of possibilities,
    • 0:17:02we start to really increase the difficulty for some adversary
    • 0:17:06to figure out what the key is.
    • 0:17:08And that's really the goal of cryptography, right?
    • 0:17:11We want to be able to protect information
    • 0:17:13and we want to defend that information from being determined by other people.
    • 0:17:18So the more work we put into making more challenging keys, the more likely
    • 0:17:22we are to be successful in our attempt to encipher information.
    • 0:17:26So again, Vigenere much more of a secure cipher.
    • 0:17:29It's still not secure and it's definitely
    • 0:17:31not a cipher that is used today.
    • 0:17:33There are computer programs that are capable of figuring out how to decipher
    • 0:17:39using the Vigenere Cipher pretty well.
    • 0:17:42But it's more secure than Caesar for sure
    • 0:17:45because of its changing alphabets and its much larger number of keys.
    • 0:17:52Let's go back to this decoder pin and think about another potential problem
    • 0:17:57that we have.
    • 0:17:58Now assume that your adversary is actually
    • 0:18:00not a member of Annie's secret society.
    • 0:18:02They don't have this pin.
    • 0:18:04So that's already a step up.
    • 0:18:05We previously had assumed that anybody who had the pin could crack it,
    • 0:18:08and that's still true.
    • 0:18:09But let's assume your adversary, lucky you, doesn't have this pin.
    • 0:18:14Is there still a way that they would be able to crack the code without the pin?
    • 0:18:21Think about it for a second.
    • 0:18:22Think about what characteristics of the English language
    • 0:18:26there are that might help people figure out what this cipher is.
    • 0:18:31Think about some unique features of the English language,
    • 0:18:33which is one letter words, like I and A, which might appear in the message.
    • 0:18:38If you see a single letter word in a message,
    • 0:18:41you're probably going to guess that it's either the letter I,
    • 0:18:44and every time I see that character or that number I'm going to assume it's
    • 0:18:47an I, or you're going to assume it's an A
    • 0:18:49and you're going to try and plug in an A everywhere.
    • 0:18:51And some trial and error might reveal some patterns that emerge.
    • 0:18:55And there is a very prevalent pattern in the English language,
    • 0:18:58which is that letters appear with a pretty regular frequency.
    • 0:19:01Given any arbitrary text in the English language,
    • 0:19:05it's pretty likely that the distribution of letters within that text
    • 0:19:10is going to follow this pattern. Roughly 13% of the time, give or take,
    • 0:19:14any arbitrary letter selected from a text
    • 0:19:16is going to be the letter E. And only 1/10 of 1% of the time will it be a Z.
    • 0:19:23And only 2/10 of 1% of the time might it be a J. So there are some letters that
    • 0:19:26appear very frequently and there are other letters that
    • 0:19:28appear very infrequently.
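
Counting letters is easy to automate. The sketch below is illustrative only: `letter_frequencies` is a made-up helper, and the sample is a very short message shifted by two, so its percentages are far more extreme than real English text would give.

```python
from collections import Counter


def letter_frequencies(text):
    """Return each letter's share of the text, most common first."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {letter: count / total for letter, count in counts.most_common()}


ciphertext = "UGG VJG VTGG"   # "SEE THE TREE" with every letter shifted by 2
for letter, share in letter_frequencies(ciphertext).items():
    print(letter, round(share, 2))
# G comes out on top, which suggests that G stands for E.
```

With a longer message, an adversary would line up the most common ciphertext symbols against the most common English letters and refine the guesses by trial and error.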
    • 0:19:31And that is still a problem in this generic substitution cipher,
    • 0:19:34even with the letters being scrambled, which seems at first
    • 0:19:37blush to perhaps be much more secure than one where
    • 0:19:40the letters are increasing sequentially and the numbers
    • 0:19:42are increasing sequentially.
    • 0:19:44Even this scattershot mapping of letters to numbers,
    • 0:19:46as long as we're still confined to these two domains
    • 0:19:49where we have A through Z and 1 through 26
    • 0:19:51and there's always a mapping between them,
    • 0:19:53whether they're ordered or not ordered, is still
    • 0:19:56a problem, in the English language anyway, because of frequency analysis.
    • 0:20:00These are actually very common puzzles.
    • 0:20:02Humans might find it kind of tedious to try and solve these puzzles,
    • 0:20:05but otherwise, this is well known as a cryptogram.
    • 0:20:09If you are the puzzling type, you may know that this type of puzzle
    • 0:20:12is called a cryptogram.
    • 0:20:13And this pattern is definitely something that is across all messages
    • 0:20:18that appear in the English language.
    • 0:20:20There are plenty of other ciphers that appear, that are used,
    • 0:20:24that are more secure than any of these what
    • 0:20:25we might call one-to-one ciphers, mapping
    • 0:20:27a single character to a different character or to a number.
    • 0:20:32There are some ciphers that substitute pairs or triples of characters
    • 0:20:35at a time.
    • 0:20:35And these ciphers, again, form the basis for what
    • 0:20:37eventually becomes more modern cryptography, which
    • 0:20:40we're getting to in just a moment.
    • 0:20:42There are also transposition ciphers, where
    • 0:20:43instead of substituting one character for another,
    • 0:20:46we simply use an algorithm to rearrange all the letters in some systematic way.
    • 0:20:51And the defect there is that all the letters of our original plain text
    • 0:20:55message are still there and all we need to do is unscramble them.
    • 0:21:00And because there's an algorithm that was
    • 0:21:01used to scramble them in the first place,
    • 0:21:05there's got to be a way to undo it as well.
    • 0:21:07With a little bit of trial and error, we can probably sort that out.
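
The lecture does not name a specific transposition scheme, so the sketch below uses a simple column-based rearrangement chosen purely for illustration. The function names and the choice of three columns are assumptions.

```python
def transpose(text, cols):
    """Write the text in rows of `cols` letters, then read it off by column."""
    return "".join(text[i::cols] for i in range(cols))


def untranspose(ciphertext, cols):
    """Rebuild the columns, then read the letters back row by row."""
    base, extra = divmod(len(ciphertext), cols)
    columns, pos = [], 0
    for i in range(cols):
        length = base + (1 if i < extra else 0)   # early columns may be longer
        columns.append(ciphertext[pos:pos + length])
        pos += length
    rows = base + (1 if extra else 0)
    return "".join(col[r] for r in range(rows) for col in columns if r < len(col))


scrambled = transpose("HELLOWORLD", 3)
print(scrambled)                    # HLODEORLWL
print(sorted(scrambled) == sorted("HELLOWORLD"))   # True: same letters
print(untranspose(scrambled, 3))    # HELLOWORLD
```

Every letter of the plain text survives in the output, and anyone who guesses the number of columns can run the steps backward.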
    • 0:21:12Finally, the most egregious issue with these classical ciphers
    • 0:21:17is, how do you distribute the key?
    • 0:21:20How do you tell someone who you want to share information with?
    • 0:21:25How do you tell your ally what the key is for the cipher
    • 0:21:30that you are going to use?
    • 0:21:32You can't encrypt it because if you encrypt the key,
    • 0:21:36how will they know what the real key is?
    • 0:21:38If you say, if you send them a message and they
    • 0:21:40don't know how to interpret it, or they see it and they interpret it
    • 0:21:43as something else, that's not going to be helpful to you.
    • 0:21:46You want them to see the key in the plain text.
    • 0:21:51You want them to see the key in the clear, rather.
    • 0:21:54You want them to just have it.
    • 0:21:55You don't want to encrypt that as you hand it to them.
    • 0:21:59That doesn't do them any good.
    • 0:22:01But if you're giving the key to your ally
    • 0:22:05and your adversary is within earshot, or they
    • 0:22:07have access to that same piece of paper because your ally carelessly throws it
    • 0:22:11away and they can just pick it up, now all
    • 0:22:14of a sudden all of your messages using basic ciphers are fairly insecure.
    • 0:22:20But let's take a step forward in modern cryptography.
    • 0:22:25Perhaps you've seen a screen that looks like this at some point
    • 0:22:28when you're trying to log in to some system.
    • 0:22:32Enter your email and we'll email you a link to change your password.
    • 0:22:36Well, why don't you just email me my password?
    • 0:22:39Like you're going to give me a link to change it,
    • 0:22:41you must know it if I use my credentials to log in
    • 0:22:45to your service any given day.
    • 0:22:47But OK, I guess, sure.
    • 0:22:50The reason for this is actually a reason of security.
    • 0:22:54So let's distinguish ciphers, which we've been talking about, from hashes.
    • 0:22:59So one of the most critical distinctions is
    • 0:23:03that ciphers are generally reversible.
    • 0:23:07You can undo what you did.
    • 0:23:09That's the whole reason why it's important to share with your allies
    • 0:23:12the key.
    • 0:23:13But hashes are generally not reversible.
    • 0:23:17Or certainly, they're not supposed to be reversible.
    • 0:23:20And so it turns out, and we'll learn about this a little bit
    • 0:23:23later, when you log in to some service, if that service
    • 0:23:28is doing a good job of protecting your data,
    • 0:23:31the reason they can't just send you your password is because they actually
    • 0:23:35don't know your password.
    • 0:23:36And that might seem strange because clearly, there
    • 0:23:39must be something-- if I type in my password then I get logged in.
    • 0:23:44But a good service is one that does not store your password in the database.
    • 0:23:49That's probably a good thing if you think about it.
    • 0:23:51In case there was ever a data breach, you
    • 0:23:53wouldn't want your password to be in their database.
    • 0:23:57Instead what they do is they store a hash of your password in the database.
    • 0:24:02And then when you provide your password to them,
    • 0:24:06they run that password through the same thing,
    • 0:24:08called a hash function, which is just a generic idea for a function that
    • 0:24:12takes any arbitrarily large amount of data and maps it to some other range
    • 0:24:18or some other set of values.
    • 0:24:20Now that might be an arbitrarily long string of information.
    • 0:24:25It might be some fixed string where if I run my password through this,
    • 0:24:30I'm going to get back something that is always 20 characters long.
    • 0:24:33But it looks nothing like my original password.
    • 0:24:35I've just made some weird manipulations to it.
    • 0:24:38And that's what happens in log-in systems more generally
    • 0:24:41is you will log in to some service, you'll
    • 0:24:43type in your password, when that information is then submitted
    • 0:24:47to the organization to check your log-in credentials,
    • 0:24:51they will run your password through that same hash function again.
    • 0:24:54And if that value matches what they have in their database for you,
    • 0:25:00that is how they know that you have provided the correct credentials.
    • 0:25:04They're mapping-- they're matching some mapping of your password to the one
    • 0:25:10that they have stored, but they're not actually checking your actual password.
    • 0:25:13And that should probably give you some sense of security.
    • 0:25:16And if you ever use a service where you end up having to click on that link
    • 0:25:20and they actually send you your password,
    • 0:25:24you probably don't want to use that service anymore
    • 0:25:26because they're not taking strong enough precautions to protect your data.
    • 0:25:33So as I said, once we have a password stored in the database,
    • 0:25:36it is actually stored as a hash rather than as the password itself.
    • 0:25:40The service should not be able to tell you what your password really is.
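
The log-in flow just described might look roughly like this. It is a sketch under stated assumptions: SHA-256 from Python's standard `hashlib` module stands in for whatever hash function a service really uses, the `database` dictionary and the user "alice" are invented, and real services typically layer further protections on top of a bare hash.

```python
import hashlib


def hash_password(password):
    """Map a password to a fixed-length string (64 hex characters here)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# The service keeps only the hash, never the password itself.
database = {"alice": hash_password("correct horse")}


def check_login(username, attempt):
    """Hash the attempt and compare it with the stored hash."""
    stored = database.get(username)
    return stored is not None and stored == hash_password(attempt)


print(check_login("alice", "correct horse"))   # True
print(check_login("alice", "wrong guess"))     # False
print(len(database["alice"]))                  # 64, whatever the password length
```

Nothing in `database` lets the service recover the original password, which is why it can only offer a link to set a new one.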
    • 0:25:46So this idea of a hash function-- what is it?
    • 0:25:48Well, as I said, it's something that takes any arbitrary data--
    • 0:25:52and eventually we'll get into hashing things like files and not just words
    • 0:25:56or strings, but for now let's keep it to strings, strings
    • 0:25:58being a sequence of characters or letters, like a word
    • 0:26:00or a phrase or a sentence--
    • 0:26:02and mapping it to some other range.
    • 0:26:05So we'll start out by just mapping a string, a set of letters, to a number.
    • 0:26:10But it could be to a different string, a string that's
    • 0:26:13always 10 characters long, and so on.
    • 0:26:15So there are some properties that good hash functions have.
    • 0:26:19Let's take a look at what some of these are.
    • 0:26:21So they should use only the data being hashed.
    • 0:26:23There shouldn't be anything else that comes into play.
    • 0:26:26They shouldn't be bringing in any outside information.
    • 0:26:28It should rely exclusively on whatever data is
    • 0:26:31being passed in to the hash function.
    • 0:26:34They should also use all of the data being hashed.
    • 0:26:36It becomes a bit less effective if every time I provide a word or a string
    • 0:26:42to my hash function, I'm only using the first letter of that string,
    • 0:26:48such that my hash function for every word
    • 0:26:50or every string I provide that starts with A
    • 0:26:52is going to return the same value.
    • 0:26:55That's not terribly useful to me.
    • 0:26:57I want to get a better distribution of values.
    • 0:27:00Your hash function should be deterministic.
    • 0:27:02And when we say deterministic, we mean no random elements to it.
    • 0:27:06Oftentimes we think that random numbers are nice to jumble things up.
    • 0:27:10But the problem is we want our hash function to always output
    • 0:27:12the same value for the same inputs.
    • 0:27:16So if I give you my password and hash it and I get
    • 0:27:19some output, every time I provide my password
    • 0:27:21and run it through that same hash function,
    • 0:27:23I want to get the same output every time.
    • 0:27:25And that's what sites rely on when they're using hashed passwords as part
    • 0:27:30of the credentialing check.
    • 0:27:31They're relying on the fact that they will always get
    • 0:27:33the same output given the same input.
    • 0:27:36So that's a requirement of a hash function.
    • 0:27:38Hash functions should uniformly distribute data.
    • 0:27:42So oftentimes you're mapping these strings,
    • 0:27:45let's say, to some set of values.
    • 0:27:47Those could be numbers, again, those could be strings.
    • 0:27:49You want to spread those out evenly, ideally,
    • 0:27:52across all of the possible values that you have.
    • 0:27:55You don't want everything to hash to 15 if your range is 0 to 100.
    • 0:28:00You'd ideally like everything to be spread out such
    • 0:28:02that there's an equal number of 0s, 1s, 99s,
    • 0:28:05and so on, as we talked about a little bit when we discussed hash tables.
    • 0:28:10Finally, we also want to be able to generate very different hash codes,
    • 0:28:14very different values for very similar data.
    • 0:28:18For example, LAW and LAWS should hash to very different values.
    • 0:28:23That would be ideal if a tiny bit of variation
    • 0:28:26created a really dramatic ripple effect.
    • 0:28:29And creating this really dramatic ripple effect
    • 0:28:31is pretty key when we're talking about cryptographic hash functions, which
    • 0:28:35we'll get to in a second, which form the basis of almost
    • 0:28:38all modern cryptography, which form the basis of everything
    • 0:28:41that we do that we rely on when we think of security in the computational field,
    • 0:28:47it's almost always relying on these hash functions being really, really
    • 0:28:50good at making small changes have very dramatic ripple effects
    • 0:28:55in the hash code or the hash value, the data that comes out
    • 0:28:59of the hash function.
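The ripple effect can be observed directly. The sketch below uses SHA-256 from Python's standard library as a stand-in for "a good hash function" (the lecture does not name one at this point) and counts how many of the 256 output bits differ between the digests of LAW and LAWS. The helper names are invented for the example.

```python
import hashlib


def sha256_bits(text: str) -> str:
    """Return the SHA-256 digest of text as a string of 256 '0'/'1' characters."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return "".join(f"{byte:08b}" for byte in digest)


def differing_bits(a: str, b: str) -> int:
    """Count the bit positions at which the two digests disagree."""
    return sum(x != y for x, y in zip(sha256_bits(a), sha256_bits(b)))


print(hashlib.sha256(b"LAW").hexdigest())
print(hashlib.sha256(b"LAWS").hexdigest())
print(differing_bits("LAW", "LAWS"), "of 256 bits differ")
```

For a well-designed function you would expect roughly half of the bits to change, even though the inputs differ by a single letter.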
    • 0:29:02So after all this talk about good hash functions,
    • 0:29:04let's take a look at a pretty bad hash function.
    • 0:29:07And we'll talk about why.
    • 0:29:08We'll talk about one of its virtues, but some of its potential problems as well.
    • 0:29:12So instead, let's add up all of the ordinal positions
    • 0:29:15of all the letters in the hash string.
    • 0:29:17So this ordinal position idea is exactly the same
    • 0:29:19as we had a moment ago when we were talking about Caesar and Vigenere.
    • 0:29:22So A is 1, B is 2, and so on.
    • 0:29:26So for example, for a word like STAR, if we want to add up the ordinal positions
    • 0:29:30of all of the letters in that word, we have S-T-A-R.
    • 0:29:34That's 19 plus 20 plus 1 plus 18.
    • 0:29:38So if you do that math quickly, that ends up being 58.
    • 0:29:42So what is a good thing about this hash function?
    • 0:29:45Well, it's not reversible.
    • 0:29:47If I get a 58, I don't necessarily know that the input that I had there
    • 0:29:53was STAR.
    • 0:29:54It could have been any one of a whole variety of things.
    • 0:29:57It could have been ARTS or RATS or MULL
    • 0:30:01or this whole random set of 29 Bs in a row.
    • 0:30:06All of these things, when run through this really terrible hash
    • 0:30:10function that I've defined here, all add up to 58
    • 0:30:13when I follow the rules of this algorithm.
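The ordinal-sum hash is simple enough to write out. This is a small sketch of the scheme as described, with A as 1 through Z as 26; the function name `ordinal_sum` is just a label chosen for the example.

```python
def ordinal_sum(word: str) -> int:
    """Add up the alphabet positions of the letters in word (A=1, ..., Z=26)."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper() if "A" <= ch <= "Z")


# Several different inputs that all land on the same hash value.
for word in ["STAR", "ARTS", "RATS", "MULL", "B" * 29]:
    print(word, ordinal_sum(word))
```

Every line prints 58, which shows both properties at once: the output does not reveal the input, and many inputs collide.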
    • 0:30:16So I never know what my input was given my output.
    • 0:30:20That is a good thing.
    • 0:30:21That is what a hash function should do.
    • 0:30:22Hash functions, unlike ciphers, should not be reversible.
    • 0:30:28But the problem that I have here is that I have a lot of collisions, right?
    • 0:30:33There are a lot of different things that map to 58.
    • 0:30:37And when we talked about collisions a little bit previously,
    • 0:30:40we were talking about them in the context of a hash table.
    • 0:30:43And collisions were OK in that context.
    • 0:30:46We were just clustering things together.
    • 0:30:48If they all happened to have the same hash value,
    • 0:30:50we'll just put them in the same bucket.
    • 0:30:52When we're talking about cryptography though,
    • 0:30:54when we start to get into relying on cryptography to keep our data secure,
    • 0:30:59we can't have collisions at all.
    • 0:31:03In fact, pretty much we rely on the fact that it is so mathematically
    • 0:31:07unlikely, nigh impossible to have a collision in order for these things
    • 0:31:12to work.
    • 0:31:13And so collisions, when we're talking about cryptographic hash functions,
    • 0:31:16are definitely not a good thing.
    • 0:31:19So to recap, to check that a user gave us the correct password, if we're
    • 0:31:23storing a hash of the password in the database versus just storing
    • 0:31:27the plain text password in the database, which hopefully no one is storing
    • 0:31:30a plain text password in the database, we
    • 0:31:33run the actual password, the real password through the hash function.
    • 0:31:37We get a hash value as an output, some string or some number
    • 0:31:40or what have you as the output.
    • 0:31:42And if we get a match, odds are they entered the right password.
    • 0:31:46Now I'm saying odds are because we can't be 100% sure.
    • 0:31:51And we can never be 100% sure.
    • 0:31:54We can be really, really, really sure, but there's always
    • 0:31:57a chance of a collision.
    • 0:31:59Even with the best designed hash functions, even
    • 0:32:02with the best designed cryptographic hash functions,
    • 0:32:05there's always a chance of a collision.
    • 0:32:06But ideally, that chance is quite infinitesimal.
    • 0:32:09Very, very, very, very, very, very unlikely.
    • 0:32:12So odds are, if we get this hash out of this hash function,
    • 0:32:16it's quite likely, like 99.9% plus likely
    • 0:32:21that they entered the correct password, this is, in fact,
    • 0:32:23the user whose credentials are being verified, and we should log them in.
    • 0:32:29Modern cryptography is just hashing.
    • 0:32:32It's just hashing that's quite a bit more clever, certainly than the example
    • 0:32:35that I just talked about a moment ago.
    • 0:32:37Also, these algorithms tend not to work on a character by character basis.
    • 0:32:42That's what the algorithm that I just did does, where
    • 0:32:44I was adding up every single letter.
    • 0:32:45I was looking at each one individually.
    • 0:32:47They tend to take, these modern algorithms
    • 0:32:49tend to take clusters of letters, pairs or triples or so on at a time,
    • 0:32:53maybe do even more things.
    • 0:32:55They might rearrange the letters before they do things to them.
    • 0:32:57So there's multiple layers going on with these encryption algorithms.
    • 0:33:02And unlike some of the ones I've discussed earlier,
    • 0:33:05most of these also have the property where given data of arbitrary size--
    • 0:33:10and now we're starting to really expand our minds into not just words
    • 0:33:13or strings, but also images, files, videos, documents, PDFs,
    • 0:33:18and so on; anything can be run through a hash function to get a value--
    • 0:33:23but we're always going to get a string of bits, a bit string, that
    • 0:33:26is always exactly the same size.
    • 0:33:28So depending on the algorithm, maybe it's
    • 0:33:30going to be a 160-bit long string, or a 256-bit long string.
    • 0:33:36But our range is finite.
    • 0:33:38It's always going to be exactly 256 bits.
    • 0:33:42But the combination of those bits will be different, ideally,
    • 0:33:45for every single piece of data we might throw at it, no matter what.
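The fixed-size property is easy to check with Python's `hashlib`. In this sketch the sample inputs are arbitrary, and the helper `digest_bits` is a name made up for the example; SHA-1 and SHA-256 are used because they produce the 160-bit and 256-bit outputs mentioned above.

```python
import hashlib


def digest_bits(algorithm: str, data: bytes) -> int:
    """Return the length, in bits, of the digest of data under the named algorithm."""
    return len(hashlib.new(algorithm, data).digest()) * 8


# Inputs of very different sizes: empty, one byte, a word, a megabyte.
samples = [b"", b"a", b"STAR", b"x" * 1_000_000]

for data in samples:
    print(len(data), "bytes in ->",
          digest_bits("sha1", data), "bits (SHA-1),",
          digest_bits("sha256", data), "bits (SHA-256)")
```

Whatever the input size, the SHA-1 column stays at 160 and the SHA-256 column stays at 256.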
    • 0:33:50OK, so let's expand our definition of a hash function
    • 0:33:53to this idea of a cryptographic hash function.
    • 0:33:57What properties should they have?
    • 0:34:00They should be very difficult, very, very difficult, basically impossible
    • 0:34:05to reverse.
    • 0:34:06It should be computationally impossible for anybody to undo the encryption.
    • 0:34:12That's pretty much the same as a regular hash function.
    • 0:34:15We're just really hammering the point home when we say this here.
    • 0:34:17They should still be deterministic.
    • 0:34:19We don't want any random elements to it.
    • 0:34:22We still want to hash a value and always
    • 0:34:23get the same output no matter what if we run that same value through the hash
    • 0:34:28function an arbitrary number of times.
    • 0:34:31They should still generate very different hash codes
    • 0:34:34for very similar data.
    • 0:34:36We still want things to be spread out and we want
    • 0:34:38minor changes to have dramatic effect.
    • 0:34:42And they should never--
    • 0:34:43and this is one of those words that computer scientists love--
    • 0:34:46they should never allow two different sets of data to hash to the same value.
    • 0:34:52Do you see a potential problem when we frame it in this way?
    • 0:34:56When we say they should never be able to do that?
    • 0:35:00We've already restricted ourselves to a finite range, right?
    • 0:35:02I said a moment ago, maybe this hash function maps to 160-bit long strings.
    • 0:35:10There's only so many combinations of 160 bits.
    • 0:35:14Now that might be an unfathomably large number, but using the word never
    • 0:35:18there becomes a bit dangerous.
    • 0:35:21We can't really rely on that.
    • 0:35:24And we'll see why this could potentially be a problem.
    • 0:35:26This static length string, by the way, is usually referred to as a digest
    • 0:35:31in this context.
    • 0:35:31When we start to talk about more modern cryptography techniques,
    • 0:35:34the output of a cryptographic hash function
    • 0:35:36is usually referred to as a digest.
    • 0:35:40Let's take a look at one of these cryptographic hash functions.
    • 0:35:42And certainly I'm not going to dive into the mathematics of it.
    • 0:35:45I wouldn't be able to explain the mathematics.
    • 0:35:46I wouldn't be able to do it justice if I tried to explain the mathematics of it.
    • 0:35:49But let's just take a look at some of the basics of this.
    • 0:35:52So SHA-1.
    • 0:35:53SHA-1 is quite a famous algorithm.
    • 0:35:55It was designed by the National Security Agency in the mid-1990s.
    • 0:36:00So these are really smart people who are tasked with working
    • 0:36:06with things like military intelligence.
    • 0:36:08These are people who are dedicating their lives to trying to protect data
    • 0:36:14as best as they possibly can.
    • 0:36:17Far more brilliant minds than I, for sure.
    • 0:36:20And this hash function-- and this is a published paper.
    • 0:36:22Hash functions tend to be, actually it's this very strange dichotomy where
    • 0:36:27you describe exactly how the function works,
    • 0:36:30but it still should be irreversible.
    • 0:36:32And this just really becomes a question of incredibly complicated mathematics
    • 0:36:36involved, such that even if you knew so many of the pieces going in,
    • 0:36:40you still might not-- you still wouldn't be able to undo it, even if you tried.
    • 0:36:44It's kind of amazing actually.
    • 0:36:45SHA-1's digests are always 160 bits in length.
    • 0:36:50So this is one of those ones I just said a moment ago.
    • 0:36:53That means that there are 2 to the 160 different SHA-1 digests, which
    • 0:36:59is a bit over 10 to the 48th power.
    • 0:37:01And again, 2 to 160 means for every single one of the 160 bits,
    • 0:37:06that could be a 0 or a 1.
    • 0:37:09So we have that, two options times two options times two options, 160 times.
    • 0:37:15Just to try and make it fathomable, to understand how large this number is,
    • 0:37:20let me try and paint a picture for you.
    • 0:37:22So imagine that you are looking on Earth for a specific grain of sand.
    • 0:37:30You're looking for one specific grain of sand on Earth.
    • 0:37:35That is easier by far than trying to have SHA-1 have a collision where
    • 0:37:45two values would map to the same thing.
    • 0:37:47There's about 10 to the 18 grains of sand on Earth.
    • 0:37:52So that's eight quintillion--
    • 0:37:53I had to look up that word--
    • 0:37:55eight quintillion grains of sand.
    • 0:37:57So way easier to find the grain of sand on Earth
    • 0:37:59than it is to have a collision.
    • 0:38:01In fact, we go even further and say that imagine
    • 0:38:04that every single one of those grains of sand
    • 0:38:07was another planet Earth, each of which also had sand on it.
    • 0:38:12So you have eight quintillion planet Earths.
    • 0:38:16You're trying to find a specific grain of sand
    • 0:38:19on one of those eight quintillion planets.
    • 0:38:23It's still easier than trying to have a collision with SHA-1.
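The sizes in this comparison can be checked with plain integer arithmetic. The sketch below takes the lecture's figure of eight quintillion grains of sand at face value (it is an estimate, not a measured number) and compares the "sand on eight quintillion Earths" total with the number of possible SHA-1 digests.

```python
# Number of distinct 160-bit digests: two options per bit, 160 times.
digests = 2 ** 160

# The lecture's estimate of grains of sand on Earth (an assumption, not a measurement).
grains_on_earth = 8 * 10 ** 18

# One Earth's worth of sand for every grain of sand on Earth.
grains_on_all_earths = grains_on_earth * grains_on_earth

print(f"possible digests:       {digests:.3e}")
print(f"grains on all 'Earths': {grains_on_all_earths:.3e}")
print("digests exceed 10**48: ", digests > 10 ** 48)
print("digests per grain:      ", digests // grains_on_all_earths)
```

Even after multiplying the sand by itself, there are still billions of possible digests for every single grain.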
    • 0:38:29SHA-1 is such an important algorithm that it's actually
    • 0:38:33one of the algorithms that is required in federal regulations
    • 0:38:36to be used by the government for encrypting information.
    • 0:38:39There are others as well, but SHA-1 is listed
    • 0:38:41by the National Institute of Standards and Technology as a standard algorithm.
    • 0:38:48But there's a problem, which is that SHA-1 is broken.
    • 0:38:53And there's this clever website called SHAttered, shattered.io.
    • 0:38:56So a research team figured out how to intentionally
    • 0:39:02create a collision.
    • 0:39:04And intentionally creating a collision has the effect of basically saying,
    • 0:39:07this cryptographic hash function is broken.
    • 0:39:11And they have proven that there is a way that they can systematically
    • 0:39:15generate collisions.
    • 0:39:17So that's bad.
    • 0:39:19And we'll see why that's bad in just a moment.
    • 0:39:22But you can go to this URL, shattered.io,
    • 0:39:24and read quite a bit about how the researchers do it.
    • 0:39:26They explain it in different levels.
    • 0:39:28So if you really want to dive into the technology and the mathematics of it,
    • 0:39:31you're certainly welcome to.
    • 0:39:32If you just want to understand it at a base level and why this is a problem,
    • 0:39:35I definitely encourage you to take a look at this site
    • 0:39:38and read more about this.
    • 0:39:39So what did these researchers do?
    • 0:39:42So they said, It is now practically possible
    • 0:39:45to craft two colliding PDF files and obtain a SHA-1 digital signature
    • 0:39:50on the first PDF file, which can also be abused
    • 0:39:53as a valid signature on the second PDF file.
    • 0:39:57In short, what they're basically saying here
    • 0:39:58is we were able to create two PDF files such
    • 0:40:02that if I run them through the SHA-1 algorithm, the digest that I get
    • 0:40:07is the same.
    • 0:40:09Why is this potentially bad?
    • 0:40:13For example, by crafting the two colliding PDF files
    • 0:40:17as two rental agreements with different rent,
    • 0:40:20it is possible to trick someone to create
    • 0:40:22a valid signature for a high-rent contract
    • 0:40:26by having him or her sign a low-rent contract.
    • 0:40:30If you can take a PDF and twist it into anything
    • 0:40:33you want it to be, but have a valid signature,
    • 0:40:36a valid SHA hash associated with it, that's not great.
    • 0:40:41Now before alarm bells start going off because SHA-1 is still
    • 0:40:44used quite extensively, even now: this SHAttered research result
    • 0:40:49was released in 2017, and SHA-1 was still
    • 0:40:53being used then and is still being used now.
    • 0:40:56Before you panic though, it has not been broken that many times,
    • 0:41:00although they did very--
    • 0:41:01they worked for two years to create this PDF collision.
    • 0:41:05And they demonstrated a method for how to do it.
    • 0:41:08It has still not happened that many times.
    • 0:41:11Cryptographic hash functions, once they've demonstrated one collision,
    • 0:41:14are broken.
    • 0:41:15That is certainly true.
    • 0:41:16But the actual effects of this have not yet really materialized.
    • 0:41:21The computational power required to create this
    • 0:41:24is well beyond the capabilities of most people, or most syndicates even.
    • 0:41:29So no cause for alarm yet.
    • 0:41:31But it does show that there is a limitation with SHA-1,
    • 0:41:36and we still want to always be staying one step ahead.
    • 0:41:39Just like when Julius Caesar's enemies figured out
    • 0:41:43how to crack the Caesar Cipher, the goal was, we need to get one step ahead.
    • 0:41:46As technologists, we always want to stay one step ahead
    • 0:41:49to make sure that we are doing our best job protecting our data.
    • 0:41:52And as lawyers, we want to make sure we're
    • 0:41:54doing our best job protecting our clients' data
    • 0:41:56against potential adversarial attacks.
    • 0:42:00So as I mentioned, there are other standards
    • 0:42:02that are in use by other organizations, including the federal government.
    • 0:42:05SHA-1, as I mentioned, is just one of a few different options that they use.
    • 0:42:09SHA-2 and SHA-3 are much more robust algorithms.
    • 0:42:14They use more bits, basically, in their digest.
    • 0:42:16So instead of being 160 bits, you can have anywhere between 220
    • 0:42:20and 500 or so bits.
    • 0:42:22So way larger of a range, even reducing the likelihood
    • 0:42:25of a collision that much more.
    • 0:42:27Again, imagine how unlikely it was with 2 to the 160.
    • 0:42:32Now we make it even more so.
    • 0:42:34500 bits, that's unfathomably large and difficult to duplicate.
    • 0:42:41MD5 and MD6 are other cryptographic hash functions, or hash functions
    • 0:42:45that you may encounter.
    • 0:42:46MD5 in particular I've highlighted here in yellow because it's not actually
    • 0:42:50considered secure anymore, but it's still very, very
    • 0:42:53commonly used as a checksum.
    • 0:42:55Basically, what we do is we run a file through MD5.
    • 0:42:59And say we're a distributor of a file and we
    • 0:43:02want people to come download our source, and they
    • 0:43:04want to be able to trust our source, we might run our file through MD5 and say,
    • 0:43:08if you run this file through MD5, the hash will be blah blah blah blah blah.
    • 0:43:14And other people can then download the file and run it through MD5.
    • 0:43:17It's usually a program that is available on computers for people
    • 0:43:20to just run any arbitrary data through to get a hash result.
    • 0:43:24And they can check, OK, the hash value that I received from this trusted
    • 0:43:27source matches the hash value that I was told I would receive,
    • 0:43:32and so I will trust this.
    • 0:43:33Versus perhaps getting that same software
    • 0:43:35from some corner of the internet that you don't really trust.
    • 0:43:38If you find the MD5 hash of the trusted source
    • 0:43:41does not match what you downloaded and what you thought was that same file,
    • 0:43:45it's probably a sign that something has changed in it
    • 0:43:47and you don't really want to--
    • 0:43:49you might want to be skeptical about trusting that file rather than just
    • 0:43:52diving right into it.
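The checksum workflow can be sketched as follows. The file contents here are a made-up byte string standing in for a real download, and `published_checksum` stands in for the value a distributor would post next to the download link. MD5 is used only because it is the function named above; as noted, it is not considered secure, so it only guards against accidental or unsophisticated changes.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical file contents standing in for the distributor's real file.
original = b"installer contents, version 1.0\n"

# What the distributor would publish alongside the download.
published_checksum = md5_hex(original)


def matches_published(downloaded: bytes) -> bool:
    """Hash the downloaded bytes and compare with the published checksum."""
    return md5_hex(downloaded) == published_checksum


print(matches_published(original))                # an untouched copy
print(matches_published(original + b"malware"))   # a modified copy
```

The first check passes and the second fails, which is the signal to be skeptical of the file.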
    • 0:43:54So what do we do that relies on cryptography on the internet today?
    • 0:44:00Or you know, just using our computers every day.
    • 0:44:03Email.
    • 0:44:04Email relies pretty extensively on cryptography,
    • 0:44:07particularly when we start to use secure email services, of which Gmail might
    • 0:44:11not be considered one, but there are services out there,
    • 0:44:14for example, ProtonMail and others, that do encrypt email completely
    • 0:44:18from point to point.
    • 0:44:19Much safer in terms of protecting one's communications.
    • 0:44:25Similarly, you may be familiar with the mobile app Signal, which is also
    • 0:44:29used to encrypt communications between two people over the text messaging
    • 0:44:33network rather than over email and the internet.
    • 0:44:38Secure web browsing, you may be familiar with this distinction
    • 0:44:41between HTTP and HTTPS.
    • 0:44:44And if you're not, that's OK.
    • 0:44:45We're going to be talking about that a little bit later as well.
    • 0:44:47But you want to make sure that your web traffic is encrypted against people
    • 0:44:51who are able to just monitor the network for all the traffic that is going by.
    • 0:44:55You probably don't want your searches to be someone else's
    • 0:45:00fodder for entertainment.
    • 0:45:02VPNs.
    • 0:45:03If you use a VPN, that's a great thing to do if you're traveling, for example,
    • 0:45:06and you may be on less secure networks than you might find at your business
    • 0:45:11or at home or at a university institution, for example.
    • 0:45:16VPNs allow you to encrypt communications with a network,
    • 0:45:19and also allow the network to pretend to do something on your behalf so that
    • 0:45:24your web traffic cannot be traced back to you directly,
    • 0:45:28which might be advantageous in some situations as well.
    • 0:45:31Document storage as well.
    • 0:45:32So if you use services like Dropbox, for example, generally what
    • 0:45:37Dropbox is going to do is break your document into pieces
    • 0:45:40and encrypt those pieces.
    • 0:45:42Rather than just storing the whole file writ large in some server somewhere
    • 0:45:46on the cloud, it's going to encrypt it before it sends it over
    • 0:45:49so that you have some more comfort that your data is being
    • 0:45:52protected by these cloud services.
    • 0:45:54And certainly, we're going to talk a bit more about what the cloud is
    • 0:45:58and what cloud services are and what they can be used for a little bit
    • 0:46:01later in the course as well.
    • 0:46:05Hash functions and cryptographic hash functions are great,
    • 0:46:08but they are well documented and there's only the one.
    • 0:46:11There's only one version of SHA-1.
    • 0:46:13There's only one version of SHA-3.
    • 0:46:15And that is a limitation.
    • 0:46:17Now it might not be a severe one because it's pretty strong.
    • 0:46:20They're pretty strong algorithms.
    • 0:46:22But are there ways that we can improve our own cryptographic techniques
    • 0:46:27if we're trying to protect data that we are receiving,
    • 0:46:30data that we are sending, and so on?
    • 0:46:32And that leads to this idea of public-key cryptography,
    • 0:46:35or public- and private-key cryptography, or asymmetric encryption.
    • 0:46:38You'll hear these terms kind of used interchangeably.
    • 0:46:41Let's start by talking about public-key cryptography by way of an analogy.
    • 0:46:46We're going to go way back to arithmetic and algebra days here.
    • 0:46:50So imagine we have something like this.
    • 0:46:52We have 14 times 8 equals 112.
    • 0:46:58Multiplication we can think of as a function.
    • 0:47:01It is a function.
    • 0:47:02If 14 is our input and our function is times 8, the result is 112.
    • 0:47:08Now multiplication is not a hash function because it is reversible.
    • 0:47:11I can take that 112, multiply it by 1/8, or equivalently divide by 8,
    • 0:47:17and get back the original input.
    • 0:47:18So multiplication is a function, but it is not a hash function.
    • 0:47:24It is reversible because if we multiply any number x by some other number y,
    • 0:47:30we get a result z.
    • 0:47:31And we can undo that whole process by taking z, multiplying it by 1 over y,
    • 0:47:37or the reciprocal of y, and getting back the original x.
    • 0:47:40Reversible.
    • 0:47:41Goes in both directions.
    • 0:47:44Now let's take this function and kind of obscure it.
    • 0:47:47We know for ourselves that this function that I'm using is n times 8.
    • 0:47:54Whatever I pass in is going to be multiplied by 8.
    • 0:47:58But I don't tell you what that is.
    • 0:48:01I don't tell my friends what that is.
    • 0:48:03I just say, hey, if you want to send me a message,
    • 0:48:05just run it through this function.
    • 0:48:08So again, we're going to just use math as an example here.
    • 0:48:10If my message is 14, I might say, f of 14--
    • 0:48:15and again, this is getting back to algebra, maybe a little bit
    • 0:48:18back in the day--
    • 0:48:20f of 14 is 112.
    • 0:48:23That is my public key, you might think.
    • 0:48:27And you might say, having just gone through this whole example,
    • 0:48:29that, well, it's pretty easy to undo that.
    • 0:48:32If I know that 14 is the plain text and 112 is the cipher text,
    • 0:48:35I can probably figure out that your function is n times 8.
    • 0:48:39And so I've broken your encryption scheme.
    • 0:48:41I have figured out how to reverse your cryptography.
    • 0:48:46Well, it's true that n times 8 is certainly one function
    • 0:48:49that I could use to turn that plain text, 14 in this example,
    • 0:48:55into that cipher text, 112 in this example.
    • 0:48:58But there are other ways that I can do it.
    • 0:49:01My actual function could have been n times 10 minus 28.
    • 0:49:05So 14 times 10 is 140, minus 28 is 112.
    • 0:49:09And there are other contrived mathematical examples
    • 0:49:11that I could continue to do pretty much ad infinitum to define
    • 0:49:16ways to transform 14 into 112.
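The point that one input/output pair does not pin down the function can be made concrete. The first two functions below are the ones from the lecture; the third, `square_minus_84`, is an invented example of a function involving the square of n that also sends 14 to 112.

```python
def times_eight(n: int) -> int:
    return n * 8


def times_ten_minus_28(n: int) -> int:
    return n * 10 - 28


def square_minus_84(n: int) -> int:
    # Hypothetical extra candidate: 14 * 14 = 196, and 196 - 84 = 112.
    return n * n - 84


candidates = [times_eight, times_ten_minus_28, square_minus_84]

# All three agree on the one pair an observer has seen...
for f in candidates:
    print(f.__name__, f(14))

# ...but they disagree on other inputs, so the observer cannot tell which one is in use.
for f in candidates:
    print(f.__name__, f(15))
```

Seeing 14 go in and 112 come out is consistent with all three, and with infinitely many others.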
    • 0:49:20So just because you see that 112, that doesn't mean you
    • 0:49:26have figured out how to break my hash function.
    • 0:49:30You haven't figured out what my encryption technique is.
    • 0:49:34If all I say is, here's a black box that I would like you to feed an input
    • 0:49:37into, even if you see the output, you, or really more concernedly an adversary
    • 0:49:45who sees that output as well should not be able to, or cannot in this case,
    • 0:49:50undo it.
    • 0:49:50Because yes, I could have been using n times 8.
    • 0:49:52I could have been using this crazy thing involving the square of n.
    • 0:49:57And that's kind of the idea behind public-key cryptography.
    • 0:50:01I am going to publicize that I have a function that can be used,
    • 0:50:07but I'm not going to tell you what that function is,
    • 0:50:10and I'm certainly not going to tell you how to reverse it.
    • 0:50:14So public- and private-key cryptography are actually two hash functions
    • 0:50:19where the goal is to reverse them.
    • 0:50:21We kind of talked about this as hash functions
    • 0:50:24are supposed to be irreversible.
    • 0:50:26But the distinction here is that we are creating two functions, f and g, which
    • 0:50:32are intended to reverse one another.
    • 0:50:35So it's not that there is a single function that is reversible,
    • 0:50:38it is that we have two functions that, working together, create a circuit.
    • 0:50:43If I take data and I run it through function f, I get some output.
    • 0:50:47If I run that output through function g, I get back the original data.
    • 0:50:52I have deciphered the information.
    • 0:50:53And the same thing works in reverse.
    • 0:50:55If I take some data and I run it through function g,
    • 0:50:58I get some hashed output that makes no sense.
    • 0:51:02And if I run that hashed output through function f,
    • 0:51:06I get back the original data once again.
    • 0:51:09Now the key is that-- pun intended-- the key is that one of these functions
    • 0:51:13is public and the other one is private.
    • 0:51:15One of them is available to everybody, and everybody uses
    • 0:51:19that function to send you messages.
    • 0:51:22If you want to send me a message using encryption,
    • 0:51:25using public and private key encryption, you take the message
    • 0:51:29and you use my public key to encrypt it, and you
    • 0:51:32send me the result, the hashed encrypted result.
    • 0:51:36And I use my private key to decrypt it.
    • 0:51:38And I am, ostensibly, the only person who
    • 0:51:41has my private key, even though I've broadcasted,
    • 0:51:43made my public key widely available.
    • 0:51:47Now the math that goes into this is well beyond the scope of a discussion
    • 0:51:52that we're going to have here today.
    • 0:51:53But basically, most encryption, most cryptography
    • 0:51:57involves the use of prime numbers, particularly
    • 0:52:00very, very large prime numbers.
    • 0:52:02And you're looking for prime numbers that have a particular pattern.
    • 0:52:05And when I say "you're" looking for it, don't worry,
    • 0:52:07you don't have to do this yourself.
    • 0:52:09There are plenty of programs out there, RSA
    • 0:52:11being a very popular one, that can be used to generate
    • 0:52:14these public and private key pairs.
    • 0:52:16But the amazing thing is that it can generate these pairs very quickly,
    • 0:52:22but it's almost impossible to break or figure out
    • 0:52:26what the underlying functions, or even in this case
    • 0:52:28what the underlying two prime numbers are
    • 0:52:30that are the foundation for your own encryption strategy.
    • 0:52:33So it's pretty amazing that it's easy to define these functions
    • 0:52:37and almost impossible to reverse engineer them, so to speak.
    • 0:52:43So we start with a huge prime number, we find some other prime number that
    • 0:52:46has a property, a special property related to it,
    • 0:52:49and from those two numbers we generate two functions whose goal in life
    • 0:52:54is to undo whatever the first one does.
    • 0:52:56So f's job is to undo what g does, g's job is to undo what f does.
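A toy version of such a pair of functions, in the style of RSA, might look like the sketch below. Every number in it is an assumption chosen for illustration: the primes 61 and 53 and the exponent 17 are tiny textbook values, whereas real keys use primes hundreds of digits long along with padding steps that are omitted here. The names `f` and `g` follow the lecture; `pow(e, -1, phi)` needs Python 3.8 or later.

```python
# Two small primes (real ones are enormous).
p, q = 61, 53
n = p * q                  # 3233: part of both keys
phi = (p - 1) * (q - 1)    # 3120: used only while generating the keys

e = 17                     # public exponent, shares no factor with phi
d = pow(e, -1, phi)        # private exponent: (e * d) % phi == 1


def f(m: int) -> int:
    """Public function: anyone may use it (needs only e and n)."""
    return pow(m, e, n)


def g(c: int) -> int:
    """Private function: only the key's owner can use it (needs d)."""
    return pow(c, d, n)


message = 65               # must be a number smaller than n in this toy

scrambled = f(message)
print(message, "->", scrambled, "->", g(scrambled))

# The circuit also works in the other direction.
print(message, "->", g(message), "->", f(g(message)))
```

Generating `d` is quick for whoever knows `p` and `q`; someone who sees only `n` and `e` has to factor `n` first, which is easy for 3233 but impractical at real key sizes.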
    • 0:53:03And this is called a public and private key pair.
    • 0:53:05So your public key is really some complicated hash function
    • 0:53:10that does work.
    • 0:53:11And that hash function is represented as a very long string
    • 0:53:14of numbers and letters.
    • 0:53:16It looks just like a hash digest.
    • 0:53:19But it's just a human representation, a readable representation
    • 0:53:23of a mathematical function.
    • 0:53:25And your private key is the same--
    • 0:53:27or your private key is also a representation of letters and numbers.
    • 0:53:30It's not exactly the same as your public key,
    • 0:53:32but it undoes the work that your public key does.
    • 0:53:35And again, these keys are generated using a program called RSA.
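The idea of two functions that undo each other can be sketched with textbook RSA on deliberately tiny primes. This is only an illustration under stated assumptions: the primes 61 and 53, the exponent 17, and the helper names `make_keypair` and `apply_key` are all chosen for this example and do not come from the lecture. Real keys use primes hundreds of digits long and are produced by a vetted library.

```python
# Toy RSA key pair built from two tiny primes (illustration only).
from math import gcd


def make_keypair(p: int, q: int, e: int = 17):
    """Return (public_key, private_key) derived from primes p and q."""
    n = p * q                      # the modulus both keys share
    phi = (p - 1) * (q - 1)        # only computable if you know p and q
    assert gcd(e, phi) == 1, "e must be coprime with phi"
    d = pow(e, -1, phi)            # modular inverse (Python 3.8+)
    return (e, n), (d, n)


def apply_key(key, value: int) -> int:
    """Run a number through one of the two key functions."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


public_key, private_key = make_keypair(61, 53)

message = 42
scrambled = apply_key(public_key, message)     # f
restored = apply_key(private_key, scrambled)   # g undoes f

print(public_key, private_key)
print(message, scrambled, restored)
```

Applying the keys in the opposite order (private first, then public) also returns the original number, which is the property digital signatures rely on later.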
    • 0:53:39So let's take a look at exactly how we would
    • 0:53:42go about doing some asymmetric encryption using
    • 0:53:46public and private keys.
    • 0:53:48So here we have some original data.
    • 0:53:50It's a message perhaps that I want to send.
    • 0:53:52And I want to send it to you.
    • 0:53:54I want to send this message to you, but I don't
    • 0:53:57want to send it to you in the clear.
    • 0:53:59I don't want to, you know, it's sensitive information.
    • 0:54:02I don't want to send it via plain text.
    • 0:54:04And I don't want to use a generic hash function
    • 0:54:07because if I use a generic hash function, like SHA for example,
    • 0:54:10it's irreversible.
    • 0:54:11You will not be able to figure out what I tried to say.
    • 0:54:13So instead, I take this original data and I use your public key.
    • 0:54:18Your public key, again, is just a mathematical-- a very complex--
    • 0:54:22mathematical function.
    • 0:54:24So I take this data, I feed it into your public key, your public hash function,
    • 0:54:30and I get some garbled stuff out.
    • 0:54:33OK?
    • 0:54:34And this is what I send to you.
    • 0:54:36I send you this garbled stuff.
    • 0:54:39In order for you to figure out what the original message is,
    • 0:54:43you use your private key.
    • 0:54:44Not your public key-- your public key is what
    • 0:54:46I use to encipher the information-- but your private key, which
    • 0:54:49is known only to you, hypothetically.
    • 0:54:51It should not be distributed to others.
    • 0:54:55It undoes the work that your public key did.
    • 0:54:57And so if I give you the scrambled data and you
    • 0:55:00use your private key to try and decipher it,
    • 0:55:04you will get back that original data.
    • 0:55:07But here's the great thing.
    • 0:55:08No one else's private key will be able to do that.
    • 0:55:11If anybody intercepts that message other than you and they use their private key
    • 0:55:16or they use your public key again, they will not
    • 0:55:20be able to decipher the message that I sent to you.
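A minimal sketch of that claim, again using toy textbook RSA rather than anything from the lecture: the recipient's key pair and an eavesdropper's key pair are built from different small primes (61, 53 and 67, 71 are arbitrary choices for this example), and only the recipient's private key recovers the message. The names `your_public`, `eve_private`, and so on are hypothetical.

```python
# Toy demonstration: only the matching private key undoes the public key.

def make_keypair(p: int, q: int, e: int = 17):
    """Textbook RSA key pair from two small primes (illustration only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)            # modular inverse (Python 3.8+)
    return (e, n), (d, n)


def apply_key(key, value: int) -> int:
    exponent, modulus = key
    return pow(value, exponent, modulus)


your_public, your_private = make_keypair(61, 53)   # the intended recipient
eve_public, eve_private = make_keypair(67, 71)     # someone who intercepts

message = 1234
garbled = apply_key(your_public, message)          # what travels on the wire

recovered_by_you = apply_key(your_private, garbled)
recovered_by_eve = apply_key(eve_private, garbled)
reapplied_public = apply_key(your_public, garbled)

print(recovered_by_you == message)   # the matching private key works
print(recovered_by_eve == message)   # a different private key does not
print(reapplied_public == message)   # neither does the public key again
```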
    • 0:55:23And so public and private keys are very interesting because they
    • 0:55:26create these pairs.
    • 0:55:28They're these unique encryption schemes that are unique to two people,
    • 0:55:33or really even to one person.
    • 0:55:35If you were to send me a message back, you
    • 0:55:37would send me a message using my public key.
    • 0:55:41You would then send me whatever the encrypted sort of scrambled data
    • 0:55:45is for the message that you sent using my public key.
    • 0:55:50I would then use my private key, which is not
    • 0:55:53known to you or to, hypothetically, anyone
    • 0:55:55else to decipher what you sent me.
    • 0:55:58And I would get back the secret message, or the perhaps not-so-secret,
    • 0:56:01but sensitive message that you sent to me.
    • 0:56:05And so that's this idea of asymmetric encryption.
    • 0:56:09You can encrypt using someone's public key.
    • 0:56:12And anybody can do so.
    • 0:56:13And for that reason, you'll often find technically-minded people will
    • 0:56:17sometimes post their public key literally on the internet,
    • 0:56:20such that anybody who wants to send them a message using a secure channel
    • 0:56:24can do so.
    • 0:56:27And programmers as well.
    • 0:56:28So if I'm doing some work using a tool called GitHub, a popular service
    • 0:56:32available online for sharing and posting source code,
    • 0:56:38if I want to send something from my computer to GitHub's servers
    • 0:56:42in the cloud, I might authenticate using a public key and private key encryption
    • 0:56:48scheme so that they see that I'm using their public key to send them
    • 0:56:52information, they're decrypting it.
    • 0:56:53When they send information back to me, they're using my public key
    • 0:56:57and I use my private key to decrypt it.
    • 0:56:59It's actually part of--
    • 0:57:02it's part of a communication strategy used by technically-minded folks.
    • 0:57:06And you're not restricted to just having one public and private key.
    • 0:57:09For example, I have one public and private key
    • 0:57:11that I use for a secure email, I have one public and private key
    • 0:57:14that I would use for secure texting on my phone,
    • 0:57:19and I have one public and private key that I use for my GitHub repository.
    • 0:57:24So I have different sets and different combinations of these keys.
    • 0:57:29But the key is that-- the key, again, pun intended--
    • 0:57:31is that the decryption can only be done by someone who has the private key, not
    • 0:57:36the public key, because only those two functions are reciprocals
    • 0:57:40of one another.
    • 0:57:40They undo the work that the other did in the first place.
    • 0:57:46But interestingly enough, that's not the only thing
    • 0:57:49we can do with public and private keys.
    • 0:57:52So instead of just encryption, we also have this idea
    • 0:57:54of a digital signature, which is different than e-signature,
    • 0:57:57an e-signature just being the tracing of a pen typically
    • 0:58:00along some surface and just logging where all the pen strokes happen to be.
    • 0:58:04So we're talking about something much more complex than that.
    • 0:58:07We're talking about something cryptographically
    • 0:58:08based when we talk about digital signature.
    • 0:58:10It's kind of the opposite of encryption.
    • 0:58:14And using someone's digital signature, you
    • 0:58:17can verify the authenticity of a document and verify, more precisely,
    • 0:58:22the authenticity of the sender of a document.
    • 0:58:25And we're going to explain this in great detail in just a moment,
    • 0:58:30but the basic idea is they're signing the document using their private key.
    • 0:58:34You still don't see what the key is.
    • 0:58:36And because these public and private key pairs
    • 0:58:39are specific to an individual person, if you
    • 0:58:42were able to verify that that document could only
    • 0:58:45have been signed using someone's private key,
    • 0:58:49then you have quite a serious belief that that person
    • 0:58:53is the person who signed the document, who sent the document, and so on.
    • 0:58:58Digital signatures are 256 bits long pretty consistently,
    • 0:59:04which means there are 2 to the 256th power distinct digital signatures,
    • 0:59:08which makes the potential of a forgery effectively zero.
    • 0:59:13Again, I'm using this--
    • 0:59:14I'm trying to avoid saying never because computer scientists don't like never.
    • 0:59:18But effectively, there is no chance of a forgery.
    • 0:59:24Now the process for how one verifies a digital signature is quite--
    • 0:59:30there's quite a few steps involved.
    • 0:59:32And I have a diagram here that I sourced from online.
    • 0:59:34And what I'd like us to do now is walk through
    • 0:59:36this process to hopefully give you an understanding of how these work
    • 0:59:41and how you might be able to rely on digital signatures.
    • 0:59:44And states and different entities are recognizing digital signatures
    • 0:59:49as a valid way to sign documents, but it really helps
    • 0:59:53to have a good understanding of them such that you, as an attorney,
    • 0:59:57are comfortable with the fact that this does represent a specific individual.
    • 1:00:02So let's take a look at how this process works.
    • 1:00:07So we start with data.
    • 1:00:10Data in this case is any document.
    • 1:00:12Perhaps it's a scanned, signed version of some PDF with somebody's actual ink
    • 1:00:19signature.
    • 1:00:19But again, the whole thing is just scanned.
    • 1:00:22The next step is to use a hash function.
    • 1:00:24The hash function that we could use in this context could be anything.
    • 1:00:27It could be SHA-1.
    • 1:00:29It could be something very complex.
    • 1:00:32In general, the hash function that's going to be used here
    • 1:00:34is a publicly known, general-purpose hash function.
    • 1:00:36It's going to be something like MD5.
    • 1:00:38So something that anybody has access to.
    • 1:00:40And that's going to result in a hash, a set of zeros and ones.
    • 1:00:44In the case of MD5, it's going to be 128 bits.
    • 1:00:48Now where things get very interesting is we
    • 1:00:50take that hash, that set of zeros and ones,
    • 1:00:54and we encrypt it using the signer's private key.
    • 1:00:58Remember, these functions are reciprocals of one another.
    • 1:01:00A public key can undo what the private key does,
    • 1:01:03and the private key can undo what the public key does.
    • 1:01:06Notice in this case we're still not sending anyone our private key.
    • 1:01:11We are just using our private key to encrypt something.
    • 1:01:14So we take this hash that we received from running our file through MD5,
    • 1:01:17we encrypt it using our private key, and we get some other result out of it.
    • 1:01:22This number that comes out of running the hash through our private key
    • 1:01:27is called the signature.
    • 1:01:29We then just couple that-- so when we send this off,
    • 1:01:32we send the signature plus the original document,
    • 1:01:36and that would be considered a digital signature.
    • 1:01:40So that's the signing part of the process.
    • 1:01:42That's where we go.
    • 1:01:43We start with a file.
    • 1:01:45We run that file through a generic hash function.
    • 1:01:47Not our public and private keys, something
    • 1:01:49that is generally pretty accessible.
    • 1:01:51We take that hash, we encrypt it using our private key
    • 1:01:55to get some other hash that looks similar, different zeros and ones,
    • 1:01:59but totally different pattern of zeros and ones.
    • 1:02:02We attach the original document and the digital signature when we send it off,
    • 1:02:07and that is considered a digitally signed document.
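The signing steps just described (hash the file, run the hash through the private key, attach the result to the document) might look like the sketch below. Several things here are assumptions for illustration: SHA-256 from Python's `hashlib` stands in for the MD5 mentioned in the lecture, the key is textbook RSA built from two Mersenne primes picked only because they are easy to write down, and the hash is reduced modulo `N` so that it fits the toy key. Production signatures use much larger keys and a padding scheme from a vetted library.

```python
import hashlib

# Toy private/public key material (assumption: textbook RSA, no padding).
P = 2**61 - 1                 # a known Mersenne prime
Q = 2**89 - 1                 # another known Mersenne prime
N = P * Q                     # modulus shared by both keys
E = 65537                     # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)


def document_hash(document: bytes) -> int:
    """Step 1: run the document through a public, well-known hash function."""
    digest = hashlib.sha256(document).digest()
    return int.from_bytes(digest, "big") % N   # shrink to fit the toy key


def sign(document: bytes) -> int:
    """Step 2: 'encrypt' the hash with the private key to get the signature."""
    return pow(document_hash(document), D, N)


document = b"I agree to the terms of this contract."
signature = sign(document)

# Step 3: the digitally signed document is the pair (document, signature).
signed_document = (document, signature)
print(signed_document)
```

Note that `D` never leaves the signer's machine; only `document`, `signature`, and the public values `E` and `N` are shared.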
    • 1:02:11Now the real crux is how do you prove that I'm the person who
    • 1:02:15sent you this document, right?
    • 1:02:18If you want-- if you're receiving something
    • 1:02:20that has a digital signature, which is supposed
    • 1:02:22to be as good as any other kind of signature,
    • 1:02:25it's supposed to have legal effect.
    • 1:02:28How do we verify that that person who sent you the document
    • 1:02:32was actually the correct one?
    • 1:02:34So then we go to the verification step.
    • 1:02:36So we start, we've now received this digitally signed data.
    • 1:02:41This is the same as this digitally signed data here
    • 1:02:43that was sent by the sender.
    • 1:02:46We also received two pieces of information.
    • 1:02:48We received the document, the original document,
    • 1:02:53and we received the signature.
    • 1:02:55And recall, again, that the signature is what
    • 1:02:57happens when we take the hash of the document
    • 1:02:59and run it using our private key to get a result.
    • 1:03:05Now the interesting step here is remembering
    • 1:03:07that the public and private keys are reciprocals of one another.
    • 1:03:10So we can take this complicated signature hash
    • 1:03:14and we can use the public key, which, again, is publicly available.
    • 1:03:18Anybody should ostensibly have access to someone's public key, not
    • 1:03:23their private key.
    • 1:03:23And notice that the signer has never sent their private key.
    • 1:03:27They've only used it to encrypt some data,
    • 1:03:29but they never sent the private key.
    • 1:03:31The public key has always been available though.
    • 1:03:33We take the signature, we run it through the public key function,
    • 1:03:36and we get a hash.
    • 1:03:39We take the data, the document, and we run it through MD5,
    • 1:03:44the same hash function that the sender was supposed to use, and we get a hash.
    • 1:03:48And we're checking to make sure that these two
    • 1:03:50hashes are equal to one another.
    • 1:03:53If they are equal to one another, that means the signature is valid.
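The verification check can be sketched as follows. As with any toy example, the details are assumptions rather than the lecture's own method: SHA-256 replaces MD5, the key pair is textbook RSA over two Mersenne primes, and `sign`, `verify`, and `document_hash` are hypothetical helper names.

```python
import hashlib

# Toy key pair (assumption: textbook RSA without padding).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                                   # public: anyone may know E and N
D = pow(E, -1, (P - 1) * (Q - 1))           # private: known only to the signer


def document_hash(document: bytes) -> int:
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """Signer's side: hash the document, then apply the private key."""
    return pow(document_hash(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Receiver's side: compare two independently computed hashes."""
    hash_from_signature = pow(signature, E, N)   # public key undoes private key
    hash_from_document = document_hash(document) # same public hash function
    return hash_from_signature == hash_from_document


document = b"I agree to the terms of this contract."
signature = sign(document)

print(verify(document, signature))                       # untouched document
print(verify(document + b" (amended)", signature))       # altered document
```

The second call fails because the altered document hashes to a different value, while the signature still decodes to the hash of the original.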
    • 1:03:56Let's talk about why that would be the case.
    • 1:04:01If we use the MD5 of this file, the generic hash of this file,
    • 1:04:07and we encrypt it using our private key, we get some result, OK?
    • 1:04:11But this is very easy to calculate.
    • 1:04:13It's MD5.
    • 1:04:14We're taking a basic document, we're running it through a publicly known,
    • 1:04:18well-defined hash function.
    • 1:04:19Anybody who has access to this document and a program on their computer
    • 1:04:22called MD5 can literally run this document through it
    • 1:04:26and get this number.
    • 1:04:27This is not the tricky part of this.
    • 1:04:30We then take this hash, we encrypt it using our private key
    • 1:04:36to get some secret number.
    • 1:04:38The public key though will undo that.
    • 1:04:40Remember, the public and private keys are reciprocals of one another.
    • 1:04:43Whatever one does, the other one can undo.
    • 1:04:48And so only my public key will undo the work of my private key.
    • 1:04:54So if I take this value and I encrypt it using my private key,
    • 1:04:57and then I run this value through the public key,
    • 1:05:00I should get the original result again, the original MD5 hash.
    • 1:05:04And that's why we have to send the document as well, not
    • 1:05:08just the digital signature, the numbers that we
    • 1:05:10get by running it through our private key in the first place.
    • 1:05:13That way we have a way to validate that yes, this file has this checksum,
    • 1:05:18and the sender took that checksum, they ran it through their own private key,
    • 1:05:23and when I used their public key to undo it,
    • 1:05:26I get the same value, which is effectively proving, or,
    • 1:05:31as we'll term it, that it's very, very, very, very
    • 1:05:34likely that this person who claimed to have sent the document
    • 1:05:39is, in fact, the person who sent that document.
    • 1:05:42And so that's what digital signatures can be used for.
    • 1:05:44It is a mathematical, cryptographic way to verify
    • 1:05:48the identity of the sender of a document or an individual.
    • 1:05:52Or in whatever context you might be using or receiving digital signatures,
    • 1:05:56it is purely a verification step that is based entirely in mathematics.
    • 1:06:02There's one other potentially interesting use
    • 1:06:05of digital signatures that's also quite buzzy right now,
    • 1:06:09and that's blockchain technology.
    • 1:06:11And what is the blockchain?
    • 1:06:13Digital signatures are really key to knowing how the blockchain works
    • 1:06:18and why it is trusted as a decentralized source of information for individuals.
    • 1:06:24So understanding digital signatures means
    • 1:06:27you are in a position to understand blockchain.
    • 1:06:30And I use here the term the blockchain, but it really is a blockchain.
    • 1:06:34There's no such thing as the one blockchain.
    • 1:06:37There are many different-- this is just an idea that is implemented.
    • 1:06:40Generally, we're hearing it in the context of a cryptocurrency,
    • 1:06:43but it does not need to be restricted to that, although cryptocurrencies are so
    • 1:06:49discussed in the media and have been dissected by so many researchers
    • 1:06:52that they provide an interesting vehicle, an interesting lens
    • 1:06:55through which to consider blockchain.
    • 1:06:57And so our example today is going to focus on Bitcoin.
    • 1:07:01It is the most well-documented of the cryptocurrencies.
    • 1:07:04It is the most well-documented implementation of the blockchain,
    • 1:07:07or among the most well-documented implementations.
    • 1:07:10But this is not specifically a lecture about Bitcoin.
    • 1:07:13We're just using Bitcoin as a lens through which to understand blockchain.
    • 1:07:19There's also an outside source that I strongly encourage.
    • 1:07:22This channel on YouTube provides interesting mathematical dissections
    • 1:07:27of topics, and they tackle blockchain and Bitcoin pretty extensively.
    • 1:07:32And this is an excellent supplementary resource
    • 1:07:34to consider if you're trying to dig into this
    • 1:07:36or understand it a little bit more, because in this video
    • 1:07:38I'm going to omit some of the more technical details for the sake
    • 1:07:42of, hopefully, broader understanding.
    • 1:07:44But if you want to dive into it more deeply,
    • 1:07:46this is a resource that I would recommend.
    • 1:07:49And I really like talking about Bitcoin in the context of blockchain
    • 1:07:53because it's actually how I kind of got started almost as an attorney.
    • 1:07:57When I was practicing, when I graduated from law school,
    • 1:08:01I decided to go out on my own and start my own firm.
    • 1:08:06I live in a small town and so a lot of my early work
    • 1:08:08was doing estate plans, wills and such for individuals in my town,
    • 1:08:12getting to know them.
    • 1:08:13But I had studied extensively technology-related law in law school
    • 1:08:17and I really wanted to use it.
    • 1:08:21And a few years into my practice, I had a friend
    • 1:08:23who needed an estate plan prepared, and he asked if he could pay me in Bitcoin.
    • 1:08:28And I had no idea what that meant.
    • 1:08:31I didn't really know anything about Bitcoin at the time.
    • 1:08:34And I looked it up and thought it sounded interesting,
    • 1:08:37and so I said sure.
    • 1:08:38So I learned how to set up an account.
    • 1:08:40And it's also worth mentioning at the outset,
    • 1:08:42as we're talking about cryptocurrency, that you
    • 1:08:45don't need to understand how Bitcoin works to use Bitcoin.
    • 1:08:47You don't need to understand how the federal banking
    • 1:08:50system works to use a bank.
    • 1:08:54And the same is true here with Bitcoin.
    • 1:08:55But I ended up accepting a Bitcoin payment
    • 1:09:00by creating what's called a Bitcoin wallet.
    • 1:09:03I immediately sold the Bitcoin that I received and turned it into cash, such
    • 1:09:07that I could use it for more generic purposes.
    • 1:09:11And what I decided to do was send out a press release saying,
    • 1:09:14oh, I accept Bitcoin, because it was something that was novel
    • 1:09:17and I hadn't really heard that much about it.
    • 1:09:19And this got the attention of my local paper and companies
    • 1:09:23in the area that were technically minded as well.
    • 1:09:25And so Bitcoin sort of provided this forum
    • 1:09:28to meet new clients that also allowed me to explore fields
    • 1:09:32of the law about which I am passionate.
    • 1:09:34So it's kind of an interesting segue to be able to share that with you now.
    • 1:09:39All right, so stepping away from Bitcoin again more broadly to blockchain.
    • 1:09:43What is the blockchain?
    • 1:09:44It's very similar to something you've already learned about,
    • 1:09:47which is a linked list.
    • 1:09:49So recall that a linked list is a set of nodes, each of which
    • 1:09:52have connections forward and backward to other nodes in the chain.
    • 1:09:57They are linked together.
    • 1:09:58And similarly, with a blockchain, all of the blocks are chained together.
    • 1:10:02It's basically the same terminology slightly modified.
    • 1:10:06So a linked list is a set of nodes, each of which is connected to the one prior
    • 1:10:10and the one after it.
    • 1:10:12We learned about linked lists as having generally three pieces of information
    • 1:10:15associated with them-- a previous pointer, which is basically
    • 1:10:18a reference to the prior node, or in this case, the prior block;
    • 1:10:23we have the next pointer, which is a reference to the next node
    • 1:10:26or the next block; and we had data.
    • 1:10:31And in this case, the data is actually two different things.
    • 1:10:33There's the real data.
    • 1:10:35And again, in the context of a cryptocurrency blockchain
    • 1:10:38we're going to be talking about a list of transactions,
    • 1:10:40a numbered list of transactions from person A to person B,
    • 1:10:44each of those transactions being digitally signed such
    • 1:10:47that you can verify that the person who logs that transaction
    • 1:10:50is actually the one who made that transaction.
    • 1:10:53And also, something called a proof of work.
    • 1:10:55And this proof of work is very interesting
    • 1:10:57because this is how Bitcoin ostensibly derives its authority.
    • 1:11:01There is no central controller of the Bitcoin currency,
    • 1:11:07and it is very decentralized.
    • 1:11:09And there needs to be some way for people
    • 1:11:11to agree as to what the true ledger is, or what the true set of transactions
    • 1:11:18that have happened are.
    • 1:11:19And the way that is done is by relying on something called the proof of work.
    • 1:11:24And we'll dive into that shortly as well.
    • 1:11:27So again, cryptocurrencies, that data is a ledger of transactions, each of which
    • 1:11:31is digitally signed using the digital signature
    • 1:11:33technique we've just discussed by the person who
    • 1:11:37made or initiated that transaction.
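One way to picture a block is as a small record, much like a linked-list node. The sketch below is a simplification with hypothetical field names: it keeps only the reference to the prior block (stored as that block's hash), since in Bitcoin's design a block refers back to its predecessor, and it uses a plain integer as a stand-in for a real digital signature.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transaction:
    sender: str
    recipient: str
    amount: int
    signature: int          # placeholder for the sender's digital signature


@dataclass
class Block:
    previous_hash: Optional[str]                 # reference to the prior block
    transactions: List[Transaction] = field(default_factory=list)  # the ledger
    proof_of_work: int = 0                       # the number found by a miner


first = Block(previous_hash=None)
first.transactions.append(Transaction("Doug", "you", 10, signature=123456))

# The hash string here is a made-up placeholder, not a real block hash.
second = Block(previous_hash="hash-of-first-block")
print(first)
print(second)
```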
    • 1:11:40And that ledger is decentralized, which means that any time there's
    • 1:11:43ever a change, any time any transaction is recorded, in this case,
    • 1:11:47using Bitcoin, again, our lens through which to consider blockchain,
    • 1:11:50that message is broadcast out.
    • 1:11:53So if I make a transaction in Bitcoin, I pay you $10,
    • 1:11:58I'm going to announce to everyone else who has a Bitcoin wallet
    • 1:12:02or who is monitoring the blockchain, the list of transactions, hey,
    • 1:12:07please add the following transaction to this list, Doug pays you $10.
    • 1:12:14And that is announced to everybody, everybody records it in their ledger,
    • 1:12:18and then some stuff is going to start happening.
    • 1:12:22But here is a potential issue.
    • 1:12:25How do you know that the blockchain is legitimate?
    • 1:12:28How do you know that your copy of what is being said is the truth?
    • 1:12:34How do you know that your copy of the blockchain is accurate with respect
    • 1:12:37to all other transactions that have happened?
    • 1:12:40Everybody else has their own copy as well.
    • 1:12:42It's decentralized.
    • 1:12:43We all maintain, anybody who's using Bitcoin maintains
    • 1:12:46their own copy of the blockchain.
    • 1:12:51How do you defend against people modifying it?
    • 1:12:57That's a very interesting question.
    • 1:12:59The way that cryptocurrencies do it is to assume--
    • 1:13:02and this is defined in the Bitcoin paper--
    • 1:13:04the way the cryptocurrencies do it is to assume
    • 1:13:07that the chain that has the most computational work put into it
    • 1:13:11is the true chain.
    • 1:13:13This decision is completely arbitrary.
    • 1:13:16There's no reason why one needs to be vetted over the other.
    • 1:13:19But something had to be agreed upon by, collectively, users of Bitcoin
    • 1:13:23to say in the event of a dispute, between which person's chain is
    • 1:13:30the accurate de facto definitive list of transactions?
    • 1:13:34We're going to go with the one that has been verified the most times.
    • 1:13:38And again, this word verified is sort of a sketchy word.
    • 1:13:41There's nothing inherently about proof of work
    • 1:13:44or anything else that proves that a transaction has taken place in the way
    • 1:13:49that we normally think of this term verified.
    • 1:13:52Rather it is the collective standard by which we all agree to adhere,
    • 1:13:57that the person--
    • 1:13:58or that the blockchain that has the most proof of work in it is the list.
    • 1:14:04That is just something we must subscribe to as users
    • 1:14:08and consumers of blockchain.
    • 1:14:10Now how do we determine which blockchain has had the most computational work put
    • 1:14:14into it, which copy of the blockchain has had the most computational work put
    • 1:14:18into it?
    • 1:14:20Well, this is proof of work.
    • 1:14:23So proof of work is how the correct blockchain of all the copies
    • 1:14:29that are decentralized is determined.
    • 1:14:33So recall how hashing works.
    • 1:14:35Hashing allows us to take any arbitrary data and run it through a hash function
    • 1:14:42and get an outcome.
    • 1:14:44And that outcome is going to be, let's say 256 bits, each of those bits being,
    • 1:14:49of course, 0 or 1.
    • 1:14:52Now there's a lot of different combinations there.
    • 1:14:56But some of them will be very unique.
    • 1:15:00And the way Bitcoin works, Bitcoin's blockchain works
    • 1:15:05is to prove a particular block.
    • 1:15:08We are asking people who are oftentimes called miners--
    • 1:15:12that's where this term comes from because they are mining.
    • 1:15:15Ultimately the reward for doing this proof of work is to receive Bitcoin
    • 1:15:19that are sort of generated out of thin air.
    • 1:15:20And so these people are termed miners.
    • 1:15:22But we are asking anyone who has a computer to hash the entire block.
    • 1:15:28So hash the entire list of transactions, the reference
    • 1:15:31to the previous block and the next block.
    • 1:15:32And remember, all of that is contained in a single node of this blockchain,
    • 1:15:35basically.
    • 1:15:36And we're looking for a highly unusual pattern.
    • 1:15:40We're looking for maybe the first 30 bits or the first 40 bits
    • 1:15:44to all be zeros.
    • 1:15:47That's really weird.
    • 1:15:49Like, that's a really difficult pattern to find.
    • 1:15:51And the only way to do it is to guess.
    • 1:15:53So you take this entire block, you attach a single piece of data
    • 1:15:56to the bottom of it, like 1, 2, 3.
    • 1:15:58You can just count in that way trying to guess.
    • 1:16:01And if you hash that entire thing together,
    • 1:16:05do you eventually find a block that, when hashed in this way,
    • 1:16:09produces this very, very unique pattern?
    • 1:16:13If so, you just say, here's the number that I attach.
    • 1:16:16So let's say I took the entire block and I hashed it with 12345
    • 1:16:21was the number, right?
    • 1:16:24It's very difficult to find a value that would
    • 1:16:29create this unique pattern of zeros and ones,
    • 1:16:31in particular, zeros, 30 zeros in a row.
    • 1:16:33But it's really, really easy to verify that someone has done it.
    • 1:16:38To verify that someone has done it, all you have to do is if they announce
    • 1:16:43the number that they used, 12345, as their proof of work--
    • 1:16:46and that's what the proof of work really is,
    • 1:16:48it's that number that they use to figure it out--
    • 1:16:51if they announce that and you hash the block with that number,
    • 1:16:54you can verify, yes, that pattern is actually 30 zeros in a row.
    • 1:16:59So I guess you have proven it.
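The guess-and-check search, and the much cheaper verification, can be sketched like this. It is a simplified model, not Bitcoin's actual block format: the block is just a byte string, the guessed number is appended as text, SHA-256 is applied once, and the target is 16 leading zero bits instead of 30 so that the search finishes quickly. The function names are hypothetical.

```python
import hashlib
from itertools import count

DIFFICULTY_BITS = 16   # assumption: far easier than the 30 bits in the lecture


def leading_zero_bits(digest: bytes) -> int:
    """Count how many bits at the front of the digest are zero."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def block_hash(block_data: bytes, number: int) -> bytes:
    """Hash the whole block together with the guessed number."""
    return hashlib.sha256(block_data + str(number).encode()).digest()


def mine(block_data: bytes, bits: int = DIFFICULTY_BITS) -> int:
    """Hard part: try 0, 1, 2, ... until the hash has enough leading zeros."""
    for number in count():
        if leading_zero_bits(block_hash(block_data, number)) >= bits:
            return number


def check_proof(block_data: bytes, number: int, bits: int = DIFFICULTY_BITS) -> bool:
    """Easy part: a single hash confirms the announced number."""
    return leading_zero_bits(block_hash(block_data, number)) >= bits


block = b"previous-block-hash | Doug pays you $10"
proof = mine(block)
print(proof, check_proof(block, proof))
```

Finding `proof` takes tens of thousands of hashes on average at this difficulty, while checking it takes exactly one.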
    • 1:17:01Now this is, again, kind of arbitrary.
    • 1:17:04Like, this seems weird.
    • 1:17:05Why are you spending all your time trying
    • 1:17:08to figure out a specific pattern that exists somewhere?
    • 1:17:12That is a question that I cannot answer other than to say that it is
    • 1:17:16the standard by which people who have subscribed to the Bitcoin standard have
    • 1:17:22just agreed to be bound by.
    • 1:17:23The person who finds this number is probably the--
    • 1:17:27is proving the validity of all the transactions above it.
    • 1:17:31And this gets interesting when you think about somebody trying
    • 1:17:34to perpetrate a fraudulent transaction.
    • 1:17:37So imagine I'm trying to perpetrate a fraudulent transaction by initiating
    • 1:17:41a transaction that says, I'm going to pay you $100.
    • 1:17:43And I announce that to you, but I don't broadcast it
    • 1:17:47to everybody else who maintains the blocks, who are maintaining
    • 1:17:50their own copies of blockchains.
    • 1:17:52Which is interesting because you think that I have spent $100,
    • 1:17:55and as far as you're concerned I have spent $100 to you,
    • 1:17:58but no one else is aware of that.
    • 1:18:00So no one else thinks that I have spent $100.
    • 1:18:04They all think I am $100 wealthier than I actually am.
    • 1:18:09The problem then arises that I need to verify that block.
    • 1:18:13I need to verify that transaction.
    • 1:18:15So I append the transaction to my own copy of the blockchain
    • 1:18:18because I am the only person other than you--
    • 1:18:20the two of us maybe have these copies of the blockchain,
    • 1:18:23but everybody else, I didn't broadcast this transaction
    • 1:18:25so no one else knows about it.
    • 1:18:28In order for it to have a proof of work attached to it,
    • 1:18:31in order for it to be considered the valid chain,
    • 1:18:35I would need to prove that block.
    • 1:18:39I would need to find that secret number that when hashed with the entire block,
    • 1:18:44produces a pattern of 30 consecutive zero bits before anybody else does.
    • 1:18:50So that's a 1 in 2 to the 30th power chance
    • 1:18:55because I'm looking for a pattern of 30 consecutive zeros.
    • 1:18:59There's a 1 in 2 to the 30th power chance
    • 1:19:02that I'm going to find that pattern.
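To put a number on those odds, assume each output bit of the hash behaves like an independent fair coin flip, which is the usual working assumption for a cryptographic hash. Then the chance that any single guess produces 30 leading zeros, and the average number of guesses needed, work out as:

```latex
\[
P(\text{one guess succeeds}) = \left(\tfrac{1}{2}\right)^{30} = 2^{-30},
\qquad
E[\text{guesses}] = \frac{1}{2^{-30}} = 2^{30} = 1{,}073{,}741{,}824 \approx 1.07 \times 10^{9}.
\]
```

So a lone attacker needs roughly a billion hashes per block on average, and has to keep doing that faster than everyone else combined.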
    • 1:19:04And I have to find that pattern before somebody else.
    • 1:19:06And in the meantime, other transactions are coming in on my ledger.
    • 1:19:11On my-- other people are broadcasting their transactions.
    • 1:19:13And I have to keep adding them to my ledger
    • 1:19:16and keep proving that work over and over and over,
    • 1:19:19all the while trying to stay ahead so that my fraudulent transaction is
    • 1:19:23considered ultimately the correct blockchain.
    • 1:19:27Now the odd-- you just can't beat the odds of that.
    • 1:19:30One malicious person trying to perpetrate a fraudulent transaction
    • 1:19:35using the blockchain cannot stay ahead.
    • 1:19:38They can't win the find the secret number
    • 1:19:42game over and over and over and over.
    • 1:19:45Eventually, some other chain, which contains valid transactions,
    • 1:19:48will win out over my attempted fraudulent chain.
    • 1:19:54And it will be disregarded.
    • 1:19:55Nobody will consider that to be a valid part of the chain anymore.
    • 1:20:00And so that's kind of how this works.
    • 1:20:02Again, it's arbitrary the way they decide to resolve or verify.
    • 1:20:07There's nothing about this process that proves
    • 1:20:09that person A sent person B money.
    • 1:20:12It's just the consensus that we have decided, well,
    • 1:20:15if people have gone through the effort to try and find these secret numbers,
    • 1:20:19and many different people are doing it, and this one chain is longer than
    • 1:20:23the others because it's been verified-- again, using this term verified--
    • 1:20:27it's been proven with work over and over and over, we're
    • 1:20:31just going to agree that that's the right one.
    • 1:20:33So again, it's kind of strange.
    • 1:20:34And I do, again, refer you to that video that I shared earlier
    • 1:20:37to get into some of the more technical details of this,
    • 1:20:39which I'm glossing over a little bit here in this discussion.
    • 1:20:42But proof of work is basically the collective consensus
    • 1:20:46of blockchain users, or in this case specifically,
    • 1:20:48of Bitcoin users, for which transactions they are going to consider valid.
    • 1:20:54And consider going back in time, as opposed
    • 1:20:56to thinking forward and trying to add a new fraudulent transaction.
    • 1:21:00If you try to go back in time to modify a transaction from the past,
    • 1:21:03say there was a transaction that was you pay me $10,
    • 1:21:08and I maintain a copy of the blockchain, so I can technically go back in time
    • 1:21:12and modify that file, and I change it to you pay me $100, well,
    • 1:21:21because I've changed even the tiniest thing
    • 1:21:23and I'm hashing that block, that means that when I hash it
    • 1:21:26with that secret number, I'm no longer getting that pattern of 30
    • 1:21:29zeros in a row.
    • 1:21:32And so that kind of calls that transaction into question.
    • 1:21:35It also, because each of those blocks contains a reference
    • 1:21:38to the hash of the previous block,
    • 1:21:40invalidates every block that comes after it in that blockchain.
    • 1:21:46And so because of this weird technique we're
    • 1:21:50doing of hashing blocks, hashing data, trying to look for specific patterns,
    • 1:21:54but realizing that any cryptographic hash function with the tiniest
    • 1:21:58change to the input creates a totally different output,
    • 1:22:03we actually are pretty well defended against people
    • 1:22:06who try and go back in time and make fraudulent transactions using
    • 1:22:10the blockchain.
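The knock-on effect of editing an old block can be sketched in a few lines. This is a toy model, not Bitcoin's actual block format: it assumes each block is a small dictionary holding some data plus the SHA-256 hash of the block before it, and the names `block_hash`, `build_chain`, and `is_valid` are hypothetical.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents, including its link to the previous block."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def build_chain(entries: list[str]) -> list[dict]:
    """Link each new block to the hash of the block before it."""
    chain = []
    prev = "0" * 64  # placeholder link for the very first block
    for data in entries:
        block = {"data": data, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)
    return chain


def is_valid(chain: list[dict]) -> bool:
    """Check that every block still points at the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = build_chain(["you pay me $10", "second transaction", "third transaction"])
print(is_valid(chain))  # True

chain[0]["data"] = "you pay me $100"  # go back in time and edit the past
print(is_valid(chain))  # False
```

Changing $10 to $100 changes the first block's hash, so the second block's stored link no longer matches. To hide the edit, the forger would have to redo the proof of work for that block and for every block after it.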
    • 1:22:11So it's mathematical and it's quirky, but it does provide a clever way
    • 1:22:16to defend against that kind of thing, considering
    • 1:22:19we don't have a central authority to rely on
    • 1:22:21to adjudicate these kinds of disputes.
    • 1:22:22We are, collectively, not trusting one another,
    • 1:22:26but agreeing to trust the mathematics of the blockchain in order
    • 1:22:30for it to succeed.
    • 1:22:33So as I mentioned, we can very easily verify the correctness
    • 1:22:36of someone's proof of work.
    • 1:22:37That proof of work is just the number that is hashed with the block
    • 1:22:41to produce the secret pattern of 30 zeros and then some other bits,
    • 1:22:46and so on.
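The asymmetry described here, hard to find but easy to check, can be sketched as follows. It assumes SHA-256 and an 8-byte encoding of the secret number; `verify_proof` and `mine` are hypothetical names, and the difficulty is lowered to 12 bits so the demo runs quickly.

```python
import hashlib


def verify_proof(block: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Check a claimed proof of work with a single hash."""
    digest = hashlib.sha256(block + nonce.to_bytes(8, "big")).digest()
    # The top difficulty_bits bits are all zero exactly when the digest,
    # read as a 256-bit integer, is below 2 ** (256 - difficulty_bits).
    return int.from_bytes(digest, "big") < 2 ** (256 - difficulty_bits)


def mine(block: bytes, difficulty_bits: int) -> int:
    """Finding the number takes many hashes: try candidates until one passes."""
    nonce = 0
    while not verify_proof(block, nonce, difficulty_bits):
        nonce += 1
    return nonce


block = b"you pay me $10"
nonce = mine(block, 12)               # thousands of hashes on average
print(verify_proof(block, nonce, 12))  # True, after just one hash
```

Anyone who receives the block and the number can rerun that one hash, so nobody has to take the prover's word for it.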
    • 1:22:47The longer a chain gets, the more and more likely
    • 1:22:49it is that all the transactions in it are "verified."
    • 1:22:51Again, I keep putting air quotes around that word
    • 1:22:54because it doesn't mean exactly the same thing
    • 1:22:57that we might colloquially take verified to mean.
    • 1:23:00It doesn't prove anything about the transaction itself,
    • 1:23:03just that we accept it as the standard.
    • 1:23:06We accept this as the de facto truth because of all the mathematics
    • 1:23:09that have been put into it.
    • 1:23:12So the longer a chain gets, the more likely
    • 1:23:14it is that it consists of only verified, legitimate transactions.
    • 1:23:17But that brings up a question of, what is a transaction?
    • 1:23:22A transaction is just an exchange between two people.
    • 1:23:26And if we start to really spread things out,
    • 1:23:28we can almost think about a transaction as a contract.
    • 1:23:33I offer you $10 for you to do something on my behalf,
    • 1:23:37and assuming that we're intending for me to actually give you these $10,
    • 1:23:41and you're intending to actually do something on my behalf,
    • 1:23:43and the thing that you're doing for me is not illegal,
    • 1:23:47we've basically formed a contract.
    • 1:23:50And so while the blockchain for Bitcoin
    • 1:23:54can be used to send money back and forth between people,
    • 1:23:57the data that goes into the data block of any blockchain is arbitrary.
    • 1:24:02And there's no reason why, instead of being a list of transactions,
    • 1:24:07that data couldn't be something much more significant than that.
    • 1:24:10There's no reason it couldn't be a digitally signed PDF
    • 1:24:14scan of a contract between two people.
    • 1:24:17There's no reason it can't be a message from me typed to you saying,
    • 1:24:22I will pay you $100 if you paint my house on Tuesday,
    • 1:24:27and you sending something back in that same chain saying,
    • 1:24:35I accept your offer.
    • 1:24:36I will paint your house on Tuesday in exchange for $100.
    • 1:24:40We've just formed a contract with no middleman at all.
    • 1:24:44We are announcing our intentions.
    • 1:24:46It is being recorded publicly in everybody's version of the blockchain.
    • 1:24:50It is verified, again, verified in the sense
    • 1:24:53that we collectively deem it to be accurate rather than
    • 1:24:59proving that I definitely sent this, although the digital signatures
    • 1:25:02associated with these transactions do, again,
    • 1:25:04suggest yes, I am the person who made this transaction because I digitally
    • 1:25:07signed it.
    • 1:25:08If I do the same thing with a contract, if I send you an offer
    • 1:25:12and you accept, and both of those items are in the chain,
    • 1:25:16we arguably have formed a contract.
    • 1:25:18And that is what the blockchain associated with the Ethereum technology
    • 1:25:22is actually more akin to.
    • 1:25:24So Bitcoin is kind of restricted in how it
    • 1:25:27approaches cryptocurrency and approaches transactions between people.
    • 1:25:31And Ethereum opens up a little bit more.
    • 1:25:34And there are other blockchain technologies and other services
    • 1:25:36that rely on the blockchain in order to do things far
    • 1:25:41beyond what a cryptocurrency could do.
    • 1:25:45But all these things are only possible because
    • 1:25:49we rely so extensively on cryptography.
    • 1:25:51We use computers to send information securely, encrypt information.
    • 1:25:57And the mathematical unlikelihood of someone
    • 1:26:00being able to duplicate our work, or certainly
    • 1:26:02reverse engineer this encryption is what gives us
    • 1:26:06the confidence to make these transactions in the first place.
    • 1:26:10And so cryptography forms the basis of almost everything
    • 1:26:12that we do when we talk about security on a computer.
    • 1:26:17But ultimately, cryptography just relies on mathematics.
    • 1:26:21So the moral of the story is probably this.
    • 1:26:24You are probably not going to be implementing
    • 1:26:26your own version of the blockchain.
    • 1:26:27And really, you don't need to understand it completely in order to use it.
    • 1:26:32Like I said, you can use Bitcoin without knowing the mathematics of how Bitcoin
    • 1:26:36works, just like you can use a bank without knowing the minutiae of how
    • 1:26:40the banking system works.
    • 1:26:42The point of the blockchain is to remove a central authority.
    • 1:26:46We don't rely on one person or one entity or one government
    • 1:26:50to determine what has happened, what the transactions are
    • 1:26:54like we do with a bank.
    • 1:26:55Your bank has a ledger of everybody's accounts.
    • 1:26:58With blockchain technology, we are decentralizing this and making it
    • 1:27:03so that everybody has access to all of the information at once,
    • 1:27:05and it is everybody's responsibility to keep that ledger accurate.
    • 1:27:09And because these ledgers rely so extensively on cryptography,
    • 1:27:13because this technology relies on cryptography,
    • 1:27:16we can use the power of cryptography, the fact
    • 1:27:18that things are very difficult to reverse engineer mathematically
    • 1:27:23to verify that yes,
    • 1:27:25these are the things that have happened, these are the transactions that
    • 1:27:28have been logged, and everybody knows about it at the same time.