Kind of a big deal. You'd have to be a total square not to have heard about them. Me? I've got eight.
Often over-complicated, over-mysticised, over-singularised (I don't even know what the right word for it is, but people say The Blockchain a lot). What are they? Join me for a rough tour from the ground up and I'll try to make sure you leave here knowing the answer to one question:
What are people talking about when they talk about blockchains?
There's a lot to cover, so it's actually going to come in two parts. This, the first, will look at the data structures known as blockchains and their properties, along with any other bits and pieces you need to make sense of them.
The second part will apply what you've learnt to the practical and widespread applications of blockchains: powering distributed ledgers, cryptocurrencies such as Bitcoin and Litecoin, and smart-contract-based chains like Ethereum.
Before we jump into defining blockchains themselves, we need to cover one concept which is so core to their working it makes absolutely no sense not to bring everyone up to speed.
That concept is what's known as a hash. Related is hashing, the action by which we get our hands on a hash, and a hash function, the specific algorithm you run.
I'll not go into how different hashing algorithms work, since there are many and that's not the important thing to know about them at this point. What's important is how they behave.
A hashing algorithm is a transformation that you can apply to any piece of data, long or short, and get an output which will always be the same length. What that length actually is depends on the specific algorithm, but any given algorithm will always produce hashes of the same length.
For a concrete example, let's look at SHA256, a popular and common hashing algorithm. This is the hash of the string unwttng:
A small note: whilst the "256" in SHA256 refers to the fact that the output is exactly 256 bits long, it's quite customary to represent hashes in hexadecimal notation (that is, using 0-9 and a-f) - that's what we'll use in this article.
As long as we always use SHA256 to perform the hash, "unwttng" will always produce this exact output, and any string will produce a hash of exactly 64 characters in length. Take a look at the hash of a totally different string:
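If you'd rather compute these than take my word for it, here's a minimal sketch using Python's standard `hashlib` module. The helper name `sha256_hex` and the choice of UTF-8 for turning strings into bytes are assumptions of mine; a different encoding would give different hashes.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of text as a hexadecimal string.

    Assumption: the string is encoded as UTF-8 before hashing.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    for text in ["unwttng", "a totally different string", ""]:
        digest = sha256_hex(text)
        # Every digest is 64 hex characters (256 bits), whatever the input length.
        print(f"{text!r} -> {digest} ({len(digest)} chars)")
```

Run it twice and you'll get the same output both times: same input, same algorithm, same hash.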
The above is true of any hashing algorithm. However, there are a few more properties that are also commonly expected of a good hash function.
Uniformity of the hash function says that we want any given hash output to be in some sense equally likely (or close to it). That is, if we compute millions and millions of hashes using the function, there shouldn't be any real pattern to where the outputs fall in the possible space of outputs. There shouldn't be loads of them that start with an "A", or loads that end with ten "X"s, or anything like that.
Uniformity is a very important feature because it reduces the risk that hashing two different inputs will produce the same output to a vanishingly small probability. Most practical usage of hashing across all kinds of software is done under the assumption that accidental "collisions" of this kind are effectively impossible.
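One rough way to see uniformity for yourself is to hash a pile of inputs and tally the first hexadecimal character of each result. The sketch below is an informal sanity check, not a proper statistical test; the inputs (the strings "0" to "15999") and the function name are arbitrary choices of mine.

```python
import hashlib
from collections import Counter


def first_hex_digit_counts(n: int) -> Counter:
    """Tally the first hex character of SHA-256 over the strings "0" .. str(n - 1)."""
    counts = Counter()
    for i in range(n):
        digest = hashlib.sha256(str(i).encode("utf-8")).hexdigest()
        counts[digest[0]] += 1
    return counts


if __name__ == "__main__":
    counts = first_hex_digit_counts(16_000)
    # With 16 possible first characters, each should turn up roughly 1,000 times.
    for digit in "0123456789abcdef":
        print(digit, counts[digit])
```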
Non-invertibility is another feature that's often desirable, depending on the intended usage of the algorithm. This says that it should be impossible, or at least prohibitively difficult, to work out the input that led to any given hash. Ideally, it should be easy to transform data into a hash, and practically impossible to go the other way.
Finally, discontinuity tells us to expect that similar inputs will produce drastically different hashes. There are algorithms with very specific (usually search-related) applications that aim for the opposite of this, but most hash functions that you come across will be as non-continuous as possible. The difference between the hash of 'abcdefg' and 'abcdefh' should be almost total, rather than them differing by only one or a few characters.
Try it out for yourself! Put anything you like through SHA-256 and watch the hash change as you alter the input. Try to observe the properties I've listed above if you can.
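Here's a small sketch to experiment with. It hashes two nearly identical strings and counts how many of the 256 output bits differ; for a hash function with good discontinuity you'd expect roughly half of them to flip. The function names and the UTF-8 encoding are my assumptions.

```python
import hashlib


def sha256_bytes(text: str) -> bytes:
    """Return the raw 32-byte SHA-256 digest of text (encoded as UTF-8)."""
    return hashlib.sha256(text.encode("utf-8")).digest()


def differing_bits(a: str, b: str) -> int:
    """Count the bit positions at which the SHA-256 hashes of a and b differ."""
    hash_a, hash_b = sha256_bytes(a), sha256_bytes(b)
    return sum(bin(x ^ y).count("1") for x, y in zip(hash_a, hash_b))


if __name__ == "__main__":
    for text in ["abcdefg", "abcdefh"]:
        print(text, sha256_bytes(text).hex())
    print("bits that differ:", differing_bits("abcdefg", "abcdefh"), "of 256")
```

Swap in your own pairs of strings and see whether you can find two similar inputs whose hashes look alike.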
A blockchain is a data structure.
More specifically, it's a data structure for storing a carefully ordered sequence of data. Even more specifically, it's a data structure for storing a carefully ordered sequence of data in a way that makes it very hard to tamper with.
That last part's important. Lots of structures have been invented to store things in order: plain arrays, linked lists, doubly-linked lists, yada yada. Blockchains aim to add a feature to these by making sure that if any member of the list is modified in any way, to any extent, anyone who cares to check can verify that the chain is now invalid. They're basically just robust lists - no magic.
A blockchain, like pretty much any other list-based data structure, is made up of a unit (a block) capable of storing a package of data, and some mechanism for joining the blocks together in order. Hence the name. We're packaging up our data in neat blocks, and then making a chain of blocks.
Let's make one! First, we'll need our data.
What you actually decide to put inside your blocks is very much a usage-dependent thing. It could be a bunch of single strings like our example blocks here, or it could be entire bundled-up packs of thousands of transactions (as is the case with your average cryptocurrency). For now, keep in mind that it basically doesn't matter what kind of data we have, we just want to store generic blobs of something.
We'll also want some metadata about the block. Again, what form this takes is dependent on the purpose of the specific blockchain, but as an example we're going to include a block number inside each block. The first block is called number 1, the second one number 2, and so on.
A typical dynamic list structure at this point would add a few bits of information to logically link lots of blocks together into an ordered sequence. These commonly take the form of pointers to the location of the previous and next items in memory. We'll go ahead and steal that idea.
We now have a fully-functioning doubly linked list, which will serve many applications just fine.
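As a sketch of where we've got to so far, here's that doubly linked list in Python. Object references stand in for memory pointers, and the names `Block`, `prev`, `next` and `append_block` are mine, picked purely for illustration. Only "I'm a block!" comes from the example; the other strings are made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Block:
    number: int  # metadata: position in the list, starting at 1
    data: str    # the payload; any blob would do
    # Links to the neighbouring blocks (left out of the repr to keep it short).
    prev: Optional["Block"] = field(default=None, repr=False)
    next: Optional["Block"] = field(default=None, repr=False)


def append_block(tail: Optional[Block], data: str) -> Block:
    """Create a new block after tail (or a first block if tail is None)."""
    number = 1 if tail is None else tail.number + 1
    block = Block(number=number, data=data, prev=tail)
    if tail is not None:
        tail.next = block
    return block


if __name__ == "__main__":
    tail = None
    for text in ["I'm a block!", "Me too!", "Me three!"]:
        tail = append_block(tail, text)
    # Walk backwards from the tail using the prev links.
    block = tail
    while block is not None:
        print(block.number, block.data)
        block = block.prev
```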
However, a blockchain one-ups all of this: each block also stores a hash of the previous block. The discontinuity and uniformity of hashes (discussed above) lend them very nicely to checking the integrity of data, since if even a single bit of the input data changes, the hash it produces will be obviously and drastically different. Let's add in our hashes to complete a basic but functioning blockchain.
There are a few things to point out here. Firstly, the first block is special. Its "previous block hash" is all zeroes. That's because there is no previous block to hash, and so we have nothing to validate. This first block is often called the Genesis Block - you can even have a look at Bitcoin's genesis block here if you're interested. There's a lot in there that probably won't make much sense to you just now, but you can see the "Previous Block" hash is all zeroes, just like ours!
Secondly, it's important to get your head around exactly what data we're hashing. The "previous block hash" for any given block is not just the hash of the previous block's string data. It's actually the hash of the entire previous block - metadata, previous hash and all.
To labour that point (I said it was important), look at block number 2. Its previous hash is 48b48...e76cb. Take a moment to find it in the blockchain above.
However, SHA256("I'm a block!") is not equal to 48b48...e76cb:
We reached the 48b48...e76cb hash not by using the string data by itself, but by combining all of: the block number, the previous hash and the string data, in that order. That is, we hashed the string "1" + "00000000...00000000" + "I'm a block!", and in doing so we got the result you can see in the chain:
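In code, that recipe might look like the sketch below. Two things in it are assumptions rather than anything the example pins down: the three pieces are joined by plain string concatenation and encoded as UTF-8, and the all-zero previous hash is written as 64 zero characters. Because of that, the digest it prints may well differ from the 48b48...e76cb in the chain. The point is only that hashing the whole block and hashing the data alone give different results.

```python
import hashlib

GENESIS_PREV_HASH = "0" * 64  # assumed spelling of the "all zeroes" hash


def block_hash(number: int, prev_hash: str, data: str) -> str:
    """Hash the whole block: number, previous hash and data, in that order."""
    payload = str(number) + prev_hash + data
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    whole_block = block_hash(1, GENESIS_PREV_HASH, "I'm a block!")
    data_only = hashlib.sha256("I'm a block!".encode("utf-8")).hexdigest()
    print("whole block:", whole_block)
    print("data only:  ", data_only)
```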
As far as their operation as a simple data structure goes, that's literally all there is to a blockchain! Some connected blocks, with data, with each one holding onto a hash of the previous one.
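To show how little there is to it, here's a toy version of the whole structure. Treat it as a sketch: it assumes block numbers start at 1, the first block's previous hash is 64 zero characters, and a block's hash is SHA-256 over number + previous hash + data, concatenated as UTF-8 text. The `Block` class, `digest` method, `build_chain` function and every string except "I'm a block!" are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    number: int
    prev_hash: str
    data: str

    def digest(self) -> str:
        """Hash of the entire block: metadata, previous hash and data."""
        payload = str(self.number) + self.prev_hash + self.data
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(items: list[str]) -> list[Block]:
    """Wrap each item in a block that stores the hash of the block before it."""
    chain: list[Block] = []
    prev_hash = "0" * 64  # the genesis block has no predecessor
    for number, data in enumerate(items, start=1):
        block = Block(number, prev_hash, data)
        chain.append(block)
        prev_hash = block.digest()
    return chain


if __name__ == "__main__":
    for block in build_chain(["I'm a block!", "Me too!", "Me three!"]):
        print(block.number, block.prev_hash[:8] + "...", repr(block.data))
```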
Different implementations of blockchains use different amounts of extra metadata, and they bundle up all kinds of data in their blocks, but this is the principle. In the next article we'll go further and look at how we can unlock a huge amount of power in these simple structures by sharing them across large distributed networks.
For now though that's a lot to take in, so let's consolidate things by thinking a bit more about this simple blockchain we've put together. Specifically I want to take you through exactly how it is that including hashes in the chain makes it secure against tampering, as I mentioned above.
Let's roleplay. We work at a bank which has decided to store the record of all of its customers' transactions in a blockchain. They've made this choice because somebody told them that a blockchain allows them to verify that nobody has been naughty and tampered with any individual records. People might want to do this for all sorts of reasons - to send money to themselves, frame somebody else for a crime, whatever. It'd be bad if it was easy.
I want to make clear how the structure of a blockchain that we introduced above helps protect against this kind of mischief. Here's the blockchain that we'll use as an example (from now on I'll leave out the memory pointer arrows in a futile attempt at screen-space efficiency):
Perhaps these blocks are stored in some distributed way - one file per block, say, such that you can have reasonable confidence that a wannabe attacker might only be able to tamper with one or a few blocks of your chain. The detail isn't that important (although, if that confidence seems a little misguided to you, hold that thought - you're going to fit right into the cryptocurrency crowd).
Your job at the bank is to occasionally validate this transaction log to make sure it hasn't been messed with (or to write a program to do it, if you're not up for calculating many hashes with a pen and paper). What's your process?
Start at the beginning of the chain. For each block you come to, you're going to calculate the hash of all the data wrapped up in the previous block and compare it to the "previous hash" stored in the current block. If they match, great! Hashes are very fast to calculate so it's no skin off your back to make this check.
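If you're not up for the pen and paper, the process above can be sketched as a short program. It assumes each block is a (number, previous hash, data) triple and that a block's hash is SHA-256 over those three fields concatenated in that order; the function names and the sample transactions are hypothetical.

```python
import hashlib
from typing import NamedTuple, Optional


class Block(NamedTuple):
    number: int
    prev_hash: str
    data: str


def block_hash(block: Block) -> str:
    payload = str(block.number) + block.prev_hash + block.data
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_chain(items: list[str]) -> list[Block]:
    chain, prev_hash = [], "0" * 64
    for number, data in enumerate(items, start=1):
        chain.append(Block(number, prev_hash, data))
        prev_hash = block_hash(chain[-1])
    return chain


def first_invalid_block(chain: list[Block]) -> Optional[int]:
    """Return the number of the first block whose "previous hash" is wrong, or None."""
    expected = "0" * 64  # the first block has nothing before it
    for block in chain:
        if block.prev_hash != expected:
            return block.number
        expected = block_hash(block)  # what the next block ought to store
    return None


if __name__ == "__main__":
    chain = make_chain(["Alice pays Bob 10", "Ingrid pays Carol 25", "Bob pays Dave 5"])
    print(first_invalid_block(chain))  # None: every link checks out
```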
Now let's imagine I enter the scene. My job at the bank is to be a criminal stooge and try to steal money. I happen to know that Ingrid's filthy rich, so I update block number 2 for my own gain. However, I'm unable to get at any other blocks, so I leave them as they are. Now the chain looks like:
When you next come to validate the chain, what do you find? Block 1 checks out, as it always does. Block 2 checks out - it has the correct hash for block 1. That's weird, right? The bad block itself validates just fine. Block 3, however, very much does not check out. You work out the hash of block 2:
Woah! That's not the previous hash that block 3 claims it knows about! In fact, it's nothing like it at all - this is discontinuity of the hashing function at work. You now know that the chain has been tampered with. You'll need to... I don't know, do whatever a bank would do in this situation - restore the chain from last backups?
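Here's a sketch of that scenario. The transactions are invented, and the hashing scheme (SHA-256 over number + previous hash + data, as UTF-8 text) is an assumption for illustration. It checks every link in the chain after block 2 has been rewritten, and shows that the bad block itself passes while the block after it fails.

```python
import hashlib
from typing import NamedTuple


class Block(NamedTuple):
    number: int
    prev_hash: str
    data: str


def block_hash(block: Block) -> str:
    payload = str(block.number) + block.prev_hash + block.data
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_chain(items: list[str]) -> list[Block]:
    chain, prev_hash = [], "0" * 64
    for number, data in enumerate(items, start=1):
        chain.append(Block(number, prev_hash, data))
        prev_hash = block_hash(chain[-1])
    return chain


def link_checks(chain: list[Block]) -> list[bool]:
    """For each block: does its stored "previous hash" match the block before it?"""
    results, expected = [], "0" * 64
    for block in chain:
        results.append(block.prev_hash == expected)
        expected = block_hash(block)
    return results


if __name__ == "__main__":
    chain = make_chain([
        "Alice pays Bob 10",
        "Ingrid pays Carol 25",
        "Bob pays Dave 5",
        "Carol pays Alice 7",
    ])
    # The attacker rewrites block 2's data but can't touch any other block.
    chain[1] = chain[1]._replace(data="Ingrid pays the stooge 25000")
    print(link_checks(chain))  # [True, True, False, True]
```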
Panic, I guess.
How come it was so important to me earlier on that you understand that each block's "previous hash" is the hash of the entire previous block and not just its data string? To see, just go one step further into your validation process and look at block 4. If you've been good and recalculated block 3's "previous hash" to take account of block 2's changes, then you're going to find that block 4's "previous hash" now no longer matches the hash of block 3! This is because part of block 3 has changed - namely, the "previous hash" referring to block 2.
See how the hash-chaining mechanism of a blockchain means that even a single corrupt or tampered-with block will invalidate the entire chain after it. Cool, right? If each block only had the hash of the previous block's content, but not its metadata too, then in our example block 3 would have told us something was amiss, but block 4 would have checked out as good again. A much weaker state of affairs, I hope you agree.
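To make the contrast concrete, the sketch below builds the same chain twice: once hashing the entire previous block, and once hashing only its data. In each, block 2 is tampered with and block 3's "previous hash" is recalculated to match. All the names and transactions are hypothetical, and the hashing details (concatenation, UTF-8) are assumptions.

```python
import hashlib
from typing import Callable, NamedTuple


class Block(NamedTuple):
    number: int
    prev_hash: str
    data: str


HashFn = Callable[[Block], str]


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def full_hash(block: Block) -> str:
    return sha256_hex(str(block.number) + block.prev_hash + block.data)


def data_only_hash(block: Block) -> str:
    return sha256_hex(block.data)  # the weaker scheme: ignores the metadata


def make_chain(items: list[str], hash_fn: HashFn) -> list[Block]:
    chain, prev_hash = [], "0" * 64
    for number, data in enumerate(items, start=1):
        chain.append(Block(number, prev_hash, data))
        prev_hash = hash_fn(chain[-1])
    return chain


def link_checks(chain: list[Block], hash_fn: HashFn) -> list[bool]:
    results, expected = [], "0" * 64
    for block in chain:
        results.append(block.prev_hash == expected)
        expected = hash_fn(block)
    return results


def tamper_and_patch(chain: list[Block], hash_fn: HashFn) -> list[Block]:
    """Rewrite block 2's data, then recompute block 3's "previous hash" to match."""
    chain = list(chain)
    chain[1] = chain[1]._replace(data="Ingrid pays the stooge 25000")
    chain[2] = chain[2]._replace(prev_hash=hash_fn(chain[1]))
    return chain


if __name__ == "__main__":
    items = ["Alice pays Bob 10", "Ingrid pays Carol 25", "Bob pays Dave 5", "Carol pays Alice 7"]
    for name, hash_fn in [("whole block", full_hash), ("data only", data_only_hash)]:
        chain = tamper_and_patch(make_chain(items, hash_fn), hash_fn)
        print(name, link_checks(chain, hash_fn))
    # whole block [True, True, True, False]
    # data only [True, True, True, True]
```

With whole-block hashes the damage keeps propagating down the chain; with data-only hashes, patching a single neighbouring hash is enough to make the forgery look clean.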
In this part, we've learned about hashes, and the structure of a basic blockchain. We've also taken a look at an only-a-bit-contrived example of how a blockchain can provide security that a more basic list structure can't, using the power of hashes for data integrity checks.
Next time, we'll see how blockchains are being applied to great effect (and huge financial value) in the real world. Starting with the grand-daddy of cryptocurrencies, Bitcoin, we'll learn about proof-of-work and distributed ledgers to find out how you can make a functioning decentralised currency using a blockchain. Then, finally, I'll cover the basics of the even more futuristic smart-contract blockchains, like Ethereum, that are emerging to generalise the concept to more than just currency.
Click the link below for part 2.
Thanks for reading!