Wednesday, July 4, 2007

The next physical wave of the Internet

The first physical wave was all about decentralized connectivity. IMPs and later routers permitted a file to be broken into many blocks or packets and then sent independently to a distant machine. The blocks took different routes and arrived out of sequence, sometimes with duplication or retries, but no matter: they were all stitched together perfectly at the receiving end. The distinguishing feature of this architecture was that no one machine controlled the journeys of these blocks. It didn’t matter if a router broke down or T1 communication lines were severed; the emergent behavior of all those routers was to find a way to get all those packets delivered. It worked in the face of disaster or misconfiguration, and the process of delivery was abstracted completely from the many applications that depended upon it.

The next physical wave is about decentralized storage. We have cheap hard drives, the servers to hold them, and lots of data to protect. The problem is that we manage the data in an old-fashioned centralized way. Napster and its progeny were on the right track, but they were all about sharing: providing access through thousands of copies of a (music) file. That’s no good here. Today’s user demands privacy but wants the same convenience of machine independence.

Visualize this: In the same way that the TCP/IP protocols split up a file into blocks for transfer, let’s do it for storage. We’ll compress the data for efficiency, encrypt it, split it into thousands of anonymous blocks and store them on many different ‘block servers’. The block servers will be like stripped-down web servers: only smart enough to accept a block for storage under a 64-bit number and give it back later when presented with that same 64-bit number. If you break into one of these servers, what will you see? Hundreds of millions of encrypted blocks of exactly the same size, each addressed by one of these numbers.
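To make the ‘block server’ idea concrete, here is a minimal in-memory sketch. It is an illustration only, not the actual product: the class name `BlockServer`, the `put`/`get` methods and the 4096-byte `BLOCK_SIZE` are all assumptions (the post does not give a block size), and the blocks are assumed to arrive already compressed and encrypted by the client.

```python
BLOCK_SIZE = 4096  # assumed block size; the post does not specify one


class BlockServer:
    """Toy block server: stores opaque fixed-size blocks under 64-bit numbers."""

    def __init__(self):
        # address -> block; the server knows nothing about files or owners
        self._blocks = {}

    def put(self, address: int, block: bytes) -> None:
        """Accept a block for storage under a 64-bit address."""
        if not 0 <= address < 2**64:
            raise ValueError("address must fit in 64 bits")
        if len(block) != BLOCK_SIZE:
            raise ValueError("every block must be exactly BLOCK_SIZE bytes")
        self._blocks[address] = block

    def get(self, address: int) -> bytes:
        """Give a block back when presented with the same 64-bit address."""
        return self._blocks[address]
```

Everything the server holds is a mapping from numbers to same-sized byte strings, which is all an intruder would see.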

Next, place some intelligence on the client computers that use this space. When it is time to store a file, software will create those blocks and then send them to the block servers. But how will it decide which block server should be used, and at which 64-bit number on that server each block should be placed? I’m sure that you can dream up strategies to place blocks based on an ascending sequence of addresses on the next available server, but I suspect most of these ideas will require some central authority that regulates where everyone’s blocks must go to prevent conflicts. That’s no good for the next wave. We cannot efficiently grow a centralized storage system without technological (scaling) or political problems. We’ve tried that. It’s not working.

Here’s how we do it: At the instant of storage, create a ‘storage schedule’ for the file on the fly, based on (1) a user’s privately-held encryption key and (2) the complete pathname of the file to be stored. This schedule will be created in a 64-bit number space using ‘one-way’ functions developed over the last 30 years by cryptographers. Store the blocks. Discard the schedule. At some time in the future when you want to retrieve the file, recreate the schedule from (1) the encryption key and (2) the file’s pathname and use it to go to each block server in the list and ask for the particular block.
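A rough sketch of how such a schedule could be derived and used follows. This is not the actual algorithm, which the post does not spell out: HMAC-SHA256 stands in for the ‘one-way’ function, and the names `storage_schedule`, `store` and `retrieve`, the use of plain dictionaries as block servers, and the idea that the client already knows how many blocks the file has are all assumptions made for illustration.

```python
import hashlib
import hmac


def storage_schedule(key: bytes, pathname: str, num_blocks: int, num_servers: int):
    """Derive one (server_index, address) pair per block from key and pathname."""
    schedule = []
    for block_index in range(num_blocks):
        # Mix the block's position into the message so each block gets its own slot.
        message = pathname.encode("utf-8") + b"\x00" + block_index.to_bytes(8, "big")
        digest = hmac.new(key, message, hashlib.sha256).digest()
        server_index = int.from_bytes(digest[:8], "big") % num_servers
        address = int.from_bytes(digest[8:16], "big")  # 64-bit block address
        schedule.append((server_index, address))
    return schedule


def store(servers, key: bytes, pathname: str, blocks):
    """Place each (already encrypted) block; the schedule is never saved anywhere."""
    schedule = storage_schedule(key, pathname, len(blocks), len(servers))
    for (server_index, address), block in zip(schedule, blocks):
        servers[server_index][address] = block


def retrieve(servers, key: bytes, pathname: str, num_blocks: int):
    """Recreate the schedule from key and pathname and fetch the blocks in order."""
    schedule = storage_schedule(key, pathname, num_blocks, len(servers))
    return [servers[server_index][address] for server_index, address in schedule]
```

With `servers = [dict() for _ in range(4)]`, calling `store` and later `retrieve` with the same key and pathname returns the blocks in their original order, while a different key yields a different schedule.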

Let’s think about the ramifications of this technique. First, the ‘one-way’ functions scatter blocks uniformly, so statistically every server receives close to the same number of blocks. Our hardware people will love us because all the equipment is used evenly. Secondly, the blocks of the file are located through a direct numerical calculation, unlike conventional solutions that require two or three database lookups. This eliminates the requirement for expensive IT staff to manage complicated mission-critical database servers. Thirdly, we have a storage algorithm with two variables. If we hold the encryption key constant and vary the file’s pathname, we have a hierarchical file system that depends only on that one encryption key and can grow to any size (as long as we add more block servers when they fill up). If we vary the encryption key but use the same file pathname, the schedule is still unique, so we can have any number of independent file systems co-existing on the same block servers. That gives unbounded scalability.
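The load-balancing and independence claims can be checked with a small experiment. The sketch below again assumes HMAC-SHA256 as a stand-in for the ‘one-way’ function; the helpers `slot` and `server_load`, the file names and the server count are all made up for the test.

```python
import hashlib
import hmac
from collections import Counter


def slot(key: bytes, pathname: str, block_index: int, num_servers: int):
    """Return the (server_index, address) slot for one block of one file."""
    message = pathname.encode("utf-8") + b"\x00" + block_index.to_bytes(8, "big")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    server_index = int.from_bytes(digest[:8], "big") % num_servers
    address = int.from_bytes(digest[8:16], "big")
    return server_index, address


def server_load(key: bytes, pathnames, blocks_per_file: int, num_servers: int) -> Counter:
    """Count how many blocks each server would receive for the given files."""
    load = Counter()
    for pathname in pathnames:
        for block_index in range(blocks_per_file):
            server_index, _ = slot(key, pathname, block_index, num_servers)
            load[server_index] += 1
    return load


if __name__ == "__main__":
    paths = [f"/data/file{i}" for i in range(500)]
    load = server_load(b"key A", paths, blocks_per_file=100, num_servers=10)
    for server_index in sorted(load):
        print(server_index, load[server_index])
```

Running it for 500 hypothetical files of 100 blocks each over 10 servers should show every server holding close to 5,000 blocks. Swapping in a second key with the same pathnames produces a different set of slots, which is what lets independent file systems share the same servers.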

The real issue in everyone’s mind, however, is privacy. Why would I place my personal data on someone else’s server? Why would I trust someone to hold my data? Let’s analyze the security of this architecture. With conventional technology, looking for customer data is a bit like breaking into a bank. Once the thief gets through the ‘door’ by hacking through the security or bribing the sys admin, the file system is laid out before them and they can easily get to the specific ‘safety-deposit box’ or file of a customer. Hackers can then use their formidable skills and resources (e.g. botnets) to try to ‘break open the box’ or crack the encryption of that file. Now consider our system. We let them into the safe. A hacker can ask any block server for a block by specifying a 64-bit number. The problem they face is that they don’t know where to look. To rebuild a file without the encryption key that created the schedule, the hacker has to make 2^63 (about 9 billion billion) guesses at a hundred different servers, decrypt each block and reassemble them in the correct order. This is like searching 100 haystacks to find a specific blade of grass: unlike a needle, the right block does not look any different from its neighbors. Such an attack on a whole file would take longer than the age of the universe.
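For a feel of the numbers, the guess count can be worked through directly. The sketch below assumes that finding one block means searching, on average, half of the 64-bit address space (2^63 addresses) on each of 100 servers, and that the attacker can issue one million requests per second. Both the function names and the request rate are illustrative assumptions, not measurements.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def expected_guesses(num_servers: int, address_bits: int = 64) -> int:
    """Average number of guesses to hit one block: half the address space per server."""
    return num_servers * 2 ** (address_bits - 1)


def expected_years(num_servers: int, guesses_per_second: float) -> float:
    """Time to make that many guesses at an assumed, constant request rate."""
    return expected_guesses(num_servers) / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(2**63)                            # 9223372036854775808
    print(expected_guesses(100))            # 100 * 2**63
    print(expected_years(100, 1_000_000))   # years for one block at 1M requests/s
```

At the assumed rate this comes to roughly 29 million years for a single block, and the figure scales inversely with the request rate. The whole-file figure is far larger, since every block has to be found and no block can be recognized as correct on its own.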

With this technology it is finally possible to create the user-centric Internet of storage. It is possible to place all data into a distributed and homogeneous store of anonymous blocks with complete privacy for all participants. The protection of data will no longer require the machine-centric point of view of the past and users will comfortably store their data ‘on the net’ in complete confidence. It only makes sense.

3 comments:

cstuckless said...

Hi Tom. I saw the article in the Business Post that led me here. Nice to catch up on what you've been up to since the MI days.

Your distributed storage mechanism is very interesting and looks very sound based on what I've read here. What's your approach to key management? If there's a chink in the armour or one thing that keeps you up at night, I imagine key management has to be near the top of the list?

All the best - may your current and future endeavours be both rewarding and successful.

Colin

Tom Chalker said...

Hi Colin:

Thanks for the note. Your post motivated me to finally post a fresh blog. Thanks for the nudge.

In the meantime, we have thought long and hard about key escrow. What we have come up with is a hierarchical scheme of key storage areas that are based on physical tokens - actually special purpose USB drives that hold keys and executables.

We have a white paper on our site that describes this: http://www.datasentinel.com/papers/Pixecur-10290-1-1%20Security%20Model.pdf
I'd love to hear your thoughts and criticisms on the ideas. Let me know what you think. Maybe I can turn our discussion into a blog as well.

Cheers,
Tom
