Tuesday, April 24, 2007

Just what is a file system, anyway?

What is a file system? Look at the C: drive on your PC, or the F: drive that shows up when you're in the office. In the simplest terms they are both just a database managing a bunch of blocks that reside on something that has persistent magnetic, electrostatic (USB drive) or optical properties (CD-ROM, DVD). If you peel back the covers on the 'device driver' that controls your hard-drive, you see that it is just an optimized database that maps filenames to specific blocks on the drive. Some of those filenames represent directories, so the contents of those directory blocks are just another layer of this database: more filenames pointing to other blocks of data. The end result is a hierarchical file system that abstracts variable-length files into nested directories by hiding that database implementation.
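To make the "database of blocks" idea concrete, here is a minimal sketch in Python. It is a toy model and not how any real file system driver is written: the names ToyDisk and ToyFileSystem and the 16-byte block size are all made up for illustration. It only shows that a directory can be a table of names pointing at block numbers, and that a file can be a list of fixed-size blocks.

```python
BLOCK_SIZE = 16  # hypothetical block size, tiny so the example is easy to follow


class ToyDisk:
    """A 'disk' is just a numbered list of blocks."""

    def __init__(self):
        self.blocks = []  # index in this list is the block number

    def allocate(self, contents):
        """Store contents in a new block and return its block number."""
        self.blocks.append(contents)
        return len(self.blocks) - 1


class ToyFileSystem:
    """Directories are blocks that hold a name -> (kind, target) table."""

    def __init__(self):
        self.disk = ToyDisk()
        self.root = self.disk.allocate({})  # the root directory's block

    def _split(self, path):
        """Return (parent directory names, final name) for a path."""
        *parents, name = path.strip("/").split("/")
        return parents, name

    def _walk(self, parents):
        """Follow directory names from the root; return a directory block number."""
        block = self.root
        for name in parents:
            kind, target = self.disk.blocks[block][name]
            if kind != "dir":
                raise NotADirectoryError(name)
            block = target
        return block

    def mkdir(self, path):
        parents, name = self._split(path)
        parent = self._walk(parents)
        self.disk.blocks[parent][name] = ("dir", self.disk.allocate({}))

    def write(self, path, data):
        parents, name = self._split(path)
        parent = self._walk(parents)
        # Variable-length data is cut into fixed-size blocks.
        chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        numbers = [self.disk.allocate(chunk) for chunk in chunks]
        self.disk.blocks[parent][name] = ("file", numbers)

    def read(self, path):
        parents, name = self._split(path)
        parent = self._walk(parents)
        kind, numbers = self.disk.blocks[parent][name]
        if kind != "file":
            raise IsADirectoryError(name)
        return b"".join(self.disk.blocks[n] for n in numbers)


# Usage: the caller sees paths and bytes, never block numbers.
fs = ToyFileSystem()
fs.mkdir("/docs")
fs.write("/docs/note.txt", b"a note that is longer than one block")
```

The caller only ever deals in paths and bytes. The block numbers stay hidden inside the tables, which is the abstraction described above.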

Now take another step back and look at your mapped drive on F:. What is that? Another database. A little different, perhaps. It's a mapping of your username, password and group memberships with a smattering of security settings to a specific directory on a certain hard-drive housed in a file server machine in your office. It looks like a part of the directory system on a hard-drive because it is just that. The only difference is that the hard-drive is in a different computer and there is a conversation between your computer and the file server to deliver the files when you want them. Again, the principles of abstraction insist that the details of this database are hidden to simplify the maintenance of this remote file system.

Let's take one more step. Let's look at a corporate-wide file system or perhaps the storage system of an on-line provider. How are they built? They're too large to be formed by simple clusters of Windows or Linux file servers. They're made from an alphabet soup of SAN, DAS and NAS and they run a complex application called File Virtualization. What's that? Well, it's another database. This one has the much richer content of implementation detail for each of the component storage pieces. Things like capacity, speed, physical location, maintenance history. Like the layers below, it will map the credentials of a user to entry points in the storage devices where the user's files will go. It's smart enough to shift user files around to ensure all of the storage devices are used efficiently, manage disk-to-disk and disk-to-tape backup procedures and presumably it can react when a sub-system fails.
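Here is a rough sketch of what that top-level database might look like, again in Python. Everything in it is an assumption made for illustration: the Device and VirtualizationCatalog names, the fields, and the "put the file on the device with the most free space" rule are mine, not taken from any real File Virtualization product. Backups, rebalancing and failure handling are left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """One storage component as the catalog sees it (hypothetical fields)."""
    name: str
    capacity_gb: int
    used_gb: int = 0
    location: str = "unknown"

    @property
    def free_gb(self):
        return self.capacity_gb - self.used_gb


class VirtualizationCatalog:
    """Maps (user, filename) to the device that holds the file."""

    def __init__(self, devices):
        self.devices = {d.name: d for d in devices}
        self.placement = {}  # (user, filename) -> (device name, size in GB)

    def store(self, user, filename, size_gb):
        """Place a new file on the device with the most free space."""
        key = (user, filename)
        if key in self.placement:
            raise FileExistsError(f"{user} already has a file named {filename}")
        device = max(self.devices.values(), key=lambda d: d.free_gb)
        if device.free_gb < size_gb:
            raise RuntimeError("no device has enough free space")
        device.used_gb += size_gb
        self.placement[key] = (device.name, size_gb)
        return device.name

    def locate(self, user, filename):
        """Answer the only question the user cares about: where is my file?"""
        return self.placement[(user, filename)][0]


# Usage with two made-up devices.
catalog = VirtualizationCatalog([
    Device("nas1", capacity_gb=100, location="office"),
    Device("san1", capacity_gb=500, location="datacenter"),
])
catalog.store("alice", "video.avi", 50)
catalog.store("bob", "archive.tar", 420)
catalog.store("carol", "photos.zip", 60)
```

Notice that the placement table is one more piece of state that has to be kept consistent and protected, which is where the trouble starts.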

This all makes good sense. What we have here is a set of layers of storage that have evolved over time to compensate for the weaknesses of the earlier layers. A single hard-drive is not big enough to meet the needs of an office. A file server does not have enough of its own drives to provide space for a large corporation. It is too much work for an IT staff to make sure that all the storage devices are used at capacity and are properly protected.

But there's a complexity problem here. Each of the layers is progressively more complicated. Each layer's 'database' is a single point of failure that must be protected through strategies of multiple redundant copies. A breakdown at any point in this chain may lead to an interruption of service while a particular database is restored from a backup. Highly qualified IT professionals who understand the complicated software must be on hand to monitor these processes and deal with conflicting circumstances that might pollute one of these databases as it is re-integrated with the real-time data. In short, it is expensive and it is fragile.

What about a fresh look at file storage? By changing the fundamental architecture of file data storage it is possible to replace the complexity of all those layers with one simple layer. This permits a massive file system that supports unlimited numbers of private collections of files across any number of cheap server appliances and does not require any databases at all. Would you as a consumer of storage space be interested in a file system that intrinsically self-balances, grows organically as need permits and does not require a backup procedure to protect data?

Impossible? No, it's not. Keep reading and you'll see how one simple algorithm can make all file storage simple. It's all about thinking outside the box.

Sunday, April 22, 2007

Mental Models

How many discrete objects are in your home? Try to estimate the pieces of furniture, appliances, entertainment devices, books, DVDs, ornaments. Then drill down even deeper. How many dishes, pictures on walls, pens in drawers, toothbrushes, paper documents and sewing needles can you add to that number. I bet there are tens of thousands of distinct things in your home that you could put your hands on within 60 seconds of searching.

Now turn to your computer. How many separate files are on it? If you're a developer with ten years of history, it might number in the high end of the thousands. The average user will have fewer. How long would it take you (or them) to find a specific file? Even with Vista's improved searching, it could take tens of minutes. In fact, it might prove impossible to find that file at all.

Why is that? It's because your brain has been wired through eons of evolution to work in three-dimensional space. You remember things in a 3D context, and you learn the geometry of your own home because you spend so much time navigating through it. In contrast, a computer directory is a two-dimensional hierarchy of words. It's completely alien to your evolutionary past and to work with it you have to develop new skills. Up until you saw your first files-view, you had never spoken that way or thought that way.

In Medieval times before paper was prevalent and when most people couldn't read, the scholars used a technique called Memory Theatre to remember lots of unrelated pieces of information. People would imagine themselves walking through a large cathedral and they would make associations between objects encountered in such a walk with what they wanted to remember. Later they would retrace the walk to recall all of the items. James Burke does a brilliant recounting of this in the "Matter of Fact" episode of his "The Day the Universe Changed" series.

Wouldn't it be cool if we could walk through our own home and find all of our computer files on the walls and in the drawers?

Wednesday, April 4, 2007

Wireless doesn't mean tiny

I was a big proponent of the J2ME environment from Sun for wireless devices. Seemed wise. Tailor the run-time environment to fit in the small memory spaces of cellphones and Personal Digital Assistants or PDAs (does anyone still call them that anymore?). Then you can write code that is sort of like the code you'd write for the workstations and laptops and it would work.

Well, it did, within reason, but ultimately it was just another fork in the development of applications. You see, technology relentlessly advances. Processors are still getting smaller, memories continue to get bigger. When you look at a handheld device you have to ask what it is that is being constrained by that physical size and what will continue to be constrained in the near future.

Sure, you say, I know: power. The processor can only run so fast without draining the battery. OK, I agree. So why write applications that fit in small memory spaces? That's what J2ME is: a stripped-down version of the run-time that fits in a few megabytes. You can buy a 128 GByte USB stick for 10 dollars. Assume the new handhelds will have this space and skate to where the action will be.

The biggest problem with J2ME is that it tries to replace the underlying operating system. Here's a better idea. Put a real OS in there. Linux will fit in the memory space of those devices nicely because you can create a distribution that only has what's needed for that device without compromising the pieces that remain. That isn't possible with a monolithic operating system. Sure, you can create a smaller version of a big OS for these devices, but that's just another development fork, just like J2ME was. Guess what. That's exactly what Apple did with their iPhone.

It will happen that way. The economics will dictate that Linux will ultimately run most of your handheld devices. The scary thing is that once you get used to these devices and the applications that run on them, you won't need the bloated operating systems that run the workstations. You will start using those machines just as dumb terminals for the devices in your hands. You'll have figured out how to do your word processing on these gadgets and you'll think "if I just had a bigger screen to do this, I'd be OK. Easier than learning a new package...". There's a very smart man called Clayton Christensen who sees this happening. Check out his Podcast.

Shhh. Does anybody hear a dynasty crumbling?