Tuesday, April 24, 2007

Just what is a file system, anyway?

What is a file system? Look at the C: drive on your PC, or the F: drive that shows up when you're in the office. In the simplest terms, each is just a database managing a bunch of blocks that reside on something with persistent magnetic (hard drive), electrostatic (USB drive) or optical (CD-ROM, DVD) properties. If you peel back the covers on the file system driver that controls your hard drive, you see that it is just an optimized database that maps filenames to specific blocks on the drive. Some of those filenames represent directories, so the contents of those directory blocks are just another layer of this database: more filenames pointing to other blocks of data. The end result is a hierarchical file system that organizes variable-length files into nested directories while hiding the database implementation.
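To make the "it's just a database" idea concrete, here is a minimal sketch in Python. It is a toy model, not how any real file system lays out its data: the `ToyDisk` class, the 4-byte `BLOCK_SIZE` and the use of nested dictionaries for directories are all assumptions made purely for illustration.

```python
# Toy model (illustrative assumption, not a real on-disk format):
# the "disk" is a list of fixed-size blocks, and each directory is a
# table mapping a name to either a list of block numbers (a file)
# or another table (a sub-directory).
BLOCK_SIZE = 4  # hypothetical, tiny on purpose


class ToyDisk:
    def __init__(self):
        self.blocks = []  # the storage itself, one entry per block
        self.root = {}    # root directory: name -> entry

    def write_file(self, path, data):
        """Split data into blocks and record their numbers under path."""
        *dirs, name = path.strip("/").split("/")
        directory = self.root
        for d in dirs:
            # Walk down the directory tables, creating them as needed.
            directory = directory.setdefault(d, {})
        block_numbers = []
        for i in range(0, len(data), BLOCK_SIZE):
            self.blocks.append(data[i:i + BLOCK_SIZE])
            block_numbers.append(len(self.blocks) - 1)
        directory[name] = block_numbers

    def read_file(self, path):
        """Look the name up in the tables, then fetch its blocks in order."""
        *dirs, name = path.strip("/").split("/")
        directory = self.root
        for d in dirs:
            directory = directory[d]
        return b"".join(self.blocks[n] for n in directory[name])


disk = ToyDisk()
disk.write_file("/docs/notes.txt", b"hello world")
print(disk.root)                          # the "database"
print(disk.read_file("/docs/notes.txt"))  # the file, reassembled
```

The caller only ever sees paths and file contents; the block numbers stay hidden behind the lookup, which is the abstraction described above.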

Now take another step back and look at your mapped drive on F:. What is that? Another database. A little different, perhaps. It's a mapping of your username, password and group memberships with a smattering of security settings to a specific directory on a certain hard-drive housed in a file server machine in your office. It looks like a part of the directory system on a hard-drive because it is just that. The only difference is that the hard-drive is in a different computer and there is a conversation between your computer and the file server to deliver the files when you want them. Again, the principles of abstraction insist that the details of this database are hidden to simplify the maintenance of this remote file system.
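The mapped drive can be sketched the same way. This is only an illustration of the lookup, assuming a hypothetical share table: the server name `fileserver01`, the directory `/srv/office` and the `staff` group are made up, and a real file server handles authentication and transport in far more detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Share:
    server: str                # machine that houses the hard drive
    directory: str             # directory on that machine
    allowed_groups: frozenset  # groups permitted to use the share


# Hypothetical share table: drive letter -> where it really lives.
SHARES = {
    "F:": Share("fileserver01", "/srv/office", frozenset({"staff"})),
}


def resolve(drive, relative_path, user_groups):
    """Map a path on a mapped drive to (server, path on that server)."""
    share = SHARES[drive]
    if not share.allowed_groups & set(user_groups):
        raise PermissionError(f"no access to {drive}")
    remote_path = share.directory + "/" + relative_path.replace("\\", "/")
    return share.server, remote_path


print(resolve("F:", "reports\\q1.xls", ["staff"]))
```

Once the lookup succeeds, everything else is the conversation between your computer and the file server.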

Let's take one more step. Let's look at a corporate-wide file system or perhaps the storage system of an on-line provider. How are they built? They're too large to be formed by simple clusters of Windows or Linux file servers. They're made from an alphabet soup of SAN, DAS and NAS and they run a complex application called File Virtualization. What's that? Well, it's another database. This one holds much richer implementation detail for each of the component storage pieces: things like capacity, speed, physical location and maintenance history. Like the layers below, it maps the credentials of a user to entry points in the storage devices where the user's files will go. It's smart enough to shift user files around to ensure all of the storage devices are used efficiently, to manage disk-to-disk and disk-to-tape backup procedures, and presumably to react when a sub-system fails.
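As a rough sketch of what such a layer keeps track of, the Python below models a catalog of storage devices and a placement rule that sends each new file to the least-utilized device with enough room. The `VirtualStore` class, the device names and the placement rule are assumptions for illustration only; commercial file virtualization products are far more elaborate.

```python
class VirtualStore:
    """Toy catalog of storage devices plus a file-location table."""

    def __init__(self, devices):
        # devices: name -> capacity (arbitrary units, hypothetical values)
        self.capacity = dict(devices)
        self.used = {name: 0 for name in devices}
        self.location = {}  # (user, filename) -> device name

    def place(self, user, filename, size):
        """Put the file on the least-utilized device that has room."""
        candidates = [
            d for d in self.capacity
            if self.capacity[d] - self.used[d] >= size
        ]
        if not candidates:
            raise OSError("no device has room for this file")
        device = min(candidates, key=lambda d: self.used[d] / self.capacity[d])
        self.used[device] += size
        self.location[(user, filename)] = device
        return device


store = VirtualStore({"san1": 100, "nas1": 50})
for user, name, size in [("alice", "a.doc", 40), ("bob", "b.doc", 10)]:
    print(user, name, "->", store.place(user, name, size))
```

Notice that `location` is yet another table that has to be kept safe: lose it and the files are still on the devices, but nobody knows where.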

This all makes good sense. What we have here is a set of layers of storage that have evolved over time to compensate for the weaknesses of the earlier layers. A single hard-drive is not big enough to meet the needs of an office. A file server does not have enough of its own drives to provide space for a large corporation. It is too much work for an IT staff to make sure that all the storage devices are used at capacity and are properly protected.

But there's a complexity problem here. Each of the layers is progressively more complicated. Each layer's 'database' is a single point of failure that must be protected by keeping multiple redundant copies. A breakdown at any point in this chain may lead to an interruption of service while a particular database is restored from a backup. Highly qualified IT professionals who understand the complicated software must be on hand to monitor these processes and to resolve conflicts that might pollute one of these databases as it is re-integrated with the real-time data. In short, it is expensive and it is fragile.

What about a fresh look at file storage? By changing the fundamental architecture of file data storage it is possible to replace the complexity of all those layers with one simple layer. This permits a massive file system that supports unlimited numbers of private collections of files across any number of cheap server appliances and does not require any databases at all. Would you, as a consumer of storage space, be interested in a file system that intrinsically self-balances, grows organically as need demands and does not require a backup procedure to protect data?

Impossible? No, it's not. Keep reading and you'll see how one simple algorithm can make all file storage simple. It's all about thinking outside the box.
