The Death of File Systems
by Jakob Nielsen, February 1996
Relax, oh Nerdy Reader: I am not going to take away your beloved
file-system APIs. Here I am talking about what the user
experiences, not how we provide that experience. The file system
has been a trusted part of most computers for many years, and will likely
continue as such in operating systems for many more. However, several
emerging trends in user interfaces indicate that the basic file-system
model is inadequate to fully satisfy the needs of new users, despite the
flexibility of the underlying code and data structures.
There is no need for users to know how their information is
stored inside the guts of the computer. Indeed, the notion of a
continuous file is itself an abstraction: It masks the fact that the
information is normally stored on noncontiguous sectors of the hard disk.
From a user perspective, current file systems are based on three
assumptions:
- Information is partitioned into coherent and disjunct
units, each of which is treated as a separate object (file). Users
typically manipulate information via a file and are restricted to being "in"
one file at a time.
- Information objects are classified according to a single
hierarchy: the subdirectory structure.
- Each information object is given a single, semiunique
name, which is fixed. This file name is the main way users
access information inside the object.
Window systems have made these assumptions less intolerable, but they still
exist. Modern computing, particularly the Internet, is further undermining
these assumptions in several ways.
Single Units
Before the Internet, printed output supported a canonical representation of
most information objects; the goal of computers was to deliver a
WYSIWYG identity mapping between content and presentation.
In modern user interfaces, information objects often have multiple
presentations and units are combined in multiple ways for
different users and tasks. For example, nomadic users might want to
retrieve e-mail on PDAs or even by voice synthesis over the telephone. To
allow this, the presentation must be significantly briefer than that for a
large workstation screen.
On the Internet, a typical Web page consists of a text file and one or
more image files that are not combined until the page is displayed by the
browser. Even the atomic component objects may not always map to individual
files. For example, an image may exist as both a GIF and a JPEG file; the
version the server ships to the browser is determined by content
negotiation.
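Content negotiation of this kind can be sketched in a few lines of Python. This is a minimal illustration, assuming the browser states its preferences in an HTTP `Accept` header with optional `q` quality values; the helper names `parse_accept` and `negotiate_image` are hypothetical, and a real server would follow the full HTTP rules rather than this simplified parser.

```python
from typing import Optional


def parse_accept(header: str) -> dict[str, float]:
    """Map each media range in an Accept header to its quality value (q)."""
    prefs: dict[str, float] = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # HTTP default when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media] = q
    return prefs


def negotiate_image(accept_header: str, available: list[str]) -> Optional[str]:
    """Return the available image type the browser rates highest, or None."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media in available:
        major = media.split("/")[0]
        # An exact match beats "image/*", which beats "*/*".
        q = prefs.get(media, prefs.get(f"{major}/*", prefs.get("*/*", 0.0)))
        if q > best_q:  # ties keep the server's earlier (preferred) entry
            best, best_q = media, q
    return best
```

For example, `negotiate_image("image/jpeg;q=0.9, image/gif;q=0.5", ["image/gif", "image/jpeg"])` returns `"image/jpeg"`. The user sees one picture either way, not two files.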
Some ways to improve the Web user experience will complicate the issue
further: the GIF and JPEG image "files" may not exist as such in the file
system, but might be generated on demand from an underlying image
representation with parameters such as compression, lossiness, and
color-map depth determined dynamically by available bandwidth and other
considerations. For example, if you download a Web page in your office
using a direct Internet link, the pictures may arrive as large, beautiful,
24-bit color images. Should you download the same page at home using a slow
modem, the page will arrive with small, coarse, black-and-white images.
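The office-versus-modem scenario amounts to a function from available bandwidth to rendering parameters. The sketch below assumes the bandwidth is already known in kilobits per second; the `ImageParams` fields and the thresholds are hypothetical values chosen for illustration, not figures from any real server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageParams:
    max_width: int    # pixels
    color_depth: int  # bits per pixel; 1 means black-and-white
    quality: int      # 1-100 for a lossy encoder; lower means coarser


def choose_image_params(bandwidth_kbps: float) -> ImageParams:
    """Pick rendering parameters for one image (illustrative thresholds only)."""
    if bandwidth_kbps >= 1000:  # e.g. a direct office link
        return ImageParams(max_width=1024, color_depth=24, quality=90)
    if bandwidth_kbps >= 100:
        return ImageParams(max_width=640, color_depth=8, quality=70)
    # e.g. a slow home modem: small, coarse, black-and-white
    return ImageParams(max_width=320, color_depth=1, quality=40)
```

The stored "image" is then whatever underlying representation feeds the encoder; the GIF or JPEG the user receives exists only for the duration of one request.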
Off-Line Issues
Even on stand-alone PCs, the file model is falling apart. Consider, for
example, the task of installing a new application. A simple, file-based
user interface would have you drag the application icon from the place it
is stored to the place where it will be needed. These days, however, an
application is rarely satisfied with a single file: typically, installation
litters the system with numerous subsidiary files, preference and
configuration files, initialization files, and so on down the laundry list
-- all stored in obscure directories of little relevance to the average
user.
Using the file system as a user interface to install and copy applications
has caused users many painful hours. In response, vendors have provided
special installer and uninstaller utilities with their software. This led
to a profusion of installers and uninstallers -- and to inconsistency and
extra work for the user trying to keep track of all the additional
utilities.
Unit To Units
Not only does a single information unit often map to multiple files; it can
also contain multiple information units that should be treated
differently in the user interface. The e-mail inbox, for example,
should definitely be treated as a multiplicity of message objects. On a Web
page, it is sometimes useful for the HTML file to contain both the visible
information shown to the user and other information that is used for other
purposes. Our server, for example, has more than 20,000 Web pages, so we
decided to add metainformation to each file -- the e-mail address of the
person responsible for the file. For performance reasons, this information
is stored as a comment field in the file itself, even though it is a
separate information object. If I changed my e-mail address because I moved
to a different domain within the company, the user interface should allow
me to update the e-mail address associated with all my Web pages with a
single operation.
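Behind such a single operation there could be something like the following script. It is a minimal sketch, assuming a hypothetical comment convention, `<!-- owner: address -->`, and pages stored as `.html` files under one root directory; the function name `update_owner` is likewise illustrative.

```python
import re
from pathlib import Path

# Hypothetical convention for the per-page metainformation comment.
OWNER_COMMENT = re.compile(r"<!--\s*owner:\s*(\S+?)\s*-->")


def update_owner(root: Path, old_email: str, new_email: str) -> int:
    """Rewrite the owner comment in every .html file under root; return files changed."""

    def swap(match: re.Match) -> str:
        # Replace only comments that name the old address.
        if match.group(1) == old_email:
            return f"<!-- owner: {new_email} -->"
        return match.group(0)

    changed = 0
    for page in root.rglob("*.html"):
        text = page.read_text(encoding="utf-8")
        new_text = OWNER_COMMENT.sub(swap, text)
        if new_text != text:
            page.write_text(new_text, encoding="utf-8")
            changed += 1
    return changed
```

The script still edits files one by one; the user, however, acts once on one information object, the owner address.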
Single Trees
File systems are structured as strict hierarchies of
directories and subdirectories. For users, however, the same
information unit often has multiple classifications. A
corporate logo might appear on several Web pages, and thus would belong to
several page objects across the server hierarchy. I often produce
presentations (implemented as a small set of files per presentation) with
slides that include screen shots, Web pages, or other designs I work on. A
specific image might thus be classified as part of the "AnswerBook"
development project as well as the "Singapore keynote" presentation
project. In either case, if I change one object, I want all occurrences of
the object to change. This should be true even if I changed the look --
expanding a slide to 400 percent in one case, reducing it in another: it's
still the same information even if presented with a different look.
Remember, WYSIWYG is tired; enriched representation is wired.
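One way to model multiple classification is to let collections hold references to a shared object rather than copies, with the per-use presentation (such as scale) stored on the reference. The Python sketch below illustrates that assumption; the class names are hypothetical and the image data is a placeholder string.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    name: str
    pixels: str  # placeholder for the real image data


@dataclass
class Placement:
    image: Image  # a reference to the shared object, not a copy
    scale: float  # per-use presentation, e.g. 4.0 for 400 percent


@dataclass
class Collection:
    title: str
    items: list[Placement] = field(default_factory=list)


shot = Image("screen shot", pixels="v1")
answerbook = Collection("AnswerBook")
keynote = Collection("Singapore keynote")
answerbook.items.append(Placement(shot, scale=4.0))  # expanded in one place
keynote.items.append(Placement(shot, scale=0.5))     # reduced in another

shot.pixels = "v2"  # edit the object once...
# ...and both collections now show the updated image, each at its own scale.
```

The image belongs to two classifications at once, which a single directory tree cannot express without duplicating the file.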
Hypertext links are the classic case of breaking up file hierarchies.
Indeed, the very name comes from the fact that they form an n-dimensional
hyperspace. Users are notoriously incapable of understanding large
hierarchies, which is why cross-references and other hypertext links are so
useful on the Web. Anybody who has tried to find something in the Yellow
Pages knows how difficult it is to navigate somebody else's classification
structure. If you want to buy a steak, do you look under B for butchers?
No. How about S for steaks? Not quite. Try M for meat, retail. Butcher's
supplies, however, are under B! (Your Yellow Pages directory may
use a different classification. Indeed, the fact that there is no single
classification scheme in the real world is a further example of the
problem.)
File Names
Currently, files are represented in the user interface by their name and a
few additional attributes (mainly data types illustrated by icons). File
names are problematic user-interface primitives for several reasons. First,
users rarely generate good file names, even in systems
that allow long ones. In general, users don't like to type, have limited
creativity in thinking up good names, and are hit by what I call the
"premature classification problem": the name normally has to be generated
long before the content is created, and thus users may not fully understand
what they're naming. Second, users often have difficulty
recognizing a name and remembering what it stands for, especially
when many similar names are in use. Finally, when numerous information
objects are in use, users sometimes have to type a name rather than simply
recognizing it from a list. Not only is name-typing error prone, but users
often don't remember the exact name and directory path they need to
retrieve a certain information object.
In addition to these practical problems, named references are
fundamentally unsuitable for accessing information in a system
laden with it. Users often don't know exactly what they are looking for.
They have no way of peeking inside a file without opening it up and paying
the ensuing penalty in terms of performance and screen space. Hypertext
links are fundamentally content-driven and context-sensitive: if well
designed, they provide a preview of the content and get users to it without
revealing where or how it is stored. URLs, on the other hand, are poorly
designed file names.
The technology needed to create more flexible information interfaces will
certainly include object-storage mechanisms and compound-document
architectures, although exactly how they will be structured is not yet
clear. What is clear is that we should stop presenting computer and network
information storage as one icon per file and start visualizing the logical
structure of the information.