NTTR Preliminary Workshop 1: Understanding File Systems

Rafael Alvarado


Contents

What happened to my file?
Computers, operating systems and file systems
The file system and its structure
Interacting with the file system in a GUI
Interacting with the file system from a command-line
Networks
Summary
Useful links


What happened to my file?

Have you ever wondered, after creating and saving a file on a computer with which you are not familiar, or after downloading a file from the web, what happened to your file?  Are you unfamiliar with the expressions client, server, FTP, Telnet, and local area network?  If so, this tutorial is for you. The purpose is to teach you just enough about computer hardware to understand some of the general principles that govern the way software behaves.  With a grasp of these principles, you will be able to learn how to use programs more quickly and you will be able to move more freely between operating systems, such as Unix, Windows NT, and MacOS.  You will also feel less lost when working in a networked environment, such as the Internet.  Because Unix and Windows NT are unfamiliar to many users, this tutorial and the corresponding workshop will focus on them.

This tutorial is designed to complement the actual workshop that will take place on Monday, May 25th. Here you will be introduced to general principles; there, you will learn how to apply the principles in the use of specific pieces of software, such as the Unix command-line via Telnet, the Windows NT Explorer, and the Common Dialog Box found in Windows NT applications.

Please note that the word "Windows" is used herein to refer to Windows 95 and Windows NT 4.0, and not to Windows 3.1 and its variants.  The word "DOS" is used to refer to the command-line interface offered by Windows 95 and Windows NT 4.0 even though, technically speaking, Windows NT is a completely different operating system from DOS per se.

Computers, operating systems and file systems

A computer is a machine for receiving, processing, and delivering information.  Information is received through an input device, such as a keyboard or image scanner, and is delivered through an output device, such as a monitor or a printer.  Of course what distinguishes a computer from a typewriter or a camera is the fact that it also processes information between the phases of input and output.   For example, in using a word processor, characters are passed from the keyboard to the computer's processor.  The processor, according to the rules specified by a program such as WordPerfect, combines that character data with all of the other information about your document that you entered from the keyboard or mouse, through clicking buttons, checking boxes, and filling in text boxes.   The sum of all of the commands and text that you have entered, and that the processor has processed, is packaged as a file and stored in the computer in a storage device of some kind, such as a hard disk or a floppy disk.

Figure 1

Never mind how all of this processing actually works; suffice it to say that it is quite complex.  All you need to know is that the information you entered from the keyboard and mouse, and the word processing program processed, is always stored in a file and that all files have a physical location on a disk somewhere.  Now it is one of the primary roles of the operating system to keep track of all of your files (in addition to many others with which you have no direct concern).   All computers have an operating system, which is a program that orchestrates the communication between input, output, storage, processing and other devices.  Among its main functions is to manage the file system, the system by which files are organized so that they can be created, destroyed, located, moved and copied by the user or the computer.

The file system and its structure

A file system is actually a pretty complicated beast.  Fortunately,  you never interact with it directly.  Instead, you access it and control it by means of the user interface.  From a user's perspective, the user interface is what distinguishes one computer platform from another.  For example, a Macintosh uses what is called a graphical user interface, or "GUI," which consists of icons for files and folders and allows the user to perform most functions by mouse.  A Unix machine, in contrast, employs a textual, or command-line interface, with which most functions are performed using short, cryptic commands from the keyboard.  As the slash indicates, DOS/Windows is a kind of synthesis -- or compromise -- between these two approaches, since it offers both a graphical and command-line interface to the file system.

Each type of interface has its advantages and is suited to various purposes.  A GUI is easier to use, because you don't need to memorize written commands to accomplish basic tasks.  If you want to move a file, you just move it with your mouse from one folder to another.   This approach allows computer non-experts to easily get up to speed in controlling the file system so that they can concentrate on their work, not the computer.  The great advantage of the command-line approach is that its commands are easily controlled by scripts, which are small programs that can run numerous and complex operating system commands automatically.   For example, to move a thousand files from various locations to a single new location would be a very tedious task with a mouse.  In Unix you can do this with a single line of code.  Such control is a great advantage to system administrators of computers that run email and web accounts for thousands of users.  In addition, a command-line system is much faster once the user memorizes its commands.
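To give a feel for what such a script looks like, here is a minimal sketch in Python rather than in the Unix shell itself.  It is an illustration only: the function name gather_files, the folder names, and the "*.txt" pattern are assumptions made for this example, and it assumes no two files share the same name.

```python
import shutil
import tempfile
from pathlib import Path


def gather_files(source_root, destination, pattern="*.txt"):
    """Move every file under source_root that matches pattern into destination.

    Assumes file names are unique; a duplicate name would collide.
    """
    source_root = Path(source_root)
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    moved = []
    # rglob searches source_root and all of its subdirectories;
    # sorted() collects the matches before anything is moved.
    for path in sorted(source_root.rglob(pattern)):
        if path.is_file() and destination not in path.parents:
            target = destination / path.name
            shutil.move(str(path), str(target))
            moved.append(target.name)
    return moved


def demo():
    """Build a few scattered files in a temporary directory and gather them."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("a/one.txt", "b/c/two.txt", "b/skip.doc"):
            scattered = root / "scattered" / name
            scattered.parent.mkdir(parents=True, exist_ok=True)
            scattered.write_text("x")
        return gather_files(root / "scattered", root / "collected")
```

Calling demo() moves the two .txt files and leaves skip.doc where it was.  The same loop handles two files or two thousand, which is the point of scripting.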

In spite of these differences, all of the three operating systems named above share a common structure for organizing files.  They differ only in the particular metaphor used to represent this structure to you, the user, at the level of the user interface.   That common structure is as follows.

At the smallest, lowest level, there is the file.  A file consists of bits, but is for all practical purposes irreducible; you never work with anything smaller than files, and everything you work with on the computer is a file.  There are important differences in the way each operating system handles files.  For example, Unix, Mac and Windows each have different rules for how files are named.  Because these differences are important to take into consideration when moving files from one type of computer to another, this topic is discussed in more detail below.

A quick note about filenames.  In a networked environment, which we will discuss below, computers running different operating systems exchange files amongst themselves.  Because different operating systems have different file naming conventions, it is important to pay attention to their similarities and differences.  For example, spaces in a filename are routine in Windows and on the Mac, but on the Unix command-line a space separates one argument from the next, so a filename containing spaces must be quoted or escaped every time you type it and is best avoided.  During the workshop, you will be introduced to a set of simple rules on how to name your files so that when you upload them to your Unix account, the Unix operating system can make sense of the filename.
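As a rough illustration of what such naming rules might look like, the following Python sketch turns an arbitrary filename into one that travels well between systems.  The rules it applies (underscores for spaces, only letters, digits, dots, hyphens and underscores, all lower case) are assumptions chosen for this example and are not the workshop's actual rules; the function name portable_name is likewise hypothetical.

```python
import re


def portable_name(filename):
    """Return a conservative version of filename (assumed rules, see above)."""
    # Spaces become underscores so the name is a single word on a command line.
    name = filename.strip().replace(" ", "_")
    # Drop every character that is not a letter, digit, dot, hyphen or underscore.
    name = re.sub(r"[^A-Za-z0-9._-]", "", name)
    # Unix filenames are case-sensitive, so settle on lower case throughout.
    return name.lower()
```

For instance, portable_name("My Summer Paper.DOC") gives "my_summer_paper.doc".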

Now files are stored in larger, higher level units, called directories in Unix and DOS and folders in Mac and Windows (95 and NT).  Directories and folders can in turn be contained in any number of other directories and folders.  A directory within a directory is called a subdirectory, and by analogy we can call a folder within a folder a subfolder.  The entire collection of folders and subfolders, or directories and subdirectories, is arranged in a single hierarchy, sometimes called a file tree.  In Unix and DOS,  the base or top of this hierarchy is a single, encompassing directory called the root directory. (To have kept with the metaphor, it ought to have been called a trunk.)
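The hierarchy described above can be made visible with a few lines of Python.  This is only a sketch: the function name outline and the sample directory and file names are invented for the example.

```python
import tempfile
from pathlib import Path


def outline(directory, depth=0):
    """Return the tree under directory as indented lines; directories end in '/'."""
    lines = []
    for entry in sorted(Path(directory).iterdir(), key=lambda p: p.name):
        suffix = "/" if entry.is_dir() else ""
        lines.append("  " * depth + entry.name + suffix)
        if entry.is_dir():
            # A subdirectory is listed the same way, one level deeper.
            lines.extend(outline(entry, depth + 1))
    return lines


def demo():
    """Create a small tree in a temporary directory and return its outline."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "stuff" / "letters").mkdir(parents=True)
        (root / "stuff" / "letters" / "mom.txt").write_text("hi")
        (root / "stuff" / "todo.txt").write_text("buy disks")
        return outline(root)
```

Each level of indentation in the result corresponds to one level of directory within directory.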

At the highest level, directories and folders are contained by units that correspond roughly to the physical storage devices, such as hard drives, in which the files and folders or directories are actually contained.  I say roughly because a hard drive can be divided into smaller parts, called partitions.  For all practical purposes, however, partitions appear to the user as separate devices, so here we'll just call them storage devices.  In a Mac system, all folders are contained within icons that represent the storage devices attached to the computer.   Like files and folders, each device is given an arbitrary name, such as "Bob."  In DOS, the device must also be specified by a letter; this label is found at the beginning of the command-line prompt, and is usually the letter "C" followed by a colon.  With the backslash of the root directory and a greater-than sign after it, the prompt reads like so: "C:\>".  In Windows systems, this convention is carried over and appears, within the icon "My Computer," in parentheses next to the name given to the device.  In Unix, the highest level is something called the "filesystem," which may or may not appear prefixed to the command-line the way that the drive letter is in DOS.  Thus, there are three broad levels to the file system: the levels of the file, the folder or directory, and the device.  (The level of the device is a bit more complicated than I have let on, but this description will do for now.)

To summarize, although a directory system looks like a tree with branches, and the folder system looks like a bunch of folders within a bunch of folders, they both possess precisely the same branching, hierarchical structure.  The common structure of the file system on Mac, Unix and Windows machines is given below, in terms of the sometimes explicit arboreal metaphor, along with differences in vocabulary:
 
 

LEVEL     Unix        Mac     DOS        Windows
leaves    file        file    file       file
branches  directory   folder  directory  folder
trunk     filesystem  volume  drive      drive
 

Interacting with the file system in a GUI

Essentially there are three ways you interact with files in a GUI system such as the MacOS or Windows: (1) directly through the desktop;  (2) through a file manager of one kind or another; and (3) through what is called a "dialog box," which appears when you open a file or save a file for the first time within an application, such as a word processor.  (Unix also has a GUI -- a number of them, in fact -- but we will not discuss them here.  Suffice it to say that although they often look and feel like their Mac and Windows counterparts, they behave very differently.)

Working through the desktop is straightforward.  Folders have names just like files do and are represented by icons that depict little yellow file folders which, when clicked on, open up and reveal their contents.  You usually start with an icon representing the device that contains the file you want and you keep clicking on folders until you find the file.

Most GUIs come with some sort of file manager.  For example, the Mac allows you to view a window's contents as a hierarchical list -- an outline view -- of folders and files.   The idea of a file manager has always been central in Windows (in fact, I have borrowed the term from the Windows 3.1 application).  In Windows,  you have the option of using something called Explorer, which gives you an outline view of files and folders, but also of devices.  Explorer -- not to be confused with Internet Explorer! -- contains two large panes.  On the left is an outline view that lists devices, folders, files, and some other things like the desktop and recycle bin.  On the right is a view of the contents of whatever is selected on the left, which can be devices, folders, files, etc.  Using a file manager is a very effective way of navigating a file system and getting things done like creating directories and moving files around.  File managers also allow you to view your file system as a whole, a view you don't get when working with just folders or from the command-line.

A dialog box is the little window that pops up whenever you ask to open or save a file from within an application, such as a word processor.  Most Windows applications use what is called the "Common Dialog Box," but many use a simpler or fancier version.   Whether fancy or simple, all dialog boxes are variants on this theme.  If you encounter a new one, look for three things: where the files are listed, where the folders are listed, and where the devices are listed.

In the workshop, we will go over the specifics of how folders, the Explorer, and the dialog boxes look and work on Windows NT.  For now, just understand that these things are all variants of each other; they are all file managers which expose to you the objects of the file system -- devices, folders, and files.

Interacting with the file system from a command-line

In a command-line system,  you interact with the file system with, well, commands. You find your location in the file system with commands and you create, delete, move and copy files and directories with commands.  In order to understand these commands you need first to understand the conventions for representing devices, directories, and files.  Like files, directories have names, but they are followed by a slash -- in Unix, a forward slash and in DOS a backslash.   For example, a user's directory on a Unix system may be:
 
/home/buddylove/stuff/

On a DOS system this same directory would look like:
 

D:\home\buddylove\stuff\

The final slash in both cases is optional, though it is good practice to include it.   (Note that the Mac also has a system for representing folders in text so that programs can be written to perform file system operations; instead of slashes it uses a colon to separate folder names.)  In the above examples, the series of directory names that define a particular location in the hierarchy of folders or directories is called the file path.   (In DOS, the path of the root directory is indicated by a single backslash, and in Unix by a single forward slash.  Although it is called the root directory, the directory is not named; it is simply represented by a lone slash.)
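The two example paths above can be taken apart with Python's standard pathlib module, which knows both conventions.  This is a small sketch for illustration; the function name describe is an assumption, and the paths are the ones used in the examples.

```python
from pathlib import PurePosixPath, PureWindowsPath


def describe(path):
    """Split a path into its device, its root directory, and its directory names."""
    return {
        "device": path.drive,   # empty for Unix, a drive letter and colon for DOS
        "root": path.root,      # the lone slash that stands for the root directory
        "names": [part for part in path.parts if part != path.anchor],
    }


# The same location written in the Unix and DOS conventions.
# The optional trailing slash is accepted and ignored.
unix_path = PurePosixPath("/home/buddylove/stuff/")
dos_path = PureWindowsPath("D:\\home\\buddylove\\stuff\\")
```

Both paths yield the same list of names (home, buddylove, stuff); only the device and the direction of the slash differ.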
 
It is important to realize that the root directory of a folder or directory hierarchy, which exists on Macs as well as DOS and Unix machines,  is not the same as the desktop.  Windows and the Mac have desktops that represent all available devices to a user.  On a Mac, storage devices are usually placed directly on the desktop and appear on the right side of the screen; in Windows, the icon "My Computer," which appears on the upper left, contains icons of all the devices available to the computer.  It would appear, then, that the desktop is some sort of super-encompassing folder that contains all devices, but this is not true.  In point of fact, the desktop is a program that displays the files and devices of an operating system in a certain way.  What gets confusing is the fact that one can move files onto the desktop itself, and even create folders there.  What gives?  The truth is that, by a kind of reflexive logic that is best not thought about too much, the desktop is also a directory within the device that contains the operating system.   In Windows 95, the desktop is usually located in the following path:
 

C:\windows\desktop

The Mac is structured the same way; the difference is that the creators of the Mac thought that it would be too confusing to allow users to see the desktop as a folder within a hard drive that is pictured on the desktop, so they hid it from direct view.  Instead, the desktop appears to the user as a button in the file dialog box when you open and save files.   We'll get to this below.

Again, in the workshop, we will go over the specifics of how the above works.  You will learn the commands that allow you to navigate the Unix file system from the command-line and how to create, rename, copy, move and delete files and directories.  You will also be introduced to some other commands, such as how to change the permissions of a file or directory, so that others can view your files from the web.  This raises our next and final topic, networks.  Unix is what is called a multi-user system, which means that it is an operating system designed to work with a number of users who are connected to the computer by a network.  Windows NT is also a multi-user system.  And of course so is the Internet, in a sense.  Because networks are complicated, they are confusing.  The following is intended to provide you with just enough knowledge so that you can at least answer the question with which we started: where is my file?
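The file operations listed above (create, rename, copy, move, delete, change permissions) can be sketched in Python, with the comparable Unix command noted beside each step.  Everything specific here is an assumption for illustration: the directory name public_html, the file names, and the permission value 644 (owner may read and write, everyone else may only read).  The workshop itself will use the Unix commands directly.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def demo():
    """Run each basic file operation once inside a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        home = Path(tmp)
        web = home / "public_html"          # hypothetical web directory
        web.mkdir()                         # comparable to: mkdir
        draft = home / "draft.txt"
        draft.write_text("hello")           # create a file
        page = home / "index.html"
        draft.rename(page)                  # comparable to: mv (rename)
        shutil.copy(str(page), str(web / "index.html"))   # comparable to: cp
        shutil.move(str(page), str(web / "old.html"))     # comparable to: mv (move)
        (web / "old.html").unlink()         # comparable to: rm
        os.chmod(web / "index.html", 0o644) # comparable to: chmod 644
        mode = stat.S_IMODE((web / "index.html").stat().st_mode)
        names = sorted(p.name for p in web.iterdir())
        return names, mode
```

After demo() runs, the web directory holds a single file, index.html, which users other than its owner are permitted to read.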

Networks

Looking back at our original, simplified diagram of a computer, we can think of a network as two or more computers connected by a wire or some other conduit for transmitting signals.

From a user's perspective, being connected to a network simply means that you have access to the resources of another computer, such as its storage device or its processor or its printer.  When connected to another computer, the computer you sit in front of is called the local computer and the computer you are connected to is called the remote computer.  The remote computer is also called the server or host and the local computer the client or guest.  These terms are complemented by a spatial metaphor; when files move from a client to a server, we call it uploading; when they move in the other direction we call it downloading.

The remote computer is called a server because it provides services needed by the local, client computer.  Thus a remote computer with a printer used by the local computer is called a print server, and one that provides disk space is called a file server.  For our purposes, there are two major kinds of network access to consider: access to the file system only on the remote computer, and access to the processor and the file system.  When you are connected to a server and have access to its files, it is just as if the remote hard drive were attached to your local computer.  When you are connected to a server and have access to its processor, as with a Telnet connection, it is just as if your keyboard and monitor were directly attached to the server, and your local computer (processor, etc.) were not there.  For example, when you read email using Pine, you are using the remote Unix machine's processor as if it were your own.

Networked resources are always accessed by means of what are called client applications.  A client application can communicate with a remote server because they "speak" the same protocol, or convention of sending and receiving information over the network.  Thus, there is a client for each kind of network connection you can make.  If all of this sounds confusing, simply remember the following rule: to access services on a remote computer, you need to use a specific client application that speaks the protocol of those services.  We'll work with some specific clients in the workshop.
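The idea of a protocol can be illustrated with a toy example in Python.  The "protocol" below is invented for this sketch and is not FTP, Telnet, or any real protocol: the client sends a line of the form "GET filename", and the server answers with a line beginning "OK" or "ERR".  The file name and contents are made up, and both ends run inside one program over a local socket pair rather than over a real network.

```python
import socket

# Hypothetical files held by the "server"; names and contents are made up.
SERVER_FILES = {"notes.txt": "remember the workshop"}


def serve_one_request(conn):
    """Server side: read one request line and answer it in the toy protocol."""
    request = conn.recv(1024).decode("ascii").strip()
    verb, _, name = request.partition(" ")
    if verb == "GET" and name in SERVER_FILES:
        reply = "OK " + SERVER_FILES[name]
    else:
        # The server only understands requests phrased in its own protocol.
        reply = "ERR cannot serve: " + request
    conn.sendall((reply + "\n").encode("ascii"))


def ask(client_end, server_end, request):
    """Client side: send one request line and return the server's reply."""
    client_end.sendall((request + "\n").encode("ascii"))
    serve_one_request(server_end)
    return client_end.recv(1024).decode("ascii").strip()


def demo():
    """Send one request the server understands and one it does not."""
    client_end, server_end = socket.socketpair()
    try:
        return [
            ask(client_end, server_end, "GET notes.txt"),
            ask(client_end, server_end, "PRINT notes.txt"),
        ]
    finally:
        client_end.close()
        server_end.close()
```

The first request succeeds because client and server agree on what "GET" means; the second fails because "PRINT" is not part of the convention the server speaks.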

Figure 2

You will work with a number of client applications during the NTTR seminar week.  To send and receive files between your local computer, such as a Mac or Windows machine, and a remote Unix machine, you will use what is called an FTP client that understands the FTP protocol.  To actually use the processor and file system on the remote Unix server, you will use a Telnet client that understands the Telnet protocol.  To read email, you will use an email client that understands a mail-retrieval protocol such as POP3 or IMAP (SMTP, by contrast, is the protocol used to send mail).  To read a web page on a remote machine, you will use a web client (called a browser) that understands the HTTP protocol.  And so on.

It is worth noting that some client applications are built directly into the desktop of Windows.  The icon labeled "Network Neighborhood" opens into a client application that allows you to share files with other computers by means of the Novell and Microsoft network protocols.  Access to print servers is built into the printers folder within the "My Computer" icon.  And the new, controversial Windows 98 embeds the HTTP client in the user interface, so that, for example, HTTP (or web) servers appear as just another set of devices in the outline view of Explorer.

In all of this, the important thing to realize is that the same principles discussed in the previous sections of this tutorial continue to apply: your file is somewhere, it is classified by the file system, and the file system is represented to you, the user, through the user interface.  There are only two differences.  First, the concept of the file system is expanded to include the concept of the machine; above the level of the device is the machine, or server, which will be represented to you accordingly. Second, the concept of the user interface is expanded here to include the notion of a client.  If your file is on a remote storage device, it will be represented to you as a file in a folder or directory within the client application.  Thus, when you use an FTP client application, you will be presented with a dialog box similar to the ones you are already familiar with in using word processors and other desktop applications.
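One common way this extra level is written down is the web-style address, which names the protocol and the machine ahead of the familiar directory path.  The Python sketch below takes such an address apart; the address itself (ftp.example.edu and the path under it) is a made-up example, and the function name locate is an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def locate(url):
    """Answer "where is my file?" for a networked address."""
    parts = urlsplit(url)
    path = PurePosixPath(parts.path)
    return {
        "protocol": parts.scheme,        # tells you which client to use
        "machine": parts.hostname,       # the level above the device
        "directory": str(path.parent),   # the familiar directory path
        "file": path.name,
    }
```

For example, locate("ftp://ftp.example.edu/home/buddylove/stuff/notes.txt") reports the protocol ftp, the machine ftp.example.edu, the directory /home/buddylove/stuff, and the file notes.txt.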

Summary

The concepts that you ought to have some grasp of before the workshop are bolded in the text.  Together, they relate to how the file system of a computer is represented to you so that you, in a given context, can understand where your files are.  The main point to grasp is that the representation of the file system is consistent in both a graphic and command-line user interface, and in the various applications that run within these interfaces.  No matter where you are, you should be able to locate your position in the file system by looking for your position in the tree of devices, folders, and files.  In a networked environment, devices can include remote storage devices.

Useful links

Unix Computing - Quick Guide
Introduction to Unix Computing at Princeton
Client software for Windows 95
Client software for Macintosh