Structure

Directories impose a structure on the file system, so that it is not just a ``collection of files''. This is needed because a non-trivial system will have lots of files: for example, the PC server on drive D: has 5800 files on it. A directory maintains information about other files in the file system.

The file system may be flat with just one directory. CP/M and early MSDOS had this structure.

A more common system is a hierarchical tree of directories and files, such as modern MSDOS.

A variation on this is a directed acyclic graph (like a tree, but a node can have more than one parent). This is used by Unix, where one file can be ``linked'' to another so that the one file has two or more file names.

Directory operations

Directories have an internal structure, unlike a general file, which is just a sequence of bytes. A directory is more likely to be organised as a sequence of records of some kind. The allowable operations must take this into account.

create
delete
opendir
closedir
readdir
rename
add
link (add a file to the dir)
unlink (remove a file from the dir)

Adding files to a directory

When a directory is created it will be empty except for entries for its parent and maybe itself. When a file is created (e.g. by creat in Unix) an entry is made for this file in the directory.

In MSDOS, the entry is a 32-byte record. This holds the filename, the extension, the attributes, the date of last modification, the first block and the file size. All the principal information about the file is held in this directory entry.

In Unix, all that kind of information is held in the inode. The directory just holds the name of the file and the inode number.

The inode method allows multiple links to the same file, as the directories just hold the different file names and the same inode number. A ``link count'' within the inode keeps track of how many filenames a file has. When this count drops down to zero, the file is removed.

Locating a file

When a file needs to be accessed, say when it is opened for reading, its blocks on disk have to be found. Say the file is /usr/jan/file1. In Unix, the root directory is located at a fixed place on the disk. This directory is then read for the entry ``usr''. When it is found, its inode is known. From the inode, the location on disk of the directory is found. This is then read looking for the entry for ``jan''. When it is found its inode is given. From there the directory on disk can be found, and read for the file ``file1''. When this is found, its inode is known and the file blocks are finally located.

Unix API

Some of the Unix API for directory operations is

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    int mkdir(const char *path, mode_t mode);
    int creat(const char *path, mode_t mode);
    int unlink(const char *path);
    int rmdir(const char *path);

Example: This creates a directory, puts a file into it and then removes the file.

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <stdlib.h>   /* exit */
    #include <unistd.h>   /* close, unlink */

    int main(void)
    {
        int fd;

        /* create the directory */
        if (mkdir("mydir", 0777) == -1)
            exit(1);
        /* create an empty file inside it */
        if ((fd = creat("mydir/f1", 0777)) == -1)
            exit(2);
        close(fd);
        /* remove the file's entry from the directory */
        unlink("mydir/f1");
        exit(0);
    }

The Unix API for reading a directory is

    #include <dirent.h>

    DIR *opendir(const char *dirname);
    struct dirent *readdir(DIR *dirp);
    int closedir(DIR *dirp);

The data structure DIR is opaque, and you are not meant to look into it as an applications programmer. The data structure dirent may contain many fields, but the POSIX standard only documents one:

    struct dirent {
        char d_name[];   /* null-terminated filename */
    };

File protection

Users of files need some kind of protection mechanism for their files. Files may need to be protected for the following operations:

read
write (or rewrite)
execute
append
delete

The general mechanism for talking about this is protection domains. A process runs in a protection domain, and so has access to all the objects (files, printers, memory, etc) in that protection domain.

Objects belong to one or more protection domains, and in each domain they give access rights.

Example

When the MSDOS ``command.com'' runs, it does so in a domain in which most files grant delete access.

The files IO.SYS and MSDOS.SYS do not have delete access in this domain, and so they cannot be deleted by the ``del'' command (which is part of command.com).

Similarly, the printer device PRN does not grant delete access in this domain, and it cannot be deleted. On the other hand, it does grant write access, and so you can copy a file to PRN to send it to the printer. This looks like:

  MYFILE.TXT: write, read, delete, ...
  PRN: write

Example

Unix protection domains are defined by who you are and what group you belong to i.e. your ``uid'' and ``gid''.

Given a particular pair, a list can be made of all files that allow operations to be carried out on them. For example, a file can be read if it is owned by that uid and has the owner `r' bit set; failing ownership, if it belongs to that group and has the group `r' bit set; and otherwise if it has the `r' bit set for other.

Every process that runs with that pair of values can access for reading that same set of files.

Example

Files that are readable by everyone belong to every protection domain for reading.

Example

Not all processes that you create have the same values. Unix has a ``setuid'' mode for processes. If a command with the `s' bit set is run, your effective uid changes to that of the command's owner while it runs.

For example, the kernel memory device /dev/kmem is not readable by you, because it contains information about all running processes (including those of others). The command ``ps'' (process status) needs access to this though. So it runs setuid to root, and root can read this file.

Protection matrix

If you plot the set of protection domains in one direction, the set of objects in another, and draw up the table with entries being the access granted, you have the protection matrix.

For example:

            file 1   file 2   file 3
  domain 1  r
  domain 2  rw       rwx      r
  domain 3                    wx

This says that file 1 can be read by domain 1, read or written by domain 2, etc.

There are storage problems: the matrix is large and mostly empty, so it would need to be stored as a sparse matrix. There are update problems: every time an object is created or destroyed the table must be modified. The table itself must be a protected object.

Access control lists

If the table is stored by columns, then it may be attached to each object, so file 1 has column 1, etc. To determine access to an object, you first have to look at the list attached to the object, to see if the domain is in the list. This is called an access control list (ACL).

file 1: (domain 1, r), (domain 2, rw)
file 2: (domain 2, rwx)
file 3: (domain 2, r), (domain 3, wx)

The advantage of this system is that it is the object itself that grants access to a process, based on what it knows about the process. This system is used by Windows NT.

A disadvantage is that the size of each column may be large and also may change. If it has to be kept in the file system, how is it done? I don't know how NT does it.

Unix solves this by simplifying the model: store the uid, gid and file permission mode in the inode, and then use a simple algorithm to grant or deny permission. This is not as flexible as the full system (eg you cannot grant access to all of a group except one).

For example,

  file 1: rwx for uid 28, rw for gid 20

would allow ``rwx'' for those in the domain with uid 28, but only ``rw'' for those in a domain with gid 20 and a different uid.

Capability lists

In a distributed system such as a WAN, it may be impossible to keep a complete list of protection domains anywhere. An ACL cannot be used in this case. Instead, the table is stored by rows, where each row is called a capability list.

domain 1: (file 1, r)
domain 2: (file 1, rw), (file 2, rwx), (file 3, r)
domain 3: (file 3, wx)

Attached to each object is a token of some kind called a capability. The O/S enforces the policy that unless a process has the capability for an object, it cannot access it.

When an object is created, the capability is given to the creating process i.e. the capability is placed in the process' capability list. From now on the process (or anything else in its protection domain) can access the object.

A process can give a capability to another process. This places it on the second process's capability list, so that the second process can also access the object. This way access to an object can spread throughout the system without the object having to do anything.

Security

All of these systems are a waste of time if there is no way of enforcing security access.

Supervisor mode

User processes run in their own address space, and have access to certain services. However, they should not be able to read or write system tables, boot sectors, etc. Something has to be able to do so, though, otherwise these tables would be read-only.

For example, there may be a table of processes currently executing in the system. For a new process to start, this must be writable. It must not be writable by everybody, or I could remove all your processes and vice versa.

Certain ``kernel'' tasks must be performed in ``supervisor'' mode. In this mode there may be a need to access more than the user space. This may need hardware support, such as the supervisor mode of the 68000.

Passwords

Access to your own protection domain is usually through a password mechanism. In Unix, a process called ``getty'' waits for you to attempt to log on. When you do, it reads your password, encrypts it and compares it to the entry in the password file. If it matches, it creates a new process with the uid and gid set to your domain and runs a login shell in that process.

Passwords should not be able to be decrypted. The Unix algorithm, for example, is one-way: it allows encryption only. To crack this system you have to encrypt all possibilities and test them against the actual encrypted passwords. Passwords should never be stored in plain text anywhere in the system.

Users have to co-operate with a password system. They should choose passwords that cannot be guessed. Guessable passwords include: your name; your partner's name; your dog's name; your birthday; your address; etc, etc.

The Internet worm tried all variations on these, plus words in the system-wide dictionary. It cracked over 20% of passwords this way.

Encryption

If data is important, it may need to be encrypted, especially if it is to be sent over a network. The DES encryption algorithm is common in the US, although it is illegal to export it. It is also suspected that the American Security Agencies are able to crack it. There are a number of ``public key'' encryption algorithms which are very secure. The PGP (Pretty Good Privacy) algorithm has a version that is legal to use in Australia. The command is accessible on our system as ``pgp''. It allows encryption and digital signatures.

More on DOS, Windows 3.1 and Windows95

The underlying way of making system calls to MSDOS is via interrupt 21H. Each MSDOS system call is numbered, and expects certain parameters in registers. So a system call is made by loading the registers and calling interrupt 21H. MSDOS examines the registers and takes action accordingly.

MSDOS runs in so-called ``real mode''. This is the only mode for an 8086, but is only one of the possible modes for the 80386 or later. In real mode, addresses are 20 bit addresses, using a segmented memory architecture (see later lectures on memory). This gives an address space of 1 Mbyte only. Within this address space there is no memory protection. All of the 1M is accessible to any application. At the bottom of this memory space is MSDOS itself, so a mis-behaved application can trash the O/S code.

Windows 3.1 on a 386 or later, Windows NT and Windows 95 run in the 386 ``protected mode''. This has 32-bit addressing and memory protection mechanisms.

MSDOS, Windows 3.1 and Windows 95 all use the MSDOS file system. This has the file allocation table (FAT) to point to successive blocks of a file, and directory entries which contain the 8+3 filename and information such as access mode, last modification time and file size.

Windows 3.1

Windows 3.1 relies on INT 21H calls to access the filesystem. That is, any Windows 3.1 program that wishes to, say, read or write to a file must load registers and call INT 21H. The mechanism may be hidden as a C function call perhaps, but the C function call will do this stuff.

Execution of INT 21H causes a general protection fault, which is caught by Windows 3.1. Since the MSDOS call runs in 16-bit real mode and the application that generated the interrupt is running in 32-bit protected mode, the role of Windows 3.1 is to switch modes, proceed with the system call and then switch back to protected mode. Thus every filesystem access is slowed down by this need to switch processor modes twice.

Windows 95

Windows 95 uses the same filesystem by default. This is to cope with ``legacy'' applications such as Norton Utilities that assume this filesystem. There is new 32-bit code to handle this filesystem though.

The new 32-bit code is designed to handle multi-tasking access to the filesystem. It is ``re-entrant'' so that it can be interrupted and called again. It has a shorter ``critical section'' so that access by processes to the disk is not locked out for so long.

A standard C function call interface to this code is available through the Win32 API. It can also be accessed by INT 21H, so that Windows 3.1 and MSDOS applications run unchanged. When an INT 21H occurs, there is no call to the old MSDOS file handling code, as this is not in Windows 95. Instead a call to the new code is made. No processor mode switch to real mode is made, unless the only device driver available is an old Windows 3.1 driver that runs in real mode.

Windows 95 has removed one of the problems of the MSDOS filesystem, which is the 8+3 filename restriction. Filenames can be up to 255 characters in length, with a wider set of characters allowed. This is done without serious change to the MSDOS filesystem. Just as before, the directory holds the filename (unlike Unix). In a directory entry is the old ``short'' 8+3 filename, plus file attributes: read_only, archived, etc.

The file attributes part is the key (sorry, kludge) to the new ``long'' names. Certain combinations of attributes are ``illegal'' and cannot occur. Applications that check and maintain the filesystem (such as ``chkdsk'') look for such illegal combinations and fix illegal directory entries. There are, however, some combinations that are not only illegal but also cannot be fixed because they ban modifications! None of the file system utilities tested could ``fix'' such illegal combinations.

A long filename is stored as a regular directory entry with a short filename constructed by some algorithm from the long name. Following this is a series of extra directory entries which hold the long filename, all with one of the illegal attribute combinations, which now means ``component of long filename''.

There are a number of checks to ensure that the short name remains consistent with the long name (e.g. using ``rename'' on a floppy disk copy). To guard against untested utilities that do not know about the new system and alter ``illegal'' directory attributes, a new MSDOS system call has been invented to protect direct filesystem access.

This page is http://pandonia.canberra.edu.au/OS/l6_2.html, copyright Jan Newmarch.
It is maintained by Jan Newmarch.
email: jan@ise.canberra.edu.au
Web: http://pandonia.canberra.edu.au/
Last modified: 28 August, 1995