Unix low-level file operations

Contents

File access permissions
Unix API
creat
open
read
write
Example
File status
Use of these functions

File access permissions

Unix files have 10 bits set premissions
as in
      -rwxr-x---
(read, write, execute for owner; read, execute for group; none for others).

Modes can be added or subtracted using the user-level chmod command eg

chmod ug+rx file
chmod o-rwx file
Modes may also be set numerically using a 3-digit octal number. The first digit sets user permissions, the second sets group permissions and the last sets other permissions. Read has value 4, write has value 2, execute has value 1, and the bits are or-ed together to give a single octal digit. For example
chmod 751 file
gives r,w,x permission to user; r,x to group; x to other.

Unix API

The stdio library has a high-level interface to the file operations, with printf, putchar, etc. These are built from a lower-level interface. Typical calls from this are
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int creat(char *path,
          mode_t mode);
int unlink(char *path);
int rename(char *old, char *new);
int open(char *path,
         int oflag,
         ...);
int close(int fildes);
int read(int fildes,
         char *buf,
         unsigned nbyte);
int write(int fildes,
          char *buf,
          unsigned nbyte);
off_t lseek(int filedes, off_t offset, int whence);
int link(char *path1, char *path2);
int chmod(char *path,
	  mode_t mode);
int stat(char *path,
	  struct stat *buf);

creat

int creat(char *path,
          mode_t mode);
creat takes the pathname of a file and creates a new file, removing the contents if it already existed. The mode sets the access permissions. This is a bit-wise OR of the file permissions:
S_IRUSR       0400
S_IWUSR       0200
S_IXUSR       0100
S_IRGRP       040
S_IWGRP       020
S_IXGRP       010
S_IROTH       04
S_IWOTH       02
S_IXOTH       01
This follows the rwx for user, group and other that is shown by ``ls -l''.

open

The open call opens the file in the mode given by oflag, and returns a file descriptor. This is an integer which the O/S uses to index into a per process file descriptor table, which is used to keep info about open files. Zero is always stdin, 1 is stdout, 2 is stderr.

oflag is a bit-wise OR of a number of constants

O_RDONLY
O_WRONLY
O_CREAT
O_TRUNC
When O_CREAT is one of these flags, an additional argument to open is the file permission mode.

read

int read(int fildes,
         char *buf,
         unsigned nbyte);
The read call is a block read of a given number of bytes. This could be tuned to the physical characteristics of the device read from, for example. For example, it may be much faster to do a block read of 1024 bytes from a hard disk than to do 1024 reads of 1 byte from the disk.

The value returned is the actual number of bytes read. This may be less than the number asked for. For example

``read'' returns zero on EOF, and -1 on error. A while loop to read using this call should always test the return value for > 0:
while ((nread = read(...)) > 0)
  ...
if (nread == 0) 
  /* EOF code */
else
  /* error code */

write

int write(int fildes,
          char *buf,
          unsigned nbyte);
The write call is also a block write of a number of bytes. It returns -1 if a write error occurs (such as no space left).

Example

This is a simple version of cp, using the low-level I/O.
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define SIZE 1024
#define MODE (S_IRUSR | S_IWUSR | \
              S_IRGRP | S_IROTH)

int main(int argc,
         char *argv[])
{
  int src, dst;
  int in_count, out_count;
  char buf[SIZE];
  int nread;

  if (argc != 3) {
    fprintf(stderr, "Usage: cp f1 f2\n");
    exit(1);
  }

  if ((src = open(argv[1], 
            O_RDONLY)) == -1) {
    fprintf(stderr, "Cant open %s\n", 
                  argv[1]);
    exit(2);
  }

  if ((dst = creat(argv[2], 
            MODE)) == -1) {
    fprintf(stderr, "cant create %s\n", 
              argv[2]);
    exit(3);
  }

  while ((nread =
          read(src, buf, SIZE))
                    > 0) {
    if (write(dst, buf, nread)
                      == -1) {
      fprintf(stderr, "cant write\n");
      exit(4);
    }
  }
  exit(0);
}


File status

The function stat() is used to get file status
#include <sys/types.h>
#include <sys/stat.h>

int stat(char *path, struct stat *buf)
where
struct stat {
  mode_t st_mode;  // permissions mode
  uid_t st_uid;    // user id
  gid_t st_gid;    // group id
  time_t st_atime; // time of last access
  time_t mtime;    // last modification time
  time_t st_ctime; // last file status change
  ...
A set of macros and bit constants can be used on st_mode:
#include <sys/stat.h>
#include <stdio.h>
#include <unistd.h>
#include <limits.h>
#include <iostream.h>

int main(int argc, char *argv[])
{
  struct stat buff;

  char filename[NAME_MAX + 1];

  printf("Enter file name: ");
  if (fgets(filename, NAME_MAX, stdin== NULL)) {
    fprintf(stderr, "error getting filename\n");
    exit(1);

  if (stat(filename, &buff) == -1) {
    perror(filename);
    exit(1);
  } else {
    printf("File owner: %d\n",buff.st_uid);
  }
  exit(0);
}


Use of these functions

When should you use these low-level functions? Only when you have to. In general use the high-level functions. There will be many times when you need finer control.

Note that these functions are specific to the Unix API, and are not in Microsoft C, for example. So code written using these functions will only be portable across Unix, not to other Operating Systems.

create, open

These two functions control what happens when a file has to be created. In addition

File System Internals

Contents

Allocation
Contiguous
Linked list
Index
Inodes

Structure

Files may be organised as a MSDOS and Unix both treat files as byte sequences and impose no internal structure on them. Individual applications may require an internal structure, but not the O/S.

Many old systems that used to read 80 column cards use an 80 character record structure (eg the Burroughs A9)

Some mainframe systems use a tree organisation for files, where each node is indexed by a key.

Types

The types of file under Unix may be Regular files are binary or Ascii files, and contain compiled programs, data files, source code, text, etc. These are the ``ordinary'' files.

Directory files are used by the O/S to maintain the hierarchical file system.

Character special files are devices at a fixed location, with volatile contents. Reads from such files are actually reads from the device. No random access is available on such files. Examples are keyboard, mouse etc. In Unix, these are /dev/kbd, /dev/mouse, /dev/audio, etc

Block special files are also devices, but which allow random access. Examples are SCSI disks /dev/sd0, etc, memory /dev/kmem and floppy disks /dev/fd0

Access

Access to files may be sequential, random or both. It depends on the file type and support from the O/S. Most regular files allow both.

Allocation

Files can be laid out on the disk in a variety of ways

Contiguous

In a contiguous layout, the entire file is laid out as one sequence of disk blocks. It is easy to keep track of the file, because you only need its starting block and size. It means that space must be pre-allocated for the file before writing it can take place. contiguous

Linked list

In a linked list layout, the file is divided into blocks and part of each block is a pointer to the next block. Keeping track of each file is also easy, because you only need the start of each list. Sequential access is easy. Random access is hard. Because space is used by the pointer, reads should not be in powers of 2.

Index

In an index method, a table is maintained of each block of the disk. Entries in this table for each index are the value of the next index for a file. Thus a file starting in block 4 and using blocks 6 and 2 would have table entries

index

link

Then you can store the start of the file somewhere else. This makes it easier to find where a file is, because the index table can be stored in fast memory instead of on disk, and actual data can be full blocks.

MSDOS uses this system. The root directory is at a fixed location on the disk, and is contiguous of fixed size. (This is why you cannot have an unlimited number of files in the root directory in MSDOS.) Entries in this for the start of each file in the root directory point to the FAT (File Allocation Table) which is the index table for the rest of the disk.

When a new file is created or extended, new blocks can be found by looking through the FAT for unused entries.

Inodes

The last method uses a fixed size table of inodes . Each entry in the table contains a set of pointers. These point to the first, second, ... blocks of the file. This allows direct indexing of a fixed number of blocks. If the file is bigger than that, then the next entries are to a single indirect block, to a double indirect block and to a triple indirect block.

inodes

inodes

The advantages are that there is quick access to the first blocks, and potentially huge file sizes. This method is used by Unix.

The inode is used to store all information about a file, and there is one inode per file. This information includes: owner, time last modified, time last accessed, size, file permisssions, number of links, etc.

For example, Minix is a simple version of Unix, and Linux has the option to use a Minix file system. A Minix inode is defined by

struct minix_inode {
        unsigned short i_mode;
        unsigned short i_uid;
        unsigned long i_size;
        unsigned long i_time;
        unsigned char i_gid;
        unsigned char i_nlinks;
        unsigned short i_zone[9];
};
This has 7 direct blocks and 2 levels of indirect block.

When a new file is created (by creat), it creates a new inode entry in the inode table.

Free list

When a file grows, it will fill up any blocks that it currently has, and then will have to request a new block. Unused or spare blocks will need to be maintained using a free list. For example, in MSDOS the free list is just those entries in the FAT table that are unused. Unix keeps a separate free list.

One method of keeping the free list is as a linked list. Traversal of this may be slow, and it takes up space. Another method (used for small computers) is a bit vector. In this, the first block is represented by bit 1, the second by bit 2, etc. A block is free if its corresponding bit is zero, not free otherwise. Special hardware instructions may support quick perusal of this vector.

The simplest way of allocating a block from this list is FCFS (first come first served), but other algorithms are possible. For example, since directories are accessed very frequently, placing blocks for them close to the middle of the disk may speed up access to them.

Disk caching

Files are located at blocks, and read by blocks. A ``block'' will have its size determined by hardware and software characteristics. For example, it may be related to the sector size of the disk. Since a file uses at least one block, small block sizes are desirable, but since they have to be read from and written to disk, large sizes are desirable. For example, most Unix systems now use 1kb blocks.

It is becoming increasingly common to reduce the amount of disk access required by caching. In this a copy of something is kept in fast memory rather than on disk. For example, a block may be read into memory. If another read is required from the block, then instead of re-reading it from disk, the read is taken from a copy of the block in RAM.

The high level I/O functions are often implemented using a cache or buffer. When a read is first requested, say by getchar, a block is read into an input cache for that file, Successive getchar's are from this buffer until it becomes empty, at which point another block-read is done. This block read is tuned to the disk.

Caching may be of read-only data, such as compiled program code. The cache size should be big enough that a substantial amount of execution is done using just the code in the cache.

When data is changed, caching becomes more interesting. Suppose I am modifying a file, and I make a change. To reduce disk I/O this should be cached locally, and not written until a lot can be written. Suppose the system crashes or someone-else wants to make use of the file in the meantime. The caches and disk will be inconsistent. The simplest solution is a write through cache, so that writes are reflected immediately on the disk. Of course, this loses some of the potential savings... Home Systems Software Home


Jan Newmarch (http://jan.newmarch.name) <jan@newmarch.name>

Copyright © Jan Newmarch
Last modified: Wed Nov 19 17:57:59 EST 1997