A file is used to contain data
A file may contain unstructured data - a file of bytes, or characters, etc
A file may contain structured data - a file of records, or objects, etc
The O/S may be aware of the file structure, or not
Linux treats every file as unstructured: as far as Linux is concerned, a file is a sequence of bytes
Linux pays no attention to any structure within a file
Applications may require a particular structure to a file
Applications are responsible for managing file contents and structure
You can see the raw bytes in a file with od file (use od -x for hexadecimal, od -c for ASCII)
e.g. od -c link.png shows
0000000 211 P N G \r \n 032 \n \0 \0 \0 \r I H D R
0000020 \0 \0 002 326 \0 \0 \0 220 001 003 \0 \0 \0 026 @ 276
0000040 335 \0 \0 \0 001 s R G B \0 256 316 034 351 \0 \0
0000060 \0 006 P L T E \0 \0 \0 377 377 377 245 331 237 335
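The od -c rendering above can be imitated in a few lines of Python. This is only a rough sketch: the function name od_c_tokens is made up for illustration, and it handles just a handful of named escapes rather than reproducing od's exact layout.

```python
def od_c_tokens(data):
    """Render each byte roughly as `od -c` would (an approximation)."""
    # A few named escapes; od knows more, but these cover the sample above.
    escapes = {0: r"\0", 9: r"\t", 10: r"\n", 13: r"\r"}
    tokens = []
    for b in data:
        if b in escapes:
            tokens.append(escapes[b])
        elif 32 <= b < 127:
            tokens.append(chr(b))        # printable ASCII shown as itself
        else:
            tokens.append(f"{b:03o}")    # anything else as 3-digit octal
    return tokens


if __name__ == "__main__":
    # The first 8 bytes of any PNG file (the PNG signature).
    print(" ".join(od_c_tokens(b"\x89PNG\r\n\x1a\n")))
```

The first eight tokens match the start of the link.png dump: 211 P N G \r \n 032 \n.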
From http://www.pkware.com/documents/casestudies/APPNOTE.TXT:
ZIP Metadata
ZIP files are identified by metadata consisting of defined record types
containing the storage information necessary for maintaining the files
placed into a ZIP file. Each record type MUST be identified using a header
signature that identifies the record type. Signature values begin with the
two byte constant marker of 0x4b50, representing the characters "PK".
Overall .ZIP file format:
[local file header 1]
[file data 1]
[data descriptor 1]
.
.
.
[local file header n]
[file data n]
[data descriptor n]
[archive decryption header]
[archive extra data record]
[central directory]
[zip64 end of central directory record]
[zip64 end of central directory locator]
[end of central directory record]
Local file header:
local file header signature 4 bytes (0x04034b50)
version needed to extract 2 bytes
general purpose bit flag 2 bytes
compression method 2 bytes
last mod file time 2 bytes
last mod file date 2 bytes
crc-32 4 bytes
compressed size 4 bytes
uncompressed size 4 bytes
file name length 2 bytes
extra field length 2 bytes
file name (variable size)
extra field (variable size)
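The local file header layout quoted above maps directly onto a fixed 30-byte little-endian record followed by the variable-size name and extra field. The following Python sketch decodes it with the standard struct module; parse_local_header and make_sample_zip are names invented for this illustration, and the sample archive is built in memory with the standard zipfile module rather than taken from a real file.

```python
import io
import struct
import zipfile

# signature(4) version(2) flags(2) method(2) time(2) date(2)
# crc32(4) compressed(4) uncompressed(4) name_len(2) extra_len(2) = 30 bytes
LOCAL_HEADER = struct.Struct("<I5H3I2H")
LOCAL_SIGNATURE = 0x04034B50   # the bytes "PK\x03\x04" read little-endian


def parse_local_header(data):
    """Decode the first local file header found at the start of `data`."""
    (sig, version, flags, method, mtime, mdate,
     crc, csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, 0)
    if sig != LOCAL_SIGNATURE:
        raise ValueError("not a ZIP local file header")
    start = LOCAL_HEADER.size
    name = data[start:start + name_len].decode("utf-8", "replace")
    return {
        "version_needed": version,
        "compression_method": method,
        "crc32": crc,
        "compressed_size": csize,
        "uncompressed_size": usize,
        "file_name": name,
        "extra_field_length": extra_len,
    }


def make_sample_zip():
    """Build a tiny one-file archive in memory (illustrative data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello world\n")
    return buf.getvalue()


if __name__ == "__main__":
    print(parse_local_header(make_sample_zip()))
```

Note that the operating system plays no part here: the structure is entirely a convention between the applications that write and read ZIP files.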
Linux does not pay attention to filename extensions
Applications may, e.g. expecting Java source files to have a .java extension
The command file will attempt to determine a file's type by reading the first 100 or so bytes of the file
e.g. file /boot/grub/i386-pc/modinfo.sh
prints "POSIX shell script, ASCII text executable, with very long lines", based on the first line "#!/bin/sh"
e.g. file /home/httpd/html/LinuxSound/Invoice.doc
prints "Composite Document File V2 Document, Little Endian, Os: Windows...Composite Document..."
based on the first octal bytes "320 317 021 340 241 261 032 341"
(use od -c to see these bytes)
The information about file types is kept in a magic file somewhere
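The idea behind file can be sketched as a lookup of known leading byte patterns. The table below is a tiny hand-made stand-in for the real magic database, containing only signatures that appear in these notes; the names MAGIC, guess_type and guess_file are made up for illustration, and the real command applies far more elaborate rules.

```python
# Each entry: (leading bytes, description). Order matters: first match wins.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP archive"),
    # Octal 320 317 021 340 241 261 032 341, as seen with od -c
    (bytes([0o320, 0o317, 0o021, 0o340, 0o241, 0o261, 0o032, 0o341]),
     "Composite Document File V2 Document"),
    (b"#!", "script with an interpreter line"),
]


def guess_type(head):
    """Guess a type from the leading bytes of a file."""
    for signature, description in MAGIC:
        if head.startswith(signature):
            return description
    return "data"   # nothing matched


def guess_file(path, nbytes=100):
    """Read only the first `nbytes` bytes of `path` and guess its type."""
    with open(path, "rb") as fh:
        return guess_type(fh.read(nbytes))


if __name__ == "__main__":
    print(guess_type(b"#!/bin/sh\necho hello\n"))
```

Nothing in the filename is consulted, which is why renaming a file does not change what file reports.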
rm file...
remove files
cp file1 file2
copy files
mv file1 file2
rename file
ln file1 file2
link files (see later)
cat file
see contents of a file
less file
page through contents of a text file
Directories are files that "contain" other files
Directories can contain directories
Directories form a tree of files and directories
The top of the tree is '/' - the root directory
In Unix, it's not a tree but a "directed graph" (two or more names can point to the same file)
mkdir dir
make a directory
rmdir dir
remove an empty directory
rm -rf dir
forcibly remove a (non-empty) directory
mv dir1 dir2
rename a directory
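The file and directory commands above have close equivalents in the Python standard library, which can help when the same steps need to be scripted. The sketch below works inside a throwaway temporary directory; the function name demo and the file names are invented for illustration.

```python
import os
import shutil
import tempfile


def demo(base):
    """Run a few file operations under `base` and report what is left."""
    d = os.path.join(base, "dir")
    os.mkdir(d)                          # mkdir dir

    f1 = os.path.join(d, "file1")
    with open(f1, "w") as fh:
        fh.write("hello\n")

    f2 = os.path.join(d, "file2")
    shutil.copy(f1, f2)                  # cp file1 file2

    f3 = os.path.join(d, "file3")
    os.rename(f2, f3)                    # mv file2 file3
    os.remove(f1)                        # rm file1

    names = sorted(os.listdir(d))        # only file3 remains
    shutil.rmtree(d)                     # rm -rf dir (os.rmdir needs it empty)
    return names, os.path.exists(d)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(tmp))
```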
/bin, /usr/bin
common commands
/sbin, /usr/sbin
sys admin commands
/etc
system and application configuration files
/tmp
temporary files for applications
/var/log, /var/lock, /var/spool, /var/mail
files that change size
/usr/local
applications, source, etc. local to your site
/dev
raw devices - disks, mouse, screen, etc
/mnt, /media
mount points for external media
/home
users' home directories
/root
root user's directory
/boot
files to boot Linux
/lib
dynamic link libraries
A file system contains a set of directories and files
A file system is normally associated with a device (e.g. cdrom) or a partition of a disk
A file system is "mounted" at a directory somewhere under the root directory - unlike Windows, which gives each file system its own drive letter (C:, D:, etc)
The command mount by itself lists all your mounted file systems, e.g.
/dev/sda2 on / type ext3 (rw,relatime,errors=remount-ro)
/dev/sda5 on /sda5 type ext3 (rw,relatime)
/dev/sda1 on /spare type ext3 (rw,relatime)
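Each line of that output has the shape "device on mount-point type fs-type (options)", which is easy to pick apart. A minimal Python sketch follows, assuming mount points without spaces; the names MOUNT_LINE and parse_mount_line are made up for illustration.

```python
import re

# device on mount-point type fs-type (comma,separated,options)
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def parse_mount_line(line):
    """Split one line of `mount` output into its parts, or return None."""
    m = MOUNT_LINE.match(line.strip())
    if m is None:
        return None      # e.g. a mount point containing spaces
    device, mount_point, fstype, options = m.groups()
    return {
        "device": device,
        "mount_point": mount_point,
        "type": fstype,
        "options": options.split(","),
    }


if __name__ == "__main__":
    sample = "/dev/sda2 on / type ext3 (rw,relatime,errors=remount-ro)"
    print(parse_mount_line(sample))
```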
To mount a file system, create an empty directory, e.g. /mnt/disk, and execute e.g.
mount /dev/sda5 /mnt/disk
To unmount it
umount /mnt/disk
Unix keeps the list of mounted file systems current in /etc/mtab
The file /etc/fstab is normally created at Linux installation time
It can be edited by root
You, or the install process, put fixed assignments of file systems to mount points in it, e.g.
# /dev/sda2
UUID=19535268-445a-4b74-b3d8-8c885395e1df / ext3 relatime,errors=remount-ro 0 1
# /dev/sda5
UUID=85504921-731f-442d-8c5c-0347c28ff744 /home ext3 relatime 0 2
# /dev/sda1
UUID=83b2cdb7-8f27-4947-86ed-ea47eebdbf0b /spare ext3 relatime 0 2
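Each non-comment line of /etc/fstab has up to six whitespace-separated fields: the file system (here a UUID), the mount point, the type, the options, and the dump and fsck-pass numbers. The Python sketch below reads them into dictionaries; parse_fstab is a name invented for this illustration, and it assumes the last two fields default to 0 when omitted.

```python
def parse_fstab(text):
    """Parse fstab-style text into a list of entries (comments skipped)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or comment
        fields = line.split()
        if len(fields) < 4:
            continue                      # malformed line: ignore in this sketch
        entries.append({
            "spec": fields[0],
            "mount_point": fields[1],
            "type": fields[2],
            "options": fields[3].split(","),
            "dump": int(fields[4]) if len(fields) > 4 else 0,
            "pass": int(fields[5]) if len(fields) > 5 else 0,
        })
    return entries


if __name__ == "__main__":
    sample = """# /dev/sda2
UUID=19535268-445a-4b74-b3d8-8c885395e1df / ext3 relatime,errors=remount-ro 0 1
# /dev/sda5
UUID=85504921-731f-442d-8c5c-0347c28ff744 /home ext3 relatime 0 2
"""
    for entry in parse_fstab(sample):
        print(entry["mount_point"], entry["type"], entry["pass"])
```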
Linux now generally supports "hot plug", so when you plug in a USB device or insert a CD/DVD, Linux creates an /etc/mtab entry, and Gnome/KDE will create a desktop icon for it
The command to mount an ISO image as a directory is a bit complex:
mount -o loop /home/httpd/html/boxhill/ict213/distros/ubuntu-11.10-server-amd64.iso /mnt
There are many types of file system, many specific to particular O/S's
Windows has FAT-12, FAT-16, FAT-32, VFAT, NTFS
Linux has ext2, ext3, ext4, Reiserfs, JFS, squashfs, swap, ...
CDs generally use the iso9660 file system
Network file systems include NFS and SMB
A file system controls the amount of information that can be kept about files (see http://en.wikipedia.org/wiki/Comparison_of_file_systems)
filename - length and allowable characters
time of creation, last modification, last access
maximum file size
file access permissions
support for links
support for journalling
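Much of this per-file information can be read back from a program. The Python sketch below uses os.stat to show what a Unix-style file system records; the function name describe is made up for illustration. One caveat: on Linux, st_ctime is the time of the last inode change, not the creation time, so it is left out here.

```python
import os
import stat
import tempfile


def describe(path):
    """Collect a few of the attributes the file system keeps for `path`."""
    st = os.stat(path)
    return {
        "size": st.st_size,                      # bytes
        "permissions": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        "links": st.st_nlink,                    # number of hard links
        "inode": st.st_ino,
        "modified": st.st_mtime,                 # seconds since the epoch
        "accessed": st.st_atime,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "w") as fh:
            fh.write("12345")
        print(describe(path))
```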
The FAT12 and FAT16 file systems are the simplest of the M/S file systems, and FAT16 is still used on e.g. USB sticks
The layout on the disk is: boot sector, FAT, root directory, data area
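Entries in the root directory point to elements of the FAT (file allocation table)
Each element of the FAT corresponds to one data block: e.g. index 5 of the FAT corresponds to block 5 of data
The tricky bit: the FAT entry for a block holds the index of the next block of the same file, so a file's blocks form a chain through the table
A file starting in block 4 and using blocks 6 and 2 would have FAT entries 4 -> 6, 6 -> 2 and 2 -> end-of-file marker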
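The way a FAT links the blocks of a file can be sketched in Python as a chain walk: each FAT entry holds the index of the file's next block, and a special marker ends the chain. In this sketch the table is a plain dictionary, END is a stand-in for the real end-of-chain marker, and file_blocks is a name invented for illustration.

```python
END = -1   # stand-in end-of-chain marker (real FAT16 reserves special values)


def file_blocks(fat, start):
    """Follow the FAT chain from `start` and list the file's blocks in order."""
    blocks = []
    seen = set()
    block = start
    while block != END:
        if block in seen:
            raise ValueError("corrupt FAT: chain loops back on itself")
        seen.add(block)
        blocks.append(block)
        block = fat[block]     # the entry for a block names the next block
    return blocks


if __name__ == "__main__":
    # A file starting in block 4 and continuing in blocks 6 and 2.
    fat = {4: 6, 6: 2, 2: END}
    print(file_blocks(fat, 4))
```

The walk also shows why reaching the n-th block of a large file is slow on FAT: every earlier link in the chain has to be followed first.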
Directories are files containing fixed-size entries, one for each file
The root directory is of fixed size
Other directories are kept in the data storage area and can be of any size
A directory entry for a file has the 8+3 name, attributes, a timestamp, the first block of the file and the file size
Only a fixed number of files in the root directory (set at format time; typically 512 on FAT16)
Only 8+3 filenames (VFAT did a hack to get long file names)
Only a limited set of characters allowed in file names e.g. ':' disallowed
File system size limited (originally to 32 MB) by the use of 16-bit pointers and block sizes
No mechanism to avoid fragmentation
Later M/S file systems overcame these problems
Information about files is stored in inodes
Most Unix/Linux file systems have a fixed number of inodes just after the boot sector
The first inode in the list is for root '/'
(ls -a -i / will show '.' as inode 2 under Linux)
Typically, inodes are given about 1% of the disk
Most files in Linux are small (~1k), so access to small files needs to be quicker than access to large files
Each inode contains a table of block pointers. The first entries point directly to the first, second, ... blocks of the file. This allows direct indexing of a fixed number of blocks
If the file is bigger than that, then the next entries point to a single indirect block, a double indirect block and a triple indirect block
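The arithmetic behind direct and indirect blocks can be worked through in Python. The figures below are assumptions in the style of classic ext2 (12 direct pointers, 4-byte block pointers, 1 KiB blocks), not values taken from these notes, and the function names are invented for illustration.

```python
def pointers_per_block(block_size, pointer_size=4):
    """How many block pointers fit in one indirect block."""
    return block_size // pointer_size


def max_file_blocks(block_size, n_direct=12, pointer_size=4):
    """Blocks reachable via direct, single, double and triple indirection."""
    p = pointers_per_block(block_size, pointer_size)
    return n_direct + p + p ** 2 + p ** 3


def indirection_level(index, block_size, n_direct=12, pointer_size=4):
    """Say how file block number `index` (counting from 0) is reached."""
    p = pointers_per_block(block_size, pointer_size)
    limits = [
        (n_direct, "direct"),
        (n_direct + p, "single indirect"),
        (n_direct + p + p ** 2, "double indirect"),
        (n_direct + p + p ** 2 + p ** 3, "triple indirect"),
    ]
    for limit, name in limits:
        if index < limit:
            return name
    raise ValueError("block index beyond the maximum file size")


if __name__ == "__main__":
    blocks = max_file_blocks(1024)
    print(blocks, "blocks =", blocks * 1024, "bytes")
    print(indirection_level(5, 1024), "/", indirection_level(300, 1024))
```

With these assumed numbers a file of up to 12 KiB needs no indirect block at all, which is the point of the design: small files get the shortest path.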
A directory's data blocks contain an entry for each file
Compared to FAT16, the entry holds no timestamp info and no pointer to data blocks (those are kept in the inode) - this allows links (see later)
A Unix directory contains the file name and inode index for each file
Two directory entries could have different file names, but the same inode
When two file names "point" to the same inode, they are "linked" to the same file
Example: /usr/bin/unzip and /usr/bin/zipinfo are linked to the same file (with inode 400984 on my machine), so executing either command runs the same file
Create another link to an existing file by
ln existing-link new-link
Remove a link by
rm link-name
This removes the link, but not the file; the file is removed when there are no links left
In Unix, hard links cannot be made across devices (file systems) or to directories
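The behaviour of hard links can be checked from Python with os.link and os.stat. This sketch works in a temporary directory with made-up file names, and hard_link_demo is a name invented for illustration; it assumes the underlying file system supports hard links.

```python
import os
import tempfile


def hard_link_demo(base):
    """Create a file, hard-link it, remove the first name, report results."""
    first = os.path.join(base, "existing-link")
    second = os.path.join(base, "new-link")
    with open(first, "w") as fh:
        fh.write("data\n")

    os.link(first, second)                 # ln existing-link new-link
    st_first, st_second = os.stat(first), os.stat(second)
    same_inode = st_first.st_ino == st_second.st_ino
    links_before = st_first.st_nlink       # two names, one inode

    os.remove(first)                       # rm existing-link
    with open(second) as fh:
        contents = fh.read()               # the file itself is still there
    links_after = os.stat(second).st_nlink
    return same_inode, links_before, contents, links_after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(hard_link_demo(tmp))
```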
A file can be created as a soft link, which just contains the name of a linked file
Create by
ln -s source destination
Often used so that a file can be installed under one name, but linked to a more common name for use by other programs
e.g. the latest version of the Bluetooth library on my machine is /lib/libbluetooth.so.3.0.2; all version 3 Bluetooth libraries are compatible, so an application will just use /lib/libbluetooth.so.3, and the .so.3 is a symbolic link to .so.3.0.2
Last week: "The file /usr/bin/x-session-manager is a symbolic link to /etc/alternatives/x-session-manager which is a symbolic link to /usr/bin/startsimple.sh"
Symbolic links can be made across file systems or to directories
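A symbolic link stores only a name, which is why it keeps existing (as a dangling link) after its target is removed. The Python sketch below shows this with os.symlink and os.readlink in a temporary directory; the libdemo file names are made up, loosely modelled on the library example, and symlink_demo is a name invented for illustration.

```python
import os
import tempfile


def symlink_demo(base):
    """Create a versioned file plus a symlink to it, then remove the target."""
    target = os.path.join(base, "libdemo.so.3.0.2")
    link = os.path.join(base, "libdemo.so.3")
    with open(target, "w") as fh:
        fh.write("library contents\n")

    os.symlink("libdemo.so.3.0.2", link)   # ln -s libdemo.so.3.0.2 libdemo.so.3
    result = {
        "is_link": os.path.islink(link),
        "stored_name": os.readlink(link),  # the link holds just this name
        "resolves_to_target": os.path.realpath(link) == os.path.realpath(target),
    }

    os.remove(target)
    result["link_still_exists"] = os.path.lexists(link)   # the link itself
    result["target_reachable"] = os.path.exists(link)     # follows the link
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(symlink_demo(tmp))
```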
ext2 - major F/S for years; supports volumes up to 32 terabytes and files up to 2 terabytes; directories can contain 32,000 subdirectories
ext3 - ext2 with journalling, so F/S corruption occurs less often
ext4 - adopted as major journalling F/S from kernel 2.6.28; can support volumes with sizes up to 1 exabyte and files with sizes up to 16 terabytes; directories can contain 64,000 subdirectories
ReiserFS - the first journalling F/S for Linux; still used by Xandros and Linspire
Btrfs - "butter FS" may be the next kernel file system. A good article on it is "A short history of btrfs"
squashfs - read-only F/S using gzip to compress it; used by LiveCDs for Debian, Ubuntu, Fedora, ...
unionfs - allows two separate file systems to be combined as one. If there are two identical paths in each system then one takes priority. This is used by live distros on a USB stick: one unchanging file system is a squashfs system; the other contains all the changed files, and the second system takes priority over the squashfs
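The priority rule of a union file system can be sketched in Python with two dictionaries standing in for the two layers: a read-only base (the squashfs image) and a writable layer holding the changes. This is a toy model of the lookup order only; the names union_lookup and union_write are invented for illustration, and the paths and contents are made up. It ignores details such as how a real union file system records deletions.

```python
def union_lookup(path, upper, lower):
    """Return the contents for `path`, preferring the writable upper layer."""
    if path in upper:
        return upper[path]      # changed or new file wins
    return lower.get(path)      # otherwise fall back to the read-only layer


def union_write(path, contents, upper):
    """Writes never touch the read-only layer; they land in the upper one."""
    upper[path] = contents


if __name__ == "__main__":
    squash = {"/etc/hostname": "live-cd", "/etc/motd": "welcome"}  # read-only
    changes = {}                                                   # writable

    union_write("/etc/hostname", "my-usb-stick", changes)
    print(union_lookup("/etc/hostname", changes, squash))  # from changes
    print(union_lookup("/etc/motd", changes, squash))      # from squash
```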
Use parted, fdisk or other partition manager to create a partition of the right type
Run the command for the file system type you want: mkfs.ext2, mkfs.reiserfs, mkfs.bfs, mkfs.ext3, mkfs.minix, mkfs.vfat, mkfs.cramfs, mkfs.ext4, mkfs.msdos
e.g.
mkfs.ext3 /dev/sda2
An alternative is mkfs -t ext3 /dev/sda2
File systems can get corrupted, by crashes, power failures, software bugs, etc
Journalled file systems get corrupted less often
Operations such as de-fragmentation; waving a magnet near your disk; spilling beer on your laptop; etc can corrupt file systems
The command fsck can be used by you to check and repair a file system, and is sometimes invoked during the boot process - it can take a long time :-(
There are many types of filesystem, each with special properties
Access to each type of filesystem is controlled by a driver for that filesystem
At the user level, files on all types of filesystem are manipulated by the same commands (ls, etc)
Linux does not pay attention to filename extensions, but particular applications may do