CEG 233: Linux and Windows 

Lab on Files, Directories, and Links

   

Table of Contents

  1. Educational Objectives
  2. What is a File?
  3. The Concept of an i-Node
  4. What is a Directory?
  5. File Volumes
  6. What is a Link?
  7. Where is What
  8. Interoperability
  9. Permissions and Ownership
  10. Utilities
  11. Lab Experiment
  12. Acknowledgements
  13. References

Educational Objectives

The objectives of this lab experiment are to help you:

  1. Understand the terms File, Directory, Link, and Volume in the context of any OS.
  2. Distinguish file names from file content.
  3. Understand how a persistent storage device holds file content and names.
  4. Become aware of the location of important files.
  5. Use a few file and directory permission tools.
  6. Learn file content comparison tools.
  7. Learn to take archival backups.

Files, Directories, Links, and Volumes

The content of this article applies to both Linux and Windows unless explicitly stated otherwise.

What is a File?

A file is an association of two things: a name and content. Generally, a file is stored on a persistent medium such as a hard disk drive (HDD) or a USB flash-memory-based mass storage device ("thumb drive"). The raw content is always a sequence of bytes. Many files have a structure imposed on this content. E.g., text files are expected to contain only "printable characters" (ASCII A-Z, a-z, 0-9, punctuation, and blank) plus a few layout characters (tab, CR, LF). Program files, database files, etc. have even more rigidly defined expectations as to what the raw content should be.
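To see that content really is just a sequence of bytes, one can write a tiny text file and dump it with od (listed under Utilities). The following is a minimal bash sketch, assuming a Linux system with GNU coreutils; the file name hello.txt and the scratch directory are only for illustration.

```shell
# Work in a throwaway directory so nothing else is touched.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# printf writes exactly these bytes: H i TAB t h e r e LF
printf 'Hi\tthere\n' > "$tmp/hello.txt"

# od -c prints each byte as a character, or as an escape such as \t or \n
# for the non-printing layout characters.
od -c "$tmp/hello.txt"

# wc -c counts the bytes of content.
size=$(wc -c < "$tmp/hello.txt")
echo "size in bytes: $size"
```

The last offset that od prints, 0000011, is octal for 9, the same byte count that wc -c reports.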

The name of a file is typically of the form basename.ext. It is a good idea to have an extension that indicates the type of the file content, e.g., .txt for text, .mp3 for an audio file in MP3 format. Extensions are typically three characters long, but not always; e.g., HTML files are often named with the .html extension. In Linux, extensions are not significant, and files are often named without them. Strictly speaking, the same is true in Windows, but many programs are "associated" with extensions, and if a file has no extension or an improper one, double-clicking on its name will invoke the wrong program on it (or no program at all). http://en.wikipedia.org/wiki/Filename_extension is worth a visit.

Certain characters are forbidden in a basename. E.g., Linux forbids the forward slash (/) and the NUL byte, and Windows forbids the backslash (\) along with a few others such as / : * ? " < > |. It is best to name files using only a-z, A-Z, and 0-9, and to avoid spaces, punctuation, and other unusual characters even though they are not forbidden.
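The naming advice above can be expressed as a simple pattern check. The function is_recommended_name is hypothetical and written for bash; it accepts a basename of letters and digits with at most one .ext of letters and digits, so it is deliberately stricter than what either OS actually forbids.

```shell
# Hypothetical helper: succeed if $1 follows the recommended naming style,
# i.e., letters and digits, optionally followed by one ".ext" of the same.
is_recommended_name() {
  local re='^[A-Za-z0-9]+(\.[A-Za-z0-9]+)?$'
  [[ $1 =~ $re ]]
}

for name in myInfo.txt report2009 'my file.txt' 'a/b.txt' 'notes!.txt'; do
  if is_recommended_name "$name"; then
    echo "fine:  $name"
  else
    echo "avoid: $name"
  fi
done
```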

The Concept of an i-Node

An OS does not use symbolic file names internally. In this course, we use the concept of an i-node to explain this idea in both Linux and Windows. An i-node is an OS-internal data structure that captures a file's metadata, such as when the file was last modified and how big it is. It also records where the content is located, as in "this file's content is located in HDD block numbers 134, 1025, 289, and 903."

An i-node does not contain the file name. It does not contain the file content either. It contains information about the file.

There is a one-to-one correspondence between files and i-nodes. Conceptually, all the i-nodes are collected into an array. The indices of the elements of this i-node array are called i-numbers. We set aside the 0-th i-node as unused. So, all i-numbers of interest are greater than 0.
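On Linux, part of an i-node can be inspected from the shell: ls -i prints the i-number, and GNU stat prints individual fields. This sketch assumes GNU coreutils (the -c format option of stat is GNU-specific); the file a.txt is made up for the example.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'hello\n' > "$tmp/a.txt"

# ls -i prints the i-number in front of the name.
ls -i "$tmp/a.txt"

# GNU stat can print individual i-node fields:
#   %i = i-number, %s = size in bytes, %h = hard link count,
#   %y = time of last modification (human readable)
stat -c 'i-number=%i size=%s links=%h modified=%y' "$tmp/a.txt"

# Both tools report the same i-number.
inum_ls=$(ls -i "$tmp/a.txt" | awk '{print $1}')
inum_stat=$(stat -c %i "$tmp/a.txt")
```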

What is a Directory?

The primary purpose of a directory is to keep the association of names with the location of content. Conceptually, we can think of a directory as a table of two columns: the first column holds the names of the files in it, the second the corresponding i-numbers.

Every file (including every directory) belongs to a parent directory. This rule breaks down at the top-most level, so we artificially make the top-most (root) directory its own parent.

The term folder is a synonym for directory. Every directory is a file. So, when we wish to discuss a non-directory file we will use the term ordinary file.
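The two-column view of a directory, and the root directory being its own parent, can both be observed with a few commands. A minimal sketch, assuming a Linux shell with GNU coreutils; the directory D1 and the file notes.txt are illustrative only.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/D1"
touch "$tmp/notes.txt"

# One row per directory entry: i-number first, then name
# (-a includes the special entries . and .., -1 prints one entry per line).
ls -1ai "$tmp"

# The root directory is its own parent: / and /.. have the same i-number.
root_inum=$(stat -c %i /)
root_parent_inum=$(stat -c %i /..)
echo "/ -> $root_inum   /.. -> $root_parent_inum"
```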

File Volumes

A volume is the collection of files and their metadata, organized by an OS as an internal data structure. It is stored on a partition of an HDD. Before a brand new HDD can be used, it needs to be divided into one or more partitions, depending on the size of the HDD, and a file volume needs to be initialized on each partition. In both Linux and Windows, the partitioning programs typically have the word fdisk in their names. The mkfs program in Linux, and the format program in Windows, initializes a partition with a volume. There are many designs for volumes; the currently prominent ones are ext3, NTFS, and FAT16. The details of their internal data structures are beyond the scope of our course. Volumes of all three designs are interoperable between Linux and Windows (after installing a file system module; see References below). Because most BIOSes can readily deal with FAT16 but not with ext3 or NTFS, USB mass storage devices intended to be bootable should use FAT16 volumes.
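To find out which kind of volume holds a given directory, GNU stat can be asked about the file system rather than the file. This is a sketch assuming GNU coreutils on Linux; the type names printed vary by system (for example, ext volumes are reported as ext2/ext3 and FAT volumes as msdos).

```shell
# -f asks about the volume (file system) holding the path instead of the file;
# %T is the volume type in human-readable form.
stat -f -c 'type of the volume holding /: %T' /

fstype=$(stat -f -c %T "$PWD")
echo "the current directory is on a volume of type: $fstype"
```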

What is a Link?

A link is a "pointer" to a file. In both Linux and Windows, a file cannot exist without at least one link to it. We can establish multiple links to a given file, and these links can be in different directories. There are two kinds of links: hard and soft (also called symbolic). Note that link is an overloaded term; in non-file contexts it has other meanings.

We explore hard links first. Every file has a so-called hard link count of at least 1. Every new hard link we add to a file increases this link count by 1.

The ln utility, in Linux, is used to create either symbolic or hard links to files. Consider the following command:

ln abc.txt xyz.dat

It establishes the new name xyz.dat (which should not exist prior to this) as a hard link to abc.txt (which should exist already). These two names refer to the same file: in the directory's second column, the same i-number appears in both rows. Note that we created the hard link in the same directory as the original file. This is not necessary; the new name can be placed in any directory on the same volume. When a file name is deleted (e.g., with rm), the file's hard link count decreases by 1. Only if the new count is 0 is the file itself deleted, i.e., all the space it occupied is reclaimed.
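The life cycle of a hard link described above can be replayed in a scratch directory. A minimal sketch, assuming a Linux shell with GNU coreutils; it reuses the names abc.txt and xyz.dat from the ln example.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

echo "original content" > abc.txt
ln abc.txt xyz.dat                     # a second name for the same i-node

ls -li abc.txt xyz.dat                 # same i-number, link count 2 on both
links_before=$(stat -c %h abc.txt)
same_inode=no
if [ "$(stat -c %i abc.txt)" = "$(stat -c %i xyz.dat)" ]; then
  same_inode=yes
fi

rm abc.txt                             # removes one name; the count drops to 1
links_after=$(stat -c %h xyz.dat)
cat xyz.dat                            # the content is still reachable
```

Only removing xyz.dat as well would bring the count to 0 and reclaim the file's space.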

Unix symbolic links (also known as symlinks or soft links) are a special kind of file whose content is the path name of their target. Opening a symlink is just like opening the target file. Symbolic links are conceptually pointers to path names, and they can contain any path name, even one on a different volume or one that names a directory. Deleting a symlink does not affect the target file, but if the target is deleted, the symlink becomes broken ("dangling").
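A symlink's behavior, including how it breaks when its target disappears, can be checked directly. A minimal sketch, assuming a Linux shell with GNU coreutils; target.txt and soft.txt are illustrative names.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

echo "target content" > target.txt
ln -s target.txt soft.txt     # soft.txt's content is the path "target.txt"

readlink soft.txt             # prints the stored path name
content=$(cat soft.txt)       # opening the symlink opens the target
ls -li target.txt soft.txt    # different i-numbers; soft.txt is shown as l...

rm target.txt                 # delete the target, leave the symlink
# -L tests for a symlink; -e follows it and fails if the target is gone.
if [ -L soft.txt ] && [ ! -e soft.txt ]; then
  echo "soft.txt is now a broken (dangling) link"
fi
```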

In Linux, hard links cannot cross volumes: the link and the file must be on the same partition of the same drive. Hard links also cannot point to directories. See man ln for more information about links.
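Whether two paths are on the same volume can be read off their device numbers, and the refusal to hard-link a directory can be observed directly. A sketch assuming GNU coreutils on Linux; the helper same_volume is hypothetical, and treating /proc as a separate volume is an assumption that holds on typical Linux systems.

```shell
# Hypothetical helper: two paths are on the same volume exactly when GNU stat
# reports the same device number (%d) for both of them.
same_volume() {
  [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/D1" "$tmp/D2"

if same_volume "$tmp/D1" "$tmp/D2"; then
  echo "D1 and D2 are on the same volume, so hard links between them are possible"
fi

# On a typical Linux system /proc is a separate (virtual) volume.
if ! same_volume "$tmp" /proc; then
  echo "$tmp and /proc are on different volumes"
fi

# ln refuses to make a hard link to a directory.
if ln "$tmp/D1" "$tmp/D1-again" 2>/dev/null; then
  dir_link=made
else
  dir_link=refused
fi
echo "hard link to a directory: $dir_link"
```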

In Windows, the everyday counterparts of symbolic links are "shortcuts" (small .lnk files interpreted by the GUI rather than by the file system), and the GUI makes it easy to create them. On NTFS volumes, hard links to ordinary files can be created using the administrator-only command line utility fsutil (fsutil hardlink create). Junctions, which act as symbolic links to directories, can be created using downloaded programs such as junction.exe from technet.microsoft.com.

A Few File Types

Many Windows and Linux applications recognize the type of a file by its extension, but this is just a heuristic. There are several web sites with extensive lists of extensions; one such site is http://filext.com/.

Linux   Windows  Brief description
.txt    .txt     Plain text file
.c      .c       Plain text file containing C source code
.java   .java    Plain text file containing Java source code
.py     .py      Plain text file containing Python source code
.dvi    .dvi     Device-independent typesetting file (TeX output); open format
.pdf    .pdf     Device-independent typesetting/publishing file; created by Adobe, now an ISO standard
.html   .html    Hypertext Markup Language; plain text file with markup for use on the WWW
-       .com     Program file; old DOS/Windows format
-       .exe     Program file; current Windows format
.so     .dll     Dynamic link library (shared library) file
.o      .obj     Object code file
.ko     .sys     An OS module
.zip    .zip     Compressed archive of one or more files and directories
.tar    .tar     Uncompressed ("tape") archive of one or more files and directories
.gz     -        Compressed file (gzip)
.bz2    -        Compressed file using bzip2, a more refined compression
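Because extensions are only a heuristic, tools such as file (see Utilities) look at the content instead, typically at a few leading "magic" bytes. The helper guess_type below is hypothetical and recognizes only a handful of formats; it is a bash sketch using od and tr, not a substitute for file. The signatures it checks (%PDF, 0x7f ELF, PK 03 04, 1f 8b for gzip, BZh for bzip2) are the standard ones for those formats.

```shell
# Hypothetical helper: guess a file's type from its first bytes, ignoring its name.
guess_type() {
  local magic
  # od: -An no offsets, -tx1 one hex byte at a time, -N4 only the first 4 bytes
  magic=$(od -An -tx1 -N4 -- "$1" | tr -d ' \n')
  case $magic in
    25504446) echo "PDF document" ;;            # "%PDF"
    7f454c46) echo "ELF program or library" ;;  # 0x7f "ELF" (.o, .so, .ko, programs)
    504b0304) echo "zip archive" ;;             # "PK" 03 04
    1f8b*)    echo "gzip-compressed data" ;;
    425a68*)  echo "bzip2-compressed data" ;;   # "BZh"
    *)        echo "unknown (perhaps text)" ;;
  esac
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%%PDF-1.4\n' > "$tmp/report.txt"   # PDF bytes behind a misleading .txt name
printf 'just words\n' > "$tmp/photo.pdf"   # text bytes behind a misleading .pdf name
guess_type "$tmp/report.txt"
guess_type "$tmp/photo.pdf"
```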

Where is What

Linux      Windows                      Brief description
/home      C:\Documents and Settings    Users' home directories
/usr/bin   C:\Program Files             Installed programs
/sbin      C:\Windows\system32          System programs
/usr/sbin  -                            System programs
/etc       Registry                     Configuration settings

In Linux, every directory other than the "root" directory (the base of the file system) is a descendant of the root. A directory is also a file. To disambiguate, we call a file that is not a directory an ordinary file. A directory can contain a large number of ordinary files and subdirectories, limited only by the file system design. Linux file names can be long; the exact maximum length depends on the underlying file system design (255 bytes for ext3) and is unimportant in this course. Linux file names typically contain letters (a-z, A-Z), digits (0-9), and the dot ('.'). They can also contain spaces, tabs, and punctuation characters such as the comma, but these can be troublesome, so we will avoid them.

Note: The shell has a feature called tab completion, which makes typing long path names less painful. Experiment with typing a partial file name and pressing Tab.

Linux internally identifies a file with a number called the i-number. Two different files on the same volume will have different i-numbers. An i-number is an index into an array called the i-node array. A directory should be thought of as a table of two columns: the first column lists the name of a file, the second lists its i-number. The internal order of the rows of this table is unimportant.

The Linux files of any system form a hierarchical tree: all internal nodes are directories, all ordinary files are leaves, and some leaf nodes are empty directories. Every file is reachable by walking a path that starts at the root, passes through the directories leading down to the file's parent directory, and finally reaches the file. This path is known as the absolute path name. An example absolute path name is /d1/d2/d3/myFile.txt. Linux path names use the forward slash ('/') to separate directory names. In Linux, the backslash ('\') has a special meaning to be discussed much later.
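The absolute path name of a file can be obtained with realpath, and tr splits it at each '/'. A minimal sketch, assuming GNU coreutils (realpath, pwd -P); it rebuilds the example tree /d1/d2/d3/myFile.txt inside a scratch directory rather than at the real root.

```shell
# Physical absolute path of a fresh scratch directory.
tmp=$(cd "$(mktemp -d)" && pwd -P)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/d1/d2/d3"
touch "$tmp/d1/d2/d3/myFile.txt"

# Stand somewhere in the middle of the tree and name the file relative to it...
cd "$tmp/d1/d2"
# ...then ask for the absolute path name, which always starts at the root.
abs=$(realpath d3/myFile.txt)
echo "$abs"

# Each '/' separates one directory name from the next.
echo "${abs#/}" | tr '/' '\n'
```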

Linux files are typically organized as follows:

Directory  Contains
/bin       Executable files (programs)
/etc       Et cetera; configuration files
/lib       Library files (executable code shared by multiple programs)
/mnt       Subdirectories expected to be used as mount points
/media     Subdirectories expected to be used as mount points for removable media
/sbin      Executable programs used for system maintenance
/tmp       Any user can create temporary files here. File permissions do apply.
/usr       Programs for day-to-day use of the system, as distributed by the OS vendor
/dev       Special files for accessing peripherals (such as hard drives or joysticks)
/proc      Information about currently running processes
/var       Files of variable size, such as logs or queues for printer or mail servers

The subdirectory names bin and sbin often recur. E.g., a Linux system may have /bin, /usr/bin, /usr/local/bin, /sbin, /usr/sbin, /usr/local/sbin, /opt/kde/bin, and /common/bin. See man hier for more information.

A system that has multiple local disks mounts the additional partitions on subdirectories of /mnt, so that the end result is again a single hierarchical tree. It is also possible to have one physical disk partition shared by multiple systems via a LAN.

Every directory contains two special entries: (i) the dot refers to itself, a self-loop; and (ii) the dot-dot refers to the parent directory.

When referring to the Linux file system as a tree, we exclude these two entries from every directory. Links, hard or soft, are another exception to the tree structure; they are discussed in the section What is a Link?
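The dot and dot-dot entries are ordinary directory rows whose i-numbers point back to the directory itself and to its parent. A minimal sketch, assuming GNU coreutils; the directories parent and child are illustrative.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/parent/child"

ls -ai "$tmp/parent/child"      # shows the . and .. rows with their i-numbers

child=$(stat -c %i "$tmp/parent/child")
dot=$(stat -c %i "$tmp/parent/child/.")
parent=$(stat -c %i "$tmp/parent")
dotdot=$(stat -c %i "$tmp/parent/child/..")
echo "child=$child  .=$dot  parent=$parent  ..=$dotdot"
```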

File extensions have no special meaning to the Linux OS. They are a convention to make things easier for human users. However, it is always best to follow these conventions to prevent later confusion.

Other Linux file name conventions include those for backup files (created by an editor after saving a file) and auto-save files (copies of unsaved changes made by editors at regular intervals). A backup file name is made by appending a tilde (~) to the original file name. An auto-save file name is the original file name with a pound sign (#) before and after it.
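These conventions make stray backup and auto-save files easy to locate, e.g., when cleaning up "junk" files. A sketch assuming a POSIX find (such as GNU findutils); it only lists the files and deletes nothing, and the file names are made up.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
touch "$tmp/notes.txt" "$tmp/notes.txt~" "$tmp/#notes.txt#" "$tmp/report.c"

# Backup (name~) or auto-save (#name#) files; the patterns are quoted so the
# shell passes them to find unexpanded.
find "$tmp" -type f \( -name '*~' -o -name '#*#' \) -print
junk_count=$(find "$tmp" -type f \( -name '*~' -o -name '#*#' \) | wc -l)
echo "backup/auto-save files found: $junk_count"
```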

Interoperability

Permissions and Ownership

Permissions are the Linux way of keeping track of which users can do what to files. Ownership determines which of a file's permissions apply to users. Both are displayed in an "ls -l" listing.

There are three basic permissions: read ("r"), write ("w"), and execute ("x"). Users are allowed to perform these operations on a file if and only if the corresponding permission is set for them on that file (a file's permissions are collectively referred to as its mode); the superuser (root) is the exception. Note: the execute permission has a special meaning for directories. Without it, a user cannot cd into a directory, no matter what other permissions are set.

Every file has three sets of permissions. One applies to the user who owns the file ("u"), one to the file's group ("g"), and one to all other users on the system ("o"). For example, a file can be readable by everyone but writable only by the users belonging to that file's group.  Permissions are changed by the chmod utility.  Ownership is changed by the chown utility.
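The example above (readable by everyone, writable only by the group) can be set with chmod in symbolic form, and the resulting mode read back with GNU stat in both symbolic and octal form. A minimal sketch, assuming GNU coreutils; shared.txt is an illustrative name. In octal, each of u, g, o gets one digit that sums r=4, w=2, x=1.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
f="$tmp/shared.txt"
touch "$f"

# Symbolic form: set the u, g, and o permissions explicitly.
chmod u=r,g=rw,o=r "$f"
ls -l "$f"
symbolic=$(stat -c %A "$f")   # -r--rw-r--
octal=$(stat -c %a "$f")      # 464: u=4 (r), g=4+2 (rw), o=4 (r)

# Octal form: one digit each for u, g, o.
chmod 644 "$f"                # u=rw (6), g=r (4), o=r (4)
after=$(stat -c %A "$f")      # -rw-r--r--
```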

Windows Access Control Lists

Windows has read, write, and other permissions. These are typically handled in the Security tab of a file's Properties sheet. At the command line, the two relevant commands are attrib and cacls. An access control list (ACL) explicitly enumerates each user and the privileges granted or denied to that user.

Utilities

The following are command line utilities that you are expected to learn as part of this course. For further details on the commands, look them up in the textbook, the man/help pages, and the web.

Linux    Windows  Brief description (limits on the learning objective, if any)
ls       dir      Listing of files. For ls, learn the options -lisa, -r, -R. For dir, learn /a, /tc, /s.
-        tree     Graphically displays the directory structure of a drive or path.
cat      type     Lists the contents of a file on stdout.
more     more     Lists the contents of a file on stdout, pausing after each page.
mkdir    mkdir    Creates a directory.
rmdir    rmdir    Removes a directory.
cd       cd       Changes the current directory to the one given as argument.
pushd    pushd    Saves the current directory name on a stack, then changes to the one given as argument.
popd     popd     Pops the top-most name off the stack, then changes to it.
mv       move     Moves or renames a file.
cp       copy     Copies a file.
cp -r    xcopy    Copies files and directory trees.
rm       del      Removes files.
ln       -        Creates a link (hard or soft) to an existing file.
chmod    attrib   Changes file permissions/attributes.
-        cacls    Displays or modifies access control lists (ACLs) of files.
chown    -        Changes ownership of a file.
umask    -        Packed vector of bits controlling the initial permissions of a newly created file.
wc       -        Word count of file content; also counts lines and bytes.
od       -        Octal dump of file content. Almost always used with -x for a hexadecimal dump.
tr       -        Translates characters.
-        assoc    Lists associations of commands with extensions.
file     -        Heuristically determines the type of file content.
grep     find     Searches for a string in a file's content. For now, learn it without regular expressions.
find     -        Locates a file, by name, etc. For now, learn it without regular expressions.
diff     fc       Lists the differences between two text files.
du       -        Reports disk space used.
nano     edit     Simple text editors.
vi       vim      A powerful text editor. For now, learn to edit simple text files with it.
emacs    emacs    A very powerful multi-purpose text editor. For now, learn to edit simple text files with it.
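The umask row is perhaps the least obvious one: the bits set in the umask are removed from the permissions a program requests when it creates a file (typically 666 for ordinary files and 777 for directories). A sketch assuming bash and GNU coreutils, with the umask value 027 chosen only for illustration; it runs in a subshell so your own shell's umask is not changed.

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

(
  umask 027                 # clear w for group; clear r, w, x for others
  touch "$tmp/newfile"      # requested 666, result 666 & ~027 = 640
  mkdir "$tmp/newdir"       # requested 777, result 777 & ~027 = 750
)
file_mode=$(stat -c %a "$tmp/newfile")
dir_mode=$(stat -c %a "$tmp/newdir")
echo "file: $file_mode  directory: $dir_mode"
```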

 


Lab Experiment

All work is expected to be carried out in the Operating Systems and Internet Security (OSIS) Lab, 429 Russ, but you are welcome to work elsewhere. Note that this lab needs both Linux and Windows, and possibly other software that may not be installed in other facilities.

As you work on each step, record (i) the step number, (ii) the lines you type, and (iii) your observations in a plain text file named myLabJournal.txt, using your own words and/or copying appropriate lines. You may use any editor you wish to edit this file. The quality of your observations and explanations will be judged.

In Linux

  1. [Installed Programs] Determine the total number of files in the directory /usr/bin. [Hint: Use ls recursive, wc, and a pipe.]
  2. [Permissions] Create a text file named myInfo.txt in your home directory containing exactly four lines: your full name, your UID, your email address, and the darkest wish ;-) you have, each on a separate line. Make sure that the file is strictly for your eyes only. Record how you did it. Copy this file to your USB thumb drive. We are assuming that your USB drive is using a FAT, FAT16, or FAT32 volume.
  3. [Hard and Soft Links] Change to your home directory. In ~, create three subdirectories named D1, D2, and D3, and also create a file named ls0.txt as follows: ls -lisR > ls0.txt. Establish both a hard link and a soft link to this ls0.txt in D1 and D2, giving the links suitable names. Inside D3, establish a soft link named ls0.txt to the soft link to ~/ls0.txt that you now have in D2. If you changed directories, go back to home. Create a file named ls1.txt as follows: ls -lisR > ls1.txt. Now, delete ls0.txt, and create a file named ls2.txt as follows: ls -lisR > ls2.txt. Can you edit ~/D3/ls0.txt? Explain.
  4. [Explain Differences] Recreate ls0.txt as above. Use diff ls0.txt ls1.txt, and again diff ls1.txt ls2.txt and explain the differences.
  5. [Tar Ball of Home] Recursively copy your Linux home directory onto your USB drive, omitting the hidden files and directories. [You may wish to clean up your home directory of "junk" files before this operation.] Create a tar ball of your entire home directory that is compressed by bzip2. This tar ball should be located in the /tmp directory and named ceg233nn.tar.bzip2, where ceg233nn stands for your OSIS Lab login ID. Use du to determine the actual disk usage of your home directory. What was the compression percentage?

In Windows

In this part of the experiment, the overall goal is to repeat the above in Windows.

  1. List what the Windows "equivalents" are for the Linux programs you used above.
  2. Using these equivalents, perform the steps listed in the Linux part of this lab, but use file names wls0.txt, wls1.txt, wls2.txt.
  3. In step 3, ignore hard link establishment.
  4. Copy the file named myInfo.txt from your USB drive to a directory on the Windows hard drive. Do the permissions on this newly copied file still keep it "strictly for your eyes only"? Explain.
  5. For step 5, use the copy that you have on your USB drive of your Linux home directory.  Instead of tar+bzip2, use zip-archiving.

Turnin

Note the number n of this Lab from the course home page and use Ln as the first argument to turnin.

Turn in the files: (i) ReadMe.txt, (ii) myLabJournal.txt, (iii) ls1.txt, (iv) ls2.txt, (v) wls1.txt, and (vi) wls2.txt.

Do NOT turn in the tarball or the zip archives.

Link to Grading Sheet


Acknowledgements


References

  1. Wikipedia, "Filename extension", http://en.wikipedia.org/wiki/Filename_extension. Required visit.
  2. Sobell, Chapters 3 and 4. Required reading.
  3. Access Linux ext2/ext3 system files from Windows XP or Vista: (i) http://www.fs-driver.org, (ii) http://www.go2linux.org/accessing-linux-drive-ext-with-vista. Recommended reading.
  4. Microsoft TechNet, "How to create and manipulate NTFS junction points", http://support.microsoft.com/?kbid=205524. Recommended reading.
Copyright © 2009 Prabhaker Mateti last edited: July 10, 2009