The objectives of this lab experiment are to make you :
The content of this article applies to both Linux and Windows unless explicitly stated otherwise.
A file is an association of two things: a name and content. Generally, a file is stored on a persistent medium such as a hard disk (HDD), or USB flash-memory based mass storage device ("thumb drive"). The raw content is always a sequence of bytes. Many files have a structure imposed on this content. E.g., text files are expected to contain only the "printable characters" (ASCII A-Z, a-z, 0-9, punctuation, blank, '\t' (tab), '\r' (CR), '\n' (LF)). Program files, database files, etc. have even more rigidly defined expectations as to what the raw content should be.
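The distinction between raw bytes and the structure expected of a text file can be sketched in a few lines of Python. This is a minimal illustration that uses only the set of "printable characters" listed above; the function name looks_like_text is made up for this example.

```python
import string

# Bytes the paragraph above lists as acceptable in a plain ASCII text file:
# letters, digits, punctuation, blank, tab, CR, and LF.
TEXT_BYTES = frozenset(
    (string.ascii_letters + string.digits + string.punctuation + " \t\r\n").encode("ascii")
)


def looks_like_text(data: bytes) -> bool:
    """Return True if every byte of data is one of the 'printable' bytes."""
    return all(byte in TEXT_BYTES for byte in data)


if __name__ == "__main__":
    print(looks_like_text(b"Hello, world!\r\n"))    # every byte is printable
    print(looks_like_text(b"\x7fELF\x02\x01\x01"))  # contains non-printable bytes
```

Any file can be read as bytes; whether those bytes also satisfy the expectations of a text file is a separate question.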
The name of a file is typically of the form basename.ext. It is a good idea to have an extension that indicates the type of the file content. E.g., .txt for text, .mp3 for an audio file in MP3 format, etc. Typically, extensions are only three characters, but not always. E.g., HTML files are often named with .html extension.
Other Linux filename conventions include those for backup files (created by an editor after saving a file) and auto-save files (copies of unsaved changes made by editors at regular intervals). Backup filenames are made by appending a tilde (~) to the original filename. Auto-save filenames are the original filename with a pound sign (#) before and after it.
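The naming conventions described so far can be sketched in Python. The helper names backup_name and autosave_name are invented for this illustration, and they assume a bare filename without a directory part.

```python
import os.path


def split_name(filename):
    """Split 'basename.ext' into ('basename', '.ext')."""
    return os.path.splitext(filename)


def backup_name(filename):
    """Backup convention: append a tilde to the original name."""
    return filename + "~"


def autosave_name(filename):
    """Auto-save convention: a pound sign before and after the name."""
    return "#" + filename + "#"


if __name__ == "__main__":
    for name in ("notes.txt", "index.html", "README"):
        print(name, split_name(name), backup_name(name), autosave_name(name))
```

Note that a name without a dot, such as README, simply has an empty extension.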
The base name is forbidden from having certain characters. E.g., Linux forbids the forward slash, and Windows forbids the backslash. It is best to name files using a-z, A-Z, 0-9 only, and to avoid spaces, punctuation, and other unusual characters even though they are not forbidden. File names can be long; the exact maximum length depends on the underlying file system design and is unimportant in this course.
Internally, an OS does not use symbolic file names to identify files. In this course, we use the concept of an i-node to explain this idea in both Linux and Windows. An i-node is an OS-internal data structure that captures all the metadata of a file, such as when the file was modified, how big it is, and where its content is located. E.g., the i-node tells us where exactly on the HDD the content is, as in "this file's content is located in HDD block numbers 134, 1025, 289, and 903." An i-node contains neither the file content nor the file name; it contains information about the file.
Conceptually, all the i-nodes are collected into an array. An i-number is an index into this array. Linux internally identifies every file with an i-number. Two different files will have different i-numbers. There is a one-to-one correspondence between files and i-nodes. We set aside the 0-th i-node as unused. So, all i-numbers of interest are greater than 0.
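Some of the information kept in an i-node can be inspected from a program. The following Python sketch assumes a Linux system, where os.stat reports the i-number as st_ino; the function name describe and the chosen fields are only for illustration.

```python
import os
import tempfile
import time


def describe(path):
    """Return a few pieces of metadata that the OS keeps about a file."""
    info = os.stat(path)
    return {
        "i_number": info.st_ino,      # index of the file's i-node
        "size_bytes": info.st_size,   # how big the content is
        "hard_links": info.st_nlink,  # how many names refer to this i-node
        "modified": time.ctime(info.st_mtime),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.txt")
        with open(path, "wb") as handle:
            handle.write(b"hello")
        for field, value in describe(path).items():
            print(field, "=", value)
```

Nothing in the returned metadata is the file name; the name was needed only to find the i-node.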
A directory should be thought of as a table of two columns: first column lists the name of the file, the second lists its i-number. The internal order of the rows of this table is unimportant. The primary purpose of a directory is to keep the association of name with the location of content. The term folder is a synonym for directory. Every directory is a file. So, when we wish to discuss a non-directory file we will use the term ordinary file.
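The two-column view of a directory can be reproduced with Python's os.scandir, which yields each entry's name and, on Linux, its i-number. This is a sketch of the idea, not of how the OS stores directories; the function name directory_table is made up here.

```python
import os
import tempfile


def directory_table(path):
    """Return (name, i-number) rows for a directory, sorted by name."""
    with os.scandir(path) as entries:
        return sorted((entry.name, entry.inode()) for entry in entries)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for name in ("a.txt", "b.txt"):
            open(os.path.join(folder, name), "w").close()
        os.mkdir(os.path.join(folder, "subdir"))
        for name, i_number in directory_table(folder):
            print(f"{name:10} {i_number}")
```

The rows are sorted only for readable output. os.scandir leaves out the two special entries dot and dot-dot; on Linux, ls -ia prints a similar listing that includes them.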
Every file (including directories) has a parent directory that it belongs to. This will fail at the top-most level; so, we artificially make the top-most directory the parent of itself. Every directory contains two special entries: (i) the dot refers to itself -- a self-loop; and (ii) the dot-dot refers to the parent directory.
When referring to the OS file system as a tree, we are excluding these two entries from every directory. Links, hard or soft, are another exception; they are discussed later.
All directories are subdirectories of the "root" directory, the base of the file system. A directory can contain any number of ordinary files and subdirectories, limited only by the file system design and the available space. The root directory is really anonymous. In Linux, we refer to it as /. In Windows, we almost never refer to it; we refer to its top-most subdirectories as C:, D:, etc.
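The behavior of the dot and dot-dot entries can be checked with os.path.samefile, which compares the files that two path names lead to. The sketch below assumes a Linux-style system; the function names are invented for this example.

```python
import os
import tempfile


def dot_is_self(directory):
    """The '.' entry of a directory leads back to the directory itself."""
    return os.path.samefile(directory, os.path.join(directory, "."))


def is_own_parent(directory):
    """True only for a directory whose '..' entry leads back to itself."""
    return os.path.samefile(directory, os.path.join(directory, ".."))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(dot_is_self(folder))    # dot always refers to the directory itself
        print(is_own_parent(folder))  # an ordinary directory has a different parent
    print(is_own_parent(os.sep))      # the top-most directory is its own parent
```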
Files of any OS form a hierarchical tree — all internal nodes are directories, all ordinary files are leaves, and some leaf nodes are empty directories. Every file is reachable by walking a path starting from the root, to the parent directory of the file, and finally to the file. This is known as the absolute path name. An example absolute path name is /d1/d2/d3/myFile.txt. Linux path names use the forward slash ('/') to separate the directory names. In Linux, the back slash ('\') has a special meaning to be discussed much later.
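The walk from the root to a file can be made explicit by splitting an absolute path name into its steps. The sketch below uses Python's pathlib on the example path; it only manipulates the text of the path and does not touch any real files. The function name walk_steps is made up for this illustration.

```python
from pathlib import PurePosixPath


def walk_steps(path_text):
    """List the steps of an absolute Linux path name, starting at the root."""
    path = PurePosixPath(path_text)
    if not path.is_absolute():
        raise ValueError("an absolute path name must start at the root: " + path_text)
    return list(path.parts)


if __name__ == "__main__":
    example = "/d1/d2/d3/myFile.txt"
    print(walk_steps(example))            # root first, the file itself last
    print(PurePosixPath(example).parent)  # the parent directory of the file
    print(PurePosixPath(example).name)    # the name within that directory
```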
A volume is the collection of files and their metadata organized by an OS as an internal data structure. This is stored on a partition of an HDD. Before a brand new HDD can be used, it needs to be divided into one or more partitions depending on the size of the HDD, and a file volume needs to be initialized on each of the partitions. In both Linux and Windows, the partitioning programs typically have the word fdisk in their names. In Linux the mkfs program, and in Windows the format program, initializes a partition with a volume.
There are many designs for volumes; the currently (2011) prominent ones are: ext4, NTFS 6, FAT32. The details of the internal data structures of these designs are beyond the scope of our course. All these volumes are interoperable between Linux and Windows (with the installation of a file system module; see References below). Because most BIOS can readily deal with FATnn but not ext or NTFS, USB mass storage devices intended to be bootable devices should use FATnn volumes.
A system that has multiple local disks mounts the partitions on the subdirectories of /mnt so that the end result is again a single hierarchical tree.
It is also possible to have one physical disk partition shared by multiple systems via a LAN.
A link is a "pointer" to a file. In every OS, a file cannot exist without at least one link to it. We can establish multiple links to a given file, and these links can be in different directories. There are two kinds of links: hard and soft (also called symbolic). Note that link is an overloaded term; in non-file contexts it has other meanings.
We explore hard links first. Every file has a so-called hard link count of at least 1. Every new hard link we add to a file increases this link count by 1.
The ln utility, in Linux, is used to create either symbolic or hard links to files. Consider the following command:
ln abc.txt xyz.dat
It establishes the new name xyz.dat (which should not have existed prior to this) as a hard link to abc.txt (which should exist already). These two names refer to the same file: both directory entries list the same i-number in the second column. Note that we created the hard link in the same directory as the original file; this is not necessary. When one of the names is deleted, the file's hard link count decreases by 1. If the new count is 0, the file itself is deleted, i.e., all the space it occupied is reclaimed.
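The same experiment can be carried out from Python, where os.link plays the role of ln. This is a sketch that assumes a Linux system and a scratch directory; the function name hard_link_demo is made up, while the file names follow the example above.

```python
import os
import tempfile


def hard_link_demo(directory):
    """Create abc.txt, hard-link it as xyz.dat, then remove the first name."""
    original = os.path.join(directory, "abc.txt")
    alias = os.path.join(directory, "xyz.dat")
    with open(original, "w") as handle:
        handle.write("hello\n")

    count_before = os.stat(original).st_nlink  # one name so far
    os.link(original, alias)                   # same effect as: ln abc.txt xyz.dat
    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
    count_after = os.stat(original).st_nlink   # two names, one file

    os.remove(original)                        # removes a name, not the content
    with open(alias) as handle:
        content = handle.read()
    count_final = os.stat(alias).st_nlink
    return count_before, same_inode, count_after, content, count_final


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(hard_link_demo(folder))
```

The content stays readable through xyz.dat after abc.txt is removed, because the link count has dropped only from 2 to 1.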
Unix symbolic links (also known as symlinks or soft links) are a special kind of file pointing at their target at the file system level. Opening a Unix symlink is just like opening the original file. Symbolic links are conceptually pointers to pathnames. They can contain any path name. Deleting a symlink does not affect the original file, but if the target is deleted, the symlink becomes broken.
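A symbolic link and a broken link can be observed with a short Python sketch. It assumes a Linux system (on Windows, creating symbolic links may require extra privileges); the file names and the function name symlink_demo are invented for this example.

```python
import os
import tempfile


def symlink_demo(directory):
    """Create a symlink, read through it, then delete its target."""
    target = os.path.join(directory, "target.txt")
    link = os.path.join(directory, "link.txt")
    with open(target, "w") as handle:
        handle.write("hello\n")

    os.symlink(target, link)              # same effect as: ln -s target.txt link.txt
    stored_path = os.readlink(link)       # the path name kept inside the symlink
    with open(link) as handle:            # opening the link opens the target
        via_link = handle.read()

    os.remove(target)                     # the link itself is left in place
    still_a_link = os.path.islink(link)   # the symlink still exists
    target_found = os.path.exists(link)   # but following it now fails
    return stored_path == target, via_link, still_a_link, target_found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(symlink_demo(folder))
```

Unlike a hard link, the symbolic link stores a path name, so it cannot keep the content alive once the target name is gone.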
In Linux, hard links cannot cross file systems: they must stay on the same partition and drive. They also cannot point to directories. See man ln for more information about links.
In Windows, "shortcuts" are files with the extension .lnk, and the GUI makes it easy to create them. They resemble symbolic links, but they are interpreted by the Windows shell rather than by the file system, so most programs see a shortcut as an ordinary file. NTFS volumes also support hard links, true symbolic links, and junctions; a junction is a kind of symbolic link to a directory. Hard links can be created with the command line utility fsutil or with mklink /H, symbolic links with mklink, and junctions with mklink /J or with downloaded programs such as junction.exe from technet.microsoft.com. Some of these operations require administrator privileges.
In Linux, the .ext extensions are not significant, and files are often named without extensions. Extensions are a convention to make things easier for human users, and as such have no special meaning to the OS. This is also true of the Windows OS itself. However, it is always best to follow these conventions to prevent later confusion.
Many Windows and Linux applications depend on recognizing the type of a file based on the extension it has. This is known as "association" based on extensions, and if a file has no extension or has an improper extension, (double-) clicking on the file name in Linux or Windows GUI applications will invoke the wrong program on it.
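The idea of association can be sketched as a table lookup keyed on the extension. The table below is hypothetical and tiny; real systems maintain their own, much larger tables, and the program descriptions are placeholders.

```python
import os.path

# Hypothetical association table, for illustration only.
ASSOCIATIONS = {
    ".txt": "text editor",
    ".html": "web browser",
    ".mp3": "audio player",
}


def program_for(filename):
    """Pick a program from the extension alone, never from the content."""
    extension = os.path.splitext(filename)[1].lower()
    return ASSOCIATIONS.get(extension)  # None when nothing is associated


if __name__ == "__main__":
    for name in ("song.mp3", "song.txt", "song"):
        print(name, "->", program_for(name))
```

Because only the name is consulted, an audio file misnamed song.txt would be handed to the text editor, and one named song would be handed to nothing at all.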
There are several web sites (e.g., http://en.wikipedia.org/wiki/Filename_extension and http://filext.com/) that discuss extensions.
| Linux | Windows | Brief description/Limitations in the Learning Objective, if any |
| .txt | .txt | Plain text file |
| .c | .c | Plain text file containing C source code |
| .cpp | .cpp | Plain text file containing C++ source code |
| .java | .java | Plain text file containing Java source code |
| .py | .py | Plain text file containing Python source code |
| .dvi | .dvi | Device Independent typesetting/publish file; Open source |
| .odt | .odt | Open Document Text Format; Open source |
| | | Device Independent typesetting/publish file; proprietary |
| .html | .html | Hypertext Markup Language; plain text file with markups for use on WWW |
| | .com | Program file; old DOS/Windows format |
| | .exe | Program file; current Windows format |
| .so | .dll | Dynamic Link Library file |
| .o | .obj | Object code file |
| .ko | .sys | An OS module |
| .zip | .zip | Compressed archive of one or more files and directories |
| .tar | .tar | Uncompressed ("tape") archive of one or more files and directories |
| .gz | | Compressed file |
| .bz2 | | Compressed file using bzip2, a more refined compression |
The following table lists roughly corresponding locations in the two systems:
| Linux | Windows | Brief description/Limitations in the Learning Objective, if any |
| /home | C:\Users | Users' home directories |
| /usr/bin | C:\Program Files | Installed programs |
| /sbin | C:\Windows\system32 | System programs |
| /usr/sbin | | System programs |
| /etc | Registry | Configuration settings |
Both bash and PowerShell have a feature called tab completion, which makes typing long pathnames less painful. Experiment with typing a partial filename and pressing tab.
Linux files are typically organized as follows:
| Directory | Contains |
| /bin | Executable files (programs) |
| /etc | et cetera; Configuration files |
| /lib | Library files (executable code shared by multiple programs) |
| /mnt | Subdirectories expected to be used as mount points |
| /media | Subdirectories expected to be used as mount points for Removable Media |
| /sbin | Executable programs used for system maintenance |
| /tmp | Any user can create temporary files here. File permissions do apply. |
| /usr | Programs for day to day use of the system, as distributed by the OS vendor |
| /dev | Special files for accessing peripherals (such as hard drives or joysticks) |
| /proc | Information about currently running processes |
| /var | Files of variable size, such as logs or queues for printer or mail servers |
The bin and sbin subdirectory names often recur. E.g., a Linux system may have /bin, /usr/bin, /usr/local/bin, /sbin, /usr/sbin, /usr/local/sbin, /opt/kde/bin, and /common/bin. See man hier for more information.
Interoperability in the context of files is the ability of a file to be used within an OS that did not create it. E.g., how would a plain text file created in Linux appear to Windows applications? A text file is a sequence of lines. A line is a sequence of ordinary characters ending with an "end-of-line" indicator. Linux uses '\n' (ASCII Line Feed), Windows uses '\r' (ASCII Carriage Return) followed by '\n', and classic Mac OS (before Mac OS X) used '\r' as the end-of-line indicator; current versions of macOS use '\n', as Linux does. Most applications on these systems are now aware of these differences and adjust automatically.
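When an application does not adjust automatically, the end-of-line indicators can be converted by hand. The following Python sketch works on raw bytes so that nothing is translated behind our backs; the function names to_unix and to_windows are made up for this example.

```python
def to_unix(data: bytes) -> bytes:
    """Turn CR LF and lone CR end-of-line indicators into LF."""
    # Replace the two-byte pair first so it does not become two line breaks.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_windows(data: bytes) -> bytes:
    """Turn any of the three end-of-line indicators into CR LF."""
    return to_unix(data).replace(b"\n", b"\r\n")


if __name__ == "__main__":
    mixed = b"line one\r\nline two\rline three\n"
    print(to_unix(mixed))
    print(to_windows(mixed))
```

The order of the two replacements in to_unix matters: handling the lone '\r' first would turn every Windows line ending into two Linux ones.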
Permissions are the Linux way of keeping track of which users can do what to files. Ownership determines which of a file's permissions apply to users. Both are displayed in an "ls -l" listing.
There are three basic permissions: read ("r"), write ("w"), and execute ("x"). Users are allowed to perform these operations on a file if and only if the corresponding permission is set for them on that file (a file's permissions are collectively referred to as its mode). Note: the executable permission has a special meaning for directories. Without it, a user cannot cd into a directory, no matter what others are set.
Every file has three sets of permissions. One applies to the user who owns the file ("u"), one to the file's group ("g"), and one to all other users on the system ("o"). For example, a file can be readable by everyone but writable only by the users belonging to that file's group. Permissions are changed by the chmod utility. Ownership is changed by the chown utility.
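The three sets of permissions can be set and displayed from Python with os.chmod and the stat module. The sketch below assumes a Linux system and sets up the kind of mode described above: read and write for the owner, read for the group, nothing for others. The function names are invented for this illustration.

```python
import os
import stat
import tempfile


def mode_string(path):
    """Return the mode the way 'ls -l' shows it, e.g. '-rw-r--r--'."""
    return stat.filemode(os.stat(path).st_mode)


def restrict_to_owner_and_group(path):
    """Owner may read and write, group may read, others get nothing."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)  # like chmod 640


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "report.txt")
        open(path, "w").close()
        print("before:", mode_string(path))
        restrict_to_owner_and_group(path)
        print("after: ", mode_string(path))
```

The first character of the mode string is the file type; the remaining nine characters are the u, g, and o permission sets in that order.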
Windows also has read, write, and other permissions. These are typically handled in the Properties sheet of a file. At the command line level, the two relevant commands are attrib and icacls. An access control list (ACL) explicitly enumerates each user and the privileges granted or denied.
In addition to attrib and icacls, PowerShell provides the Get-Acl and Set-Acl cmdlets for reading and modifying access control lists. File attributes and permissions can also be modified through file object members. For example, to make "file.txt" hidden and read-only, use this line:
(gi file.txt).Attributes = 'hidden', 'readonly'
Note that if the file is hidden, you need the -force flag to access it in PowerShell. To return "file.txt" to normal settings, use this line:
(gi file.txt -force).Attributes = 'normal'
Permissions can be copied from one file to another using either the cmdlets or the file object members:
get-acl c:\a.txt | set-acl -path C:\b.txt
$acl = (gi a.txt).getAccessControl()
(gi b.txt).setAccessControl($acl)
Modifying permissions is more complicated:
$file = gi file.txt
$acl = $file.GetAccessControl()
$newRule = New-Object Security.AccessControl.FileSystemAccessRule "username", Modify, Allow
$acl.ModifyAccessRule("Add", $newRule, [ref]$false)
$file.SetAccessControl($acl)
The following are command line utilities that you are expected to learn as part of this course. For further details on a command, look it up in the textbook, the man/help pages, and the web.
| Linux | Windows | Brief description/Limitations in the Learning Objective, if any |
| ls | | Listing of files. For bash, learn the options of -lisa, -r, -R. |
| | ls | Listing of files. For PowerShell, learn -f, -r, -filter, and -exclude |
| tree | tree | Graphically displays the directory structure of a drive or path. |
| cat | cat | List the contents of a file on the stdout |
| more | more | List the contents of a file on the stdout, pausing after each page |
| mkdir | mkdir | Creates a directory. |
| rmdir | rmdir | Removes a directory. |
| cd | cd | Change the current directory to the one given as argument. |
| pushd | pushd | Saves the current directory name on the stack, and then cd's the one given as argument. |
| popd | popd | Pop off the top-most name on the stack, and then cd to it |
| mv | mv | Move or rename a file |
| cp | cp | Copy a file |
| cp -r | cp -r | Copies files and directory trees recursively. |
| rm | rm | Remove files |
| ln | | Link (hard or soft) to an existing file. |
| | mklink | Link (hard or soft) to an existing file. Type cmd /c mklink to use it in PowerShell |
| chmod | attrib | Change file permissions/attributes |
| | icacls | Displays or modifies access control lists (ACLs) of files |
| chown | | Change ownership of a file. In PowerShell, multiple steps are necessary |
| umask | | Packed vector of bits controlling the initial permissions on a newly created file |
| wc | measure | Word count of file content; also counts lines and bytes |
| od | | Octal dump of file content. Almost always used with -x for hexadecimal dump |
| tr | | Translate/substitute characters; useful in improving interoperability |
| | assoc | List associations of commands with extensions. Type cmd /c assoc to use it in PowerShell |
| file | | Heuristically determine the type of file content |
| grep | select-string | Search for a string in a file's content. For now, learn it without regexp. |
| find | gci | Locate a file. By name, etc. For now, learn it without regexp. |
| which | | Gives the full path name of a command |
| | where | Gives the full path name of a command. Type cmd /c where to use it in PowerShell |
| diff | diff | List the differences between two text files |
| du | measure | Disk space used. In PowerShell, try piping the output of gci . -r into measure -property length -sum |
| nano | notepad | Simple text editors. |
| vi | vim | A powerful text editor. For now, learn to edit simple text files with it. |
| emacs | emacs | A very powerful multi-purpose text editor. For now, learn to edit simple text files with it. |
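To see what one of these utilities computes, a rough equivalent can be written in a few lines. The Python sketch below adds up the lengths of all ordinary files under a directory, much like the PowerShell suggestion in the du row. It is an approximation of du, which reports the disk blocks used rather than the file lengths; the function name total_bytes is made up for this example.

```python
import os
import tempfile


def total_bytes(root):
    """Sum the lengths of all ordinary files below root, recursively."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # do not count a target through a symlink
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        os.mkdir(os.path.join(folder, "sub"))
        with open(os.path.join(folder, "a.txt"), "wb") as handle:
            handle.write(b"abc")
        with open(os.path.join(folder, "sub", "b.txt"), "wb") as handle:
            handle.write(b"hello")
        print(total_bytes(folder), "bytes")
```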
All work is expected to be carried out in the Operating Systems and Internet Security (OSIS) Lab, 429 Russ. But, you are welcome to work wherever. Note that use of both Linux and Windows and other software, that may not always be installed in other facilities, may be needed.
As you work on each step, record (i) the step number, (ii) the lines you type and (iii) your observations in a plain text file named myLabJournal.txt using your own words and/or copying appropriate lines. You may use any editor you wish to edit this file. Read through the Grading Sheet below and gather relevant portions into answers.txt. The quality of your observations and explanations is judged.
Note the number n of this Lab from the course home page and use Ln as the first argument to turnin. Turn in the files: ReadMe.txt myLabJournal.txt answers.txt myInfo.txt lisaRA.txt lisaRB.txt lisaRC.txt RAwin.txt RBwin.txt RCwin.txt. Do NOT turn in the tarball or the zip archives.
Sarah Gothard