CEG 233: Linux and Windows

Lab on Programs and Processes

Abstract: The word "program" is often used loosely, and interchangeably with the word "process". But we should be very careful in the use of this word. It is silly to say that "a program is running." It should be "a process is running." This article and the associated lab experiment serve as a freshman-level technical introduction to programs and processes. They introduce the control of processes (stopping, resuming, and changing priorities) and explain the resources, such as CPU time, that processes consume.

Table of Contents

  1. Educational Objectives
  2. Background
    1. Programs
    2. Processes
    3. Process Management
  3. Lab Experiment
  4. Acknowledgements
  5. References

Educational Objectives

The objectives of this lab experiment are to help you:

  1. Understand the distinction between programs and processes
  2. Control processes: stop, resume, and change their priorities
  3. Observe the resources (such as CPU time and memory) that processes consume

Background

The word "program" is often used loosely, and interchangeably with the word "process". In this course, we will be very careful to keep the two apart. It would be silly to say that "a program is running." It should be "a process is running."

Programs

A program is a static (i.e., unchanging, passive) entity. It is a file whose content is rigidly formatted as required by the operating system. Each OS supports several such rigid formats. In Linux, ELF is the most common, though there are others. In Windows, the EXE format is the most common, and the obsolete COM format is still in use.

Programs are divided into two classes for the purposes of this course: applications and systems programs. Programs whose absence would make an OS incomplete or crippled are called systems programs. Examples are init (which controls the sessions of an OS), the loader (which loads a program into memory as a required prelude to making the program into a process), and ifconfig/ipconfig (which set the parameters of network cards). Programs that make a computer system useful in a particular way, such as word processors, email clients, and web browsers, are applications. This definition has been evolving over the decades; e.g., compilers, linkers, and shells used to be considered systems programs.

Creation of Programs

We write the source code of programs. A program may also have help files, documentation, and other such files. These are not essential, in that their absence does not prevent the program from being launched; if one of them is requested but missing, you only get a "missing file" error.

Compiling

The source code is a file of text that must abide by the syntax and semantics of some programming language. Some well-known programming languages are C++, Java, Perl, Python, and Assembly. For reasons of modularity and manageability, the source code is often split into multiple files.

Source code files are processed by programs called compilers and assemblers (an interpreter, by contrast, executes source code without producing object code files). Compiling the source code produces object code files, whose content is rigidly controlled. Source code files written in different programming languages are often compiled into object code files that can be linked together.

In Linux, object code files have .o extension; in Windows, the extension for object code files is .obj.

Java files are typically compiled into JVM byte code, which is platform (i.e., CPU and OS) independent; the extension for these byte code files is .class. There are also compilers that compile Java straight into the machine code of a specific CPU.

Integrated development environments (IDEs) are the primary tools for developing programs. Behind the scenes, they compile, link, and manage the entire development activity. In this course, we are trying to understand these activities. In Linux, the command-line tools named gcc and g++ are driver programs: they examine the given arguments in a sophisticated way and, based on them, invoke the appropriate tools (such as compilers, assemblers, and linkers).

Linking

The object code files and methods/ procedures/ functions from pre-existing library files are linked into an executable file that is then qualified to be called a program. In Linux, programs (traditionally) do not have any extension. In Windows, program files have .exe extension. Files with .com extension are old format program files dating from MS DOS.

The structure and content of an object code file obey rigid rules. Conceptually, we can think of each file as beginning with a TOC (a table of contents, as in a book), followed by the executable machine code of the various methods. Among other things, the TOC describes imported and exported symbols (i.e., names of variables, methods, etc.). A given object code file may use names that are defined elsewhere; these are imported symbols. A given object code file may also define symbols that may or may not be used within that file but are intended for use elsewhere; these are exported symbols.

A linker (also called a linkage editor) essentially "stitches" the object code files together, replacing each reference to an imported symbol with the address of the matching exported symbol. This stitching succeeds only when every imported symbol, across all the object code files that make up one program, is found among the exported symbols (including those exported by the various libraries).

In Linux, the linker is actually named ld for historical reasons. It has nothing to do with the loading activity described below.

Libraries

Certain methods are so common and so useful that over the decades the code for these has been developed carefully and optimized into collections known as libraries. A library can be viewed as a catenation of object code files with a TOC up front.

In Linux, shared library files have names ending with the extension .so; in Windows, with .dll. These are essential: if a library that a program depends on is missing, the launch of the program fails.

System Calls

Every OS includes methods that are intended to be called by processes. Such calls cause a change of mode: the process runs in the so-called "user" (i.e., unprivileged) mode, and calling one of these OS-internal methods switches it into "kernel" (i.e., privileged) mode for the duration of the method. Such calls are known as system calls. To support this, all modern CPUs have special instructions, variously named INT (interrupt), trap, or svc (supervisor call), that are distinct from the instruction, usually labeled CALL, that calls an ordinary method.

Both Linux and Windows have some 300+ system calls. These can be invoked directly, but more commonly the system calls are wrapped inside more convenient library methods.

Install Directories

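What does installing a program do? It places the program's files into standard directories. On Linux, programs are commonly distributed as *.deb or *.rpm packages, and the install utility copies individual files into place; on Windows, programs are distributed as *.msi packages or installable *.exe files.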

Programs are installed into specific directories. On Linux, the standard directories are

  1. /bin
  2. /usr/bin
  3. /sbin
  4. /usr/sbin

On Windows, the standard directories are (C: is used here only as an example; run echo %SystemDrive% in cmd to see the actual drive name on the PC you are working on):

  1. C:\Program Files
  2. C:\Windows
  3. C:\Windows\system32

Utilities on Programs

The following are standard programs that you are expected to learn as part of this course in the context of programs. For further details on these commands, look them up in the textbook, the man/help pages, and the web.

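Linux    Windows       Brief description
=======  ============  ===========================================================
file     -             Heuristically determine what kind of file the given one is
size     -             Display the sizes of the text (code), data, and bss sections of a program
ldd      tasklist /m   Display the libraries a program needs (ldd) or that running processes have loaded (tasklist /m)
env      set           Display the environment of the invoking shell
install  -             Install a program
nm       -             Display the names (symbols) of variables, methods, etc. defined in a program or object code file
strip    -             Remove the above names (symbols) from a program or object code file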

Linux includes many utilities for extracting information about the contents of files. Two of the most important are file and size.

file FILENAME... will output the type of the given files, such as "ASCII text" or "MP3 file with ID3 version 2.3.0 tag". It does this by examining certain distinctive patterns of bytes within a file (called the type's magic number), and can often get quite detailed information.

size outputs information about object files or compiled executables, such as those produced by GCC. Specifically, it lists the sizes of the various sections of the object file. The "text" section is the code, "data" contains data which is initialized, and "bss" is the uninitialized part of the data segment. (Recall the difference between initializing a variable and merely declaring it from CS240.)

Processes

A successful invocation of a program results in a process. The invocation is typically done either via a shell (cmd, PowerShell, or bash) or via a menu system (which is a "graphical" shell). Internally, the shell makes a system call that accomplishes this. More technically, on Linux, in response to an exec system call, the OS loads the program into main memory and constructs certain OS-internal data structures; on Windows, the CreateProcess API plays this role. The resulting entity is called a process. A process is a dynamic (i.e., changing, active) entity.

The word "load" is a highly technical term and, at the level of CEG233, a difficult one to describe. Students often confuse linking with loading. Adding to this confusion are the prefixes static and dynamic, which apply to both. Static linking is a compile-time activity; per program, it needs to be done only once. Static loading happens just before running: when a program is invoked, the entire program is brought into memory before the resulting process begins its execution. In dynamic loading, only the portions that are needed are brought in, as they are needed, and some portions of the program may never be brought in. Dynamic linking links all the object code files together into a program but postpones linking in the methods of the libraries; such a program file is considerably smaller than an equivalent statically linked program. Dynamic linking and loading happen while the process is running.

The loader is often invisible to users. In Linux, the dynamic linker/loader (documented as ld.so; on x86-64 the file is named ld-linux-x86-64.so.2) performs the dynamic linking and loading when the exec system call starts a dynamically linked program.

The main(argc, argv, envp) method of a freshly created process is supplied three arguments by the invoking process: argv is a vector of pointers to strings (the command-line arguments), argc is the count of items in argv[], and envp is a NULL-terminated vector of pointers to strings of the form NAME=value, known as the environment. A shell (a CLI, or a GUI shell such as explorer) constructs these arguments from keyboard, mouse, and other user input. The environment is the set of string variables available to all processes; each new process receives a copy of its parent's environment. In Linux, the env command displays the environment, and in bash the export command adds a variable to it.

Since every process receives the environment, it is frequently used as a way to supply options to commands without repeating them every time the command is invoked (ls, for example, reads LS_COLORS to decide how to color its output).

Other examples of values commonly stored in the environment are:

It is a Linux/Windows convention that all global environment variable names be upper-case.

In bash, environment variables may be manipulated just like any other shell variable; e.g., PATH=$PATH:~/bin appends the user's own bin directory to the path. In PowerShell, environment variables are accessed through the env: drive, e.g., $env:PATH.

Process Management

A primary function of any OS is: given a program, create a process and run it. Both Linux and Windows run many processes "simultaneously" and strive to guarantee that no process interferes with another, that each is given a fair share of resources, and that, given the hardware, the overall performance is maximized.

Process States

Process states: Ready-to-run, Running, Waiting for an event, and Swapped out. State transitions occur as a result of process scheduling by the OS: for example, a running process may be preempted so that another process, perhaps one with a higher priority, can run. See the Required Reading.

Resources Used by Processes

Every process consumes some resources. The most obvious ones are CPU time, memory, open files, and devices. No process can "get" them without requesting them from the OS, which grants them to processes as requested, needed, and available.

Every process begins with an Open File Table containing three entries, at indices 0, 1, and 2. "stdin" and "stdout" are the normal text input and output of commands, i.e., what shows up in the terminal; C++ programmers can think of them as "cin" and "cout". There is also "stderr", for the output of error messages. The shell usually refers to them by number: stdin == 0, stdout == 1, and stderr == 2. Initially, stdin is bound to the keyboard, and stdout and stderr are bound to the screen. When additional files are opened, they are inserted into the Open File Table; as files get closed, their entries are vacated. So, at any given moment, the Open File Table may not be contiguously filled. The size of this table is limited by the system, and the sys admin can adjust the limit; on Linux, the default per-process limit is commonly 1024 (run ulimit -n to see it).

Standard Processes in Linux

The following list was generated by ps aux and then pruned to show only a few of the standard processes. This list does vary from PC to PC depending on the hardware installed and the OS configuration.

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1   2952  1852 ?        Ss   Nov23   0:01 /sbin/init
daemon    4118  0.0  0.0   1812   552 ?        Ss   Nov23   0:00 /sbin/portmap
statd     4137  0.0  0.0   1876   716 ?        Ss   Nov23   0:00 /sbin/rpc.statd
root      4407  0.0  0.0   1696   520 tty1     Ss+  Nov23   0:00 /sbin/getty 38400 tty1
root      4914  0.6  6.6  77020 69344 tty7     SLs+ Nov23  45:50 /usr/bin/X
root      4931  0.0  0.0   5280   992 ?        Ss   Nov23   0:00 /usr/sbin/sshd
root      5506  0.0  1.2  31692 13132 ?        S    Nov27   0:00 kded [kdeinit]            
pmateti   5828  0.0  1.2  31868 13024 ?        S    Nov23   1:57 kwin [kdeinit]
pmateti   5830  0.0  1.7  35256 17660 ?        S    Nov23   0:44 kdesktop [kdeinit]                                          
pmateti   5832  0.0  1.8  37200 19476 ?        S    Nov23   1:06 kicker [kdeinit]                                            
pmateti   5863  0.0  0.1   4712  2044 pts/0    Ss   Nov23   0:00 /bin/bash

Standard Processes in Windows

The following list was generated by tasklist and then pruned to show only a few of the standard processes. This list does vary from PC to PC depending on the hardware installed and the OS configuration.


Image Name                   PID Session Name     Session#    Mem Usage
========================= ====== ================ ======== ============
System Idle Process            0 Console                 0         28 K
System                         4 Console                 0        236 K
smss.exe                     812 Console                 0        380 K
csrss.exe                    876 Console                 0      3,460 K
winlogon.exe                 904 Console                 0      6,452 K
services.exe                 948 Console                 0      5,908 K
lsass.exe                    960 Console                 0      1,632 K
ati2evxx.exe                1120 Console                 0      3,344 K
svchost.exe                 1152 Console                 0      5,252 K
spoolsv.exe                 2012 Console                 0      5,356 K
avgamsvr.exe                 312 Console                 0        332 K
MDM.EXE                      436 Console                 0      2,852 K
explorer.exe                1312 Console                 0     29,260 K
alg.exe                     1332 Console                 0      3,552 K
wmiprvse.exe                6488 Console                 0      5,632 K

Process Utilities

The following are standard programs that you are expected to learn as part of this course in the context of processes. For further details on these commands, look them up in the textbook, the man/help pages, and the web.

Linux      Windows        Brief description/Limitations in the Learning Objective, if any
=========  =============  ==============================================================
ksysguard  taskmgr        Continuously updated GUI view of processes
ps         tasklist       Display the processes currently alive
top        -              Continuously updated text view of processes
nice       -              Invoke the rest of the command line at a lower priority
time       -              Invoke the rest of the command line and report how long it took
kill       taskkill /pid  Kill the process whose number (PID) is given
killall    taskkill /im   Kill the processes whose name is given
bg         -              Resume the last suspended job in the background
fg         -              Bring the last suspended job to the foreground
-          sc             Service Controller: query and control services
ltrace     -              Show the library calls being made
strace     -              Show the system calls being made

Signals and the Kill Command

Syntax: kill -[SIGNAL] PID...

Despite its name, ending processes is only one function of the kill command. More generally, it sends signals to processes; a signal is an asynchronous notification, somewhat like an exception raised in the target process. Programs can either catch these signals and handle them gracefully, or let the operating system's default action handle them.

The default signal sent by kill is SIGTERM. A different signal can be given before the PIDs, either by number (kill -1) or by name (with or without the "SIG", kill -HUP and kill -SIGHUP both work).

Signals are sent for other events besides the user running kill. Many of the most common signals are never sent directly by users except when testing. Bugs in a program may cause it to terminate with SIGSEGV, and pressing control-c usually sends SIGINT, for example.

Unfortunately, signal numbers vary between Unix flavors. The most common signals usually keep the same numbers, but it's a good idea to check kill -l for the supported signals. Further, although many systems provide convenience utilities for common tasks, these sometimes have different effects on different systems. For example, killall, which on Linux kills all processes matching a given name, ends all running processes on Solaris!

Common Signals:

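Number    Name     Meaning
========  =======  ==============================================================
1         SIGHUP   "Hang up": the controlling terminal was closed. Terminates the process by default; many daemons instead reload their configuration.
2         SIGINT   "Interrupt", sent by pressing control-c in Bash.
4         SIGILL   "Illegal instruction": the CPU met an invalid machine instruction.
9         SIGKILL  Cannot be caught or ignored, so it always terminates the process immediately.
11        SIGSEGV  "Segmentation fault": an invalid memory access, usually a pointer error.
13        SIGPIPE  A write to a pipe that no process is reading.
15        SIGTERM  Terminate the process, with whatever graceful shutdown it provides (the default signal sent by kill).
(varies)  SIGCONT  Continues a stopped process, as fg and bg do in Bash. (18 on x86 Linux, 25 on Solaris)
(varies)  SIGSTOP  Stops (suspends) the process; cannot be caught or ignored. (19 on x86 Linux, 23 on Solaris) Control-z in Bash sends the similar, catchable SIGTSTP.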
Windows PsTools

In this lab, the tools needed for the Linux part are readily present in a typical Linux distribution, but the tools needed for the Windows part (known as PsTools) must be downloaded from http://technet.microsoft.com/en-us/sysinternals/. None of the tools requires any special installation. The tools included in the PsTools suite are:

PsExec       execute processes remotely
PsFile       show files opened remotely
PsGetSid     display the SID of a computer or a user
PsInfo       list information about a system
PsKill       kill processes by name or process ID
PsList       list detailed information about processes
PsLoggedOn   see who's logged on locally and via resource sharing
PsLogList    dump event log records
PsPasswd     change account passwords
PsService    view and control services
PsShutdown   shut down and optionally reboot a computer
PsSuspend    suspend processes

(The author of the above alerts us that some anti-virus scanners may report that one or more of the tools are infected with a "remote admin" virus. None of the PsTools contain viruses, but they have been used by viruses, which is why they trigger virus notifications.) See the References below.

Lab Experiment

All work is expected to be carried out in the Operating Systems and Internet Security (OSIS) Lab, 429 Russ. However, you are welcome to work elsewhere. Note that this lab needs both Linux and Windows, as well as other software that may not be installed in other facilities.

Explain/Discover/Record below means writing in your own words and/or copying appropriate lines into a plain text file named myLabJournal.txt.

In Windows

  1. Open the Task Manager (TM), and keep this window visible for the rest of this experiment. View the Processes tab. Make sure that PID, CPU usage, CPU Time, Page Faults, User Name, (Virtual) Memory columns are being displayed. If you wish, choose any other columns.
  2. Invoke a web browser, and surf to a site of your choice that has a lot of text.
  3. Invoke MS Word via the menu system. Discover (Hint: examine TM ) and record the exact program name and PID of Word and the above web browser.
  4. Record the data shown in TM for Word (i) before and (ii) after the following action: Copy and paste a paragraph (of about 10000 characters) from some web site into Word, and save it on the Desktop as a file named junk.doc. Explain the changes you see in the TM data for this process.
  5. Invoke powershell. In this window, invoke the pssuspend tool you downloaded to suspend the Word process. Record and explain if any changes have occurred in the TM with respect to Word.
  6. Copy and paste a paragraph (the same as above or any other) from some web site into Word. Record and explain what the response is from Word.
  7. Invoke the pssuspend tool (pssuspend -r) to resume the Word process. Redo the above step. Record and explain if any changes have occurred in the TM with respect to Word.
  8. Observe the TM for about 60 seconds and answer the following: What are the names of the processes with the highest numbers in the columns chosen in Step 1?
  9. Invoke pslist with no arguments. Select and explain the details shown of any two processes.

In Linux

In this part of the experiment, the overall goal is to repeat the above in Linux.

  1. List what the Linux "equivalents" are for: TM, MS Word, powershell, pssuspend, pslist .
  2. Using these equivalents, perform the steps listed in the Windows part of this lab. Before performing each step, describe its details, as in the corresponding step of the Windows section.

Turnin

  1. Trim the myLabJournal.txt file into answers.txt keeping material that is directly relevant to the items above.
  2. Note the number <n> of this Lab from the course home page and use L<n> as the first argument to turnin . Turn in the files called answers.txt, myLabJournal.txt, and the usual ReadMe.txt as explained in Expectations.

Link to Grading Sheet

Acknowledgements

References

  1. Mark G. Sobell, "A Practical Guide to Linux(R) Commands, Editors, and Shell Programming", Prentice Hall, ISBN-10: 0131478230, ISBN-13: 978-0131478237, 2005. Chapters 5 and 8. In particular, bg, fg, kill, etc. Required Reading.
  2. DLLs, Processes, and Threads, http://msdn.microsoft.com/en-us/library/ms682584(v=VS.85).aspx Required Reading.
  3. Process states, http://en.wikipedia.org/wiki/Process_states Required Reading.
  4. Wes Miller, "Introduction to the PsTools", http://www.microsoft.com/technet/technetmag/issues/2007/03/DesktopFiles/default.aspx "Miller gives a high-level overview of the Sysinternals PsTools in the March column of his TechNet Magazine column." Required Reading.

Copyright © 2011 Prabhaker Mateti