CEG 233: Linux and Windows

Notes on Scripting with BASH

This article is a brief introduction to bash at a level suitable for our course CEG 233. It assumes that you have a good understanding of control structures and methods, say in Java. It also expects you to read the bash chapter(s) given in the References, where you will find more details. Bash is a sophisticated and "large" scripting language, and its lexical structure is occasionally archaic. So do not get discouraged if some issues are hard to understand, and do not be disappointed that CEG 233 omits many items. We cover only a small subset of the language; in particular, we do not dwell on its string processing features. The section headings of this article indicate what we do cover.

Table of Contents

  1. Educational Objectives
  2. Linux bash shell
  3. Interactive Conveniences
  4. Ten Essential Commands
  5. Lexical Structure
  6. IO Redirection
  7. Bash Built-ins
  8. Variables
  9. Bash: Control Structures
  10. Bash: Scripting Example
  11. Other Unix Shells
  12. Lab Experiment
  13. Acknowledgements
  14. References

Educational Objectives

The objectives of this article are to make you:

  1. Familiar with command line utilities
  2. Familiar with the fundamentals of Linux bash scripting
  3. Ready for Windows CMD

Linux bash shell

bash is an interpreter for a language that has no explicit name, so we often refer to it as the bash language. The main goal of any shell language is to facilitate the invocation of programs, supplying them with the arguments they need. It contains variables, assignment statements, if-statements, loops, functions, etc., making it a programming language, but one at a higher level than, say, Java, and one that deals with filenames, files, their permissions, etc. Such higher level languages are called scripting languages. In Linux and Windows, it is typical to see not only bash but also Perl and Python used for scripting purposes.

For more details, refer to the man pages and the listed references. To keep the article focused and brief, we assume that if-statements, for-loops, and functions are not used in the interactive mode; these are described only in the context of script files. We assume that file names do not contain spaces or other "strange" characters. We also skip shortcuts and alternate ways of doing things.

From within KDE or Gnome, bash can be invoked for interactive use through a terminal program such as konsole or gnome-terminal.

Interactive Conveniences

The following are easy to learn by actual practice, and are tedious and confusing when described.

Ten Essential Commands

Even though the following are not built into bash, it is worth learning them now, as we will use them in our examples below.

The Ten Most Essential Commands for Linux Users are shown below.

Command   Example                             Description
ls        ls -l                               List files, showing all their information.
          ls -a                               Show all files (even hidden ones).
cd        cd myFiles                          Change to "myFiles" as the current working directory.
          cd -                                Go back to the previous working directory.
mkdir     mkdir XYZ                           Create a new directory named XYZ.
cp        cp file1 file2                      Copy "file1" to "file2". WARNING: If a file is accidentally over-written, it's gone for good!
          cp [filename...] dirname/           Copy the given files into the directory called "dirname".
          cp -r dir1 dir2                     Copy the directory named "dir1" and all its contents to "dir2".
mv        mv file1 file2                      Rename "file1" to "file2". WARNING: If a file is accidentally over-written, it's gone for good!
          mv [FILE OR DIRNAME...] destdir/    Move the given files and directories into "destdir".
rm        rm [filename...]                    Delete the specified files. WARNING: Once deleted, a file CANNOT be recovered! DANGER: On most Unix systems, this command will act without asking for confirmation!
          rm -r [FILE OR DIRNAME...]          Remove the given files, and the given directories with all their contents. WARNING: ALL FILES in these directories will be deleted!
          rm *                                Delete ALL files in the current directory (except hidden ones). If used with "-r" this will delete recursively and thus remove directories and their contents too!
ssh       ssh s001xyz@unixapps1               Secure shell login to unixapps1 as the user named s001xyz.
sftp      sftp s001xyz@unixapps1              Secure FTP.
konsole   konsole                             Terminal window (executes bash) in which you can try the examples.
top       top                                 Continuously update the list of processes.

Notes

Lexical Structure

A line that you type, ending with the Enter key, is divided into words. Let us call them w0, w1, ... It is a distraction at this point to describe the rules that govern the splitting of a line into words. Let us just assume that words are separated by spaces and that the words themselves do not contain spaces. In Linux, file names and command names may contain both upper and lower case letters, and the two cases are considered different. So, ls is different from LS.
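
The splitting of a line into words can be observed with a small sketch like the one below. The procedure name showWords is made up for this illustration (procedures are described later in this article); it reports how many words it received and prints each of them.

```shell
#!/bin/bash
# showWords is a hypothetical helper: it prints the number of words
# it was given, followed by each word on a line of its own.
showWords () {
    echo "count: $#"
    for w in "$@"
    do
        echo "word: $w"
    done
}

# Extra spaces between the words make no difference to the splitting.
showWords ls   -l    myFiles
```

Here showWords plays the role of w0, and ls, -l and myFiles are w1, w2 and w3.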

IO Redirection

Command    Summary                                Example                Description
> FILE     Redirect stdout.                       c1 > FILE              Overwrite the file with the stdout of "c1".
>> FILE    Redirect stdout.                       c1 >> FILE             Append the stdout of "c1" to the file.
< FILE     Redirect stdin.                        c1 < FILE              Use the file as the stdin for "c1".
c1 | c2    c1's stdout piped into stdin of c2.    ps aux | grep Emacs    Show the PIDs of Emacs processes.
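
The four forms above can be tried out safely with a sketch such as the following. It works in a scratch directory created by mktemp; the file name out.txt and the variable names are chosen only for this illustration.

```shell
#!/bin/bash
# Work in a scratch directory so that no existing file is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo first  >  out.txt                     # > creates or overwrites out.txt
echo second >> out.txt                     # >> appends a second line
lineCount=$(wc -l < out.txt)               # < makes out.txt the stdin of wc
matches=$(cat out.txt | grep -c second)    # | pipes cat's stdout into grep

echo lines: $lineCount matches: $matches
```

Replacing the >> in the second command with > would leave only the line "second" in the file.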

Bash Built-ins

Bash has a number of built-in commands; i.e., invoking one of these does not spawn a new process. Here is an incomplete list of the builtins.

alias     Define/Display an alias.
bind      Display/Set current Readline key and function bindings.
builtin   Run a shell builtin, passing it args, and return its exit status.
command   Run a command with arguments, ignoring any shell function of the same name.
declare   Declare variables and give them attributes. If no names are given, then display the values of variables instead.
echo      Output the args, separated by spaces, terminated with a newline. Ex: echo $PATH
help      Display helpful information about builtin commands.
let       Perform arithmetic on shell variables.
printf    Format and print the arguments as directed by the format. Usage: printf format [arguments]
read      Read one line from the standard input.
source    Read and execute the commands in a file in the current shell. Usage: source filename
ulimit    Control the resources available to processes started by the shell.
unalias   Remove an alias from the list of aliases.
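
A few of these builtins are exercised in the sketch below. It also uses type, another bash builtin that is not in the list above; type -t reports how bash would interpret a name. The variable names are chosen only for this illustration.

```shell
#!/bin/bash
# type -t prints "builtin" for commands that are built into bash.
kindOfEcho=$(type -t echo)

# read takes one line from stdin; here the line arrives through a pipe
# and is split into the variables a and b.
firstWord=$(echo "hello world" | { read a b; echo "$a"; })

# let performs integer arithmetic on shell variables.
count=5
let count=count+2

# printf formats its arguments, much like printf in C.
printf 'echo is a %s, count is %d, first word is %s\n' "$kindOfEcho" "$count" "$firstWord"
```

Typing help read or help let at the prompt shows the full description of each builtin.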

Using Variables

Shell variables are created and initialized simply by assigning a value to them; no separate declaration is needed (the declare builtin is available when attributes are wanted). For example, test=1 creates a variable named "test" containing the value "1". Unlike variables in some other languages, a type is not necessary; the values are stored as strings.

Note: there must not be whitespace around the equals sign. test = 1 will not work!

Variables are referenced by their names, prefixed with a "$" when the value is wanted. (echo $test prints "1", but echo test just prints "test".) To avoid confusion with the $ prompt character, in these examples the prompt is changed to % by setting the prompt variable named PS1: PS1=%

An example:

 % test=txt
 % echo $test
 txt
 % test=file.$test
 % echo $test
 file.txt
    

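Note: a script runs in a sub-shell, so if a variable is set in a script, its value isn't changed in the invoking shell; the new value is lost as soon as the script finishes. Changes never propagate up to the invoking shell. Use export VARIABLE... to pass a variable down to the processes started by the current shell, or run the script with source (or ".") so that its assignments take effect in the current shell.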
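
The sketch below illustrates the difference between running a script and sourcing it. The script name setColor.sh and the variable color are invented for this demonstration; the script is created in a scratch directory made by mktemp.

```shell
#!/bin/bash
# setColor.sh is a hypothetical one-line script created just for this demo.
workdir=$(mktemp -d)
echo 'color=red' > "$workdir/setColor.sh"

color=blue
bash "$workdir/setColor.sh"      # runs in a child process
afterChild=$color                # the assignment made by the child is lost

source "$workdir/setColor.sh"    # runs in the current shell
afterSource=$color               # the assignment is now visible

echo "after child: $afterChild, after source: $afterSource"
```

The value stays blue after the script is run as a separate process and becomes red only after the same file is sourced.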

Manipulating Variables

Complex operations may be performed on a variable by surrounding its name with curly braces ("{}") and adding an operator. The two most important operators are "#" and "%", which remove matching text from the beginning and the end of the variable's value, respectively.

An example:

      % test=txt
      % echo ${test#t}
      xt
      % echo ${test%t}
      tx
      % test=file.lst
      % echo ${test%lst}txt
      file.txt
      % echo $test
      file.lst
    

Only the value substituted into the command changes. The string stored in the variable is not affected.

Environment

The environment is the set of string variables that a process passes on to the processes it starts. The env command displays the environment, and the export command places shell variables into it.

Since every program can read its environment, it's frequently used as a way to supply settings to commands without repeating them every time the command is invoked. (GNU ls reads LS_COLORS, for example.)

Other examples of values commonly stored in the environment are:

It is a Unix convention that all global environment variable names be upper-case.

In bash, environment variables may be manipulated just like any other shell variable, using $, =, and so forth. For example, PATH=$PATH:~/bin appends the user's own bin directory to the path.
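
As a rough illustration, the sketch below shows that an ordinary shell variable becomes part of the environment only once it is exported. The variable name greeting is made up for this example, and bash -c is used merely as a convenient way to start a child process.

```shell
#!/bin/bash
# greeting is a made-up variable name used only for this demo.
unset greeting                   # start clean in case it was already exported
greeting=hello

# A child process does not see an ordinary shell variable ...
before=$(bash -c 'echo "[$greeting]"')

# ... until export places the variable in the environment.
export greeting
after=$(bash -c 'echo "[$greeting]"')

echo "before export: $before, after export: $after"
```

The single quotes keep the current shell from substituting $greeting itself, so the substitution is left to the child.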

Quoting

Quoting text causes Bash to interpret text differently. For example, a filename containing spaces must be enclosed in single or double quotes, or Bash will pass it as multiple separate command line tokens.

There are three different forms of quoting, each for a different purpose.

Single quotes (')
Every character inside these quotes is interpreted completely literally. No variable substitutions are performed.
Double quotes (")
Many special characters, such as spaces, lose their special meaning and are taken literally. However, some are still interpreted. Notably, variable substitution with $ is allowed.
Backquotes (`)
Unlike the other two, this form does not prevent interpretation of the quoted tokens. Instead, the text inside is interpreted as a command line, and its output is substituted into the original command line. The form $(COMMAND) does the same.
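
The three forms can be compared side by side with a short sketch like this one; the variable names are chosen only for the illustration.

```shell
#!/bin/bash
name=world

single='hello   $name'       # single quotes: everything stays literal
double="hello   $name"       # double quotes: spaces kept, $name substituted
backq=`echo hello   $name`   # backquotes: replaced by the output of echo

echo "$single"
echo "$double"
echo "$backq"
```

In the backquoted line the extra spaces disappear, because echo receives hello and world as two separate words and prints them separated by a single space.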

Arrays

Arrays in bash differ quite a bit from those in other programming languages. But first let us focus on the properties common with other languages.

Line[1]='I do not know which to prefer,'
Line[2]='The beauty of inflections'
Line[3]='Or the beauty of innuendoes,'
Line[4]='The blackbird whistling'
Line[5]='Or just after.'

Attribution[1]=' Wallace Stevens'
Attribution[2]='"Thirteen Ways of Looking at a Blackbird"'

Two arrays, Line[] and Attribution[], are defined above. All of the array elements are initialized with strings. Note the use of single- and double-quotes. Note also that we did not use 0 as an index. Now, let us print the poem.

for index in 1 2 3 4 5
do
  echo ${Line[index]}
done

for index in 1 2
do
  echo '    ' ${Attribution[index]}
done

Note that we must use the braces to retrieve the value of an array item.

So far, the only peculiar thing was the syntax ${Attribution[index]} instead of just Attribution[index]. Try the following.

myArray=( zero one two three four five )
myArray[2]=not-two-any-more
myArray[9]=nine
echo without braces 0 $myArray[0] 2 $myArray[2] 9 $myArray[9]
echo with braces 0 ${myArray[0]} 2 ${myArray[2]} 9 ${myArray[9]}

In the lines above, we defined myArray[] in an aggregate form first with only six elements, modified the one with index 2, and introduced an element whose index is 9. As you can see, its starting index is 0. It does not "have" array elements at 6, 7 and 8. This does not make ${myArray[7]} illegal; instead, it yields the empty string.

There are many other peculiarities, but for this course, what we covered above is sufficient. Make sure you understand why the output of the above two echo lines is the way it is. Always check with an echo statement if your array is behaving the way you want.
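
One way to check an array is sketched below. It relies on two bash expansions that go slightly beyond what this article covers: ${#myArray[@]} gives the number of elements, and ${!myArray[*]} lists the indices that are actually in use. The variable names other than myArray are chosen only for this illustration.

```shell
#!/bin/bash
myArray=( zero one two three four five )
myArray[2]=not-two-any-more
myArray[9]=nine

# Without braces, $myArray means ${myArray[0]}, and "[2]" is ordinary text.
noBraces="$myArray[2]"
withBraces="${myArray[2]}"

# Number of elements, and the indices that are in use.
count=${#myArray[@]}
indices="${!myArray[*]}"

echo "no braces: $noBraces"
echo "with braces: $withBraces"
echo "count: $count, indices: $indices"
```

The count is 7, not 10, because the array has no elements at indices 6, 7 and 8.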

Bash Scripts

The #!/bin/bash line is called a shebang. It's a special indicator that tells the operating system which shell should be used to interpret the script, in this case bash. The shebang is always the first line of a script and is composed of the characters #! followed by the full pathname of the interpreter's executable.

You should always test scripts before running them on important data. Two useful commands for this are echo and touch. Placing echo before commands displays what would have been run instead of actually doing things, which is helpful for checking things like filename modification. touch FILENAME... will create empty dummy files suitable for manipulation by scripts.
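
A dry run of this kind might look like the following sketch. The file names a.text and b.text are dummy files created by touch in a scratch directory; the rename from .text to .txt is just an example task.

```shell
#!/bin/bash
# Scratch directory with empty dummy files made by touch.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.text b.text

# Dry run: echo displays each mv command instead of running it.
dryRun=$(for f in *.text; do echo mv "$f" "${f%.text}.txt"; done)
echo "$dryRun"

# Nothing has been renamed yet, as ls confirms.
ls
```

Once the displayed commands look right, removing the word echo from the loop performs the renames for real.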

Control Structures

In addition to simple lists of commands, shell scripts can contain all the familiar control structures, such as conditionals and loops:

Name        Syntax                                           Example
if          if CONDITION; then COMMANDS...; fi               if [ "$test" ]; then echo "test is non-empty"; fi
for         for VARIABLE in LIST...; do COMMANDS...; done    for i in 1 2 3; do echo $i iteration ; done
condition   [ EXPRESSION ]                                   Check if EXPRESSION is true or false.

See man test for further help with conditionals and for a full explanation of valid expressions.

Condition                                               Meaning
[ -e FILENAME ]                                         True iff the given file exists.
[ "$hippos" -eq 5 ]                                     True iff the variable hippos contains the number 5.
[ "$hippos" = "five" ]                                  True iff the variable hippos contains the string "five".
[ ! -x "$filename" ]                                    True iff the file whose name is given in the variable filename is not executable.
[ -e FILE1 -o -d FILE2 ]                                True when either FILE1 exists or FILE2 is a directory.
[ -n "$compiler" -a "$objfile" -ot "$cppfile" ]         True when $compiler contains a non-empty string and the file whose name is given by $objfile is older than that given by $cppfile.
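
The sketch below combines a for loop with file conditions. The procedure name describe and the file names are invented for this illustration, and the sketch uses the elif and else parts of the if statement, which the table above does not list.

```shell
#!/bin/bash
# describe is a hypothetical helper that classifies a name with [ ] tests.
describe () {
    if [ -d "$1" ]; then
        echo "$1 is a directory"
    elif [ -e "$1" ]; then
        echo "$1 exists but is not a directory"
    else
        echo "$1 does not exist"
    fi
}

# Scratch directory holding one subdirectory and one empty file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir sub
touch notes.txt

for nm in sub notes.txt missing
do
    describe "$nm"
done
```

The order of the tests matters: -e is also true for directories, so -d is checked first.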

As may be expected, the shell language contains many special features for working with filenames. It is thus very easy to create control structures based on the simple examples above that perform complex operations on files.

Exercise: Discover what the following does:

for nm in *
do
  echo nm $nm
  echo nmdottxt ${nm%.txt}
  echo nmdotstar ${nm%.*}
done

Exercise: Discover what the following does:

for ((i=23; i < 250; i+=12 ))
do
  echo i is $i
done

Exercise: Discover what the following does. (Look up what sed does with the -r flag.)

for i in *; do mv $i `echo $i | sed -r 's/[0-9 ]+//'` ; done

Procedures

Bash procedures, or "shell functions", provide a way to group a set of commands for a certain task together so that they can be conveniently invoked by one command. They can be thought of like aliases that expand to a script instead of a single command.

Like aliases, procedures must be declared before use. The declaration is of the form:

      procedure_name () {
          BODY...
      }

Where BODY is the list of commands to be executed, usually one per line.

The "()" are used only to indicate that a procedure is being declared. They are not present when calling it. To execute a procedure, simply use procedure_name args....
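
Inside the body, the arguments of a call are available as $1, $2, and so on, and $# holds their number. A minimal sketch, with a made-up procedure named greet:

```shell
#!/bin/bash
# greet is a hypothetical procedure; $1 and $2 are its first two
# arguments and $# is the number of arguments it was called with.
greet () {
    echo "Hello, $1 and $2 ($# args)"
}

greet Alice Bob
```

The call looks exactly like the invocation of any other command: the name first, then the arguments separated by spaces.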

NOTE: like aliases or variables, procedures are not available outside the scope in which they were declared. If a procedure is declared within a subshell (for example, a separate script beginning with a shebang), invoking it from the calling shell will give an error!

For example, given the script exampleProcedure.sh:

      #!/bin/bash

      exampleProcedure () {
          echo This is an exampleProcedure with no args
      }
    

The following test.sh uses exampleProcedure:

      #!/bin/bash

      #Get it defined in the current shell scope, using "."
      #or "source", NOT "./"
      . exampleProcedure.sh

      #Call it by name
      exampleProcedure
    

Scripting Example

Exercise: Develop a bash procedure called cleanUp() that cleans up the contents of the current directory as described below.

  1. Move into BACKUPDIR files whose names are of the form f.o provided one of the corresponding f.cpp, f.c, f.C, f.h, or f.H files exists.
  2. Rename files whose names are of the form f.text.txt to f.txt.text
  3. Rename files whose names are of the form f.mpeg.mpg or f.mpg.mpeg to f-new.mpg.mpeg
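
Solution

We write one procedure for each kind of task; cleanUp invokes them. Note that the solution below is simplified relative to the task list: it deletes each f.o (rather than moving it into BACKUPDIR) when f.c, f.C, or f.cpp exists, and it renames a file of the form f.x.y to f.y, dropping the middle extension.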

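#!/bin/bash

# Delete f.o if one of f.c, f.C, or f.cpp exists. $1 is the name f.o.
dotOhDelete()
{
    local bnm
    bnm=$(basename "$1" .o)
    if [ -f "$bnm.c" ] || [ -f "$bnm.C" ] || [ -f "$bnm.cpp" ]; then
        rm "$bnm.o"
    fi
}

# Rename f.$2.$3 to f.$3. $1 is the file name; $2 and $3 are its two extensions.
fixExteExt()
{
    local bnm
    bnm=$(basename "$1" ".$2.$3")
    mv "$1" "$bnm.$3"
}

# Clean up the directory named by $1.
cleanUp()
{
    pushd "$1" || return
    for f in *.o ; do
        [ -e "$f" ] || continue     # the pattern matched no file
        dotOhDelete "$f"
    done
    for f in *.text.txt ; do
        [ -e "$f" ] || continue
        fixExteExt "$f" text txt
    done
    for f in *.mpeg.mpg ; do
        [ -e "$f" ] || continue
        fixExteExt "$f" mpeg mpg
    done
    for f in *.mpg.mpeg ; do
        [ -e "$f" ] || continue
        fixExteExt "$f" mpg mpeg
    done
    popd
}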

When we develop a fairly complicated script, it is essential that we try it out on sample inputs. In this case, we should create some directories and files so that cleanUp has some work to do. Such additional scripts are often called test harnesses.

#!/bin/bash
mkFiles()                      # create TEST dir and a few files in it
{
    mkdir -p  TEST
    pushd TEST
    touch -f f.o f.C
    touch -f blah.text.txt
    popd
}

runTest()
{
    source cleanUp.sh
    ls -l TEST
    cleanUp TEST
    ls -l TEST
}

mkFiles
runTest

Other Unix Shells

The Linux/Unix world has many shells, most of them predating bash and a few after bash. Here is an incomplete list of their names: csh, tcsh, ksh, ash, zsh.

Lab Experiment

See http://www.cs.wright.edu/~pmateti/Courses/233/Labs/Scripting/ScriptingLab1.htm and http://www.cs.wright.edu/~pmateti/Courses/233/Labs/Scripting/ScriptingLab2.html

Acknowledgements

Ben Murray

References

  1. Mark G. Sobell, "A Practical Guide to Linux(R) Commands, Editors, and Shell Programming", Prentice Hall, ISBN-10: 0131478230, ISBN-13: 978-0131478237, 2005. Read Chapter 5. Look up basename. Required reading.

  2. http://ftp.gnu.org/old-gnu/Manuals/bash-2.05a/ Bash Reference Manual. Reference.

Copyright © 2012 pmateti@wright.edu