

College of Engineering & CS
Wright State University
Dayton, Ohio 45435-0001

CEG 333: Introduction to Unix

Prabhaker Mateti

Bash: Scripting Example


Version 1

The purpose of this example script is to scan a directory for .doc files and convert them to plain text. In reality, the binary structure of a Word document is complex and non-linear, but for demonstration purposes it suffices to extract the ASCII strings in order. A .doc file should not be converted again if it has not changed since it was last converted.

The converted file should have the same filename as the original, but with the extension changed to ".txt" (i.e., text from "document.doc" goes in "document.txt").

How is the new filename generated? ${file} means the value of the variable named "file", and ${file%doc} means that value with the string "doc" removed from the end, if it is there. So ${file%doc}txt strips the "doc" extension and appends "txt" in its place, leaving the dot untouched.

(See the notes on Bash variable manipulation.)
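
The expansion can be tried on its own at a prompt. The following is a minimal sketch; the filenames are made up for illustration.

```shell
#!/bin/bash
# Hypothetical filename, used only for illustration.
file="document.doc"

# ${file%doc} strips a trailing "doc"; appending "txt" gives the new name.
txtfile="${file%doc}txt"
echo "$txtfile"            # document.txt

# A name that does not end in "doc" is left unchanged before "txt" is appended.
other="notes.pdf"
echo "${other%doc}txt"     # notes.pdftxt
```

The second case shows why the loop should only be fed names that really end in "doc": the expansion silently does nothing when the suffix is absent.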

In English, the script should do the following:

      #!/bin/bash: Tell the kernel to run this script with bash

      for each file matching the glob pattern *.doc, set a variable to its filename and do:
        txtfile=the filename with a .txt extension
        if the file is readable and is newer than $txtfile, then
          Put a header in $txtfile saying where the text came from
          Use strings(1) to append the text to $txtfile
        fi
      done
    

And the script:

      #!/bin/bash                                    #1

      for file in *.doc; do                          #2
        txtfile=${file%doc}txt                       #3
        if [ -r $file -a $file -nt $txtfile ]; then  #4
          echo Text found in $file > $txtfile        #5
          strings $file >> $txtfile                  #6
        fi
      done
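
The freshness check on line #4 can be exercised in isolation. This is a sketch that assumes bash and a touch(1) that accepts -t; the scratch directory and the sample filenames are invented for the demonstration.

```shell
#!/bin/bash
# Scratch directory and file names are illustrative only.
dir=$(mktemp -d)
touch -t 202001010000 "$dir/sample.txt"   # earlier conversion, dated 2020
touch -t 202101010000 "$dir/sample.doc"   # document edited in 2021

# Same test as line #4: readable and newer than the text copy.
if [ -r "$dir/sample.doc" -a "$dir/sample.doc" -nt "$dir/sample.txt" ]; then
  first="convert"
else
  first="skip"
fi

# After a fresh conversion the text copy is the newer file.
touch -t 202201010000 "$dir/sample.txt"
if [ -r "$dir/sample.doc" -a "$dir/sample.doc" -nt "$dir/sample.txt" ]; then
  second="convert"
else
  second="skip"
fi

echo "$first $second"    # convert skip
rm -r "$dir"
```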
    

Notes about this code:

Version 2

Version 1 of this script accomplishes a lot in nine lines, but a few things could be improved:

The script:

      #!/bin/bash

      find . -name '*.doc' | while IFS= read -r file; do                                #1
        txtfile="${file%doc}txt"                                                        #2
        txtfile=$(echo "$txtfile" | sed 's/ \+/_/g')                                    #3
        if [ -r "$file" -a "$file" -nt "$txtfile" ]; then                               #4
          echo "Text found in $file" > "$txtfile"                                       #5
          echo "Last update: $(ls -l "$file" | awk '{print $6, $7, $8}')" >> "$txtfile" #6
          strings "$file" >> "$txtfile"                                                 #7
        fi
      done
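
The effect of the renaming steps (#2 and #3) can be checked without touching any files. This sketch uses a made-up filename, and it writes the pattern as 's/  */_/g' (a space followed by zero or more spaces) because the \+ operator used in the script is a GNU sed extension.

```shell
#!/bin/sh
# Hypothetical name containing single and repeated spaces.
file="./My Annual   Report.doc"

# Step #2: swap the extension.
txtfile="${file%doc}txt"

# Step #3: collapse each run of spaces into one underscore.
txtfile=$(printf '%s\n' "$txtfile" | sed 's/  */_/g')

echo "$txtfile"     # ./My_Annual_Report.txt
```

Note that the substitution is applied to the whole path, so a directory name containing spaces would be rewritten as well.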
    

Changed lines:

Version 3

Version 2 is more useful, but it would be even better if the user could specify arbitrary directories and individual files. Version 3 accepts directory and file names as parameters.

For each parameter, if it is a directory, the script should search it as it previously did the current directory, converting each file it finds. For an individual file, however, it should invoke only the conversion code that was inside the loop. To prevent code duplication, the new script uses two shell functions, convert_dir and convert_file.

      #!/bin/bash

      convert_dir() {                                                                   #1
        find "$1" -name '*.doc' | while IFS= read -r file; do                           #2
          convert_file "$file"                                                          #3
        done                                                                            #4
      }                                                                                 #5

      convert_file() {                                                                  #6
        file="$1"                                                                       #7
        txtfile="${file%doc}txt"                                                        #8
        txtfile=$(echo "$txtfile" | sed 's/ \+/_/g')                                    #9

        if [ -r "$file" -a "$file" -nt "$txtfile" ]; then                               #10
          echo "Text found in $file" > "$txtfile"                                       #11
          echo "Last update: $(ls -l "$file" | awk '{print $6, $7, $8}')" >> "$txtfile" #12
          strings "$file" >> "$txtfile"                                                 #13
        fi                                                                              #14
      }                                                                                 #15

      if [ $# -eq 0 ]; then                                                             #16
        convert_dir .                                                                   #17
      else                                                                              #18
        for i in "$@"; do                                                               #19
          if [ -d "$i" ]; then                                                          #20
            convert_dir "$i"                                                            #21
          elif [ -r "$i" ]; then                                                        #22
            convert_file "$i"                                                           #23
          else                                                                          #24
            echo "There was a problem reading $i" >&2                                   #25
          fi                                                                            #26
        done                                                                            #27
      fi                                                                                #28
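
The way each parameter is routed (lines #19 to #27) can be tried separately from the conversion itself. In this sketch, classify is a hypothetical stand-in that only reports which branch an argument would take; the scratch directory and filenames are likewise invented.

```shell
#!/bin/sh
# classify is a hypothetical stand-in: it reports which branch an argument
# would take instead of converting anything.
classify() {
  if [ -d "$1" ]; then
    echo "directory"      # the script would call convert_dir here
  elif [ -r "$1" ]; then
    echo "file"           # the script would call convert_file here
  else
    echo "unreadable"     # the script would print its error message here
  fi
}

dir=$(mktemp -d)
touch "$dir/a.doc"

r1=$(classify "$dir")
r2=$(classify "$dir/a.doc")
r3=$(classify "$dir/missing")
echo "$r1 $r2 $r3"        # directory file unreadable

rm -r "$dir"
```

The directory test comes first because a directory is usually readable too, so swapping the two tests would send directories down the file branch.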
    

Notes about this code: