College of Engineering & CS
Wright State University
Dayton, Ohio 45435-0001

CEG 333: Introduction to Unix

Prabhaker Mateti

Regular Expressions


A regular expression (also called a regex or regexp) is a pattern that describes a set of strings. Regular expressions are handy for tasks such as searching and replacing text or running commands on multiple files. There are several kinds of regex, because different tools use different syntaxes.

FNRE (File Name Regular Expressions)

The shell expands these patterns (commonly called wildcards or glob patterns) into a list of matching file names. The format is very simple and has only two important characters:

Wildcard Meaning Example Matches
* Zero or more characters a*txt "atxt", "aa.txt", "abtxt", "aba.txt", etc
? Exactly one character ?.txt "a.txt", "b.txt", etc. NOT "aa.txt" or "ab.txt".
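
The two wildcards in the table can be tried out with a short sketch like the one below. The scratch directory and the file names are hypothetical, chosen only to mirror the table's examples, and LC_ALL=C is set on the assumption that a fixed sort order for the expanded names is wanted.

```shell
# Use the C locale so the shell sorts expanded file names byte by byte.
LC_ALL=C
export LC_ALL

# Work in a scratch directory holding a few hypothetical file names.
old=$PWD
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch a.txt b.txt aa.txt atxt abtxt

# '*' matches zero or more characters, so a*txt picks up every name
# that starts with "a" and ends with "txt".
star_matches=$(echo a*txt)

# '?' matches exactly one character, so ?.txt skips aa.txt.
question_matches=$(echo ?.txt)

echo "a*txt -> $star_matches"
echo "?.txt -> $question_matches"

# Clean up the scratch directory.
cd "$old" || exit 1
rm -rf "$dir"
```

Because the shell performs the expansion before the command runs, echo never sees the wildcard itself, only the resulting list of names.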

SMRE (String Matching Regular Expressions)

This regex flavor is used by grep, Emacs, sed, and many other utilities to search for text. It is more complicated than the file name syntax above. See the man pages grep(1), regexp(n), and perlre(1) for more detail.

Syntax Meaning Example Matches
Quantifiers
* Match 0 or more times .* Any number of repetitions of any character
+ Match 1 or more times [ab]+ Any non-empty string consisting of only a's and b's ("a", "b", "aab", "bbbab", and so forth). Note: must be prefixed with \ in basic regex mode.
? Match 0 or 1 times b? Either the empty string or "b", but not further repetitions. Note: must be prefixed with \ in basic regex mode.
| Match either of the expressions joined a|b Either "a" or "b". Note: must be prefixed with \ in basic regex mode.
Other
^ Beginning of line ^Unix Occurrences of "Unix" at the start of a line
$ End of line CEG$ Occurrences of "CEG" at the end of a line
[CHARACTERS...] Any one of the characters inside the square brackets [ \t] either a space or a tab
\ Quoting: the quoted character loses its special syntactic meaning (except as above) \. A literal "." (as opposed to the unquoted ".", which matches any character)
\ Escape sequence, mostly as in C/C++ (warning: may not work in sed!) \n A newline
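
A minimal sketch of the anchors, bracket expressions, and quoting from the table, using grep on a few made-up input lines. The sample text and the variable names are assumptions for illustration only.

```shell
# Hypothetical sample input: four lines fed to grep on stdin.
sample() {
    printf '%s\n' 'Unix is old' 'I like Unix' 'course CEG' 'CEG 333'
}

# ^ anchors the match to the beginning of a line.
starts=$(sample | grep '^Unix')

# $ anchors the match to the end of a line.
ends=$(sample | grep 'CEG$')

# [...] matches any one listed character; * repeats the preceding item
# zero or more times, so this asks for at least one digit.
digits=$(sample | grep '[0-9][0-9]*')

# \. is a literal dot, while an unquoted . matches any single character.
literal=$(printf '%s\n' 'a.txt' 'abtxt' | grep 'a\.txt')
anychar_count=$(printf '%s\n' 'a.txt' 'abtxt' | grep -c 'a.txt')

echo "$starts"
echo "$ends"
echo "$digits"
echo "$literal"
echo "$anychar_count"
```

The patterns are wrapped in single quotes so that the shell passes them to grep unchanged instead of treating characters such as * as file name wildcards.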

Note: different programs interpret regular expressions slightly differently. For example, grep and sed have two modes: basic and extended (enabled with grep -E or sed -E). In basic mode, ?, +, and | are interpreted as operators only when prefixed with a \ (the opposite of a backslash's normal meaning). This backslash form is a GNU extension and may not work in other implementations.
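
The difference between the two modes can be seen in a short sketch like the following. The input lines are made up, and only behavior common to basic and extended mode is shown; the backslash-prefixed forms are left out because they may depend on the grep implementation.

```shell
# Extended mode (-E): + means "one or more of the preceding item".
ext=$(printf '%s\n' 'aab' 'xyz' 'a+b' | grep -E '^[ab]+$')

# Basic mode: an unescaped + is an ordinary character, so this
# pattern matches the literal text "a+b".
basic=$(printf '%s\n' 'aab' 'xyz' 'a+b' | grep 'a+b')

# Extended mode: ? means "zero or one of the preceding item".
optional=$(printf '%s\n' 'ac' 'abc' 'abbc' | grep -E '^ab?c$')

# Extended mode: | matches either of the joined expressions.
either=$(printf '%s\n' 'a' 'b' 'c' | grep -E 'a|b')

echo "$ext"
echo "$basic"
echo "$optional"
echo "$either"
```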

Using regular expressions

* Shell wildcard meaning "zero or more characters" cat * Display the contents of all (non-hidden) files in the current directory.
grep Search for text (a regular expression) in files grep PATTERN [FILENAME...] Print all the lines matching PATTERN in the given files. If no files are given, search stdin.
grep -i PATTERN [FILENAME...] Make the search case-insensitive.
grep -r PATTERN PATHNAME... Make the search recurse through the given directories.
grep -v PATTERN [FILENAME...] Print all the lines not matching PATTERN in the given files.
sed Edit streams of text sed [-i] "s/PATTERN/REPLACEMENT/" [FILENAME...] Replace the first occurrence of the regexp PATTERN on each line with the text REPLACEMENT. Append g after the final / to replace every occurrence on the line.
Note: With -i, edit the given files in place instead of writing the result to stdout (GNU sed syntax; BSD sed requires a backup-suffix argument after -i).
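
The commands above can be exercised with a sketch along these lines. The scratch directory, the file name notes.txt, and its contents are hypothetical. The -i option of sed is left out on purpose, since its exact syntax differs between GNU and BSD sed.

```shell
# Hypothetical scratch file with three lines of text.
dir=$(mktemp -d) || exit 1
printf '%s\n' 'Unix rules' 'unix tools' 'Windows' > "$dir/notes.txt"

# -i: case-insensitive search matches both "Unix" and "unix".
insensitive=$(grep -i 'unix' "$dir/notes.txt")

# -v: print only the lines that do NOT match.
inverted=$(grep -v -i 'unix' "$dir/notes.txt")

# -r: search every file under a directory; each match is prefixed
# with the name of the file it was found in.
recursive=$(grep -r 'Windows' "$dir")

# sed s/// replaces only the first match on each line ...
first_only=$(echo 'aaa' | sed 's/a/b/')

# ... unless the g flag is appended.
every=$(echo 'aaa' | sed 's/a/b/g')

echo "$insensitive"
echo "$inverted"
echo "$recursive"
echo "$first_only $every"

# Clean up the scratch directory.
rm -rf "$dir"
```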