CEG 333: Introduction to Unix
Prabhaker Mateti
Regular Expressions
A regular expression (also called a regex or regexp) is a pattern that describes the characteristics of a block of text. This is handy for tasks like searching and replacing, or running commands on multiple files. There are several kinds of regex, because different syntaxes are used depending on the task.
Shell wildcard patterns are expanded by the shell to a list of matching files. This format is very simple and has only two important characters:
| Wildcard | Meaning | Example | Matches |
|---|---|---|---|
| * | Zero or more characters | a*txt | "atxt", "aa.txt", "abtxt", "aba.txt", etc. |
| ? | Exactly one character | ?.txt | "a.txt", "b.txt", etc. NOT "aa.txt" or "ab.txt". |
A second, more powerful regex flavor is used by grep, Emacs, sed, and many other utilities to search for text. It is more complicated than the shell syntax. See the man pages grep(1), regexp(n), and perlre(1) for more detail.
| Syntax | Meaning | Example | Matches |
|---|---|---|---|
| Quantifiers | | | |
| * | Match 0 or more times | .* | Any number of repetitions of any character |
| + | Match 1 or more times | [ab]+ | Any string consisting of only a's and b's ("a", "b", "aab", "bbbab", and so forth). Note: must be prefixed with \ in basic regex mode. |
| ? | Match 1 or 0 times | b? | Either "b" or nothing at all (the empty string), but not further repetitions such as "bb". Note: must be prefixed with \ in basic regex mode. |
| \| | Match either of the expressions joined | a\|b | Either "a" or "b". |
| Other | | | |
| ^ | Beginning of line | ^Unix | Occurrences of "Unix" at the start of a line |
| $ | End of line | CEG$ | Occurrences of "CEG" at the end of a line |
| [CHARACTERS...] | Any one of the characters inside the square brackets | [ \t] | Either a space or a tab |
| \ Quoting | The quoted character without special syntactic meaning (except as above). | \. | A literal "." (as opposed to the wildcard "." that matches any character) |
| \ Escape sequence | Mostly like C/C++. Warning: may not work in sed! | \n | A newline |
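Most rows of the table can be checked directly with grep. The sketch below is illustrative only: the sample lines and variable names are made up for this demo, and `grep -E` (extended mode) is used so that `+`, `?`, and `|` need no backslash.

```shell
# Sample input: one candidate string per line (made up for this demo).
sample='Unix rocks
I use Unix
aab
ac
abc
abbc
CEG
take CEG 333
a.b
axb'

# '+' : lines made up only of a's and b's (anchors force a whole-line match).
only_ab=$(printf '%s\n' "$sample" | grep -E '^[ab]+$')

# '?' : "a", then zero or one "b", then "c".
optional_b=$(printf '%s\n' "$sample" | grep -E '^ab?c$')

# '|' : either alternative.
either=$(printf '%s\n' "$sample" | grep -E '^(aab|abc)$')

# '^' : lines that begin with "Unix".
starts_unix=$(printf '%s\n' "$sample" | grep -E '^Unix')

# '$' : lines that end with "CEG".
ends_ceg=$(printf '%s\n' "$sample" | grep -E 'CEG$')

# '\.' : a literal dot, so "axb" is not matched.
literal_dot=$(printf '%s\n' "$sample" | grep -E 'a\.b')

printf 'only a/b:     %s\n' "$only_ab"
printf 'optional b:   %s\n' "$optional_b"
printf 'either:       %s\n' "$either"
printf 'starts Unix:  %s\n' "$starts_unix"
printf 'ends CEG:     %s\n' "$ends_ceg"
printf 'literal dot:  %s\n' "$literal_dot"
```

Without the `^` and `$` anchors, grep reports any line that contains a match anywhere, which is why the quantifier examples anchor both ends.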
Note: different programs interpret regular expressions
slightly differently. For example, grep and sed have two modes:
basic and extended (enabled with grep -E or sed -E). In basic
mode, ?, +, and | are treated as special
only when prefixed with a \ (the opposite
of a backslash's normal meaning).
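The difference between the two modes can be seen by counting matching lines with `grep -c`. This is a small sketch under the assumption that GNU grep is installed, since the backslash forms in basic mode are a GNU extension. The sample text and variable names are hypothetical.

```shell
# A single line of sample text (made up for this demo).
text='aab'

# Basic mode: a bare '+' is an ordinary character,
# so this looks for "a" followed by a literal "+" and then "b".
basic_plain=$(printf '%s\n' "$text" | grep -c 'a+b')

# Basic mode with a backslash: '\+' becomes the "one or more" quantifier.
# (Assumes GNU grep.)
basic_escaped=$(printf '%s\n' "$text" | grep -c 'a\+b')

# Extended mode: '+' is a quantifier without the backslash.
extended=$(printf '%s\n' "$text" | grep -c -E 'a+b')

echo "basic, bare +:      $basic_plain matching line(s)"
echo "basic, escaped \\+:  $basic_escaped matching line(s)"
echo "extended, bare +:   $extended matching line(s)"
```

`grep -c` prints the number of matching lines instead of the lines themselves, which makes the comparison easy to read.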
| Command | Meaning | Usage | Effect |
|---|---|---|---|
| * | Shell wildcard meaning "zero or more characters" | cat * | Display the contents of all files in the current directory. |
| grep | Search for text (a regular expression) in files | grep PATTERN [FILENAME...] | Print all the lines matching PATTERN in the given files. If no files are given, search stdin. |
| | | grep -i PATTERN [FILENAME...] | Make the search case-insensitive. |
| | | grep -r PATTERN PATHNAME... | Make the search recurse through the given directories. |
| | | grep -v PATTERN [FILENAME...] | Print all the lines not matching PATTERN in the given files. |
| sed | Edit streams of text | sed [-i] "s/PATTERN/REPLACEMENT/" [FILENAME...] | Replace the first match of the regexp PATTERN on each line with the text REPLACEMENT (append g after the final / to replace every match on the line). If no files are given, read stdin. Note: with -i, edit the named files in place instead of printing the result. |
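The grep options and the sed substitution above can be exercised together on a few throwaway files. This is a hedged sketch: the directory layout, file contents, and variable names are invented for the demo. The `-l` option (list file names only) is a standard grep flag that does not appear in the table, and the `sed -i` line assumes GNU sed (BSD and macOS sed expect `-i ''`).

```shell
# Build a small scratch tree (names and contents are made up).
work_dir=$(mktemp -d)
mkdir "$work_dir/sub"
printf 'Unix is fun\nunix tools\nWindows\n' > "$work_dir/notes.txt"
printf 'more Unix here\n' > "$work_dir/sub/deep.txt"

# Plain search: case-sensitive, prints the matching lines.
plain=$(grep 'Unix' "$work_dir/notes.txt")

# -i: ignore case, so "unix tools" matches as well.
ignore_case=$(grep -i 'unix' "$work_dir/notes.txt")

# -v: invert the match, printing only the lines that do NOT match.
inverted=$(grep -v -i 'unix' "$work_dir/notes.txt")

# -r: recurse into directories; -l lists file names only.
recursive_count=$(grep -r -l 'Unix' "$work_dir" | wc -l)

# sed replaces the first match on each line; the g flag replaces them all.
first_only=$(printf 'a-b-c\n' | sed 's/-/+/')
every_match=$(printf 'a-b-c\n' | sed 's/-/+/g')

# sed -i edits the file in place (GNU sed syntax assumed).
sed -i 's/Windows/Linux/' "$work_dir/notes.txt"
after_edit=$(cat "$work_dir/notes.txt")

echo "plain:        $plain"
echo "ignore case:  $ignore_case"
echo "inverted:     $inverted"
echo "files found:  $recursive_count"
echo "first only:   $first_only"
echo "every match:  $every_match"
echo "after sed -i: $after_edit"

# Clean up the scratch tree.
rm -rf "$work_dir"
```

Running sed without `-i` first is a sensible habit, because it prints the result to the terminal and leaves the file untouched.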