Regular Expressions

From: The GNU Awk User's Guide

General Introduction

This file is part of the documentation of awk, a program that you can use to select particular records in a file and perform operations upon them. Copyright © 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, 2000, 2001, 2002 Free Software Foundation, Inc.

This is Edition 3 of GAWK: Effective AWK Programming: A User's Guide for GNU Awk, for the 3.1.1 (or later) version of the GNU implementation of AWK.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being "GNU General Public License", the Front-Cover Texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the license is included in the section entitled "GNU Free Documentation License".

a. "A GNU Manual"
b. "You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development."

General Introduction
Foreword
Regular Expressions
Glossary
GNU General Public License
GNU Free Documentation License
- ADDENDUM: How to use this License for your documents
Index

Foreword

Arnold Robbins and I are good friends. We were introduced 11 years ago by circumstances--and our favorite programming language, AWK. The circumstances started a couple of years earlier. I was working at a new job and noticed an unplugged Unix computer sitting in the corner. No one knew how to use it, and neither did I. However, a couple of days later it was running, and I was root and the one-and-only user. That day, I began the transition from statistician to Unix programmer.

On one of many trips to the library or bookstore in search of books on Unix, I found the gray AWK book, a.k.a. Aho, Kernighan and Weinberger, The AWK Programming Language, Addison-Wesley, 1988. AWK's simple programming paradigm--find a pattern in the input and then perform an action--often reduced complex or tedious data manipulations to a few lines of code. I was excited to try my hand at programming in AWK.

Alas, the awk on my computer was a limited version of the language described in the AWK book. I discovered that my computer had "old awk" and the AWK book described "new awk." I learned that this was typical; the old version refused to step aside or relinquish its name. If a system had a new awk, it was invariably called nawk, and few systems had it. The best way to get a new awk was to ftp the source code for gawk from prep.ai.mit.edu. gawk was a version of new awk written by David Trueman and Arnold, and available under the GNU General Public License.

(Incidentally, it's no longer difficult to find a new awk. gawk ships with Linux, and you can download binaries or source code for almost any system; my wife uses gawk on her VMS box.)

My Unix system started out unplugged from the wall; it certainly was not plugged into a network. So, oblivious to the existence of gawk and the Unix community in general, and desiring a new awk, I wrote my own, called mawk. Before I was finished I knew about gawk, but it was too late to stop, so I eventually posted to a comp.sources newsgroup.

A few days after my posting, I got a friendly email from Arnold introducing himself. He suggested we share design and algorithms and attached a draft of the POSIX standard so that I could update mawk to support language extensions added after publication of the AWK book.

Frankly, if our roles had been reversed, I would not have been so open and we probably would have never met. I'm glad we did meet. He is an AWK expert's AWK expert and a genuinely nice person. Arnold contributes significant amounts of his expertise and time to the Free Software Foundation.

This book is the gawk reference manual, but at its core it is a book about AWK programming that will appeal to a wide audience. It is a definitive reference to the AWK language as defined by the 1987 Bell Labs release and codified in the 1992 POSIX Utilities standard.

On the other hand, the novice AWK programmer can study a wealth of practical programs that emphasize the power of AWK's basic idioms: data driven control-flow, pattern matching with regular expressions, and associative arrays. Those looking for something new can try out gawk's interface to network protocols via special /inet files.

The programs in this book make clear that an AWK program is typically much smaller and faster to develop than a counterpart written in C. Consequently, there is often a payoff to prototype an algorithm or design in AWK to get it running quickly and expose problems early. Often, the interpreted performance is adequate and the AWK prototype becomes the product.

The new pgawk (profiling gawk) produces program execution counts. I recently experimented with an algorithm that, for n lines of input, exhibited ~ C n^2 performance, while theory predicted ~ C n log n behavior. A few minutes poring over the awkprof.out profile pinpointed the problem to a single line of code. pgawk is a welcome addition to my programmer's toolbox.

Arnold has distilled over a decade of experience writing and using AWK programs, and developing gawk, into this book. If you use AWK or want to learn how, then read this book.

Michael Brennan
Author of mawk

Regular Expressions

A regular expression, or regexp, is a way of describing a set of strings. Because regular expressions are such a fundamental part of awk programming, their format and use deserve a separate chapter.

A regular expression enclosed in slashes (/) is an awk pattern that matches every input record whose text belongs to that set. The simplest regular expression is a sequence of letters, numbers, or both. Such a regexp matches any string that contains that sequence. Thus, the regexp foo matches any string containing foo. Therefore, the pattern /foo/ matches any input record containing the three characters foo anywhere in the record. Other kinds of regexps let you specify more complicated classes of strings.

Initially, the examples in this chapter are simple. As we explain more about how regular expressions work, we will present more complicated instances.

Regexp Usage: How to Use Regular Expressions.
Escape Sequences: How to write nonprinting characters.
Regexp Operators: Regular Expression Operators.
Character Lists: What can go between [...].
GNU Regexp Operators: Operators specific to GNU software.
Case-sensitivity: How to do case-insensitive matching.
Leftmost Longest: How much text matches.
Computed Regexps: Using Dynamic Regexps.

How to Use Regular Expressions

A regular expression can be used as a pattern by enclosing it in slashes. Then the regular expression is tested against the entire text of each record. (Normally, it only needs to match some part of the text in order to succeed.) For example, the following prints the second field of each record that contains the string foo anywhere in it:

$ awk '/foo/ { print $2 }' BBS-list
-| 555-1234
-| 555-6699
-| 555-6480
-| 555-2127

Regular expressions can also be used in matching expressions. These expressions allow you to specify the string to match against; it need not be the entire current input record. The two operators ~ and !~ perform regular expression comparisons. Expressions using these operators can be used as patterns, or in if, while, for, and do statements. (See Control Statements in Actions.) For example:

exp ~ /regexp/

is true if the expression exp (taken as a string) matches regexp. The following example matches, or selects, all input records with the uppercase letter J somewhere in the first field:

$ awk '$1 ~ /J/' inventory-shipped
-| Jan  13  25  15 115
-| Jun  31  42  75 492
-| Jul  24  34  67 436
-| Jan  21  36  64 620

So does this:

awk '{ if ($1 ~ /J/) print }' inventory-shipped

This next example is true if the expression exp (taken as a character string) does not match regexp:

exp !~ /regexp/

The following example matches, or selects, all input records whose first field does not contain the uppercase letter J:

$ awk '$1 !~ /J/' inventory-shipped
-| Feb  15  32  24 226
-| Mar  15  24  34 228
-| Apr  31  52  63 420
-| May  16  34  29 208
...

When a regexp is enclosed in slashes, such as /foo/, we call it a regexp constant, much like 5.27 is a numeric constant and "foo" is a string constant.
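
The two matching operators can also be tried without the sample data files. The following is a minimal sketch, assuming three made-up records (they are not taken from inventory-shipped) and hypothetical shell variable names; it selects records on the first field only:

```shell
# Hypothetical three-record input; only the first field is tested.
data='Jan 13
Feb 15
Jun 31'

# Records whose first field contains an uppercase J.
with_j=$(printf '%s\n' "$data" | awk '$1 ~ /J/ { print $1 }')

# Records whose first field does not contain an uppercase J.
without_j=$(printf '%s\n' "$data" | awk '$1 !~ /J/ { print $1 }')

printf 'match:\n%s\nno match:\n%s\n' "$with_j" "$without_j"
```

With this input, the first command selects Jan and Jun, and the second selects Feb.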

Escape Sequences

Some characters cannot be included literally in string constants ("foo") or regexp constants (/foo/). Instead, they should be represented with escape sequences, which are character sequences beginning with a backslash (\). One use of an escape sequence is to include a double-quote character in a string constant. Because a plain double quote ends the string, you must use \" to represent an actual double-quote character as a part of the string. For example:

$ awk 'BEGIN { print "He said \"hi!\" to her." }'
-| He said "hi!" to her.

The backslash character itself is another character that cannot be included normally; you must write \\ to put one backslash in the string or regexp. Thus, the string whose contents are the two characters " and \ must be written "\"\\".

Backslash also represents unprintable characters such as TAB or newline. While there is nothing to stop you from entering most unprintable characters directly in a string constant or regexp constant, they may look ugly.

The following table lists all the escape sequences used in awk and what they represent. Unless noted otherwise, all these escape sequences apply to both string constants and regexp constants:

\\: A literal backslash, \.
\a: The "alert" character, Ctrl-g, ASCII code 7 (BEL). (This usually makes some sort of audible noise.)
\b: Backspace, Ctrl-h, ASCII code 8 (BS).
\f: Formfeed, Ctrl-l, ASCII code 12 (FF).
\n: Newline, Ctrl-j, ASCII code 10 (LF).
\r: Carriage return, Ctrl-m, ASCII code 13 (CR).
\t: Horizontal TAB, Ctrl-i, ASCII code 9 (HT).
\v: Vertical tab, Ctrl-k, ASCII code 11 (VT).
\nnn: The octal value nnn, where nnn stands for 1 to 3 digits between 0 and 7. For example, the code for the ASCII ESC (escape) character is \033.
\xhh...: The hexadecimal value hh, where hh stands for a sequence of hexadecimal digits (0-9, and either A-F or a-f). Like the same construct in ISO C, the escape sequence continues until the first nonhexadecimal digit is seen. However, using more than two hexadecimal digits produces undefined results. (The \x escape sequence is not allowed in POSIX awk.)
\/: A literal slash (necessary for regexp constants only). This expression is used when you want to write a regexp constant that contains a slash. Because the regexp is delimited by slashes, you need to escape the slash that is part of the pattern, in order to tell awk to keep processing the rest of the regexp.
\": A literal double quote (necessary for string constants only). This expression is used when you want to write a string constant that contains a double quote. Because the string is delimited by double quotes, you need to escape the quote that is part of the string, in order to tell awk to keep processing the rest of the string.

In gawk, a number of additional two-character sequences that begin with a backslash have special meaning in regexps. See gawk-Specific Regexp Operators.

In a regexp, a backslash before any character that is not in the previous list and not listed in gawk-Specific Regexp Operators means that the next character should be taken literally, even if it would normally be a regexp operator. For example, /a\+b/ matches the three characters a+b.

For complete portability, do not use a backslash before any character not shown in the previous list.

To summarize:

The escape sequences in the table above are always processed first, for both string constants and regexp constants. This happens very early, as soon as awk reads your program.
gawk processes both regexp constants and dynamic regexps (see Using Dynamic Regexps) for the special operators listed in gawk-Specific Regexp Operators.
A backslash before any other character means to treat that character literally.
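
The rules summarized above can be checked from the shell. This is a small sketch rather than a complete test of every escape sequence; the shell variable names are hypothetical, and the awk programs are written in single quotes so that the shell passes each backslash to awk unchanged:

```shell
# The string constant "\"\\" contains the two characters " and \.
quote_backslash=$(awk 'BEGIN { print "\"\\" }')

# Octal escapes: \101, \102, and \103 are the ASCII codes for A, B, and C.
octal=$(awk 'BEGIN { print "\101\102\103" }')

# In a regexp, \+ stands for a literal plus sign, so only a+b is selected.
literal_plus=$(printf 'a+b\naab\n' | awk '/a\+b/')

printf '%s\n%s\n%s\n' "$quote_backslash" "$octal" "$literal_plus"
```

The three results are the two characters " and \, the string ABC, and the record a+b.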

Advanced Notes: Backslash Before Regular Characters

If you place a backslash in a string constant before something that is not one of the characters previously listed, POSIX awk purposely leaves what happens as undefined. There are two choices:

Strip the backslash out: This is what Unix awk and gawk both do. For example, "a\qc" is the same as "aqc". (Because this is such an easy bug both to introduce and to miss, gawk warns you about it.) Consider FS = "[ \t]+\|[ \t]+" to use vertical bars surrounded by whitespace as the field separator. With a single backslash, the backslash is stripped and the | that remains acts as the alternation operator instead of matching a literal vertical bar. There should be two backslashes in the string: FS = "[ \t]+\\|[ \t]+".
Leave the backslash alone: Some other awk implementations do this. In such implementations, typing "a\qc" is the same as typing "a\\qc".
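
The field separator discussed above can be exercised with one line of input. This is a sketch under the assumption of a made-up record whose fields are separated by a vertical bar with whitespace on both sides; the shell variable name is hypothetical:

```shell
# Inside the string constant, \\| is reduced to \| before the regexp is
# compiled, so the vertical bar is matched literally.
second=$(printf 'alpha | beta | gamma\n' |
    awk 'BEGIN { FS = "[ \t]+\\|[ \t]+" } { print $2 }')

printf '%s\n' "$second"
```

The second field is beta, which shows that the separator consumed the bar together with the surrounding whitespace.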

Advanced Notes: Escape Sequences for Metacharacters

Suppose you use an octal or hexadecimal escape to represent a regexp metacharacter. (See Regular Expression Operators.) Does awk treat the character as a literal character or as a regexp operator?

Historically, such characters were taken literally. (d.c.) However, the POSIX standard indicates that they should be treated as real metacharacters, which is what gawk does. In compatibility mode (see Command-Line Options), gawk treats the characters represented by octal and hexadecimal escape sequences literally when used in regexp constants. Thus, /a\52b/ is equivalent to /a\*b/.

Regular Expression Operators

You can combine regular expressions with special characters, called regular expression operators or metacharacters, to increase the power and versatility of regular expressions.

The escape sequences described earlier in Escape Sequences are valid inside a regexp. They are introduced by a \ and are recognized and converted into corresponding real characters as the very first step in processing regexps.

Here is a list of metacharacters. All characters that are not escape sequences and that are not listed in the table stand for themselves:

\

This is used to suppress the special meaning of a character when matching. For example, \$ matches the character $.

^

This matches the beginning of a string. For example, ^@chapter matches @chapter at the beginning of a string and can be used to identify chapter beginnings in Texinfo source files. The ^ is known as an anchor, because it anchors the pattern to match only at the beginning of the string.

It is important to realize that ^ does not match the beginning of a line embedded in a string. The condition is not true in the following example:

if ("line1\nLINE 2" ~ /^L/) ...

$

This is similar to ^, but it matches only at the end of a string. For example, p$ matches a record that ends with a p. The $ is an anchor and does not match the end of a line embedded in a string. The condition in the following example is not true:

if ("line1\nLINE 2" ~ /1$/) ...

.

This matches any single character, including the newline character. For example, .P matches any single character followed by a P in a string. Using concatenation, we can make a regular expression such as U.A, which matches any three-character sequence that begins with U and ends with A.

In strict POSIX mode (see Command-Line Options), . does not match the NUL character, which is a character with all bits equal to zero. Otherwise, NUL is just another character. Other versions of awk may not be able to match the NUL character.

[...]

This is called a character list. It matches any one of the characters that are enclosed in the square brackets. For example, [MVX] matches any one of the characters M, V, or X in a string. A full discussion of what can be inside the square brackets of a character list is given in Using Character Lists.

[^ ...]

This is a complemented character list. The first character after the [ must be a ^. It matches any single character except those in the square brackets. For example, [^awk] matches any character that is not an a, w, or k.

|

This is the alternation operator and it is used to specify alternatives. The | has the lowest precedence of all the regular expression operators. For example, ^P|[[:digit:]] matches any string that matches either ^P or [[:digit:]]. This means it matches any string that starts with P or contains a digit.

The alternation applies to the largest possible regexps on either side.

(...)

Parentheses are used for grouping in regular expressions, as in arithmetic. They can be used to concatenate regular expressions containing the alternation operator, |. For example, @(samp|code)\{[^}]+\} matches both @code{foo} and @samp{bar}. (These are Texinfo formatting control sequences.)

*

This symbol means that the preceding regular expression should be repeated as many times as necessary to find a match. For example, ph* applies the * symbol to the preceding h and looks for matches of one p followed by any number of hs. This also matches just p if no hs are present.

The * repeats the smallest possible preceding expression. (Use parentheses if you want to repeat a larger expression.) It finds as many repetitions as possible. For example, awk '/\(c[ad][ad]*r x\)/ { print }' sample prints every record in sample containing a string of the form (car x), (cdr x), (cadr x), and so on. Notice the escaping of the parentheses by preceding them with backslashes.

+

This symbol is similar to *, except that the preceding expression must be matched at least once. This means that wh+y would match why and whhy, but not wy, whereas wh*y would match all three of these strings. The following is a simpler way of writing the last * example:

awk '/\(c[ad]+r x\)/ { print }' sample

?

This symbol is similar to *, except that the preceding expression can be matched either once or not at all. For example, fe?d matches fed and fd, but nothing else.

{n}

{n,}

{n,m}

One or two numbers inside braces denote an interval expression. If there is one number in the braces, the preceding regexp is repeated n times. If there are two numbers separated by a comma, the preceding regexp is repeated n to m times. If there is one number followed by a comma, then the preceding regexp is repeated at least n times:

wh{3}y: Matches whhhy, but not why or whhhhy.
wh{3,5}y: Matches whhhy, whhhhy, or whhhhhy, only.
wh{2,}y: Matches whhy or whhhy, and so on.

Interval expressions were not traditionally available in awk. They were added as part of the POSIX standard to make awk and egrep consistent with each other.

However, because old programs may use { and } in regexp constants, by default gawk does not match interval expressions in regexps. If either --posix or --re-interval is specified (see Command-Line Options), then interval expressions are allowed in regexps.

For new programs that use { and } in regexp constants, it is good practice to always escape them with a backslash. Then the regexp constants are valid and work the way you want them to, using any version of awk.

In regular expressions, the *, +, and ? operators, as well as the braces { and }, have the highest precedence, followed by concatenation, and finally by |. As in arithmetic, parentheses can change how operators are grouped.
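
The effect of precedence is easiest to see by running the same input through a regexp with and without parentheses. The following sketch assumes four made-up records and hypothetical shell variable names; it uses [0-9] in place of [[:digit:]] only so that it also runs on awk versions without POSIX character classes:

```shell
data='Pat
xP
a1
abc'

# | has the lowest precedence: "starts with P" or "contains a digit".
ungrouped=$(printf '%s\n' "$data" | awk '/^P|[0-9]/')

# Parentheses move ^ outside the alternation: "starts with P or with a digit".
grouped=$(printf '%s\n' "$data" | awk '/^(P|[0-9])/')

printf 'ungrouped:\n%s\ngrouped:\n%s\n' "$ungrouped" "$grouped"
```

The ungrouped regexp selects Pat and a1, whereas the grouped one selects only Pat, because a1 does not begin with a digit.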

In POSIX awk and gawk, the *, +, and ? operators stand for themselves when there is nothing in the regexp that precedes them. For example, /+/ matches a literal plus sign. However, many other versions of awk treat such a usage as a syntax error.

If gawk is in compatibility mode (see Command-Line Options), POSIX character classes and interval expressions are not available in regular expressions.
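
As a recap of the anchors and repetition operators described in this section, the following sketch tests each of them against small made-up strings. The variable names are hypothetical, and the repetition regexps are anchored at both ends so that each record must match in full:

```shell
# Anchors apply to the whole string, not to lines embedded in it.
anchors=$(awk 'BEGIN {
    s = "line1\nLINE 2"
    a = (s ~ /^L/)    # 0: L begins an embedded line, not the string
    b = (s ~ /1$/)    # 0: 1 ends an embedded line, not the string
    c = (s ~ /^l/)    # 1: the string begins with l
    d = (s ~ /2$/)    # 1: the string ends with 2
    print a, b, c, d
}')

words='wy
why
whhy'

star=$(printf '%s\n' "$words" | awk '/^wh*y$/')      # zero or more h
plus=$(printf '%s\n' "$words" | awk '/^wh+y$/')      # one or more h
optional=$(printf '%s\n' "$words" | awk '/^wh?y$/')  # zero or one h

printf '%s\n' "$anchors" "$star" "$plus" "$optional"
```

The anchor tests print 0 0 1 1. The * form selects all three words, the + form selects why and whhy, and the ? form selects wy and why.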

Using Character Lists

Within a character list, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, using the locale's collating sequence and character set. For example, in the default C locale, [a-dx-z] is equivalent to [abcdxyz]. Many locales sort characters in dictionary order, and in these locales, [a-dx-z] is typically not equivalent to [abcdxyz]; instead it might be equivalent to [aBbCcDdxXyYz], for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.
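
Setting LC_ALL for a single command is one way to apply the advice above. This is a minimal sketch with two made-up records and a hypothetical variable name; what the same range matches in other locales depends on the locale and on the awk implementation, so only the C locale case is shown:

```shell
# With LC_ALL=C, [a-d] means exactly the characters a, b, c, and d,
# so the record containing an uppercase B is not selected.
lower_only=$(printf 'abcd\naBcd\n' | LC_ALL=C awk '/^[a-d]+$/')

printf '%s\n' "$lower_only"
```

Only abcd is selected.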

To include one of the characters \, ], -, or ^ in a character list, put a \ in front of it. For example:

[d\]]

matches either d or ].

This treatment of \ in character lists is compatible with other awk implementations and is also mandated by POSIX. The regular expressions in awk are a superset of the POSIX specification for Extended Regular Expressions (EREs). POSIX EREs are based on the regular expressions accepted by the traditional egrep utility.

Character classes are a new feature introduced in the POSIX standard. A character class is a special notation for describing lists of characters that have a specific attribute, but the actual characters can vary from country to country and/or from character set to character set. For example, the notion of what is an alphabetic character differs between the United States and France.

A character class is only valid in a regexp inside the brackets of a character list. Character classes consist of [:, a keyword denoting the class, and :]. Here are the character classes defined by the POSIX standard.

[:alnum:] Alphanumeric characters.
[:alpha:] Alphabetic characters.
[:blank:] Space and TAB characters.
[:cntrl:] Control characters.
[:digit:] Numeric characters.
[:graph:] Characters that are both printable and visible. (A space is printable but not visible, whereas an a is both.)
[:lower:] Lowercase alphabetic characters.
[:print:] Printable characters (characters that are not control characters).
[:punct:] Punctuation characters (characters that are not letters, digits, control characters, or space characters).
[:space:] Space characters (such as space, TAB, and formfeed, to name a few).
[:upper:] Uppercase alphabetic characters.
[:xdigit:] Characters that are hexadecimal digits.

For example, before the POSIX standard, you had to write /[A-Za-z0-9]/ to match alphanumeric characters. If your character set had other alphabetic characters in it, this would not match them, and if your character set collated differently from ASCII, this might not even match the ASCII alphanumeric characters. With the POSIX character classes, you can write /[[:alnum:]]/ to match the alphabetic and numeric characters in your character set.
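
The following sketch shows a character class on its own and combined with another member of the same character list. It assumes an awk that supports POSIX character classes (they are not available in gawk's compatibility mode or in some older awk versions); the input records and variable names are made up:

```shell
data='abc123
hello world
---'

# [:alnum:] must appear inside the brackets of a character list.
alnum_only=$(printf '%s\n' "$data" | awk '/^[[:alnum:]]+$/')

# A class can share the list with other characters, here a space.
with_space=$(printf '%s\n' "$data" | awk '/^[[:alnum:] ]+$/')

printf 'alnum only:\n%s\nalnum or space:\n%s\n' "$alnum_only" "$with_space"
```

The first regexp selects abc123; the second also selects hello world. Neither selects the record of hyphens.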

Two additional special sequences can appear in character lists. These apply to non-ASCII character sets, which can have single symbols (called collating elements) that are represented with more than one character. They can also have several characters that are equivalent for collating, or sorting, purposes. (For example, in French, a plain "e" and a grave-accented "è" are equivalent.) These sequences are:

Collating symbols: Multicharacter collating elements enclosed between [. and .]. For example, if ch is a collating element, then [[.ch.]] is a regexp that matches this collating element, whereas [ch] is a regexp that matches either c or h.
Equivalence classes: Locale-specific names for a list of characters that are equal. The name is enclosed between [= and =]. For example, the name e might be used to represent all of "e," "è," and "é." In this case, [[=e=]] is a regexp that matches any of e, é, or è.

These features are very valuable in non-English-speaking locales.

Caution: The library functions that gawk uses for regular expression matching currently recognize only POSIX character classes; they do not recognize collating symbols or equivalence classes.

`gawk`-Specific Regexp Operators

GNU software that deals with regular expressions provides a number of additional regexp operators. These operators are described in this section and are specific to gawk; they are not available in other awk implementations. Most of the additional operators deal with word matching. For our purposes, a word is a sequence of one or more letters, digits, or underscores (_):

\w: Matches any word-constituent character--that is, it matches any letter, digit, or underscore. Think of it as shorthand for [[:alnum:]_].
\W: Matches any character that is not word-constituent. Think of it as shorthand for [^[:alnum:]_].
\<: Matches the empty string at the beginning of a word. For example, /\<away/ matches away but not stowaway.
\>: Matches the empty string at the end of a word. For example, /stow\>/ matches stow but not stowaway.
\y: Matches the empty string at either the beginning or the end of a word (i.e., the word boundary). For example, \yballs?\y matches either ball or balls, as a separate word.
\B: Matches the empty string that occurs between two word-constituent characters. For example, /\Brat\B/ matches crate but it does not match dirty rat. \B is essentially the opposite of \y.

There are two other operators that work on buffers. In Emacs, a buffer is, naturally, an Emacs buffer. For other programs, gawk's regexp library routines consider the entire string to match as the buffer. The operators are:

\`: Matches the empty string at the beginning of a buffer (string).
\': Matches the empty string at the end of a buffer (string).

Because ^ and $ always work in terms of the beginning and end of strings, these operators don't add any new capabilities for awk. They are provided for compatibility with other GNU software.

In other GNU software, the word-boundary operator is \b. However, that conflicts with the awk language's definition of \b as backspace, so gawk uses a different letter. An alternative method would have been to require two backslashes in the GNU operators, but this was deemed too confusing. The current method of using \y for the GNU \b appears to be the lesser of two evils.
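
The word-matching operators can be tried as follows. Because they are gawk extensions, this sketch runs them only when a program named gawk is found on the search path, and it assumes gawk is running in its default mode (not --posix or --traditional). The input records and variable names are made up:

```shell
if command -v gawk >/dev/null 2>&1; then
    # \< matches at the start of a word: "far away" is selected,
    # "stowaway" is not.
    word_start=$(printf 'stowaway\nfar away\n' | gawk '/\<away/')

    # \y matches at either word boundary: ball and balls are selected
    # as whole words, balloon is not.
    whole_word=$(printf 'ball\nballs\nballoon\n' | gawk '/\yballs?\y/')
else
    word_start='(gawk not found)'
    whole_word='(gawk not found)'
fi

printf '%s\n%s\n' "$word_start" "$whole_word"
```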

The various command-line options (see Command-Line Options) control how gawk interprets characters in regexps:

No options: In the default case, gawk provides all the facilities of POSIX regexps and the GNU regexp operators described previously. However, interval expressions are not supported.
--posix: Only POSIX regexps are supported; the GNU operators are not special (e.g., \w matches a literal w). Interval expressions are allowed.
--traditional: Traditional Unix awk regexps are matched. The GNU operators are not special, interval expressions are not available, nor are the POSIX character classes ([[:alnum:]], etc.). Characters described by octal and hexadecimal escape sequences are treated literally, even if they represent regexp metacharacters.
--re-interval: Allow interval expressions in regexps, even if --traditional has been provided.

Case Sensitivity in Matching

Case is normally significant in regular expressions, both when matching ordinary characters (i.e., not metacharacters) and inside character lists. Thus, a w in a regular expression matches only a lowercase w and not an uppercase W.

The simplest way to do a case-independent match is to use a character list--for example, [Ww]. However, this can be cumbersome if you need to use it often, and it can make the regular expressions harder to read. There are two alternatives that you might prefer.

One way to perform a case-insensitive match at a particular point in the program is to convert the data to a single case, using the tolower or toupper built-in string functions (which we haven't discussed yet; see String Manipulation Functions). For example:

tolower($1) ~ /foo/  { ... }

converts the first field to lowercase before matching against it. This works in any POSIX-compliant awk.
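
Here is the same rule as a complete command. It is a sketch with three made-up records and a hypothetical variable name; the second field is printed for each record whose first field matches foo in any mixture of case:

```shell
matches=$(printf 'FOO 1\nbar 2\nFoo 3\n' |
    awk 'tolower($1) ~ /foo/ { print $2 }')

printf '%s\n' "$matches"
```

The records beginning with FOO and Foo are selected, so the output is 1 followed by 3. The input data itself is not changed; tolower only affects the value being tested.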

Another method, specific to gawk, is to set the variable IGNORECASE to a nonzero value (see Built-in Variables). When IGNORECASE is not zero, all regexp and string operations ignore case. Changing the value of IGNORECASE dynamically controls the case-sensitivity of the program as it runs. Case is significant by default because IGNORECASE (like most variables) is initialized to zero:

x = "aB"
if (x ~ /ab/) ...   # this test will fail

IGNORECASE = 1
if (x ~ /ab/) ...   # now it will succeed

In general, you cannot use IGNORECASE to make certain rules case-insensitive and other rules case-sensitive, because there is no straightforward way to set IGNORECASE just for the pattern of a particular rule. To do this, use either character lists or tolower. However, one thing you can do with IGNORECASE only is dynamically turn case-sensitivity on or off for all the rules at once.

IGNORECASE can be set on the command line or in a BEGIN rule (see Other Command-Line Arguments; also see Startup and Cleanup Actions). Setting IGNORECASE from the command line is a way to make a program case-insensitive without having to edit it.
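
The following sketch runs one unchanged program with and without IGNORECASE set on the command line. It uses the -v option to make the assignment, runs only when a program named gawk is found on the search path, and assumes gawk is not in compatibility mode. The input records and variable names are made up:

```shell
if command -v gawk >/dev/null 2>&1; then
    # Default behavior: case is significant, so nothing matches /foo/.
    sensitive=$(printf 'Foo\nbar\nFOOD\n' | gawk '/foo/')

    # Same program, made case-insensitive without editing it.
    insensitive=$(printf 'Foo\nbar\nFOOD\n' | gawk -v IGNORECASE=1 '/foo/')
else
    sensitive='(gawk not found)'
    insensitive='(gawk not found)'
fi

printf 'sensitive: [%s]\ninsensitive:\n%s\n' "$sensitive" "$insensitive"
```

When gawk is available, the first command selects nothing and the second selects Foo and FOOD.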

Prior to gawk 3.0, the value of IGNORECASE affected regexp operations only. It did not affect string comparison with ==, !=, and so on. Beginning with version 3.0, both regexp and string comparison operations are affected by IGNORECASE.

Beginning with gawk 3.0, the equivalences between upper- and lowercase characters are based on the ISO-8859-1 (ISO Latin-1) character set. This character set is a superset of the traditional 128 ASCII characters, which also provides a number of characters suitable for use with European languages.

The value of IGNORECASE has no effect if gawk is in compatibility mode (see Command-Line Options). Case is always significant in compatibility mode.

How Much Text Matches?

Consider the following:

echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'

This example uses the sub function (which we haven't discussed yet; see String Manipulation Functions) to make a change to the input record. Here, the regexp /a+/ indicates "one or more a characters," and the replacement text is <A>.

The input contains four a characters. awk (and POSIX) regular expressions always match the leftmost, longest sequence of input characters that can match. Thus, all four a characters are replaced with <A> in this example:

$ echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
-| <A>bcd

For simple match/no-match tests, this is not so important. But when doing text matching and substitutions with the match, sub, gsub, and gensub functions, it is very important. Understanding this principle is also important for regexp-based record and field splitting (see How Input Is Split into Records, and also see Specifying How Fields Are Separated).
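
The leftmost-longest rule can be observed directly with match, which sets RSTART to the position where the match begins and RLENGTH to its length. The following is a sketch using made-up input strings and hypothetical variable names; it also contrasts sub with gsub:

```shell
# The four leading a characters are matched as one unit.
first=$(echo aaaabcd | awk '{ match($0, /a+/); print RSTART, RLENGTH }')

# Leftmost takes priority over longest: the two-character run at
# position 2 is chosen even though a longer run appears later.
leftmost=$(echo xaay_aaaa | awk '{ match($0, /a+/); print RSTART, RLENGTH }')

# sub replaces only that first match; gsub replaces every match.
once=$(echo xaay_aaaa | awk '{ sub(/a+/, "<A>"); print }')
every=$(echo xaay_aaaa | awk '{ gsub(/a+/, "<A>"); print }')

printf '%s\n' "$first" "$leftmost" "$once" "$every"
```

The results are 1 4, then 2 2, then x<A>y_aaaa, and finally x<A>y_<A>.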

Using Dynamic Regexps

The righthand side of a ~ or !~ operator need not be a regexp constant (i.e., a string of characters between slashes). It may be any expression. The expression is evaluated and converted to a string if necessary; the contents of the string are used as the regexp. A regexp that is computed in this way is called a dynamic regexp:

BEGIN { digits_regexp = "[[:digit:]]+" }
$0 ~ digits_regexp    { print }

This sets digits_regexp to a regexp that describes one or more digits, and tests whether the input record matches this regexp.
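
Because the righthand side may be any expression, a dynamic regexp can also come from outside the program or be assembled by concatenation. The following sketch assumes the hypothetical variable names pat and word, which are set with the -v option, and a few made-up input records:

```shell
# The whole regexp is supplied in the variable pat.
numbers=$(printf '123\nabc\n42x\n' | awk -v pat='^[0-9]+$' '$0 ~ pat')

# The regexp is built by concatenating anchors around the variable word.
exact=$(printf 'abc\nabcd\nxabc\n' | awk -v word='abc' '$0 ~ ("^" word "$")')

printf '%s\n%s\n' "$numbers" "$exact"
```

The first command selects 123, and the second selects only abc. Values assigned with -v are processed for escape sequences in the same way as string constants, so the caution about backslashes that follows applies to them as well.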

Caution: When using the ~ and !~ operators, there is a difference between a regexp constant enclosed in slashes and a string constant enclosed in double quotes. If you are going to use a string constant, you have to understand that the string is, in essence, scanned twice: the first time when awk reads your program, and the second time when it goes to match the string on the lefthand side of the operator with the pattern on the right. This is true of any string-valued expression (such as digits_regexp, shown previously), not just string constants.

What difference does it make if the string is scanned twice? The answer has to do with escape sequences, and particularly with backslashes. To get a backslash into a regular expression inside a string, you have to type two backslashes.

For example, /\*/ is a regexp constant for a literal *. Only one backslash is needed. To do the same thing with a string, you have to type "\\*". The first backslash escapes the second one so that the string actually contains the two characters \ and *.
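
The two spellings can be placed side by side to confirm that they describe the same regexp. This is a small sketch for illustration; the function name match_star and the sample input are hypothetical. The awk program is enclosed in single quotes so that the shell passes the backslashes through unchanged:

```shell
# match_star is a hypothetical helper comparing the two ways of writing
# a regexp that matches a literal asterisk.
match_star() {
    awk '
        $0 ~ /\*/  { print "regexp constant matched:", $0 }   # one backslash
        $0 ~ "\\*" { print "string constant matched:", $0 }   # two backslashes
    '
}

printf '%s\n' 'a*b' 'ab' | match_star
```

Both rules fire for a*b and neither fires for ab, so the output is two lines, one from each rule, both naming a*b.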

Given that you can use both regexp and string constants to describe regular expressions, which should you use? The answer is "regexp constants," for several reasons:

String constants are more complicated to write and more difficult to read. Using regexp constants makes your programs less error-prone. Not understanding the difference between the two kinds of constants is a common source of errors.
It is more efficient to use regexp constants. awk can note that you have supplied a regexp and store it internally in a form that makes pattern matching more efficient. When using a string constant, awk must first convert the string into this internal form and then perform the pattern matching.
Using regexp constants is better form; it shows clearly that you intend a regexp match.

Advanced Notes: Using \n in Character Lists of Dynamic Regexps

Some commercial versions of awk do not allow the newline character to be used inside a character list for a dynamic regexp:

$ awk '$0 ~ "[ \t\n]"'
error--> awk: newline in character class [
error--> ]...
error-->  source line number 1
error-->  context is
error-->          >>>  <<<

But a newline in a regexp constant works with no problem:

$ awk '$0 ~ /[ \t\n]/'
here is a sample line
-| here is a sample line
Ctrl-d

gawk does not have this problem, and it isn't likely to occur often in practice, but it's worth noting for future reference.
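
To check how a particular awk behaves, it helps to feed it records that really do contain newlines. The following sketch assumes paragraph mode (RS set to the null string), in which blank lines separate records; the function name has_newline_class and the sample input are made up for illustration:

```shell
# has_newline_class is a hypothetical name for this check.
# RS = "" selects paragraph mode, so a record may span several lines.
has_newline_class() {
    awk 'BEGIN { RS = "" } $0 ~ "[ \t\n]" { print NR ": contains space, TAB, or newline" }'
}

printf 'one\ntwo\n\nthree\n' | has_newline_class
```

An awk that accepts the newline in the dynamic regexp reports only record 1, since the first record spans two lines and the second is the single word three. An awk with the limitation described above would be expected to fail with a diagnostic instead.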

Glossary

Action: A series of awk statements attached to a rule. If the rule's pattern matches an input record, awk executes the rule's action. Actions are always enclosed in curly braces. (See Actions.)
Amazing awk Assembler: Henry Spencer at the University of Toronto wrote a retargetable assembler completely as sed and awk scripts. It is thousands of lines long, including machine descriptions for several eight-bit microcomputers. It is a good example of a program that would have been better written in another language. You can get it from ftp://ftp.freefriends.org/arnold/Awkstuff/aaa.tgz.
Amazingly Workable Formatter (awf): Henry Spencer at the University of Toronto wrote a formatter that accepts a large subset of the nroff -ms and nroff -man formatting commands, using awk and sh. It is available over the Internet from ftp://ftp.freefriends.org/arnold/Awkstuff/awf.tgz.
Anchor: The regexp metacharacters ^ and $, which force the match to the beginning or end of the string, respectively.
ANSI: The American National Standards Institute. This organization produces many standards, among them the standards for the C and C++ programming languages. These standards often become international standards as well. See also "ISO."
Array: A grouping of multiple values under the same name. Most languages just provide sequential arrays. awk provides associative arrays.
Assertion: A statement in a program that a condition is true at this point in the program. Useful for reasoning about how a program is supposed to behave.
Assignment: An awk expression that changes the value of some awk variable or data object. An object that you can assign to is called an lvalue. The assigned values are called rvalues. See Assignment Expressions.
Associative Array: Arrays in which the indices may be numbers or strings, not just sequential integers in a fixed range.
awk Language: The language in which awk programs are written.
awk Program: An awk program consists of a series of patterns and actions, collectively known as rules. For each input record given to the program, the program's rules are all processed in turn. awk programs may also contain function definitions.
awk Script: Another name for an awk program.
Bash: The GNU version of the standard shell (the Bourne-Again SHell). See also "Bourne Shell."
BBS: See "Bulletin Board System."
Bit: Short for "Binary Digit." All values in computer memory ultimately reduce to binary digits: values that are either zero or one. Groups of bits may be interpreted differently--as integers, floating-point numbers, character data, addresses of other memory objects, or other data. awk lets you work with floating-point numbers and strings. gawk lets you manipulate bit values with the built-in functions described in Using gawk's Bit Manipulation Functions.
Computers are often defined by how many bits they use to represent integer values. Typical systems are 32-bit systems, but 64-bit systems are becoming increasingly popular, and 16-bit systems are waning in popularity.
Boolean Expression: Named after the English mathematician Boole. See also "Logical Expression."
Bourne Shell: The standard shell (/bin/sh) on Unix and Unix-like systems, originally written by Steven R. Bourne. Many shells (bash, ksh, pdksh, zsh) are generally upwardly compatible with the Bourne shell.
Built-in Function: The awk language provides built-in functions that perform various numerical, I/O-related, and string computations. Examples are sqrt (for the square root of a number) and substr (for a substring of a string). gawk provides functions for timestamp management, bit manipulation, and runtime string translation. (See Built-in Functions.)
Built-in Variable: ARGC, ARGV, CONVFMT, ENVIRON, FILENAME, FNR, FS, NF, NR, OFMT, OFS, ORS, RLENGTH, RSTART, RS, and SUBSEP are the variables that have special meaning to awk. In addition, ARGIND, BINMODE, ERRNO, FIELDWIDTHS, IGNORECASE, LINT, PROCINFO, RT, and TEXTDOMAIN are the variables that have special meaning to gawk. Changing some of them affects awk's running environment. (See Built-in Variables.)
Braces: See "Curly Braces."
Bulletin Board System: A computer system allowing users to log in and read and/or leave messages for other users of the system, much like leaving paper notes on a bulletin board.
C: The system programming language that most GNU software is written in. The awk programming language has C-like syntax, and this Web page points out similarities between awk and C when appropriate.
In general, gawk attempts to be as similar to the 1990 version of ISO C as makes sense. Future versions of gawk may adopt features from the newer 1999 standard, as appropriate.
C++: A popular object-oriented programming language derived from C.
Character Set: The set of numeric codes used by a computer system to represent the characters (letters, numbers, punctuation, etc.) of a particular country or place. The most common character set in use today is ASCII (American Standard Code for Information Interchange). Many European countries use an extension of ASCII known as ISO-8859-1 (ISO Latin-1).
CHEM: A preprocessor for pic that reads descriptions of molecules and produces pic input for drawing them. It was written in awk by Brian Kernighan and Jon Bentley, and is available from http://cm.bell-labs.com/netlib/typesetting/chem.gz.
Coprocess: A subordinate program with which two-way communications is possible.
Compiler: A program that translates human-readable source code into machine-executable object code. The object code is then executed directly by the computer. See also "Interpreter."
Compound Statement: A series of awk statements, enclosed in curly braces. Compound statements may be nested. (See Control Statements in Actions.)
Concatenation: Concatenating two strings means sticking them together, one after another, producing a new string. For example, the string foo concatenated with the string bar gives the string foobar. (See String Concatenation.)
Conditional Expression: An expression using the ?: ternary operator, such as expr1 ? expr2 : expr3. The expression expr1 is evaluated; if the result is true, the value of the whole expression is the value of expr2; otherwise the value is expr3. In either case, only one of expr2 and expr3 is evaluated. (See Conditional Expressions.)
Comparison Expression: A relation that is either true or false, such as (a < b). Comparison expressions are used in if, while, do, and for statements, and in patterns to select which input records to process. (See Variable Typing and Comparison Expressions.)
Curly Braces: The characters { and }. Curly braces are used in awk for delimiting actions, compound statements, and function bodies.
Dark Corner: An area in the language where specifications often were (or still are) not clear, leading to unexpected or undesirable behavior. Such areas are marked in this Web page with "(d.c.)" in the text and are indexed under the heading "dark corner."
Data Driven: A description of awk programs, where you specify the data you are interested in processing, and what to do when that data is seen.
Data Objects: These are numbers and strings of characters. Numbers are converted into strings and vice versa, as needed. (See Conversion of Strings and Numbers.)
Deadlock: The situation in which two communicating processes are each waiting for the other to perform an action.
Double-Precision: An internal representation of numbers that can have fractional parts. Double-precision numbers keep track of more digits than do single-precision numbers, but operations on them are sometimes more expensive. This is the way awk stores numeric values. It is the C type double.
Dynamic Regular Expression: A dynamic regular expression is a regular expression written as an ordinary expression. It could be a string constant, such as "foo", but it may also be an expression whose value can vary. (See Using Dynamic Regexps.)
Environment: A collection of strings, of the form name=val, that each program has available to it. Users generally place values into the environment in order to provide information to various programs. Typical examples are the environment variables HOME and PATH.
Empty String: See "Null String."
Epoch: The date used as the "beginning of time" for timestamps. Time values in Unix systems are represented as seconds since the epoch, with library functions available for converting these values into standard date and time formats.
The epoch on Unix and POSIX systems is 1970-01-01 00:00:00 UTC. See also "GMT" and "UTC."
Escape Sequences: A special sequence of characters used for describing nonprinting characters, such as \n for newline or \033 for the ASCII ESC (Escape) character. (See Escape Sequences.)
FDL: See "Free Documentation License."
Field: When awk reads an input record, it splits the record into pieces separated by whitespace (or by a separator regexp that you can change by setting the built-in variable FS). Such pieces are called fields. If the pieces are of fixed length, you can use the built-in variable FIELDWIDTHS to describe their lengths. (See Specifying How Fields Are Separated, and Reading Fixed-Width Data.)
Flag: A variable whose truth value indicates the existence or nonexistence of some condition.
Floating-Point Number: Often referred to in mathematical terms as a "rational" or real number, this is just a number that can have a fractional part. See also "Double-Precision" and "Single-Precision."
Format: Format strings are used to control the appearance of output in the strftime and sprintf functions, and are used in the printf statement as well. Also, data conversions from numbers to strings are controlled by the format string contained in the built-in variable CONVFMT. (See Format-Control Letters.)
Free Documentation License: This document describes the terms under which this Web page is published and may be copied. (See GNU Free Documentation License.)
Function: A specialized group of statements used to encapsulate general or program-specific tasks. awk has a number of built-in functions, and also allows you to define your own. (See Functions.)
FSF: See "Free Software Foundation."
Free Software Foundation: A nonprofit organization dedicated to the production and distribution of freely distributable software. It was founded by Richard M. Stallman, the author of the original Emacs editor. GNU Emacs is the most widely used version of Emacs today.
gawk: The GNU implementation of awk.
General Public License: This document describes the terms under which gawk and its source code may be distributed. (See GNU General Public License.)
GMT: "Greenwich Mean Time." This is the old term for UTC. It is the time of day used as the epoch for Unix and POSIX systems. See also "Epoch" and "UTC."
GNU: "GNU's not Unix". An on-going project of the Free Software Foundation to create a complete, freely distributable, POSIX-compliant computing environment.
GNU/Linux: A variant of the GNU system using the Linux kernel, instead of the Free Software Foundation's Hurd kernel. Linux is a stable, efficient, full-featured clone of Unix that has been ported to a variety of architectures. It is most popular on PC-class systems, but runs well on a variety of other systems too. The Linux kernel source code is available under the terms of the GNU General Public License, which is perhaps its most important aspect.
GPL: See "General Public License."
Hexadecimal: Base 16 notation, where the digits are 0-9 and A-F, with A representing 10, B representing 11, and so on, up to F for 15. Hexadecimal numbers are written in C using a leading 0x, to indicate their base. Thus, 0x12 is 18 (1 times 16 plus 2).
I/O: Abbreviation for "Input/Output," the act of moving data into and/or out of a running program.
Input Record: A single chunk of data that is read in by awk. Usually, an awk input record consists of one line of text. (See How Input Is Split into Records.)
Integer: A whole number, i.e., a number that does not have a fractional part.
Internationalization: The process of writing or modifying a program so that it can use multiple languages without requiring further source code changes.
Interpreter: A program that reads human-readable source code directly, and uses the instructions in it to process data and produce results. awk is typically (but not always) implemented as an interpreter. See also "Compiler."
Interval Expression: A component of a regular expression that lets you specify repeated matches of some part of the regexp. Interval expressions were not traditionally available in awk programs.
ISO: The International Organization for Standardization. This organization produces international standards for many things, including programming languages, such as C and C++. In the computer arena, important standards like those for C, C++, and POSIX become both American national and ISO international standards simultaneously. This Web page refers to Standard C as "ISO C" throughout.
Keyword: In the awk language, a keyword is a word that has special meaning. Keywords are reserved and may not be used as variable names.
gawk's keywords are: BEGIN, END, if, else, while, do...while, for, for...in, break, continue, delete, next, nextfile, function, func, and exit.
Lesser General Public License: This document describes the terms under which binary library archives or shared objects, and their source code may be distributed.
Linux: See "GNU/Linux."
LGPL: See "Lesser General Public License."
Localization: The process of providing the data necessary for an internationalized program to work in a particular language.
Logical Expression: An expression using the operators for logic, AND, OR, and NOT, written &&, ||, and ! in awk. Often called Boolean expressions, after the mathematician who pioneered this kind of mathematical logic.
Lvalue: An expression that can appear on the left side of an assignment operator. In most languages, lvalues can be variables or array elements. In awk, a field designator can also be used as an lvalue.
Matching: The act of testing a string against a regular expression. If the regexp describes the contents of the string, it is said to match it.
Metacharacters: Characters used within a regexp that do not stand for themselves. Instead, they denote regular expression operations, such as repetition, grouping, or alternation.
Null String: A string with no characters in it. It is represented explicitly in awk programs by placing two double quote characters next to each other (""). It can appear in input data by having two successive occurrences of the field separator appear next to each other.
Number: A numeric-valued data object. Modern awk implementations use double-precision floating-point to represent numbers. Very old awk implementations use single-precision floating-point.
Octal: Base-eight notation, where the digits are 0-7. Octal numbers are written in C using a leading 0, to indicate their base. Thus, 013 is 11 (one times 8 plus 3).
P1003.2: See "POSIX."
Pattern: Patterns tell awk which input records are interesting to which rules.
A pattern is an arbitrary conditional expression against which input is tested. If the condition is satisfied, the pattern is said to match the input record. A typical pattern might compare the input record against a regular expression. (See Pattern Elements.)
POSIX: The name for a series of standards that specify a Portable Operating System interface. The "IX" denotes the Unix heritage of these standards. The main standard of interest for awk users is IEEE Standard for Information Technology, Standard 1003.2-1992, Portable Operating System Interface (POSIX) Part 2: Shell and Utilities. Informally, this standard is often referred to as simply "P1003.2."
Precedence: The order in which operations are performed when operators are used without explicit parentheses.
Private: Variables and/or functions that are meant for use exclusively by library functions and not for the main awk program. Special care must be taken when naming such variables and functions. (See Naming Library Function Global Variables.)
Range (of input lines): A sequence of consecutive lines from the input file(s). A pattern can specify ranges of input lines for awk to process or it can specify single lines. (See Pattern Elements.)
Recursion: When a function calls itself, either directly or indirectly. If this isn't clear, refer to the entry for "recursion."
Redirection: Redirection means performing input from something other than the standard input stream, or performing output to something other than the standard output stream.
You can redirect the output of the print and printf statements to a file or a system command, using the >, >>, |, and |& operators. You can redirect input to the getline statement using the <, |, and |& operators. (See Redirecting Output of print and printf, and Explicit Input with getline.)
Regexp: Short for regular expression. A regexp is a pattern that denotes a set of strings, possibly an infinite set. For example, the regexp R.*xp matches any string starting with the letter R and ending with the letters xp. In awk, regexps are used in patterns and in conditional expressions. Regexps may contain escape sequences. (See Regular Expressions.)
Regular Expression: See "regexp."
Regular Expression Constant: A regular expression constant is a regular expression written within slashes, such as /foo/. This regular expression is chosen when you write the awk program and cannot be changed during its execution. (See How to Use Regular Expressions.)
Rule: A segment of an awk program that specifies how to process single input records. A rule consists of a pattern and an action. awk reads an input record; then, for each rule, if the input record satisfies the rule's pattern, awk executes the rule's action. Otherwise, the rule does nothing for that input record.
Rvalue: A value that can appear on the right side of an assignment operator. In awk, essentially every expression has a value. These values are rvalues.
Scalar: A single value, be it a number or a string. Regular variables are scalars; arrays and functions are not.
Search Path: In gawk, a list of directories to search for awk program source files. In the shell, a list of directories to search for executable programs.
Seed: The initial value, or starting point, for a sequence of random numbers.
sed: See "Stream Editor."
Shell: The command interpreter for Unix and POSIX-compliant systems. The shell works both interactively, and as a programming language for batch files, or shell scripts.
Short-Circuit: The nature of the awk logical operators && and ||. If the value of the entire expression is determinable from evaluating just the lefthand side of these operators, the righthand side is not evaluated. (See Boolean Expressions.)
Side Effect: A side effect occurs when an expression has an effect aside from merely producing a value. Assignment expressions, increment and decrement expressions, and function calls have side effects. (See Assignment Expressions.)
Single-Precision: An internal representation of numbers that can have fractional parts. Single-precision numbers keep track of fewer digits than do double-precision numbers, but operations on them are sometimes less expensive in terms of CPU time. This is the type used by some very old versions of awk to store numeric values. It is the C type float.
Space: The character generated by hitting the space bar on the keyboard.
Special File: A file name interpreted internally by gawk, instead of being handed directly to the underlying operating system--for example, /dev/stderr. (See Special File Names in gawk.)
Stream Editor: A program that reads records from an input stream and processes them one or more at a time. This is in contrast with batch programs, which may expect to read their input files in entirety before starting to do anything, as well as with interactive programs which require input from the user.
String: A datum consisting of a sequence of characters, such as I am a string. Constant strings are written with double quotes in the awk language and may contain escape sequences. (See Escape Sequences.)
Tab: The character generated by hitting the TAB key on the keyboard. It usually expands to up to eight spaces upon output.
Text Domain: A unique name that identifies an application. Used for grouping messages that are translated at runtime into the local language.
Timestamp: A value in the "seconds since the epoch" format used by Unix and POSIX systems. Used for the gawk functions mktime, strftime, and systime. See also "Epoch" and "UTC."
Unix: A computer operating system originally developed in the early 1970's at AT&T Bell Laboratories. It initially became popular in universities around the world and later moved into commercial environments as a software development system and network server system. There are many commercial versions of Unix, as well as several work-alike systems whose source code is freely available (such as GNU/Linux, NetBSD, FreeBSD, and OpenBSD).
UTC: The accepted abbreviation for "Coordinated Universal Time." This is standard time in Greenwich, England, which is used as a reference time for day and date calculations. See also "Epoch" and "GMT."
Whitespace: A sequence of space, TAB, or newline characters occurring inside an input record or a string.
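
Several of the terms defined above (associative array, concatenation, conditional expression, and field) can be seen working together in one short program. This is only an illustrative sketch; the function name count_words, the variable names, and the sample input are assumptions made for the example:

```shell
# count_words is a hypothetical helper that tallies whitespace-separated
# fields across all input records and reports each distinct word.
count_words() {
    awk '
        {
            for (i = 1; i <= NF; i++)
                count[$i]++        # associative array indexed by strings
        }
        END {
            for (word in count) {
                # conditional expression selects the singular or plural label
                label = (count[word] == 1) ? "time" : "times"
                # concatenation joins the pieces into one output string
                print word " seen " count[word] " " label
            }
        }
    ' | sort                       # for (word in count) has no defined order
}

printf '%s\n' 'a b a' 'c a' | count_words
```

For this input the program reports that a was seen 3 times and that b and c were each seen 1 time. The output is piped through sort because the order in which awk visits array indices is not specified.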

GNU General Public License

Version 2, June 1991

Copyright © 1989, 1991 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.

For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.

Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.

Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.

The precise terms and conditions for copying, distribution and modification follow.

Terms and Conditions for Copying, Distribution and Modification

This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does.
You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.
You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.
You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:
a. You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.
b. You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.
c. If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program.
In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.
You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:
a. Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
b. Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
c. Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code.
You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it.
Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License.
If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances.
It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.
This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.
If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.
The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation.
If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.
NO WARRANTY
BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS

How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.

one line to give the program's name and an idea of what it does.
Copyright (C) year  name of author

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111, USA.

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this when it starts in an interactive mode:

Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details
type `show w'.  This is free software, and you are welcome
to redistribute it under certain conditions; type `show c'
for details.

The hypothetical commands show w and show c should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than show w and show c; they could even be mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright
interest in the program `Gnomovision'
(which makes passes at compilers) written
by James Hacker.

signature of Ty Coon, 1 April 1989
Ty Coon, President of Vice

This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License.

GNU Free Documentation License

Version 1.1, March 2000

Copyright (C) 2000  Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

0. PREAMBLE
The purpose of this License is to make a manual, textbook, or other written document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.
This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.
1. APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you".
A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (For example, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.
The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License.
The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License.
A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, whose contents can be viewed and edited directly and straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup has been designed to thwart or discourage subsequent modification by readers is not Transparent. A copy that is not "Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML designed for human modification. Opaque formats include PostScript, PDF, proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML produced by some word processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, "Title Page" means the text near the most prominent appearance of the work's title, preceding the beginning of the body of the text.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies of the Document numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a publicly-accessible computer-network location containing a complete Transparent copy of the Document, free of added material, which the general network-using public has access to download anonymously at no charge using public-standard network protocols. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
1. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
2. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has less than five).
3. State on the Title page the name of the publisher of the Modified Version, as the publisher.
4. Preserve all the copyright notices of the Document.
5. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
6. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
7. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document's license notice.
8. Include an unaltered copy of this License.
9. Preserve the section entitled "History", and its title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section entitled "History" in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
10. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the "History" section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
11. In any section entitled "Acknowledgements" or "Dedications", preserve the section's title, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
12. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
13. Delete any section entitled "Endorsements". Such a section may not be included in the Modified Version.
14. Do not retitle any existing section as "Endorsements" or to conflict in title with any Invariant Section.
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.
You may add a section entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice.
The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections entitled "History" in the various original documents, forming one section entitled "History"; likewise combine any sections entitled "Acknowledgements", and any sections entitled "Dedications". You must delete all sections entitled "Endorsements."
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.
7. AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, does not as a whole count as a Modified Version of the Document, provided no compilation copyright is claimed for the compilation. Such a compilation is called an "aggregate", and this License does not apply to the other self-contained works thus compiled with the Document, on account of their being thus compiled, if they are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one quarter of the entire aggregate, the Document's Cover Texts may be placed on covers that surround only the Document within the aggregate. Otherwise they must appear on covers around the whole aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License provided that you also include the original English version of this License. In case of a disagreement between the translation and the original English version of this License, the original English version will prevail.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
10. FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.

ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:

  Copyright (C)  year  your name.
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.1
  or any later version published by the Free Software Foundation;
  with the Invariant Sections being list their titles, with the
  Front-Cover Texts being list, and with the Back-Cover Texts being list.
  A copy of the license is included in the section entitled ``GNU
  Free Documentation License''.

If you have no Invariant Sections, write "with no Invariant Sections" instead of saying which ones are invariant. If you have no Front-Cover Texts, write "no Front-Cover Texts" instead of "Front-Cover Texts being list"; likewise for Back-Cover Texts.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.

Index

! (exclamation point), ! operator: Boolean Ops
! (exclamation point), ! operator: Egrep Program, Precedence
! (exclamation point), != operator: Precedence, Typing and Comparison
! (exclamation point), !~ operator: Expression Patterns, Precedence, Typing and Comparison, Regexp Constants, Computed Regexps, Case-sensitivity, Regexp Usage
! operator: Egrep Program, Ranges
" (double quote): Quoting, Read Terminal
" (double quote), regexp constants: Computed Regexps
# (number sign), #! (executable scripts): Executable Scripts
# (number sign), #! (executable scripts), portability issues with: Executable Scripts
# (number sign), commenting: Comments
$ (dollar sign): Regexp Operators
$ (dollar sign), $ field operator: Precedence, Fields
$ (dollar sign), incrementing fields and arrays: Increment Ops
$ field operator: Fields
% (percent sign), % operator: Precedence
% (percent sign), %= operator: Precedence, Assignment Ops
& (ampersand), && operator: Precedence, Boolean Ops
& (ampersand), gsub/gensub/sub functions and: Gory Details
' (single quote): Quoting, Long, One-shot
' (single quote), vs. apostrophe: Comments
' (single quote), with double quotes: Quoting
() (parentheses): Regexp Operators
() (parentheses), pgawk program: Profiling
* (asterisk), * operator, as multiplication operator: Precedence
* (asterisk), * operator, as regexp operator: Regexp Operators
* (asterisk), * operator, null strings, matching: Gory Details
* (asterisk), ** operator: Options, Precedence, Arithmetic Ops
* (asterisk), **= operator: Options, Precedence, Assignment Ops
* (asterisk), *= operator: Precedence, Assignment Ops
+ (plus sign): Regexp Operators
+ (plus sign), + operator: Precedence
+ (plus sign), ++ operator: Precedence, Increment Ops
+ (plus sign), += operator: Precedence, Assignment Ops
+ (plus sign), decrement/increment operators: Increment Ops
, (comma), in range patterns: Ranges
- (hyphen), - operator: Precedence
- (hyphen), -- (decrement/increment) operator: Precedence
- (hyphen), -- operator: Increment Ops
- (hyphen), -= operator: Precedence, Assignment Ops
- (hyphen), filenames beginning with: Options
- (hyphen), in character lists: Character Lists
/ (forward slash): Regexp
/ (forward slash), / operator: Precedence
/ (forward slash), /= operator: Precedence, Assignment Ops
/ (forward slash), /= operator, vs. /=.../ regexp constant: Assignment Ops
/ (forward slash), patterns and: Expression Patterns
/= operator vs. /=.../ regexp constant: Assignment Ops
/dev/... special files (gawk): Special FD
/inet/ files (gawk): TCP/IP Networking
/p files (gawk): Portal Files
; (semicolon): Statements/Lines
; (semicolon), AWKPATH variable and: PC Using
; (semicolon), separating statements in actions: Statements, Action Overview
< (left angle bracket), < operator: Precedence, Typing and Comparison
< (left angle bracket), < operator (I/O): Getline/File
< (left angle bracket), <= operator: Precedence, Typing and Comparison
= (equals sign), = operator: Assignment Ops
= (equals sign), == operator: Precedence, Typing and Comparison
> (right angle bracket), > operator: Precedence, Typing and Comparison
> (right angle bracket), > operator (I/O): Redirection
> (right angle bracket), >= operator: Precedence, Typing and Comparison
> (right angle bracket), >> operator (I/O): Precedence, Redirection
? (question mark): GNU Regexp Operators, Regexp Operators
? (question mark), ?: operator: Precedence
[] (square brackets): Regexp Operators
\ (backslash): Regexp Operators, Quoting, Comments, Read Terminal
\ (backslash), \" escape sequence: Escape Sequences
\ (backslash), \' operator (gawk): GNU Regexp Operators
\ (backslash), \/ escape sequence: Escape Sequences
\ (backslash), \< operator (gawk): GNU Regexp Operators
\ (backslash), \> operator (gawk): GNU Regexp Operators
\ (backslash), \` operator (gawk): GNU Regexp Operators
\ (backslash), \a escape sequence: Escape Sequences
\ (backslash), \b escape sequence: Escape Sequences
\ (backslash), \B operator (gawk): GNU Regexp Operators
\ (backslash), \f escape sequence: Escape Sequences
\ (backslash), \n escape sequence: Escape Sequences
\ (backslash), \nnn escape sequence: Escape Sequences
\ (backslash), \r escape sequence: Escape Sequences
\ (backslash), \t escape sequence: Escape Sequences
\ (backslash), \v escape sequence: Escape Sequences
\ (backslash), \W operator (gawk): GNU Regexp Operators
\ (backslash), \w operator (gawk): GNU Regexp Operators
\ (backslash), \x escape sequence: Escape Sequences
\ (backslash), \y operator (gawk): GNU Regexp Operators
\ (backslash), as field separators: Command Line Field Separator
\ (backslash), continuing lines and: Egrep Program, Statements/Lines
\ (backslash), continuing lines and, comments and: Statements/Lines
\ (backslash), continuing lines and, in csh: Statements/Lines, More Complex
\ (backslash), gsub/gensub/sub functions and: Gory Details
\ (backslash), in character lists: Character Lists
\ (backslash), in escape sequences: Escape Sequences
\ (backslash), in escape sequences, POSIX and: Escape Sequences
\ (backslash), regexp constants: Computed Regexps
^ (caret): GNU Regexp Operators, Regexp Operators
^ (caret), ^ operator: Options, Precedence
^ (caret), ^= operator: Options, Precedence, Assignment Ops
^ (caret), in character lists: Character Lists
_ (underscore), _ C macro: Explaining gettext
_ (underscore), in names of private variables: Library Names
_ (underscore), translatable string: Programmer i18n
_gr_init user-defined function: Group Functions
_pw_init user-defined function: Passwd Functions
Aho, Alfred: Contributors, History
alarm clock example program: Alarm Program
alarm.awk program: Alarm Program
algorithms: Basic High Level
Alpha (DEC): Manual History
amazing awk assembler (aaa): Glossary
amazingly workable formatter (awf): Glossary
ambiguity, syntactic: /= operator vs. /=.../ regexp constant: Assignment Ops
Amiga: Amiga Installation
ampersand (&), && operator: Boolean Ops
ampersand (&), && operator: Precedence
ampersand (&), gsub/gensub/sub functions and: Gory Details
asterisk (*), * operator, as multiplication operator: Precedence
asterisk (*), * operator, as regexp operator: Regexp Operators
asterisk (*), * operator, null strings, matching: Gory Details
asterisk (*), ** operator: Options, Precedence, Arithmetic Ops
asterisk (*), **= operator: Options, Precedence, Assignment Ops
asterisk (*), *= operator: Precedence, Assignment Ops
backslash (\): Regexp Operators, Quoting, Comments, Read Terminal
backslash (\), \" escape sequence: Escape Sequences
backslash (\), \' operator (gawk): GNU Regexp Operators
backslash (\), \/ escape sequence: Escape Sequences
backslash (\), \< operator (gawk): GNU Regexp Operators
backslash (\), \> operator (gawk): GNU Regexp Operators
backslash (\), \` operator (gawk): GNU Regexp Operators
backslash (\), \a escape sequence: Escape Sequences
backslash (\), \b escape sequence: Escape Sequences
backslash (\), \B operator (gawk): GNU Regexp Operators
backslash (\), \f escape sequence: Escape Sequences
backslash (\), \n escape sequence: Escape Sequences
backslash (\), \nnn escape sequence: Escape Sequences
backslash (\), \r escape sequence: Escape Sequences
backslash (\), \t escape sequence: Escape Sequences
backslash (\), \v escape sequence: Escape Sequences
backslash (\), \W operator (gawk): GNU Regexp Operators
backslash (\), \w operator (gawk): GNU Regexp Operators
backslash (\), \x escape sequence: Escape Sequences
backslash (\), \y operator (gawk): GNU Regexp Operators
backslash (\), as field separators: Command Line Field Separator
backslash (\), continuing lines and: Egrep Program, Statements/Lines
backslash (\), continuing lines and, comments and: Statements/Lines
backslash (\), continuing lines and, in csh: Statements/Lines, More Complex
backslash (\), gsub/gensub/sub functions and: Gory Details
backslash (\), in character lists: Character Lists
backslash (\), in escape sequences: Escape Sequences
backslash (\), in escape sequences, POSIX and: Escape Sequences
backslash (\), regexp constants: Computed Regexps
BBS-list file: Sample Data Files
Beebe, Nelson: Acknowledgments
Bell Laboratories awk extensions: BTL
BeOS: BeOS Installation
Berry, Karl: Acknowledgments
binary input/output: User-modified
bindtextdomain function (C library): Explaining gettext
bindtextdomain function (gawk): Programmer i18n, I18N Functions
bindtextdomain function (gawk), portability and: I18N Portability
BINMODE variable: PC Using, User-modified
bits2str user-defined function: Bitwise Functions
bitwise, complement: Bitwise Functions
bitwise, operations: Bitwise Functions
bitwise, shift: Bitwise Functions
body, in actions: Statements
body, in loops: While Statement
Boolean expressions: Boolean Ops
Boolean expressions, as patterns: Expression Patterns
Boolean operators, See Boolean expressions: Boolean Ops
Bourne shell, quoting rules for: Quoting
braces ({}), actions and: Action Overview
braces ({}), pgawk program: Profiling
braces ({}), statements, grouping: Statements
bracket expressions, See character lists: Regexp Operators
break statement: Break Statement
Brennan, Michael: Other Versions, Simple Sed, Two-way I/O, Delete
Broder, Alan J.: Contributors
Brown, Martin: Bugs, Contributors, Acknowledgments
BSD portals: Portal Files
caret (^): GNU Regexp Operators, Regexp Operators
caret (^), ^ operator: Options, Precedence
caret (^), ^= operator: Options, Precedence, Assignment Ops
caret (^), in character lists: Character Lists
case sensitivity, array indices and: Array Intro
case sensitivity, converting case: String Functions
case sensitivity, example programs: Library Functions
case sensitivity, gawk: Case-sensitivity
case sensitivity, regexps and: User-modified, Case-sensitivity
case sensitivity, string comparisons and: User-modified
character encodings: Ordinal Functions
character lists: Character Lists, Regexp Operators
character lists, character classes: Character Lists
character lists, collating elements: Character Lists
character lists, collating symbols: Character Lists
character lists, complemented: Regexp Operators
character lists, equivalence classes: Character Lists
character lists, non-ASCII: Character Lists
character lists, range expressions: Character Lists
character sets: Ordinal Functions
character sets (machine character encodings): Glossary
character sets, See Also character lists: Regexp Operators
characters, counting: Wc Program
characters, transliterating: Translate Program
characters, values of as numbers: Ordinal Functions
Chassell, Robert J.: Acknowledgments
chdir function, implementing in gawk: Sample Library
chem utility: Glossary
chr user-defined function: Ordinal Functions
Cliff random numbers: Cliff Random Function
cliff_rand user-defined function: Cliff Random Function
close function: I/O Functions, Close Files And Pipes, Getline/Pipe, Getline/Variable/File
close function, return values: Close Files And Pipes
close function, two-way pipes and: Two-way I/O
Close, Diane: Contributors, Manual History
compl function (gawk): Bitwise Functions
complement, bitwise: Bitwise Functions
compound statements, control statements and: Statements
concatenating: Concatenation
conditional expressions: Conditional Exp
configuration option, --disable-nls: Additional Configuration Options
configuration option, --enable-portals: Additional Configuration Options
configuration option, --with-included-gettext: Additional Configuration Options, Gawk I18N
configuration options, gawk: Additional Configuration Options
constants, nondecimal: Nondecimal Data
constants, types of: Constants
Deifik, Scott: Bugs, Contributors, Acknowledgments
division: Arithmetic Ops
do-while statement: Do Statement, Regexp Usage
documentation, of awk programs: Library Names
documentation, online: Manual History
documents, searching: Dupword Program
dollar sign ($): Regexp Operators
dollar sign ($), $ field operator: Precedence, Fields
dollar sign ($), incrementing fields and arrays: Increment Ops
double quote ("): Quoting, Read Terminal
double quote ("), regexp constants: Computed Regexps
double-precision floating-point: Basic Data Typing
Drepper, Ulrich: Acknowledgments
email address for bug reports, bug-gawk@gnu.org: Bugs
EMISTERED: TCP/IP Networking
empty pattern: Empty
empty strings, See null strings: Regexp Field Splitting
epoch, definition of: Glossary
equals sign (=), = operator: Assignment Ops
equals sign (=), == operator: Precedence, Typing and Comparison
EREs (Extended Regular Expressions): Character Lists
ERRNO variable: Internals, Auto-set, Getline
error handling: Special FD
error handling, ERRNO variable and: Auto-set
error output: Special FD
escape processing, gsub/gensub/sub functions: Gory Details
escape sequences: Escape Sequences
escape sequences, unrecognized: Options
evaluation order: Increment Ops
evaluation order, concatenation: Concatenation
evaluation order, functions: Calling Built-in
examining fields: Fields
exclamation point (!), ! operator: Egrep Program, Precedence, Boolean Ops
exclamation point (!), != operator: Precedence, Typing and Comparison
exclamation point (!), !~ operator: Expression Patterns, Precedence, Typing and Comparison, Regexp Constants, Computed Regexps, Case-sensitivity, Regexp Usage
exit statement: Exit Statement
exp function: Numeric Functions
expand utility: Very Simple
expressions: Expressions
expressions, as patterns: Expression Patterns
expressions, assignment: Assignment Ops
expressions, Boolean: Boolean Ops
expressions, comparison: Typing and Comparison
expressions, conditional: Conditional Exp
expressions, matching, See comparison expressions: Typing and Comparison
expressions, selecting: Conditional Exp
Extended Regular Expressions (EREs): Character Lists
extension function (gawk): Using Internal File Ops
extensions, Bell Laboratories awk: BTL
extensions, in gawk, not in POSIX awk: POSIX/GNU
extensions, mawk: Other Versions
extract.awk program: Extract Program
extraction, of marked strings (internationalization): String Extraction
false, logical: Truth Values
FDL (Free Documentation License): GNU Free Documentation License
features, adding to gawk: Adding Code
features, advanced, See advanced features: Obsolete
features, deprecated: Obsolete
features, undocumented: Undocumented
Fenlason, Jay: Contributors, History
fflush function: I/O Functions
fflush function, unsupported: Options
Fish, Fred: Bugs, Contributors
fixed-width data: Constant Size
flag variables: Tee Program, Boolean Ops
floating-point: Floating Point Issues
floating-point, numbers: Basic Data Typing
floating-point, numbers, AWKNUM internal type: Internals
FNR variable: Auto-set, Records
FNR variable, changing: Auto-set
for statement: For Statement
for statement, in arrays: Scanning an Array
force_number internal function: Internals
force_string internal function: Internals
format specifiers, mixing regular with positional specifiers: Printf Ordering
format specifiers, printf statement: Control Letters
format specifiers, strftime function (gawk): Time Functions
format strings: Basic Printf
formats, numeric output: OFMT
formatting output: Printf
forward slash (/): Regexp
forward slash (/), / operator: Precedence
forward slash (/), /= operator: Precedence, Assignment Ops
forward slash (/), /= operator, vs. /=.../ regexp constant: Assignment Ops
forward slash (/), patterns and: Expression Patterns
Free Documentation License (FDL): GNU Free Documentation License
Free Software Foundation (FSF): Glossary, Getting, Manual History
free_temp internal macro: Internals
FreeBSD: Glossary
FS variable: User-modified, Field Separators
FS variable, --field-separator option and: Options
FS variable, as null string: Single Character Fields
FS variable, as TAB character: Options
FS variable, changing value of: Known Bugs, Field Separators
FS variable, running awk programs and: Cut Program
FS variable, setting from command line: Command Line Field Separator
FSF (Free Software Foundation): Glossary, Getting, Manual History
G-d: Acknowledgments
Garfinkle, Scott: Contributors
General Public License (GPL): Glossary
General Public License, See GPL: Manual History
GNITS mailing list: Acknowledgments
GNU awk, See gawk: Preface
GNU Free Documentation License: GNU Free Documentation License
GNU General Public License: Glossary
GNU Lesser General Public License: Glossary
GNU long options: Options, Command Line
GNU long options, printing list of: Options
GNU Project: Glossary, Manual History
GNU/Linux: Glossary, Atari Compiling, Additional Configuration Options, I18N Example, Manual History
GPL (General Public License): Glossary, Manual History
GPL (General Public License), printing: Options
grcat program: Group Functions
Grigera, Juan: Bugs, Contributors
group database, reading: Group Functions
group file: Group Functions
groups, information about: Group Functions
Hankerson, Darrel: Bugs, Contributors, Acknowledgments
Hartholz, Elaine: Acknowledgments
Hartholz, Marshall: Acknowledgments
Hasegawa, Isamu: Contributors, Acknowledgments
hexadecimal numbers: Nondecimal-numbers
hexadecimal, values, enabling interpretation of: Options
histsort.awk program: History Sorting
Hughes, Phil: Acknowledgments
HUP signal: Profiling
hyphen (-), - operator: Precedence
hyphen (-), -- (decrement/increment) operators: Precedence
hyphen (-), -- operator: Increment Ops
hyphen (-), -= operator: Precedence, Assignment Ops
hyphen (-), filenames beginning with: Options
hyphen (-), in character lists: Character Lists
id utility: Id Program
id.awk program: Id Program
if statement: If Statement, Regexp Usage
if statement, actions, changing: Ranges
igawk.sh program: Igawk Program
IGNORECASE variable: User-modified, Case-sensitivity
IGNORECASE variable, array sorting and: Array Sorting
IGNORECASE variable, array subscripts and: Array Intro
IGNORECASE variable, in example programs: Library Functions
implementation issues, gawk: Notes
implementation issues, gawk, debugging: Compatibility Mode
implementation issues, gawk, limits: Redirection, Getline Notes
in operator: Id Program, Precedence, Typing and Comparison
in operator, arrays and: Scanning an Array, Reference to Elements
increment operators: Increment Ops
index function: String Functions
indexing arrays: Array Intro
initialization, automatic: More Complex
int function: Numeric Functions
INT signal (MS-DOS): Profiling
integers: Basic Data Typing
integers, unsigned: Basic Data Typing
interval expressions: Regexp Operators
inventory-shipped file: Sample Data Files
ISO: Glossary
ISO 8859-1: Glossary
ISO Latin-1: Glossary
Jacobs, Andrew: Passwd Functions
Jaegermann, Michal: Contributors, Acknowledgments
Jedi knights: Undocumented
join user-defined function: Join Function
Kahrs, Jürgen: Contributors, Acknowledgments
Kenobi, Obi-Wan: Undocumented
Kernighan, Brian: Basic Data Typing, Other Versions, Contributors, BTL, Concatenation, Acknowledgments, Conventions, History
kill command, dynamic profiling: Profiling
Knights, jedi: Undocumented
Kwok, Conrad: Contributors
labels.awk program: Labels Program
languages, data-driven: Basic High Level
left angle bracket (<), < operator: Precedence, Typing and Comparison
left angle bracket (<), < operator (I/O): Getline/File
left angle bracket (<), <= operator: Precedence, Typing and Comparison
left shift, bitwise: Bitwise Functions
leftmost longest match: Multiple Line
length function: String Functions
Lesser General Public License (LGPL): Glossary
LGPL (Lesser General Public License): Glossary
Linux: Glossary, Atari Compiling, Additional Configuration Options, I18N Example, Manual History
locale categories: Explaining gettext
localization: I18N and L10N
localization, See internationalization, localization: I18N and L10N
log files, timestamps in: Time Functions
Lost In Space: Dynamic Extensions
ls utility: More Complex
lshift function (gawk): Bitwise Functions
lvalues/rvalues: Assignment Ops
mailing labels, printing: Labels Program
mailing list, GNITS: Acknowledgments
make_builtin internal function: Internals
make_number internal function: Internals
make_string internal function: Internals
mark parity: Ordinal Functions
marked string extraction (internationalization): String Extraction
marked strings, extracting: String Extraction
Marx, Groucho: Increment Ops
match function: String Functions
match function, RSTART/RLENGTH variables: String Functions
matching, expressions, See comparison expressions: Typing and Comparison
matching, leftmost longest: Multiple Line
matching, null strings: Gory Details
mawk program: Other Versions
memory, releasing: Internals
memory, setting limits: Options
message object files: Explaining gettext
message object files, converting from portable object files: I18N Example
message object files, specifying directory of: Programmer i18n, Explaining gettext
metacharacters, escape sequences for: Escape Sequences
mktime function (gawk): Time Functions
modifiers, in format specifiers: Format Modifiers
monetary information, localization: Explaining gettext
msgfmt utility: I18N Example
names, arrays/variables: Library Names, Arrays
names, functions: Library Names, Definition Syntax
namespace issues: Library Names, Arrays
namespace issues, functions: Definition Syntax
nawk utility: Names
negative zero: Floating Point Issues
not Boolean-logic operator: Boolean Ops
NR variable: Auto-set, Records
NR variable, changing: Auto-set
null strings: Basic Data Typing, Truth Values, Regexp Field Splitting, Records
null strings, array elements and: Delete
null strings, as array subscripts: Uninitialized Subscripts
null strings, converting numbers to strings: Conversion
null strings, matching: Gory Details
null strings, quoting and: Quoting
number sign (#), #! (executable scripts): Executable Scripts
number sign (#), #! (executable scripts), portability issues with: Executable Scripts
number sign (#), commenting: Comments
numbers: Internals
numbers, as array subscripts: Numeric Array Subscripts
numbers, as values of characters: Ordinal Functions
numbers, Cliff random: Cliff Random Function
numbers, converting: Conversion
numbers, converting, to strings: Bitwise Functions, User-modified
numbers, floating-point: Basic Data Typing
numbers, floating-point, AWKNUM internal type: Internals
numbers, hexadecimal: Nondecimal-numbers
numbers, NODE internal type: Internals
numbers, octal: Nondecimal-numbers
numbers, random: Numeric Functions
numbers, rounding: Round Function
numeric, constants: Scalar Constants
numeric, output format: OFMT
numeric, strings: Typing and Comparison
numeric, values: Internals
oawk utility: Names
obsolete features: Obsolete
octal numbers: Nondecimal-numbers
octal values, enabling interpretation of: Options
OFMT variable: User-modified, Conversion, OFMT
OFMT variable, POSIX awk and: OFMT
OFS variable: User-modified, Output Separators, Changing Fields
OpenBSD: Glossary
operating systems, BSD-based: Portal Files, Manual History
operating systems, PC, gawk on: PC Using
operating systems, PC, gawk on, installing: PC Installation
operating systems, porting gawk to: New Ports
operating systems, See Also GNU/Linux, PC operating systems, Unix: Installation
operations, bitwise: Bitwise Functions
operators, arithmetic: Arithmetic Ops
operators, assignment: Assignment Ops
operators, assignment, evaluation order: Assignment Ops
operators, Boolean, See Boolean expressions: Boolean Ops
operators, decrement/increment: Increment Ops
operators, GNU-specific: GNU Regexp Operators
operators, input/output: Precedence, Redirection, Getline/Coprocess, Getline/Pipe, Getline/File
operators, logical, See Boolean expressions: Boolean Ops
operators, precedence: Precedence, Increment Ops
operators, relational, See operators, comparison: Typing and Comparison
operators, short-circuit: Boolean Ops
operators, string: Concatenation
operators, string-matching: Regexp Usage
operators, string-matching, for buffers: GNU Regexp Operators
operators, word-boundary (gawk): GNU Regexp Operators
param_cnt internal variable: Internals
parameters, number of: Internals
parentheses (): Regexp Operators
parentheses (), pgawk program: Profiling
password file: Passwd Functions
patterns: Patterns and Actions
patterns, comparison expressions as: Expression Patterns
patterns, counts: Profiling
patterns, default: Very Simple
patterns, empty: Empty
patterns, expressions as: Regexp Patterns
patterns, ranges in: Ranges
patterns, regexp constants as: Expression Patterns
patterns, types of: Pattern Overview
PC operating systems, gawk on: PC Using
PC operating systems, gawk on, installing: PC Installation
percent sign (%), % operator: Precedence
percent sign (%), %= operator: Precedence, Assignment Ops
period (.): Regexp Operators
PERL: Future Extensions
Peters, Arno: Contributors
Peterson, Hal: Contributors
pgawk program: Profiling
pgawk program, awkprof.out file: Profiling
pgawk program, dynamic profiling: Profiling
pipes, closing: Close Files And Pipes
pipes, input: Getline/Pipe
pipes, output: Redirection
plus sign (+): Regexp Operators
plus sign (+), + operator: Precedence
plus sign (+), ++ operator: Precedence, Increment Ops
plus sign (+), += operator: Precedence, Assignment Ops
plus sign (+), decrement/increment operators: Increment Ops
positional specifiers, printf statement: Printf Ordering, Format Modifiers
positional specifiers, printf statement, mixing with regular formats: Printf Ordering
positive zero: Floating Point Issues
POSIX, programs, implementing in awk: Clones
POSIXLY_CORRECT environment variable: Options
precedence: Precedence, Increment Ops
precedence, regexp operators: Regexp Operators
private variables: Library Names
process information, files for: Special Process
processes, two-way communications with: Two-way I/O
processing data: Basic High Level
PROCINFO array: Group Functions, Passwd Functions, Auto-set, Special Caveats
question mark (?): GNU Regexp Operators, Regexp Operators
question mark (?), ?: operator: Precedence
QUIT signal (MS-DOS): Profiling
quoting: Comments, Long, Read Terminal
quoting, rules for: Quoting
quoting, tricks for: Quoting
Rakitzis, Byron: History Sorting
rand function: Numeric Functions
random numbers, Cliff: Cliff Random Function
random numbers, rand/srand functions: Numeric Functions
random numbers, seed of: Numeric Functions
range expressions: Character Lists
range patterns: Ranges
Rankin, Pat: Bugs, Contributors, Assignment Ops, Acknowledgments
regexp constants: Typing and Comparison, Regexp Constants, Regexp Usage
regexp constants, /=.../, /= operator and: Assignment Ops
regexp constants, as patterns: Expression Patterns
regexp constants, in gawk: Using Constant Regexps
regexp constants, slashes vs. quotes: Computed Regexps
regexp constants, vs. string constants: Computed Regexps
regexp, See regular expressions: Regexp
regular expressions: Regexp
regular expressions, as field separators: Field Separators
regular expressions, anchors in: Regexp Operators
regular expressions, as field separators: Regexp Field Splitting
regular expressions, as patterns: Regexp Patterns, Regexp Usage
regular expressions, as record separators: Records
regular expressions, case sensitivity: User-modified, Case-sensitivity
regular expressions, computed: Computed Regexps
regular expressions, constants, See regexp constants: Regexp Usage
regular expressions, dynamic: Computed Regexps
regular expressions, dynamic, with embedded newlines: Computed Regexps
regular expressions, gawk, command-line options: GNU Regexp Operators
regular expressions, interval expressions and: Options
regular expressions, leftmost longest match: Leftmost Longest
regular expressions, operators: Regexp Operators, Regexp Usage
regular expressions, operators, for buffers: GNU Regexp Operators
regular expressions, operators, for words: GNU Regexp Operators
regular expressions, operators, gawk: GNU Regexp Operators
regular expressions, operators, precedence of: Regexp Operators
regular expressions, searching for: Egrep Program
relational operators, See comparison operators: Typing and Comparison
return statement, user-defined functions: Return Statement
return values, close function: Close Files And Pipes
rev user-defined function: Function Example
rewind user-defined function: Rewind Function
right angle bracket (>), > operator: Precedence, Typing and Comparison
right angle bracket (>), > operator (I/O): Redirection
right angle bracket (>), >= operator: Precedence, Typing and Comparison
right angle bracket (>), >> operator (I/O): Precedence, Redirection
right shift, bitwise: Bitwise Functions
Ritchie, Dennis: Basic Data Typing
RLENGTH variable: Auto-set
RLENGTH variable, match function and: String Functions
Robbins, Arnold: Future Extensions, Bugs, Contributors, Alarm Program, Passwd Functions, Getline/Pipe, Command Line Field Separator
Robbins, Bill: Getline/Pipe
Robbins, Harry: Acknowledgments
Robbins, Jean: Acknowledgments
Robbins, Miriam: Passwd Functions, Getline/Pipe, Acknowledgments
Robinson, Will: Dynamic Extensions
robot, the: Dynamic Extensions
Rommel, Kai Uwe: Contributors, Acknowledgments
round user-defined function: Round Function
rounding: Round Function
rounding numbers: Round Function
RS variable: User-modified, Records
RS variable, multiline records and: Multiple Line
rshift function (gawk): Bitwise Functions
RSTART variable: Auto-set
RSTART variable, match function and: String Functions
RT variable: Auto-set, Multiple Line, Records
Rubin, Paul: Contributors, History
rule, definition of: Getting Started
rvalues/lvalues: Assignment Ops
scalar values: Basic Data Typing
Schreiber, Bert: Acknowledgments
Schreiber, Rita: Acknowledgments
search paths: VMS Running, PC Using
search paths, for source files: VMS Running, Igawk Program, AWKPATH Variable
searching: String Functions
searching, files for regular expressions: Egrep Program
searching, for words: Dupword Program
sed utility: Glossary, Igawk Program, Simple Sed, Field Splitting Summary
semicolon (;): Statements/Lines
semicolon (;), AWKPATH variable and: PC Using
semicolon (;), separating statements in actions: Statements, Action Overview
separators, field: User-modified
separators, field, FIELDWIDTHS variable and: User-modified
separators, field, POSIX and: Fields
separators, for records: Records
separators, for records, regular expressions as: Records
separators, for statements in actions: Action Overview
separators, record: User-modified
separators, subscript: User-modified
set_value internal function: Internals
shells, piping commands into: Redirection
shells, quoting: Using Shell Variables
shells, quoting, rules for: Quoting
shells, scripts: One-shot
shells, variables: Using Shell Variables
shift, bitwise: Bitwise Functions
sin function: Numeric Functions
single quote ('): Quoting, Long, One-shot
single quote ('), vs. apostrophe: Comments
single quote ('), with double quotes: Quoting
single-character fields: Single Character Fields
single-precision floating-point: Basic Data Typing
Skywalker, Luke: Undocumented
split function: String Functions
split function, array elements, deleting: Delete
split utility: Split Program
split.awk program: Split Program
sprintf function: String Functions, OFMT
sprintf function, OFMT variable and: User-modified
sprintf function, print/printf statements and: Round Function
sqrt function: Numeric Functions
square brackets ([]): Regexp Operators
srand function: Numeric Functions
Stallman, Richard: Glossary, Contributors, Acknowledgments, Manual History
standard input: Special FD, Read Terminal
standard output: Special FD
stat function, implementing in gawk: Sample Library
statements, compound, control statements and: Statements
statements, control, in actions: Statements
statements, multiple: Statements/Lines
stlen internal variable: Internals
stptr internal variable: Internals
stream editors: Igawk Program, Simple Sed, Field Splitting Summary
strftime function (gawk): Time Functions
string constants: Scalar Constants
string constants, vs. regexp constants: Computed Regexps
string extraction (internationalization): String Extraction
string operators: Concatenation
string-matching operators: Regexp Usage
strings: Internals
strings, converting: Conversion
strings, converting, numbers to: Bitwise Functions, User-modified
strings, empty, See null strings: Records
strings, extracting: String Extraction
strings, for localization: Programmer i18n
strings, length of: Scalar Constants
strings, merging arrays into: Join Function
strings, NODE internal type: Internals
strings, null: Regexp Field Splitting
strings, numeric: Typing and Comparison
strings, splitting: String Functions
troubleshooting, --non-decimal-data option: Options
troubleshooting, -F option: Known Bugs
troubleshooting, == operator: Typing and Comparison
troubleshooting, awk uses FS not IFS: Field Separators
troubleshooting, backslash before nonspecial character: Escape Sequences
troubleshooting, division: Arithmetic Ops
troubleshooting, fatal errors, field widths, specifying: Constant Size
troubleshooting, fatal errors, printf format strings: Format Modifiers
troubleshooting, fflush function: I/O Functions
troubleshooting, function call syntax: Function Calls
troubleshooting, gawk: Compatibility Mode, Known Bugs
troubleshooting, gawk, bug reports: Bugs
troubleshooting, gawk, fatal errors, function arguments: Calling Built-in
troubleshooting, getline function: File Checking
troubleshooting, gsub/sub functions: String Functions
troubleshooting, match function: String Functions
troubleshooting, print statement, omitting commas: Print Examples
troubleshooting, printing: Redirection
troubleshooting, quotes with file names: Special FD
troubleshooting, readable data files: File Checking
troubleshooting, regexp constants vs. string constants: Computed Regexps
troubleshooting, string concatenation: Concatenation
troubleshooting, substr function: String Functions
troubleshooting, system function: I/O Functions
troubleshooting, typographical errors, global variables: Options
true, logical: Truth Values
Trueman, David: Contributors, Acknowledgments, History
trunc-mod operation: Arithmetic Ops
truth values: Truth Values
type conversion: Conversion
type internal variable: Internals
undefined functions: Function Caveats
underscore (_), _ C macro: Explaining gettext
underscore (_), in names of private variables: Library Names
underscore (_), translatable string: Programmer i18n
undocumented features: Undocumented
uninitialized variables, as array subscripts: Uninitialized Subscripts
uniq utility: Uniq Program
uniq.awk program: Uniq Program
Unix: Glossary
Unix awk, backslashes in escape sequences: Escape Sequences
Unix awk, close function and: Close Files And Pipes
Unix awk, password files, field separators and: Command Line Field Separator
Unix, awk scripts and: Executable Scripts
unsigned integers: Basic Data Typing
update_ERRNO internal function: Internals
user database, reading: Passwd Functions
user-defined, functions: User-defined
user-defined, functions, counts: Profiling
user-defined, variables: Variables
user-modifiable variables: User-modified
users, information about, printing: Id Program
users, information about, retrieving: Passwd Functions
USR1 signal: Profiling
values, numeric: Basic Data Typing
values, string: Basic Data Typing
variable typing: Typing and Comparison
variables: Basic Data Typing, Other Features
variables, assigning on command line: Assignment Options
variables, built-in: Built-in Variables, Using Variables
variables, built-in, -v option, setting with: Options
variables, built-in, conveying information: Auto-set
variables, flag: Boolean Ops
variables, getline command into, using: Getline/Variable/Coprocess, Getline/Variable/Pipe, Getline/Variable/File, Getline/Variable
variables, global, for library functions: Library Names
variables, global, printing list of: Options
variables, initializing: Using Variables
variables, names of: Arrays
variables, private: Library Names
variables, setting: Options
variables, shadowing: Definition Syntax
variables, types of: Assignment Ops
variables, types of, comparison expressions and: Typing and Comparison
variables, uninitialized, as array subscripts: Uninitialized Subscripts
variables, user-defined: Variables
vertical bar (|): Regexp Operators
vertical bar (|), | operator (I/O): Precedence, Getline/Pipe
vertical bar (|), |& I/O operator (I/O): Two-way I/O
vertical bar (|), |& operator (I/O): Precedence, Getline/Coprocess
vertical bar (|), |& operator (I/O), two-way communications: Portal Files
vertical bar (|), || operator: Precedence, Boolean Ops
vname internal variable: Internals
w utility: Constant Size
Wall, Larry: Future Extensions
warnings, issuing: Options
wc utility: Wc Program
wc.awk program: Wc Program
Weinberger, Peter: Contributors, History
while statement: While Statement, Regexp Usage
whitespace, as field separators: Field Separators
whitespace, functions, calling: Calling Built-in
whitespace, newlines as: Options
Williams, Kent: Contributors
Woods, John: Contributors
word boundaries, matching: GNU Regexp Operators
word, regexp definition of: GNU Regexp Operators
word-boundary operator (gawk): GNU Regexp Operators
wordfreq.awk program: Word Sorting
words, counting: Wc Program
words, duplicate, searching for: Dupword Program
words, usage counts, generating: Word Sorting
xgettext utility: String Extraction
XOR bitwise operation: Bitwise Functions
xor function (gawk): Bitwise Functions
Zaretskii, Eli: Acknowledgments
zero, negative vs. positive: Floating Point Issues
Zoulas, Christos: Contributors
{} (braces), actions and: Action Overview
{} (braces), pgawk program: Profiling
{} (braces), statements, grouping: Statements
| (vertical bar): Regexp Operators
| (vertical bar), | operator (I/O): Precedence, Redirection, Getline/Pipe
| (vertical bar), |& operator (I/O): Two-way I/O, Precedence, Redirection, Getline/Coprocess
| (vertical bar), |& operator (I/O), pipes, closing: Close Files And Pipes
| (vertical bar), |& operator (I/O), two-way communications: Portal Files
| (vertical bar), || operator: Precedence, Boolean Ops
~ (tilde), ~ operator: Expression Patterns, Precedence, Typing and Comparison, Regexp Constants, Computed Regexps, Case-sensitivity

Footnotes

These commands are available on POSIX-compliant systems, as well as on traditional Unix-based systems. If you are using some other operating system, you still need to be familiar with the ideas of I/O redirection and pipes.
Often, these systems use gawk for their awk implementation!
All such differences appear in the index under the entry "differences in awk and gawk."
GNU stands for "GNU's not Unix."
The terminology "GNU/Linux" is explained in the Glossary.
Although we generally recommend the use of single quotes around the program text, double quotes are needed here in order to put the single quote into the message.
The #! mechanism works on Linux systems, systems derived from the 4.4-Lite Berkeley Software Distribution, and most commercial Unix systems.
The line beginning with #! lists the full file name of an interpreter to run and an optional initial command-line argument to pass to that interpreter. The operating system then runs the interpreter with the given argument and the full argument list of the executed program. The first argument in the list is the full file name of the awk program. The rest of the argument list contains either options to awk, or data files, or both.
In the C shell (csh), you need to type a semicolon and then a backslash at the end of the first line; see awk Statements Versus Lines, for an explanation. In a POSIX-compliant shell, such as the Bourne shell or bash, you can type the example as shown. If the command echo $path produces an empty output line, you are most likely using a POSIX-compliant shell. Otherwise, you are probably using the C shell or a shell derived from it.
On some very old systems, you may need to use ls -lg to get this output.
The ? and : referred to here are the two parts of the three-operand conditional expression described in Conditional Expressions. Splitting lines after ? and : is a minor gawk extension; if --posix is specified (see Command-Line Options), then this extension is disabled.
In other literature, you may see a character list referred to as either a character set, a character class, or a bracket expression.
Use two backslashes if you're using a string constant with a regexp operator or function.
Experienced C and C++ programmers will note that it is possible to set IGNORECASE for the pattern of a single rule, using something like IGNORECASE = 1 && /foObAr/ { ... } and IGNORECASE = 0 || /foobar/ { ... }. However, this is somewhat obscure and we don't recommend it.
At least that we know about.
In POSIX awk, newlines are not considered whitespace for separating fields.
The sed utility is a "stream editor." Its behavior is also defined by the POSIX standard.
Older versions of gawk would interpret these names internally only if the system did not actually have a /dev/fd directory or any of the other special files listed earlier. Usually this didn't make a difference, but sometimes it did; thus, it was decided to make gawk's behavior consistent on all systems and to have it always interpret the special file names itself.
The technical terminology is rather morbid. The finished child is called a "zombie," and cleaning up after it is referred to as "reaping."
The internal representation of all numbers, including integers, uses double-precision floating-point numbers. On most modern systems, these are in IEEE 754 standard format.
Pathological cases can require up to 752 digits (!), but we doubt that you need to worry about this.
The POSIX standard is under revision. The revised standard's rules for typing and comparison are the same as just described for gawk.
The original version of awk used to keep reading and ignoring input until the end of the file was seen.
In POSIX awk, newline does not count as whitespace.
Some early implementations of Unix awk initialized FILENAME to "-", even if there were data files to be processed. This behavior was incorrect and should not be relied upon in your programs.
Thanks to Michael Brennan for pointing this out.
The C version of rand is known to produce fairly poor sequences of random numbers. However, nothing requires that an awk implementation use the C rand to implement the awk version of rand. In fact, gawk uses the BSD random function, which is considerably better than rand, to produce random numbers.
Computer-generated random numbers really are not truly random. They are technically known as "pseudorandom." This means that while the numbers in a sequence appear to be random, you can in fact generate the same sequence of random numbers over and over again.
Unless you use the --non-decimal-data option, which isn't recommended. See Allowing Nondecimal Input Data, for more information.
This is different from C and C++, in which the first character is number zero.
This consequence was certainly unintended.
As this Web page was being finalized, we learned that the POSIX standard will not use these rules. However, it was too late to change gawk for the 3.1 release. gawk behaves as described here.
A program is interactive if the standard output is connected to a terminal device.
See Glossary, especially the entries "Epoch" and "UTC."
The GNU date utility can also do many of the things described here. Its use may be preferable for simple time-related operations in shell scripts.
Occasionally there are minutes in a year with a leap second, which is why the seconds can go up to 60.
As this is a recent standard, not every system's strftime necessarily supports all of the conversions listed here.
If you don't understand any of this, don't worry about it; these facilities are meant to make it easier to "internationalize" programs. Other internationalization features are described in Internationalization with gawk.
This is because ISO C leaves the behavior of the C version of strftime undefined and gawk uses the system's version of strftime if it's there. Typically, the conversion specifier either does not appear in the returned string or appears literally.
This example shows that 0's come in on the left side. For gawk, this is always true, but in some languages, it's possible to have the left side fill with 1's. Caveat emptor.
For some operating systems, the gawk port doesn't support GNU gettext. This applies most notably to the PC operating systems. As such, these features are not available if you are using one of those operating systems. Sorry.
Americans use a comma every three decimal places and a period for the decimal point, while many Europeans do exactly the opposite: 1,234.56 versus 1.234,56.
Starting with gettext version 0.11.1, the xgettext utility that comes with GNU gettext can handle .awk files.
This example is borrowed from the GNU gettext manual.
This is good fodder for an "Obfuscated awk" contest.
Perhaps it would be better if it were called "Hippy." Ah, well.
This is very different from the same operator in the C shell, csh.
Not recommended.
Your version of gawk may use a different directory; it will depend upon how gawk was built and installed. The actual directory is the value of $(datadir) generated when gawk was configured. You probably don't need to worry about this, though.
The effects are not identical. Output of the transformed record will be in all lowercase, while IGNORECASE preserves the original contents of the input record.
While all the library routines could have been rewritten to use this convention, this was not done, in order to show how my own awk programming style has evolved and to provide some basis for this discussion.
gawk's --dump-variables command-line option is useful for verifying this.
http://mathworld.wolfram.com/CliffRandomNumberGenerator.html
ASCII has been extended in many countries to use the values from 128 to 255 for country-specific characters. If your system uses these extensions, you can simplify _ord_init to simply loop from 0 to 255.
It would be nice if awk had an assignment operator for concatenation. The lack of an explicit operator for concatenation makes string operations more difficult than they really need to be.
This function was written before gawk acquired the ability to split strings into single characters using "" as the separator. We have left it alone, since using substr is more portable.
It is often the case that password information is stored in a network database.
It also introduces a subtle bug; if a match happens, we output the translated line, not the original.
wc can't just use the value of FNR in endfile. If you examine the code in Noting Data File Boundaries you will see that FNR has already been reset by the time endfile is called.
On some older System V systems, tr may require that the lists be written as range expressions enclosed in square brackets ([a-z]) and quoted, to prevent the shell from attempting a file name expansion. This is not a feature.
This program was written before gawk acquired the ability to split each character in a string into separate array elements.
"Real world" is defined as "a program actually used to get something done."
On some very old versions of awk, the test getline junk < t can loop forever if the file exists but is empty. Caveat emptor.
http://www.cygwin.com
http://cm.bell-labs.com/who/bwk
This version is edited slightly for presentation. The complete version can be found in extension/filefuncs.c in the gawk distribution.
Compiled programs are typically written in lower-level languages such as C, C++, Fortran, or Ada, and then translated, or compiled, into a form that the computer can execute directly.
http://www.validgh.com/goldberg/paper.ps.

`[:alnum:]`	Alphanumeric characters.
`[:alpha:]`	Alphabetic characters.
`[:blank:]`	Space and TAB characters.
`[:cntrl:]`	Control characters.
`[:digit:]`	Numeric characters.
`[:graph:]`	Characters that are both printable and visible. (A space is printable but not visible, whereas an `a` is both.)
`[:lower:]`	Lowercase alphabetic characters.
`[:print:]`	Printable characters (characters that are not control characters).
`[:punct:]`	Punctuation characters (characters that are not letters, digits, control characters, or space characters).
`[:space:]`	Space characters (such as space, TAB, and formfeed, to name a few).
`[:upper:]`	Uppercase alphabetic characters.
`[:xdigit:]`	Characters that are hexadecimal digits.
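
The character classes in the table above are only valid inside the brackets of a character list, so a class such as `[:digit:]` is written `[[:digit:]]` when used in a regexp. The following is a minimal sketch rather than an example from the manual: the `classify` shell function and the `kind` variable are hypothetical names chosen for illustration, and the sketch assumes an awk that supports POSIX character classes (gawk does; some very old awk implementations do not).

```shell
#!/bin/sh
# classify (hypothetical helper): print the class of each character in "$1".
# Assumes an awk with POSIX character-class support, such as gawk.
classify() {
    printf '%s\n' "$1" | awk '{
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)        # examine one character at a time
            # Each class name sits inside the brackets of a character list.
            if      (c ~ /[[:upper:]]/) kind = "upper"
            else if (c ~ /[[:lower:]]/) kind = "lower"
            else if (c ~ /[[:digit:]]/) kind = "digit"
            else if (c ~ /[[:blank:]]/) kind = "blank"
            else if (c ~ /[[:punct:]]/) kind = "punct"
            else                        kind = "other"
            # Separate the results with single spaces on one line.
            printf "%s%s", (i > 1 ? " " : ""), kind
        }
        print ""
    }'
}

classify "aZ 9!"
```

Run in the C locale, the final command should print `lower upper blank digit punct`. The tests are ordered from the narrowest classes to the broadest so that each character is reported under a single name, even though several classes (for example `[:alpha:]` and `[:lower:]`) overlap.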

Table of Contents

Foreword
Regular Expressions
- How to Use Regular Expressions
- Escape Sequences
- Advanced Notes: Backslash Before Regular Characters
- Advanced Notes: Escape Sequences for Metacharacters
- Regular Expression Operators
- Using Character Lists
- gawk-Specific Regexp Operators
- Case Sensitivity in Matching
- How Much Text Matches?
- Using Dynamic Regexps
- Advanced Notes: Using \n in Character Lists of Dynamic Regexps
Glossary
GNU General Public License
- Preamble
- Terms and Conditions for Copying, Distribution and Modification
- NO WARRANTY
- END OF TERMS AND CONDITIONS
- How to Apply These Terms to Your New Programs
GNU Free Documentation License
- ADDENDUM: How to use this License for your documents
Index
Footnotes
