This file is part of the documentation of awk, a program that you can use to select
particular records in a file and perform operations upon them.
Copyright © 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, 2000, 2001, 2002 Free Software Foundation, Inc.
This is Edition 3 of GAWK: Effective AWK Programming: A User's Guide for GNU Awk, for the 3.1.1 (or later) version of the GNU implementation of AWK.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being "GNU General Public License", the Front-Cover texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the license is included in the section entitled "GNU Free Documentation License".
Arnold Robbins and I are good friends. We were introduced 11 years ago
by circumstances--and our favorite programming language, AWK.
The circumstances started a couple of years
earlier. I was working at a new job and noticed an unplugged
Unix computer sitting in the corner. No one knew how to use it,
and neither did I. However, a couple of days later it was running, and
I was root and the one-and-only user.
That day, I began the transition from statistician to Unix programmer.
On one of many trips to the library or bookstore in search of books on Unix, I found the gray AWK book, a.k.a. Aho, Kernighan and Weinberger, The AWK Programming Language, Addison-Wesley, 1988. AWK's simple programming paradigm--find a pattern in the input and then perform an action--often reduced complex or tedious data manipulations to few lines of code. I was excited to try my hand at programming in AWK.
Alas, the awk on my computer was a limited version of the language described in the AWK book. I discovered that my computer had "old awk" and the AWK book described "new awk." I learned that this was typical; the old version refused to step aside or relinquish its name. If a system had a new awk, it was invariably called nawk, and few systems had it. The best way to get a new awk was to ftp the source code for gawk from prep.ai.mit.edu. gawk was a version of new awk written by David Trueman and Arnold, and available under the GNU General Public License.
(Incidentally, it's no longer difficult to find a new awk. gawk ships with Linux, and you can download binaries or source code for almost any system; my wife uses gawk on her VMS box.)
My Unix system started out unplugged from the wall; it certainly was not plugged into a network. So, oblivious to the existence of gawk and the Unix community in general, and desiring a new awk, I wrote my own, called mawk. Before I was finished I knew about gawk, but it was too late to stop, so I eventually posted to a comp.sources newsgroup.
A few days after my posting, I got a friendly email from Arnold introducing himself. He suggested we share design and algorithms and attached a draft of the POSIX standard so that I could update mawk to support language extensions added after publication of the AWK book.
Frankly, if our roles had been reversed, I would not have been so open and we probably would have never met. I'm glad we did meet. He is an AWK expert's AWK expert and a genuinely nice person. Arnold contributes significant amounts of his expertise and time to the Free Software Foundation.
This book is the gawk reference manual, but at its core it is a book about AWK programming that will appeal to a wide audience. It is a definitive reference to the AWK language as defined by the 1987 Bell Labs release and codified in the 1992 POSIX Utilities standard.
On the other hand, the novice AWK programmer can study a wealth of practical programs that emphasize the power of AWK's basic idioms: data driven control-flow, pattern matching with regular expressions, and associative arrays. Those looking for something new can try out gawk's interface to network protocols via special /inet files.
The programs in this book make clear that an AWK program is typically much smaller and faster to develop than a counterpart written in C. Consequently, there is often a payoff to prototype an algorithm or design in AWK to get it running quickly and expose problems early. Often, the interpreted performance is adequate and the AWK prototype becomes the product.
The new pgawk (profiling gawk) produces program execution counts. I recently experimented with an algorithm that, for n lines of input, exhibited ~ C n^2 performance, while theory predicted ~ C n log n behavior. A few minutes poring over the awkprof.out profile pinpointed the problem to a single line of code. pgawk is a welcome addition to my programmer's toolbox.
Arnold has distilled over a decade of experience writing and using AWK programs, and developing gawk, into this book. If you use AWK or want to learn how, then read this book.
Michael Brennan
Author of mawk
A regular expression, or regexp, is a way of describing a set of strings. Because regular expressions are such a fundamental part of awk programming, their format and use deserve a separate chapter.
A regular expression enclosed in slashes (/) is an awk pattern that matches every input record whose text belongs to that set. The simplest regular expression is a sequence of letters, numbers, or both. Such a regexp matches any string that contains that sequence. Thus, the regexp foo matches any string containing foo. Therefore, the pattern /foo/ matches any input record containing the three characters foo anywhere in the record. Other kinds of regexps let you specify more complicated classes of strings.
Initially, the examples in this chapter are simple. As we explain more about how regular expressions work, we will present more complicated instances.
[...]
.
A regular expression can be used as a pattern by enclosing it in slashes. Then the regular expression is tested against the entire text of each record. (Normally, it only needs to match some part of the text in order to succeed.) For example, the following prints the second field of each record that contains the string foo anywhere in it:
    $ awk '/foo/ { print $2 }' BBS-list
    -| 555-1234
    -| 555-6699
    -| 555-6480
    -| 555-2127
Regular expressions can also be used in matching expressions. These expressions allow you to specify the string to match against; it need not be the entire current input record. The two operators ~ and !~ perform regular expression comparisons. Expressions using these operators can be used as patterns, or in if, while, for, and do statements. (See Control Statements in Actions.) For example:
    exp ~ /regexp/
is true if the expression exp (taken as a string) matches regexp. The following example matches, or selects, all input records with the uppercase letter J somewhere in the first field:
    $ awk '$1 ~ /J/' inventory-shipped
    -| Jan 13 25 15 115
    -| Jun 31 42 75 492
    -| Jul 24 34 67 436
    -| Jan 21 36 64 620
So does this:
    awk '{ if ($1 ~ /J/) print }' inventory-shipped
This next example is true if the expression exp (taken as a character string) does not match regexp:
    exp !~ /regexp/
The following example matches, or selects, all input records whose first field does not contain the uppercase letter J:
    $ awk '$1 !~ /J/' inventory-shipped
    -| Feb 15 32 24 226
    -| Mar 15 24 34 228
    -| Apr 31 52 63 420
    -| May 16 34 29 208
    ...
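The two matching operators can also be tried without the inventory-shipped file. The following is a minimal sketch that pipes three made-up records into awk; the sample data and the shell variable names (data, with_j, without_j) are assumptions chosen for illustration, not part of the original examples:

```shell
# Hypothetical sample data: a month name followed by a count.
data='Jan 13
Feb 15
Jun 31'

# Records whose first field contains an uppercase J.
with_j=$(printf '%s\n' "$data" | awk '$1 ~ /J/')

# Records whose first field does not contain an uppercase J.
without_j=$(printf '%s\n' "$data" | awk '$1 !~ /J/')

printf 'match:\n%s\nno match:\n%s\n' "$with_j" "$without_j"
```

Every record lands in exactly one of the two groups, because !~ is simply the negation of ~.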
When a regexp is enclosed in slashes, such as /foo/, we call it a regexp constant, much like 5.27 is a numeric constant and "foo" is a string constant.
Some characters cannot be included literally in string constants ("foo") or regexp constants (/foo/). Instead, they should be represented with escape sequences, which are character sequences beginning with a backslash (\). One use of an escape sequence is to include a double-quote character in a string constant. Because a plain double quote ends the string, you must use \" to represent an actual double-quote character as a part of the string. For example:
    $ awk 'BEGIN { print "He said \"hi!\" to her." }'
    -| He said "hi!" to her.
The backslash character itself is another character that cannot be included normally; you must write \\ to put one backslash in the string or regexp. Thus, the string whose contents are the two characters " and \ must be written "\"\\".
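As a quick check of the two-character string described above, the following sketch prints it and its length. The shell variable names are assumptions for illustration; the awk program is wrapped in single quotes so that the shell leaves the backslashes alone:

```shell
# The awk string "\"\\" holds exactly two characters: " and \.
two_chars=$(awk 'BEGIN { s = "\"\\"; print s }')
two_len=$(awk 'BEGIN { print length("\"\\") }')

printf '%s has length %s\n' "$two_chars" "$two_len"
```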
Backslash also represents unprintable characters such as TAB or newline. While there is nothing to stop you from entering most unprintable characters directly in a string constant or regexp constant, they may look ugly.
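The following table lists all the escape sequences used in awk and what they represent. Unless noted otherwise, all these escape sequences apply to both string constants and regexp constants:
\\
    A literal backslash, \.
\a
    The "alert" character, Ctrl-g, ASCII code 7 (BEL).
\b
    Backspace, Ctrl-h, ASCII code 8 (BS).
\f
    Formfeed, Ctrl-l, ASCII code 12 (FF).
\n
    Newline, Ctrl-j, ASCII code 10 (LF).
\r
    Carriage return, Ctrl-m, ASCII code 13 (CR).
\t
    Horizontal TAB, Ctrl-i, ASCII code 9 (HT).
\v
    Vertical tab, Ctrl-k, ASCII code 11 (VT).
\nnn
    The octal value nnn, where nnn stands for 1 to 3 digits between 0 and 7. For example, the code for the ASCII ESC (escape) character is \033.
\xhh...
    The hexadecimal value hh, where hh stands for a sequence of hexadecimal digits (0-9, and either A-F or a-f). Like the same construct in ISO C, the escape sequence continues until the first nonhexadecimal digit is seen. However, using more than two hexadecimal digits produces undefined results. (The \x escape sequence is not allowed in POSIX awk.)
\/
    A literal slash (necessary for regexp constants only). Because a regexp constant is delimited by slashes, you must escape a slash that is part of the pattern in order to tell awk to keep processing the rest of the regexp.
\"
    A literal double quote (necessary for string constants only). Because a string constant is delimited by double quotes, you must escape a quote that is part of the string in order to tell awk to keep processing the rest of the string.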
In gawk, a number of additional two-character sequences that begin with a backslash have special meaning in regexps. See gawk-Specific Regexp Operators.
In a regexp, a backslash before any character that is not in the previous list and not listed in gawk-Specific Regexp Operators means that the next character should be taken literally, even if it would normally be a regexp operator. For example, /a\+b/ matches the three characters a+b.
For complete portability, do not use a backslash before any character not shown in the previous list.
To summarize:
* Escape sequences are processed first, for both string constants and regexp constants, as soon as awk reads your program.
* gawk processes both regexp constants and dynamic regexps (see Using Dynamic Regexps), for the special operators listed in gawk-Specific Regexp Operators.
If you place a backslash in a string constant before something that is not one of the characters previously listed, POSIX awk purposely leaves what happens as undefined. There are two choices:
Strip the backslash out
    This is what Unix awk and gawk both do. For example, "a\qc" is the same as "aqc". (Because this is such an easy bug both to introduce and to miss, gawk warns you about it.) Consider FS = "[ \t]+\|[ \t]+" to use vertical bars surrounded by whitespace as the field separator. There should be two backslashes in the string: FS = "[ \t]+\\|[ \t]+".
Leave the backslash alone
    Some other awk implementations do this. In such implementations, typing "a\qc" is the same as typing "a\\qc".
Suppose you use an octal or hexadecimal escape to represent a regexp metacharacter. (See Regular Expression Operators.) Does awk treat the character as a literal character or as a regexp operator?
Historically, such characters were taken literally. (d.c.) However, the POSIX standard indicates that they should be treated as real metacharacters, which is what gawk does. In compatibility mode (see Command-Line Options), gawk treats the characters represented by octal and hexadecimal escape sequences literally when used in regexp constants. Thus, /a\52b/ is equivalent to /a\*b/.
You can combine regular expressions with special characters, called regular expression operators or metacharacters, to increase the power and versatility of regular expressions.
The escape sequences described earlier in Escape Sequences are valid inside a regexp. They are introduced by a \ and are recognized and converted into corresponding real characters as the very first step in processing regexps.
Here is a list of metacharacters. All characters that are not escape sequences and that are not listed in the table stand for themselves:
\
    This is used to suppress the special meaning of a character when matching. For example, \$ matches the character $.
^
    This matches the beginning of a string. For example, ^@chapter matches @chapter at the beginning of a string and can be used to identify chapter beginnings in Texinfo source files. The ^ is known as an anchor, because it anchors the pattern to match only at the beginning of the string.
    It is important to realize that ^ does not match the beginning of a line embedded in a string. The condition is not true in the following example:
        if ("line1\nLINE 2" ~ /^L/) ...
$
    This is similar to ^, but it matches only at the end of a string. For example, p$ matches a record that ends with a p. The $ is an anchor and does not match the end of a line embedded in a string. The condition in the following example is not true:
        if ("line1\nLINE 2" ~ /1$/) ...
.
    This matches any single character, including the newline character. For example, .P matches any single character followed by a P in a string. Using concatenation, we can make a regular expression such as U.A, which matches any three-character sequence that begins with U and ends with A.
    In strict POSIX mode (see Command-Line Options), . does not match the NUL character, which is a character with all bits equal to zero. Otherwise, NUL is just another character. Other versions of awk may not be able to match the NUL character.
[...]
    This is called a character list. It matches any one of the characters that are enclosed in the square brackets. For example, [MVX] matches any one of the characters M, V, or X in a string. A full discussion of what can be inside the square brackets of a character list is given in Using Character Lists.
[^ ...]
    This is a complemented character list. The first character after the [ must be a ^. It matches any characters except those in the square brackets. For example, [^awk] matches any character that is not an a, w, or k.
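|
    This is the alternation operator, and it is used to specify alternatives. The | has the lowest precedence of all the regular expression operators. For example, ^P|[[:digit:]] matches any string that matches either ^P or [[:digit:]]. This means it matches any string that starts with P or contains a digit.
    The alternation applies to the largest possible regexps on either side.
(...)
    Parentheses are used for grouping in regular expressions, as in arithmetic. They can be used to concatenate regular expressions containing the alternation operator, |. For example, @(samp|code)\{[^}]+\} matches both @code{foo} and @samp{bar}. (These are Texinfo formatting control sequences.)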
*
    This symbol means that the preceding regular expression should be repeated as many times as necessary to find a match. For example, ph* applies the * symbol to the preceding h and looks for matches of one p followed by any number of hs. This also matches just p if no hs are present.
    The * repeats the smallest possible preceding expression. (Use parentheses if you want to repeat a larger expression.) It finds as many repetitions as possible. For example,
        awk '/\(c[ad][ad]*r x\)/ { print }' sample
    prints every record in sample containing a string of the form (car x), (cdr x), (cadr x), and so on. Notice the escaping of the parentheses by preceding them with backslashes.
+
    This symbol is similar to *, except that the preceding expression must be matched at least once. This means that wh+y would match why and whhy, but not wy, whereas wh*y would match all three of these strings. The following is a simpler way of writing the last * example:
        awk '/\(c[ad]+r x\)/ { print }' sample
?
    This symbol is similar to *, except that the preceding expression can be matched either once or not at all. For example, fe?d matches fed and fd, but nothing else.
{n}
{n,}
{n,m}
    One or two numbers inside braces denote an interval expression. If there is one number in the braces, the preceding regexp is repeated n times. If there are two numbers separated by a comma, the preceding regexp is repeated n to m times. If there is one number followed by a comma, then the preceding regexp is repeated at least n times:
    wh{3}y
        Matches whhhy, but not why or whhhhy.
    wh{3,5}y
        Matches whhhy, whhhhy, or whhhhhy, only.
    wh{2,}y
        Matches whhy or whhhy, and so on.
    Interval expressions were not traditionally available in awk. They were added as part of the POSIX standard to make awk and egrep consistent with each other.
    However, because old programs may use { and } in regexp constants, by default gawk does not match interval expressions in regexps. If either --posix or --re-interval is specified (see Command-Line Options), then interval expressions are allowed in regexps.
    For new programs that use { and } in regexp constants, it is good practice to always escape them with a backslash. Then the regexp constants are valid and work the way you want them to, using any version of awk.
In regular expressions, the *, +, and ? operators, as well as the braces { and }, have the highest precedence, followed by concatenation, and finally by |. As in arithmetic, parentheses can change how operators are grouped.
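The precedence rules are easier to see on a concrete input. The sketch below is an illustration under assumed sample strings (ababbb and axxx) and assumed shell variable names; it uses sub to show which part of the input each regexp actually matches:

```shell
# Repetition binds tighter than concatenation:
# ab+ applies + to the b only, while (ab)+ repeats the whole group.
rep_single=$(echo 'ababbb' | awk '{ sub(/ab+/, "<X>"); print }')
rep_group=$(echo 'ababbb' | awk '{ sub(/(ab)+/, "<X>"); print }')

# Alternation binds loosest: ^a|b$ means "starts with a" or "ends with b".
# Parentheses are needed to say "the whole string is a or b".
alt_loose=$(echo 'axxx' | awk '{ print ($0 ~ /^a|b$/) }')
alt_tight=$(echo 'axxx' | awk '{ print ($0 ~ /^(a|b)$/) }')

printf '%s %s %s %s\n' "$rep_single" "$rep_group" "$alt_loose" "$alt_tight"
```

With ab+ only the first ab is replaced, because the character after it is an a; with (ab)+ the two leading ab pairs are replaced together.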
In POSIX awk and gawk, the *, +, and ? operators stand for themselves when there is nothing in the regexp that precedes them. For example, /+/ matches a literal plus sign. However, many other versions of awk treat such a usage as a syntax error.
If gawk is in compatibility mode (see Command-Line Options), POSIX character classes and interval expressions are not available in regular expressions.
Within a character list, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, using the locale's collating sequence and character set. For example, in the default C locale, [a-dx-z] is equivalent to [abcdxyz]. Many locales sort characters in dictionary order, and in these locales, [a-dx-z] is typically not equivalent to [abcdxyz]; instead it might be equivalent to [aBbCcDdxXyYz], for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.
To include one of the characters \, ], -, or ^ in a character list, put a \ in front of it. For example:
    [d\]]
matches either d or ].
This treatment of \ in character lists is compatible with other awk implementations and is also mandated by POSIX. The regular expressions in awk are a superset of the POSIX specification for Extended Regular Expressions (EREs). POSIX EREs are based on the regular expressions accepted by the traditional egrep utility.
Character classes are a new feature introduced in the POSIX standard. A character class is a special notation for describing lists of characters that have a specific attribute, but the actual characters can vary from country to country and/or from character set to character set. For example, the notion of what is an alphabetic character differs between the United States and France.
A character class is only valid in a regexp inside the brackets of a character list. Character classes consist of [:, a keyword denoting the class, and :]. Here are the character classes defined by the POSIX standard.
[:alnum:]   Alphanumeric characters.
[:alpha:]   Alphabetic characters.
[:blank:]   Space and TAB characters.
[:cntrl:]   Control characters.
[:digit:]   Numeric characters.
[:graph:]   Characters that are both printable and visible. (A space is printable but not visible, whereas an a is both.)
[:lower:]   Lowercase alphabetic characters.
[:print:]   Printable characters (characters that are not control characters).
[:punct:]   Punctuation characters (characters that are not letters, digits, control characters, or space characters).
[:space:]   Space characters (such as space, TAB, and formfeed, to name a few).
[:upper:]   Uppercase alphabetic characters.
[:xdigit:]  Characters that are hexadecimal digits.
For example, before the POSIX standard, you had to write /[A-Za-z0-9]/ to match alphanumeric characters. If your character set had other alphabetic characters in it, this would not match them, and if your character set collated differently from ASCII, this might not even match the ASCII alphanumeric characters. With the POSIX character classes, you can write /[[:alnum:]]/ to match the alphabetic and numeric characters in your character set.
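A character class can be exercised with a one-line sketch such as the following. It assumes an awk that supports POSIX character classes (gawk does; some older awk implementations may not), and the sample string and variable name are made up for illustration:

```shell
# Count the alphanumeric characters in a line.
# gsub returns the number of substitutions it made; "&" puts each
# matched character back unchanged, so the record is not altered.
alnum_count=$(echo 'ab-12_c!' | awk '{ print gsub(/[[:alnum:]]/, "&") }')

printf 'alphanumeric characters: %s\n' "$alnum_count"
```

The hyphen, underscore, and exclamation mark are not counted, because none of them is alphabetic or numeric.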
Two additional special sequences can appear in character lists. These apply to non-ASCII character sets, which can have single symbols (called collating elements) that are represented with more than one character. They can also have several characters that are equivalent for collating, or sorting, purposes. (For example, in French, a plain "e" and a grave-accented "è" are equivalent.) These sequences are:
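Collating symbols
    A collating symbol is a multicharacter collating element enclosed between [. and .]. For example, if ch is a collating element, then [[.ch.]] is a regexp that matches this collating element, whereas [ch] is a regexp that matches either c or h.
Equivalence classes
    An equivalence class is a locale-specific name for a list of characters that are equal. The name is enclosed between [= and =]. For example, the name e might be used to represent all of "e," "è," and "é." In this case, [[=e=]] is a regexp that matches any of e, é, or è.
These features are very valuable in non-English-speaking locales.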
Caution: The library functions that gawk uses for regular expression matching currently recognize only POSIX character classes; they do not recognize collating symbols or equivalence classes.
gawk-Specific Regexp Operators
GNU software that deals with regular expressions provides a number of additional regexp operators. These operators are described in this section and are specific to gawk; they are not available in other awk implementations. Most of the additional operators deal with word matching. For our purposes, a word is a sequence of one or more letters, digits, or underscores (_):
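\w
    Matches any word-constituent character, that is, any letter, digit, or underscore. Think of it as shorthand for [[:alnum:]_].
\W
    Matches any character that is not word-constituent. Think of it as shorthand for [^[:alnum:]_].
\<
    Matches the empty string at the beginning of a word. For example, /\<away/ matches away but not stowaway.
\>
    Matches the empty string at the end of a word. For example, /stow\>/ matches stow but not stowaway.
\y
    Matches the empty string at either the beginning or the end of a word (i.e., the word boundary). For example, \yballs?\y matches either ball or balls, as a separate word.
\B
    Matches the empty string that occurs between two word-constituent characters. For example, /\Brat\B/ matches crate but it does not match dirty rat. \B is essentially the opposite of \y.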
There are two other operators that work on buffers. In Emacs, a buffer is, naturally, an Emacs buffer. For other programs, gawk's regexp library routines consider the entire string to match as the buffer. The operators are:
\`
    Matches the empty string at the beginning of a buffer (string).
\'
    Matches the empty string at the end of a buffer (string).
Because ^ and $ always work in terms of the beginning and end of strings, these operators don't add any new capabilities for awk. They are provided for compatibility with other GNU software.
In other GNU software, the word-boundary operator is \b. However, that conflicts with the awk language's definition of \b as backspace, so gawk uses a different letter. An alternative method would have been to require two backslashes in the GNU operators, but this was deemed too confusing. The current method of using \y for the GNU \b appears to be the lesser of two evils.
The various command-line options (see Command-Line Options) control how gawk interprets characters in regexps:
No options
    In the default case, gawk provides all the facilities of POSIX regexps and the previously described GNU regexp operators. However, interval expressions are not supported.
--posix
    Only POSIX regexps are supported; the GNU operators are not special (e.g., \w matches a literal w). Interval expressions are allowed.
--traditional
    Traditional Unix awk regexps are matched. The GNU operators are not special, interval expressions are not available, nor are the POSIX character classes ([[:alnum:]], etc.). Characters described by octal and hexadecimal escape sequences are treated literally, even if they represent regexp metacharacters.
--re-interval
    Allow interval expressions in regexps, even if --traditional has been provided.
Case is normally significant in regular expressions, both when matching ordinary characters (i.e., not metacharacters) and inside character sets. Thus, a w in a regular expression matches only a lowercase w and not an uppercase W.
The simplest way to do a case-independent match is to use a character list--for example, [Ww]. However, this can be cumbersome if you need to use it often, and it can make the regular expressions harder to read. There are two alternatives that you might prefer.
One way to perform a case-insensitive match at a particular point in the program is to convert the data to a single case, using the tolower or toupper built-in string functions (which we haven't discussed yet; see String Manipulation Functions). For example:
    tolower($1) ~ /foo/ { ... }
converts the first field to lowercase before matching against it. This works in any POSIX-compliant awk.
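The tolower approach can be sketched end to end as follows. The three input records and the variable name ci_hits are assumptions made up for this illustration:

```shell
# Print the second field of every record whose first field contains
# "foo" in any mix of upper- and lowercase letters.
ci_hits=$(printf '%s\n' 'FOO 1' 'bar 2' 'xFoOx 3' |
    awk 'tolower($1) ~ /foo/ { print $2 }')

printf '%s\n' "$ci_hits"
```

Only the comparison sees the lowercased text; the fields themselves are left unchanged.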
Another method, specific to gawk, is to set the variable IGNORECASE to a nonzero value (see Built-in Variables). When IGNORECASE is not zero, all regexp and string operations ignore case. Changing the value of IGNORECASE dynamically controls the case-sensitivity of the program as it runs. Case is significant by default because IGNORECASE (like most variables) is initialized to zero:
    x = "aB"
    if (x ~ /ab/) ...   # this test will fail
    IGNORECASE = 1
    if (x ~ /ab/) ...   # now it will succeed
In general, you cannot use IGNORECASE to make certain rules case-insensitive and other rules case-sensitive, because there is no straightforward way to set IGNORECASE just for the pattern of a particular rule. To do this, use either character lists or tolower. However, one thing you can do only with IGNORECASE is dynamically turn case-sensitivity on or off for all the rules at once.
IGNORECASE can be set on the command line or in a BEGIN rule (see Other Command-Line Arguments; also see Startup and Cleanup Actions). Setting IGNORECASE from the command line is a way to make a program case-insensitive without having to edit it.
Prior to gawk 3.0, the value of IGNORECASE affected regexp operations only. It did not affect string comparison with ==, !=, and so on. Beginning with version 3.0, both regexp and string comparison operations are affected by IGNORECASE.
Beginning with gawk 3.0, the equivalences between upper- and lowercase characters are based on the ISO-8859-1 (ISO Latin-1) character set. This character set is a superset of the traditional 128 ASCII characters, which also provides a number of characters suitable for use with European languages.
The value of IGNORECASE has no effect if gawk is in compatibility mode (see Command-Line Options). Case is always significant in compatibility mode.
Consider the following:
    echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
This example uses the sub function (which we haven't discussed yet; see String Manipulation Functions) to make a change to the input record. Here, the regexp /a+/ indicates "one or more a characters," and the replacement text is <A>.
The input contains four a characters. awk (and POSIX) regular expressions always match the leftmost, longest sequence of input characters that can match. Thus, all four a characters are replaced with <A> in this example:
    $ echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
    -| <A>bcd
For simple match/no-match tests, this is not so important. But when doing text matching and substitutions with the match, sub, gsub, and gensub functions, it is very important. Understanding this principle is also important for regexp-based record and field splitting (see How Input Is Split into Records, and also see Specifying How Fields Are Separated).
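The leftmost-longest rule can be observed directly with match, which records where the match starts (RSTART) and how long it is (RLENGTH). The following is a small sketch; the second input string and the shell variable names are assumptions for illustration:

```shell
# For /a+/ on "aaaabcd", the match starts at position 1 and covers
# all four a characters.
span=$(echo 'aaaabcd' | awk '{ if (match($0, /a+/)) print RSTART, RLENGTH }')

# "Leftmost" takes priority over "longest": the single a at position 2
# is matched, not the longer run of a characters that starts later.
first=$(echo 'xabaaa' | awk '{ sub(/a+/, "<A>"); print }')

printf '%s\n%s\n' "$span" "$first"
```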
The righthand side of a ~ or !~ operator need not be a regexp constant (i.e., a string of characters between slashes). It may be any expression. The expression is evaluated and converted to a string if necessary; the contents of the string are used as the regexp. A regexp that is computed in this way is called a dynamic regexp:
    BEGIN { digits_regexp = "[[:digit:]]+" }
    $0 ~ digits_regexp    { print }
This sets digits_regexp to a regexp that describes one or more digits, and tests whether the input record matches this regexp.
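Because a dynamic regexp is just a string, it can also come from outside the program. The sketch below passes a pattern in with the -v option; the names pat and re and the sample input are hypothetical, and the pattern deliberately contains no backslashes, since -v assignments are themselves subject to escape-sequence processing:

```shell
# Select the lines that consist entirely of digits, using a regexp
# supplied at run time rather than written into the awk program.
pat='^[0-9]+$'
numeric=$(printf '%s\n' '42' 'x7' '1000' | awk -v re="$pat" '$0 ~ re')

printf '%s\n' "$numeric"
```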
Caution: When using the ~ and !~ operators, there is a difference between a regexp constant enclosed in slashes and a string constant enclosed in double quotes. If you are going to use a string constant, you have to understand that the string is, in essence, scanned twice: the first time when awk reads your program, and the second time when it goes to match the string on the lefthand side of the operator with the pattern on the right. This is true of any string-valued expression (such as digits_regexp, shown previously), not just string constants.
What difference does it make if the string is scanned twice? The answer has to do with escape sequences, and particularly with backslashes. To get a backslash into a regular expression inside a string, you have to type two backslashes.
For example, /\*/ is a regexp constant for a literal *. Only one backslash is needed. To do the same thing with a string, you have to type "\\*". The first backslash escapes the second one so that the string actually contains the two characters \ and *.
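The difference between the two forms can be checked side by side. This is a minimal sketch with an assumed sample input (a*b) and assumed variable names; the awk programs are single-quoted so that the shell passes the backslashes through untouched:

```shell
# A regexp constant needs one backslash to match a literal *.
star_const=$(echo 'a*b' | awk '{ print ($0 ~ /\*/) }')

# A string constant needs two, because the string is scanned first
# and the resulting two characters \* are then used as the regexp.
star_string=$(echo 'a*b' | awk '{ print ($0 ~ "\\*") }')

# The doubled form works the same way as an argument to sub.
star_sub=$(echo 'a*b' | awk '{ sub("\\*", "-"); print }')

printf '%s %s %s\n' "$star_const" "$star_string" "$star_sub"
```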
Given that you can use both regexp and string constants to describe regular expressions, which should you use? The answer is "regexp constants," for several reasons:
* It is more efficient to use regexp constants. awk can note that you have supplied a regexp and store it internally in a form that makes pattern matching more efficient. When using a string constant, awk must first convert the string into this internal form and then perform the pattern matching.
\n in Character Lists of Dynamic Regexps
Some commercial versions of awk do not allow the newline character to be used inside a character list for a dynamic regexp:
    $ awk '$0 ~ "[ \t\n]"'
    error--> awk: newline in character class [
    error--> ]...
    error--> source line number 1
    error--> context is
    error--> >>> <<<
But a newline in a regexp constant works with no problem:
    $ awk '$0 ~ /[ \t\n]/'
    here is a sample line
    -| here is a sample line
    Ctrl-d
gawk does not have this problem, and it isn't likely to occur often in practice, but it's worth noting for future reference.
Action
    A series of awk statements attached to a rule. If the rule's pattern matches an input record, awk executes the rule's action. Actions are always enclosed in curly braces. (See Actions.)
Amazing awk Assembler
    An assembler written entirely as sed and awk scripts. It is thousands of lines long, including machine descriptions for several eight-bit microcomputers. It is a good example of a program that would have been better written in another language. You can get it from ftp://ftp.freefriends.org/arnold/Awkstuff/aaa.tgz.
Amazingly Workable Formatter (awf)
    A formatter for nroff -ms and nroff -man formatting commands, written using awk and sh. It is available over the Internet from ftp://ftp.freefriends.org/arnold/Awkstuff/awf.tgz.
Anchor
    The regexp metacharacters ^ and $, which force the match to the beginning or end of the string, respectively.
Array
    awk provides associative arrays.
Assignment
    An awk expression that changes the value of some awk variable or data object. An object that you can assign to is called an lvalue. The assigned values are called rvalues. See Assignment Expressions.
awk Language
    The language in which awk programs are written.
awk Program
    An awk program consists of a series of patterns and actions, collectively known as rules. For each input record given to the program, the program's rules are all processed in turn. awk programs may also contain function definitions.
awk Script
    Another name for an awk program.
Bit
    awk lets you work with floating-point numbers and strings. gawk lets you manipulate bit values with the built-in functions described in Using gawk's Bit Manipulation Functions.
    Computers are often defined by how many bits they use to represent integer values. Typical systems are 32-bit systems, but 64-bit systems are becoming increasingly popular, and 16-bit systems are waning in popularity.
Bourne Shell
    The standard shell (/bin/sh) on Unix and Unix-like systems, originally written by Steven R. Bourne. Many shells (bash, ksh, pdksh, zsh) are generally upwardly compatible with the Bourne shell.
Built-in Function
    The awk language provides built-in functions that perform various numerical, I/O-related, and string computations. Examples are sqrt (for the square root of a number) and substr (for a substring of a string). gawk provides functions for timestamp management, bit manipulation, and runtime string translation. (See Built-in Functions.)
Built-in Variable
    ARGC, ARGV, CONVFMT, ENVIRON, FILENAME, FNR, FS, NF, NR, OFMT, OFS, ORS, RLENGTH, RSTART, RS, and SUBSEP are the variables that have special meaning to awk. In addition, ARGIND, BINMODE, ERRNO, FIELDWIDTHS, IGNORECASE, LINT, PROCINFO, RT, and TEXTDOMAIN are the variables that have special meaning to gawk. Changing some of them affects awk's running environment. (See Built-in Variables.)
awk
programming language has C-like syntax, and this Web page
points out similarities between awk
and C when appropriate.
In general, gawk
attempts to be as similar to the 1990 version
of ISO C as makes sense. Future versions of gawk
may adopt features
from the newer 1999 standard, as appropriate.
pic
that reads descriptions of molecules
and produces pic
input for drawing them.
It was written in awk
by Brian Kernighan and Jon Bentley, and is available from
http://cm.bell-labs.com/netlib/typesetting/chem.gz.
awk
statements, enclosed in curly braces. Compound
statements may be nested.
(See Control Statements in Actions.)
foo
concatenated with
the string bar
gives the string foobar
.
(See String Concatenation.)
?:
ternary operator, such as
expr1 ? expr2 : expr3
. The expression
expr1 is evaluated; if the result is true, the value of the whole
expression is the value of expr2; otherwise the value is
expr3. In either case, only one of expr2 and expr3
is evaluated. (See Conditional Expressions.)
A comparison expression is a relation that is either true or false, such as (a < b). Comparison expressions are used in if, while, do, and for statements, and in patterns to select which input records to process. (See Variable Typing and Comparison Expressions.)
Curly braces are the characters { and }. Curly braces are used in awk for delimiting actions, compound statements, and function bodies.
"Data driven" is a description of awk programs, where you specify the data you are interested in processing, and what to do when that data is seen.
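Double precision is the way awk stores numeric values. It is the C type double.
A dynamic regular expression is a regular expression written as an expression. It could be a string constant, such as "foo", but it may also be an expression whose value can vary. (See Using Dynamic Regexps.)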
The environment is a collection of strings, of the form name=val, that each program has available to it. Users generally place values into the environment in order to provide information to various programs. Typical examples are the environment variables HOME and PATH.
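As a small sketch of how an awk program sees the environment, the built-in ENVIRON array can be indexed by variable name. GREETING is a hypothetical variable, set only for the one command shown, and env_demo is a made-up helper name.

```shell
# Illustrative only: env_demo and GREETING are hypothetical names.
env_demo() {
    # Place name=val into the environment of this one awk invocation.
    GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }'
}
env_demo
```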
The epoch on Unix and POSIX systems is 1970-01-01 00:00:00 UTC.
See also "GMT" and "UTC."
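An escape sequence is a special sequence of characters used for describing nonprinting characters, such as \n for newline or \033 for the ASCII ESC (Escape) character. (See Escape Sequences.)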
When awk reads an input record, it splits the record into pieces separated by whitespace (or by a separator regexp that you can change by setting the built-in variable FS). Such pieces are called fields. If the pieces are of fixed length, you can use the built-in variable FIELDWIDTHS to describe their lengths. (See Specifying How Fields Are Separated, and Reading Fixed-Width Data.)
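A minimal sketch of field splitting follows, first with the default whitespace rule and then with FS set to a colon through the -F option. The function name fields_demo and the sample data are invented for the example; FIELDWIDTHS is left out because it is specific to gawk.

```shell
# Illustrative only: fields_demo is a hypothetical helper name.
fields_demo() {
    # Default splitting: runs of whitespace separate fields.
    echo "  alpha   beta gamma " | awk '{ print NF, $2 }'
    # Setting FS (here through -F) changes the separator to a colon.
    echo "alpha:beta:gamma" | awk -F: '{ print NF, $3 }'
}
fields_demo
```

Leading and trailing whitespace does not produce empty fields under the default rule, so both records have three fields.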
Format strings are used to control the appearance of output in the strftime and sprintf functions, and are used in the printf statement as well. Also, data conversions from numbers to strings are controlled by the format string contained in the built-in variable CONVFMT. (See Format-Control Letters.)
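The two roles of format strings can be sketched side by side: CONVFMT governs the implicit conversion of a number to a string, while printf uses the format it is given. The name convfmt_demo and the values are assumptions made for the illustration.

```shell
# Illustrative only: convfmt_demo is a hypothetical helper name.
convfmt_demo() {
    awk 'BEGIN {
        CONVFMT = "%.2f"      # format used for number-to-string conversion
        x = 3.14159
        s = x ""              # concatenation forces the conversion
        printf "%s\n", s
        printf "%5.1f|\n", x  # printf applies its own format string
    }'
}
convfmt_demo
```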
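A function is a specialized group of statements used to encapsulate general or program-specific tasks. awk has a number of built-in functions, and also allows you to define your own. (See Functions.)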
gawk is the GNU implementation of awk.
The General Public License describes the terms under which gawk and its source code may be distributed. (See GNU General Public License.)
Hexadecimal is base 16 notation, where the digits are 0-9 and A-F, with A representing 10, B representing 11, and so on, up to F for 15. Hexadecimal numbers are written in C using a leading 0x, to indicate their base. Thus, 0x12 is 18 (1 times 16 plus 2).
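The arithmetic in the hexadecimal example can be reproduced with a short sketch. It avoids hexadecimal constants in the awk source, since support for them varies between implementations, and uses the %x and %X printf conversions instead. The name hex_demo is made up.

```shell
# Illustrative only: hex_demo is a hypothetical helper name.
hex_demo() {
    awk 'BEGIN {
        print 1 * 16 + 2            # the value written 0x12 in C
        printf "%x %X\n", 18, 255   # decimal values shown as hex digits
    }'
}
hex_demo
```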
An input record is a single chunk of data that is read in by awk. Usually, an awk input record consists of one line of text. (See How Input Is Split into Records.)
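An interpreter is a program that reads human-readable source code directly, and uses the instructions in it to process data and produce results. awk is typically (but not always) implemented as an interpreter. See also "Compiler."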
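An interval expression is a component of a regular expression that lets you specify repeated matches of some part of the regexp. Interval expressions were not traditionally available in awk programs.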
In the awk language, a keyword is a word that has special meaning. Keywords are reserved and may not be used as variable names. gawk's keywords are: BEGIN, END, if, else, while, do...while, for, for...in, break, continue, delete, next, nextfile, function, func, and exit.
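A logical expression is an expression using the operators for logic, AND, OR, and NOT, written &&, ||, and ! in awk. Logical expressions are often called Boolean expressions, after the mathematician who pioneered this kind of mathematical logic.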
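An lvalue is an expression that can appear on the left side of an assignment operator. In most languages, lvalues can be variables or array elements. In awk, a field designator can also be used as an lvalue.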
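To illustrate a field designator used as an lvalue, the sketch below assigns to $2 and prints the rebuilt record. The helper name lvalue_demo and the input are invented for the example.

```shell
# Illustrative only: lvalue_demo is a hypothetical helper name.
lvalue_demo() {
    # $2 appears on the left side of the assignment operator.
    echo "a b c" | awk '{ $2 = "X"; print }'
}
lvalue_demo
```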
The null string is a string with no characters in it. It is represented explicitly in awk programs by placing two double quote characters next to each other (""). It can appear in input data by having two successive occurrences of the field separator appear next to each other.
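Both ways of meeting the null string can be seen in one short sketch: two adjacent colons in the input yield an empty second field, which compares equal to "". The name null_demo and the sample record are assumptions for the illustration.

```shell
# Illustrative only: null_demo is a hypothetical helper name.
null_demo() {
    # Prints the field count, the length of $2, and the comparison result.
    echo "a::c" | awk -F: '{ print NF, length($2), ($2 == "") }'
}
null_demo
```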
A number is a numeric-valued data object. Modern awk implementations use double-precision floating-point to represent numbers. Very old awk implementations use single-precision floating-point.
Octal is base-eight notation, where the digits are 0-7. Octal numbers are written in C using a leading 0, to indicate their base. Thus, 013 is 11 (one times 8 plus 3).
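The octal arithmetic can be checked the same way, using the %o printf conversion rather than an octal constant in the awk source. The name octal_demo is made up for the sketch.

```shell
# Illustrative only: octal_demo is a hypothetical helper name.
octal_demo() {
    awk 'BEGIN {
        print 1 * 8 + 3       # the value written 013 in C
        printf "%o\n", 11     # decimal 11 shown in octal digits
    }'
}
octal_demo
```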
Patterns tell awk which input records are interesting to which rules. A pattern is an arbitrary conditional expression against which input is tested. If the condition is satisfied, the pattern is said to match the input record. A typical pattern might compare the input record against a regular expression. (See Pattern Elements.)
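POSIX is the name for a series of standards that specify a Portable Operating System interface. The main standard of interest for awk users is IEEE Standard for Information Technology, Standard 1003.2-1992, Portable Operating System Interface (POSIX) Part 2: Shell and Utilities. Informally, this standard is often referred to as simply "P1003.2."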
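Private variables and/or functions are those meant for use exclusively by library functions and not for the main awk program. Special care must be taken when naming such variables and functions. (See Naming Library Function Global Variables.)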
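A range is a sequence of consecutive lines from the input file(s). A pattern can specify ranges of input lines for awk to process or it can specify single lines. (See Pattern Elements.)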
You can redirect the output of the print and printf statements to a file or a system command, using the >, >>, |, and |& operators. You can redirect input to the getline statement using the <, |, and |& operators. (See Redirecting Output of print and printf, and Explicit Input with getline.)
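A rough sketch of output and input redirection follows. It writes two lines to a temporary file with > and reads them back with getline and <. The helper name redirect_demo, the variable names, and the use of mktemp for the scratch file are all assumptions of this example; the gawk-only |& operator is not shown.

```shell
# Illustrative only: redirect_demo and its variables are hypothetical names.
redirect_demo() {
    tmp=$(mktemp) || return 1
    awk -v out="$tmp" 'BEGIN {
        print "first"  > out    # > creates or truncates the file
        print "second" > out    # a second print to the same open file appends
        close(out)
        while ((getline line < out) > 0) {   # < reads the file back
            n++
            last = line
        }
        close(out)
        print n, last
    }'
    rm -f "$tmp"
}
redirect_demo
```

Within a single run, > truncates the file only when it is first opened; use >> when existing contents must be preserved.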
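A regular expression (regexp for short) is a pattern that denotes a set of strings, possibly an infinite set. For example, the regexp R.*xp matches any string starting with the letter R and ending with the letters xp. In awk, regexps are used in patterns and in conditional expressions. Regexps may contain escape sequences. (See Regular Expressions.)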
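The R.*xp example can be tried directly as an awk pattern. An unanchored regexp in awk matches anywhere within the record, so this sketch adds ^ and $ to require that the whole record starts with R and ends with xp. The name regexp_demo and the sample words are invented.

```shell
# Illustrative only: regexp_demo is a hypothetical helper name.
regexp_demo() {
    # A pattern with no action prints every matching record.
    printf 'Regexp\nRxp\nregexp\nRxpress\n' | awk '/^R.*xp$/'
}
regexp_demo
```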
A regular expression constant is a regular expression written within slashes, such as /foo/. This regular expression is chosen when you write the awk program and cannot be changed during its execution. (See How to Use Regular Expressions.)
A rule is a segment of an awk program that specifies how to process single input records. A rule consists of a pattern and an action. awk reads an input record; then, for each rule, if the input record satisfies the rule's pattern, awk executes the rule's action. Otherwise, the rule does nothing for that input record.
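The record-by-record, rule-by-rule processing described here can be sketched with a two-rule program. The helper name rule_demo and the sample data are assumptions made for the example.

```shell
# Illustrative only: rule_demo is a hypothetical helper name.
rule_demo() {
    # Two rules; every input record is tested against each pattern in turn.
    printf 'apple 3\nbanana 7\ncherry 9\n' | awk '
        $2 > 5 { print $1 }
        /^a/   { print "starts with a:", $1 }'
}
rule_demo
```

The first record satisfies only the second pattern, and the other two records satisfy only the first, so each record triggers exactly one action here.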
An rvalue is a value that can appear on the right side of an assignment operator. In awk, essentially every expression has a value. These values are rvalues.
In gawk, a search path is a list of directories to search for awk program source files. In the shell, it is a list of directories to search for executable programs.
For sed, see "Stream Editor."
Short-circuit describes the nature of the awk logical operators && and ||. If the value of the entire expression is determinable from evaluating just the lefthand side of these operators, the righthand side is not evaluated. (See Boolean Expressions.)
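Short-circuit evaluation can be observed by placing a function with a side effect on the righthand side. In this sketch, hit() and the counter calls are hypothetical names; the counter records how many times the righthand side was really evaluated.

```shell
# Illustrative only: short_circuit_demo, hit, and calls are hypothetical names.
short_circuit_demo() {
    awk '
    function hit() { calls++; return 1 }   # counts how often it is evaluated
    BEGIN {
        r1 = (0 && hit())    # left side false: hit() is skipped
        r2 = (1 || hit())    # left side true: hit() is skipped
        r3 = (1 && hit())    # left side true: hit() must run
        print r1, r2, r3, calls + 0
    }'
}
short_circuit_demo
```

Only the third expression needs its righthand side, so the counter ends at 1.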
Single precision is the type used by some very old versions of awk to store numeric values. It is the C type float.
A special file is a file name interpreted internally by gawk, instead of being handed directly to the underlying operating system--for example, /dev/stderr. (See Special File Names in gawk.)
A string is a datum consisting of a sequence of characters, such as "I am a string". Constant strings are written with double quotes in the awk language and may contain escape sequences. (See Escape Sequences.)
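A timestamp is a value in the "seconds since the epoch" format used by Unix and POSIX systems. Timestamps are used by the gawk functions mktime, strftime, and systime. See also "Epoch" and "UTC."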
Copyright © 1989, 1991 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111, USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.
When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.
We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.
Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and modification follow.
Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does.
You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.
These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program.
In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.
The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code.
If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances.
It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.
This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.
Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation.
If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
one line to give the program's name and an idea of what it does.
Copyright (C) year name of author
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111, USA.
Also add information on how to contact you by electronic and paper mail.
If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:
Gnomovision version 69, Copyright (C) year name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program.
You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker.
signature of Ty Coon, 1 April 1989
Ty Coon, President of Vice
This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License.
Copyright (C) 2000 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
The purpose of this License is to make a manual, textbook, or other written document "free" in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.
This License is a kind of "copyleft", which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free
program should come with manuals providing the same freedoms that the
software does. But this License is not limited to software manuals;
it can be used for any textual work, regardless of subject matter or
whether it is published as a printed book. We recommend this License
principally for works whose purpose is instruction or reference.
This License applies to any manual or other work that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. The "Document", below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as "you".
A "Modified Version" of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document's overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (For example, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.
The "Invariant Sections" are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License.
The "Cover Texts" are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License.
A "Transparent" copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, whose contents can be viewed and edited directly and straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup has been designed to thwart or discourage subsequent modification by readers is not Transparent. A copy that is not "Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML designed for human modification. Opaque formats include PostScript, PDF, proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML produced by some word processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the material
this License requires to appear in the title page. For works in
formats which do not have any title page as such, "Title Page" means
the text near the most prominent appearance of the work's title,
preceding the beginning of the body of the text.
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and
you may publicly display copies.
If you publish printed copies of the Document numbering more than 100, and the Document's license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a publicly-accessible computer-network location containing a complete Transparent copy of the Document, free of added material, which the general network-using public has access to download anonymously at no charge using public-standard network protocols. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give
them a chance to provide you with an updated version of the Document.
You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version's license notice. These titles must be distinct from any other section titles.
You may add a section entitled "Endorsements", provided it contains nothing but endorsements of your Modified Version by various parties--for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License
give permission to use their names for publicity for or to assert or
imply endorsement of any Modified Version.
You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice.
The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections entitled "History"
in the various original documents, forming one section entitled
"History"; likewise combine any sections entitled "Acknowledgements",
and any sections entitled "Dedications". You must delete all sections
entitled "Endorsements."
You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute
it individually under this License, provided you insert a copy of this
License into the extracted document, and follow this License in all
other respects regarding verbatim copying of that document.
A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, does not as a whole count as a Modified Version of the Document, provided no compilation copyright is claimed for the compilation. Such a compilation is called an "aggregate", and this License does not apply to the other self-contained works thus compiled with the Document, on account of their being thus compiled, if they are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one quarter
of the entire aggregate, the Document's Cover Texts may be placed on
covers that surround only the Document within the aggregate.
Otherwise they must appear on covers around the whole aggregate.
Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section 4.
Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License provided that you also include the
original English version of this License. In case of a disagreement
between the translation and the original English version of this
License, the original English version will prevail.
You may not copy, modify, sublicense, or distribute the Document except
as expressly provided for under this License. Any other attempt to
copy, modify, sublicense or distribute the Document is void, and will
automatically terminate your rights under this License. However,
parties who have received copies, or rights, from you under this
License will not have their licenses terminated so long as such
parties remain in full compliance.
The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License "or any later version" applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.
To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and
license notices just after the title page:
Copyright (C) year your name.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with the Invariant Sections being list their titles, with the Front-Cover Texts being list, and with the Back-Cover Texts being list. A copy of the license is included in the section entitled "GNU Free Documentation License".
If you have no Invariant Sections, write "with no Invariant Sections" instead of saying which ones are invariant. If you have no Front-Cover Texts, write "no Front-Cover Texts" instead of "Front-Cover Texts being list"; likewise for Back-Cover Texts.
If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.
! (exclamation point), ! operator: Boolean Ops
! (exclamation point), ! operator: Egrep Program, Precedence
! (exclamation point), != operator: Precedence, Typing and Comparison
! (exclamation point), !~ operator: Expression Patterns, Precedence, Typing and Comparison, Regexp Constants, Computed Regexps, Case-sensitivity, Regexp Usage
! operator: Egrep Program, Ranges
" (double quote): Quoting, Read Terminal
" (double quote), regexp constants: Computed Regexps
# (number sign), #! (executable scripts): Executable Scripts
# (number sign), #! (executable scripts), portability issues with: Executable Scripts
# (number sign), commenting: Comments
$ (dollar sign): Regexp Operators
$ (dollar sign), $ field operator: Precedence, Fields
$ (dollar sign), incrementing fields and arrays: Increment Ops
$ field operator: Fields
% (percent sign), % operator: Precedence
% (percent sign), %= operator: Precedence, Assignment Ops
& (ampersand), && operator: Precedence, Boolean Ops
& (ampersand), gsub/gensub/sub functions and: Gory Details
' (single quote): Quoting, Long, One-shot
' (single quote), vs. apostrophe: Comments
' (single quote), with double quotes: Quoting
() (parentheses): Regexp Operators
() (parentheses), pgawk program: Profiling
* (asterisk), * operator, as multiplication operator: Precedence
* (asterisk), * operator, as regexp operator: Regexp Operators
* (asterisk), * operator, null strings, matching: Gory Details
* (asterisk), ** operator: Options, Precedence, Arithmetic Ops
* (asterisk), **= operator: Options, Precedence, Assignment Ops
* (asterisk), *= operator: Precedence, Assignment Ops
+ (plus sign): Regexp Operators
+ (plus sign), + operator: Precedence
+ (plus sign), ++ operator: Precedence, Increment Ops
+ (plus sign), += operator: Precedence, Assignment Ops
+ (plus sign), decrement/increment operators: Increment Ops
, (comma), in range patterns: Ranges
- (hyphen), - operator: Precedence
- (hyphen), -- (decrement/increment) operator: Precedence
- (hyphen), -- operator: Increment Ops
- (hyphen), -= operator: Precedence, Assignment Ops
- (hyphen), filenames beginning with: Options
- (hyphen), in character lists: Character Lists
/
(forward slash): Regexp
/
(forward slash), /
operator: Precedence
/
(forward slash), /=
operator: Precedence, Assignment Ops
/
(forward slash), /=
operator, vs. /=.../
regexp constant: Assignment Ops
/
(forward slash), patterns and: Expression Patterns
/=
operator vs. /=.../
regexp constant: Assignment Ops
/dev/... special files (gawk): Special FD
/inet/ files (gawk): TCP/IP Networking
/p files (gawk): Portal Files
; (semicolon): Statements/Lines
; (semicolon), AWKPATH variable and: PC Using
; (semicolon), separating statements in actions: Statements, Action Overview
< (left angle bracket), < operator: Precedence, Typing and Comparison
< (left angle bracket), < operator (I/O): Getline/File
< (left angle bracket), <= operator: Precedence, Typing and Comparison
= (equals sign), = operator: Assignment Ops
= (equals sign), == operator: Precedence, Typing and Comparison
> (right angle bracket), > operator: Precedence, Typing and Comparison
> (right angle bracket), > operator (I/O): Redirection
> (right angle bracket), >= operator: Precedence, Typing and Comparison
> (right angle bracket), >> operator (I/O): Precedence, Redirection
? (question mark): GNU Regexp Operators, Regexp Operators
? (question mark), ?: operator: Precedence
[] (square brackets): Regexp Operators
\ (backslash): Regexp Operators, Quoting, Comments, Read Terminal
\ (backslash), \" escape sequence: Escape Sequences
\ (backslash), \' operator (gawk): GNU Regexp Operators
\ (backslash), \/ escape sequence: Escape Sequences
\ (backslash), \< operator (gawk): GNU Regexp Operators
\ (backslash), \> operator (gawk): GNU Regexp Operators
\ (backslash), \` operator (gawk): GNU Regexp Operators
\ (backslash), \a escape sequence: Escape Sequences
\ (backslash), \b escape sequence: Escape Sequences
\ (backslash), \B operator (gawk): GNU Regexp Operators
\ (backslash), \f escape sequence: Escape Sequences
\ (backslash), \n escape sequence: Escape Sequences
\ (backslash), \nnn escape sequence: Escape Sequences
\ (backslash), \r escape sequence: Escape Sequences
\ (backslash), \t escape sequence: Escape Sequences
\ (backslash), \v escape sequence: Escape Sequences
\ (backslash), \W operator (gawk): GNU Regexp Operators
\ (backslash), \w operator (gawk): GNU Regexp Operators
\ (backslash), \x escape sequence: Escape Sequences
\ (backslash), \y operator (gawk): GNU Regexp Operators
\ (backslash), as field separators: Command Line Field Separator
\ (backslash), continuing lines and: Egrep Program, Statements/Lines
\ (backslash), continuing lines and, comments and: Statements/Lines
\ (backslash), continuing lines and, in csh: Statements/Lines, More Complex
\ (backslash), gsub/gensub/sub functions and: Gory Details
\ (backslash), in character lists: Character Lists
\ (backslash), in escape sequences: Escape Sequences
\ (backslash), in escape sequences, POSIX and: Escape Sequences
\ (backslash), regexp constants: Computed Regexps
^ (caret): GNU Regexp Operators, Regexp Operators
^ (caret), ^ operator: Options, Precedence
^ (caret), ^= operator: Options, Precedence, Assignment Ops
^ (caret), in character lists: Character Lists
_ (underscore), _ C macro: Explaining gettext
_ (underscore), in names of private variables: Library Names
_ (underscore), translatable string: Programmer i18n
_gr_init user-defined function: Group Functions
_pw_init user-defined function: Passwd Functions
alarm.awk program: Alarm Program
awk assembler (aaa): Glossary
awf): Glossary
/= operator vs. /=.../ regexp constant: Assignment Ops
ampersand (&), && operator: Boolean Ops
ampersand (&), && operator: Precedence
ampersand (&), gsub/gensub/sub functions and: Gory Details
asterisk (*), * operator, as multiplication operator: Precedence
asterisk (*), * operator, as regexp operator: Regexp Operators
asterisk (*), * operator, null strings, matching: Gory Details
asterisk (*), ** operator: Options, Precedence, Arithmetic Ops
asterisk (*), **= operator: Options, Precedence, Assignment Ops
asterisk (*), *= operator: Precedence, Assignment Ops
backslash (\): Regexp Operators, Quoting, Comments, Read Terminal
backslash (\), \" escape sequence: Escape Sequences
backslash (\), \' operator (gawk): GNU Regexp Operators
backslash (\), \/ escape sequence: Escape Sequences
backslash (\), \< operator (gawk): GNU Regexp Operators
backslash (\), \> operator (gawk): GNU Regexp Operators
backslash (\), \` operator (gawk): GNU Regexp Operators
backslash (\), \a escape sequence: Escape Sequences
backslash (\), \b escape sequence: Escape Sequences
backslash (\), \B operator (gawk): GNU Regexp Operators
backslash (\), \f escape sequence: Escape Sequences
backslash (\), \n escape sequence: Escape Sequences
backslash (\), \nnn escape sequence: Escape Sequences
backslash (\), \r escape sequence: Escape Sequences
backslash (\), \t escape sequence: Escape Sequences
backslash (\), \v escape sequence: Escape Sequences
backslash (\), \W operator (gawk): GNU Regexp Operators
backslash (\), \w operator (gawk): GNU Regexp Operators
backslash (\), \x escape sequence: Escape Sequences
backslash (\), \y operator (gawk): GNU Regexp Operators
backslash (\), as field separators: Command Line Field Separator
backslash (\), continuing lines and: Egrep Program, Statements/Lines
backslash (\), continuing lines and, comments and: Statements/Lines
backslash (\), continuing lines and, in csh: Statements/Lines, More Complex
backslash (\), gsub/gensub/sub functions and: Gory Details
backslash (\), in character lists: Character Lists
backslash (\), in escape sequences: Escape Sequences
backslash (\), in escape sequences, POSIX and: Escape Sequences
backslash (\), regexp constants: Computed Regexps
BBS-list file: Sample Data Files
awk extensions: BTL
bindtextdomain function (C library): Explaining gettext
bindtextdomain function (gawk): Programmer i18n, I18N Functions
bindtextdomain function (gawk), portability and: I18N Portability
BINMODE variable: PC Using, User-modified
bits2str user-defined function: Bitwise Functions
braces ({}), actions and: Action Overview
braces ({}), pgawk program: Profiling
braces ({}), statements, grouping: Statements
break statement: Break Statement
caret (^): GNU Regexp Operators, Regexp Operators
caret (^), ^ operator: Options, Precedence
caret (^), ^= operator: Options, Precedence, Assignment Ops
caret (^), in character lists: Character Lists
gawk: Case-sensitivity
chdir function, implementing in gawk: Sample Library
chem utility: Glossary
chr user-defined function: Ordinal Functions
cliff_rand user-defined function: Cliff Random Function
close function: I/O Functions, Close Files And Pipes, Getline/Pipe, Getline/Variable/File
close function, return values: Close Files And Pipes
close function, two-way pipes and: Two-way I/O
compl function (gawk): Bitwise Functions
--disable-nls: Additional Configuration Options
--enable-portals: Additional Configuration Options
--with-included-gettext: Additional Configuration Options, Gawk I18N
gawk: Additional Configuration Options
do-while statement: Do Statement, Regexp Usage
awk programs: Library Names
dollar sign ($): Regexp Operators
dollar sign ($), $ field operator: Precedence, Fields
dollar sign ($), incrementing fields and arrays: Increment Ops
double quote ("): Quoting, Read Terminal
double quote ("), regexp constants: Computed Regexps
bug-gawk@gnu.org: Bugs
EMISTERED: TCP/IP Networking
equals sign (=), = operator: Assignment Ops
equals sign (=), == operator: Precedence, Typing and Comparison
ERRNO variable: Internals, Auto-set, Getline
ERRNO variable and: Auto-set
gsub/gensub/sub functions: Gory Details
exclamation point (!), ! operator: Egrep Program, Precedence, Boolean Ops
exclamation point (!), != operator: Precedence, Typing and Comparison
exclamation point (!), !~ operator: Expression Patterns, Precedence, Typing and Comparison, Regexp Constants, Computed Regexps, Case-sensitivity, Regexp Usage
exit statement: Exit Statement
exp function: Numeric Functions
expand utility: Very Simple
extension function (gawk): Using Internal File Ops
awk: BTL
gawk, not in POSIX awk: POSIX/GNU
mawk: Other Versions
extract.awk program: Extract Program
gawk: Adding Code
fflush function: I/O Functions
fflush function, unsupported: Options
AWKNUM internal type: Internals
FNR variable: Auto-set, Records
FNR variable, changing: Auto-set
for statement: For Statement
for statement, in arrays: Scanning an Array
force_number internal function: Internals
force_string internal function: Internals
printf statement: Control Letters
strftime function (gawk): Time Functions
forward slash (/): Regexp
forward slash (/), / operator: Precedence
forward slash (/), /= operator: Precedence, Assignment Ops
forward slash (/), /= operator, vs. /=.../ regexp constant: Assignment Ops
forward slash (/), patterns and: Expression Patterns
free_temp internal macro: Internals
FS variable: User-modified, Field Separators
FS variable, --field-separator option and: Options
FS variable, as TAB character: Options
FS variable, changing value of: Known Bugs, Field Separators
FS variable, running awk programs and: Cut Program
FS variable, setting from command line: Command Line Field Separator
awk, See gawk: Preface
grcat program: Group Functions
histsort.awk program: History Sorting
HUP signal: Profiling
hyphen (-), - operator: Precedence
hyphen (-), -- (decrement/increment) operators: Precedence
hyphen (-), -- operator: Increment Ops
hyphen (-), -= operator: Precedence, Assignment Ops
hyphen (-), filenames beginning with: Options
hyphen (-), in character lists: Character Lists
id utility: Id Program
id.awk program: Id Program
if statement: If Statement, Regexp Usage
if statement, actions, changing: Ranges
igawk.sh program: Igawk Program
IGNORECASE variable: User-modified, Case-sensitivity
IGNORECASE variable, array sorting and: Array Sorting
IGNORECASE variable, array subscripts and: Array Intro
IGNORECASE variable, in example programs: Library Functions
gawk: Notes
gawk, debugging: Compatibility Mode
gawk, limits: Redirection, Getline Notes
in operator: Id Program, Precedence, Typing and Comparison
in operator, arrays and: Scanning an Array, Reference to Elements
index function: String Functions
int function: Numeric Functions
INT signal (MS-DOS): Profiling
inventory-shipped file: Sample Data Files
join user-defined function: Join Function
kill command, dynamic profiling: Profiling
labels.awk program: Labels Program
left angle bracket (<), < operator: Precedence, Typing and Comparison
left angle bracket (<), < operator (I/O): Getline/File
left angle bracket (<), <= operator: Precedence, Typing and Comparison
length function: String Functions
ls utility: More Complex
lshift function (gawk): Bitwise Functions
make_builtin internal function: Internals
make_number internal function: Internals
make_string internal function: Internals
match function: String Functions
match function, RSTART/RLENGTH variables: String Functions
mawk program: Other Versions
mktime function (gawk): Time Functions
msgfmt utility: I18N Example
nawk utility: Names
NR variable: Auto-set, Records
NR variable, changing: Auto-set
number sign (#), #! (executable scripts): Executable Scripts
number sign (#), #! (executable scripts), portability issues with: Executable Scripts
number sign (#), commenting: Comments
AWKNUM internal type: Internals
NODE internal type: Internals
oawk utility: Names
OFMT variable: User-modified, Conversion, OFMT
OFMT variable, POSIX awk and: OFMT
OFS variable: User-modified, Output Separators, Changing Fields
gawk on: PC Using
gawk on, installing: PC Installation
gawk to: New Ports
gawk): GNU Regexp Operators
param_cnt internal variable: Internals
parentheses (): Regexp Operators
parentheses (), pgawk program: Profiling
gawk on: PC Using
gawk on, installing: PC Installation
percent sign (%), % operator: Precedence
percent sign (%), %= operator: Precedence, Assignment Ops
period (.): Regexp Operators
pgawk program: Profiling
pgawk program, awkprof.out file: Profiling
pgawk program, dynamic profiling: Profiling
plus sign (+): Regexp Operators
plus sign (+), + operator: Precedence
plus sign (+), ++ operator: Precedence, Increment Ops
plus sign (+), += operator: Precedence, Assignment Ops
plus sign (+), decrement/increment operators: Increment Ops
printf statement: Printf Ordering, Format Modifiers
printf statement, mixing with regular formats: Printf Ordering
awk: Clones
POSIXLY_CORRECT environment variable: Options
PROCINFO array: Group Functions, Passwd Functions, Auto-set, Special Caveats
question mark (?): GNU Regexp Operators, Regexp Operators
question mark (?), ?: operator: Precedence
QUIT signal (MS-DOS): Profiling
rand function: Numeric Functions
rand/srand functions: Numeric Functions
/=.../, /= operator and: Assignment Ops
gawk: Using Constant Regexps
gawk, command-line options: GNU Regexp Operators
gawk: GNU Regexp Operators
return statement, user-defined functions: Return Statement
close function: Close Files And Pipes
rev user-defined function: Function Example
rewind user-defined function: Rewind Function
right angle bracket (>), > operator: Precedence, Typing and Comparison
right angle bracket (>), > operator (I/O): Redirection
right angle bracket (>), >= operator: Precedence, Typing and Comparison
right angle bracket (>), >> operator (I/O): Precedence, Redirection
RLENGTH variable: Auto-set
RLENGTH variable, match function and: String Functions
round user-defined function: Round Function
RS variable: User-modified, Records
RS variable, multiline records and: Multiple Line
rshift function (gawk): Bitwise Functions
RSTART variable: Auto-set
RSTART variable, match function and: String Functions
RT variable: Auto-set, Multiple Line, Records
sed utility: Glossary, Igawk Program, Simple Sed, Field Splitting Summary
semicolon (;): Statements/Lines
semicolon (;), AWKPATH variable and: PC Using
semicolon (;), separating statements in actions: Statements, Action Overview
FIELDWIDTHS variable and: User-modified
set_value internal function: Internals
sin function: Numeric Functions
single quote ('): Quoting, Long, One-shot
single quote ('), vs. apostrophe: Comments
single quote ('), with double quotes: Quoting
split function: String Functions
split function, array elements, deleting: Delete
split utility: Split Program
split.awk program: Split Program
sprintf function: String Functions, OFMT
sprintf function, OFMT variable and: User-modified
sprintf function, print/printf statements and: Round Function
sqrt function: Numeric Functions
square brackets ([]): Regexp Operators
srand function: Numeric Functions
stat function, implementing in gawk: Sample Library
stlen internal variable: Internals
stptr internal variable: Internals
strftime function (gawk): Time Functions
NODE internal type: Internals
--non-decimal-data option: Options
-F option: Known Bugs
== operator: Typing and Comparison
awk uses FS not IFS: Field Separators
printf format strings: Format Modifiers
fflush function: I/O Functions
gawk: Compatibility Mode, Known Bugs
gawk, bug reports: Bugs
gawk, fatal errors, function arguments: Calling Built-in
getline function: File Checking
gsub/sub functions: String Functions
match function: String Functions
print statement, omitting commas: Print Examples
substr function: String Functions
system function: I/O Functions
type internal variable: Internals
underscore (_), _ C macro: Explaining gettext
underscore (_), in names of private variables: Library Names
underscore (_), translatable string: Programmer i18n
uniq utility: Uniq Program
uniq.awk program: Uniq Program
awk, backslashes in escape sequences: Escape Sequences
awk, close function and: Close Files And Pipes
awk, password files, field separators and: Command Line Field Separator
awk scripts and: Executable Scripts
update_ERRNO internal function: Internals
USR1 signal: Profiling
-v option, setting with: Options
getline command into, using: Getline/Variable/Coprocess, Getline/Variable/Pipe, Getline/Variable/File, Getline/Variable
vertical bar (|): Regexp Operators
vertical bar (|), | operator (I/O): Precedence, Getline/Pipe
vertical bar (|), |& I/O operator (I/O): Two-way I/O
vertical bar (|), |& operator (I/O): Precedence, Getline/Coprocess
vertical bar (|), |& operator (I/O), two-way communications: Portal Files
vertical bar (|), || operator: Precedence, Boolean Ops
vname internal variable: Internals
w utility: Constant Size
wc utility: Wc Program
wc.awk program: Wc Program
while statement: While Statement, Regexp Usage
gawk): GNU Regexp Operators
wordfreq.awk program: Word Sorting
xgettext utility: String Extraction
xor function (gawk): Bitwise Functions
{} (braces), actions and: Action Overview
{} (braces), pgawk program: Profiling
{} (braces), statements, grouping: Statements
| (vertical bar): Regexp Operators
| (vertical bar), | operator (I/O): Precedence, Redirection, Getline/Pipe
| (vertical bar), |& operator (I/O): Two-way I/O, Precedence, Redirection, Getline/Coprocess
| (vertical bar), |& operator (I/O), pipes, closing: Close Files And Pipes
| (vertical bar), |& operator (I/O), two-way communications: Portal Files
| (vertical bar), || operator: Precedence, Boolean Ops
~ (tilde), ~ operator: Expression Patterns, Precedence, Typing and Comparison, Regexp Constants, Computed Regexps, Case-sensitivity
These commands are available on POSIX-compliant systems, as well as on traditional Unix-based systems. If you are using some other operating system, you still need to be familiar with the ideas of I/O redirection and pipes.
Often, these systems use gawk for their awk implementation!
All such differences appear in the index under the entry "differences in awk and gawk."
GNU stands for "GNU's not Unix."
The terminology "GNU/Linux" is explained in the Glossary.
Although we generally recommend the use of single quotes around the program text, double quotes are needed here in order to put the single quote into the message.
The #! mechanism works on Linux systems, systems derived from the 4.4-Lite Berkeley Software Distribution, and most commercial Unix systems.
The line beginning with #! lists the full file name of an interpreter to run and an optional initial command-line argument to pass to that interpreter. The operating system then runs the interpreter with the given argument and the full argument list of the executed program. The first argument in the list is the full file name of the awk program. The rest of the argument list contains either options to awk, or data files, or both.
In the C shell (csh), you need to type a semicolon and then a backslash at the end of the first line; see awk Statements Versus Lines, for an explanation. In a POSIX-compliant shell, such as the Bourne shell or bash, you can type the example as shown. If the command echo $path produces an empty output line, you are most likely using a POSIX-compliant shell. Otherwise, you are probably using the C shell or a shell derived from it.
On some very old systems, you may need to use ls -lg to get this output.
The ? and : referred to here are the operators of the three-operand conditional expression described in Conditional Expressions. Splitting lines after ? and : is a minor gawk extension; if --posix is specified (see Command-Line Options), then this extension is disabled.
In other literature, you may see a character list referred to as either a character set, a character class, or a bracket expression.
Use two backslashes if you're using a string constant with a regexp operator or function.
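As a minimal sketch of the point above, the following compares a regexp constant with the equivalent string constant. The sample strings and the function name regexp_escape_demo are made up for illustration only.

```shell
# regexp_escape_demo is a hypothetical helper name used only in this sketch.
regexp_escape_demo() {
    awk 'BEGIN {
        s = "a.b"
        # Regexp constant: a single backslash makes the dot literal.
        print (s ~ /a\.b/)
        # String constant: the string is scanned first, so "\\." becomes \.
        print (s ~ "a\\.b")
        # The escaped dot no longer matches an arbitrary character.
        print ("axb" ~ "a\\.b")
    }'
}
regexp_escape_demo    # prints 1, 1, and 0 on separate lines
```

The string form needs the extra backslash because awk processes escape sequences in the string before the result is used as a regexp.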
Experienced C and C++ programmers will note that it is possible, using something like IGNORECASE = 1 && /foObAr/ { ... } and IGNORECASE = 0 || /foobar/ { ... }. However, this is somewhat obscure and we don't recommend it.
At least that we know about.
In POSIX awk, newlines are not considered whitespace for separating fields.
The sed utility is a "stream editor." Its behavior is also defined by the POSIX standard.
Older versions of gawk would interpret these names internally only if the system did not actually have a /dev/fd directory or any of the other special files listed earlier. Usually this didn't make a difference, but sometimes it did; thus, it was decided to make gawk's behavior consistent on all systems and to have it always interpret the special file names itself.
The technical terminology is rather morbid. The finished child is called a "zombie," and cleaning up after it is referred to as "reaping."
The internal representation of all numbers, including integers, uses double-precision floating-point numbers. On most modern systems, these are in IEEE 754 standard format.
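One visible consequence of storing every number as a double can be sketched as follows, assuming an awk built on IEEE 754 doubles (as the note says most modern systems are). The function name double_precision_demo is hypothetical.

```shell
# double_precision_demo is a hypothetical helper name used only in this sketch.
double_precision_demo() {
    awk 'BEGIN {
        big = 2^53            # largest range in which every integer is exact
        above = big + 1       # not representable; rounds back to 2^53
        below = big - 1       # still exactly representable
        print (above == big)
        print (below == big)
    }'
}
double_precision_demo    # prints 1 then 0 with IEEE 754 doubles
```

Integers up to 2^53 are therefore safe to count with, but larger values silently lose their low-order digits.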
Pathological cases can require up to 752 digits (!), but we doubt that you need to worry about this.
The POSIX standard is under revision. The revised standard's rules for typing and comparison are the same as just described for gawk.
The original version of awk used to keep reading and ignoring input until the end of the file was seen.
In POSIX awk, newline does not count as whitespace.
Some early implementations of Unix awk initialized FILENAME to "-", even if there were data files to be processed. This behavior was incorrect and should not be relied upon in your programs.
Thanks to Michael Brennan for pointing this out.
The C version of rand is known to produce fairly poor sequences of random numbers. However, nothing requires that an awk implementation use the C rand to implement the awk version of rand. In fact, gawk uses the BSD random function, which is considerably better than rand, to produce random numbers.
Computer-generated random numbers really are not truly random. They are technically known as "pseudorandom." This means that while the numbers in a sequence appear to be random, you can in fact generate the same sequence of random numbers over and over again.
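The repeatability described above can be sketched by seeding the generator twice with the same value. The seed 42 and the function name same_seed_demo are arbitrary choices for illustration.

```shell
# same_seed_demo is a hypothetical helper name used only in this sketch.
same_seed_demo() {
    awk 'BEGIN {
        srand(42); a1 = rand(); a2 = rand()   # first run of the sequence
        srand(42); b1 = rand(); b2 = rand()   # reseed and run it again
        print (a1 == b1 && a2 == b2)
    }'
}
same_seed_demo    # prints 1: the same seed reproduces the same numbers
```

The actual values of rand differ between awk implementations, so the sketch compares the two runs rather than printing them.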
Unless you use the --non-decimal-data option, which isn't recommended. See Allowing Nondecimal Input Data, for more information.
This is different from C and C++, in which the first character is number zero.
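A short sketch of one-based character positions, using the standard index and substr functions; the string "foobar" and the function name first_char_demo are only illustrative.

```shell
# first_char_demo is a hypothetical helper name used only in this sketch.
first_char_demo() {
    awk 'BEGIN {
        s = "foobar"
        print index(s, "f")     # position of the first character
        print substr(s, 1, 3)   # three characters starting at position 1
    }'
}
first_char_demo    # prints 1 then foo
```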
This consequence was certainly unintended.
As this Web page was being finalized, we learned that the POSIX standard will not use these rules. However, it was too late to change gawk for the 3.1 release. gawk behaves as described here.
A program is interactive if the standard output is connected to a terminal device.
See Glossary, especially the entries "Epoch" and "UTC."
The GNU date utility can also do many of the things described here. Its use may be preferable for simple time-related operations in shell scripts.
Occasionally there are minutes in a year with a leap second, which is why the seconds can go up to 60.
As this is a recent standard, not every system's strftime necessarily supports all of the conversions listed here.
If you don't understand any of this, don't worry about it; these facilities are meant to make it easier to "internationalize" programs. Other internationalization features are described in Internationalization with gawk.
This is because ISO C leaves the behavior of the C version of strftime undefined and gawk uses the system's version of strftime if it's there. Typically, the conversion specifier either does not appear in the returned string or appears literally.
This example shows that 0's come in on the left side. For gawk, this is always true, but in some languages, it's possible to have the left side fill with 1's. Caveat emptor.
For some operating systems, the gawk port doesn't support GNU gettext. This applies most notably to the PC operating systems. As such, these features are not available if you are using one of those operating systems. Sorry.
Americans use a comma every three decimal places and a period for the decimal point, while many Europeans do exactly the opposite: 1,234.56 versus 1.234,56.
Starting with gettext version 0.11.1, the xgettext utility that comes with GNU gettext can handle .awk files.
This example is borrowed from the GNU gettext manual.
This is good fodder for an "Obfuscated awk" contest.
Perhaps it would be better if it were called "Hippy." Ah, well.
This is very different from the same operator in the C shell, csh.
Not recommended.
Your version of gawk may use a different directory; it will depend upon how gawk was built and installed. The actual directory is the value of $(datadir) generated when gawk was configured. You probably don't need to worry about this, though.
The effects are not identical. Output of the transformed record will be in all lowercase, while IGNORECASE preserves the original contents of the input record.
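The difference can be sketched with a rule that folds each record to lowercase before matching. This assumes the transformation in question is an assignment of tolower($0) back to $0; the input line and the function name lowercase_rule_demo are made up for illustration.

```shell
# lowercase_rule_demo is a hypothetical helper name used only in this sketch.
lowercase_rule_demo() {
    printf 'Hello World\n' | awk '
        { $0 = tolower($0) }    # fold the whole record to lowercase
        /hello/ { print }       # matches, but prints the folded record
    '
}
lowercase_rule_demo    # prints "hello world", not "Hello World"
```

With IGNORECASE set in gawk instead, the pattern would still match but the record would be printed with its original capitalization.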
While all the library routines could have been rewritten to use this convention, this was not done, in order to show how my own awk programming style has evolved and to provide some basis for this discussion.
gawk's --dump-variables command-line option is useful for verifying this.
http://mathworld.wolfram.com/CliffRandomNumberGenerator.html
ASCII has been extended in many countries to use the values from 128 to 255 for country-specific characters. If your system uses these extensions, you can simplify _ord_init to simply loop from 0 to 255.
It would be nice if awk had an assignment operator for concatenation. The lack of an explicit operator for concatenation makes string operations more difficult than they really need to be.
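Without such an operator, appending to a string means repeating the variable name on the right-hand side. A small sketch (the variable and function names are illustrative only):

```shell
# append_demo is a hypothetical helper name used only in this sketch.
append_demo() {
    awk 'BEGIN {
        s = "a"
        s = s "b"    # concatenation is written by juxtaposition
        s = s "c"    # there is no shorthand such as an append-assign operator
        print s
    }'
}
append_demo    # prints abc
```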
This function was written before gawk acquired the ability to split strings into single characters using "" as the separator. We have left it alone, since using substr is more portable.
It is often the case that password information is stored in a network database.
It also introduces a subtle bug; if a match happens, we output the translated line, not the original.
wc can't just use the value of FNR in endfile. If you examine the code in Noting Data File Boundaries you will see that FNR has already been reset by the time endfile is called.
On some older System V systems, tr may require that the lists be written as range expressions enclosed in square brackets ([a-z]) and quoted, to prevent the shell from attempting a file name expansion. This is not a feature.
This program was written before gawk acquired the ability to split each character in a string into separate array elements.
"Real world" is defined as "a program actually used to get something done."
On some very old versions of awk, the test getline junk < t can loop forever if the file exists but is empty. Caveat emptor.
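For reference, a getline test of this kind can be sketched as below. It assumes a current awk, in which getline returns a positive value when a record is read and a negative value when the file cannot be opened. The temporary file, the variable names f and missing, and the function name getline_test_demo are all made up for illustration.

```shell
# getline_test_demo is a hypothetical helper name used only in this sketch.
getline_test_demo() {
    tmp=$(mktemp) || return 1
    printf 'data\n' > "$tmp"
    awk -v f="$tmp" -v missing="$tmp.missing" 'BEGIN {
        print ((getline junk < f) > 0)         # 1: a record was read
        print ((getline junk < missing) < 0)   # 1: the file cannot be opened
    }'
    rm -f "$tmp"
}
getline_test_demo
```

The result of getline is compared explicitly rather than used directly as a loop condition, because a negative error return would otherwise count as true.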
This version is edited slightly for presentation. The complete version can be found in extension/filefuncs.c in the gawk distribution.
Compiled programs are typically written in lower-level languages such as C, C++, Fortran, or Ada, and then translated, or compiled, into a form that the computer can execute directly.