11.13 Limitations of Usual Tools
The small set of tools you can expect to find on any machine can still include some limitations you should be aware of.
-
awk
-
Don't leave white space before the opening parenthesis in a user function call. Posix does not allow this and GNU Awk rejects it:
$ gawk 'function die () { print "Aaaaarg!" }
        BEGIN { die () }'
gawk: cmd. line:2:         BEGIN { die () }
gawk: cmd. line:2:                      ^ parse error
$ gawk 'function die () { print "Aaaaarg!" }
        BEGIN { die() }'
Aaaaarg!
Posix says that if a program contains only ‘BEGIN’ actions, and contains no instances of getline, then the program merely executes the actions without reading input. However, traditional Awk implementations (such as Solaris 10 awk) read and discard input in this case. Portable scripts can redirect input from ‘/dev/null’ to work around the problem. For example:
awk 'BEGIN {print "hello world"}' </dev/null
Posix says that in an ‘END’ action, ‘$NF’ (and presumably, ‘$1’) retain their value from the last record read, if no intervening ‘getline’ occurred. However, some implementations (such as Solaris 10 ‘/usr/bin/awk’, ‘nawk’, or Darwin ‘awk’) reset these variables. A workaround is to use an intermediate variable prior to the ‘END’ block. For example:
$ cat end.awk
{ tmp = $1 }
END { print "a", $1, $NF, "b", tmp }
$ echo 1 | awk -f end.awk
a   b 1
$ echo 1 | gawk -f end.awk
a 1 1 b 1
If you want your program to be deterministic, don't depend on the order in which for traverses arrays:
$ cat for.awk
END {
  arr["foo"] = 1
  arr["bar"] = 1
  for (i in arr)
    print i
}
$ gawk -f for.awk </dev/null
foo
bar
$ nawk -f for.awk </dev/null
bar
foo
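One way to get a reproducible order is to sort the keys outside Awk. The following is only a sketch: it assumes a sort utility is available and that the keys contain no newlines, and the array name and keys are purely illustrative.

```shell
# Print the array keys in a reproducible order by sorting them outside Awk.
# The array contents are illustrative; LC_ALL=C pins the collation order.
sorted_keys=`awk 'END {
  arr["foo"] = 1
  arr["bar"] = 1
  for (i in arr)
    print i
}' </dev/null | LC_ALL=C sort`
echo "$sorted_keys"
```

Sorting in the C locale keeps the result independent of the user's locale settings.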
Some Awk implementations, such as HP-UX 11.0's native one, mishandle anchors:
$ echo xfoo | $AWK '/foo|^bar/ { print }'
$ echo bar | $AWK '/foo|^bar/ { print }'
bar
$ echo xfoo | $AWK '/^bar|foo/ { print }'
xfoo
$ echo bar | $AWK '/^bar|foo/ { print }'
bar
Either do not depend on such patterns (i.e., use ‘/^(.*foo|bar)/’), or use a simple test to reject such implementations.
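Such a test might look like the following sketch. It reuses the first failing case from the transcript above; the variable name awk_anchors_ok and the default of plain awk for AWK are assumptions made for illustration.

```shell
# Reject an Awk that mishandles anchors inside alternations.
# AWK defaults to plain "awk" here; that default is an assumption.
: ${AWK=awk}
awk_out=`echo xfoo | $AWK '/foo|^bar/ { print }'`
if test "$awk_out" = xfoo; then
  awk_anchors_ok=yes
else
  awk_anchors_ok=no
fi
echo "awk_anchors_ok=$awk_anchors_ok"
```

A script could then refuse to continue, or look for another Awk, when the result is ‘no’.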
On ‘ia64-hp-hpux11.23’, Awk mishandles printf conversions after %u:
$ awk 'BEGIN { printf "%u %d\n", 0, -1 }'
0 0
AIX version 5.2 has an arbitrary limit of 399 on the length of regular expressions and literal strings in an Awk program.
Traditional Awk implementations derived from Unix version 7, such as Solaris /bin/awk, have many limitations and do not conform to Posix. Nowadays AC_PROG_AWK (see section Particular Program Checks) finds you an Awk that doesn't have these problems, but if for some reason you prefer not to use AC_PROG_AWK you may need to address them.
Traditional Awk does not support multidimensional arrays or user-defined functions.
Traditional Awk does not support the ‘-v’ option. You can use assignments after the program instead, e.g., ‘$AWK '{print v $1}' v=x’; however, don't forget that such assignments are not evaluated until they are encountered (e.g., after any BEGIN action).
Traditional Awk does not support the keywords delete or do.
Traditional Awk does not support the expressions ‘a?b:c’, ‘!a’, ‘a^b’, or ‘a^=b’.
Traditional Awk does not support the predefined CONVFMT variable.
Traditional Awk supports only the predefined functions exp, index, int, length, log, split, sprintf, sqrt, and substr.
Traditional Awk getline is not at all compatible with Posix; avoid it.
Traditional Awk has ‘for (i in a) …’ but no other uses of the in keyword. For example, it lacks ‘if (i in a) …’.
In code portable to both traditional and modern Awk, FS must be a string containing just one ordinary character, and similarly for the field-separator argument to split.
Traditional Awk has a limit of 99 fields in a record. Some Awk implementations, like Tru64's, split the input even if you don't refer to any field in the script; to circumvent this problem, set ‘FS’ to an unusual character and use split.
Traditional Awk has a limit of at most 99 bytes in a number formatted by OFMT; for example, ‘OFMT="%.300e"; print 0.1;’ typically dumps core.
The original version of Awk had a limit of at most 99 bytes per split field, 99 bytes per substr substring, and 99 bytes per run of non-special characters in a printf format, but these bugs have been fixed on all practical hosts that we know of.
HP-UX 11.00 and IRIX 6.5 Awk require that input files have a line length of at most 3070 bytes.
-
basename
-
Not all hosts have a working basename. You can use expr instead.
-
cat
-
Don't rely on any option.
-
cc
-
The command ‘cc -c foo.c’ traditionally produces an object file named ‘foo.o’. Most compilers allow ‘-c’ to be combined with ‘-o’ to specify a different object file name, but Posix does not require this combination and a few compilers lack support for it. See section C Compiler Characteristics, for how GNU Make tests for this feature with AC_PROG_CC_C_O.
When a compilation such as ‘cc -o foo foo.c’ fails, some compilers (such as CDS on Reliant Unix) leave a ‘foo.o’.
HP-UX cc doesn't accept ‘.S’ files to preprocess and assemble. ‘cc -c foo.S’ appears to succeed, but in fact does nothing.
The default executable, produced by ‘cc foo.c’, can be
- ‘a.out’: usual Posix convention.
- ‘b.out’: i960 compilers (including gcc).
- ‘a.exe’: DJGPP port of gcc.
- ‘a_out.exe’: GNV cc wrapper for DEC C on OpenVMS.
- ‘foo.exe’: various MS-DOS compilers.
The C compiler's traditional name is cc, but other names like gcc are common. Posix 1003.1-2001 specifies the name c99, but older Posix editions specified c89 and anyway these standard names are rarely used in practice. Typically the C compiler is invoked from makefiles that use ‘$(CC)’, so the value of the ‘CC’ make variable selects the compiler name.
-
chgrp
-
chown
-
It is not portable to change a file's group to a group that the owner does not belong to.
-
chmod
-
Avoid usages like ‘chmod -w file’; use ‘chmod a-w file’ instead, for two reasons. First, plain ‘-w’ does not necessarily make the file unwritable, since it does not affect mode bits that correspond to bits in the file mode creation mask. Second, Posix says that the ‘-w’ might be interpreted as an implementation-specific option, not as a mode; Posix suggests using ‘chmod -- -w file’ to avoid this confusion, but unfortunately ‘--’ does not work on some older hosts.
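As a small illustration of the recommended form, the sketch below clears the write bits for everyone with an explicit ‘a-w’ and then reads the mode back from ‘ls -l’. The file name perm.tmp is hypothetical, and the starting mode is set explicitly so that the result does not depend on the file mode creation mask.

```shell
# Remove write permission for everyone with an explicit "a-w" rather than
# a bare "-w". The file name perm.tmp is illustrative only.
: >perm.tmp
chmod 666 perm.tmp
chmod a-w perm.tmp
# Keep only the first ten characters of the listing (file type and mode bits).
mode=`ls -l perm.tmp | sed 's/^\(..........\).*/\1/'`
rm -f perm.tmp
echo "$mode"
```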
-
cmp
-
cmp performs a raw data comparison of two files, while diff compares two text files. Therefore, if you might compare DOS files, even if only checking whether two files are different, use diff to avoid spurious differences due to differences of newline encoding.
-
cp
-
Avoid the ‘-r’ option, since Posix 1003.1-2004 marks it as obsolescent and its behavior on special files is implementation-defined. Use ‘-R’ instead. On GNU hosts the two options are equivalent, but on Solaris hosts (for example) ‘cp -r’ reads from pipes instead of replicating them.
Some cp implementations (e.g., BSD/OS 4.2) do not allow trailing slashes at the end of nonexistent destination directories. To avoid this problem, omit the trailing slashes. For example, use ‘cp -R source /tmp/newdir’ rather than ‘cp -R source /tmp/newdir/’ if ‘/tmp/newdir’ does not exist.
The ancient SunOS 4 cp does not support ‘-f’, although its mv does.
Traditionally, file timestamps had 1-second resolution, and ‘cp -p’ copied the timestamps exactly. However, many modern file systems have timestamps with 1-nanosecond resolution. Unfortunately, ‘cp -p’ implementations truncate timestamps when copying files, so this can result in the destination file appearing to be older than the source. The exact amount of truncation depends on the resolution of the system calls that cp uses; traditionally this was utime, which has 1-second resolution, but some newer cp implementations use utimes, which has 1-microsecond resolution. These newer implementations include GNU Core Utilities 5.0.91 or later, and Solaris 8 (sparc) patch 109933-02 or later. Unfortunately as of January 2006 there is still no system call to set timestamps to the full nanosecond resolution.
Bob Proulx notes that ‘cp -p’ always tries to copy ownerships. But whether it actually does copy ownerships or not is a system dependent policy decision implemented by the kernel. If the kernel allows it then it happens. If the kernel does not allow it then it does not happen. It is not something cp itself has control over.
In Unix System V any user can chown files to any other user, and System V also has a non-sticky ‘/tmp’. That probably derives from the heritage of System V in a business environment without hostile users. BSD changed this to be a more secure model where only root can chown files and a sticky ‘/tmp’ is used. That undoubtedly derives from the heritage of BSD in a campus environment.
GNU/Linux and Solaris by default follow BSD, but can be configured to allow a System V style chown. On the other hand, HP-UX follows System V, but can be configured to use the modern security model and disallow chown. Since it is an administrator-configurable parameter you can't use the name of the kernel as an indicator of the behavior.
-
date
-
Some versions of date do not recognize special ‘%’ directives, and unfortunately, instead of complaining, they just pass them through, and exit with success:
$ uname -a
OSF1 medusa.sis.pasteur.fr V5.1 732 alpha
$ date "+%s"
%s
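Because the exit status is useless here, a script has to inspect the output instead. The sketch below treats ‘%s’ as supported only when the output consists solely of digits; the variable names are hypothetical.

```shell
# Decide whether date understands the %s directive by checking that the
# output consists only of digits. Variable names are illustrative.
date_out=`date "+%s" 2>/dev/null`
case $date_out in
  '' | *[!0-9]*) date_has_s=no ;;
  *) date_has_s=yes ;;
esac
echo "date_has_s=$date_has_s"
```

On a host like the one in the transcript above, the output would be the literal string ‘%s’, so the check would report ‘no’.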
-
diff
-
Option ‘-u’ is nonportable.
Some implementations, such as Tru64's, fail when comparing to ‘/dev/null’. Use an empty file instead.
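A minimal sketch of the empty-file workaround follows. The file names empty.tmp and data.tmp are made up for the example, and only the exit status of diff is used.

```shell
# Compare a file against an empty file instead of /dev/null.
# The names empty.tmp and data.tmp are illustrative only.
: >empty.tmp
echo hello >data.tmp
if diff empty.tmp data.tmp >/dev/null; then
  files_differ=no
else
  files_differ=yes
fi
rm -f empty.tmp data.tmp
echo "files_differ=$files_differ"
```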
-
dirname
-
Not all hosts have a working dirname, and you should instead use AS_DIRNAME (see section Programming in M4sh). For example:
dir=`dirname "$file"`       # This is not portable.
dir=`AS_DIRNAME(["$file"])` # This is more portable.
-
egrep
-
Posix 1003.1-2001 no longer requires egrep, but many hosts do not yet support the Posix replacement ‘grep -E’. Also, some traditional implementations do not work on long input lines. To work around these problems, invoke AC_PROG_EGREP and then use $EGREP.
Portable extended regular expressions should use ‘\’ only to escape characters in the string ‘$()*+.?[\^{|’. For example, ‘\}’ is not portable, even though it typically matches ‘}’.
The empty alternative is not portable. Use ‘?’ instead. For instance with Digital Unix v5.0:
> printf "foo\n|foo\n" | $EGREP '^(|foo|bar)$'
|foo
> printf "bar\nbar|\n" | $EGREP '^(foo|bar|)$'
bar|
> printf "foo\nfoo|\n|bar\nbar\n" | $EGREP '^(foo||bar)$'
foo
|bar
$EGREP also suffers the limitations of grep (see Limitations of Usual Tools).
-
expr
-
Not all implementations obey the Posix rule that ‘--’ separates options from arguments; likewise, not all implementations provide the extension to Posix that the first argument can be treated as part of a valid expression rather than an invalid option if it begins with ‘-’. When performing arithmetic, use ‘expr 0 + $var’ if ‘$var’ might be a negative number, to keep expr from interpreting it as an option.
No expr keyword starts with ‘X’, so use ‘expr X"word" : 'Xregex'’ to keep expr from misinterpreting word.
Don't use length, substr, match and index.
-
expr (‘|’)
-
You can use ‘|’. Although Posix does require that ‘expr ''’ return the empty string, it does not specify the result when you ‘|’ together the empty string (or zero) with the empty string. For example:
expr '' \| ''
Posix 1003.2-1992 returns the empty string for this case, but traditional Unix returns ‘0’ (Solaris is one such example). In Posix 1003.1-2001, the specification was changed to match traditional Unix's behavior (which is bizarre, but it's too late to fix this). Please note that the same problem does arise when the empty string results from a computation, as in:
expr bar : foo \| foo : bar
Avoid this portability problem by avoiding the empty string.
-
expr (‘:’)
-
Portable expr regular expressions should use ‘\’ to escape only characters in the string ‘$()*.0123456789[\^n{}’. For example, alternation, ‘\|’, is common but Posix does not require its support, so it should be avoided in portable scripts. Similarly, ‘\+’ and ‘\?’ should be avoided.
Portable expr regular expressions should not begin with ‘^’. Patterns are automatically anchored so leading ‘^’ is not needed anyway.
On the other hand, the behavior of the ‘$’ anchor is not portable on multi-line strings. Posix is ambiguous whether the anchor applies to each line, as was done in older versions of GNU Coreutils, or whether it applies only to the end of the overall string, as in Coreutils 6.0 and most other implementations.
$ baz='foo
> bar'
$ expr "X$baz" : 'X\(foo\)$'

$ expr-5.97 "X$baz" : 'X\(foo\)$'
foo
The Posix standard is ambiguous as to whether ‘expr 'a' : '\(b\)'’ outputs ‘0’ or the empty string. In practice, it outputs the empty string on most platforms, but portable scripts should not assume this. For instance, the QNX 4.25 native expr returns ‘0’.
One might think that a way to get a uniform behavior would be to use the empty string as a default value:
expr a : '\(b\)' \| ''
Unfortunately this behaves exactly as the original expression; see the expr (‘|’) entry for more information.
Some ancient expr implementations (e.g., SunOS 4 expr and Solaris 8 /usr/ucb/expr) have a silly length limit that causes expr to fail if the matched substring is longer than 120 bytes. In this case, you might want to fall back on ‘echo|sed’ if expr fails. Nowadays this is of practical importance only for the rare installer who mistakenly puts ‘/usr/ucb’ before ‘/usr/bin’ in PATH.
On Mac OS X 10.4, expr mishandles the pattern ‘[^-]’ in some cases. For example, the command
expr Xpowerpc-apple-darwin8.1.0 : 'X[^-]*-[^-]*-\(.*\)'
outputs ‘apple-darwin8.1.0’ rather than the correct ‘darwin8.1.0’. This particular case can be worked around by substituting ‘[^--]’ for ‘[^-]’.
Don't leave, there is some more!
The QNX 4.25 expr, in addition to preferring ‘0’ to the empty string, has a funny behavior in its exit status: it's always 1 when parentheses are used!
$ val=`expr 'a' : 'a'`; echo "$?: $val"
0: 1
$ val=`expr 'a' : 'b'`; echo "$?: $val"
1: 0
$ val=`expr 'a' : '\(a\)'`; echo "$?: $val"
1: a
$ val=`expr 'a' : '\(b\)'`; echo "$?: $val"
1: 0
In practice this can be a big problem if you are ready to catch failures of expr programs with some other method (such as using sed), since you may get twice the result. For instance
$ expr 'a' : '\(a\)' || echo 'a' | sed 's/^\(a\)$/\1/'
outputs ‘a’ on most hosts, but ‘aa’ on QNX 4.25. A simple workaround consists of testing expr and using a variable set to expr or to false according to the result.
Tru64 expr incorrectly treats the result as a number, if it can be interpreted that way:
$ expr 00001 : '.*\(...\)'
1
On HP-UX 11, expr only supports a single sub-expression.
$ expr 'Xfoo' : 'X\(f\(oo\)*\)$'
expr: More than one '\(' was used.
-
fgrep
-
Posix 1003.1-2001 no longer requires fgrep, but many hosts do not yet support the Posix replacement ‘grep -F’. Also, some traditional implementations do not work on long input lines. To work around these problems, invoke AC_PROG_FGREP and then use $FGREP.
-
find
-
The option ‘-maxdepth’ seems to be GNU specific. Tru64 v5.1, NetBSD 1.5 and Solaris find commands do not understand it.
The replacement of ‘{}’ is guaranteed only if the argument is exactly ‘{}’, not if it's only a part of an argument. For instance on DU, and HP-UX 10.20 and HP-UX 11:
$ touch foo
$ find . -name foo -exec echo "{}-{}" \;
{}-{}
while GNU find reports ‘./foo-./foo’.
-
grep
-
Portable scripts can rely on the grep options ‘-c’, ‘-l’, ‘-n’, and ‘-v’, but should avoid other options. For example, don't use ‘-w’, as Posix does not require it and Irix 6.5.16m's grep does not support it. Also, portable scripts should not combine ‘-c’ with ‘-l’, as Posix does not allow this.
Some of the options required by Posix are not portable in practice. Don't use ‘grep -q’ to suppress output, because many grep implementations (e.g., Solaris) do not support ‘-q’. Don't use ‘grep -s’ to suppress output either, because Posix says ‘-s’ does not suppress output, only some error messages; also, the ‘-s’ option of traditional grep behaved like ‘-q’ does in most modern implementations. Instead, redirect the standard output and standard error (in case the file doesn't exist) of grep to ‘/dev/null’. Check the exit status of grep to determine whether it found a match.
Some traditional grep implementations do not work on long input lines. On AIX the default grep silently truncates long lines on the input before matching.
Also, many implementations do not support multiple regexps with ‘-e’: they either reject ‘-e’ entirely (e.g., Solaris) or honor only the last pattern (e.g., IRIX 6.5 and NeXT). To work around these problems, invoke AC_PROG_GREP and then use $GREP.
Another possible workaround for the multiple ‘-e’ problem is to separate the patterns by newlines, for example:
grep 'foo
bar' in.txt
except that this fails with traditional grep implementations and with OpenBSD 3.8 grep.
Traditional grep implementations (e.g., Solaris) do not support the ‘-E’ or ‘-F’ options. To work around these problems, invoke AC_PROG_EGREP and then use $EGREP, and similarly for AC_PROG_FGREP and $FGREP. Even if you are willing to require support for Posix grep, your script should not use both ‘-E’ and ‘-F’, since Posix does not allow this combination.
Portable grep regular expressions should use ‘\’ only to escape characters in the string ‘$()*.0123456789[\^{}’. For example, alternation, ‘\|’, is common but Posix does not require its support in basic regular expressions, so it should be avoided in portable scripts. Solaris and HP-UX grep do not support it. Similarly, the following escape sequences should also be avoided: ‘\<’, ‘\>’, ‘\+’, ‘\?’, ‘\`’, ‘\'’, ‘\B’, ‘\b’, ‘\S’, ‘\s’, ‘\W’, and ‘\w’.
Posix does not specify the behavior of grep on binary files. An example where this matters is using BSD grep to search text that includes embedded ANSI escape sequences for colored output to terminals (‘\033[m’ is the sequence to restore normal output); the behavior depends on whether input is seekable:
$ printf 'esc\033[mape\n' > sample
$ grep . sample
Binary file sample matches
$ cat sample | grep .
escape
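The advice earlier in this entry about avoiding ‘grep -q’ and ‘grep -s’ can be sketched as follows: discard both output streams and rely on the exit status alone. The file name sample.tmp, the pattern, and the variable name are illustrative assumptions.

```shell
# Test for a match without relying on grep -q or grep -s: discard both
# output streams and look only at the exit status.
# The file name sample.tmp and the pattern are illustrative.
printf 'alpha\nbeta\n' >sample.tmp
if grep beta sample.tmp >/dev/null 2>&1; then
  found=yes
else
  found=no
fi
rm -f sample.tmp
echo "found=$found"
```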
-
join
-
Solaris 8 join has bugs when the second operand is standard input, and when standard input is a pipe. For example, the following shell script causes Solaris 8 join to loop forever:
cat >file <<'EOF'
1 x
2 y
EOF
cat file | join file -
Use ‘join - file’ instead.
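For comparison, here is the same script with the operands swapped as suggested. This is only a sketch with a made-up file name, join-input.tmp; it joins the file with itself on the first field.

```shell
# Put standard input first, as suggested above. The file name
# join-input.tmp is illustrative.
cat >join-input.tmp <<'EOF'
1 x
2 y
EOF
joined=`cat join-input.tmp | join - join-input.tmp`
rm -f join-input.tmp
echo "$joined"
```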
-
ln
-
Don't rely on ln having a ‘-f’ option. Symbolic links are not available on old systems; use ‘$(LN_S)’ as a portable substitute.
For versions of the DJGPP before 2.04, ln emulates symbolic links to executables by generating a stub that in turn calls the real program. This feature also works with nonexistent files like in the Posix spec. So ‘ln -s file link’ generates ‘link.exe’, which attempts to call ‘file.exe’ if run. But this feature only works for executables, so ‘cp -p’ is used instead for these systems. DJGPP versions 2.04 and later have full support for symbolic links.
-
ls
-
The portable options are ‘-acdilrtu’. Current practice is for ‘-l’ to output both owner and group, even though ancient versions of ls omitted the group.
On ancient hosts, ‘ls foo’ sent the diagnostic ‘foo not found’ to standard output if ‘foo’ did not exist. Hence a shell command like ‘sources=`ls *.c 2>/dev/null`’ did not always work, since it was equivalent to ‘sources='*.c not found'’ in the absence of ‘.c’ files. This is no longer a practical problem, since current ls implementations send diagnostics to standard error.
The behavior of ls on a directory that is being concurrently modified is not always predictable, because of a data race where cached information returned by readdir does not match the current directory state. In fact, MacOS 10.5 has an intermittent bug where readdir, and thus ls, sometimes lists a file more than once if other files were added or removed from the directory immediately prior to the ls call. Since ls already sorts its output, the duplicate entries can be avoided by piping the results through uniq.
-
mkdir
-
No mkdir option is portable to older systems. Instead of ‘mkdir -p file-name’, you should use AS_MKDIR_P(file-name) (see section Programming in M4sh) or AC_PROG_MKDIR_P (see section Particular Program Checks).
Combining the ‘-m’ and ‘-p’ options, as in ‘mkdir -m go-w -p dir’, often leads to trouble. FreeBSD mkdir incorrectly attempts to change the permissions of dir even if it already exists. HP-UX 11.23 and IRIX 6.5 mkdir often assign the wrong permissions to any newly-created parents of dir.
Posix does not clearly specify whether ‘mkdir -p foo’ should succeed when ‘foo’ is a symbolic link to an already-existing directory. The GNU Core Utilities 5.1.0 mkdir succeeds, but Solaris mkdir fails.
Traditional ‘mkdir -p’ implementations suffer from race conditions. For example, if you invoke ‘mkdir -p a/b’ and ‘mkdir -p a/c’ at the same time, both processes might detect that ‘a’ is missing, one might create ‘a’, then the other might try to create ‘a’ and fail with a ‘File exists’ diagnostic. The GNU Core Utilities (‘fileutils’ version 4.1), FreeBSD 5.0, NetBSD 2.0.2, and OpenBSD 2.4 are known to be race-free when two processes invoke ‘mkdir -p’ simultaneously, but earlier versions are vulnerable. Solaris mkdir is still vulnerable as of Solaris 10, and other traditional Unix systems are probably vulnerable too. This possible race is harmful in parallel builds when several Make rules call ‘mkdir -p’ to construct directories. You may use ‘install-sh -d’ as a safe replacement, provided this script is recent enough; the copy shipped with Autoconf 2.60 and Automake 1.10 is OK, but copies from older versions are vulnerable.
-
mkfifo
-
mknod
-
The GNU Coding Standards state that mknod is safe to use on platforms where it has been tested to exist; but it is generally portable only for creating named FIFOs, since device numbers are platform-specific. Autotest uses mkfifo to implement parallel testsuites. Posix states that behavior is unspecified when opening a named FIFO for both reading and writing; on at least Cygwin, this results in failure on any attempt to read or write to that file descriptor.
-
mktemp
-
Shell scripts can use temporary files safely with mktemp, but it does not exist on all systems. A portable way to create a safe temporary file name is to create a temporary directory with mode 700 and use a file inside this directory. Both methods prevent attackers from gaining control, though mktemp is far less likely to fail gratuitously under attack.
Here is sample code to create a new temporary directory safely:
# Create a temporary directory $tmp in $TMPDIR (default /tmp).
# Use mktemp if possible; otherwise fall back on mkdir,
# with $RANDOM to make collisions less likely.
: ${TMPDIR=/tmp}
{
  tmp=`
    (umask 077 && mktemp -d "$TMPDIR/fooXXXXXX") 2>/dev/null
  ` &&
  test -n "$tmp" && test -d "$tmp"
} || {
  tmp=$TMPDIR/foo$$-$RANDOM
  (umask 077 && mkdir "$tmp")
} || exit $?
-
mv
-
The only portable options are ‘-f’ and ‘-i’.
Moving individual files between file systems is portable (it was in Unix version 6), but it is not always atomic: when doing ‘mv new existing’, there's a critical section where neither the old nor the new version of ‘existing’ actually exists.
On some systems moving files from ‘/tmp’ can sometimes cause undesirable (but perfectly valid) warnings, even if you created these files. This is because ‘/tmp’ belongs to a group that ordinary users are not members of, and files created in ‘/tmp’ inherit the group of ‘/tmp’. When the file is copied, mv issues a diagnostic without failing:
$ touch /tmp/foo
$ mv /tmp/foo .
error-->mv: ./foo: set owner/group (was: 100/0): Operation not permitted
$ echo $?
0
$ ls foo
foo
This annoying behavior conforms to Posix, unfortunately.
Moving directories across mount points is not portable; use cp and rm instead.
DOS variants cannot rename or remove open files, and do not support commands like ‘mv foo bar >foo’, even though this is perfectly portable among Posix hosts.
-
od
-
In Mac OS X 10.3, od does not support the standard Posix options ‘-A’, ‘-j’, ‘-N’, or ‘-t’, or the XSI option ‘-s’. The only supported Posix option is ‘-v’, and the only supported XSI options are those in ‘-bcdox’. The BSD hexdump program can be used instead.
This problem no longer exists in Mac OS X 10.4.3.
-
rm
-
The ‘-f’ and ‘-r’ options are portable.
It is not portable to invoke rm without operands. For example, on many systems ‘rm -f -r’ (with no other arguments) silently succeeds without doing anything, but it fails with a diagnostic on NetBSD 2.0.2.
A file might not be removed even if its parent directory is writable and searchable. Many Posix hosts cannot remove a mount point, a named stream, a working directory, or a last link to a file that is being executed.
DOS variants cannot rename or remove open files, and do not support commands like ‘rm foo >foo’, even though this is perfectly portable among Posix hosts.
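To avoid invoking rm without operands when a file list may be empty, a script can guard the call. The following is a sketch only; the variable name stale_files is hypothetical and is deliberately left empty here to show the guarded path.

```shell
# Call rm only when there is at least one operand. The variable name
# stale_files is hypothetical; here it is deliberately left empty.
stale_files=
if test -n "$stale_files"; then
  rm -f $stale_files
  rm_invoked=yes
else
  rm_invoked=no
fi
echo "rm_invoked=$rm_invoked"
```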
-
rmdir
-
Just as with rm, some platforms refuse to remove a working directory.
-
sed
-
Patterns should not include the separator (unless escaped), even as part of a character class. In conformance with Posix, the Cray sed rejects ‘s/[^/]*$//’: use ‘s,[^/]*$,,’.
Avoid empty patterns within parentheses (i.e., ‘\(\)’). Posix does not require support for empty patterns, and Unicos 9 sed rejects them.
Unicos 9 sed loops endlessly on patterns like ‘.*\n.*’.
Sed scripts should not use branch labels longer than 7 characters and should not contain comments. HP-UX sed has a limit of 99 commands (not counting ‘:’ commands) and 48 labels, which cannot be circumvented by using more than one script file. It can execute up to 19 reads with the ‘r’ command per cycle. Solaris /usr/ucb/sed rejects usages that exceed a limit of about 6000 bytes for the internal representation of commands.
Avoid redundant ‘;’, as some sed implementations, such as NetBSD 1.4.2's, incorrectly try to interpret the second ‘;’ as a command:
$ echo a | sed 's/x/x/;;s/x/x/'
sed: 1: "s/x/x/;;s/x/x/": invalid command code ;
Input should not have unreasonably long lines, since some sed implementations have an input buffer limited to 4000 bytes. Likewise, not all sed implementations can handle embedded NUL or a missing trailing newline.
Portable sed regular expressions should use ‘\’ only to escape characters in the string ‘$()*.0123456789[\^n{}’. For example, alternation, ‘\|’, is common but Posix does not require its support, so it should be avoided in portable scripts. Solaris sed does not support alternation; e.g., ‘sed '/a\|b/d'’ deletes only lines that contain the literal string ‘a|b’. Similarly, ‘\+’ and ‘\?’ should be avoided.
Anchors (‘^’ and ‘$’) inside groups are not portable.
Nested parentheses in patterns (e.g., ‘\(\(a*\)b*\)’) are quite portable to current hosts, but were not supported by some ancient sed implementations like SVR3.
Some sed implementations, e.g., Solaris, restrict the special role of the asterisk to one-character regular expressions. This may lead to unexpected behavior:
$ echo '1*23*4' | /usr/bin/sed 's/\(.\)*/x/g'
x2x4
$ echo '1*23*4' | /usr/xpg4/bin/sed 's/\(.\)*/x/g'
x
The ‘-e’ option is mostly portable. However, its argument cannot start with ‘a’, ‘c’, or ‘i’, as this runs afoul of a Tru64 5.1 bug. Also, its argument cannot be empty, as this fails on AIX 5.3. Some people prefer to use ‘-e’:
sed -e 'command-1' \
    -e 'command-2'
as opposed to the equivalent:
sed '
command-1
command-2
'
The following usage is sometimes equivalent:
sed 'command-1;command-2'
but Posix says that this use of a semicolon has undefined effect if command-1's verb is ‘{’, ‘a’, ‘b’, ‘c’, ‘i’, ‘r’, ‘t’, ‘w’, ‘:’, or ‘#’, so you should use semicolon only with simple scripts that do not use these verbs.
Commands inside { } brackets are further restricted. Posix says that they cannot be preceded by addresses, ‘!’, or ‘;’, and that each command must be followed immediately by a newline, without any intervening blanks or semicolons. The closing bracket must be alone on a line, other than white space preceding or following it.
Contrary to yet another urban legend, you may portably use ‘&’ in the replacement part of the s command to mean “what was matched”. All descendants of Unix version 7 sed (at least; we don't have first hand experience with older sed implementations) have supported it.
Posix requires that you must not have any white space between ‘!’ and the following command. It is OK to have blanks between the address and the ‘!’. For instance, on Solaris:
$ echo "foo" | sed -n '/bar/ ! p'
error-->Unrecognized command: /bar/ ! p
$ echo "foo" | sed -n '/bar/! p'
error-->Unrecognized command: /bar/! p
$ echo "foo" | sed -n '/bar/ !p'
foo
Posix also says that you should not combine ‘!’ and ‘;’. If you use ‘!’, it is best to put it on a command that is delimited by newlines rather than ‘;’.
Also note that Posix requires that the ‘b’, ‘t’, ‘r’, and ‘w’ commands be followed by exactly one space before their argument. On the other hand, no white space is allowed between ‘:’ and the subsequent label name.
If a sed script is specified on the command line and ends in an ‘a’, ‘c’, or ‘i’ command, the last line of inserted text should be followed by a newline. Otherwise some sed implementations (e.g., OpenBSD 3.9) do not append a newline to the inserted text.
Many sed implementations (e.g., MacOS X 10.4, OpenBSD 3.9, Solaris 10 /usr/ucb/sed) strip leading white space from the text of ‘a’, ‘c’, and ‘i’ commands. Prepend a backslash to work around this incompatibility with Posix:
$ echo flushleft | sed 'a\
>    indented
> '
flushleft
indented
$ echo flushleft | sed 'a\
> \   indented
> '
flushleft
   indented
Posix requires that with an empty regular expression, the last non-empty regular expression from either an address specification or substitution command is applied. However, busybox 1.6.1 complains when using a substitution command with a replacement containing a back-reference to an empty regular expression; the workaround is repeating the regular expression.
$ echo abc | busybox sed '/a\(b\)c/ s//\1/'
sed: No previous regexp.
$ echo abc | busybox sed '/a\(b\)c/ s/a\(b\)c/\1/'
b
-
sed (‘t’)
-
Some old systems have sed that “forget” to reset their ‘t’ flag when starting a new cycle. For instance on MIPS RISC/OS, and on IRIX 5.3, if you run the following sed script (the trailing labels are not actually part of the texts):
s/keep me/kept/g  # a
t end             # b
s/.*/deleted/g    # c
:end              # d
on
delete me         # 1
delete me         # 2
keep me           # 3
delete me         # 4
you get
deleted
delete me
kept
deleted
instead of
deleted
deleted
kept
deleted
Why? When processing line 1, (c) matches, therefore sets the ‘t’ flag, and the output is produced. When processing line 2, the ‘t’ flag is still set (this is the bug). Command (a) fails to match, but sed is not supposed to clear the ‘t’ flag when a substitution fails. Command (b) sees that the flag is set, therefore it clears it, and jumps to (d), hence you get ‘delete me’ instead of ‘deleted’. When processing line 3, ‘t’ is clear, (a) matches, so the flag is set, hence (b) clears the flag and jumps. Finally, since the flag is clear, line 4 is processed properly.
There are two things one should remember about ‘t’ in sed. Firstly, always remember that ‘t’ jumps if some substitution succeeded, not only the immediately preceding substitution. Therefore, always use a fake ‘t clear’ followed by a ‘:clear’ on the next line, to reset the ‘t’ flag where needed.
Secondly, you cannot rely on sed to clear the flag at each new cycle.
One portable implementation of the script above is:
t clear
:clear
s/keep me/kept/g
t end
s/.*/deleted/g
:end
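To try the portable script on the four sample lines from this entry, something like the following sketch can be used. The script file name keep.sed is an assumption made for the example; the script is kept in a file with one command per line and no comments, in line with the other sed advice in this section.

```shell
# Run the portable script above on the sample input.
# The file name keep.sed is illustrative.
cat >keep.sed <<'EOF'
t clear
:clear
s/keep me/kept/g
t end
s/.*/deleted/g
:end
EOF
result=`printf 'delete me\ndelete me\nkeep me\ndelete me\n' | sed -f keep.sed`
rm -f keep.sed
echo "$result"
```

With a sed that behaves correctly, this should print ‘deleted’, ‘deleted’, ‘kept’, ‘deleted’ on separate lines.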
-
sleep
-
Using sleep is generally portable. However, remember that adding a sleep to work around timestamp issues, with a minimum granularity of one second, doesn't scale well for parallel builds on modern machines with sub-second process completion.
-
sort
-
Remember that sort order is influenced by the current locale. Inside ‘configure’, the C locale is in effect, but in Makefile snippets, you may need to specify ‘LC_ALL=C sort’.
-
tar
-
There are multiple file formats for tar; if you use Automake, the macro AM_INIT_AUTOMAKE has some options controlling which level of portability to use.
-
touch
-
If you specify the desired timestamp (e.g., with the ‘-r’ option), touch typically uses the utime or utimes system call, which can result in the same kind of timestamp truncation problems that ‘cp -p’ has.
On ancient BSD systems, touch or any command that results in an empty file does not update the timestamps, so use a command like echo as a workaround. Also, GNU touch 3.16r (and presumably all before that) fails to work on SunOS 4.1.3 when the empty file is on an NFS-mounted 4.2 volume. However, these problems are no longer of practical concern.
-
tr
-
Not all versions of tr handle all backslash character escapes. For example, Solaris 10 /usr/ucb/tr falls over, even though Solaris contains more modern tr in other locations. Therefore, it is more portable to use octal escapes, even though this ties the result to ASCII, when using tr to delete newlines or carriage returns.
$ { echo moon; echo light; } | /usr/ucb/tr -d '\n' ; echo
moo
light
$ { echo moon; echo light; } | /usr/bin/tr -d '\n' ; echo
moonlight
$ { echo moon; echo light; } | /usr/ucb/tr -d '\012' ; echo
moonlight
Posix requires tr to operate on binary files. But at least Solaris /usr/ucb/tr still fails to handle ‘\0’ as the octal escape for NUL. On Solaris, when using tr to neutralize a binary file by converting NUL to a different character, it is necessary to use /usr/xpg4/bin/tr instead.
$ printf 'a\0b\n' | /usr/ucb/tr '\0' '~' | wc -c
3
$ printf 'a\0b\n' | /usr/xpg4/bin/tr '\0' '~' | wc -c
4