info coreutils

7.1 sort: Sort text files

sort sorts, merges, or compares all the lines from the given files, or from standard input if no files are given or if a file is named ‘-’. By default, sort writes the results to standard output. Synopsis:

 
sort [option]… [file]…

sort has three modes of operation: sort (the default), merge, and check for sortedness. The following options change the operation mode:

-c
--check
--check=diagnose-first

Check whether the given file is already sorted: if it is not entirely sorted, print a diagnostic containing the first out-of-order line and exit with a status of 1. Otherwise, exit successfully. At most one input file can be given.

-C
--check=quiet
--check=silent

Exit successfully if the given file is already sorted, and exit with status 1 otherwise. At most one input file can be given. This is like ‘-c’, except it does not print a diagnostic.
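The two check modes can be told apart by their exit status and output. The following is a minimal sketch, assuming GNU sort and the C locale; the file names are hypothetical placeholders.

```shell
# Assumption: GNU sort; the C locale keeps the ordering byte-based.
export LC_ALL=C
dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$dir/sorted.txt"    # hypothetical sample
printf 'apple\ncherry\nbanana\n' > "$dir/unsorted.txt"  # hypothetical sample

# -C prints nothing; only the exit status reports the result.
sorted_status=0
sort -C "$dir/sorted.txt" || sorted_status=$?
unsorted_status=0
sort -C "$dir/unsorted.txt" || unsorted_status=$?

# -c also prints a diagnostic that contains the first out-of-order line.
diagnostic=$(sort -c "$dir/unsorted.txt" 2>&1) || true

echo "sorted: $sorted_status, unsorted: $unsorted_status"
echo "$diagnostic"
rm -r "$dir"
```

In a script, the status from ‘-C’ is usually enough; ‘-c’ is more useful interactively, when you want to see where the order breaks.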

-m
--merge

Merge the given files by sorting them as a group. Each input file must already be individually sorted. Sorting instead of merging always works; merging is provided because it is faster when the inputs are already sorted.
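As a small illustration of merging, the sketch below combines two presorted files and confirms that a full sort gives the same result. It assumes GNU sort and the C locale; the file names are hypothetical.

```shell
# Assumption: both inputs are already sorted, as -m requires.
export LC_ALL=C
dir=$(mktemp -d)
printf 'a\nc\ne\n' > "$dir/one.txt"   # hypothetical presorted input
printf 'b\nd\nf\n' > "$dir/two.txt"   # hypothetical presorted input

merged=$(sort -m "$dir/one.txt" "$dir/two.txt")
resorted=$(sort "$dir/one.txt" "$dir/two.txt")

echo "$merged"
rm -r "$dir"
```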

A pair of lines is compared as follows: sort compares each pair of fields, in the order specified on the command line, according to the associated ordering options, until a difference is found or no fields are left. If no key fields are specified, sort uses a default key of the entire line. Finally, as a last resort when all keys compare equal, sort compares entire lines as if no ordering options other than ‘--reverse’ (‘-r’) were specified. The ‘--stable’ (‘-s’) option disables this last-resort comparison so that lines in which all fields compare equal are left in their original relative order. The ‘--unique’ (‘-u’) option also disables the last-resort comparison.
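The effect of the last-resort comparison can be seen on lines whose keys tie. This is a minimal sketch, assuming GNU sort in the C locale, with made-up input.

```shell
# Assumption: C locale, so the last-resort comparison is byte-based.
export LC_ALL=C
input='b 1
a 1
c 0'

# The key is field 2 only; "b 1" and "a 1" tie on it.
default_order=$(printf '%s\n' "$input" | sort -k 2,2)
# With -s, tied lines keep their input order.
stable_order=$(printf '%s\n' "$input" | sort -s -k 2,2)

echo "$default_order"   # the tie is broken by comparing whole lines
echo "$stable_order"    # the tie is left in input order
```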

Unless otherwise specified, all comparisons use the character collating sequence specified by the LC_COLLATE locale.

GNU sort (as specified for all GNU utilities) has no limit on input line length or restrictions on bytes allowed within lines. In addition, if the final byte of an input file is not a newline, GNU sort silently supplies one. A line's trailing newline is not part of the line for comparison purposes.

Exit status:

 
0 if no error occurred
1 if invoked with ‘-c’ or ‘-C’ and the input is not sorted
2 if an error occurred

If the environment variable TMPDIR is set, sort uses its value as the directory for temporary files instead of ‘/tmp’. The ‘--temporary-directory’ (‘-T’) option in turn overrides the environment variable.

The following options affect the ordering of output lines. They may be specified globally or as part of a specific key field. If no key fields are specified, global options apply to comparison of entire lines; otherwise the global options are inherited by key fields that do not specify any special options of their own. In pre-POSIX versions of sort, global options affect only later key fields, so portable shell scripts should specify global options first.

-b
--ignore-leading-blanks

Ignore leading blanks when finding sort keys in each line. By default a blank is a space or a tab, but the LC_CTYPE locale can change this.

-d
--dictionary-order

Sort in phone directory order: ignore all characters except letters, digits and blanks when sorting. By default letters and digits are those of ASCII and a blank is a space or a tab, but the LC_CTYPE locale can change this.

-f
--ignore-case

Fold lowercase characters into the equivalent uppercase characters when comparing so that, for example, ‘b’ and ‘B’ sort as equal. The LC_CTYPE locale determines character types.
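A short sketch of case folding, assuming GNU sort in the C locale (where uppercase ASCII letters collate before lowercase ones); the input is made up.

```shell
# Assumption: C locale, where 'A'..'Z' sort before 'a'..'z'.
export LC_ALL=C
plain=$(printf 'b\nA\nB\na\n' | sort)
folded=$(printf 'b\nA\nB\na\n' | sort -f)

echo "$plain"    # uppercase letters first, then lowercase
echo "$folded"   # letters grouped regardless of case
```

Within each folded group the order of ‘A’ and ‘a’ comes from the last-resort whole-line comparison, not from ‘-f’ itself.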

-g
--general-numeric-sort

Sort numerically, using the standard C function strtod to convert a prefix of each line to a double-precision floating point number. This allows floating point numbers to be specified in scientific notation, like 1.0e-34 and 10e100. The LC_NUMERIC locale determines the decimal-point character. Do not report overflow, underflow, or conversion errors. Use the following collating sequence:

  • Lines that do not start with numbers (all considered to be equal).
  • NaNs (“Not a Number” values, in IEEE floating point arithmetic) in a consistent but machine-dependent order.
  • Minus infinity.
  • Finite numbers in ascending numeric order (with -0 and +0 equal).
  • Plus infinity.

Use this option only if there is no alternative; it is much slower than ‘--numeric-sort’ (‘-n’) and it can lose information when converting to floating point.
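The difference between the two numeric modes shows up with scientific notation. A minimal sketch, assuming GNU sort in the C locale and made-up input:

```shell
# Assumption: C locale, so '.' is the decimal point and no thousands separator applies.
export LC_ALL=C
numeric=$(printf '1e3\n5\n20\n' | sort -n)   # -n reads "1e3" as the number 1
general=$(printf '1e3\n5\n20\n' | sort -g)   # -g reads "1e3" as 1000

echo "$numeric"
echo "$general"
```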

-i
--ignore-nonprinting

Ignore nonprinting characters. The LC_CTYPE locale determines character types. This option has no effect if the stronger ‘--dictionary-order’ (‘-d’) option is also given.

-M
--month-sort

An initial string, consisting of any number of blanks followed by a month name abbreviation, is folded to uppercase and compared in the order ‘JAN’ < ‘FEB’ < … < ‘DEC’. Invalid names compare low to valid names. The LC_TIME locale category determines the month spellings. By default a blank is a space or a tab, but the LC_CTYPE locale can change this.
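A brief sketch of month ordering, assuming GNU sort in the C locale (English month abbreviations); the input lines are made up.

```shell
# Assumption: C locale, so month names are the English abbreviations.
export LC_ALL=C
by_month=$(printf 'Mar 3\nxyz 9\nJan 1\n  feb 2\n' | sort -M)

# "xyz" is not a month name and sorts low; leading blanks and case are ignored.
echo "$by_month"
```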

-n
--numeric-sort

Sort numerically. The number begins each line and consists of optional blanks, an optional ‘-’ sign, and zero or more digits possibly separated by thousands separators, optionally followed by a decimal-point character and zero or more digits. An empty number is treated as ‘0’. The LC_NUMERIC locale specifies the decimal-point character and thousands separator. By default a blank is a space or a tab, but the LC_CTYPE locale can change this.

Comparison is exact; there is no rounding error.

Neither a leading ‘+’ nor exponential notation is recognized. To compare such strings numerically, use the ‘--general-numeric-sort’ (‘-g’) option.
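To contrast numeric order with the default textual order, here is a small sketch assuming GNU sort in the C locale, with made-up numbers:

```shell
# Assumption: C locale; '.' is the decimal point.
export LC_ALL=C
textual=$(printf '10\n9\n-3\n2.5\n' | sort)
numeric=$(printf '10\n9\n-3\n2.5\n' | sort -n)

echo "$textual"   # compared character by character
echo "$numeric"   # compared by numeric value
```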

-r
--reverse

Reverse the result of comparison, so that lines with greater key values appear earlier in the output instead of later.

-R
--random-sort

Sort by hashing the input keys and then sorting the hash values. Choose the hash function at random, ensuring that it is free of collisions so that differing keys have differing hash values. This is like a random permutation of the inputs (see section shuf: Shuffling text), except that keys with the same value sort together.

If multiple random sort fields are specified, the same random hash function is used for all fields. To use different random hash functions for different fields, you can invoke sort more than once.

The choice of hash function is affected by the ‘--random-source’ option.
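The grouping property of ‘-R’ can be checked without knowing the random order. The sketch below assumes GNU sort and made-up input; it counts runs of identical adjacent lines with uniq.

```shell
# Assumption: GNU sort with a working default random source.
export LC_ALL=C
shuffled=$(printf 'a\nb\na\nc\nb\n' | sort -R)

# Equal keys hash equally, so duplicates end up adjacent:
# uniq then collapses the output to one line per distinct key.
groups=$(printf '%s\n' "$shuffled" | uniq | wc -l | tr -d ' ')

echo "$shuffled"
echo "distinct groups: $groups"
```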

Other options are:

--compress-program=prog

Compress any temporary files with the program prog.

With no arguments, prog must compress standard input to standard output, and when given the ‘-d’ option it must decompress standard input to standard output.

Terminate with an error if prog exits with nonzero status.

Whitespace and the backslash character should not appear in prog; they are reserved for future use.

-k pos1[,pos2]
--key=pos1[,pos2]

Specify a sort field that consists of the part of the line between pos1 and pos2 (or the end of the line, if pos2 is omitted), inclusive.

Each pos has the form ‘f[.c][opts]’, where f is the number of the field to use, and c is the number of the first character from the beginning of the field. Fields and character positions are numbered starting with 1; a character position of zero in pos2 indicates the field's last character. If ‘.c’ is omitted from pos1, it defaults to 1 (the beginning of the field); if omitted from pos2, it defaults to 0 (the end of the field). opts are ordering options, allowing individual keys to be sorted according to different rules; see below for details. Keys can span multiple fields.

Example: To sort on the second field, use ‘--key=2,2’ (‘-k 2,2’). See below for more examples.
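Two key specifications, sketched under the assumption of GNU sort in the C locale with made-up input: one selects a whole field, the other a single character position within a field.

```shell
# Assumption: C locale and the default blank-based field splitting.
export LC_ALL=C

# Key is the second field only.
by_second=$(printf 'x 3\ny 1\nz 2\n' | sort -k 2,2)

# Key is the third character of the first field.
by_third_char=$(printf 'ab2\ncd1\n' | sort -k 1.3,1.3)

echo "$by_second"
echo "$by_third_char"
```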

-o output-file
--output=output-file

Write output to output-file instead of standard output. Normally, sort reads all input before opening output-file, so you can safely sort a file in place by using commands like sort -o F F and cat F | sort -o F. However, sort with ‘--merge’ (‘-m’) can open the output file before reading all input, so a command like cat F | sort -m -o F - G is not safe as sort might start writing ‘F’ before cat is done reading it.

On newer systems, ‘-o’ cannot appear after an input file if POSIXLY_CORRECT is set, e.g., ‘sort F -o F’. Portable scripts should specify ‘-o output-file’ before any input files.
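A minimal sketch of sorting a file in place, with ‘-o’ placed before the input file as recommended for portable scripts. It assumes the C locale; the temporary file is a hypothetical stand-in for a real one.

```shell
export LC_ALL=C
file=$(mktemp)                        # hypothetical file to sort in place
printf 'pear\napple\nfig\n' > "$file"

# Output and input are the same file; -o comes first for portability.
sort -o "$file" "$file"

contents=$(cat "$file")
echo "$contents"
rm "$file"
```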

--random-source=file

Use file as a source of random data used to determine which random hash function to use with the ‘-R’ option. See section Sources of random data.

-s
--stable

Make sort stable by disabling its last-resort comparison. This option has no effect if no fields or global ordering options other than ‘--reverse’ (‘-r’) are specified.

-S size
--buffer-size=size

Use a main-memory sort buffer of the given size. By default, size is in units of 1024 bytes. Appending ‘%’ causes size to be interpreted as a percentage of physical memory. Appending ‘K’ multiplies size by 1024 (the default), ‘M’ by 1,048,576, ‘G’ by 1,073,741,824, and so on for ‘T’, ‘P’, ‘E’, ‘Z’, and ‘Y’. Appending ‘b’ causes size to be interpreted as a byte count, with no multiplication.

This option can improve the performance of sort by causing it to start with a larger or smaller sort buffer than the default. However, this option affects only the initial buffer size. The buffer grows beyond size if sort encounters input lines larger than size.

-t separator
--field-separator=separator

Use character separator as the field separator when finding the sort keys in each line. By default, fields are separated by the empty string between a non-blank character and a blank character. By default a blank is a space or a tab, but the LC_CTYPE locale can change this.

That is, given the input line ‘ foo bar’, sort breaks it into fields ‘ foo’ and ‘ bar’. The field separator is not considered to be part of either the field preceding or the field following, so with ‘sort -t " "’ the same input line has three fields: an empty field, ‘foo’, and ‘bar’. However, fields that extend to the end of the line, as ‘-k 2’, or fields consisting of a range, as ‘-k 2,3’, retain the field separators present between the endpoints of the range.

To specify a null character (ASCII NUL) as the field separator, use the two-character string ‘\0’, e.g., ‘sort -t '\0'’.
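With an explicit separator, the separator character is not part of any field. The sketch below assumes GNU sort in the C locale and made-up colon-separated records; it sorts the second field first as text and then as a number.

```shell
# Assumption: C locale; ':' separates the fields.
export LC_ALL=C
lexical=$(printf 'c:2\na:10\nb:1\n' | sort -t : -k 2,2)
numeric=$(printf 'c:2\na:10\nb:1\n' | sort -t : -k 2,2n)

echo "$lexical"   # "10" sorts before "2" as text
echo "$numeric"   # 10 sorts after 2 as a number
```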

-T tempdir
--temporary-directory=tempdir

Use directory tempdir to store temporary files, overriding the TMPDIR environment variable. If this option is given more than once, temporary files are stored in all the directories given. If you have a large sort or merge that is I/O-bound, you can often improve performance by using this option to specify directories on different disks and controllers.

-u
--unique

Normally, output only the first of a sequence of lines that compare equal. For the ‘--check’ (‘-c’ or ‘-C’) option, check that no pair of consecutive lines compares equal.

This option also disables the default last-resort comparison.

The commands sort -u and sort | uniq are equivalent, but this equivalence does not extend to arbitrary sort options. For example, sort -n -u inspects only the value of the initial numeric string when checking for uniqueness, whereas sort -n | uniq inspects the entire line. See section uniq: Uniquify files.
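The difference between ‘sort -n -u’ and ‘sort -n | uniq’ can be sketched as follows, assuming GNU sort in the C locale and made-up input in which two lines share the same leading number:

```shell
export LC_ALL=C
input='1 a
1 b
2 c'

# -u compares only the numeric key, so "1 a" and "1 b" count as duplicates.
with_u=$(printf '%s\n' "$input" | sort -n -u)
# uniq compares whole lines, so all three lines survive.
with_uniq=$(printf '%s\n' "$input" | sort -n | uniq)

u_count=$(printf '%s\n' "$with_u" | wc -l | tr -d ' ')
uniq_count=$(printf '%s\n' "$with_uniq" | wc -l | tr -d ' ')
echo "sort -n -u: $u_count lines; sort -n | uniq: $uniq_count lines"
```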

-z
--zero-terminated

Treat the input as a set of lines, each terminated by a null character (ASCII NUL) instead of a line feed (ASCII LF). This option can be useful in conjunction with ‘perl -0’ or ‘find -print0’ and ‘xargs -0’ which do the same in order to reliably handle arbitrary file names (even those containing blanks or other special characters).
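A small sketch of NUL-terminated records, assuming GNU sort in the C locale. The two names are made up, and tr converts the NULs to newlines only for display.

```shell
export LC_ALL=C

# Two NUL-terminated records, each containing a blank.
listing=$(printf 'b name\0a name\0' | sort -z | tr '\0' '\n')

echo "$listing"
```

In practice the producer would typically be something like ‘find -print0’ and the consumer ‘xargs -0’, so the records never pass through a newline-delimited stage.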

Historical (BSD and System V) implementations of sort have differed in their interpretation of some options, particularly ‘-b’, ‘-f’, and ‘-n’. GNU sort follows the POSIX behavior, which is usually (but not always!) like the System V behavior. According to POSIX, ‘-n’ no longer implies ‘-b’. For consistency, ‘-M’ has been changed in the same way. This may affect the meaning of character positions in field specifications in obscure cases. The only fix is to add an explicit ‘-b’.

A position in a sort field specified with ‘-k’ may have any of the option letters ‘Mbdfinr’ appended to it, in which case the global ordering options are not used for that particular field. The ‘-b’ option may be independently attached to either or both of the start and end positions of a field specification, and if it is inherited from the global options it will be attached to both. If input lines can contain leading or adjacent blanks and ‘-t’ is not used, then ‘-k’ is typically combined with ‘-b’, ‘-g’, ‘-M’, or ‘-n’; otherwise the varying numbers of leading blanks in fields can cause confusing results.

If the start position in a sort field specifier falls after the end of the line or after the end field, the field is empty. If the ‘-b’ option was specified, the ‘.c’ part of a field specification is counted from the first nonblank character of the field.
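The confusing results caused by varying leading blanks, and the effect of attaching ‘b’ to the start position, can be sketched like this (assuming GNU sort in the C locale, made-up input):

```shell
# Assumption: C locale, where a space sorts before any letter.
export LC_ALL=C
input='x  b
y a'

# Without -b, field 2 includes its leading blanks: "  b" sorts before " a".
without_b=$(printf '%s\n' "$input" | sort -k 2,2)
# With b on the start position, the key begins at the first nonblank: "a" before "b".
with_b=$(printf '%s\n' "$input" | sort -k 2b,2)

echo "$without_b"
echo "$with_b"
```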

On older systems, sort supports an obsolete origin-zero syntax ‘+pos1 [-pos2]’ for specifying sort keys. This obsolete behavior can be enabled or disabled with the _POSIX2_VERSION environment variable (see section Standards conformance); it can also be enabled when POSIXLY_CORRECT is not set by using the obsolete syntax with ‘-pos2’ present.

Scripts intended for use on standard hosts should avoid obsolete syntax and should use ‘-k’ instead. For example, avoid ‘sort +2’, since it might be interpreted as either ‘sort ./+2’ or ‘sort -k 3’. If your script must also run on hosts that support only the obsolete syntax, it can use a test like ‘if sort -k 1 </dev/null >/dev/null 2>&1; then …’ to decide which syntax to use.

Here are some examples to illustrate various combinations of options.
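One such combination, as a minimal sketch with made-up colon-separated records (the names and values are hypothetical, and GNU sort in the C locale is assumed): sort by the second field numerically in descending order, then break ties on the first field in ascending order.

```shell
export LC_ALL=C
records='carol:30:admin
alice:25:dev
bob:30:dev'

# Key 1: field 2, numeric and reversed. Key 2: field 1, default ordering.
result=$(printf '%s\n' "$records" | sort -t : -k 2,2nr -k 1,1)

echo "$result"
```

The options ‘n’ and ‘r’ are attached to the first key only, so the second key is compared as plain text in ascending order.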

