Footnotes

(1)

You can read the tutorial in bit-reversed order after computing your first transform.

(2)

The term “rank” is commonly used in the APL, FORTRAN, and Common Lisp traditions, although it is not so common in the C world.

(3)

There are also type V-VIII transforms, which correspond to a logical DFT of odd size N, independent of whether the physical size n is odd, but we do not support these variants.

(4)

R*DFT00 is sometimes slower in FFTW because we discovered that the standard algorithm for computing it by a pre/post-processed real DFT (the algorithm used in FFTPACK, Numerical Recipes, and other sources for decades now) has serious numerical problems: it already loses several decimal places of accuracy for sizes of 16k. There seem to be only two alternatives in the literature that do not suffer similarly: a recursive decomposition into smaller DCTs, which would require a large set of codelets for efficiency and generality, or sacrificing a factor of 2 in speed to use a real DFT of twice the size. We currently employ the latter technique for general n, as well as a limited form of the former method: a split-radix decomposition when n is odd (so that the logical size N, which is 2(n-1) for REDFT00 and 2(n+1) for RODFT00, is a multiple of 4). For N containing many factors of 2, the split-radix method seems to recover most of the speed of the standard algorithm without the accuracy tradeoff.

(5)

We provide the DHT mainly as a byproduct of some internal algorithms. FFTW computes a real input/output DFT of prime size by re-expressing it as a DHT plus post/pre-processing and then using Rader’s prime-DFT algorithm adapted to the DHT.

(6)

Gallia est omnis divisa in partes tres ("All Gaul is divided into three parts"; Julius Caesar).

(7)

In fact, even this assumption is not technically guaranteed by the standard, although it seems to be universal in actual MPI implementations and is widely assumed by MPI-using software. Technically, you need to query the MPI_IO attribute of MPI_COMM_WORLD with MPI_Attr_get. If this attribute is MPI_PROC_NULL, no I/O is possible. If it is MPI_ANY_SOURCE, any process can perform I/O. Otherwise, it is the rank of a process that can perform I/O ... but since it is not guaranteed to yield the same rank on all processes, you have to do an MPI_Allreduce of some kind if you want all processes to agree about which is going to do I/O. And even then, the standard only guarantees that this process can perform output, but not input. See e.g. Parallel Programming with MPI by P. S. Pacheco, section 8.1.3. Needless to say, in our experience virtually no MPI programmers worry about this.

(8)

Technically, this is because you aren’t actually calling the C functions directly. You are calling wrapper functions that translate the communicator with MPI_Comm_f2c before calling the ordinary C interface. This is all done transparently, however, since the fftw3-mpi.f03 interface file renames the wrappers so that they are called in Fortran with the same names as the C interface functions.
