manpagez: man pages & more
info fftw3
Home | html | info | man
[ << ] [ < ] [ Up ] [ > ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3.1 SIMD alignment and fftw_malloc

SIMD, which stands for “Single Instruction Multiple Data,” is a set of special operations supported by some processors to perform a single operation on several numbers (usually 2 or 4) simultaneously. SIMD floating-point instructions are available on several popular CPUs: SSE/SSE2/AVX on recent x86/x86-64 processors, AltiVec (single precision) on some PowerPCs (Apple G4 and higher), NEON on some ARM models, and MIPS Paired Single (currently only in FFTW 3.2.x). FFTW can be compiled to support the SIMD instructions on any of these systems.

A program linking to an FFTW library compiled with SIMD support can obtain a nonnegligible speedup for most complex and r2c/c2r transforms. In order to obtain this speedup, however, the arrays of complex (or real) data passed to FFTW must be specially aligned in memory (typically 16-byte aligned), and often this alignment is more stringent than that provided by the usual malloc (etc.) allocation routines.

In order to guarantee proper alignment for SIMD, therefore, in case your program is ever linked against a SIMD-using FFTW, we recommend allocating your transform data with fftw_malloc and de-allocating it with fftw_free. These have exactly the same interface and behavior as malloc/free, except that for a SIMD FFTW they ensure that the returned pointer has the necessary alignment (by calling memalign or its equivalent on your OS).

You are not required to use fftw_malloc. You can allocate your data in any way that you like, from malloc to new (in C++) to a fixed-size array declaration. If the array happens not to be properly aligned, FFTW will not use the SIMD extensions.

Since fftw_malloc only ever needs to be used for real and complex arrays, we provide two convenient wrapper routines fftw_alloc_real(N) and fftw_alloc_complex(N) that are equivalent to (double*)fftw_malloc(sizeof(double) * N) and (fftw_complex*)fftw_malloc(sizeof(fftw_complex) * N), respectively (or their equivalents in other precisions).

[ << ] [ < ] [ Up ] [ > ] [ >> ]         [Top] [Contents] [Index] [ ? ]

This document was generated on March 3, 2012 using texi2html 5.0.

© 2000-2020
Individual documents may contain additional copyright information.