Accelerate(7) BSD Miscellaneous Information Manual Accelerate(7)
NAME
Accelerate vecLib vImage AltiVec vMathLib BLAS LAPACK vDSP vBigNum
vBasicOps Vector Computation Velocity Engine Extended Math Library --
This man page introduces the vector instruction set extension to the Pow-
erPC architecture known as Velocity Engine (or AltiVec), the Accelerate
umbrella framework, its constituent libraries and programming support in
Mac OS X.
DESCRIPTION
The PowerPC vector instruction set architecture is based on a separate
SIMD style execution unit with inherently high data parallelism. This
high degree of parallelism is enhanced with additional parallelism
through superscalar dispatch to multiple execution units and execution
unit pipelines. All vector instructions are designed to be easily
pipelined with pipeline latencies no greater than the scalar double pre-
cision floating-point multiply-add fused class of instructions. There
are no operating mode switches that would preclude fine-grained
interleaving of vector instructions with the existing floating-point and
integer instructions. Parallelism with the integer and floating-point
instructions is simplified by the fact that the vector unit never
generates an exception and has few shared resources or communication
paths that require it to be tightly synchronized with the other units.
Highlights
Fixed vector length of 128 bits (16 8-bit elements, 8 16-bit elements, or
4 32-bit elements).
Signed and unsigned 8-, 16-, and 32-bit integers, and IEEE single-preci-
sion floats.
Saturation arithmetic.
32-register namespace.
Vector register file architecturally separate from floating-point and
integer registers.
No mode switching that would increase the overhead of using the instruc-
tions.
4-operand, non-destructive instructions (3 source, 1 result).
Operations selected based on utility to digital signal processing algo-
rithms (including 2D and 3D image processing).
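The saturation arithmetic and the fixed 128-bit vector length listed above
can be illustrated with a scalar sketch. This is plain C, not AltiVec
code: all function names are hypothetical, and each loop iteration stands
in for one lane of a 128-bit vector register.

```c
#include <stdint.h>

/* Hypothetical helper names; scalar emulation for illustration only. */

/* Unsigned 8-bit saturating add: clamps at 255 instead of wrapping. */
uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned int sum = (unsigned int)a + (unsigned int)b;
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}

/* Signed 16-bit saturating add: clamps at -32768 and 32767. */
int16_t sat_add_s16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* One 128-bit vector viewed as 16 unsigned 8-bit lanes. */
enum { LANES_U8 = 128 / 8 };

/* Lane-by-lane saturating add over one emulated vector. */
void lanes_sat_add_u8(const uint8_t a[LANES_U8],
                      const uint8_t b[LANES_U8],
                      uint8_t out[LANES_U8])
{
    for (int i = 0; i < LANES_U8; i++)
        out[i] = sat_add_u8(a[i], b[i]);
}
```

With wrapping arithmetic, 200 + 100 in an 8-bit lane would give 44; the
saturating form gives 255, which is usually the desired behavior for pixel
and audio sample data.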
Who benefits?
Many of the services provided by Mac OS X (e.g., Quartz, QuickTime,
OpenGL, CoreAudio) already exploit the vector acceleration available on
Macintosh G4 and G5 computers. All Mac OS X users enjoy these benefits.
Many applications that run on Mac OS X (e.g., iTunes, iMovie) have already
been coded to use the vector libraries and vector instruction set. Users
of these applications enjoy the benefits of vector acceleration.
Software developers who would like their code to use the vector facility
on Macintosh G4 and G5 computers may choose to:
(1) Make explicit calls to entry points in the Accelerate framework.
Apple has optimized many of these routines for the vector engine (see the
framework discussion that follows).
and/or (2) Program directly to the vector unit using the "Programming
Interface Model."
Note that a programmer must take one of the explicit actions above to
engage the vector engine; otherwise it remains idle.
Where to go from here:
Browse a comprehensive introduction to vector programming:
http://developer.apple.com/hardware/ve
Examine the prototypes for functions you can invoke:
/System/Library/Frameworks/vecLib.framework/Headers/*.h
/System/Library/Frameworks/Accelerate.framework/Frameworks/vImage.framework/Headers/*.h
Include the interfaces in the code you write:
#include <Accelerate/Accelerate.h>
Compile and link your code:
cc -faltivec -framework Accelerate file.c
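The steps above can be combined into a short sketch: include the umbrella
header, call a vecLib entry point, and link against the framework. The
wrapper name add_arrays and the USE_ACCELERATE macro are assumptions made
for this illustration; vDSP_vadd is the vDSP element-wise addition routine
declared in vDSP.h. Without the macro, the function falls back to a plain
loop so that the file still compiles on systems that lack the framework.

```c
#include <stddef.h>

#ifdef USE_ACCELERATE   /* hypothetical macro: compile with -DUSE_ACCELERATE */
#include <Accelerate/Accelerate.h>
#endif

/* Hypothetical wrapper: sum[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(const float *a, const float *b, float *sum, size_t n)
{
#ifdef USE_ACCELERATE
    /* Unit strides: every element of each array is used. */
    vDSP_vadd(a, 1, b, 1, sum, 1, n);
#else
    /* Scalar fallback that produces the same result. */
    for (size_t i = 0; i < n; i++)
        sum[i] = a[i] + b[i];
#endif
}
```

To take the framework path, add -DUSE_ACCELERATE to the compile line shown
above.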
Accelerate Umbrella Framework
The Accelerate umbrella framework encompasses all the libraries provided
with Mac OS X that Apple has optimized for high performance vector and
numerical computing. Subsequent sections describe the sub-frameworks
that make up the Accelerate framework.
vImage Framework
A collection of basic image processing filters such as Convolution, Mor-
phological, and Geometric transforms. Alpha compositing and histogram
operations are also supported.
vecLib Framework
The vecLib framework is a collection of facilities covering digital sig-
nal processing (vDSP), matrix computations (BLAS), numerical linear alge-
bra (LAPACK), mathematical routines (vMathLib), basic operations (vBasi-
cOps) and large number calculations (vBigNum).
The vDSP, BLAS and LAPACK components of vecLib run in both the scalar and
vector domains: vecLib automatically detects the presence of the vector
engine and uses it. vMathLib mirrors the existing scalar libm on the
vector engine, and vBasicOps is meant to complement the processor by
providing additional functionality such as a 32x32 vector integer
multiply. vBigNum, vBasicOps and vMathLib run only on the vector engine.
There is also another matrix computation package in vecLib called
vectorOps. It works somewhat in the same spirit as the BLAS. It is best
suited for small problems when availability of source is preferred. It
can also be used as an educational tool to gain insight into the workings
of the PowerPC vector unit. In most cases, the use of BLAS instead of
vectorOps is recommended.
vDSP
The vDSP Library provides mathematical functions for applications such as
speech, sound, audio, and video processing, diagnostic medical imaging,
radar signal processing, seismic analysis, and scientific data process-
ing.
The vDSP functions operate on real and complex data types. The functions
include data type conversions, fast Fourier transforms (FFTs), and vec-
tor-to-vector and vector-to-scalar operations.
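As a sketch of a vector-to-scalar operation, the following computes the
energy (sum of squares) of one channel of interleaved stereo samples.
vDSP_dotpr is the vDSP dot-product routine; the helper name
channel_energy and the USE_ACCELERATE macro are assumptions made for this
illustration. The stride argument of 2 is what lets the routine step over
the samples of the other channel.

```c
#include <stddef.h>

#ifdef USE_ACCELERATE   /* hypothetical macro: compile with -DUSE_ACCELERATE */
#include <Accelerate/Accelerate.h>
#endif

/*
 * Hypothetical helper: sum of squares of one channel of interleaved
 * stereo data (L R L R ...). channel is 0 (left) or 1 (right); frames
 * is the number of sample pairs.
 */
float channel_energy(const float *interleaved, size_t frames, int channel)
{
    const float *x = interleaved + channel;
    float energy = 0.0f;
#ifdef USE_ACCELERATE
    /* Dot product of the channel with itself, stepping 2 floats at a time. */
    vDSP_dotpr(x, 2, x, 2, &energy, frames);
#else
    /* Scalar fallback with the same stride. */
    for (size_t i = 0; i < frames; i++)
        energy += x[2 * i] * x[2 * i];
#endif
    return energy;
}
```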
The vDSP functions have been implemented in two ways: as vectorized code
(for single precision only), which uses the vector unit on the PowerPC G4
and G5 microprocessors, and as scalar code, which runs on Macintosh mod-
els that have a G3 microprocessor.
It is noteworthy that vDSP's FFTs are among the fastest implementations
of the discrete Fourier transform available anywhere.
The vDSP Library itself is included as part of vecLib in Mac OS X. The
header file, vDSP.h, defines data types used by the vDSP functions and
symbols accepted as flag arguments to vDSP functions.
vDSP functions are available in single and double precision. Note that
only the single-precision versions are vectorized, because the vector
engine on G4 and G5 processors supports only single-precision floats.
For more information about vDSP download the manual at
<http://developer.apple.com/hardware/ve/downloads/vDSP.sit.hqx>
BLAS
The Basic Linear Algebra Subprograms (BLAS) are high-quality routines for
performing basic vector and matrix operations. Level 1 BLAS consists of
vector-vector operations, Level 2 BLAS consists of matrix-vector
operations, and Level 3 BLAS consists of matrix-matrix operations. The
efficiency,
portability, and the wide adoption of the BLAS have made them commonplace
in the development of high quality linear algebra software such as LAPACK
and in other technologies requiring fast vector and matrix calculations.
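A Level 2 (matrix-vector) operation can be sketched as follows.
cblas_sgemv is the standard C BLAS single-precision matrix-vector
multiply; the wrapper name matvec and the USE_ACCELERATE macro are
assumptions made for this illustration. The fallback loop shows what the
call computes when alpha is 1 and beta is 0.

```c
#ifdef USE_ACCELERATE   /* hypothetical macro: compile with -DUSE_ACCELERATE */
#include <Accelerate/Accelerate.h>
#endif

/* Hypothetical wrapper: y = A * x, with A an m-by-n row-major matrix. */
void matvec(const float *A, int m, int n, const float *x, float *y)
{
#ifdef USE_ACCELERATE
    /* alpha = 1 and beta = 0 give y = A*x; lda is the row length n. */
    cblas_sgemv(CblasRowMajor, CblasNoTrans, m, n,
                1.0f, A, n, x, 1, 0.0f, y, 1);
#else
    /* Reference loop: one dot product per row of A. */
    for (int i = 0; i < m; i++) {
        float acc = 0.0f;
        for (int j = 0; j < n; j++)
            acc += A[i * n + j] * x[j];
        y[i] = acc;
    }
#endif
}
```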
All the industry standard FORTRAN BLAS entry points and the standard C
BLAS entry points are exported from the vecLib framework (the latter are
commonly denoted the legacy C BLAS). For more information refer to
<http://www.netlib.org/blas/faq/>.
LAPACK
LAPACK provides routines for solving systems of simultaneous linear equa-
tions, least-squares solutions of linear systems of equations, eigenvalue
problems, and singular value problems. The associated matrix factoriza-
tions (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also pro-
vided, as are related computations such as reordering of the Schur fac-
torizations and estimating condition numbers. Dense and banded matrices
are handled, but not general sparse matrices. In all areas, similar func-
tionality is provided for real and complex matrices, in both single and
double precision. LAPACK in vecLib makes full use of the optimized BLAS
and fully benefits from their performance. All the industry standard
FORTRAN LAPACK entry points are exported from the vecLib framework. C
programs may make calls to the FORTRAN entry points using the prototypes
set out in
"/System/Library/Frameworks/vecLib.framework/Headers/clapack.h".
For more information refer to <http://www.netlib.org/lapack/index/>.
Note that vecLib's LAPACK was built using the FORTRAN to C converter
called f2c. Users must be aware that:
ALL arguments must be passed by reference, including scalar arguments
such as the matrix dimensions M and N.
Two-dimensional arrays are arranged differently in memory in Fortran
(column-major order) and C (row-major order).
For more information refer to <http://www.netlib.org/clapack/readme>.
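Both calling conventions can be seen in a small sketch that solves a 2x2
system. sgesv_ is the LAPACK general linear solver as prototyped in
clapack.h; the helper name solve2x2 and the USE_ACCELERATE macro are
assumptions made for this illustration, and the Cramer's rule branch is
only a stand-in so that the file compiles without the framework. Note
that every argument to sgesv_, including the dimensions, is passed by
address, and that the matrix is stored column by column.

```c
#ifdef USE_ACCELERATE   /* hypothetical macro: compile with -DUSE_ACCELERATE */
#include <Accelerate/Accelerate.h>
#endif

/*
 * Hypothetical helper: solve A*x = b for a 2x2 matrix A.
 * a holds A in column-major (Fortran) order: { A11, A21, A12, A22 }.
 * Returns 0 on success and nonzero on failure (e.g. singular A).
 */
int solve2x2(const float a[4], const float b[2], float x[2])
{
#ifdef USE_ACCELERATE
    __CLPK_integer n = 2, nrhs = 1, lda = 2, ldb = 2, info = 0;
    __CLPK_integer ipiv[2];
    /* sgesv_ overwrites the matrix and right-hand side, so copy them. */
    __CLPK_real lu[4] = { a[0], a[1], a[2], a[3] };
    __CLPK_real rhs[2] = { b[0], b[1] };

    /* Scalars are passed by reference, as in Fortran. */
    sgesv_(&n, &nrhs, lu, &lda, ipiv, rhs, &ldb, &info);
    x[0] = rhs[0];
    x[1] = rhs[1];
    return (int)info;
#else
    /* Stand-in for builds without the framework: Cramer's rule. */
    float det = a[0] * a[3] - a[2] * a[1];
    if (det == 0.0f)
        return 1;
    x[0] = (b[0] * a[3] - a[2] * b[1]) / det;
    x[1] = (a[0] * b[1] - a[1] * b[0]) / det;
    return 0;
#endif
}
```

For the matrix with rows (1, 2) and (3, 4), a C programmer would
naturally write { 1, 2, 3, 4 }, but the column-major layout expected here
is { 1, 3, 2, 4 }.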
vBasicOps
A collection of basic operations such as add, subtract, multiply and
divide that complement the vector processor's basic operations on
operands of up to 128 bits. Consult
"/System/Library/Frameworks/vecLib.framework/Headers/vBasicOps.h" for
further information.
vBigNum
Routines for large number calculations on operands of 128 bits and
larger. Consult
"/System/Library/Frameworks/vecLib.framework/Headers/vBigNum.h" for
further information.
Darwin June 6, 2002 Darwin
Mac OS X 10.4.6 - Generated Sun Apr 16 13:38:10 CDT 2006
