Multimedia instructions key on CPU medical imaging
Matthew Donham
Medical imaging has computational needs that general-purpose central processing units (CPUs) have historically been unable to meet. For tasks ranging from signal processing for ultrasound and image reconstruction for computed tomography to two-dimensional (2-D) image processing for digital x-ray imaging and visualization for magnetic-resonance imaging (MRI), expensive, complex, inflexible dedicated hardware solutions have always been required. Now the advent of multimedia instruction-set extensions, typified by the Sun Microsystems (Mountain View, CA) VIS instruction set, is opening up new possibilities.
As part of the Sun UltraSPARC processor, the VIS instruction set adds integer processing throughput commonly found in high-end digital-signal-processor chips to a strong general-purpose CPU. The VIS instruction set consists of about 30 instructions in addition to the normal SPARC instruction set. These extensions include parallel math, block data movement, and three-dimensional (3-D) addressing. They can accelerate functions in applications using audio, video, image processing, linear algebra, and graphics by a factor of two to seven.
The parallel math instructions allow CPU native 64-bit data words to be treated as eight 8-bit values, four 16-bit values, or two 32-bit values. Additions, subtractions, comparisons, and multiplications can be performed on four pairs of 16-bit values or two pairs of 32-bit values at a time. Eight pairs of 8-bit values can be compared at a time, using a sum of absolute values of differences that is useful for motion estimation or cross correlation. The VIS instruction set does not use the UltraSPARC processor general-purpose integer resources, which are thus available for other operations in parallel; it uses a faster multiplier than normal integer processing, resulting in four times faster processing.
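The partitioned arithmetic described above can be imitated in portable C to make the idea concrete. The following is a minimal sketch, not VIS code: the names padd16, sad8, and lane16 are hypothetical, and ordinary 64-bit integer operations stand in for the dedicated hardware. It shows a four-lane 16-bit add in which carries do not cross lane boundaries, and a sum of absolute differences over eight 8-bit values.

```c
#include <assert.h>
#include <stdint.h>

/* Extract 16-bit lane i (0 = least significant) from a packed 64-bit word. */
unsigned lane16(uint64_t w, int i)
{
    assert(i >= 0 && i < 4);
    return (unsigned)((w >> (16 * i)) & 0xFFFF);
}

/* Add four 16-bit lanes packed in a 64-bit word. Each lane wraps around
   on overflow and no carry leaks into the neighbouring lane. */
uint64_t padd16(uint64_t a, uint64_t b)
{
    const uint64_t HI = 0x8000800080008000ULL; /* top bit of every lane */
    /* Add the low 15 bits of each lane; the sum cannot leave the lane. */
    uint64_t low = (a & ~HI) + (b & ~HI);
    /* Fold the top bits back in with XOR (an add without carry-out). */
    return low ^ ((a ^ b) & HI);
}

/* Sum of absolute differences over eight 8-bit lanes, the measure the
   article mentions for motion estimation and cross correlation. */
unsigned sad8(uint64_t a, uint64_t b)
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++) {
        int x = (int)((a >> (8 * i)) & 0xFF);
        int y = (int)((b >> (8 * i)) & 0xFF);
        sum += (unsigned)(x > y ? x - y : y - x);
    }
    return sum;
}
```

In this emulation the lane isolation costs extra masking and the byte loop runs serially; the point of a partitioned instruction set is that the hardware does the equivalent work on all lanes in one operation.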
Medical imaging is rich in fixed-point processing and is an ideal application for the VIS instruction set, which can accelerate nearly any fixed-point signal processing (see figure). The parallel math can be applied in a number of places in acquisition-end signal processing. Operations such as one- and two-dimensional convolution, fast Fourier transforms, frequency-domain filtering, and nonlinear filtering can all be accelerated, sometimes by as much as a factor of seven.
Image reconstruction, particularly the back projection used in computed tomography (CT), also benefits from the parallel-processing capabilities of these extensions. The VIS instruction set is also applicable in the visualization of the image data from any medical modality. For example, in x-ray imaging, it can be used for digital subtraction angiography. The set can be used for fast pan and zoom, which is particularly useful for digital mammography, or image compression, which is important for ultrasound.
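Digital subtraction angiography is, at its core, a pixel-by-pixel fixed-point subtraction of a pre-contrast mask image from a contrast image. A minimal scalar sketch in C follows; the function name subtract_images, the 16-bit pixel format, and the offset parameter are assumptions made for illustration, and the loop is plain C rather than VIS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* out[i] = clamp(live[i] - mask[i] + offset, 0, 65535)
   live   : image acquired with contrast agent
   mask   : image acquired before contrast
   offset : brightness bias added after subtraction (hypothetical knob) */
void subtract_images(const uint16_t *live, const uint16_t *mask,
                     uint16_t *out, size_t n, int32_t offset)
{
    assert(live && mask && out);
    assert(offset >= -65535 && offset <= 65535);
    for (size_t i = 0; i < n; i++) {
        int32_t d = (int32_t)live[i] - (int32_t)mask[i] + offset;
        if (d < 0) d = 0;          /* saturate instead of wrapping */
        if (d > 65535) d = 65535;
        out[i] = (uint16_t)d;
    }
}
```

Every iteration is independent of the others, which is the property that lets several pixels be processed per instruction when the data are packed into 64-bit words.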
Even with this breadth of capabilities, the most impressive use of the VIS instruction set is probably for volumetric rendering of data from 3-D modalities such as CT and MRI. Here, the Sun 3-D addressing instruction (ARRAY) comes into play. This instruction was developed because cache efficiency and memory bandwidth are often the limiting factors in the performance of volumetric rendering on a general-purpose CPU.
Caching in
The fundamental idea behind a cache is to increase performance on programs that exhibit good locality of reference in their data access: the next point that the CPU reads from memory has a great likelihood of being near the last point that it read. The shortcoming of a cache for 3-D operations hinges on the definition of "near." In systems where near is one-dimensional (1-D), the cache loads lines. With the VIS instruction set, the ARRAY instruction re-maps 3-D addresses so that the UltraSPARC processor cache can benefit from 3-D locality because the cache loads cubes. In volume-rendering algorithms, the ARRAY instruction can double the performance in addition to whatever else the VIS instruction set provides, simply by increasing the cache performance.
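The difference between loading lines and loading cubes can be illustrated with two address calculations. The sketch below is an assumption-laden illustration, not the actual bit layout of the ARRAY instruction: it uses a hypothetical 4 x 4 x 4 cube of 8-bit voxels (64 bytes, chosen here only as a plausible cache-line size), and the function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK = 4 };  /* cube edge length in voxels (illustrative choice) */

/* Conventional row-major offset: neighbours in x are adjacent in memory,
   but neighbours in y are nx apart and neighbours in z are nx * ny apart. */
size_t linear_offset(size_t x, size_t y, size_t z, size_t nx, size_t ny)
{
    return (z * ny + y) * nx + x;
}

/* Blocked offset: all voxels of one BLOCK^3 cube are stored contiguously,
   so a single cache-line fill brings in neighbours along all three axes.
   nx and ny must be multiples of BLOCK. */
size_t blocked_offset(size_t x, size_t y, size_t z, size_t nx, size_t ny)
{
    assert(nx % BLOCK == 0 && ny % BLOCK == 0);
    size_t bx = x / BLOCK, by = y / BLOCK, bz = z / BLOCK;  /* which cube */
    size_t ix = x % BLOCK, iy = y % BLOCK, iz = z % BLOCK;  /* where in it */
    size_t blocks_x = nx / BLOCK, blocks_y = ny / BLOCK;
    size_t block_index = (bz * blocks_y + by) * blocks_x + bx;
    size_t within = (iz * BLOCK + iy) * BLOCK + ix;
    return block_index * (size_t)(BLOCK * BLOCK * BLOCK) + within;
}
```

For a 256 x 256 slice size, stepping one voxel in z moves 65,536 bytes in the row-major layout but only 16 bytes in the blocked layout, as long as the step stays inside the same cube. A ray cast through the volume at an arbitrary angle therefore touches far fewer distinct cache lines.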
Given current compiler technology and the difficulty of expressing parallel computations in the C language, the compiler is not able to extract the parallelism from code and apply VIS instructions automatically. Therefore, the VIS instruction set must be introduced into the program explicitly. This is done through a set of C-callable macros that are expanded into in-line assembly.
Each macro, which looks like a function call, causes the inclusion of a single VIS instruction. In this way, the instruction can be introduced into the inner loops of signal-processing or visualization code. While writing VIS code is somewhat more complex than writing C code, VIS instructions are usually inserted in such a small part of the overall application that total development time is not substantially impacted.
Library help included
To help users shorten time to market and to ensure that the VIS instruction set is being used to its greatest advantage, Sun Microsystems elected to implement a VIS-optimized library called the mediaLib library. Including about 400 functions from image and signal processing, audio, video, linear algebra, graphics, and visualization, the mediaLib library has been designed to quickly get a user up and running with VIS instruction-set performance. Library functions relevant to medical imaging include 1-D and 2-D convolution, image subtraction, bilinear rotate and zoom, and volumetric maximum-intensity projection or compositing.
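As an illustration of one of the operations listed, maximum-intensity projection keeps, for each output pixel, the brightest voxel encountered along the viewing direction. The following is a minimal sketch in plain C for the simplest case of projecting along the z axis; the function name mip_z and the slice-by-slice 8-bit volume layout are assumptions for the example, and it does not use or reproduce the mediaLib interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Maximum-intensity projection along z.
   vol : nz slices of nx * ny 8-bit voxels, stored slice after slice
   out : nx * ny projected image */
void mip_z(const uint8_t *vol, size_t nx, size_t ny, size_t nz, uint8_t *out)
{
    assert(vol && out && nz > 0);
    size_t slice = nx * ny;

    /* Start from the first slice. */
    for (size_t i = 0; i < slice; i++)
        out[i] = vol[i];

    /* Fold in each remaining slice, keeping the per-pixel maximum. */
    for (size_t z = 1; z < nz; z++) {
        const uint8_t *s = vol + z * slice;
        for (size_t i = 0; i < slice; i++)
            if (s[i] > out[i])
                out[i] = s[i];
    }
}
```

Projection along an arbitrary viewing direction additionally requires resampling the volume along each ray, which is where 3-D memory locality becomes the dominant cost.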
The design, implementation, and packaging of the mediaLib library are keyed to its use. From a design perspective, the mediaLib library presents a simple, consistent interface. It is stateless, as compared to OpenGL and other higher-level application program interfaces; each mediaLib call stands on its own and requires a minimal number of set-up calls. Thus, it is possible to switch between mediaLib functions and the customer's proprietary functions with ease.
From an implementation perspective, mediaLib's simple interface keeps the VIS performance from becoming lost in the overhead of parameter interpretation and error checking. When specific instances of general functions are expected to see great use, such as 3 × 3 convolution, code is optimized for the particular case, and a separate function is provided.
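The benefit of a dedicated function for a common case can be seen in a 3 × 3 kernel written out by hand: with the kernel size fixed, the nine multiply-accumulates are fully unrolled and no loop over kernel taps or size checks remain in the inner loop. The sketch below is plain fixed-point C under stated assumptions; the name conv3x3_u8, the 8-bit pixel format, the shift-based scaling, and the copy-through border handling are all illustrative choices and are not taken from mediaLib. The kernel is applied without flipping, which gives the same result as a true convolution for symmetric kernels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Apply a 3 x 3 integer kernel k (row-major) to an 8-bit image.
   Each result is shifted right by `shift` bits and clamped to 0..255.
   Border pixels are copied from the source unchanged. */
void conv3x3_u8(const uint8_t *src, uint8_t *dst, size_t w, size_t h,
                const int16_t k[9], int shift)
{
    assert(src && dst && k && shift >= 0 && shift < 16);
    memcpy(dst, src, w * h);           /* borders keep source values */
    if (w < 3 || h < 3)
        return;

    for (size_t y = 1; y + 1 < h; y++) {
        for (size_t x = 1; x + 1 < w; x++) {
            const uint8_t *p = src + (y - 1) * w + (x - 1);
            /* Nine taps, fully unrolled for the fixed kernel size. */
            int32_t acc = k[0] * p[0]       + k[1] * p[1]         + k[2] * p[2]
                        + k[3] * p[w]       + k[4] * p[w + 1]     + k[5] * p[w + 2]
                        + k[6] * p[2 * w]   + k[7] * p[2 * w + 1] + k[8] * p[2 * w + 2];
            if (acc < 0)
                acc = 0;               /* negative sums clamp to zero */
            else
                acc >>= shift;         /* fixed-point scaling */
            if (acc > 255)
                acc = 255;
            dst[y * w + x] = (uint8_t)acc;
        }
    }
}
```

A box blur, for example, could use a kernel of nine ones with the result scaled afterward, while a kernel of 0, -1, 0, -1, 4, -1, 0, -1, 0 responds only where the image is not locally flat.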
Three key packaging decisions were made to facilitate acceptance of the mediaLib library. The first was inclusion of a non-VIS instruction-set version of all of the mediaLib library functions. A non-VIS version of the mediaLib library allows developers to move their code to a non-UltraSPARC product with minimal effort.
The second was to provide sources for the mediaLib library, for both the VIS and non-VIS versions. This dual sourcing has two advantages. It extends the platform independence, because the non-VIS source gives the customer a base on which to start non-UltraSPARC optimizations. It also helps when the mediaLib functions do not provide exactly what the customer needs; the user can start with the most relevant mediaLib function and tailor it to the specific application.
Finally, Sun Microsystems decided to provide the mediaLib library at no charge. The library is an enabler for chip, board, and system sales, and the software license gives customers the right to incorporate the mediaLib library into their products.
VIS instruction-set performance, particularly as delivered with the mediaLib library, will reduce the cost and development time of dedicated hardware implementations of applications such as medical imaging. The set will improve time to market with flexible, upgradeable products.
Four views at different angles from the same CT data set show bulbous vascular defects in the Circle of Willis at the base of the brain. These images were created using a VIS-optimized library, mediaLib, for CPU-based image processing. Each square represents a 5 × 5-in. slice of tissue.