The most computationally intensive part (the kernel) of the minimum Lp-norm phase unwrapping algorithm [1] is the 2D Discrete Cosine Transform (DCT) used to compute the variable p in the equation Qp = c with the Preconditioned Conjugate Gradient (PCG) method. Because the DCT is separable, the 2D transform can be decomposed into a series of 1D DCTs that transform the rows and then the columns. Furthermore, the DCT can be expressed in terms of a Fast Fourier Transform (FFT), which allows the hardware implementation to use a pre-designed FFT core. This poster presents a design that implements the 1D DCT on a Xilinx FPGA that is part of the Wildstar II Pro board. The implementation performs 1D DCTs on large block sizes using a block floating-point format. Because other popular approaches would consume all available chip area at the large point sizes supported, the DCT was designed to use fewer resources than they do, at the cost of higher latency. This latency is similar to that of an identically sized FFT: a 512-point DCT has been shown to take 1771 cycles, or 13.3 µs at 133 MHz, compared with 1757 cycles, or 13.2 µs, for a 512-point FFT (both figures include full data load and unload times).
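The two properties the abstract relies on, separability and the DCT-to-FFT mapping, can be illustrated in software. The following is a minimal NumPy sketch, not the FPGA design: it uses double-precision floats rather than block floating point, it assumes an unnormalized DCT-II, and it assumes the common even/odd reordering followed by one N-point FFT and a twiddle multiplication, since the abstract does not say which FFT-based formulation the hardware uses. All function names are hypothetical.

```python
import numpy as np

def dct1d_via_fft(x):
    """Unnormalized DCT-II along the last axis, computed with one N-point FFT.

    Assumed formulation: even-indexed samples first, odd-indexed samples
    reversed, then an FFT and a per-bin twiddle factor.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    # Reorder: x[0], x[2], x[4], ... followed by ..., x[5], x[3], x[1]
    v = np.concatenate([x[..., ::2], x[..., 1::2][..., ::-1]], axis=-1)
    spectrum = np.fft.fft(v, axis=-1)
    k = np.arange(n)
    twiddle = np.exp(-1j * np.pi * k / (2 * n))
    return (spectrum * twiddle).real

def dct1d_direct(x):
    """Reference O(N^2) DCT-II: X[k] = sum_n x[n] cos(pi (2n+1) k / (2N))."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    idx = np.arange(n)
    # basis[k, n] = cos(pi (2n + 1) k / (2N))
    basis = np.cos(np.pi * (2 * idx[None, :] + 1) * idx[:, None] / (2 * n))
    return x @ basis.T

def dct2d_row_column(a):
    """2D DCT by separability: 1D DCT of every row, then of every column."""
    rows_done = dct1d_via_fft(a)
    return dct1d_via_fft(rows_done.T).T

def cycles_to_microseconds(cycles, clock_mhz):
    """Convert a cycle count to microseconds at the given clock rate in MHz."""
    return cycles / clock_mhz

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    block = rng.standard_normal((8, 16))
    reference = dct1d_direct(dct1d_direct(block).T).T
    print(np.allclose(dct2d_row_column(block), reference))
    print(round(cycles_to_microseconds(1771, 133), 1))  # DCT latency from the abstract
    print(round(cycles_to_microseconds(1757, 133), 1))  # FFT latency from the abstract
```

The row-column routine calls the same 1D transform twice, which mirrors why a single 1D DCT core is sufficient for the 2D kernel. The last helper only checks the arithmetic behind the quoted latencies at 133 MHz.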


Poster presented at the 2007 Thrust R3A Parallel Hardware Implementation for Fast Subsurface Detection Conference


Phase Unwrapping, FFT, DCT, PCG, Reconfigurable Hardware

Subject Categories



Computer Engineering


Bernard M. Gordon Center for Subsurface Sensing and Imaging Systems (Gordon-CenSSIS)

Publication Date


Rights Holder

Bernard M. Gordon Center for Subsurface Sensing and Imaging Systems (Gordon-CenSSIS)
