FPGA
UART
This project requires data to be sent from a Host to the FPGA, processed on the FPGA, and then sent back to the Host. Our group decided to use serial over USB to transfer the data between the Host and the FPGA. The FTDI chip on the board performs the USB conversion, but the FPGA must communicate with this chip using the UART protocol. The top level of the UART module consists of two submodules, one to receive data and one to send data. Both submodules follow the serial protocol with custom parameters: hardware flow control, a fixed baud rate, eight data bits, no parity bit, and one stop bit. This combination of eight data bits, no parity, and one stop bit is written as 8N1, and it is the format our group chose for the project.
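To make the 8N1 format concrete, the following C++ sketch models the bit sequence of one frame as it would appear on the TX line: a low start bit, eight data bits sent least significant bit first, no parity bit, and a high stop bit. It is a software illustration only, not the FPGA implementation, and the function names `make_8n1_frame` and `parse_8n1_frame` are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Builds the 10-bit 8N1 frame for one data byte, in transmission order:
// start bit (0), eight data bits LSB first, no parity, one stop bit (1).
// The line idles high (1) between frames.
std::array<uint8_t, 10> make_8n1_frame(uint8_t data) {
    std::array<uint8_t, 10> bits{};
    bits[0] = 0;                                   // start bit
    for (int i = 0; i < 8; ++i) {
        bits[1 + i] = (data >> i) & 1;             // data bits, least significant first
    }
    bits[9] = 1;                                   // single stop bit, no parity bit
    return bits;
}

// Inverse of make_8n1_frame. Returns false on a framing error (wrong start
// or stop bit); otherwise stores the recovered byte in `out`.
bool parse_8n1_frame(const std::array<uint8_t, 10>& bits, uint8_t& out) {
    if (bits[0] != 0 || bits[9] != 1) return false;
    uint8_t value = 0;
    for (int i = 0; i < 8; ++i) {
        value |= static_cast<uint8_t>((bits[1 + i] & 1) << i);
    }
    out = value;
    return true;
}
```

Because every byte costs ten bit periods on the wire (start, eight data, stop), the usable throughput in each direction is at most one tenth of the baud rate in bytes per second.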
Figure 1: UART Protocol
Figure 2: UART Module Block Diagram
Figure 3: Image Line Buffer Usage
Figure 4: Image Line Buffer Block External Diagram
Figure 5: Sum of Products Module Block Diagram
Software
UART
This project requires data to be received from a computer and the computed results to be sent back to it. Our group decided to use UART over USB to transfer the data between the PC and the FPGA. The PC UART module will be able to both send and receive data. It will also implement hardware flow control using the request to send (RTS) and clear to send (CTS) signals: RTS is an output that the module sets high when it is able to receive data from the other device, and CTS is an input that controls when the module is allowed to transmit. In addition, the module will have a configurable baud rate.
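On a Linux or BSD host, the framing, flow control, and configurable baud rate described above map onto the `termios` interface. The sketch below only fills in a `termios` structure; the helper name `configure_uart` is hypothetical, and it assumes a platform that provides the non-POSIX `CRTSCTS` flag and `cfmakeraw()` (both available in glibc and the BSDs).

```cpp
#include <termios.h>

// Hypothetical helper: configures a termios structure for 8N1 framing with
// RTS/CTS hardware flow control at the requested baud rate. Opening the port
// and applying the settings are left to the caller.
bool configure_uart(termios& tty, speed_t baud) {
    cfmakeraw(&tty);                           // raw mode: no line editing or character translation
    tty.c_cflag &= ~(CSIZE | PARENB | CSTOPB); // clear size, parity, and two-stop-bit settings
    tty.c_cflag |= CS8;                        // eight data bits, no parity, one stop bit
    tty.c_cflag |= CRTSCTS;                    // RTS/CTS hardware flow control (Linux/BSD extension)
    tty.c_cflag |= CREAD | CLOCAL;             // enable the receiver, ignore modem status lines
    tty.c_cc[VMIN] = 1;                        // read() blocks until at least one byte arrives
    tty.c_cc[VTIME] = 0;                       // no inter-byte timeout
    // The same baud rate is used in both directions; both calls return 0 on success.
    return cfsetispeed(&tty, baud) == 0 && cfsetospeed(&tty, baud) == 0;
}
```

In practice the structure would first be read with `tcgetattr(fd, &tty)` after opening the port with `open()` (the FTDI device often appears as `/dev/ttyUSB0` on Linux, but that path is an assumption) and then applied with `tcsetattr(fd, TCSANOW, &tty)`.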
This program sends data to the UART module on the FPGA. Both UART modules must use the same baud rate, and each pin on the PC UART is connected to its complementary pin on the FPGA UART (TX to RX and RTS to CTS), as shown in Figure 6.
Figure 6: UART Connectivity
Image and Kernel Storage
The purpose of the Image and Kernel Storage module is to let the user keep all of the input data on their hard disk and have the hardware accelerator program read that data and send it to the FPGA. The program then takes the output from the FPGA and stores it back on the user's hard disk. This is accomplished by combining several file-reading and file-writing routines in one block, as shown in Figure 7.
Figure 7: Storage Module Block Diagram
Every block within the storage module is connected to the HDD/SSD in some way: the Image 3D array and Kernel 3D array blocks read data from the disk, and the Result 3D array block writes data to it. The Image 3D array block uses the OpenCV library (for example, cv::imread()) to read the image data into a 3D array so that other modules can manipulate it. The Kernel 3D array block only needs fopen() to read the kernel data from storage, because the kernel is not stored as an image file. The Result 3D array block can then use fwrite() to write the output data to storage. The Image 3D array block sends its 3D array, along with the amount of zero-padding needed, to the Image Preprocessor, and the Kernel 3D array block sends its 3D array to the Kernel Preprocessor. The Result 3D array block receives a 3D array from the Present Data module.
The image file can be in any of the following formats: .bmp, .pbm, .pgm, .ppm, .sr, .ras, .jpeg, .jpg, .jpe, .jp2, .tiff, .tif, .png. The kernel data will be stored as a .csv file with a comma and a space between items, a newline at the end of each row, and an empty line between layers. The result data will be stored as a .jpg file so it can easily be used as input to future convolutions.
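A reader for the kernel .csv layout described above could look like the following sketch. The function name `read_kernel_csv` is hypothetical, and it assumes the values are written as decimal numbers (for example `0.25`) that are converted to Q0.7 later. It reads from any `std::istream`, so a `std::ifstream` opened on the file works the same way as the `fopen()` approach mentioned above.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using Kernel3D = std::vector<std::vector<std::vector<double>>>;  // [layer][row][col]

// Hypothetical parser: comma-separated items (spaces allowed), one row per
// line, and a blank line between layers. Trailing blank lines are ignored.
Kernel3D read_kernel_csv(std::istream& in) {
    Kernel3D kernel(1);                                   // start with one empty layer
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
        if (line.find_first_not_of(" \t") == std::string::npos) {
            // Blank line: start a new layer, but never two empty layers in a row.
            if (!kernel.back().empty()) kernel.emplace_back();
            continue;
        }
        std::vector<double> row;
        std::istringstream fields(line);
        std::string item;
        while (std::getline(fields, item, ',')) {
            if (item.find_first_not_of(" \t") == std::string::npos) continue;  // skip empty items
            row.push_back(std::stod(item));               // stod skips leading spaces
        }
        kernel.back().push_back(row);
    }
    if (kernel.back().empty()) kernel.pop_back();         // drop an unused trailing layer
    return kernel;
}
```

With this layout, `kernel.size()` is the number of layers, which the other sketches in this document assume to be the red, green, and blue channels.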
Because OpenCV's primary interface is a C++ library, this module will be written in C++. For convenience, the rest of the software will also be written in C++.
Image Preprocessor
The purpose of the Image Preprocessor is to take the loaded 3D image array and prepare it for transfer to the FPGA. It does this by zero-padding the 3D array and then converting it to a 1D stream, which is sent to the PC UART module for transmission. Both steps are performed by the structure shown in Figure 8.
Figure 8: Image Preprocessor Block Diagram
The Zero-Padding module takes in the image 3D array and the amount of zero-padding needed. It creates a new, larger 3D array with room for the padding, then runs nested for-loops over the entire array, comparing the current index with the padding amount: if the current position lies in the padded border, it writes a zero; otherwise, it copies the corresponding pixel data. The for-loop iterator takes in the zero-padded 3D array and runs through its own set of nested for-loops to convert it into a 1D array that can be streamed over UART; this 1D stream is then sent to the PC Data Output UART module. The for-loop iterator also sends the dimensions of the image to the Present Data module.
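The two steps above can be sketched in C++ as follows. The function names `zero_pad` and `flatten` are hypothetical, and the image is assumed to be held as `[row][col][channel]`. Padding is applied to rows and columns only, never to the color channels. If the array is filled from a `cv::Mat` returned by `cv::imread()`, note that OpenCV stores color channels in BGR order, so they would have to be reordered to match the red, green, blue sequence described under Communication Protocol.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Image3D = std::vector<std::vector<std::vector<uint8_t>>>;  // [row][col][channel]

// Zero-Padding block: builds a larger array and, for every index, writes a
// zero in the border or the corresponding source pixel inside it.
Image3D zero_pad(const Image3D& img, std::size_t pad) {
    const std::size_t rows = img.size();
    const std::size_t cols = rows ? img[0].size() : 0;
    const std::size_t chans = cols ? img[0][0].size() : 0;
    Image3D out(rows + 2 * pad,
                std::vector<std::vector<uint8_t>>(cols + 2 * pad, std::vector<uint8_t>(chans, 0)));
    for (std::size_t r = 0; r < out.size(); ++r) {
        for (std::size_t c = 0; c < out[r].size(); ++c) {
            const bool border = r < pad || r >= pad + rows || c < pad || c >= pad + cols;
            for (std::size_t ch = 0; ch < chans; ++ch) {
                out[r][c][ch] = border ? 0 : img[r - pad][c - pad][ch];
            }
        }
    }
    return out;
}

// For-loop iterator: flattens row by row; within each pixel, the channels
// are emitted in stored order (assumed to be R, G, B).
std::vector<uint8_t> flatten(const Image3D& img) {
    std::vector<uint8_t> stream;
    for (const auto& row : img)
        for (const auto& px : row)
            for (uint8_t v : px) stream.push_back(v);
    return stream;
}
```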
Kernel Preprocessor
The overall purpose of the Kernel Preprocessor is very similar to that of the Image Preprocessor; the Kernel Preprocessor will convert the 3D kernel array into a 1D stream to be sent to the PC UART module for transmission. This is accomplished using a structure shown in Figure 9.
Figure 9: Kernel Preprocessor Block Diagram
The for-loop iterator block works in much the same way as the one in the Image Preprocessor module; the 3D kernel array is sent through a series of nested for-loops to convert it to a 1D array for transfer to the PC Data Output UART module. Pseudocode for the for-loop iterator.
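A minimal C++ sketch of this for-loop iterator follows. It assumes the kernel is held as `[layer][row][col]`, as a reader of the .csv format naturally produces, with one layer per color channel in red, green, blue order. It then emits the taps row by row with the three channel values of each position adjacent, matching the byte order described under Communication Protocol. The name `flatten_kernel` is hypothetical, and the element type is left generic so the same loop works before or after Q0.7 conversion.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical kernel flattener: [layer][row][col] in, interleaved stream out
// (for each position: layer 0, layer 1, layer 2, ...), row by row.
template <typename T>
std::vector<T> flatten_kernel(const std::vector<std::vector<std::vector<T>>>& k) {
    std::vector<T> stream;
    if (k.empty()) return stream;
    const std::size_t rows = k[0].size();
    const std::size_t cols = rows ? k[0][0].size() : 0;
    // Every layer must have the same shape, or the interleaving is meaningless.
    for (const auto& layer : k) {
        if (layer.size() != rows) throw std::invalid_argument("layer row count mismatch");
        for (const auto& row : layer)
            if (row.size() != cols) throw std::invalid_argument("row length mismatch");
    }
    stream.reserve(k.size() * rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            for (std::size_t ch = 0; ch < k.size(); ++ch)
                stream.push_back(k[ch][r][c]);
    return stream;
}
```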
Present Data
The purpose of the Present Data module is to take the 1D stream of output data and convert it back into a 3D array. It then displays the data and sends it to the Storage module to be stored on the user's HDD. This is accomplished through the structure shown in Figure 10.
Figure 10: Present Data Block Diagram
The for-loop iterator works in reverse of those in the Image and Kernel Preprocessors. It takes in the image dimensions from the Image Preprocessor and the 1D result stream from the PC Data Input UART module, then iterates through each dimension, placing data from the 1D stream into a 3D array. This new 3D array is passed to both the Display Image block and the Storage module. The Display Image block uses OpenCV (for example, cv::imshow()) to display the resulting image to the user.
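The reverse iteration can be sketched as below, assuming the result bytes arrive row by row with the channels of each pixel adjacent, the same order the preprocessors use. The name `unflatten` is hypothetical. The `rows` and `cols` arguments must be the dimensions of the result. For a K × K kernel applied with stride 1, these equal the original image dimensions only when the zero-padding on each side is (K − 1) / 2.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Image3D = std::vector<std::vector<std::vector<uint8_t>>>;  // [row][col][channel]

// Present Data for-loop iterator: rebuilds the 3D array from the 1D stream.
Image3D unflatten(const std::vector<uint8_t>& stream,
                  std::size_t rows, std::size_t cols, std::size_t chans) {
    if (stream.size() != rows * cols * chans)
        throw std::invalid_argument("stream length does not match dimensions");
    Image3D img(rows, std::vector<std::vector<uint8_t>>(cols, std::vector<uint8_t>(chans)));
    std::size_t i = 0;                         // position in the 1D stream
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            for (std::size_t ch = 0; ch < chans; ++ch)
                img[r][c][ch] = stream[i++];
    return img;
}
```

Checking the length up front catches a truncated or over-long UART transfer before a mis-shaped image is displayed or written to disk.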
Communication Protocol
General
For this project, data is transferred between modules as follows. The kernel filter values are in Q0.7 format, while the image pixel data are unsigned bytes. Both the kernel filter values and the image pixel data use three bytes per position: one each for red, green, and blue. The kernel filter values are sent first, followed by the image pixel data. For each position, the red byte is sent first, followed by the green byte and then the blue byte. Positions are sent line by line, first for the kernel and then for the image.
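Before the kernel is streamed, each coefficient must be quantized to Q0.7. The sketch below assumes the common reading of Q0.7 as an 8-bit two's-complement integer n that represents n / 128, which covers the range −1 to 127/128. The source does not say how out-of-range values are handled, so rounding to the nearest step and saturating at the ends are assumptions, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Converts a real coefficient to Q0.7 (n / 128), rounding to the nearest
// representable value and saturating to the int8_t range.
int8_t to_q0_7(double x) {
    long n = std::lround(x * 128.0);              // nearest multiple of 1/128
    n = std::min(std::max(n, -128L), 127L);       // saturate: 1.0 itself is not representable
    return static_cast<int8_t>(n);
}

// Converts a Q0.7 byte back to its real value, e.g. for checking a kernel.
double from_q0_7(int8_t q) {
    return q / 128.0;
}
```

The byte actually sent over UART is the two's-complement bit pattern, for example `static_cast<uint8_t>(to_q0_7(-0.5))`, which is 0xC0.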
Any values sent to the FPGA are received by the UART module, which then passes them to the SoPU (the Sum of Products module). For each sum of products, the SoPU receives one pixel color byte from the UART module and six from the Image Line Buffer. This lets each convolution combine a new pixel color byte from the image, provided by the UART module, with previously received pixel data, provided by the Image Line Buffer. At the same time, the SoPU writes an older pixel value from its Image Window into the Image Line Buffer. That value was read in from the UART module but is not yet stored in the Image Line Buffer, and because it will be needed again for later windows, it must be stored there.
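The data reuse described above is the line-buffer technique. The following C++ class models it in software for a single color channel; each of red, green, and blue is assumed to use its own instance. It is not the author's hardware design: it stores the last K full rows instead of the exact split between the Image Window and the Image Line Buffer described above, and the class name `LineBufferSoP` is hypothetical. The rule for reducing the sum to a byte (clamp negative sums to zero, divide by 128 to undo the Q0.7 scale, saturate at 255) is also an assumption, because the source does not specify it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Streaming sum-of-products model for ONE color channel. Pixels arrive row by
// row from an already zero-padded image `width` pixels wide; each pixel is
// received once and reused from the circular line buffer by every window
// that overlaps it.
class LineBufferSoP {
public:
    LineBufferSoP(std::vector<int8_t> kernel_q07, std::size_t k, std::size_t width)
        : kernel_(std::move(kernel_q07)), k_(k), width_(width),
          lines_(k, std::vector<uint8_t>(width, 0)) {
        if (k_ == 0 || kernel_.size() != k_ * k_ || width_ < k_)
            throw std::invalid_argument("bad kernel size or image width");
    }

    // Feeds one pixel byte. Returns true, and writes the result to `out`,
    // when this pixel completes a K x K window.
    bool push(uint8_t pixel, uint8_t& out) {
        lines_[row_ % k_][col_] = pixel;                 // overwrite the oldest stored row
        const bool ready = row_ + 1 >= k_ && col_ + 1 >= k_;
        if (ready) {
            int32_t acc = 0;
            for (std::size_t i = 0; i < k_; ++i) {
                // Window row i corresponds to stream row (row_ + 1 - k_ + i).
                const std::vector<uint8_t>& line = lines_[(row_ + 1 - k_ + i) % k_];
                for (std::size_t j = 0; j < k_; ++j)
                    acc += kernel_[i * k_ + j] * line[col_ + 1 - k_ + j];
            }
            if (acc < 0) acc = 0;                        // assumed: negative sums clamp to 0
            acc >>= 7;                                   // undo the Q0.7 scale (divide by 128)
            out = static_cast<uint8_t>(std::min<int32_t>(acc, 255));  // saturate high values
        }
        if (++col_ == width_) {                          // advance to the next stream position
            col_ = 0;
            ++row_;
        }
        return ready;
    }

private:
    std::vector<int8_t> kernel_;                // K*K taps in Q0.7, row-major
    std::size_t k_;
    std::size_t width_;
    std::vector<std::vector<uint8_t>> lines_;   // circular buffer of the last K rows
    std::size_t row_ = 0;
    std::size_t col_ = 0;
};
```

For a padded image of H rows and W columns, `push` reports (H − K + 1) × (W − K + 1) results, one per complete window, in the same row-by-row order in which they would be sent back to the PC.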
The SoPU then transfers each resulting sum-of-products byte to the UART module. The UART module again communicates with the FTDI chip, which converts the UART data to serial over USB and sends it to the PC, where the filtered image is reconstructed.