Creating a simple Overlay for PYNQ-Z1 board from Vivado HLx

The content presented in this post was developed during the winter class given at the Federal University of Rio Grande do Norte, with professors Carlos Valderrama and Samuel Xavier. My group consisted of Wagner Wesner and me.

Our group's task was to use Vivado HLS to implement accelerator blocks for the PYNQ-Z1 board. The PYNQ-Z1 is a board with several peripherals and a Zynq chip. The Zynq combines a processor (CPU) and a Field-Programmable Gate Array (FPGA) on the same chip, which makes it possible to test the blocks synthesized in Vivado. Vivado outputs, namely a bitstream and a Tcl file, are used to create a PYNQ overlay. The overlay is then used to communicate with the generated blocks through the PYNQ Python interface.

High-Level Synthesis (HLS) is very useful for transforming complex algorithms into Hardware Description Language (HDL) code. Many algorithms take considerable CPU processing time, and they can be translated into a hardware description that is implemented on an FPGA. Once the circuit is configured on the FPGA, the time-demanding parts of the algorithm can run in parallel, which can increase performance and bring other potential benefits.

The PYNQ overlay creation starts in Vivado HLS, with a custom block.

[Image: vivado_hls_wel.png]

To keep this post from getting too long, this tutorial presents only a summary of the required steps and assumes the reader is somewhat familiar with the basic Vivado HLS workflow. The version used is 2017.2.

When creating a new project, choose the PYNQ-Z1 Zynq part “xc7z020clg400-1”.

The accelerator block is modeled in C++. Vivado HLS needs both a source file describing the block behavior and a test bench to check the block outputs. A simple 32-bit adder block is written and tested as shown below.

[Image: adder source file and test bench (add_source_tb.png)]

After testing with “Run C Simulation”, run “Solution->C Synthesis->Active Solution” to synthesize the block for the first time and enable the Directive tab for your code.

At this step, the Directive tab should list the ports of the C function. All inputs and outputs must be set to an “s_axilite” interface, including the function's return port, which places the block control signals (such as ap_start) on the same AXI-Lite interface. This makes the communication between the CPU and the custom block easier once the block is imported into Vivado. For the presented block, the directives should be as shown in the picture below:

[Image: directives for the adder ports (add_directive)]

The picture below shows the “s_axilite” interface set to the port “a”:

[Image: s_axilite interface set on port a (bundle_io)]

After rerunning the synthesis, the block is ready to be exported to Vivado. To do so, select “Solution->Export RTL”. Vivado HLS will create a folder (“Explorer->Solution1->impl->ip”) containing all the data needed to import the block into Vivado.

[Image: vivado_hlx]

Now it’s time to open Vivado and create a new project. Select “RTL Project” and choose the same part as before, “xc7z020clg400-1”.

From the “Project Manager” tab, select “IP Catalog”; the IP Catalog window will open. Right-click inside the window and select “Add Repository”.

[Image: Add Repository in the IP Catalog (repositories.png)]

Browse to the directory where the Vivado HLS block was synthesized and select the “ip” folder under “solution1->impl”. The block should now be available in the Vivado IP catalog.

On the “IP INTEGRATOR” tab, create a new block design by selecting “Create Block Design”. A new blank window will open. Right-click inside it, select “Add IP…”, and search for the generated block name in the window that opens, as shown below:

[Image: searching for the generated block in Add IP (Test_add)]

After the block is instantiated, this is how it should look:

[Image: instantiated adder block (test_add_block)]

The next step is instantiating the Zynq processor system:

[Image: searching for the Zynq processing system (ZYNQ_search)]

[Image: Zynq processing system block (zynq_block)]

With both blocks placed, select “Run Block Automation” and then “Run Connection Automation”, which appear at the top of the diagram.

[Image: Run Block Automation and Run Connection Automation (block_automation)]

After the automation finishes, the schematic should look like this:

[Image: final block design (block_beautiful.png)]

Note: I have added another block, “sub_hls”, to show how multiple blocks are handled. It implements a subtraction.

The next step is creating an HDL wrapper for the design. In the “Sources” tab next to the “Diagram” window, right-click on “design_1” (listed with “design_1.bd” attached, or something similar) and select “Create HDL Wrapper”. Keep the option “Let Vivado manage wrapper and auto-update” selected. After the wrapper is created, select “Generate Bitstream” for the final processing step in Vivado.

The overlay files are now ready to be exported. In the Tcl Console, type “write_bd_tcl filename” and the Tcl script will be generated. To export the bitstream, use “File->Export->Export Bitstream File…”. Give both files the same name so they can be used together as a PYNQ overlay.

[Image: exported overlay files (add_sub_overlay)]
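Because the two files are matched by name, a quick check before copying them to the board can save a failed download. The sketch below uses a hypothetical helper, `check_overlay_pair` (it is not part of PYNQ), that only verifies the naming convention described above, assuming the overlay is called `add_sub`:

```python
from pathlib import Path


def check_overlay_pair(directory, name):
    """Return the .bit and .tcl paths of overlay `name`, or raise if either is missing.

    Hypothetical helper: it only checks that both files exist with the same base name.
    """
    folder = Path(directory)
    bit_file = folder / (name + ".bit")
    tcl_file = folder / (name + ".tcl")
    missing = [p.name for p in (bit_file, tcl_file) if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing overlay file(s): " + ", ".join(missing))
    return bit_file, tcl_file
```

On the board, `check_overlay_pair("/home/xilinx/jupyter_notebooks/add_sub_overlay", "add_sub")` should return both paths before the overlay is loaded.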

Some information must be collected before running the overlay on the PYNQ board. In the “Address Editor” tab, “processing_system7_0” can be expanded to show the address assigned to each placed block. The overlay driver needs to know each block’s address, as in the example below:

[Image: Address Editor with the block addresses (address_editor)]
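The addresses from the Address Editor can be written down as data and sanity-checked before they are typed into the notebook. This is a minimal sketch using the two values from this post (adder at 0x43C00000 and subtractor at 0x43C10000, each with a 64K range); the dictionary keys are illustrative labels, not necessarily the Vivado instance names, and `high_address` and `find_overlaps` are hypothetical helpers.

```python
# Base address and range of each block, copied from the Vivado Address Editor.
# The keys are illustrative labels, not necessarily the Vivado instance names.
ADDRESS_MAP = {
    "add_ip": (0x43C00000, 0x10000),  # 64K range
    "sub_ip": (0x43C10000, 0x10000),  # 64K range
}


def high_address(base, size):
    """Last byte address covered by a block (the Address Editor's High Address)."""
    return base + size - 1


def find_overlaps(address_map):
    """Return every pair of block names whose address ranges overlap."""
    names = list(address_map)
    overlaps = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            base_a, size_a = address_map[first]
            base_b, size_b = address_map[second]
            if base_a <= high_address(base_b, size_b) and base_b <= high_address(base_a, size_a):
                overlaps.append((first, second))
    return overlaps


for name, (base, size) in ADDRESS_MAP.items():
    print(f"{name}: 0x{base:08X} - 0x{high_address(base, size):08X}")
print("overlaps:", find_overlaps(ADDRESS_MAP))
```

With these values the adder spans 0x43C00000 to 0x43C0FFFF, so the subtractor starts right after it and no overlap is reported.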

Another important piece of information is the address of each signal of the block generated by Vivado HLS. To find the file that contains it, open “Sources->Design Sources->design_1_wrapper->…” and go down the hierarchy until you find a file ending with “…io_s_axi”, as shown below:

[Image: the s_axi file in the Sources hierarchy (axi_file.png)]

Double-click on it and look for a block of commented lines indicating each port address:

[Image: port address comments (ports_info)]
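Reading the offsets by eye works for a two-port adder; for larger blocks, a small parser can collect them. The sketch below assumes the comment format `0x10 : Data signal of a`, modeled on the register-map comments that Vivado HLS writes into the generated s_axi file; the exact wording may vary between tool versions, and `parse_port_offsets` is a hypothetical helper. It accepts both `//` (Verilog) and `--` (VHDL) comment prefixes, since the generated file can be in either language.

```python
import re

# Assumed comment format, modeled on the register map that Vivado HLS writes
# at the top of the generated s_axi file; exact wording may differ by version.
SAMPLE_COMMENTS = """
// 0x00 : Control signals
//        bit 0  - ap_start (Read/Write/COH)
// 0x10 : Data signal of a
//        bit 31~0 - a[31:0] (Read/Write)
// 0x18 : Data signal of b
//        bit 31~0 - b[31:0] (Read/Write)
// 0x20 : Data signal of y
//        bit 31~0 - y[31:0] (Read)
"""

# Matches '// 0x10 : Data signal of a' (Verilog) or '-- 0x10 : ...' (VHDL).
DATA_LINE = re.compile(r"(?://|--)\s*0x([0-9A-Fa-f]+)\s*:\s*Data signal of (\w+)")


def parse_port_offsets(text):
    """Map each port name to the register offset of its first data word."""
    offsets = {}
    for match in DATA_LINE.finditer(text):
        # setdefault keeps the first offset if a wide port spans several words
        offsets.setdefault(match.group(2), int(match.group(1), 16))
    return offsets


print({name: hex(off) for name, off in parse_port_offsets(SAMPLE_COMMENTS).items()})
```

Pasting the comment block from your own s_axi file in place of `SAMPLE_COMMENTS` should give the offsets used in the notebook below, provided the wording matches.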

With all the required files and information, transfer the overlay files (‘.bit’ and ‘.tcl’) to the PYNQ board and open the board in a browser to access the Jupyter notebooks.

The Overlay class is used to download the created bitstream to the PYNQ FPGA. The MMIO class is used to write data to and read data from the blocks of the schematic.

from pynq import Overlay
from pynq import MMIO

Indicate the path of the overlay files and download the overlay:


ol = Overlay("/home/xilinx/jupyter_notebooks/add_sub_overlay/add_sub.bit")
ol.download()

Instantiate both IPs by indicating their offset address and their address range (64K = 0x10000). Both values come from the Vivado Address Editor, as shown before.


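# adder: base address and 64K range, as listed in the Address Editor
add_ip = MMIO(0x43C00000, 0x10000)

# subtractor: the next 64K slot
sub_ip = MMIO(0x43C10000, 0x10000)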

Write values to the block ports, passing the port offset (taken from the address information file shown before) and the value as parameters:


#port a
add_ip.write(0x10,7)
print("add a:",add_ip.read(0x10))
#port b
add_ip.write(0x18,12)
print("add b:",add_ip.read(0x18))
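The writes above can be wrapped in a helper that looks ports up by name and checks that each value fits in a 32-bit register. This is a sketch under assumptions: `write_ports` and `FakeMMIO` are hypothetical, and `FakeMMIO` only mimics the `read(offset)` and `write(offset, value)` calls used in this post, so the helper can be tried without a board. Negative inputs are stored in two's complement, which assumes the HLS ports were declared as signed 32-bit integers.

```python
class FakeMMIO:
    """Hypothetical stand-in for pynq.MMIO: a dict of 32-bit registers."""

    def __init__(self):
        self.regs = {}

    def read(self, offset):
        return self.regs.get(offset, 0)

    def write(self, offset, value):
        self.regs[offset] = value & 0xFFFFFFFF


def write_ports(ip, offsets, **values):
    """Write named input ports, then return what each register reads back."""
    readback = {}
    for name, value in values.items():
        if not -2**31 <= value < 2**32:
            raise ValueError(f"{name}={value} does not fit in a 32-bit register")
        word = value & 0xFFFFFFFF  # negative values become two's complement
        ip.write(offsets[name], word)
        readback[name] = ip.read(offsets[name])
    return readback


ip = FakeMMIO()
print(write_ports(ip, {"a": 0x10, "b": 0x18}, a=7, b=12))
```

On the board, the real MMIO object would be passed instead, for example `write_ports(add_ip, {"a": 0x10, "b": 0x18}, a=7, b=12)`.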

It is important to analyze the signals of the created block. The “ap_start” bit (bit 0 of the control register at offset 0x00) must be set to logic ‘1’ so the block starts its calculation:


#ap_start bit
add_ip.write(0x00,1)
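The control register at offset 0x00 holds more than the start bit. The sketch below decodes it into named flags, assuming the usual bit layout that Vivado HLS lists in the comments of the generated s_axi file (bit 0 ap_start, bit 1 ap_done, bit 2 ap_idle, bit 3 ap_ready, bit 7 auto_restart); check it against your own file. `decode_control` is a hypothetical helper. Note that ap_done is typically marked as clear-on-read, so reading the register can reset it.

```python
# Bit positions in the control register at offset 0x00, as commonly listed in
# the comments of the Vivado HLS generated s_axi file (check your own file).
CONTROL_BITS = {
    "ap_start": 0,
    "ap_done": 1,      # usually marked clear-on-read (COR)
    "ap_idle": 2,
    "ap_ready": 3,
    "auto_restart": 7,
}


def decode_control(word):
    """Split a raw control-register value into named boolean flags."""
    return {name: bool((word >> bit) & 1) for name, bit in CONTROL_BITS.items()}


print(decode_control(0x04))
```

On the board, `decode_control(add_ip.read(0x00))` gives a readable view of the block state; an idle block would be expected to show `ap_idle` set.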

The adder IP finishes its job in just one clock cycle, so it is already done by the time the next Python call reads the result; there is no need to keep checking the “ap_done” or “ap_ready” bit to see if it has finished.
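Blocks that take many cycles do need that check. A minimal sketch of a start-and-poll loop follows, assuming the same control-register layout (ap_start in bit 0, ap_done in bit 1 at offset 0x00); `run_and_wait` and `SlowFakeIP` are hypothetical names, and the fake only imitates a block that reports ap_done after a few reads so the loop can be tried without a board.

```python
import time

CTRL = 0x00      # control register offset
AP_START = 0x1   # bit 0
AP_DONE = 0x2    # bit 1


def run_and_wait(ip, timeout_s=1.0):
    """Set ap_start, then poll ap_done until it is seen or the timeout expires."""
    ip.write(CTRL, AP_START)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if ip.read(CTRL) & AP_DONE:
            return True
    return False


class SlowFakeIP:
    """Hypothetical stand-in for a block that finishes after a few status reads."""

    def __init__(self, reads_until_done):
        self.reads_left = reads_until_done
        self.running = False

    def write(self, offset, value):
        if offset == CTRL and value & AP_START:
            self.running = True

    def read(self, offset):
        if offset == CTRL and self.running:
            self.reads_left -= 1
            if self.reads_left <= 0:
                self.running = False  # mimic ap_done clearing once it is read
                return AP_DONE
        return 0


print(run_and_wait(SlowFakeIP(reads_until_done=3)))
```

With a real block, `run_and_wait(add_ip)` would return True once ap_done has been seen. A False result after the timeout means ap_done was never observed; one possible cause is that the control signals were not mapped to the AXI-Lite interface.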

To get the result, read the output port at its address:


#port y
print("add y:",add_ip.read(0x20))

The results for both IPs are shown below in a Jupyter notebook running on the PYNQ:

[Image: add and sub results in a Jupyter notebook (result)]
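Comparing the hardware results with a software reference helps catch wiring or offset mistakes. The sketch below assumes the ports are 32-bit signed integers (the C source is only shown as an image, so check your own types); `ref_add`, `ref_sub`, and `to_signed32` are hypothetical helpers. The output register holds the raw 32-bit pattern, so a negative subtraction result such as 7 - 12 appears as a large unsigned number until it is reinterpreted.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(word):
    """Reinterpret a raw 32-bit register value as a two's-complement integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def ref_add(a, b):
    """Software reference for the adder, wrapped to 32 bits like the register."""
    return (a + b) & MASK32


def ref_sub(a, b):
    """Software reference for the subtractor, wrapped to 32 bits."""
    return (a - b) & MASK32


print(ref_add(7, 12), hex(ref_sub(7, 12)), to_signed32(ref_sub(7, 12)))
```

On the board, the adder check would be `add_ip.read(0x20) == ref_add(7, 12)` after the writes and the start bit shown above.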

The presented methodology can also be applied to the PYNQ base overlay, extending it with new blocks.

The source files can be found in:

https://github.com/YangTavares/PYNQ/tree/master/custom_overlays/add_sub_overlay

For another project implementation, check this prime number calculator made by Wagner:

https://github.com/wagnerrn/pynq-primenumber

12 thoughts on “Creating a simple Overlay for PYNQ-Z1 board from Vivado HLx”

  1. Pingback: PYNQ-Z1 peripherals control with an Overlay created from Vivado | Yang Tavares

  2. Dear Yang,
    Your tutorials have helped me a lot as I’m familiarizing myself with the Vivado workflow for Pynq, thank you!

    I’ve followed this tutorial closely, but for some reason vivado doesn’t give me address info for control signals. I am able to generate the bitstream and tcl files, download it to pynq and write to the input ports, but I am not able to start the accelerator seeing that control input ports are not being generated. I was wondering if you’d have any pointers as to why this is happening and what are some ways I can try to solve it. I’ve tried explicitly assigning ports and pin locations to ap_start, ap_done, etc., which still didn’t work. I am using Vivado 2016.4 for compatibility purposes. I’d appreciate any help.

    Best,
    Matheus

    • Hi Matheus!

      My first guess would be that you missed a step when setting the axilite interface or creating the HDL wrapper. The port addresses should be in a Verilog file far down the design wrapper hierarchy.

      Thanks for the feedback!

  3. Hi, Yang,
    Thanks for sharing. I’m a little confused about whether the overlay you created is a separate base overlay or an extra overlay that integrates with the PYNQ base overlay, since I saw you integrated the ARM processor when generating the overlay .bit.
    Thanks

    • Hi Nan-Sheng!

      The Overlay presented here is independent from the base Overlay. I tried to show how to create an Overlay from scratch because at the time I didn’t find a straightforward explanation on how to do that. Overall, you don’t need the base Overlay to run the Overlay created in this post. In addition, you can use the presented flow to create a custom base Overlay or even add new IPs inside the standard base Overlay.

      Thanks for the feedback!

  4. Hi, Yan,

    Thanks for the clarification. However, I still have a question for you.

    Since the overlay you created embeds an ARM core, which is a hard core, I assume the overlay you built is a base overlay (does the base overlay contain the ARM hard core?). If that’s the case, I’m curious which host executes the following code to download the overlay to the PYNQ board. Is it the PC or laptop running Vivado? I think Python runs on the PYNQ board after the ARM boots. Maybe this is where I misunderstood. Thanks

    from pynq import Overlay
    from pynq import MMIO
    ol = Overlay("/home/xilinx/jupyter_notebooks/add_sub_overlay/add_sub.bit")
    ol.download()
    etc…

    • Hi Nan-Sheng,

      Since the Zynq SoC is composed of both FPGA and a processor (ARM), I had to instantiate the ARM and make its interface with the created FPGA IPs. The .bit and .tcl files were produced on Vivado from my laptop, and then those files were transferred to the PYNQ board and downloaded with the “ol.download()”. So, you are right, the python runs over the PYNQ board, I managed the PYNQ files through ethernet. I believe that when the Overlay is downloaded, the Zynq SoC is configured based on the schematic created at Vivado, such that the FPGA inside incorporates the IPs and the interface between the FPGA and processor. If I am not mistaken, the name “base Overlay” stands for the standard Overlay created by Xilinx with various functionalities that are not necessarily focused on a specific application. Please feel free to ask anything!

  5. Hi, Yang,

    Thanks for the discussion.
    What I thought was that there is one FPGA on the PYNQ composed of both PS and PL. Before you download your overlay, the ARM on the PYNQ is not active. If that’s the case, it doesn't seem possible to execute the Python code that downloads the .bit to the PYNQ board. I got stuck here. 🙂

    All the best,
    Nan-Sheng

    • Hi Nan-Sheng,

      I am almost sure that when you power the board, the ARM automatically loads the PYNQ-Z1 image, so actually, it is currently active even without downloading an Overlay. It’s the python running in the PYNQ board that downloads the Overlay, I am just able to access the board through ssh or browser, I also have to pass the .bit and .tcl file to the board before downloading. When the Overlay is downloaded, the FPGA IPs are attached to the processor such as the interface and the adder described in the post. The processor inside the Zynq SoC is not constructed from programmable logic (If I am right). The instantiation of the Zynq processor in the Vivado schematic is just a representation of the existence of the processor itself, to tell Vivado how to connect the programmable logic with my processor. I hope it is clearer now!

  6. Dear Yang,

    Your post is very good , thank you for this !
    I followed the steps provided.
    I found that the wrapper was .v in your implementation and I got a .vhd wrapper.
    I was able to find address of all port and also was able to validate .write() and .read() on both input ports but I was unable to set the ap_start bit. And because of this my output is always 0.
    I’d appreciate any help 🙂

    Best,
    Arun

    • Hi Arun!

      I believe there would be various reasons for the malfunction of your developed block, could you validate it in the Vivado HLx? I am not sure why it created a .vhd file, maybe a wrapper option? Unfortunately, I am not working with the PYNQ board these days, so I might not be able to help you with the details of the software used to create the Overlay.

      Thanks for the feedback!
