PYNQ-Z1 peripherals control with an Overlay created from Vivado

This post is an extension of “Creating a simple overlay for PYNQ-Z1 board from Vivado HLx“, which presented an Overlay creation methodology for an accelerator block. The implemented block only communicates with Zynq Processing System (PS) and does not explain the PYNQ peripherals management. This work was developed with the help of Wagner Wesner.

The “base” Overlay found inside the PYNQ package is composed of the basic structures needed to handle PYNQ functionalities. The Vivado project (built on Vivado 2016.1) used to develop the “base” Overlay can be reconstructed from the Tcl file and observed:


Continue reading

Creating a simple Overlay for PYNQ-Z1 board from Vivado HLx

The content presented in this post was developed during the winter class given at Federal University of Rio Grande do Norte, with professors Carlos Valderrama and Samuel Xavier. My group was composed by Wagner Wesner and me.

Our group task was targeting Vivado HLS to implement accelerator blocks for the PYNQ-Z1 board. The PYNQ consists of a board with some peripherals and a ZYNQ chip, the ZYNQ has a cluster with a Central Processing Unit (CPU) and a Field-Programmable Gate Array (FPGA) which enables the test of the synthesized blocks on Vivado. Vivado outputs such as a bitstream and a Tcl file are used to create a PYNQ overlay. The overlay is further used to communicate the generated blocks with the PYNQ python interface.

The High-Level Synthesis (HLS) is very useful to transform complex algorithms into Hardware Description Language (HDL) code. There is a variety of algorithms which takes considerable CPU processing time, those algorithms can be translated to a hardware description which can be implemented on an FPGA. Once the circuit is configured on the FPGA, the algorithm time demanding tasks are parallelized (summing up), which increases performance and brings other potential benefits.

The Vivavo HLS software starts the PYNQ overlay creation with a custom block.


Continue reading

A Viterbi Decoder Python implementation

A Viterbi decoder uses the Viterbi algorithm for decoding a bitstream that was generated by a convolutional encoder, finding the most-likely sequence of hidden states from a sequence of observed events, in the context of hidden Markov models. In other words, in a communication system, for example, the transceiver encodes the desired bits to be transferred, encrypting and at the same time preparing the encoded bitstream for an unfortunate data change caused by the channel noise. The receiver decoder knows the state machine created by the encoder hardware, which can find the most-likely transmitted bits based on the most-likely path.

A convolutional encoder is built the same as presented in the video:


The input bits shifts at different times (clocks) on the three registers shown in the figure above. Depending on the bits available inside those registers, the system will deliver a pair of bits produced (in this case) by two xor gates connected like so. The constraint length (k) is the number of registers an input bit can influence on the encoded bits, in the presented case k=3. The code rate (R) is a relation upon the number of bits that enter the system and the number of bits leaving, leading to R=1/2.

Continue reading

Controlling Virtuoso schematic inputs with Digital Vector File

The utilization of a Vector File makes controlling multiple input and output signals independently possible. Its implementation is important for a variety of circuits which need different pulses at specific times, relieving the effort to manage PWL voltage sources at schematic.

Note: comments begins with “;” character

radix 1 1 4 4
; radix specifies the number of ports and number of bits for each port,
; the base number is also specified ( from binary '1' to hexadecimal '4' )

io i i i i

; io defines the state of each port, as input "i", output "o"
; or bidirectional "b"
vname Primeiro_Sinal Segundo_Sinal Terceiro_Sinal[0:3] Quarto_Sinal,Quinto_Sinal,Sexto_Sinal,Setimo_Sinal

; vname sets the name of each port

tunit 1ns

; tunit configures the time unit of simulation

trise 1ps
; trise defines the rise time of each vector

tfall 1ps
; tfall defines fall time of each vector

vih 5.0
; vih defines logical '1' voltage

vil 0.0
; vil defines logical '0' voltage
period 10.0

; period determines the time interval between each vector step, the
; number is multiplied by the previously time unit set

idelay 5 0 1 1 0

; idelay inserts a delay at the selected ports multiplied by the time unit

; The next lines will define the states of each port as the
; transient simulation happens, each line represents a time step

; The binary number can be represented with 0 or 1, and the
; hexadecimals with 0-F

0 0 0 0 ; All signals initially set to 0
1 0 F F ; First signal with a logical '1', second with a logical '0' , the other ones set to a logical '1'
0 1 C C ; And goes on
1 0 A A
1 1 3 3

The ports declaration and manipulation are indented by its position from left to right.

Continue reading

SRAM write and read basics (1/2)

The Static Random Access Memory is widely used among digital circuits for its good trade-off between memory architectures. Being not large like register files, at the same time not slow as DRAM or hard disk management. It is also compatible with standard CMOS process.


The standard SRAM cell architecture is shown above, a pair of pMOS and nMOS transistors create cross-coupled inverters that can hold its state without needing to have its input driven by an extern signal.

The word line “WL” enables the bit lines “BL and “!BL” to drive the net “Q” to a logical value (either 0 or 1) and at the same time, opening the cell data to be read by a special circuitry.

Sizing those transistors correctly is required. If the inverter transistors “M1 to M4” are considerably stronger than “M5” and “M6” access transistors, the current passing through the transistors “M5 and M6” will not be enough to flip the state of the memory cell. The stable state created by the cross-coupled inverters needs to be overpowered to change its logical level. Moreover, the opposite situation with unbalanced strength (size) causes the logical state to be changed from bit lines noise even with the word lines disabled. The same problem can happen during the read operation which is going to be presented further.

Continue reading