A way to remove special characters from a text file

Hi everyone! I have been busy for the last months with various duties related to my university classes and research. But now I have separated a little time to write another post! Soon I will bring more contents related to my university environment.

Last month, the campus party happened in my city. So my friends and I decided to participate in two hackathons that happened simultaneously during the event just for fun, which later showed to be a great challenge. The first hackathon involved data science directed to the Brazilian health, and the second one was intended to accelerate the national overall judgment process. In the picture below I am presenting my groups work related to the data science hackathon, the TV shows a correlation matrix made from various databases data.

word

Jumping now to this post purpose!

While trying to develop a solution for one of the hackathons, we had to process the text of pdf files.  To do so, we used the PDFMiner package. The next line illustrates its usage:

pdf2txt.py -o 'output file' 'pdf file'

The application seemed to extract the pdf text pretty well, with just one little peculiarity: There were special/control characters (“Ctrl+l”) inserted whenever there was a new page at the original pdf. The picture below illustrates the character with the VIM text editor:

word2

This character got in our way to process the file for our final purpose. At first, we tried to just remove it with ordinary python functions and with different inputs representing the character, without success at the beginning. So we found a pretty interesting solution!


text_file = open('text.txt','r')
text_file = text_file.read()
text_file = repr(text_file)
text_file = text_file.replace("\\x0c","")
text_file = literal_eval(text_file)

The trick consists of using the “repr()” function which returns the completely “raw” string. From there, the special character can be easily identified an removed. In the end, just return the string to the original form with the “literal_eval()” function! The relation of “\x0c” and “Ctrl+l” was found by analyzing the raw text file with the “repr()” function.

I believe this approach can be used to similarly solve other issues involving different characters and programming languages.

A simple chat with Sockets using Python

The following work was developed during the Computer Network class given by professor Carlos Viegas at the Federal University of Rio Grande do Norte. The group was composed of Lenildo and me.

The assignment consisted of implementing a simple chat with Sockets based on a client/server communication. We also needed to insert some functionalities into the application, like changing the name of the user, listing the active clients connected to the chat and even to start a private chat with a specific user. There is no direct communication between clients, all messages need to pass through the server to reach the other end.

Now let’s start to talk about the code!

The strategies to create the system involves using threads to communicate the clients and server parallelly with the help of sockets:


# import libraries
from socket import *
from threading import Thread # thread

Beginning with the server, the following lines were used for the main configuration. It is needed to describe the server IP, port and operating protocol (TCP in our case):


# Server IP
serverName = ''
# Server port to connect
serverPort = 12000
# TCP protocol
serverSocket = socket(AF_INET,SOCK_STREAM)
# Bind the Server IP with its port
serverSocket.bind((serverName,serverPort))
# Ready to receive connections
serverSocket.listen(1) 

Continue reading

Analyzing the state of the Basic Health Units of Brazil (UBS) using Python for Data Science

This work was developed during the “Python IMD challenge” happened on 10/21/2017 with Igor, Ricardo, Luiza and me. The competition purpose was to develop a project involving Data Science during 5 hours. Our goal focused on choosing something impactful and at the same time simple to be developed in the short given time. We were very happy to know that we won the first position in the competition at the end! The prize is a free ticket to the national Python event that is going to happen next year.

eqp.jpg

Without further ado, let’s talk about the project itself!

During our searches for datasets about various topics, we found the national website which contains numerous pre-formatted data about national interests:   http://dados.gov.br/

The subject that called our attention was about the Basic Health Units of Brazil (Unidade básica de saúde), which are small public hospitals basically. The dataset had some interesting columns that we thought could bring an important conclusion, for example, the hospitals coordinates and their evaluation about different aspects like the hospital structure and medical supplies.

Continue reading

PYNQ-Z1 peripherals control with an Overlay created from Vivado

This post is an extension of “Creating a simple overlay for PYNQ-Z1 board from Vivado HLx“, which presented an Overlay creation methodology for an accelerator block. The implemented block only communicates with Zynq Processing System (PS) and does not explain the PYNQ peripherals management. This work was developed with the help of Wagner Wesner.

The “base” Overlay found inside the PYNQ package is composed of the basic structures needed to handle PYNQ functionalities. The Vivado project (built on Vivado 2016.1) used to develop the “base” Overlay can be reconstructed from the Tcl file and observed:

base_blockd

Continue reading

Creating a simple Overlay for PYNQ-Z1 board from Vivado HLx

The content presented in this post was developed during the winter class given at Federal University of Rio Grande do Norte, with professors Carlos Valderrama and Samuel Xavier. My group was composed by Wagner Wesner and me.

Our group task was targeting Vivado HLS to implement accelerator blocks for the PYNQ-Z1 board. The PYNQ consists of a board with some peripherals and a ZYNQ chip, the ZYNQ has a cluster with a Central Processing Unit (CPU) and a Field-Programmable Gate Array (FPGA) which enables the test of the synthesized blocks on Vivado. Vivado outputs such as a bitstream and a Tcl file are used to create a PYNQ overlay. The overlay is further used to communicate the generated blocks with the PYNQ python interface.

The High-Level Synthesis (HLS) is very useful to transform complex algorithms into Hardware Description Language (HDL) code. There is a variety of algorithms which takes considerable CPU processing time, those algorithms can be translated to a hardware description which can be implemented on an FPGA. Once the circuit is configured on the FPGA, the algorithm time demanding tasks are parallelized (summing up), which increases performance and brings other potential benefits.

The Vivavo HLS software starts the PYNQ overlay creation with a custom block.

vivado_hls_wel.png

Continue reading

A Viterbi Decoder Python implementation

A Viterbi decoder uses the Viterbi algorithm for decoding a bitstream that was generated by a convolutional encoder, finding the most-likely sequence of hidden states from a sequence of observed events, in the context of hidden Markov models. In other words, in a communication system, for example, the transceiver encodes the desired bits to be transferred, encrypting and at the same time preparing the encoded bitstream for an unfortunate data change caused by the channel noise. The receiver decoder knows the state machine created by the encoder hardware, which can find the most-likely transmitted bits based on the most-likely path.

A convolutional encoder is built the same as presented in the video: https://www.youtube.com/watch?v=oI0MwwXJ8dE&t=640s

viterbi_encoder

The input bits shifts at different times (clocks) on the three registers shown in the figure above. Depending on the bits available inside those registers, the system will deliver a pair of bits produced (in this case) by two xor gates connected like so. The constraint length (k) is the number of registers an input bit can influence on the encoded bits, in the presented case k=3. The code rate (R) is a relation upon the number of bits that enter the system and the number of bits leaving, leading to R=1/2.

Continue reading