A way to remove special characters from a text file

Hi everyone! I have been busy for the last few months with various duties related to my university classes and research, but I have now set aside a little time to write another post! Soon I will bring more content related to my university environment.

Last month, Campus Party took place in my city. Just for fun, my friends and I decided to participate in two hackathons that ran simultaneously during the event, which later proved to be a great challenge. The first hackathon involved data science applied to Brazilian health, and the second was intended to speed up the national judicial process. The picture below shows me presenting my group’s work for the data science hackathon; the TV shows a correlation matrix built from data in various databases.


Now, on to the purpose of this post!

While trying to develop a solution for one of the hackathons, we had to process the text of PDF files. To do so, we used the PDFMiner package. The next line illustrates its usage:

pdf2txt.py -o 'output file' 'pdf file'

The application seemed to extract the PDF text pretty well, with just one little peculiarity: a special control character (the form feed, typed as “Ctrl+L”) was inserted wherever a new page started in the original PDF. The picture below shows the character in the Vim text editor:


This character got in the way of processing the file for our final purpose. At first, we tried to remove it with ordinary Python functions, using different inputs to represent the character, without success. Then we found a pretty interesting solution!

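from ast import literal_eval

# Read the whole file into a single string
with open('text.txt', 'r') as f:
    text_file = f.read()
text_file = repr(text_file)  # escaped form: the form feed becomes the four characters \x0c
text_file = text_file.replace("\\x0c", "")  # remove the escaped form feed
text_file = literal_eval(text_file)  # convert the escaped form back into a normal string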

The trick consists of using the “repr()” function, which returns the string with its special characters written out as visible escape sequences. From there, the special character can be easily identified and removed. In the end, just convert the string back to its original form with the “literal_eval()” function! The relation between “\x0c” and “Ctrl+L” was found by analyzing the raw text file with the “repr()” function.

I believe this approach can be used to similarly solve other issues involving different characters and programming languages.
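
For comparison with the “repr()” trick above, here is a minimal sketch of a more direct route. It assumes the unwanted character really is the form feed (“\x0c”, also written “\f”) and that the text was read as a Python 3 string. The names “strip_form_feeds” and “split_pages” are illustrative and are not part of the original solution:

```python
FORM_FEED = "\x0c"  # a single character, identical to "\f"


def strip_form_feeds(text):
    """Return the text with every form feed character removed."""
    return text.replace(FORM_FEED, "")


def split_pages(text):
    """Split the extracted text into pages, using the form feed as separator."""
    return text.split(FORM_FEED)


sample = "page one\n\x0cpage two\n"
print(repr(strip_form_feeds(sample)))
print(split_pages(sample))
```

The detail that matters is the escaping: in a normal string literal, "\x0c" is one character, while "\\x0c" is four characters that only show up in the output of “repr()”.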

Robot path generation from a Genetic Algorithm

The project presented in this post was accomplished during the Artificial Intelligence class given by professor Caroline. It consisted of using the Genetic Algorithm (GA) to make a robot (from the iRobot Matlab toolbox) navigate from a start point to an end point without hitting any obstacles (walls).

A GA flowchart is shown below to quickly summarize the method. At the beginning, a new population of possible solutions is created; each individual of this population is a candidate solution to the given problem. The population size is fixed at a chosen value. Next, an evaluation (or fitness) function is applied to each individual of the population to measure how well it solves the problem. From there, the process of “evolution” begins: a new generation is created by selecting the best individuals, reproducing them (crossover) and applying a chance of mutation to each individual. Finally, this new generation is evaluated and the process of creating a new generation starts again. The algorithm stops when the evaluation of the population is good enough or the specified number of generations is reached.


source: http://techeffigytutorials.blogspot.com.br/2015/02/the-genetic-algorithm-explained.html
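
The loop described above can be sketched in a few lines of Python. This is only a toy illustration, not the robot code: it assumes individuals are lists of bits and that the fitness is simply the number of 1s. The function names and default values are made up for the example:

```python
import random


def fitness(individual):
    """Toy evaluation: count the 1s in the bit list (higher is better)."""
    return sum(individual)


def run_ga(population_size=20, genome_length=16, generations=50,
           mutation_chance=0.02, seed=0):
    rng = random.Random(seed)
    # Create a fixed-size random population
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Evaluate every individual
        scores = [fitness(ind) for ind in population]
        # Stop early when a solution is good enough
        if max(scores) == genome_length:
            break
        # Selection favors better scores (+1 avoids all-zero weights)
        weights = [s + 1 for s in scores]
        new_population = []
        while len(new_population) < population_size:
            mother, father = rng.choices(population, weights=weights, k=2)
            # Crossover: the child takes a prefix of one parent
            # and the matching suffix of the other
            cut = rng.randint(1, genome_length - 1)
            child = mother[:cut] + father[cut:]
            # Mutation: each gene has a small chance of flipping
            child = [1 - g if rng.random() < mutation_chance else g
                     for g in child]
            new_population.append(child)
        population = new_population
    return max(population, key=fitness)


best = run_ga()
print(best, fitness(best))
```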

The robot path generation was adapted from the GA explanation above. A brief summary of the code developed for this post follows:

  1. The initial population is created from possible paths that the robot can follow. Each path (individual) is composed of a sequence of Cartesian coordinates (x,y), and the path is built by connecting those coordinates in order. It is important to check that the path is valid (no obstacles in the way).
  2. The evaluation (fitness) of each individual is based on whether the path reaches the destination and on the distance traveled from the start point to the finish point.
  3. A new generation is started by randomly selecting individuals (Roulette selection) until a chosen percentage of the population size is filled. Individuals with better evaluations are more likely to be selected and carried over to the new generation.
  4. The remaining empty spots of the new population are filled by the reproduction (crossover) of two individuals selected the same way as in step 3. The reproduction generates a single child, which takes one part from the first selected individual and the other part from the second, joined at a point that both paths have in common.
  5. With the new population filled, a chance of mutation is applied to each individual. Each Cartesian coordinate has a very small chance of being replaced by a random one. It is also important to check that this new random point still produces a valid path. Mutation is applied to keep the search from getting stuck in a local minimum.
  6. The population is evaluated again, and the algorithm repeats from step 2 until the stopping requirements are met.
population_size = 100;
path_size = 10;
%Start point coordinates
start_x = 3;
start_y = 3;
%End point
desired_x = -3;
desired_y = -3;
%survive rate for next generation
keep_alive = 90;
%Number of generations
epoch = 10;
%Mutation chance for each node
mutation_chance = 0.0005; %For each point in the path
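
Steps 3 to 5 of the summary can be sketched in Python as follows. This is an illustration under several assumptions, not the Matlab code of the project: paths are lists of integer (x, y) tuples sharing the same start and end points, shorter paths are treated as better, the obstacle check is left out, and the child of a crossover may end up with a different number of points than its parents. All function names are hypothetical:

```python
import math
import random


def path_length(path):
    """Total distance traveled along the consecutive (x, y) points."""
    return sum(math.hypot(b[0] - a[0], b[1] - a[1])
               for a, b in zip(path, path[1:]))


def roulette_select(population, rng):
    """Pick one path at random; shorter paths get larger weights."""
    weights = [1.0 / (1.0 + path_length(p)) for p in population]
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(first, second):
    """Join the parents at a shared intermediate point, if one exists."""
    for i, point in enumerate(first[1:-1], start=1):
        if point in second[1:-1]:
            j = second.index(point, 1)
            # Head of the first parent, tail of the second one
            return first[:i] + second[j:]
    return list(first)  # no common point: the child copies the first parent


def mutate(path, mutation_chance, rng, low=-4, high=4):
    """Replace each intermediate point by a random one with a small chance."""
    mutated = list(path)
    for i in range(1, len(path) - 1):
        if rng.random() < mutation_chance:
            mutated[i] = (rng.randint(low, high), rng.randint(low, high))
    return mutated


rng = random.Random(1)
path_a = [(3, 3), (2, 1), (0, 0), (-1, -2), (-3, -3)]
path_b = [(3, 3), (1, 2), (0, 0), (-2, -1), (-3, -3)]
parent = roulette_select([path_a, path_b], rng)
child = mutate(crossover(path_a, path_b), 0.0005, rng)
print(parent, child)
```

In this example both parents pass through (0, 0), so the child follows the first path up to that point and the second path from there on.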


A simple chat with Sockets using Python

The following work was developed during the Computer Network class given by professor Carlos Viegas at the Federal University of Rio Grande do Norte. The group was composed of Lenildo and me.

The assignment consisted of implementing a simple chat with Sockets based on client/server communication. We also needed to add some features to the application, such as changing the user’s name, listing the active clients connected to the chat and even starting a private chat with a specific user. There is no direct communication between clients: all messages need to pass through the server to reach the other end.

Now let’s start to talk about the code!

The strategy for building the system is to use threads, with the help of sockets, so that the server can communicate with the clients in parallel:

# import libraries
from socket import *
from threading import Thread # thread

Beginning with the server, the following lines were used for the main configuration. We need to specify the server IP, port and transport protocol (TCP in our case):

# Server IP
serverName = ''
# Server port to connect
serverPort = 12000
# TCP protocol
serverSocket = socket(AF_INET,SOCK_STREAM)
# Bind the Server IP with its port
# Ready to receive connections
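
The overall idea, one thread per client with every message relayed by the server, can be sketched as below. This is a minimal illustration and not the code of the assignment: the function names, the shared “clients” list and the choice of port 0 (which lets the operating system pick a free port) are assumptions made for the example, and features such as user names or private chats are left out:

```python
import socket
from threading import Lock, Thread

clients = []            # sockets of every connected client
clients_lock = Lock()   # protects the list across threads


def relay(sender, data):
    """Forward data to every connected client except the sender."""
    with clients_lock:
        targets = [c for c in clients if c is not sender]
    for client in targets:
        try:
            client.sendall(data)
        except OSError:
            pass  # the target went away; its own thread will clean up


def handle_client(connection):
    """Per-client thread: read messages until the client disconnects."""
    with clients_lock:
        clients.append(connection)
    try:
        while True:
            data = connection.recv(1024)
            if not data:  # an empty read means the client closed the socket
                break
            relay(connection, data)
    except OSError:
        pass
    finally:
        with clients_lock:
            clients.remove(connection)
        connection.close()


def accept_loop(server_socket):
    """Accept connections and start one handler thread per client."""
    while True:
        try:
            connection, _address = server_socket.accept()
        except OSError:  # the server socket was closed
            break
        Thread(target=handle_client, args=(connection,), daemon=True).start()


def start_server(host="127.0.0.1", port=0):
    """Bind, listen and start accepting in a background thread."""
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.bind((host, port))
    server_socket.listen()
    Thread(target=accept_loop, args=(server_socket,), daemon=True).start()
    return server_socket
```

Calling “start_server()” returns the listening socket; clients that connect to its address receive whatever the other clients send.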


Analyzing the state of the Basic Health Units of Brazil (UBS) using Python for Data Science

This work was developed during the “Python IMD challenge”, which took place on 10/21/2017, with Igor, Ricardo, Luiza and me. The purpose of the competition was to develop a Data Science project in 5 hours. Our goal was to choose something impactful and, at the same time, simple enough to be developed in the short time given. We were very happy to learn at the end that we had won first place in the competition! The prize is a free ticket to the national Python event that is going to happen next year.


Without further ado, let’s talk about the project itself!

During our search for datasets on various topics, we found the national website, which contains numerous pre-formatted datasets of national interest: http://dados.gov.br/

The subject that caught our attention was the Basic Health Units of Brazil (Unidade básica de saúde), which are basically small public hospitals. The dataset had some interesting columns that we thought could lead to an important conclusion, for example, the hospitals’ coordinates and their evaluations on different aspects, such as hospital structure and medical supplies.


A Fan control system with ATmega2560 Microcontroller

The following project was developed during a class taken at the Federal University of Rio Grande do Norte with professor Marcelo. The group was composed of Giovanna, João, Raí and me.

The group’s goal was to implement a grain drying system. The drying process should start with the press of a button; after that, the fan speed is controlled based on a “dry curve” and the readings of a luminosity sensor and a temperature sensor. After 3 minutes the routine finishes, and the machine waits for another button press to start the system again.


The hardware of the control system consists of:

  • An ATmega2560 Microcontroller
  • An LDR to sense the luminosity
  • An NTC sensor for the temperature
  • A Push-Button to start the system
  • LEDs for behavior visualization
  • A “4N25” Photocoupler to protect the Microcontroller from the load
  • An NPN transistor with a diode for circuit protection
  • Power supply to the Fan
  • Some resistors

The last picture shows the components: an Arduino Mega board is used to facilitate communication with the microcontroller, and the DC motor stands in for the fan. Since the LDR and NTC sensors behave as variable resistances over the range of interest, a voltage divider is used to convert each resistance into a voltage that can be measured. An operating range should be specified for this type of configuration. In the case of the NTC temperature sensor, the fixed resistor was chosen based on the sensor’s measured resistance at ambient temperature, which is approximately 50Ω at 25 °C.
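
The voltage divider relation can be made concrete with a short Python sketch. It assumes the sensor is on the supply side and the fixed resistor on the ground side, a 5 V supply, and a 10-bit ADC; the original circuit may be wired the other way around, and all names and values here are illustrative:

```python
V_SUPPLY = 5.0    # assumed supply and ADC reference voltage, in volts
ADC_STEPS = 1024  # a 10-bit ADC has 2**10 steps


def divider_voltage(r_sensor, r_fixed, v_supply=V_SUPPLY):
    """Voltage across the fixed resistor of the divider."""
    return v_supply * r_fixed / (r_sensor + r_fixed)


def adc_to_voltage(reading, v_supply=V_SUPPLY):
    """Convert a raw ADC reading (0..1023) to volts."""
    return reading * v_supply / ADC_STEPS


def sensor_resistance(reading, r_fixed, v_supply=V_SUPPLY):
    """Recover the sensor resistance from a raw ADC reading."""
    v_out = adc_to_voltage(reading, v_supply)
    if v_out <= 0:
        raise ValueError("a reading of 0 gives no usable voltage in this wiring")
    return r_fixed * (v_supply / v_out - 1.0)


print(divider_voltage(50.0, 50.0))
print(sensor_resistance(512, 50.0))
```

When the fixed resistor equals the sensor resistance, the divider output sits at half the supply voltage, which is one reason to pick the fixed resistor from the sensor’s resistance at ambient temperature.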



Easily Using SWI-Prolog within Matlab

This post intends to show how to interact with SWI-Prolog code inside Matlab.

The following code is an arbitrary example of a special system, which covers different situations for “control” based on various inputs.

:- dynamic upOn/0, upOff/0.


getOn(sensor,Value) :- Value < 0.5.
getOff(sensor,Value) :- Value > 0.5. 

setOn(valve,Value) :- Value is 1.0.
setOff(valve,Value) :- Value is 0.0.

control(V_in,V_out,S0,S1,S2,SW) :- getOff(sensor,SW),getOff(sensor,S0),getOff(sensor,S1),getOff(sensor,S2),upOff,setOff(valve,V_in),setOff(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOn(sensor,SW),getOff(sensor,S0),getOff(sensor,S1),getOff(sensor,S2),upOff,setOn(valve,V_in),setOff(valve,V_out),retract(upOff),asserta(upOn).
control(V_in,V_out,S0,S1,S2,SW) :- getOn(sensor,SW),getOn(sensor,S0),getOff(sensor,S1),getOff(sensor,S2),upOn,setOn(valve,V_in),setOff(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOn(sensor,SW),getOn(sensor,S0),getOn(sensor,S1),getOff(sensor,S2),upOn,setOn(valve,V_in),setOff(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOn(sensor,SW),getOn(sensor,S0),getOn(sensor,S1),getOn(sensor,S2),upOn,setOff(valve,V_in),setOn(valve,V_out),retract(upOn),asserta(upOff).
control(V_in,V_out,S0,S1,S2,SW) :- getOn(sensor,SW),getOn(sensor,S0),getOn(sensor,S1),getOn(sensor,S2),upOff,setOff(valve,V_in),setOn(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOn(sensor,SW),getOn(sensor,S0),getOn(sensor,S1),getOff(sensor,S2),upOff,setOff(valve,V_in),setOn(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOn(sensor,SW),getOn(sensor,S0),getOff(sensor,S1),getOff(sensor,S2),upOff,setOn(valve,V_in),setOff(valve,V_out),retract(upOff),asserta(upOn).
control(V_in,V_out,S0,S1,S2,SW) :- getOff(sensor,SW),getOn(sensor,S0),getOn(sensor,S1),getOn(sensor,S2),setOff(valve,V_in),setOn(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOff(sensor,SW),getOn(sensor,S0),getOn(sensor,S1),getOff(sensor,S2),upOff,setOff(valve,V_in),setOn(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOff(sensor,SW),getOn(sensor,S0),getOn(sensor,S1),getOff(sensor,S2),upOn,setOff(valve,V_in),setOn(valve,V_out),retract(upOn),asserta(upOff).
control(V_in,V_out,S0,S1,S2,SW) :- getOff(sensor,SW),getOn(sensor,S0),getOff(sensor,S1),getOff(sensor,S2),upOff,setOff(valve,V_in),setOn(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOff(sensor,SW),getOn(sensor,S0),getOff(sensor,S1),getOff(sensor,S2),upOn,setOff(valve,V_in),setOn(valve,V_out),retract(upOn),asserta(upOff).

The above code can be called by another Prolog script as follows:

#!/usr/bin/env swipl

:- initialization main. 

main:-  current_prolog_flag(argv,Argv),
        nth0(0, Argv, A), % get first argument
        nth0(1, Argv, B), % get second argument
        nth0(2, Argv, C), % get third argument
        nth0(3, Argv, D), % get fourth argument
        format("~w ~w ~w ~w \n",[A,B,C,D]), % Print inputs
        consult('example1Prolog'), % Load main Prolog code
        atom_number(A,E), % Transform inputs into Prolog numbers
        atom_number(B,F),
        atom_number(C,G),
        atom_number(D,H),
        control(X,Y,E,F,G,H), % Query control predicate with inputs
        format("X= ~w Y= ~w \n",[X,Y]), % Prints output variables
        halt. % Finishes execution

The script is executed with the bash shell command:

swipl -s plscript.pl 1 1 1 1

The last 4 numbers are the inputs to the Prolog code. The above command produces the following output:

1 1 1 1
X= 0.0 Y= 1.0

Most of the work is already done! (WOW). The last step is executing the same bash command within Matlab using the “system” function:

[status,term_out] = system('swipl -s plscript.pl 1 1 1 1')

The “term_out” variable will hold the script output for the given inputs. The following picture shows the output of the previous call on the main console, using the described method.


Note: It is important to check that a given set of inputs returns valid outputs for the system (that is, that the inputs are covered by the rules). Also, make sure the script reaches the last line, “halt.”; otherwise SWI-Prolog will stay open, and Matlab will wait for the application to finish (which never happens) and crash.
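
The same shell call can also be made from other languages, and a timeout is one way to guard against a script that never reaches “halt.”. The Python sketch below is an analogue of the Matlab “system” call, not part of the original tutorial: the function names and the 10 second timeout are assumptions, and it expects the output format shown earlier (“X= ... Y= ...”):

```python
import re
import subprocess


def parse_control_output(text):
    """Extract the X and Y values from a line such as 'X= 0.0 Y= 1.0'."""
    match = re.search(r"X=\s*(\S+)\s+Y=\s*(\S+)", text)
    if match is None:
        raise ValueError("no 'X= ... Y= ...' line found in the output")
    return float(match.group(1)), float(match.group(2))


def run_control(inputs, script="plscript.pl", timeout=10):
    """Run the Prolog script and give up if it does not exit in time."""
    command = ["swipl", "-s", script] + [str(v) for v in inputs]
    # subprocess.run raises subprocess.TimeoutExpired when the time is up
    result = subprocess.run(command, capture_output=True, text=True,
                            timeout=timeout)
    return parse_control_output(result.stdout)


print(parse_control_output("1 1 1 1 \nX= 0.0 Y= 1.0 \n"))
```

With SWI-Prolog installed and the script in the current folder, “run_control([1, 1, 1, 1])” would run the same command as above and return the two values as numbers.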

The following Prolog command is helpful if the developed Prolog code depends on dynamic variables like the one shown in this tutorial:


The problem is that the saved data cannot be loaded into a running environment, so the approach in this post does not work for dynamic variables (for now).

Files can be accessed at:


Robot navigation using a Multilayer Perceptron Neural Network

The Single-Layer Perceptron (SLP) was one of the first artificial neural networks to be developed. It consists of a system that classifies a set of inputs based on their weights and can distinguish two classes that are linearly separable. The activation function, in the case of the picture below, is a step function, meaning the output can assume only two values. A constant input, also known as the bias, determines the threshold the system uses to define the output.

Image source: http://abhay.harpale.net/blog/machine-learning/a-hands-on-tutorial-on-the-perceptron-learning-algorithm/
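
A perceptron with a step activation fits in a few lines of Python. The sketch below is a generic illustration, with made-up function names and a learning rate of 1, that trains on the AND gate, a problem that a single line can separate:

```python
def step(value):
    """Step activation: the output can only be 0 or 1."""
    return 1 if value >= 0 else 0


def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus the bias, passed through the step."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(total)


def train(samples, epochs=20, learning_rate=1.0):
    """Perceptron learning rule applied to (inputs, target) pairs."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # The error is -1, 0 or 1; weights only move on a mistake
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(and_gate)
print([predict(weights, bias, inputs) for inputs, _ in and_gate])
```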

The limitation of the SLP is that it can only separate two classes with a single line. In the example below, if one blue dot were in the middle of the red dots, the training algorithm would not converge to an acceptable solution.


Image source: https://glowingpython.blogspot.com.br/2011/10/perceptron.html


Image source: http://ecee.colorado.edu/

The Multilayer Perceptron (MLP) overcomes the linearity limitation of the SLP, so it can address a wider range of applications. In the picture above, the MLP is able to learn all three logic gates, including the “XOR”, whose two classes of dots can’t be separated by a single line. Professor Marcelo implemented an MLP code with a single hidden layer, available in the Matlab repository, which includes an example of an MLP learning the behavior of an XOR gate.
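
To see why one hidden layer is enough for the XOR, here is a small Python sketch with hand-picked weights. It is not the Matlab code mentioned above and involves no training: the weights are chosen by hand so that one hidden unit behaves like an OR gate and the other like an AND gate, and the function names are illustrative:

```python
def step(value):
    """Step activation: the output can only be 0 or 1."""
    return 1 if value >= 0 else 0


def xor_mlp(x1, x2):
    """Two hidden units (OR and AND) feeding one output unit."""
    hidden_or = step(x1 + x2 - 0.5)    # fires when at least one input is 1
    hidden_and = step(x1 + x2 - 1.5)   # fires only when both inputs are 1
    # Output: OR but not AND, which is exactly the XOR
    return step(hidden_or - hidden_and - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))
```

Each hidden unit draws one line in the input plane, and the output unit combines the two, which is something a single perceptron cannot do.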


Image source: http://www.mdpi.com/
