A way to remove special characters from a text file

Hi everyone! I have been busy for the last few months with duties related to my university classes and research, but now I have set aside a little time to write another post! Soon I will bring more content related to my university environment.

Last month, Campus Party took place in my city. Just for fun, my friends and I decided to take part in two hackathons that ran simultaneously during the event, and they turned out to be a great challenge. The first hackathon involved data science applied to health in Brazil, and the second aimed to speed up the country's judicial process as a whole. In the picture below I am presenting my group's work for the data science hackathon; the TV shows a correlation matrix built from data taken from several databases.


Now, on to the purpose of this post!

While developing a solution for one of the hackathons, we had to process the text of PDF files. To do so, we used the PDFMiner package, whose pdf2txt.py command-line tool is run as follows:

pdf2txt.py -o 'output file' 'pdf file'

The tool extracted the PDF text quite well, with one small peculiarity: a control character (the form feed, which Vim displays as ^L, i.e. Ctrl+L) was inserted at every page break of the original PDF. The picture below shows the character as displayed in the Vim text editor:
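In Python, that character is the ASCII form feed, code point 12, which can be written as "\x0c" or "\f". Since pdf2txt.py places one at each page break, it can also be used as a page separator. A minimal sketch, assuming the extracted text is already in a string (the sample below is made up, not real pdf2txt.py output):

```python
# "Ctrl+L" is the ASCII form feed control character (code point 12).
FORM_FEED = "\x0c"
assert FORM_FEED == "\f" and ord(FORM_FEED) == 12

# Made-up sample shaped like pdf2txt.py output for a two-page PDF.
extracted = "Text of page one.\n\x0cText of page two.\n\x0c"

# Split on the form feed to get one string per page, dropping empty pieces.
pages = [page for page in extracted.split(FORM_FEED) if page.strip()]
for number, page in enumerate(pages, start=1):
    print(number, repr(page))
```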


This character got in the way of processing the file for our final purpose. At first, we tried to remove it with ordinary Python string functions, passing different representations of the character, without success. Eventually we found a rather interesting solution:


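from ast import literal_eval

with open('text.txt', 'r') as f:       # read the whole extracted text
    text = f.read()
text = repr(text)                       # write control chars as escapes, e.g. \x0c
text = text.replace("\\x0c", "")        # drop the escaped form feed (Ctrl+L)
text = literal_eval(text)               # turn the repr back into a normal string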

The trick is to use the repr() function, which returns the printable representation of the string, with control characters spelled out as escape sequences. In that form the special character appears as the four characters \x0c and can easily be identified and removed. Finally, the literal_eval() function (from Python's ast module) turns the representation back into an ordinary string. We found the correspondence between \x0c and Ctrl+L by inspecting the extracted text with repr().
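To try the trick without a file, the same round trip can be run on an in-memory string (the sample text is made up). For comparison, the last lines show that ordinary str.replace also removes the character when it is written as the escape "\x0c" in a normal (non-raw) string literal:

```python
from ast import literal_eval

# Made-up sample: two pages separated by form feeds, as pdf2txt.py leaves them.
text = "First page.\n\x0cSecond page.\n\x0c"

# repr() spells the form feed out as the four characters \x0c.
raw = repr(text)
print(raw)  # 'First page.\n\x0cSecond page.\n\x0c'

# Delete the escaped form feed, then evaluate the literal back into a str.
cleaned = literal_eval(raw.replace("\\x0c", ""))

# Alternative: in an ordinary literal, "\x0c" (or "\f") already is the form
# feed itself, so str.replace can remove it without the repr round trip.
direct = text.replace("\x0c", "")

print(cleaned == direct)  # True
```

The direct replace also avoids one edge case of the round trip: if the text contains a literal backslash followed by x0c, the replacement leaves a dangling backslash behind and literal_eval then fails.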

I believe a similar approach can solve other problems involving different characters, and in other programming languages as well.
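Along the same lines, here is a sketch of a more general helper (strip_control_chars is a hypothetical name) that drops every Unicode control character while keeping newlines and tabs. It swaps the repr() round trip for unicodedata.category, which returns "Cc" for control characters such as the form feed:

```python
import unicodedata


def strip_control_chars(text: str, keep: str = "\n\t") -> str:
    """Remove Unicode control characters (category Cc), except those in `keep`."""
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != "Cc"
    )


sample = "Page 1\x0c\nBell\x07 and tab\tkept\r\n"
print(repr(strip_control_chars(sample)))  # 'Page 1\nBell and tab\tkept\n'
```

Note that "\r" is also in category Cc, so Windows line endings collapse to "\n" unless "\r" is added to keep.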

Easily Using SWI-Prolog within Matlab

This post shows how to run SWI-Prolog code from within Matlab.

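The following code is an arbitrary example of a simple control system: based on four sensor readings (S0, S1, S2 and SW), control/6 sets an input valve (V_in) and an output valve (V_out) to 1.0 (on) or 0.0 (off), using the dynamic facts upOn/upOff to remember its state.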


:- dynamic upOn/0, upOff/0.   % state flag, toggled by control/6

upOff.                        % initial state

% A reading above 0.5 counts as "on", below 0.5 as "off".
getOn(sensor,Value) :- Value > 0.5.
getOff(sensor,Value) :- Value < 0.5.

setOn(valve,Value) :- Value is 1.0.
setOff(valve,Value) :- Value is 0.0.

control(V_in,V_out,S0,S1,S2,SW) :- getOff(sensor,SW),getOff(sensor,S0),getOff(sensor,S1),getOff(sensor,S2),upOff,setOff(valve,V_in),setOff(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOn(sensor,SW),getOff(sensor,S0),getOff(sensor,S1),getOff(sensor,S2),upOff,setOn(valve,V_in),setOff(valve,V_out),retract(upOff),asserta(upOn).
control(V_in,V_out,S0,S1,S2,SW) :- getOn(sensor,SW),getOn(sensor,S0),getOff(sensor,S1),getOff(sensor,S2),upOn,setOn(valve,V_in),setOff(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOn(sensor,SW),getOn(sensor,S0),getOn(sensor,S1),getOff(sensor,S2),upOn,setOn(valve,V_in),setOff(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOn(sensor,SW),getOn(sensor,S0),getOn(sensor,S1),getOn(sensor,S2),upOn,setOff(valve,V_in),setOn(valve,V_out),retract(upOn),asserta(upOff).
control(V_in,V_out,S0,S1,S2,SW) :- getOn(sensor,SW),getOn(sensor,S0),getOn(sensor,S1),getOn(sensor,S2),upOff,setOff(valve,V_in),setOn(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOn(sensor,SW),getOn(sensor,S0),getOn(sensor,S1),getOff(sensor,S2),upOff,setOff(valve,V_in),setOn(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOn(sensor,SW),getOn(sensor,S0),getOff(sensor,S1),getOff(sensor,S2),upOff,setOn(valve,V_in),setOff(valve,V_out),retract(upOff),asserta(upOn).
control(V_in,V_out,S0,S1,S2,SW) :- getOff(sensor,SW),getOn(sensor,S0),getOn(sensor,S1),getOn(sensor,S2),setOff(valve,V_in),setOn(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOff(sensor,SW),getOn(sensor,S0),getOn(sensor,S1),getOff(sensor,S2),upOff,setOff(valve,V_in),setOn(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOff(sensor,SW),getOn(sensor,S0),getOn(sensor,S1),getOff(sensor,S2),upOn,setOff(valve,V_in),setOn(valve,V_out),retract(upOn),asserta(upOff).
control(V_in,V_out,S0,S1,S2,SW) :- getOff(sensor,SW),getOn(sensor,S0),getOff(sensor,S1),getOff(sensor,S2),upOff,setOff(valve,V_in),setOn(valve,V_out).
control(V_in,V_out,S0,S1,S2,SW) :- getOff(sensor,SW),getOn(sensor,S0),getOff(sensor,S1),getOff(sensor,S2),upOn,setOff(valve,V_in),setOn(valve,V_out),retract(upOn),asserta(upOff).

Assuming the code above is saved as example1Prolog.pl (the file consulted below), it can be called from another Prolog script, plscript.pl, as follows:

#!/usr/bin/env swipl

:- initialization main.

main :- current_prolog_flag(argv, Argv),
        nth0(0, Argv, A),                     % get first argument
        nth0(1, Argv, B),                     % get second argument
        nth0(2, Argv, C),                     % get third argument
        nth0(3, Argv, D),                     % get fourth argument
        format("~w ~w ~w ~w \n", [A,B,C,D]),  % print inputs
        consult('example1Prolog'),            % load main Prolog code
        atom_number(A, E),                    % convert input atoms into Prolog numbers
        atom_number(B, F),
        atom_number(C, G),
        atom_number(D, H),
        control(X, Y, E, F, G, H),            % query control/6 with the inputs
        format("X= ~w Y= ~w \n", [X,Y]),      % print output variables
        halt.                                 % finish execution

The script is executed with the bash shell command:


swipl -s plscript.pl 1 1 1 1

Here, the last four numbers are the inputs passed to control/6 as S0, S1, S2 and SW, in that order. The command prints:


1 1 1 1
X= 0.0 Y= 1.0

Most of the work is already done! The last step is to run the same shell command from within Matlab using the system function:


[status,term_out] = system('swipl -s plscript.pl 1 1 1 1')

The term_out variable holds the text printed by the script for the given inputs, and status holds the command's exit status. The following picture shows the output of the previous call in the Matlab command window:
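The valve values still have to be pulled out of term_out as numbers (in Matlab, for example, with regexp). The parsing logic is sketched here in Python for illustration; parse_valves is a hypothetical helper that assumes the "X= ... Y= ..." line printed by plscript.pl:

```python
import re


def parse_valves(term_out: str) -> tuple[float, float]:
    """Extract X and Y from the 'X= ... Y= ...' line printed by plscript.pl."""
    match = re.search(r"X=\s*(-?\d+(?:\.\d+)?)\s+Y=\s*(-?\d+(?:\.\d+)?)", term_out)
    if match is None:
        # No result line usually means the control/6 query failed.
        raise ValueError("no 'X= ... Y= ...' line found in the output")
    return float(match.group(1)), float(match.group(2))


# Text in the shape printed by the command shown earlier.
print(parse_valves("1 1 1 1 \nX= 0.0 Y= 1.0 \n"))  # (0.0, 1.0)
```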


Note: it is important to check whether a given set of inputs is covered by the control/6 clauses, i.e. whether the query succeeds for it. Also make sure the script actually reaches the last line, "halt."; otherwise SWI-Prolog stays open instead of exiting, and Matlab keeps waiting for an application that never finishes and ends up hanging.
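One way to check coverage is to enumerate every combination of sensor states and flag values and test each one against the clause conditions. The sketch below does this in Python over a hand transcription of the 13 control/6 clauses (CLAUSE_GUARDS and covered are hypothetical names). It treats each reading as simply on or off, so it does not model a reading of exactly 0.5, which neither getOn nor getOff accepts:

```python
from itertools import product

# Conditions of the 13 control/6 clauses, transcribed by hand, in clause order:
# (SW, S0, S1, S2, required flag). True = getOn must succeed, False = getOff.
# None means the clause does not check upOn/upOff.
CLAUSE_GUARDS = [
    (False, False, False, False, "upOff"),
    (True,  False, False, False, "upOff"),
    (True,  True,  False, False, "upOn"),
    (True,  True,  True,  False, "upOn"),
    (True,  True,  True,  True,  "upOn"),
    (True,  True,  True,  True,  "upOff"),
    (True,  True,  True,  False, "upOff"),
    (True,  True,  False, False, "upOff"),
    (False, True,  True,  True,  None),
    (False, True,  True,  False, "upOff"),
    (False, True,  True,  False, "upOn"),
    (False, True,  False, False, "upOff"),
    (False, True,  False, False, "upOn"),
]


def covered(sw, s0, s1, s2, flag):
    """True if at least one clause accepts these sensor states and flag."""
    return any(
        (g_sw, g_s0, g_s1, g_s2) == (sw, s0, s1, s2) and g_flag in (None, flag)
        for g_sw, g_s0, g_s1, g_s2, g_flag in CLAUSE_GUARDS
    )


# Every on/off combination of (SW, S0, S1, S2), for both flag values.
gaps = [
    (*sensors, flag)
    for sensors in product([False, True], repeat=4)
    for flag in ("upOff", "upOn")
    if not covered(*sensors, flag)
]
print(len(gaps), "uncovered combinations")
for sw, s0, s1, s2, flag in gaps:
    print(f"SW={sw} S0={s0} S1={s1} S2={s2} {flag}")
```

For the clauses above it reports 18 of the 32 combinations as uncovered; for any of them the query fails and the script never reaches halt.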

The following Prolog command, which writes the current program to a saved-state file (here runtime_data), can help when the Prolog code depends on dynamic predicates such as upOn/upOff in this tutorial:


qsave_program(runtime_data).

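The problem is that a saved state cannot be loaded into an already running Prolog session. Moreover, every system call from Matlab starts a fresh swipl process that consults example1Prolog again, so the flag always starts as upOff and any state change made by control/6 is lost between calls. For now, this post's approach therefore does not work for code whose dynamic state must persist across calls.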

Files can be accessed at:

https://github.com/YangTavares/Deep_learning/tree/master/Prolog_to_Matlab