text_processor Tutorial

Using text_processor

For this tutorial, we will work with a text file containing Edgar Allan Poe’s famous poem, Annabel Lee. The file is in the tests directory of our repository, at tests/poe.txt.

text_find()

First, we want to check if this text file contains our keyword of interest, “sea”. To do this, we use text_find():

from text_processor.text_find import text_find

text_find("tests/poe.txt", "sea")
56

The function returns a non-negative integer, 56. This tells us that the keyword “sea” exists in the text file and that its first occurrence begins at character offset 56, counting from zero.
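The offset above can be reproduced with Python's built-in str.find. The sketch below is only an illustration, not the text_processor implementation: find_keyword is a hypothetical helper, the sample string assumes the file opens with the poem's first two lines (the second indented by three spaces), and the -1 result for a missing keyword is the behaviour of str.find, which is assumed rather than confirmed for text_find().

```python
def find_keyword(text: str, keyword: str) -> int:
    """Hypothetical stand-in for a keyword search: return the zero-based
    offset of the first occurrence of keyword, or -1 if it is absent."""
    return text.find(keyword)


# Assumed opening of tests/poe.txt (second line indented by three spaces).
sample = "It was many and many a year ago,\n   In a kingdom by the sea,\n"

print(find_keyword(sample, "sea"))    # 56
print(find_keyword(sample, "raven"))  # -1
```

Because the offset counts from zero, slicing sample[56:59] gives back "sea".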

text_lower()

Next, we want to identify all of the keywords in the file for later model training. Keywords should be matched regardless of case, so we first use text_lower() to convert all of the text to lower case. The first argument is the input file; the second is the file that the lower-cased text is written to.

from text_processor.text_lower import text_lower

text_lower("tests/poe.txt", "tests/output.txt")
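For readers curious about what this step amounts to, here is a minimal sketch using only the standard library. It assumes the step is a plain read, lower-case, write sequence; lower_file is a hypothetical name and is not part of text_processor.

```python
from pathlib import Path


def lower_file(input_path: str, output_path: str) -> None:
    """Hypothetical helper: write a lower-cased copy of input_path to output_path."""
    text = Path(input_path).read_text(encoding="utf-8")
    Path(output_path).write_text(text.lower(), encoding="utf-8")


# str.lower() is what lets "Sea" and "sea" count as the same keyword.
print("In a kingdom by the Sea".lower())  # in a kingdom by the sea
```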

text_remove()

Next, we want to remove some stop words that we are not interested in. We do not want to see “the” and “a” as keywords, so we remove them using text_remove(). Here we pass the same path as both input and output, so tests/output.txt is updated in place.

from text_processor.text_remove import text_remove

text_remove("tests/output.txt", "tests/output.txt", "the")
text_remove("tests/output.txt", "tests/output.txt", "a")
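The processed text shown at the end of this tutorial keeps words such as “there” and “was” intact, which suggests that only whole words are removed. A regular expression with word boundaries behaves the same way. This is a sketch under that assumption, with a hypothetical remove_word helper; it is not the library's own code.

```python
import re


def remove_word(text: str, word: str) -> str:
    """Hypothetical helper: delete whole-word occurrences of word.

    Word-boundary anchors stop "the" from matching inside "there".
    The spaces around a removed word are left untouched.
    """
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, "", text)


line = "in a kingdom by the sea,"
step1 = remove_word(line, "the")
step2 = remove_word(step1, "a")
print(step2)  # in  kingdom by  sea,
```

Since the surrounding spaces stay where they are, each removed word leaves a two-space gap behind.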

text_replace()

One more thing: in this poem, Poe uses the word “sepulchre”, which is the British English spelling. The rest of our text files are written in American English, so we want the American spelling, “sepulcher”. We make this change using text_replace().

from text_processor.text_replace import text_replace

text_replace("tests/output.txt", "tests/output.txt", "sepulchre", "sepulcher")
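On a single string, the same substitution can be sketched with str.replace. replace_word is a hypothetical helper used only for illustration. Note that str.replace matches substrings rather than whole words; whether text_replace() respects word boundaries is not stated here, so treat that as an open assumption.

```python
def replace_word(text: str, old: str, new: str) -> str:
    """Hypothetical helper: replace every occurrence of old with new."""
    return text.replace(old, new)


line = "to shut her up in a sepulchre"
print(replace_word(line, "sepulchre", "sepulcher"))  # to shut her up in a sepulcher
```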

Final output

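We now have a text file, tests/output.txt, containing the processed text below. The removed stop words leave their surrounding spaces behind, which is why some gaps are two spaces wide.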

with open("tests/output.txt", "r", encoding="utf-8") as f:
    result = f.read()

print(result)
it was many and many  year ago,
   in  kingdom by  sea,
that  maiden there lived whom you may know
   by  name of annabel lee;
and this maiden she lived with no other thought
   than to love and be loved by me.

i was  child and she was  child,
   in this kingdom by  sea,
but we loved with  love that was more than love—
   i and my annabel lee—
with  love that  wingèd seraphs of heaven
   coveted her and me.

and this was  reason that, long ago,
   in this kingdom by  sea,
 wind blew out of  cloud, chilling
   my beautiful annabel lee;
so that her highborn kinsmen came
   and bore her away from me,
to shut her up in  sepulcher
   in this kingdom by  sea.

 angels, not half so happy in heaven,
   went envying her and me-
yes!—that was  reason (as all men know,
   in this kingdom by  sea)
that  wind came out of  cloud by night,
   chilling and killing my annabel lee.

but our love it was stronger by far than  love
   of those who were older than we—
   of many far wiser than we—
and neither  angels in heaven above
   nor  demons down under  sea
can ever dissever my soul from  soul
   of  beautiful annabel lee;

for  moon never beams, without bringing me dreams
   of  beautiful annabel lee;
and  stars never rise, but i feel  bright eyes
   of  beautiful annabel lee;
and so, all  night-tide, i lie down by  side
   of my darling—my darling—my life and my bride,
   in her sepulcher there by  sea—
   in her tomb by  sounding sea.