clean_text
A module that cleans a string of text and parses it into a list of individual words.
Functions
| Name | Description |
|---|---|
| clean_text | Cleans a string of text according to function arguments. |
clean_text
clean_text.clean_text(text, pref_case='lower', rm_all_punc=True, punctuation=[])

Cleans a string of text according to function arguments.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Any string of words, with or without punctuation. | required |
| pref_case | str, one of {"lower", "upper", "asis"} | The case to convert the string to. "asis" leaves the letter case unchanged. | "lower" |
| rm_all_punc | bool | Indicates whether ALL punctuation should be removed from the string | True |
| punctuation | list | Only used if rm_all_punc is False. A list of the specific punctuation characters to remove; all other punctuation remains in the cleaned string. | [] |
Returns
| Name | Type | Description |
|---|---|---|
| | str | A cleaned string with no whitespace other than spaces, converted to the specified case if relevant, and with punctuation removed as specified. |
Examples
>>> clean_text("Hello, it is so lovely to meet you today.")
"hello it is so lovely to meet you today"

>>> clean_text("Hello, it is so lovely to meet you today.", pref_case="upper", rm_all_punc=False, punctuation=[",", "!"])
"HELLO IT IS SO LOVELY TO MEET YOU TODAY."
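
The documented behaviour can be approximated with the standard library alone. The following is a minimal sketch, not the module's actual source: the name `clean_text_sketch` is hypothetical, and it assumes that "all punctuation" means the ASCII characters in `string.punctuation`, that an unrecognised `pref_case` should raise an error, and that `None` stands in for the documented `[]` default (to avoid a mutable default argument).

```python
import string


def clean_text_sketch(text, pref_case="lower", rm_all_punc=True, punctuation=None):
    """Hypothetical re-implementation based only on the documented behaviour."""
    # Assumption: reject values outside the three documented options.
    if pref_case not in {"lower", "upper", "asis"}:
        raise ValueError("pref_case must be 'lower', 'upper', or 'asis'")

    # Decide which characters to strip.
    if rm_all_punc:
        # Assumption: "all punctuation" means ASCII punctuation only.
        to_remove = string.punctuation
    else:
        # Only the characters the caller listed are removed.
        to_remove = "".join(punctuation or [])

    # A translation table whose third argument lists characters to delete.
    cleaned = text.translate(str.maketrans("", "", to_remove))

    # split() with no arguments splits on any run of whitespace, so joining
    # with single spaces drops tabs, newlines, and repeated spaces.
    cleaned = " ".join(cleaned.split())

    if pref_case == "lower":
        cleaned = cleaned.lower()
    elif pref_case == "upper":
        cleaned = cleaned.upper()
    # "asis" leaves the letter case untouched.
    return cleaned
```

Under these assumptions the sketch reproduces both examples above. Removing punctuation before collapsing whitespace matters: a punctuation mark surrounded by spaces would otherwise leave a double space behind.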