Tokenize words and remove stopwords from a corpus

clean_tokens(corpus, ignore = stopwords::stopwords("en"))

Arguments

corpus

character vector representing a corpus

ignore

stopwords to ignore; optional (default: common English stopwords and punctuation)

Value

list containing a character vector of word tokens (as shown in the examples, the result is wrapped in a list)

Examples

coRPysprofiling::clean_tokens("How many species of animals are there in Russia?")
#> [[1]]
#> [1] "many"    "species" "animals" "russia"
coRPysprofiling::clean_tokens("How many species of animals are there in Russia?", ignore='!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')
#> [[1]]
#> [1] "how"     "many"    "species" "of"      "animals" "are"     "there"
#> [8] "in"      "russia"
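The `ignore` argument also accepts a custom stopword vector, which the examples above do not show. The sketch below is a hypothetical usage, assuming the stopwords package is installed; the extra word "species" is added purely for illustration:

```r
# Build a custom stopword list: the default English stopwords
# plus an extra, domain-specific word ("species" is illustrative).
custom_stopwords <- c(stopwords::stopwords("en"), "species")

# Tokens matching any entry in custom_stopwords are dropped.
coRPysprofiling::clean_tokens(
  "How many species of animals are there in Russia?",
  ignore = custom_stopwords
)
```

Passing a plain character vector keeps the interface consistent with the default, which is itself the character vector returned by `stopwords::stopwords("en")`.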