Tokenize words and remove stopwords from a corpus

clean_tokens(corpus, ignore = stopwords::stopwords("en"))

Arguments

corpus

character vector representing a corpus

ignore

stopwords to ignore; optional (default: common English stopwords and punctuation)

Value

list containing a character vector of word tokens (as shown in the examples, the result is wrapped in a list)

Examples

coRPysprofiling::clean_tokens("How many species of animals are there in Russia?")
#> [[1]]
#> [1] "many"    "species" "animals" "russia"
coRPysprofiling::clean_tokens("How many species of animals are there in Russia?", ignore='!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')
#> [[1]]
#> [1] "how"     "many"    "species" "of"      "animals" "are"     "there"
#> [8] "in"      "russia"
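The `ignore` argument also accepts a custom stopword vector, which the examples above do not show. The sketch below is a hypothetical usage, assuming the stopwords package is installed; the extra word "species" is added purely for illustration:

```r
# Build a custom stopword list: the default English stopwords
# plus an extra, domain-specific word ("species" is illustrative).
custom_stopwords <- c(stopwords::stopwords("en"), "species")

# Tokens matching any entry in custom_stopwords are dropped.
coRPysprofiling::clean_tokens(
  "How many species of animals are there in Russia?",
  ignore = custom_stopwords
)
```

Passing a plain character vector keeps the interface consistent with the default, which is itself the character vector returned by `stopwords::stopwords("en")`.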