clean_tokens.Rd
Tokenize words and remove stopwords from a corpus
clean_tokens(corpus, ignore = stopwords::stopwords("en"))
Argument | Description
---|---
corpus | character vector representing a corpus
ignore | stopwords to ignore; optional (default: common English stopwords and punctuation)
Returns a list of character vectors of word tokens (one vector per document in the corpus).
coRPysprofiling::clean_tokens("How many species of animals are there in Russia?")
#> [[1]]
#> [1] "many"    "species" "animals" "russia"
#>
coRPysprofiling::clean_tokens("How many species of animals are there in Russia?", ignore = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~')
#> [[1]]
#> [1] "how"     "many"    "species" "of"      "animals" "are"     "there"
#> [8] "in"      "russia"
#>
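The `ignore` argument can also be used to extend, rather than replace, the defaults. A minimal sketch, assuming `ignore` accepts any character vector of tokens to drop (the extra words below are illustrative, not part of the package's defaults):

```r
# Hypothetical usage: augment the default English stopword list with
# domain-specific words so they are removed along with the usual stopwords.
extra_stops <- c(stopwords::stopwords("en"), "species", "russia")
coRPysprofiling::clean_tokens(
  "How many species of animals are there in Russia?",
  ignore = extra_stops
)
```

Because the default is `stopwords::stopwords("en")`, concatenating onto that vector keeps the standard behavior while filtering the additional terms.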