count_unique_words

count_unique_words(
    text,
    ignore_words=None,
    count_punc=False,
    case_sensitive=True,
)

Count how many times each unique word occurs in a text string.

By default, punctuation is removed, except for apostrophes inside words. Words are split on any whitespace (spaces, tabs, or newlines), not just single spaces.
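The default tokenization described above can be pictured with a minimal sketch. The helper name `tokenize_sketch` is hypothetical and this is not the function's actual implementation. It assumes that "punctuation" means the ASCII characters in `string.punctuation`, and that an apostrophe counts as "inside a word" whenever it is not at the start or end of a token.

```python
import string


def tokenize_sketch(text):
    """Hypothetical helper: split on whitespace, drop punctuation, keep inner apostrophes."""
    tokens = []
    # str.split() with no arguments splits on any run of spaces, tabs, or newlines.
    for raw in text.split():
        # Assumption: remove every ASCII punctuation character except the apostrophe.
        kept = "".join(ch for ch in raw if ch == "'" or ch not in string.punctuation)
        # Assumption: apostrophes at the edges of a token are quotes, so strip them.
        kept = kept.strip("'")
        if kept:  # skip tokens that were punctuation only
            tokens.append(kept)
    return tokens


print(tokenize_sketch("Don't  stop,\tit's\nfine!"))
# ["Don't", 'stop', "it's", 'fine']
```

Under these assumptions a hyphenated token such as `well-known` collapses to `wellknown`; the real function may treat such cases differently.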

Parameters

Name Type Description Default
text str Text in which to count unique words. required
ignore_words Iterable[str] | None Words to exclude from counting. Accepts any iterable of strings (e.g., list, set, tuple). Matching respects case_sensitive. None
count_punc bool If True, punctuation symbols are tokenized and counted as separate tokens. If False, punctuation is removed (apostrophes inside words are kept). False
case_sensitive bool If False, words are lowercased before counting, and the entries in ignore_words are lowercased as well. True
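To illustrate how ignore_words and case_sensitive interact, here is a rough sketch that counts tokens that have already been split. The helper `count_tokens_sketch` is hypothetical and only mirrors the behavior described in the table; count_punc is left out because the exact rule for tokenizing punctuation is not specified here.

```python
from collections import Counter


def count_tokens_sketch(tokens, ignore_words=None, case_sensitive=True):
    """Hypothetical helper: count tokens, honoring ignore_words and case_sensitive."""
    ignored = set(ignore_words or ())
    if not case_sensitive:
        # Lowercase both sides so that matching is case-insensitive.
        tokens = [t.lower() for t in tokens]
        ignored = {w.lower() for w in ignored}
    return dict(Counter(t for t in tokens if t not in ignored))


words = ["The", "the", "the", "thing"]
print(count_tokens_sketch(words, ignore_words=["the"], case_sensitive=False))
# {'thing': 1}
print(count_tokens_sketch(words, ignore_words=["the"]))
# {'The': 1, 'thing': 1}
```

With the default case_sensitive=True, "The" does not match the ignored word "the", so it is still counted.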

Returns

Name Type Description
dict[str, int] Dictionary of tokens to counts.

Raises

Name Type Description
TypeError If input types are incorrect or ignore_words contains non-strings.
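The kind of checking implied by the TypeError entry might look like the sketch below. `validate_args_sketch` is a hypothetical name, and the exact checks and messages of the real function are assumptions; for example, this sketch does not reject a bare string passed as ignore_words.

```python
def validate_args_sketch(text, ignore_words=None, count_punc=False, case_sensitive=True):
    """Hypothetical validator: raise TypeError when an argument has the wrong type."""
    if not isinstance(text, str):
        raise TypeError(f"text must be a str, got {type(text).__name__}")
    for name, flag in (("count_punc", count_punc), ("case_sensitive", case_sensitive)):
        if not isinstance(flag, bool):
            raise TypeError(f"{name} must be a bool, got {type(flag).__name__}")
    if ignore_words is None:
        return []
    try:
        # Materialize once so that a generator is not consumed before use.
        words = list(ignore_words)
    except TypeError:
        raise TypeError("ignore_words must be an iterable of str or None") from None
    for word in words:
        if not isinstance(word, str):
            raise TypeError(f"ignore_words must contain only str, got {type(word).__name__}")
    return words


try:
    validate_args_sketch("a b c", ignore_words=["a", 1])
except TypeError as exc:
    print(f"TypeError: {exc}")
```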

Examples

count_unique_words('I go where I go')
{'I': 2, 'go': 2, 'where': 1}

count_unique_words('The the the thing', ignore_words=['the'], case_sensitive=False)
{'thing': 1}