count_unique_words

count_unique_words(
    text,
    ignore_words=None,
    count_punc=False,
    case_sensitive=True,
)

Count the occurrences of each unique word in a text string.

By default, punctuation is removed (except apostrophes inside words). Words are split on any whitespace (spaces/tabs/newlines), not just single spaces.
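The default behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual implementation: `tokenize_default` is a hypothetical helper, and it assumes that "punctuation" means the ASCII characters in `string.punctuation` and that punctuation is deleted rather than replaced by a space (so `well-known` would become `wellknown`).

```python
import string

# Hypothetical translation table: delete every ASCII punctuation
# character except the apostrophe.
_DROP_PUNCT = str.maketrans("", "", string.punctuation.replace("'", ""))


def tokenize_default(text):
    """Split on any whitespace and strip punctuation (illustrative sketch)."""
    tokens = []
    # str.split() with no arguments splits on runs of spaces, tabs, and newlines.
    for chunk in text.split():
        cleaned = chunk.translate(_DROP_PUNCT)
        # Trim apostrophes at the edges so only word-internal ones survive.
        cleaned = cleaned.strip("'")
        if cleaned:
            tokens.append(cleaned)
    return tokens


print(tokenize_default("Don't stop,\tbelieving!\n'Hold' on"))
# ["Don't", 'stop', 'believing', 'Hold', 'on']
```

Chunks that consist only of punctuation come out empty and are skipped, so they never appear as words.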

Parameters

Name	Type	Description	Default
text	str	Text in which to count occurrences of each unique word.	required
ignore_words	Iterable[str] | None	Words to exclude from counting. Accepts any iterable of strings (e.g., list, set, tuple). Matching respects `case_sensitive`.	`None`
count_punc	bool	If True, punctuation symbols are tokenized and counted as separate tokens. If False, punctuation is removed (apostrophes inside words are kept).	`False`
case_sensitive	bool	If False, words are lowercased before counting, and the entries in `ignore_words` are lowercased as well, so matching ignores case.	`True`
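To make the interaction between the parameters concrete, here is one possible reading of the table above as code. It is a sketch under stated assumptions, not the documented function's source: `count_unique_words_sketch` and `_PUNCT` are hypothetical names, punctuation is assumed to be any non-word, non-whitespace character (plus underscores and apostrophes that are not inside a word), and type checking is omitted.

```python
import re
from collections import Counter

# Assumed definition of punctuation: an apostrophe that is not between two
# word characters, an underscore, or any other character that is neither a
# word character nor whitespace.
_PUNCT = re.compile(r"(?<!\w)'|'(?!\w)|_|[^\w\s']")


def count_unique_words_sketch(text, ignore_words=None, count_punc=False,
                              case_sensitive=True):
    """Illustrative stand-in for count_unique_words (no input validation)."""
    if count_punc:
        # Pad each punctuation mark with spaces so it becomes its own token.
        text = _PUNCT.sub(lambda m: f" {m.group()} ", text)
    else:
        # Remove punctuation; apostrophes inside words are not matched.
        text = _PUNCT.sub("", text)

    tokens = text.split()  # any run of whitespace separates tokens
    ignored = set(ignore_words or ())

    if not case_sensitive:
        # Lowercase both sides so that ignoring "the" also drops "The".
        tokens = [t.lower() for t in tokens]
        ignored = {w.lower() for w in ignored}

    return dict(Counter(t for t in tokens if t not in ignored))


print(count_unique_words_sketch("Don't stop, don't!",
                                count_punc=True, case_sensitive=False))
# {"don't": 2, 'stop': 1, ',': 1, '!': 1}
```

With `case_sensitive=True`, an `ignore_words` entry of `"the"` would leave `"The"` in the result in this sketch, because the comparison is exact.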

Returns

Name	Type	Description
	dict[str, int]	Dictionary of tokens to counts.

Raises

Name	Type	Description
	TypeError	If an argument has the wrong type or ignore_words contains non-string items.
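The kind of checking that could lead to this `TypeError` might look like the sketch below. The helper name `validate_args` and the error messages are assumptions made for illustration; the documentation does not say how the real function performs its checks or what its messages are.

```python
def validate_args(text, ignore_words=None, count_punc=False, case_sensitive=True):
    """Raise TypeError for badly typed arguments (hypothetical helper)."""
    if not isinstance(text, str):
        raise TypeError("text must be a str")
    if not isinstance(count_punc, bool):
        raise TypeError("count_punc must be a bool")
    if not isinstance(case_sensitive, bool):
        raise TypeError("case_sensitive must be a bool")

    if ignore_words is None:
        return set()

    try:
        words = list(ignore_words)  # accepts list, set, tuple, generator, ...
    except TypeError:
        raise TypeError("ignore_words must be an iterable of str or None") from None

    if not all(isinstance(w, str) for w in words):
        raise TypeError("ignore_words must contain only str items")
    return set(words)


try:
    validate_args("some text", ignore_words=["the", 1])
except TypeError as exc:
    print(f"TypeError: {exc}")
```

Note that a bare string is itself an iterable of one-character strings in Python, so this sketch would accept `ignore_words="the"` and treat it as the three words `"t"`, `"h"`, and `"e"`. Passing a list, set, or tuple avoids that surprise.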

Examples

count_unique_words('I go where I go')
{'I': 2, 'go': 2, 'where': 1}

count_unique_words('The the the thing', ignore_words=['the'], case_sensitive=False)
{'thing': 1}